## setup
1. install Ollama & pull a model
`ministral-3` is a good starting point (fast, works on most machines): `ollama pull ministral-3`
2. select Ollama in screenpipe
- open the screenpipe app
- click the AI preset selector (top of the chat/timeline)
- click Ollama
- pick your model from the dropdown (screenpipe auto-detects pulled models)
- start chatting
screenpipe connects to Ollama at `localhost:11434` automatically.
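once Ollama is running, you can confirm which models screenpipe will see by querying the same `/api/tags` endpoint it auto-detects from. a minimal sketch (the response shape shown in the comment is Ollama's documented model-listing format):

```python
import json
import urllib.request

def parse_tags(payload: dict) -> list[str]:
    # shape of Ollama's /api/tags response:
    # {"models": [{"name": "ministral-3:latest", "size": ...}, ...]}
    return [m["name"] for m in payload.get("models", [])]

def list_ollama_models(base_url: str = "http://localhost:11434") -> list[str]:
    """Fetch the locally pulled models, the same list screenpipe shows in its dropdown."""
    with urllib.request.urlopen(f"{base_url}/api/tags") as resp:
        return parse_tags(json.load(resp))

# with Ollama running: list_ollama_models() returns e.g. ["ministral-3:latest"]
```

if this returns an empty list, screenpipe's dropdown will be empty too; pull a model first.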
## recommended models

| model | size | best for |
|---|---|---|
| ministral-3 | ~2 GB | fast, general use, recommended starting point |
| gemma3:4b | ~3 GB | strong quality for size, good for summaries |
| qwen3:4b | ~3 GB | multilingual, good reasoning |
| deepseek-r1:8b | ~5 GB | strong reasoning, needs 16 GB+ RAM |
## requirements
- Ollama installed and running
- at least one model pulled
- screenpipe running
## custom OpenAI-compatible endpoints

if you’re running a custom LLM server (Qwen, vLLM, Text Generation WebUI, etc.), screenpipe auto-detects the endpoint format:

- first tries OpenAI-compatible format: `GET {endpoint}/v1/models`
- falls back to Ollama format: `GET {endpoint}/api/tags`
- check what path your server uses for model listing (`/models`, `/v1/list`, etc.)
- if unsure, test with curl first: `curl {your-endpoint}/path-to-models`
- join our Discord if you need help troubleshooting custom setups

a local server at `http://localhost:5000` with an OpenAI-compatible API should work automatically. if screenpipe can’t find models, verify the server responds to `curl http://localhost:5000/v1/models`.
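the detection order above can be sketched as follows. `probe` here is a hypothetical, injectable HTTP check (e.g. a GET that returns True on a 200 with a model list); screenpipe's actual implementation may differ, but the fallback order is the one documented above:

```python
from typing import Callable, Optional

def detect_format(endpoint: str, probe: Callable[[str], bool]) -> Optional[str]:
    """Mirror the detection order: OpenAI-compatible first, then Ollama."""
    if probe(f"{endpoint}/v1/models"):
        return "openai"   # vLLM and most OpenAI-compatible servers
    if probe(f"{endpoint}/api/tags"):
        return "ollama"   # Ollama-style servers
    return None           # neither path answered; check your server's listing path

# example: a server that only answers the Ollama path is detected as "ollama"
fmt = detect_format("http://localhost:5000", lambda url: url.endswith("/api/tags"))
```

note that a server answering both paths is treated as OpenAI-compatible, since that format is tried first.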
## troubleshooting

### “ollama not detected”

- make sure Ollama is running: `ollama serve`
- check it’s responding: `curl http://localhost:11434/api/tags`

### model not showing up

- pull it first: `ollama pull ministral-3`
- you can also type the model name manually in the input field

### slow or failing responses

- try a smaller model (`ministral-3`)
- close other GPU-heavy apps
- ensure you have enough free RAM (model size + ~2 GB overhead)
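the RAM rule of thumb above can be written down as a quick check. the ~2 GB overhead figure is this guide's estimate, not a hard limit:

```python
def fits_in_ram(model_size_gb: float, free_ram_gb: float, overhead_gb: float = 2.0) -> bool:
    """Rule of thumb: a model needs its own size plus ~2 GB of overhead."""
    return free_ram_gb >= model_size_gb + overhead_gb

print(fits_in_ram(5.0, 16.0))  # deepseek-r1:8b (~5 GB) with 16 GB free → True
print(fits_in_ram(5.0, 6.0))   # → False; pick a smaller model like ministral-3
```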
## troubleshooting Azure & custom OpenAI endpoints

### Error: “unsupported tool use” or “does not support more than one tool call”

screenpipe sends multiple tool calls to the LLM for agentic features. some models (especially older Azure-hosted models like Phi-4 and older Llama versions) don’t support this.

fixes:

- use a model that supports tool use: `gpt-4`, `gpt-4-turbo`, `claude-3-5-sonnet`, `gpt-oss-120b`
- or disable agentic features in your pipe prompts (remove tool calls, just ask for text summaries)
- on Azure, try switching to the latest model version available
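the “just ask for text” workaround amounts to dropping the tool-calling fields from the request before it reaches the model. `tools` and `tool_choice` are the standard OpenAI chat-completions field names; a minimal sketch:

```python
def strip_tool_use(request: dict) -> dict:
    """Remove OpenAI-style tool-calling fields so the model gets a plain text request."""
    return {k: v for k, v in request.items() if k not in ("tools", "tool_choice")}

req = {"model": "phi-4", "messages": [], "tools": [{"type": "function"}], "tool_choice": "auto"}
print(strip_tool_use(req))  # → {'model': 'phi-4', 'messages': []}
```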
### Error: “max tokens is not supported”

your endpoint doesn’t recognize the `max_tokens` parameter that screenpipe sends.

fixes:

- verify your endpoint supports an OpenAI-compatible API: `curl -H "Authorization: Bearer YOUR_KEY" https://your-endpoint/v1/models`
- if using Azure, ensure you’re using the OpenAI-compatible endpoint format (not the old REST API format)
- try a custom endpoint URL wrapper if your server needs parameter translation
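a parameter-translation wrapper can be as small as renaming the field before forwarding the request. `max_completion_tokens` is used here only as an example of a name some newer OpenAI-style endpoints expect; substitute whatever parameter your server accepts:

```python
def translate_max_tokens(request: dict, target_key: str = "max_completion_tokens") -> dict:
    """Copy the request, renaming max_tokens to whatever the backend accepts."""
    body = dict(request)  # leave the original request untouched
    if "max_tokens" in body:
        body[target_key] = body.pop("max_tokens")
    return body

forwarded = translate_max_tokens({"model": "gpt-4", "max_tokens": 512, "messages": []})
```

apply this in your proxy or wrapper just before the request is sent upstream.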
### API key not being passed to screenpipe API

if screenpipe says “unauthorized” when accessing the local API, but your custom LLM endpoint is configured:

cause: the screenpipe CLI doesn’t automatically share API credentials with the local REST API server.

fix: configure your pipe or app to send the API key explicitly (e.g. in an `Authorization` header) rather than relying on the CLI.

### Custom endpoint not responding / models not detected
screenpipe tries both OpenAI and Ollama formats. if neither works:

- test your endpoint manually: `curl {endpoint}/v1/models` and `curl {endpoint}/api/tags` (one should return a model list; if neither does, your server may use a different path)
- check authorization: if your server requires a key, include it, e.g. `curl -H "Authorization: Bearer YOUR_KEY" {endpoint}/v1/models`
- verify TLS/SSL: if using https, ensure your certificate is valid (self-signed certs need special config)
- common endpoint paths:
  - OpenAI-compatible: `/v1/models`, `/v1/chat/completions`
  - Ollama-compatible: `/api/tags`, `/api/generate`
  - vLLM: `/v1/models` (OpenAI-compatible)
  - Text Generation WebUI: `/api/v1/models` (may vary)
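to narrow down which listing path your server uses, you can probe the common paths above in one pass. `responds` is a hypothetical check you supply (e.g. an HTTP GET returning True on success); this is a sketch, not screenpipe's own detection code:

```python
from typing import Callable, Optional

# the common model-listing paths from above
COMMON_MODEL_PATHS = ["/v1/models", "/api/tags", "/api/v1/models"]

def find_model_path(endpoint: str, responds: Callable[[str], bool]) -> Optional[str]:
    """Return the first model-listing URL the server answers, or None."""
    for path in COMMON_MODEL_PATHS:
        url = endpoint.rstrip("/") + path
        if responds(url):
            return url
    return None
```

if this returns None, your server likely exposes a non-standard path; check its docs or ask on Discord.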