architecture overview
screenpipe's architecture handles continuous screen and audio capture, local data storage, and real-time processing. here's a breakdown of the key components:
diagram overview
- input: screen and audio data
- processing: ocr, stt, transcription, multimodal integration
- storage: sqlite database
- plugins: custom pipes
- integrations: ollama, deepgram, notion, whatsapp, etc.
this modular architecture makes screenpipe adaptable to various use cases, from personal productivity tracking to advanced business intelligence.
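as a rough illustration of how these components could fit together, here is a minimal sketch of the input → processing → storage → plugins flow. every name in it (`Capture`, `processCapture`, `LocalStore`, `Pipe`, `runPipeline`) is hypothetical and not taken from screenpipe's codebase; the OCR/STT step is a stub and an in-memory array stands in for the sqlite database.

```typescript
// Hypothetical types: none of these names come from screenpipe itself.
type Capture =
  | { kind: "screen"; timestamp: number; pixels: string }
  | { kind: "audio"; timestamp: number; samples: string };

interface Processed {
  kind: "ocr" | "stt";
  timestamp: number;
  text: string;
}

// Processing stage: stubs standing in for a real OCR or STT engine.
function processCapture(capture: Capture): Processed {
  if (capture.kind === "screen") {
    return { kind: "ocr", timestamp: capture.timestamp, text: capture.pixels };
  }
  return { kind: "stt", timestamp: capture.timestamp, text: capture.samples };
}

// Storage stage: an in-memory array standing in for the sqlite database.
class LocalStore {
  private rows: Processed[] = [];

  insert(row: Processed): void {
    this.rows.push(row);
  }

  // Case-insensitive substring search over the stored text.
  search(query: string): Processed[] {
    const needle = query.toLowerCase();
    return this.rows.filter((r) => r.text.toLowerCase().indexOf(needle) !== -1);
  }
}

// Plugin stage: a pipe reads from storage and produces some output.
type Pipe = (store: LocalStore) => string[];

const meetingPipe: Pipe = (store) =>
  store.search("meeting").map((r) => `${r.kind}@${r.timestamp}: ${r.text}`);

// Run every capture through processing and storage, then run each pipe.
function runPipeline(captures: Capture[], pipes: Pipe[]): string[] {
  const store = new LocalStore();
  for (const capture of captures) {
    store.insert(processCapture(capture));
  }
  const output: string[] = [];
  for (const pipe of pipes) {
    output.push(...pipe(store));
  }
  return output;
}
```

the point of the sketch is the separation of stages: capture and processing never talk to pipes directly, so a pipe only needs to know how to query storage.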
status
Alpha: runs 24/7 on my MacBook Pro M3 (32 GB RAM) and on a $400 Windows laptop.
Uses about 600 MB of memory and 10% CPU.
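For a sense of scale when running 24/7, the recording estimate in the feature list ("30 gb/m with default settings") can be broken down into smaller units. The sketch below reads "gb/m" as gigabytes per month, which is an assumption, and uses a 30-day month and decimal units; the function name `storageRates` is hypothetical.

```typescript
// Assumption: "30 gb/m" means 30 gigabytes per month (decimal units, 30-day month).
interface StorageRates {
  gbPerDay: number;
  mbPerHour: number;
  kbPerSecond: number;
  gbPerYear: number;
}

function storageRates(gbPerMonth: number, daysPerMonth: number = 30): StorageRates {
  const gbPerDay = gbPerMonth / daysPerMonth;
  const mbPerHour = (gbPerDay * 1000) / 24; // 1 GB = 1000 MB
  const kbPerSecond = (mbPerHour * 1000) / 3600; // 1 MB = 1000 kB
  const gbPerYear = gbPerDay * 365;
  return { gbPerDay, mbPerHour, kbPerSecond, gbPerYear };
}

const rates = storageRates(30);
console.log(rates);
```

Under these assumptions, 30 GB per month works out to 1 GB per day, roughly 42 MB per hour, and 365 GB per year of continuous recording.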
- Integrations
  - ollama
  - openai
  - Friend wearable
  - Fileorganizer2000
  - mem0
  - Brilliant Frames
  - Vercel AI SDK
  - supermemory
  - deepgram
  - unstructured
  - excalidraw
  - Obsidian
  - Apple shortcut
  - multion
  - iPhone
  - Android
  - Camera
  - Keyboard
  - Browser
- Pipe Store: a list of "pipes" you can build, share, and easily install to get more value out of your screen & mic data without effort. Pipes run in a Deno TypeScript engine within screenpipe on your computer
- screenshots + OCR with different engines to optimise privacy, quality, or energy consumption
  - tesseract
  - Windows native OCR
  - Apple native OCR
  - unstructured.io
  - screenpipe screen/audio specialised LLM
- audio + STT (works with multiple input devices, such as your iPhone + Mac mic, and with many STT engines)
  - Linux, macOS, Windows input & output devices
  - iPhone microphone
- remote capture (run screenpipe in your cloud and have it capture your local machine; only tested on Linux), for example when your laptop has little compute
- optimised screen & audio recording (mp4 encoding, estimated at 30 GB/month with default settings)
- sqlite local db
- local api
- cross-platform CLI and desktop app (macOS, Windows, Linux)
- Metal, CUDA
- TS SDK
- multimodal embeddings
- cloud storage options (s3, pgsql, etc.)
- cloud computing options (deepgram for audio, unstructured for OCR)
- custom storage settings via configurable capture settings (fps, resolution)
- security
  - window-specific capture (e.g. capture only a specific tab of Cursor, Chrome, or Obsidian, or only a specific app)
  - encryption
  - PII removal
- fast, optimised, energy-efficient modes
- webhooks/events (for automations)
- abstractions for multiplayer usage (e.g. aggregate sales team data, company team data, partner, etc.)
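
To make the "local api" and "sqlite local db" items above more concrete, here is a minimal sketch of querying captured data over HTTP. It assumes the local API listens on `http://localhost:3030` and exposes a `/search` endpoint with `q`, `content_type`, and `limit` query parameters; treat the port, path, and parameter names as assumptions to check against your installed version. `SearchOptions`, `buildSearchUrl`, and `search` are hypothetical helper names, not part of the TS SDK.

```typescript
// Assumed endpoint shape: GET {baseUrl}/search?q=...&content_type=...&limit=...
interface SearchOptions {
  query: string;
  contentType?: "ocr" | "audio" | "all";
  limit?: number;
  baseUrl?: string; // assumed default: http://localhost:3030
}

// Build the request URL without touching the network, so it is easy to inspect.
function buildSearchUrl(options: SearchOptions): string {
  const {
    query,
    contentType = "all",
    limit = 10,
    baseUrl = "http://localhost:3030",
  } = options;
  const params = new URLSearchParams({
    q: query,
    content_type: contentType,
    limit: String(limit),
  });
  return `${baseUrl}/search?${params.toString()}`;
}

// Send the request; the response shape is left as `unknown` on purpose,
// since this sketch makes no claim about the fields returned.
async function search(options: SearchOptions): Promise<unknown> {
  const response = await fetch(buildSearchUrl(options));
  if (!response.ok) {
    throw new Error(`search failed with status ${response.status}`);
  }
  return response.json();
}
```

For example, `buildSearchUrl({ query: "meeting notes", contentType: "ocr", limit: 5 })` yields `http://localhost:3030/search?q=meeting+notes&content_type=ocr&limit=5`. Keeping URL construction separate from the network call makes the query easy to log or test without screenpipe running.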