architecture overview

screenpipe's architecture handles continuous screen and audio capture, local data storage, and real-time processing. here's a breakdown of the key components:

diagram overview

  1. input: screen and audio data
  2. processing: OCR, STT, transcription, multimodal integration
  3. storage: sqlite database
  4. plugins: custom pipes
  5. integrations: ollama, deepgram, notion, whatsapp, etc.

this modular architecture makes screenpipe adaptable to various use cases, from personal productivity tracking to advanced business intelligence.
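The flow from input to pipes can be sketched in a few lines of TypeScript. This is only an illustration of how the stages relate: every name here (`Capture`, `Row`, `processCapture`, `Store`, `Pipe`, `mentionsOf`, `runPipeline`) is hypothetical, the OCR and STT steps are string stand-ins, and an in-memory array stands in for the SQLite database. It is not screenpipe's actual code or API.

```typescript
// Hypothetical sketch of the stages listed above; not screenpipe's real API.

// Stage 1: input. A capture is either a screen frame or an audio chunk.
type Capture =
  | { kind: "screen"; timestamp: number; pixels: string }
  | { kind: "audio"; timestamp: number; samples: string };

// A row of extracted text, as it might be stored.
interface Row {
  timestamp: number;
  source: "ocr" | "stt";
  text: string;
}

// Stage 2: processing. Stand-ins for real OCR / STT engines.
function processCapture(c: Capture): Row {
  return c.kind === "screen"
    ? { timestamp: c.timestamp, source: "ocr", text: `ocr(${c.pixels})` }
    : { timestamp: c.timestamp, source: "stt", text: `stt(${c.samples})` };
}

// Stage 3: storage. An array stands in for the SQLite database.
class Store {
  private rows: Row[] = [];

  insert(row: Row): void {
    this.rows.push(row);
  }

  search(term: string): Row[] {
    return this.rows.filter((r) => r.text.includes(term));
  }
}

// Stage 4: plugins. A "pipe" is modelled as a function that reads the store.
type Pipe = (store: Store) => string[];

// Example pipe: list every stored row that mentions a term.
const mentionsOf = (term: string): Pipe => (store) =>
  store.search(term).map((r) => `${r.source}@${r.timestamp}: ${r.text}`);

// Wire the stages together: capture -> process -> store -> pipe.
function runPipeline(inputs: Capture[], pipe: Pipe): string[] {
  const store = new Store();
  for (const c of inputs) {
    store.insert(processCapture(c));
  }
  return pipe(store);
}
```

Keeping each stage behind a small interface is what lets the OCR engine, the STT engine, or the storage backend be swapped without touching the pipes that consume the data.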

status

Alpha: runs 24/7 on my computer (MacBook Pro M3, 32 GB RAM) and on a $400 Windows laptop.

Uses about 600 MB of memory and 10% CPU.

  • Integrations
    • ollama
    • openai
    • Friend wearable
    • Fileorganizer2000
    • mem0
    • Brilliant Frames
    • Vercel AI SDK
    • supermemory
    • deepgram
    • unstructured
    • excalidraw
    • Obsidian
    • Apple shortcut
    • multion
    • iPhone
    • Android
    • Camera
    • Keyboard
    • Browser
    • Pipe Store (a list of "pipes" you can build, share, and easily install to get more value out of your screen & mic data without effort). Pipes run in a Deno TypeScript engine within screenpipe, on your computer
  • screenshots + OCR with different engines to optimise privacy, quality, or energy consumption
    • tesseract
    • Windows native OCR
    • Apple native OCR
    • unstructured.io
    • screenpipe screen/audio specialised LLM
  • audio + STT (works with multiple input devices at once, like your iPhone + Mac mic, and with many STT engines)
    • Linux, MacOS, Windows input & output devices
    • iPhone microphone
  • remote capture (run screenpipe in your cloud and have it capture your local machine, for example when your laptop has little compute; only tested on Linux)
  • optimised screen & audio recording (mp4 encoding, an estimated 30 GB/month with default settings)
  • sqlite local db
  • local api
  • cross-platform CLI, desktop app (MacOS, Windows, Linux)
  • Metal, CUDA
  • TS SDK
  • multimodal embeddings
  • cloud storage options (s3, pgsql, etc.)
  • cloud computing options (deepgram for audio, unstructured for OCR)
  • customizable capture and storage settings (fps, resolution)
  • security
    • window-specific capture (e.g. capture only a specific tab of Cursor, Chrome, or Obsidian, or only a specific app)
    • encryption
    • PII removal
  • fast, optimised, energy-efficient modes
  • webhooks/events (for automations)
  • abstractions for multiplayer usage (e.g. aggregate sales team data, company team data, partner, etc.)
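
As a rough illustration of how the local API might be queried from TypeScript, here is a minimal sketch. The base URL `http://localhost:3030`, the `/search` path, and the `q`, `content_type`, and `limit` parameters are assumptions not stated on this page, so check them against your installation. `buildSearchUrl` and `search` are hypothetical helper names, not part of the TS SDK.

```typescript
// Assumptions (not confirmed by this page): the local API listens on
// http://localhost:3030 and exposes GET /search with q, content_type, limit.
const BASE_URL = "http://localhost:3030";

type ContentType = "ocr" | "audio" | "all";

// Build the query URL; URLSearchParams handles the escaping.
function buildSearchUrl(
  query: string,
  contentType: ContentType = "all",
  limit: number = 10,
  baseUrl: string = BASE_URL,
): string {
  const url = new URL("/search", baseUrl);
  url.searchParams.set("q", query);
  url.searchParams.set("content_type", contentType);
  url.searchParams.set("limit", String(limit));
  return url.toString();
}

// Send the request and return the parsed JSON body.
// The response shape is left as `unknown` because it is not specified here.
async function search(
  query: string,
  contentType: ContentType = "all",
): Promise<unknown> {
  const res = await fetch(buildSearchUrl(query, contentType));
  if (!res.ok) {
    throw new Error(`search failed: HTTP ${res.status}`);
  }
  return res.json();
}
```

Calling `search("meeting notes", "ocr")` from a script or a pipe would then return whatever the API responds with. Since the data lives in a local SQLite database, the request never has to leave the machine.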