conceptual overview
at its core, screenpipe acts as a bridge between your digital activities and ai systems, creating a memory layer that provides context for intelligent applications. here’s how to think about it:
capture layer
- screen recording: captures visual content at configurable frame rates
- audio recording: captures spoken content from multiple sources
- ui events (accessibility): captures keyboard input, mouse clicks, app switches, and clipboard events via accessibility apis (macos)
processing layer
- ocr engines: extract text from screen recordings (apple native, windows native, tesseract, unstructured)
- stt engines: convert audio to text (whisper, deepgram)
- speaker identification: identifies and labels different speakers
- pii removal: optionally redacts sensitive information
storage layer
- sqlite database: stores metadata, text, and references to media
- media files: stores the actual mp4/mp3 recordings
- embeddings: (coming soon) vector representations for semantic search
retrieval layer
- search api: filtered content retrieval for applications
- streaming apis: real-time access to new content
- memory apis: structured access to historical context
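for example, a retrieval call against the local search api might look like the sketch below. it assumes the default local server on port 3030 and a /search endpoint with q / content_type / limit params; adjust to your setup:

```typescript
// minimal sketch: query recent ocr text from the local screenpipe server.
// port, endpoint path, and query params are assumptions -- check the api
// reference for your installed version.
const params = new URLSearchParams({
  q: "standup notes",
  content_type: "ocr",
  limit: "10",
});

const res = await fetch(`http://localhost:3030/search?${params}`);
if (!res.ok) throw new Error(`search failed: ${res.status}`);

const { data } = await res.json();
for (const item of data ?? []) {
  console.log(item);
}
```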
extension layer (pipes)
- pipes ecosystem: extensible plugins for building applications
- pipe sdk: typescript interface for building custom pipes
- pipe runtime: sandboxed execution environment for pipes
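a minimal pipe using the typescript sdk might look like this sketch. the package name and queryScreenpipe options are assumptions based on the @screenpipe/js sdk; check the sdk reference for the exact api:

```typescript
// sketch of a tiny pipe: pull the last five minutes of ocr results and log them.
// the method name, option fields, and result shape are illustrative, not definitive.
import { pipe } from "@screenpipe/js";

const results = await pipe.queryScreenpipe({
  contentType: "ocr",
  startTime: new Date(Date.now() - 5 * 60 * 1000).toISOString(),
  limit: 20,
});

for (const item of results?.data ?? []) {
  // for ocr items, content typically holds the extracted text plus frame metadata
  console.log(item.content);
}
```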
diagram overview
- input: screen and audio data
- processing: ocr, stt, transcription, multimodal integration
- storage: sqlite database
- plugins: custom pipes
- integrations: ollama, deepgram, notion, whatsapp, etc.
data flow & lifecycle
here’s the typical data flow through the screenpipe system:
- capture
- screen is captured at the configured fps (default 1.0, or 0.5 on macos)
- audio is captured in chunks (default 30 seconds)
- ui events (keyboard, mouse, app switches, clipboard) are captured via accessibility apis (macos; enable in settings)
- processing
- captured frames are processed through ocr to extract text
- audio chunks are processed through stt to generate transcriptions
- speaker identification is applied to audio transcriptions
- storage
- processed data is stored in the local sqlite database
- raw media files are stored in the configured data directory
- metadata is indexed for efficient retrieval
- retrieval
- applications query the database through the rest api
- real-time data can be streamed through sse endpoints
- pipes can access data through the typescript sdk
- extension
- pipes process the data to create higher-level abstractions
- pipes can integrate with external services (llms, etc.)
- pipes can control the system through the input api
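to make the retrieval and streaming steps above concrete, the sketch below subscribes to a server-sent-events stream of newly indexed content. the endpoint path /sse/vision is an assumption; substitute whatever streaming endpoint your version exposes:

```typescript
// sketch: consume a server-sent-events stream of newly indexed content.
// the endpoint path below is an assumption -- substitute the streaming
// endpoint your screenpipe version actually exposes.
const res = await fetch("http://localhost:3030/sse/vision");
if (!res.ok || !res.body) throw new Error(`stream failed: ${res.status}`);

const reader = res.body.pipeThrough(new TextDecoderStream()).getReader();
let buffer = "";

while (true) {
  const chunk = await reader.read();
  if (chunk.done) break;
  buffer += chunk.value;

  // sse events are separated by a blank line; payload lines start with "data:"
  const events = buffer.split("\n\n");
  buffer = events.pop() ?? "";
  for (const evt of events) {
    const data = evt
      .split("\n")
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice(5).trim())
      .join("\n");
    if (data) console.log("new event:", data);
  }
}
```

polling the search api on an interval is a simpler alternative when you don't need near-real-time latency.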
data abstraction layers
- core (mp4 files): the innermost layer contains the raw screen recordings and audio captures in mp4 format
- processing layer: contains the direct processing outputs
- ocr embeddings: vectorized text extracted from screen
- human id: anonymized user identification
- accessibility: metadata for improved data access
- transcripts: processed audio-to-text
- ai memories: the outermost layer represents the highest level of abstraction where ai processes and synthesizes all lower-level data into meaningful insights
- pipes enrich: custom processing modules that can interact with and enhance data at any layer
session and state management
screenpipe maintains several types of state:
- session state
- managed by the core screenpipe server
- controls recording status, device selection, etc.
- accessible through the health api endpoint
- configuration state
- stored in the settings database
- controls behavior of the core system
- accessible through the settings api
- pipe state
- each pipe maintains its own state
- stored in the pipe’s local storage or in screenpipe’s settings
- isolated from other pipes for security
the health endpoint (/health) is particularly useful for checking the system’s current state and ensuring services are running correctly.
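a quick health check from typescript might look like this sketch (the default port and response field names are assumptions; inspect the actual response for your version):

```typescript
// sketch: check that the local screenpipe server is up and recording.
// field names below are assumptions -- log the raw body to see the real shape.
const res = await fetch("http://localhost:3030/health");
if (!res.ok) throw new Error(`health check failed: ${res.status}`);

const health = await res.json();
console.log("status:", health.status);
console.log("last frame:", health.last_frame_timestamp ?? "unknown");
console.log("last audio:", health.last_audio_timestamp ?? "unknown");
```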
database schema
screenpipe uses a sqlite database with the following main tables:
- frames: stores metadata about captured screen frames
- ocr_results: stores text extracted from frames
- audio_chunks: stores metadata about audio recordings
- transcriptions: stores text transcribed from audio
- speakers: stores identified speakers and their metadata
- ui_elements: stores ui elements captured from the screen
- settings: stores application configuration
- pipes: stores installed pipes and their configuration
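since pipes run in bun, you can also read the database directly for ad-hoc analysis. the sketch below joins frames with ocr_results; the database path and column names are assumptions, so inspect your local schema first:

```typescript
// sketch: read recent ocr text straight from the local sqlite database.
// the path and column names are assumptions -- open the db with the sqlite3
// cli and run ".schema" to confirm before relying on them.
import { Database } from "bun:sqlite";
import { homedir } from "node:os";
import { join } from "node:path";

const db = new Database(join(homedir(), ".screenpipe", "db.sqlite"), {
  readonly: true,
});

const rows = db
  .query(
    `SELECT f.timestamp, o.text
     FROM ocr_results o
     JOIN frames f ON f.id = o.frame_id
     ORDER BY f.timestamp DESC
     LIMIT 10`
  )
  .all();

console.log(rows);
```

for anything user-facing, prefer the rest api or the typescript sdk over direct database access, since the schema may change between releases.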
integration patterns
developers typically interact with screenpipe in one of these patterns:
- retrieval pattern: query for relevant context based on the current task
- streaming pattern: process events as they occur
- augmentation pattern: enhance the user experience with context
- automation pattern: take actions based on context
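the retrieval and augmentation patterns are often combined: pull recent context from screenpipe, then hand it to a local llm. the sketch below uses ollama's generate endpoint; the model name, query params, and response field names are assumptions:

```typescript
// sketch: retrieval + augmentation -- fetch recent ocr context from
// screenpipe, then summarize it with a local ollama model.
const search = await fetch(
  "http://localhost:3030/search?content_type=ocr&limit=20"
);
const { data } = await search.json();

// the item shape is an assumption; adapt to what your server returns
const context = (data ?? [])
  .map((item: any) => item.content?.text ?? "")
  .join("\n");

// ollama's non-streaming generate endpoint on its default port
const llm = await fetch("http://localhost:11434/api/generate", {
  method: "POST",
  headers: { "content-type": "application/json" },
  body: JSON.stringify({
    model: "llama3.2",
    prompt: `summarize what i was working on:\n${context}`,
    stream: false,
  }),
});

const { response } = await llm.json();
console.log(response);
```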
status
alpha: runs on my computer (macbook pro m3, 32 gb ram) and a $400 windows laptop, 24/7.
uses 600 mb, 10% cpu.
- integrations
- ollama
- openai
- friend wearable
- fileorganizer2000
- mem0
- brilliant frames
- vercel ai sdk
- supermemory
- deepgram
- unstructured
- excalidraw
- obsidian
- apple shortcut
- multion
- iphone
- android
- camera
- keyboard
- browser
- pipes (plugins you can build, share & install to extend screenpipe); they run in a bun typescript engine within screenpipe on your computer
- screenshots + ocr with different engines to optimise privacy, quality, or energy consumption
- tesseract
- windows native ocr
- apple native ocr
- unstructured.io
- screenpipe screen/audio specialised llm
- audio + stt (works with multiple input devices, like your iphone + mac mic, and many stt engines)
- linux, macos, windows input & output devices
- iphone microphone
- remote capture (run screenpipe in the cloud and capture your local machine; only tested on linux), e.g. when you have a low-compute laptop
- optimised screen & audio recording (mp4 encoding, estimating 30 gb/m with default settings)
- sqlite local db
- local api
- cross platform cli, desktop app (macos, windows, linux)
- metal, cuda
- ts sdk
- multimodal embeddings
- cloud storage options (s3, pgsql, etc.)
- cloud computing options (deepgram for audio, unstructured for ocr)
- customizable storage and capture settings (fps, resolution)
- security
- window-specific capture (e.g. only capture a specific window or tab in chrome, cursor, or obsidian, or only specific apps)
- encryption
- pii removal
- fast, optimised, energy-efficient modes
- webhooks/events (for automations)
- abstractions for multiplayer usage (e.g. aggregating data across a sales team, a company team, partners, etc.)