
overview

screenpipe is a Rust application that captures your screen and audio using an event-driven architecture, processes them locally, and stores everything in a SQLite database. instead of recording every second, it listens for meaningful OS events and captures only when something actually changes — pairing each screenshot with accessibility tree data for maximum quality at minimal cost.

data flow

crates

screenpipe is a Rust workspace with specialized crates:

layers

1. event-driven capture

screenpipe listens for meaningful OS events instead of polling at a fixed FPS. when an event fires, it captures a screenshot and walks the accessibility tree together — same timestamp, same frame.
| trigger | description |
| --- | --- |
| app switch | user switched to a different application |
| window focus | a new window gained focus |
| click / scroll | user interacted with the UI |
| typing pause | user stopped typing (debounced) |
| clipboard copy | content copied to clipboard |
| idle fallback | periodic capture every ~5s when nothing is happening |
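the trigger logic above can be sketched as a small policy: discrete events capture immediately, keystrokes are debounced until the user pauses, and a periodic tick provides the idle fallback. a minimal sketch, where `CapturePolicy` and the exact threshold values are illustrative, not screenpipe's actual implementation:

```python
IDLE_FALLBACK_S = 5.0    # periodic capture when nothing is happening
TYPING_DEBOUNCE_S = 1.0  # capture only after the user stops typing

class CapturePolicy:
    def __init__(self):
        self.last_capture = float("-inf")
        self.last_keystroke = None

    def on_event(self, kind: str, now: float) -> bool:
        """Return True if this event should trigger a screenshot."""
        if kind in ("app_switch", "window_focus", "click", "clipboard_copy"):
            self.last_capture = now
            return True
        if kind == "keystroke":
            self.last_keystroke = now  # wait for the pause, not each key
            return False
        if kind == "tick":
            # typing pause: debounce interval elapsed since the last keystroke
            if self.last_keystroke is not None and now - self.last_keystroke >= TYPING_DEBOUNCE_S:
                self.last_keystroke = None
                self.last_capture = now
                return True
            # idle fallback: ~5s without any capture
            if now - self.last_capture >= IDLE_FALLBACK_S:
                self.last_capture = now
                return True
        return False

policy = CapturePolicy()
events = [("app_switch", 0.0), ("keystroke", 0.5), ("keystroke", 0.8),
          ("tick", 2.0), ("tick", 3.0), ("tick", 8.0)]
captures = [(k, t) for k, t in events if policy.on_event(k, t)]
# captures: app switch at 0.0, typing pause at 2.0, idle fallback at 8.0
```

note how two keystrokes produce zero captures: only the pause after them does, which is what keeps the frame rate tied to change rather than activity.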
| what | how | crate |
| --- | --- | --- |
| screen | event-triggered screenshot of the active monitor | screenpipe-vision |
| text extraction | accessibility tree walk (structured text: buttons, labels, fields) | screenpipe-accessibility |
| OCR fallback | when accessibility data is empty (remote desktops, games, some Linux apps) | screenpipe-vision |
| audio | multiple input/output devices in configurable chunks (default 30s) | screenpipe-audio |
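the audio layer's chunking can be illustrated with plain offsets: a continuous stream is cut into fixed-length spans (default 30s) before transcription. this is a sketch only; the real pipeline operates on PCM sample buffers, not second offsets:

```python
CHUNK_S = 30  # default chunk length from the table above

def chunk_spans(total_s: int, chunk_s: int = CHUNK_S) -> list[tuple[int, int]]:
    """(start, end) second offsets for each chunk; the last one may be short."""
    return [(s, min(s + chunk_s, total_s)) for s in range(0, total_s, chunk_s)]

chunk_spans(75)  # [(0, 30), (30, 60), (60, 75)]
```

each span then becomes one `audio_chunks` row, with its transcription stored separately.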

2. processing

| engine | type | platform | when used |
| --- | --- | --- | --- |
| accessibility tree | text extraction | macOS, Windows | primary — used for every capture |
| Apple Vision | OCR | macOS | fallback when accessibility is empty |
| Windows native | OCR | Windows | fallback when accessibility is empty |
| Tesseract | OCR | Linux | primary (accessibility support varies) |
| Whisper | speech-to-text | local, all platforms | audio transcription |
| Deepgram | speech-to-text | cloud API | optional cloud audio |
additional processing: speaker identification, PII redaction, frame deduplication (skips identical frames).
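frame deduplication, for example, can be as simple as comparing a hash of the current frame against the previous one. a sketch assuming exact-byte comparison (screenpipe's actual strategy may differ, e.g. tolerating near-identical frames):

```python
import hashlib

def dedup(frames: list[bytes]) -> list[int]:
    """Return indices of frames worth storing; identical neighbors are skipped."""
    kept, last_digest = [], None
    for i, frame in enumerate(frames):
        digest = hashlib.sha256(frame).hexdigest()
        if digest != last_digest:  # only store when content changed
            kept.append(i)
            last_digest = digest
    return kept

frames = [b"editor", b"editor", b"browser", b"browser", b"editor"]
dedup(frames)  # keeps indices 0, 2, 4
```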

3. storage

all data stays local on your machine:
  • SQLite at ~/.screenpipe/db.sqlite — metadata, accessibility text, OCR text, transcriptions, speakers, tags, UI elements
  • media at ~/.screenpipe/data/ — JPEG screenshots (event-driven frames), audio chunks

4. API

REST API on localhost:3030:
| endpoint | description |
| --- | --- |
| /search | filtered content retrieval (OCR, audio, accessibility) |
| /search/keyword | keyword search with text positions |
| /elements | lightweight UI element search (accessibility tree data) |
| /frames/{id} | access captured frames |
| /frames/{id}/context | accessibility text + URLs + OCR fallback for a frame |
| /health | system status and metrics |
| /raw_sql | direct database queries |
| /ai/chat/completions | Apple Intelligence (macOS 26+) |
see API reference for the full endpoint list.
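querying the API is plain HTTP. as a sketch, a /search request could be built like this; the parameter names `q`, `content_type`, and `limit` are assumptions for illustration, so check the API reference for the real ones:

```python
from urllib.parse import urlencode

BASE = "http://localhost:3030"

def search_url(q: str, content_type: str = "ocr", limit: int = 10) -> str:
    # parameter names are assumed, not taken from the API reference
    return f"{BASE}/search?" + urlencode({"q": q, "content_type": content_type, "limit": limit})

url = search_url("standup notes")
# fetch with e.g. urllib.request.urlopen(url) against a running screenpipe
```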

5. pipes

pipes are AI agents defined as .md prompt files that run on your screen data: an agent reads the prompt, queries the screenpipe API, and takes action. pipes live in ~/.screenpipe/pipes/{name}/ and run on cron-like schedules.

6. desktop app

the desktop app is built with Tauri (Rust backend) + Next.js (React frontend):

database schema

key tables:
| table | stores |
| --- | --- |
| frames | captured screen frame metadata (includes snapshot_path, accessibility_text, capture_trigger) |
| ocr_text | OCR fallback text extracted from frames |
| elements | UI elements from accessibility tree (buttons, labels, text fields) with FTS5 search |
| audio_chunks | audio recording metadata |
| audio_transcriptions | text from audio |
| speakers | identified speakers |
| ui_events | keyboard, mouse, clipboard events |
| tags | user-applied tags on content |
inspect directly:

```shell
sqlite3 ~/.screenpipe/db.sqlite .schema
```
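to show the kind of lookup these tables support, here is an in-memory mock of two of them (only the columns named above; the real schema has more), picking accessibility text first and falling back to OCR, mirroring the capture layers:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# minimal stand-ins for the real frames and ocr_text tables
con.execute("CREATE TABLE frames (id INTEGER PRIMARY KEY, snapshot_path TEXT, "
            "accessibility_text TEXT, capture_trigger TEXT)")
con.execute("CREATE TABLE ocr_text (frame_id INTEGER, text TEXT)")
con.execute("INSERT INTO frames VALUES (1, '/tmp/f1.jpg', 'Inbox - 3 unread', 'app_switch')")
con.execute("INSERT INTO ocr_text VALUES (1, 'Inbox (3)')")

# accessibility text when present, OCR text otherwise
row = con.execute("""
    SELECT f.capture_trigger, COALESCE(NULLIF(f.accessibility_text, ''), o.text)
    FROM frames f LEFT JOIN ocr_text o ON o.frame_id = f.id
    WHERE f.id = 1
""").fetchone()
# row == ('app_switch', 'Inbox - 3 unread')
```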

resource usage

runs 24/7 on a MacBook Pro M3 (32 GB) or a $400 Windows laptop:
| metric | typical value |
| --- | --- |
| RAM | ~600 MB |
| CPU | ~5-10% |
| storage | ~5-10 GB/month (event-driven capture only stores frames when something changes) |

source code