screenpipe’s architecture handles continuous screen and audio capture, local data storage, and real-time processing. here’s a breakdown of the key components:
at its core, screenpipe acts as a bridge between your digital activities and AI systems, creating a memory layer that provides context for intelligent applications. here’s how to think about it:
screenpipe organizes data in concentric layers of abstraction, from raw data to high-level intelligence:
- core (mp4 files): the innermost layer contains the raw screen recordings and audio captures in mp4 format
- processing layer: contains the direct processing outputs
  - OCR embeddings: vectorized text extracted from screen captures
  - human id: anonymized user identification
  - accessibility: UI text and metadata captured via OS accessibility APIs, for improved data access
  - transcripts: processed audio-to-text
- AI memories: the outermost layer and highest level of abstraction, where AI processes and synthesizes all lower-level data into meaningful insights
- pipes: custom processing modules that can interact with and enrich data at any layer
this layered approach enables both granular access to raw data and sophisticated AI-powered insights while maintaining data privacy and efficiency.
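in practice you reach these layers through the search API rather than the raw mp4 files. a minimal sketch, assuming the `@screenpipe/js` SDK is installed (the search term and `limit` value are illustrative):

```typescript
import { pipe } from "@screenpipe/js";

// query the processing layer: OCR text extracted from recent screen captures
const results = await pipe.queryScreenpipe({
  q: "invoice",        // illustrative search term
  contentType: "ocr",  // target the OCR layer instead of audio transcripts
  limit: 10,
});

// each result carries extracted text plus capture metadata
console.log(results?.data);
```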
each pipe’s data and settings are:
- stored in the pipe’s local storage or in screenpipe’s settings
- isolated from other pipes for security
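how a pipe persists its own state is up to the pipe. a minimal sketch, assuming the pipe simply keeps a JSON file in its own directory (the file name and state shape are illustrative, not a screenpipe API):

```typescript
import { readFile, writeFile } from "node:fs/promises";

// illustrative per-pipe state file, kept in the pipe's own directory
const STATE_FILE = "./pipe-state.json";

async function loadState(): Promise<Record<string, unknown>> {
  try {
    return JSON.parse(await readFile(STATE_FILE, "utf8"));
  } catch {
    return {}; // first run: no saved state yet
  }
}

async function saveState(state: Record<string, unknown>): Promise<void> {
  await writeFile(STATE_FILE, JSON.stringify(state, null, 2));
}
```

because each pipe runs in its own directory, this keeps state naturally isolated from other pipes.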
understanding the different state models is important for building robust applications. the health API (`/health`) is particularly useful for checking the system’s current state and verifying that services are running correctly.
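you can poll the endpoint directly. a minimal sketch, assuming screenpipe’s API is listening on its default local port 3030:

```typescript
// check screenpipe's health endpoint (default local port 3030 assumed)
const res = await fetch("http://localhost:3030/health");
const health = await res.json();

// the exact response shape may vary by version; log it to inspect
console.log(health);
```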
streaming pattern: process new screen events as they arrive

```typescript
import { pipe } from "@screenpipe/js";

// react to each new screen (vision) event in real time
for await (const event of pipe.streamVision()) {
  // process each new screen event
  console.log(event.data.text);
}
```
augmentation pattern: enhance user experience with context
```typescript
import { pipe } from "@screenpipe/js";

// when the user asks about a recent meeting, pull matching audio context
const meetingContext = await pipe.queryScreenpipe({
  q: "meeting",
  contentType: "audio",
});

// use the retrieved context to generate a response
const response = await generateResponse(userQuery, meetingContext);
```
automation pattern: take actions based on context
```typescript
import { pipe } from "@screenpipe/js";

// monitor the screen stream for specific content
for await (const event of pipe.streamVision()) {
  if (event.data.text.includes("meeting starting")) {
    // take action, e.g. send a notification
  }
}
```
understanding these patterns will help you design effective applications that leverage screenpipe’s capabilities.
- Pipe Store: a list of “pipes” you can build, share & easily install to get more value out of your screen & mic data without effort. pipes run in a Bun TypeScript engine within screenpipe on your computer
- screenshots + OCR with different engines to optimise privacy, quality, or energy consumption
  - tesseract
  - Windows native OCR
  - Apple native OCR
  - unstructured.io
  - screenpipe screen/audio specialised LLM
- audio + STT (works with multiple input devices, like your iPhone + Mac mic, and many STT engines; see the transcription-streaming sketch after this list)
  - Linux, macOS, Windows input & output devices
  - iPhone microphone
- remote capture (run screenpipe in your cloud while it captures your local machine; only tested on Linux), for example when you have a low-compute laptop
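to consume the speech-to-text output programmatically, the SDK exposes a transcription stream. a minimal sketch, assuming the `@screenpipe/js` SDK (the exact chunk shape may vary by version, so the code just logs it):

```typescript
import { pipe } from "@screenpipe/js";

// stream live speech-to-text output from whichever input devices are active
for await (const chunk of pipe.streamTranscriptions()) {
  // log the raw chunk; inspect it to see which transcript fields your version emits
  console.log(chunk);
}
```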