api reference

below is the detailed api reference for screenpipe's core functionality. all examples assume the default local server at http://localhost:3030.

search api

  • endpoint: /search
  • method: get
  • description: searches captured data (ocr, audio transcriptions, etc.) stored in screenpipe's local database.

query parameters:

  • q (string, optional): search term; must be a single word (split multi-word queries into separate requests)
  • content_type (enum): type of content to search:
    • ocr: optical character recognition text
    • audio: audio transcriptions
    • ui: user interface elements
  • limit (int): max results per page (default: 20)
  • offset (int): pagination offset
  • start_time (timestamp, optional): filter by start timestamp
  • end_time (timestamp, optional): filter by end timestamp
  • app_name (string, optional): filter by application name
  • window_name (string, optional): filter by window name
  • include_frames (bool, optional): include base64 encoded frames
  • min_length (int, optional): minimum content length
  • max_length (int, optional): maximum content length
  • speaker_ids (int[], optional): filter by specific speaker ids

sample requests:

# Basic search
curl "http://localhost:3030/search?q=meeting&content_type=ocr&limit=10"
 
# Audio search with speaker filter
curl "http://localhost:3030/search?content_type=audio&speaker_ids=1,2"
 
# UI elements search
curl "http://localhost:3030/search?content_type=ui&app_name=chrome"

sample response:

{
  "data": [
    {
      "type": "OCR",
      "content": {
        "frame_id": 123,
        "text": "meeting notes",
        "timestamp": "2024-03-10T12:00:00Z",
        "file_path": "/frames/frame123.png",
        "offset_index": 0,
        "app_name": "chrome",
        "window_name": "meeting",
        "tags": ["meeting"],
        "frame": "base64_encoded_frame_data" 
      }
    }
  ],
  "pagination": {
    "limit": 5,
    "offset": 0,
    "total": 100
  }
}

audio devices api

  • endpoint: /audio/list
  • method: get
  • description: lists available audio input/output devices
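
sample request:

curl "http://localhost:3030/audio/list"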

sample response:

[
  {
    "name": "built-in microphone",
    "is_default": true
  }
]

monitors api

  • endpoint: /vision/list
  • method: post
  • description: lists available monitors/displays
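
sample request:

curl -X POST "http://localhost:3030/vision/list"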

sample response:

[
  {
    "id": 1,
    "name": "built-in display",
    "width": 2560,
    "height": 1600,
    "is_default": true
  }
]

tags api

  • endpoint: /tags/:content_type/:id
  • methods: post (add), delete (remove)
  • description: manage tags for content items
  • content_type: vision or audio

add tags request:

{
  "tags": ["important", "meeting"]
}
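
sample requests (the content type and id are illustrative; delete is assumed to accept the same body shape):

curl -X POST "http://localhost:3030/tags/vision/123" \
  -H "Content-Type: application/json" \
  -d '{"tags": ["important", "meeting"]}'

curl -X DELETE "http://localhost:3030/tags/vision/123" \
  -H "Content-Type: application/json" \
  -d '{"tags": ["meeting"]}'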

sample response:

{
  "success": true
}

pipes api

list pipes

  • endpoint: /pipes/list
  • method: get

download pipe

  • endpoint: /pipes/download
  • method: post

request body:

{
  "url": "https://github.com/user/repo/pipe-example"
}

enable pipe

  • endpoint: /pipes/enable
  • method: post

request body:

{
  "pipe_id": "pipe-example"
}

disable pipe

  • endpoint: /pipes/disable
  • method: post

request body:

{
  "pipe_id": "pipe-example" 
}

update pipe config

  • endpoint: /pipes/update
  • method: post

request body:

{
  "pipe_id": "pipe-example",
  "config": {
    "key": "value"
  }
}
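
a sample sequence chaining these endpoints, using the illustrative url and pipe id from the bodies above:

# download a pipe from a repository url
curl -X POST "http://localhost:3030/pipes/download" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://github.com/user/repo/pipe-example"}'

# enable it, then list pipes to confirm
curl -X POST "http://localhost:3030/pipes/enable" \
  -H "Content-Type: application/json" \
  -d '{"pipe_id": "pipe-example"}'
curl "http://localhost:3030/pipes/list"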

speakers api

list unnamed speakers

  • endpoint: /speakers/unnamed
  • method: get
  • description: get list of speakers without names assigned

query parameters:

  • limit (int): max results
  • offset (int): pagination offset
  • speaker_ids (int[], optional): filter specific speaker ids

sample request:

curl "http://localhost:3030/speakers/unnamed?limit=10&offset=0"

search speakers

  • endpoint: /speakers/search
  • method: get
  • description: search speakers by name

query parameters:

  • name (string, optional): name prefix to search for

sample request:

curl "http://localhost:3030/speakers/search?name=john"

update speaker

  • endpoint: /speakers/update
  • method: post
  • description: update speaker name or metadata

request body:

{
  "id": 123,
  "name": "john doe",
  "metadata": "{\"role\": \"engineer\"}"
}
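
sample request:

curl -X POST "http://localhost:3030/speakers/update" \
  -H "Content-Type: application/json" \
  -d '{"id": 123, "name": "john doe", "metadata": "{\"role\": \"engineer\"}"}'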

delete speaker

  • endpoint: /speakers/delete
  • method: post
  • description: delete a speaker and associated audio chunks

request body:

{
  "id": 123
}

get similar speakers

  • endpoint: /speakers/similar
  • method: get
  • description: find speakers with similar voice patterns

query parameters:

  • speaker_id (int): reference speaker id
  • limit (int): max results

sample request:

curl "http://localhost:3030/speakers/similar?speaker_id=123&limit=5"

merge speakers

  • endpoint: /speakers/merge
  • method: post
  • description: merge two speakers into one

request body:

{
  "speaker_to_keep_id": 123,
  "speaker_to_merge_id": 456
}
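
sample request:

curl -X POST "http://localhost:3030/speakers/merge" \
  -H "Content-Type: application/json" \
  -d '{"speaker_to_keep_id": 123, "speaker_to_merge_id": 456}'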

mark as hallucination

  • endpoint: /speakers/hallucination
  • method: post
  • description: mark a speaker as incorrectly identified

request body:

{
  "speaker_id": 123
}

health api

  • endpoint: /health
  • method: get
  • description: system health status
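
sample request:

curl "http://localhost:3030/health"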

sample response:

{
  "status": "healthy",
  "last_frame_timestamp": "2024-03-10T12:00:00Z", 
  "last_audio_timestamp": "2024-03-10T12:00:00Z",
  "last_ui_timestamp": "2024-03-10T12:00:00Z",
  "frame_status": "ok",
  "audio_status": "ok",
  "ui_status": "ok",
  "message": "all systems functioning normally"
}

stream frames api

  • endpoint: /stream/frames
  • method: get
  • description: stream frames as server-sent events (sse)

query parameters:

  • start_time (timestamp): start time for frame stream
  • end_time (timestamp): end time for frame stream

sample request:

curl "http://localhost:3030/stream/frames?start_time=2024-03-10T12:00:00Z&end_time=2024-03-10T13:00:00Z"

sample event data:

{
  "timestamp": "2024-03-10T12:00:00Z",
  "devices": [
    {
      "device_id": "screen-1",
      "frame": "base64_encoded_frame_data"
    }
  ]
}

experimental api

merge frames

  • endpoint: /experimental/frames/merge
  • method: post
  • description: merges multiple video frames into a single video

request body:

{
  "video_paths": ["path/to/video1.mp4", "path/to/video2.mp4"]
}

sample response:

{
  "video_path": "/path/to/merged/video.mp4"
}
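
sample request:

curl -X POST "http://localhost:3030/experimental/frames/merge" \
  -H "Content-Type: application/json" \
  -d '{"video_paths": ["path/to/video1.mp4", "path/to/video2.mp4"]}'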

validate media

  • endpoint: /experimental/validate/media
  • method: get
  • description: validates media file format and integrity

query parameters:

  • file_path (string): path to media file to validate

sample response:

{
  "status": "valid media file"
}
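
sample request (the file path is illustrative):

curl "http://localhost:3030/experimental/validate/media?file_path=/path/to/video.mp4"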

input control

  • endpoint: /experimental/input_control
  • method: post
  • description: control keyboard and mouse input programmatically

request body:

{
  "action": {
    "type": "KeyPress",
    "data": "enter"
  }
}

or

{
  "action": {
    "type": "MouseMove",
    "data": {
      "x": 100,
      "y": 200
    }
  }
}

or

{
  "action": {
    "type": "MouseClick",
    "data": "left"
  }
}

or

{
  "action": {
    "type": "WriteText",
    "data": "hello world"
  }
}
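
sample request (sending the key press action from above):

curl -X POST "http://localhost:3030/experimental/input_control" \
  -H "Content-Type: application/json" \
  -d '{"action": {"type": "KeyPress", "data": "enter"}}'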

database api

execute raw sql

  • endpoint: /raw_sql
  • method: post
  • description: execute raw SQL queries against the database (use with caution)

request body:

{
  "query": "SELECT * FROM frames LIMIT 5"
}
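
sample request:

curl -X POST "http://localhost:3030/raw_sql" \
  -H "Content-Type: application/json" \
  -d '{"query": "SELECT * FROM frames LIMIT 5"}'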

add content

  • endpoint: /add
  • method: post
  • description: add new content (frames or transcriptions) to the database

request body:

{
  "device_name": "device1",
  "content": {
    "content_type": "frames",
    "data": {
      "frames": [
        {
          "file_path": "/path/to/frame.png",
          "timestamp": "2024-03-10T12:00:00Z",
          "app_name": "chrome",
          "window_name": "meeting",
          "ocr_results": [
            {
              "text": "detected text",
              "text_json": "{\"additional\": \"metadata\"}",
              "ocr_engine": "tesseract",
              "focused": true
            }
          ],
          "tags": ["meeting", "important"]
        }
      ]
    }
  }
}

or

{
  "device_name": "microphone1",
  "content": {
    "content_type": "transcription",
    "data": {
      "transcription": "transcribed text",
      "transcription_engine": "whisper"
    }
  }
}
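
sample request (posting the transcription body above):

curl -X POST "http://localhost:3030/add" \
  -H "Content-Type: application/json" \
  -d '{"device_name": "microphone1", "content": {"content_type": "transcription", "data": {"transcription": "transcribed text", "transcription_engine": "whisper"}}}'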

realtime streaming api

transcription stream

  • endpoint: /sse/transcriptions
  • method: get
  • description: stream real-time transcriptions using server-sent events (SSE)
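
sample request (-N keeps curl from buffering the stream):

curl -N "http://localhost:3030/sse/transcriptions"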

sample event data:

{
  "transcription": "live transcribed text",
  "timestamp": "2024-03-10T12:00:00Z",
  "device": "microphone1"
}

vision stream

  • endpoint: /sse/vision
  • method: get
  • description: stream real-time vision events using server-sent events (SSE)

query parameters:

  • images (bool, optional): include base64 encoded images in events
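
sample request (-N keeps curl from buffering the stream):

curl -N "http://localhost:3030/sse/vision?images=true"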

sample event data:

{
  "type": "Ocr",
  "text": "detected text",
  "timestamp": "2024-03-10T12:00:00Z",
  "image": "base64_encoded_image_data",
  "app_name": "chrome",
  "window_name": "meeting"
}