screenpipe automatically transcribes audio from your meetings, calls, and conversations. by default, transcription runs locally using Whisper.

setup

audio recording is enabled by default in the desktop app. configure audio devices and transcription engine in settings.
  • audio devices: select which microphones and system audio to capture
  • transcription engine: choose between local Whisper (private) and Deepgram (faster, cloud)

search transcriptions

# find discussions about a topic
curl "http://localhost:3030/search?q=budget+review&content_type=audio&limit=10"

# get today's meetings (set start_time to the start of the current day, in UTC)
curl "http://localhost:3030/search?content_type=audio&start_time=2026-02-11T00:00:00Z"

# filter by speaker
curl "http://localhost:3030/search?content_type=audio&speaker_ids=1,2"
curl "http://localhost:3030/search?content_type=audio&speaker_name=John"
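
the fixed date in the "today's meetings" example goes stale, so here is a minimal sketch that builds the same query with the current date. it assumes start_time accepts a UTC ISO 8601 timestamp, as in the example above. the function name today_audio_url is hypothetical and not part of screenpipe.

```shell
# hypothetical helper: print the search URL for audio captured since 00:00 UTC today.
# it only uses query parameters that appear in the examples above.
today_audio_url() {
  start="$(date -u +%Y-%m-%dT00:00:00Z)"   # start of the current day in UTC
  printf 'http://localhost:3030/search?content_type=audio&start_time=%s&limit=%s\n' \
    "$start" "${1:-10}"                    # first argument is the limit, default 10
}

# usage (requires screenpipe running locally):
#   curl "$(today_audio_url 20)"
```

note that "today" here means the UTC day. if your local day starts at a different hour, adjust the timestamp accordingly.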

speaker identification

screenpipe automatically identifies different speakers. manage them via API:
# get unnamed speakers for labeling
curl "http://localhost:3030/speakers/unnamed?limit=10"

# update a speaker's name
curl -X POST http://localhost:3030/speakers/update \
  -H "Content-Type: application/json" \
  -d '{"id": 1, "name": "John Smith"}'

# search speakers by name
curl "http://localhost:3030/speakers/search?name=john"

# merge duplicate speakers
curl -X POST http://localhost:3030/speakers/merge \
  -H "Content-Type: application/json" \
  -d '{"speaker_to_keep_id": 1, "speaker_to_merge_id": 2}'

# find similar speakers
curl "http://localhost:3030/speakers/similar?speaker_id=1"
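
when labeling speakers from a script, names containing quotes can break a hand-written JSON body. the sketch below builds the /speakers/update payload shown above with basic escaping. the helper name speaker_update_payload is hypothetical, and it only handles backslashes and double quotes, not control characters.

```shell
# hypothetical helper: print the JSON body for /speakers/update.
# $1 is the speaker id, $2 is the display name.
speaker_update_payload() {
  # escape backslashes first, then double quotes, so the name stays valid JSON
  name="$(printf '%s' "$2" | sed 's/\\/\\\\/g; s/"/\\"/g')"
  printf '{"id": %d, "name": "%s"}\n' "$1" "$name"
}

# usage (requires screenpipe running locally):
#   curl -X POST http://localhost:3030/speakers/update \
#     -H "Content-Type: application/json" \
#     -d "$(speaker_update_payload 1 'John Smith')"
```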

tips

  • use a good microphone
  • reduce background noise
  • whisper-large-v3-turbo gives best accuracy
  • set language to English in settings if you only speak English (faster)

long meetings and batch sizing

by default, screenpipe batches audio for transcription in chunks:
  • Whisper/OpenAI: 600 seconds (10 minutes)
  • Deepgram: up to 5000 seconds (about 83 minutes)
if you notice meetings longer than one hour losing context between batches, you can customize the batch size in settings > advanced > batch_max_duration_secs. set it to your meeting’s typical duration to preserve context across the entire recording.

in smart/batch transcription mode, large meetings may be split across multiple transcription jobs. if you need full meeting context in a single batch, consider:
  • switching to realtime transcription (transcription happens immediately as audio is captured, trading cost/latency for guaranteed continuity)
  • increasing batch_max_duration_secs to match your meeting length (supported up to engine limits: 5000s for Deepgram, 3000s for OpenAI)
  • using the retranscription API to re-process a full meeting with custom settings
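
to see how the batch size affects a given meeting, the arithmetic can be sketched as a ceiling division. this assumes batches are cut purely by duration, which is a simplification of whatever screenpipe does internally. the helper name batch_count is hypothetical.

```shell
# hypothetical helper: estimate how many batches a recording is split into.
# $1 is the recording length in seconds, $2 is batch_max_duration_secs.
batch_count() {
  duration_secs="$1"
  batch_secs="$2"
  # ceiling division: a partial final batch still counts as one batch
  echo $(( (duration_secs + batch_secs - 1) / batch_secs ))
}

batch_count 5400 600    # 90-minute meeting at the 600s default     -> 9
batch_count 5400 3000   # same meeting at the 3000s OpenAI limit    -> 2
batch_count 5400 5000   # same meeting at the 5000s Deepgram limit  -> 2
batch_count 3000 3000   # 50-minute meeting in a 3000s batch        -> 1
```

under this assumption, a 90-minute meeting does not fit in a single batch on either engine, so realtime transcription or the retranscription API would be the options for full continuity.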

privacy

  • with the default local Whisper engine, all transcription runs on your device
  • audio files are stored in ~/.screenpipe/data/
  • no audio is sent to the cloud unless you choose Deepgram
  • to stop capturing audio, disable audio recording in app settings
questions? join our discord.