Operator
the operator api automates desktop applications through their accessibility roles and elements. it lets you find ui elements and click, fill, or scroll them programmatically.
compatibility note:
some functions of the operator api currently work on macos only. on windows and linux, use the pixel api instead.
to understand roles better, open the macos accessibility inspector and examine the roles for any application.
feel free to use our docs as context in cursor agent through MCP
installation
this also works in node.js using @screenpipe/js
basic usage
core methods
the operator api provides a set of intuitive methods for automating desktop interactions:
method | description | compatibility | example |
---|---|---|---|
openApplication(name) | launches an application | macOS only | pipe.operator.openApplication("Chrome") |
openUrl(url, browser?) | opens a url in a browser | macOS only | pipe.operator.openUrl("github.com") |
getByRole(role, options) | finds elements by accessibility role | macOS only | pipe.operator.getByRole("button", {app: "Chrome"}) |
getById(id, options) | gets element by id | macOS only | pipe.operator.getById("element-123", {app: "Chrome"}) |
.click() | clicks an element | macOS only | pipe.operator.getById(id).click() |
.fill(text) | enters text in a field | macOS only | pipe.operator.getById(id).fill("hello") |
.scroll(direction, amount) | scrolls an element | macOS only | pipe.operator.getById(id).scroll("down", 300) |
pixel.type(text) | types text | all platforms | pipe.operator.pixel.type("hello world") |
pixel.press(key) | presses a keyboard key | all platforms | pipe.operator.pixel.press("enter") |
pixel.moveMouse(x, y) | moves mouse cursor to position | all platforms | pipe.operator.pixel.moveMouse(100, 200) |
pixel.click(button) | clicks mouse button | all platforms | pipe.operator.pixel.click("left") |
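the calls in the table can be chained into a short flow. the following is a minimal sketch, not the library's own typings: the OperatorLike, ElementLocator, and LocatorOptions interfaces are assumptions inferred from the chained examples in the table (locators returned synchronously, actions returning promises), and scrollAreaId and buttonId are hypothetical ids.

```typescript
// structural types for the calls listed in the table above.
// ASSUMPTION: these shapes are inferred from the table's examples;
// the real typings in the screenpipe sdk may differ.
interface LocatorOptions {
  app: string;
  activateApp?: boolean;
}

interface ElementLocator {
  click(): Promise<unknown>;
  fill(text: string): Promise<unknown>;
  scroll(direction: string, amount: number): Promise<unknown>;
}

interface OperatorLike {
  openApplication(name: string): Promise<unknown>;
  openUrl(url: string, browser?: string): Promise<unknown>;
  getById(id: string, options?: LocatorOptions): ElementLocator;
}

// opens a url in chrome, scrolls one element down, then clicks another.
// both ids are hypothetical; look up real ones for your target app.
async function scrollThenClick(
  operator: OperatorLike,
  url: string,
  scrollAreaId: string,
  buttonId: string
): Promise<void> {
  await operator.openApplication("Chrome");
  await operator.openUrl(url);

  const options: LocatorOptions = { app: "Chrome" };
  await operator.getById(scrollAreaId, options).scroll("down", 300);
  await operator.getById(buttonId, options).click();
}
```

if the real api matches these shapes, pipe.operator could be passed as the first argument. because every call here is listed as macos only, this flow is not expected to work on windows or linux.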
pixel api
the pixel api controls the mouse and keyboard directly instead of targeting accessibility elements. it is useful for:
- controlling your iPhone through iPhone mirroring (because you cannot parse the screen of your iPhone)
- Windows and Linux, which do not yet support functions such as openApplication, getByRole, and click
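a typical pixel api sequence moves the cursor, clicks to focus, types, and presses a key. the sketch below uses only the four pixel methods from the core methods table; the PixelLike interface and the promise return types are assumptions, and the coordinates are whatever position your target field occupies on screen.

```typescript
// structural type for the four pixel methods in the core methods table.
// ASSUMPTION: each call returns a promise; the real typings may differ.
interface PixelLike {
  type(text: string): Promise<unknown>;
  press(key: string): Promise<unknown>;
  moveMouse(x: number, y: number): Promise<unknown>;
  click(button: string): Promise<unknown>;
}

// moves to (x, y), left-clicks to focus whatever is there,
// types the text, then presses enter to submit it.
async function clickAndType(
  pixel: PixelLike,
  x: number,
  y: number,
  text: string
): Promise<void> {
  await pixel.moveMouse(x, y);
  await pixel.click("left");
  await pixel.type(text);
  await pixel.press("enter");
}
```

if the shapes match, a call could look like clickAndType(pipe.operator.pixel, 100, 200, "hello world"). since nothing is looked up by role or id, the caller has to know the screen position in advance.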
common accessibility roles
to understand roles better, open the macos accessibility inspector and look at the roles of any application.
when using getByRole(), you’ll need to specify the accessibility role. here are common ones:
- "button" - clickable buttons
- "textfield" - text input fields
- "searchfield" - search input fields
- "checkbox" - checkbox elements
- "radiobutton" - radio button elements
- "combobox" - dropdown menus
- "link" - hyperlinks
- "image" - images
- "statictext" - text labels
- "scrollarea" - scrollable containers
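to catch typos in role strings before they reach getByRole(), the common roles listed above can be kept in a small lookup. this is an illustrative helper and not part of the operator api: COMMON_ROLES, isCommonRole, and describeRole are hypothetical names, and the list only covers the roles named in this section.

```typescript
// the common roles from this section, with their descriptions.
// this list is not exhaustive: applications can expose other roles.
const COMMON_ROLES = {
  button: "clickable buttons",
  textfield: "text input fields",
  searchfield: "search input fields",
  checkbox: "checkbox elements",
  radiobutton: "radio button elements",
  combobox: "dropdown menus",
  link: "hyperlinks",
  image: "images",
  statictext: "text labels",
  scrollarea: "scrollable containers",
} as const;

type CommonRole = keyof typeof COMMON_ROLES;

// type guard: true only for the roles listed above.
function isCommonRole(value: string): value is CommonRole {
  return Object.prototype.hasOwnProperty.call(COMMON_ROLES, value);
}

// returns a short description, or a hint when the role is not in the list.
function describeRole(value: string): string {
  return isCommonRole(value)
    ? `${value}: ${COMMON_ROLES[value]}`
    : `${value}: not in the common list, check the accessibility inspector`;
}
```

a role missing from this lookup is not necessarily invalid. it only means the role is not one of the common ones named here.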
advanced usage examples
automating form filling
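a minimal sketch of form filling, assuming the getById(id, options), .fill(text), and .click() calls from the core methods table. the FormOperator and FieldLocator interfaces are assumptions about their shapes, and every field id and the submit id are hypothetical values you would look up for your target app.

```typescript
// structural types covering only what this sketch needs.
// ASSUMPTION: inferred from the core methods table, not taken from the sdk.
interface FieldLocator {
  fill(text: string): Promise<unknown>;
  click(): Promise<unknown>;
}

interface FormOperator {
  getById(id: string, options?: { app: string; activateApp?: boolean }): FieldLocator;
}

// fills each field in insertion order, then clicks the submit element.
// returns the ids that were filled, which is handy for logging.
async function fillForm(
  operator: FormOperator,
  app: string,
  fields: Record<string, string>,
  submitId: string
): Promise<string[]> {
  const filled: string[] = [];
  for (const [id, value] of Object.entries(fields)) {
    // activateApp keeps the target app focused while typing
    await operator.getById(id, { app, activateApp: true }).fill(value);
    filled.push(id);
  }
  await operator.getById(submitId, { app, activateApp: true }).click();
  return filled;
}
```

fields are filled one at a time on purpose, so a slow field does not receive text intended for the next one.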
automating app workflows
ai-powered automation
for more powerful automation, combine the operator api with vercel ai sdk to enable ai-driven desktop interactions:
for a complete implementation with automatic tool selection, see the hello-world-computer-use example pipe.
troubleshooting
if you’re having issues with the operator api:
- macos permissions: ensure screenpipe has accessibility permissions in system settings > privacy & security > accessibility
- app names: use exact app names as they appear in the applications folder
- timing issues: add delays between operations, as ui elements may take time to load
- debugging: log element ids and roles to help identify the right elements
- app focus: use the activateApp: true option to ensure the target app is in focus
for more detailed debugging, use the macos accessibility inspector to identify exact roles and properties of ui elements.
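the timing advice above can be wrapped in a small retry helper. this is a generic sketch and not part of the operator api: retry and sleep are hypothetical names, and it assumes that an operator call rejects its promise when the element is not ready yet.

```typescript
// resolves after the given number of milliseconds.
function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// runs `action` up to `attempts` times, waiting `delayMs` between failures.
// returns the first successful result, or rethrows the last error.
async function retry<T>(
  action: () => Promise<T>,
  attempts: number = 3,
  delayMs: number = 500
): Promise<T> {
  let lastError: unknown = new Error("retry needs at least one attempt");
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await action();
    } catch (error) {
      lastError = error;
      // no need to wait after the final failed attempt
      if (attempt < attempts) {
        await sleep(delayMs);
      }
    }
  }
  throw lastError;
}
```

under that assumption, a click could be wrapped as retry(() => pipe.operator.getById("element-123", { app: "Chrome", activateApp: true }).click()). a fixed delay keeps the helper simple; longer delays may suit apps that load slowly.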
practical use cases
here are some real-world applications for the operator api:
- messaging automation
  - scrape whatsapp conversations and export them to spreadsheets
  - auto-respond to common imessage inquiries when you’re busy
  - batch message linkedin connections with personalized outreach
  - track response rates across different messaging platforms
- social media management
  - schedule and post content across multiple platforms
  - collect engagement metrics from twitter, instagram, or linkedin
  - automate following/unfollowing based on specific criteria
  - export comments and replies for sentiment analysis
- data collection and research
  - extract data from websites that don’t have accessible apis
  - compile information across multiple applications into a single report
  - monitor prices or availability of products across different sites
  - build comprehensive research databases from scattered sources
- personal productivity
  - automate repetitive daily tasks (checking emails, organizing files)
  - create custom workflows between applications that don’t normally integrate
  - set up intelligent reminders based on content of messages or emails
  - auto-fill forms with personal or business information
- customer relationship management
  - track conversations across multiple platforms for each contact
  - automatically update crm systems with new interaction data
  - generate follow-up reminders based on conversation content
  - build comprehensive customer profiles from scattered data sources
- content creation and editing
  - automate screenshots or recordings of specific application states
  - batch process images or documents using desktop applications
  - extract text from images or pdfs for further processing
  - organize and tag media files based on their content
these automation ideas become even more powerful when combined with ai for intelligent decision-making based on the content being processed.