compatibility note: some functions of the operator api currently work on macos only. on windows/linux, please use the pixel api instead (windows support is in progress).
installation
install the @screenpipe/js package with your package manager.
basic usage
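the snippet for this section did not survive on this page, so here is a minimal sketch that uses only calls listed in the methods table. it assumes the operator is reached as pipe.operator on an object imported from @screenpipe/js and that its methods can be awaited; both are assumptions, and the BasicOperator interface and stub are hypothetical stand-ins so the sketch runs without screenpipe installed.

```typescript
// only the two calls used below, named as in the methods table.
// return types are an assumption; `await` also accepts non-promise values.
interface BasicOperator {
  openApplication(name: string): unknown;
  openUrl(url: string, browser?: string): unknown;
}

// launch an application, then open a url (browser argument omitted).
async function openInBrowser(
  operator: BasicOperator,
  app: string,
  url: string
): Promise<void> {
  await operator.openApplication(app);
  await operator.openUrl(url);
}

// hypothetical recording stub: logs calls instead of touching the desktop.
const basicCalls: string[] = [];
const stubOperator: BasicOperator = {
  openApplication: (name: string) => { basicCalls.push("openApplication " + name); },
  openUrl: (url: string) => { basicCalls.push("openUrl " + url); },
};

const basicRun = openInBrowser(stubOperator, "Chrome", "github.com");

// with the real library this would look like (import path assumed):
//   import { pipe } from "@screenpipe/js";
//   await openInBrowser(pipe.operator, "Chrome", "github.com");
```

passing the operator in as an argument keeps the flow easy to dry-run before pointing it at a real desktop.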
core methods
the operator api provides a set of intuitive methods for automating desktop interactions:

method | description | compatibility | example |
---|---|---|---|
openApplication(name) | launches an application | macOS only | pipe.operator.openApplication("Chrome") |
openUrl(url, browser?) | opens a url in a browser | macOS only | pipe.operator.openUrl("github.com") |
getByRole(role, options) | finds elements by accessibility role | macOS only | pipe.operator.getByRole("button", {app: "Chrome"}) |
getById(id, options) | gets element by id | macOS only | pipe.operator.getById("element-123", {app: "Chrome"}) |
.click() | clicks an element | macOS only | pipe.operator.getById(id).click() |
.fill(text) | enters text in a field | macOS only | pipe.operator.getById(id).fill("hello") |
.scroll(direction, amount) | scrolls an element | macOS only | pipe.operator.getById(id).scroll("down", 300) |
pixel.type(text) | types text | all platforms | pipe.operator.pixel.type("hello world") |
pixel.press(key) | presses a keyboard key | all platforms | pipe.operator.pixel.press("enter") |
pixel.moveMouse(x, y) | moves mouse cursor to position | all platforms | pipe.operator.pixel.moveMouse(100, 200) |
pixel.click(button) | clicks mouse button | all platforms | pipe.operator.pixel.click("left") |
pixel api
the pixel api drives the mouse and keyboard directly instead of targeting accessibility elements, which makes it useful for:
- controlling your iPhone through iPhone mirroring (because the contents of the iPhone screen cannot be parsed)
- Windows and Linux, which do not yet support functions like openApplication, getByRole, click, etc.
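a minimal sketch of a pixel-only interaction, using the four pixel methods from the methods table. the PixelApi interface, the canUseElementApi helper, and the stub are hypothetical; the return types are an assumption, and the platform string is expected to come from something like Node's process.platform, where "darwin" means macos.

```typescript
// the four pixel calls from the methods table; return types are assumed.
interface PixelApi {
  type(text: string): unknown;
  press(key: string): unknown;
  moveMouse(x: number, y: number): unknown;
  click(button: string): unknown;
}

// the element-based methods are documented as macos only, so every other
// platform should fall back to the pixel api.
function canUseElementApi(platform: string): boolean {
  return platform === "darwin";
}

// move to a screen position, click it, type some text, and press enter.
async function clickAndType(
  pixel: PixelApi,
  x: number,
  y: number,
  text: string
): Promise<void> {
  await pixel.moveMouse(x, y);
  await pixel.click("left");
  await pixel.type(text);
  await pixel.press("enter");
}

// hypothetical recording stub so the sketch runs without screenpipe.
const pixelCalls: string[] = [];
const stubPixel: PixelApi = {
  type: (text: string) => { pixelCalls.push("type " + text); },
  press: (key: string) => { pixelCalls.push("press " + key); },
  moveMouse: (x: number, y: number) => { pixelCalls.push("moveMouse " + x + "," + y); },
  click: (button: string) => { pixelCalls.push("click " + button); },
};

const pixelRun = clickAndType(stubPixel, 100, 200, "hello world");
```

because the pixel api works with coordinates, the x and y values depend on the screen layout and need to be measured for each setup.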
common accessibility roles
to get a better feel for roles, open the macos accessibility inspector and look at the roles used by any application. when using getByRole(), you’ll need to specify the accessibility role. here are common ones:
- "button" - clickable buttons
- "textfield" - text input fields
- "searchfield" - search input fields
- "checkbox" - checkbox elements
- "radiobutton" - radio button elements
- "combobox" - dropdown menus
- "link" - hyperlinks
- "image" - images
- "statictext" - text labels
- "scrollarea" - scrollable containers
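one way to avoid typos in role strings is to keep the common roles in a typed list and check input against it before calling getByRole(). this is a sketch only: COMMON_ROLES, isCommonRole, and normalizeRole are hypothetical helpers, and the list covers just the roles named above, not every role an application may expose.

```typescript
// the common roles listed above, as a readonly tuple.
const COMMON_ROLES = [
  "button",
  "textfield",
  "searchfield",
  "checkbox",
  "radiobutton",
  "combobox",
  "link",
  "image",
  "statictext",
  "scrollarea",
] as const;

// union of the literal role strings.
type CommonRole = (typeof COMMON_ROLES)[number];

// type guard: true only for one of the roles in the list.
function isCommonRole(value: string): value is CommonRole {
  return (COMMON_ROLES as readonly string[]).indexOf(value) !== -1;
}

// trim and lowercase user input, returning undefined for unknown roles.
function normalizeRole(input: string): CommonRole | undefined {
  const candidate = input.trim().toLowerCase();
  return isCommonRole(candidate) ? candidate : undefined;
}

// intended use (operator call as shown in the methods table):
//   const role = normalizeRole(" Button ");
//   if (role) pipe.operator.getByRole(role, { app: "Chrome" });
```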
advanced usage examples
automating form filling
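the original example for this heading is missing here, so the following is a hedged sketch built from getById(), .fill(), and .click() as listed in the methods table. the element ids ("name-field", "email-field", "submit-button"), the FormOperator interface, and the stub are hypothetical; real ids have to be discovered for the target app.

```typescript
// one field to fill: the element id and the text to enter.
interface FormField {
  id: string;
  text: string;
}

// subset of the operator used here; return types are an assumption.
interface FormOperator {
  getById(id: string, options?: { app: string }): {
    fill(text: string): unknown;
    click(): unknown;
  };
}

// fill each field in order, then click the submit element.
async function fillForm(
  operator: FormOperator,
  app: string,
  fields: FormField[],
  submitId: string
): Promise<void> {
  for (const field of fields) {
    await operator.getById(field.id, { app }).fill(field.text);
  }
  await operator.getById(submitId, { app }).click();
}

// hypothetical recording stub so the sketch runs without screenpipe.
const formCalls: string[] = [];
const stubFormOperator: FormOperator = {
  getById: (id: string) => ({
    fill: (text: string) => { formCalls.push("fill " + id + " = " + text); },
    click: () => { formCalls.push("click " + id); },
  }),
};

const formRun = fillForm(
  stubFormOperator,
  "Chrome",
  [
    { id: "name-field", text: "Ada" },
    { id: "email-field", text: "ada@example.com" },
  ],
  "submit-button"
);
```

keeping the fields in an array makes the fill order explicit, which matters when one field only appears after another has been completed.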
automating app workflows
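the original example for this heading is missing here as well. as a rough sketch, a workflow can chain openApplication(), an element click, and pixel input, with a pause so the app has time to load. the app name "Notes", the id "new-note-button", the WorkflowOperator interface, and the pause helper are all hypothetical.

```typescript
// subset of the operator used here; return types are an assumption.
interface WorkflowOperator {
  openApplication(name: string): unknown;
  getById(id: string, options?: { app: string }): { click(): unknown };
  pixel: {
    type(text: string): unknown;
    press(key: string): unknown;
  };
}

// resolve after the given number of milliseconds.
function pause(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

// open the app, wait for it to settle, click a button, then type a note.
async function createNote(
  operator: WorkflowOperator,
  text: string,
  settleMs: number = 1000
): Promise<void> {
  await operator.openApplication("Notes");
  await pause(settleMs);
  await operator.getById("new-note-button", { app: "Notes" }).click();
  await operator.pixel.type(text);
  await operator.pixel.press("enter");
}

// hypothetical recording stub so the sketch runs without screenpipe.
const workflowCalls: string[] = [];
const stubWorkflowOperator: WorkflowOperator = {
  openApplication: (name: string) => { workflowCalls.push("open " + name); },
  getById: (id: string) => ({
    click: () => { workflowCalls.push("click " + id); },
  }),
  pixel: {
    type: (text: string) => { workflowCalls.push("type " + text); },
    press: (key: string) => { workflowCalls.push("press " + key); },
  },
};

const workflowRun = createNote(stubWorkflowOperator, "buy milk", 0);
```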
ai-powered automation
for more powerful automation, combine the operator api with vercel ai sdk to enable ai-driven desktop interactions.
troubleshooting
if you’re having issues with the operator api:
- macos permissions: ensure screenpipe has accessibility permissions in system settings > privacy & security > accessibility
- app names: use exact app names as they appear in the applications folder
- timing issues: add delays between operations, as ui elements may take time to load
- debugging: log element ids and roles to help identify the right elements
- app focus: use the activateApp: true option to ensure the target app is in focus
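the timing and debugging tips above can be combined in a small retry helper. this is a sketch, not part of the operator api: withRetry and wait are hypothetical names, the defaults are arbitrary, and passing activateApp alongside app in the getById() options is an assumption about where that option goes.

```typescript
// resolve after the given number of milliseconds.
function wait(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

// run an action, retrying with a delay when it throws. each failure is
// logged so the failing step is easy to spot. rethrows the last error
// once all attempts are used up.
async function withRetry<T>(
  action: () => T | Promise<T>,
  attempts: number = 3,
  delayMs: number = 500
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await action();
    } catch (error) {
      lastError = error;
      console.log("attempt " + attempt + " of " + attempts + " failed");
      if (attempt < attempts) {
        await wait(delayMs);
      }
    }
  }
  throw lastError;
}

// intended use (option placement assumed):
//   await withRetry(() =>
//     pipe.operator
//       .getById("element-123", { app: "Chrome", activateApp: true })
//       .click()
//   );
```

a fixed delay is the simplest choice; a slow-loading app may need a longer delay or more attempts.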
practical use cases
here are some real-world applications for the operator api:
- messaging automation
  - scrape whatsapp conversations and export them to spreadsheets
  - auto-respond to common imessage inquiries when you’re busy
  - track response rates across different messaging platforms
- social media management
  - schedule and post content across multiple platforms
  - collect engagement metrics from twitter, instagram, or linkedin
  - automate following/unfollowing based on specific criteria
  - export comments and replies for sentiment analysis
- data collection and research
  - extract data from websites that don’t have accessible apis
  - compile information across multiple applications into a single report
  - monitor prices or availability of products across different sites
  - build comprehensive research databases from scattered sources
- personal productivity
  - automate repetitive daily tasks (checking emails, organizing files)
  - create custom workflows between applications that don’t normally integrate
  - set up intelligent reminders based on content of messages or emails
  - auto-fill forms with personal or business information
- customer relationship management
  - track conversations across multiple platforms for each contact
  - automatically update crm systems with new interaction data
  - generate follow-up reminders based on conversation content
  - build comprehensive customer profiles from scattered data sources
- content creation and editing
  - automate screenshots or recordings of specific application states
  - batch process images or documents using desktop applications
  - extract text from images or pdfs for further processing
  - organize and tag media files based on their content