A CLI for controlling iOS and Android devices from AI agents, inspired by Vercel’s agent-browser.
The project is in early development and considered experimental. Pull requests are welcome!
- Platforms: iOS (simulator + limited device support) and Android (emulator + device).
- Core commands: `open`, `back`, `home`, `app-switcher`, `press`, `long-press`, `focus`, `type`, `fill`, `scroll`, `scrollintoview`, `wait`, `alert`, `screenshot`, `close`.
- Inspection commands: `snapshot` (accessibility tree).
- Device tooling: `adb` (Android), `simctl`/`devicectl` (iOS via Xcode).
- Minimal dependencies; TypeScript executed directly on Node 22+ (no build step).
```bash
npm install -g agent-device
```

Or use it without installing:

```bash
npx agent-device open SampleApp
```

```bash
agent-device open Contacts --platform ios # creates session on iOS Simulator
agent-device snapshot
agent-device click @e5
agent-device fill @e6 "John"
agent-device fill @e7 "Doe"
agent-device click @e3
agent-device close
```

Usage:

```bash
agent-device <command> [args] [--json]
```

Basic flow:

```bash
agent-device open SampleApp
agent-device snapshot
agent-device click @e7
agent-device fill @e8 "hello"
agent-device close SampleApp
```

Debug flow:

```bash
agent-device trace start
agent-device snapshot -s "Sample App"
agent-device find label "Wi-Fi" click
agent-device trace stop ./trace.log
```

Coordinates:

- All coordinate-based commands (`press`, `long-press`, `focus`, `fill`) use device coordinates with the origin at the top-left.
- X increases to the right; Y increases downward.
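Because coordinate-based commands take device coordinates with a top-left origin, a common step in an agent is turning an element's bounding box into a tap point. The sketch below assumes a hypothetical `Rect` shape (left, top, width, height in device coordinates); the fields agent-device actually exposes for elements are not described here, and the argument order for `press` is not covered by this sketch.

```typescript
// Hypothetical bounding box in device coordinates: origin at the top-left,
// X grows to the right, Y grows downward. Not an agent-device type.
interface Rect {
  x: number; // left edge
  y: number; // top edge
  width: number;
  height: number;
}

interface Point {
  x: number;
  y: number;
}

// Center of a rect, rounded to whole device units.
function centerOf(rect: Rect): Point {
  return {
    x: Math.round(rect.x + rect.width / 2),
    y: Math.round(rect.y + rect.height / 2),
  };
}

// True if the point lies inside the screen (Y measured downward from the top).
function onScreen(p: Point, screenWidth: number, screenHeight: number): boolean {
  return p.x >= 0 && p.y >= 0 && p.x < screenWidth && p.y < screenHeight;
}

const button: Rect = { x: 20, y: 100, width: 200, height: 44 };
console.log(centerOf(button)); // { x: 120, y: 122 }
```

Because Y grows downward, a larger `y` means a point lower on the screen; the `onScreen` check relies on that convention.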
Commands:

- `open`, `close`, `home`, `back`, `app-switcher`
- `snapshot`, `find`, `get`
- `click`, `focus`, `type`, `fill`, `press`, `long-press`, `scroll`, `scrollintoview`
- `alert`, `wait`, `screenshot`
- `trace start`, `trace stop`
- `settings wifi|airplane|location on|off`
- `appstate`, `apps`, `devices`, `session list`
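Agents usually drive these commands from a script. Below is a minimal sketch using Node's `child_process.spawnSync`, passing only flags documented in this README (`--platform`, `--session`, `--json`) after the command and its arguments, as in the usage line. The structure of `--json` output is not documented here, so the sketch returns raw stdout instead of assuming a schema. `buildArgs` and `runAgentDevice` are hypothetical helper names.

```typescript
import { spawnSync } from "node:child_process";

// Subset of documented global flags (hypothetical options object).
interface GlobalOptions {
  platform?: "ios" | "android";
  session?: string;
  json?: boolean;
}

// Assemble argv for `agent-device <command> [args] [flags]`.
// Multi-word commands such as "trace start" are split into separate argv items.
function buildArgs(command: string, args: string[], opts: GlobalOptions = {}): string[] {
  const argv = [...command.split(" "), ...args];
  if (opts.platform) argv.push("--platform", opts.platform);
  if (opts.session) argv.push("--session", opts.session);
  if (opts.json) argv.push("--json");
  return argv;
}

// Run the CLI synchronously and return stdout; throws on spawn errors or a non-zero exit.
function runAgentDevice(command: string, args: string[], opts: GlobalOptions = {}): string {
  const result = spawnSync("agent-device", buildArgs(command, args, opts), {
    encoding: "utf8",
  });
  if (result.error) throw result.error; // e.g. the binary is not on PATH
  if (result.status !== 0) {
    throw new Error(`agent-device ${command} failed: ${result.stderr}`);
  }
  return result.stdout;
}
```

For example, `runAgentDevice("open", ["Contacts"], { platform: "ios" })` mirrors the quick-start command, and `runAgentDevice("snapshot", [], { json: true })` requests structured output for the agent to parse.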
Snapshot backends (iOS):

| Backend | Speed | Accuracy | Requirements |
|---|---|---|---|
| `xctest` | Fast | High | No Accessibility permission required |
| `ax` | Fast | Medium | Accessibility permission for the terminal app; not recommended |
Notes:

- The default backend is `xctest` on iOS.
- Scope snapshots with `-s "<label>"` or `-s @ref`.
- If XCTest returns 0 nodes (e.g., because the foreground app changed), agent-device falls back to AX when available.
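The XCTest-to-AX fallback described in these notes can be pictured as follows. This is a sketch of the described behavior, not agent-device's source: `SnapshotNode`, `SnapshotBackend`, and `snapshotWithFallback` are hypothetical names, and "available" is modeled as a simple boolean.

```typescript
// Hypothetical minimal node shape for an accessibility snapshot.
interface SnapshotNode {
  ref: string;
  label?: string;
  children: SnapshotNode[];
}

// A backend produces the top-level nodes of a snapshot.
interface SnapshotBackend {
  name: "xctest" | "ax";
  available: boolean;
  capture: () => SnapshotNode[];
}

// Prefer XCTest (the iOS default). Switch to AX only when XCTest returned
// zero nodes and AX is available; otherwise keep the XCTest result.
function snapshotWithFallback(
  xctest: SnapshotBackend,
  ax: SnapshotBackend,
): { backend: string; nodes: SnapshotNode[] } {
  const primary = xctest.capture();
  if (primary.length > 0 || !ax.available) {
    return { backend: xctest.name, nodes: primary };
  }
  return { backend: ax.name, nodes: ax.capture() };
}
```

Returning the backend name alongside the nodes makes it visible to the caller when the lower-accuracy AX path was used.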
Flags:

- `--platform ios|android`
- `--device <name>`
- `--udid <udid>` (iOS)
- `--serial <serial>` (Android)
- `--activity <component>` (Android; `package/Activity` or `package/.Activity`)
- `--session <name>`
- `--verbose` for daemon and runner logs
- `--json` for structured output
- `--backend ax|xctest` (snapshot only; defaults to `xctest` on iOS)
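`--activity` accepts both the full `package/Activity` form and the `package/.Activity` shorthand. In Android's component naming (the same rule `ComponentName.unflattenFromString` applies), a class name that begins with `.` is relative to the package. The hypothetical `expandComponent` helper below illustrates that convention; it is not agent-device's code.

```typescript
// Expand an Android component string using Android's shorthand rule:
// "com.example.app/.MainActivity" -> class "com.example.app.MainActivity".
function expandComponent(component: string): { pkg: string; cls: string } {
  const slash = component.indexOf("/");
  if (slash <= 0 || slash === component.length - 1) {
    throw new Error(`expected package/Activity, got "${component}"`);
  }
  const pkg = component.slice(0, slash);
  const rawCls = component.slice(slash + 1);
  // A leading "." means the class name is relative to the package.
  const cls = rawCls.startsWith(".") ? pkg + rawCls : rawCls;
  return { pkg, cls };
}
```

Both `--activity com.example.app/.MainActivity` and `--activity com.example.app/com.example.app.MainActivity` therefore name the same activity under this rule.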
Install the automation skills listed in SKILL.md:

```bash
npx skills add https://github.com/callstackincubator/agent-device --skill agent-device
```

Sessions:
- `open` starts a session. Without arguments, it boots/activates the target device or simulator without launching an app.
- All interaction commands require an open session.
- If a session is already open, `open <app>` switches the active app and updates the session app bundle.
- `close` stops the session and releases device resources. Pass an app to close it explicitly, or omit the app to just close the session.
- Use `--session <name>` to manage multiple sessions.
- Session logs are written to `~/.agent-device/sessions/<session>-<timestamp>.ad`.
- With `--record-json`, JSON logs are written to `~/.agent-device/sessions/<session>-<timestamp>.json` by default.
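Session logs follow the `<session>-<timestamp>.ad` naming above. To pick up the newest log for a session from a script, a sketch like the following avoids assuming a timestamp format: it filters by the `<session>-` prefix and `.ad` suffix and sorts by file modification time. `latestSessionLog` is a hypothetical helper, and the directory parameter exists only so it can be pointed elsewhere.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

const DEFAULT_SESSIONS_DIR = path.join(os.homedir(), ".agent-device", "sessions");

// Return the most recently modified "<session>-*.ad" file, or undefined.
// Caveat: this is a prefix match, so session "demo" also matches files of a
// session named "demo-2"; use distinct session names if that matters.
function latestSessionLog(session: string, dir: string = DEFAULT_SESSIONS_DIR): string | undefined {
  if (!fs.existsSync(dir)) return undefined;
  const candidates = fs
    .readdirSync(dir)
    .filter((name) => name.startsWith(`${session}-`) && name.endsWith(".ad"))
    .map((name) => {
      const full = path.join(dir, name);
      return { full, mtime: fs.statSync(full).mtimeMs };
    })
    .sort((a, b) => b.mtime - a.mtime); // newest first
  return candidates[0]?.full;
}
```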
Find (semantic):

- `find <text> <action> [value]` finds by any text (label/value/identifier) using a scoped snapshot.
- `find text|label|value|role|id <value> <action> [value]` for specific locators.
- Actions: `click` (default), `fill`, `type`, `focus`, `get text`, `get attrs`, `wait [timeout]`, `exists`.
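`find <text>` matches against label, value, or identifier. The matching step can be sketched as a depth-first search over a snapshot tree. The node shape, the case-insensitive substring match, and `findByText` are assumptions for illustration; the actual matching rules (exact vs. partial, case sensitivity) are not specified in this README.

```typescript
// Hypothetical snapshot node; field names are assumptions.
interface UiNode {
  ref: string; // e.g. "e5", addressed as @e5 in commands
  label?: string;
  value?: string;
  identifier?: string;
  children?: UiNode[];
}

// Depth-first search returning the first node whose label, value, or
// identifier contains `text` (case-insensitive).
function findByText(root: UiNode, text: string): UiNode | undefined {
  const needle = text.toLowerCase();
  const fields = [root.label, root.value, root.identifier];
  if (fields.some((f) => f !== undefined && f.toLowerCase().includes(needle))) {
    return root;
  }
  for (const child of root.children ?? []) {
    const hit = findByText(child, text);
    if (hit) return hit;
  }
  return undefined;
}
```

The returned `ref` is what an action such as `click` would then target.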
Android fill reliability:

- `fill` clears the current value, then enters text.
- `type` enters text into the focused field without clearing.
- `fill` now verifies the entered value on Android.
- If the value does not match, agent-device clears the field and retries once with slower typing.
- This reduces IME-related character swaps on long strings (e.g., emails and IDs).
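The verify-and-retry behavior of `fill` on Android can be sketched against a hypothetical `TextField` interface (`clear`, `type` with a per-character delay, `read`). The interface and delay values are assumptions; only the control flow (fill, compare, then clear and retry once more slowly) follows the description above.

```typescript
// Hypothetical handle on a focused text field.
interface TextField {
  clear(): void;
  type(text: string, delayMsPerChar: number): void;
  read(): string;
}

// Fill, verify, and on mismatch clear and retry once with slower typing.
// Returns true if the final value matches the expected text.
function fillWithVerify(field: TextField, text: string, fastDelay = 0, slowDelay = 50): boolean {
  field.clear();
  field.type(text, fastDelay);
  if (field.read() === text) return true;

  // Single retry: slower input gives the IME less chance to reorder characters.
  field.clear();
  field.type(text, slowDelay);
  return field.read() === text;
}
```

Contrast this with `type`, which would skip the `clear()` calls and append to whatever the field already holds.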
Settings helpers (simulators):

- `settings wifi on|off`
- `settings airplane on|off`
- `settings location on|off` (iOS uses per-app permission for the current session app)

Note: on iOS, the wifi/airplane toggles change only the status bar indicators, not the actual network state. Turning airplane off clears the status bar overrides.
App state:

- `appstate` shows the foreground app/activity (Android). On iOS it uses the current session app when available; otherwise it falls back to a snapshot-based guess (AX first, then XCTest if AX can’t identify the app).
- `apps --metadata` returns the app list with minimal metadata.
Tracing:

```bash
agent-device trace start
agent-device trace stop ./trace.log
```

- The trace log includes snapshot logs and XCTest runner logs for the session.
- Built-in retries cover transient runner connection failures, AX snapshot hiccups, and Android UI dumps.
- For snapshot issues (missing elements), compare with the `--raw` flag for unaltered output and scope with `-s "<label>"`.
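Built-in retries for transient failures typically follow a bounded retry-with-delay pattern. A generic sketch is below; the attempt count, the linear backoff, and the `withRetries` name are assumptions, not agent-device's actual retry policy.

```typescript
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Run `op` up to `attempts` times, waiting a little longer after each
// failure (linear backoff). Rethrows the last error if every attempt fails.
async function withRetries<T>(
  op: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 200,
): Promise<T> {
  let lastError: unknown;
  for (let i = 1; i <= attempts; i++) {
    try {
      return await op();
    } catch (err) {
      lastError = err;
      if (i < attempts) await sleep(baseDelayMs * i);
    }
  }
  throw lastError;
}
```

Keeping the attempt count small matters here: a retry loop hides transient hiccups, but a persistent failure (such as a missing element) should still surface quickly so it can be investigated with `--raw` and `-s`.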
App resolution:

- Bundle/package identifiers are accepted directly (e.g., `com.apple.Preferences`).
- Human-readable names are resolved when possible (e.g., `Settings`).
- Built-in aliases include `Settings` for both platforms.
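Name resolution can be pictured as: accept anything that already looks like a reverse-DNS identifier, otherwise consult a per-platform alias table. The sketch includes only the `Settings` alias mentioned above, mapped to `com.apple.Preferences` on iOS and `com.android.settings` on Android. The identifier heuristic, the case-insensitive lookup, and `resolveApp` are assumptions; agent-device's broader name resolution is not modeled here.

```typescript
type Platform = "ios" | "android";

// Built-in aliases; only "Settings" comes from this README.
const ALIASES: Record<string, Record<Platform, string>> = {
  settings: { ios: "com.apple.Preferences", android: "com.android.settings" },
};

// Heuristic: two or more dot-separated segments, e.g. "com.apple.Preferences".
const looksLikeIdentifier = (s: string): boolean =>
  /^[A-Za-z][\w-]*(\.[A-Za-z0-9_-]+)+$/.test(s);

// Return a bundle/package id, or undefined if the name is unknown.
function resolveApp(nameOrId: string, platform: Platform): string | undefined {
  if (looksLikeIdentifier(nameOrId)) return nameOrId; // pass ids through unchanged
  return ALIASES[nameOrId.toLowerCase()]?.[platform];
}
```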
iOS notes:

- Input commands (`press`, `type`, `scroll`, etc.) are supported only on simulators in v1 and use the XCTest runner.
- `alert` and `scrollintoview` use the XCTest runner and are simulator-only in v1.
- Real device support (including snapshots) is on the roadmap for iOS.
Development:

```bash
pnpm test
pnpm build
```

Environment selectors:

- `ANDROID_DEVICE=Pixel_9_Pro_XL` or `ANDROID_SERIAL=emulator-5554`
- `IOS_DEVICE="iPhone 17 Pro"` or `IOS_UDID=<udid>`
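The environment selectors pick a target device either by name or by a unique ID. A sketch of how a test harness might read them is below; preferring the ID (`ANDROID_SERIAL`, `IOS_UDID`) over the name when both are set is an assumption, and `selectTarget` is a hypothetical helper.

```typescript
type Target =
  | { platform: "android"; by: "serial" | "name"; value: string }
  | { platform: "ios"; by: "udid" | "name"; value: string };

type Env = Record<string, string | undefined>;

// Pick the device for a platform from the environment selectors.
// Assumption: an explicit ID wins over a device name when both are set.
function selectTarget(platform: "android" | "ios", env: Env): Target | undefined {
  if (platform === "android") {
    if (env.ANDROID_SERIAL) return { platform, by: "serial", value: env.ANDROID_SERIAL };
    if (env.ANDROID_DEVICE) return { platform, by: "name", value: env.ANDROID_DEVICE };
  } else {
    if (env.IOS_UDID) return { platform, by: "udid", value: env.IOS_UDID };
    if (env.IOS_DEVICE) return { platform, by: "name", value: env.IOS_DEVICE };
  }
  return undefined; // no selector set; the caller decides on a default
}
```

In a harness this would be called as `selectTarget("android", process.env)`.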
Test screenshots are written to:

- `test/screenshots/android-settings.png`
- `test/screenshots/ios-settings.png`
See CONTRIBUTING.md.
agent-device is an open source project and will always remain free to use. Callstack is a group of React and React Native geeks. Contact us at hello@callstack.com if you need any help with these technologies or just want to say hi.