Agent clicks a button. Tool returns {"success": true}. Screen doesn't change. Agent has no idea.
farscry augment detects this inline and tells the agent immediately, zero model changes, zero retraining, three lines of MCP config.
Screenshot tooling returns descriptions. Workflows guess where to click, miss the target, and fail. OSWorld benchmarks show a ~88% overall task failure rate (GPT-4V baseline, arXiv:2404.07972; Claude CUA: ~15% success).
Agents that guess coordinates fail. Agents with exact coordinates act.
farscry returns typed elements with exact pixel coordinates. The workflow knows the button is at (400,300). It clicks. It succeeds.
Screenshot, error printout, Figma export, phone photo of a screen. Offline. No GPU.
After an action, farscry diffs before → after. No re-upload. No tokens wasted on pixels that didn't change.
farscry augment compares visual state before and after every action. When nothing changed, it tells the agent inline, before it wastes more steps.
click(450, 320){"success": true}farscry_mark_action()click(450, 320)farscry_extract(screenshot)Runs local for all four. No server, no GPU, no account.
The output is typed UI elements with exact pixel coordinates, not a description. Pipe to any agent, MCP client, or CLI tool.
In MCP mode, the daemon tracks state automatically, no extra command needed.
After each farscry_extract call,
the next call returns what changed since the last one.
Via CLI, pass two screenshots directly.
Capture a region with your system shortcut. Pipe straight to your agent. Zero files. Zero friction.
After farscry setup, Cmd+V detects what's in your clipboard, image or text, and routes it. No command. No alias. Just paste.
Same screenshot, bounding boxes drawn. Proves accuracy. Shareable. Zero guessing.
Every detected element gets a colored bounding box. Affordances get a thicker border.
Models (~12MB English) download to ~/.farscry/models/ on first run. No account needed.
VASP (Visual Application State Protocol) defines how workflows receive visual context - as typed coordinates with positions, not descriptions. farscry is the reference implementation.
Like MCP standardized tool connectivity, VASP standardizes visual context for workflows. One format, any framework, any workflow.
vasp-protocol.github.io/spec -> Local docs ->
Fleet-level AER visibility for AI Ops teams. Every agent deployment, one dashboard. Zero pixels ever leave your machine.
Track action effect rate per deployment, model, and environment over time. Spot regressions before users notice them.
Define the minimum AER you accept. Get notified the moment a deploy crosses the line. PagerDuty, Slack, or webhook.
Compare your agent's AER against similar deployments in the field. Know if you're above or below the sector average.
v0.5.0 ships farscry augment: silent failure detection inline in every MCP response. farscry cloud is on the horizon.
farscry hook: zero-friction terminal recordingpack + timelinefarscry augment: inline silent failure detection in MCP responsesfarscry_mark_action: explicit action marker MCP toolfarscry analyze: AER and VLR metrics across sessionsfarscry mark-action: CLI action marker for terminal hookfarscry diff --json: structured diff output for toolingfarscry install-lang: multilingual OCR via CDNfarscry serveOS accessibility tree scraped into local SQLite. Agents query elements by role and name. Coordinates are pixel-perfect — not OCR estimates.
When farscry detects a silent failure and the action is reversible, it injects a deterministic recovery before returning control to the agent. No extra model call.
Changelog and full history: github.com/teles-forge/farscry/CHANGELOG.md