v0.5.0 · open source

Computer-use agents fail silently.
farscry tells them when it happens.

Agent clicks a button. Tool returns {"success": true}. Screen doesn't change. Agent has no idea.
farscry augment detects this inline and tells the agent immediately, zero model changes, zero retraining, three lines of MCP config.

farscry extract screen.png
$ farscry extract screen.png === farscry visual context === source: screen.png screen_type: config state_id: phash:8f4a2c9d1e3b7f6a confidence: high agent_context: "Payment settings - Save available" ---   [middle-right] button "Save Changes" enabled:true [middle-center] input "Card number" empty:true [middle-center] input "Expiry" value:"12/26" [bottom] error "Value must be ≤ 10000"   affordances: click "Save Changes" at (400,300) enabled:true type "Card number" at (300,200) current:""

Coordinates, not captions

Screenshot tooling returns descriptions. Workflows guess where to click, miss the target, and fail. OSWorld benchmarks show a ~88% overall task failure rate (GPT-4V baseline, arXiv:2404.07972; Claude CUA: ~15% success).

Agents that guess coordinates fail. Agents with exact coordinates act.

farscry returns typed elements with exact pixel coordinates. The workflow knows the button is at (400,300). It clicks. It succeeds.

Screenshot, error printout, Figma export, phone photo of a screen. Offline. No GPU.

farscry output
[middle-right] button "Submit" enabled:true at (640,480) [middle-center] input "Username" empty:true at (400,200) [middle-center] input "Password" empty:true at (400,280) [bottom] error "Invalid password" at (400,340) [bottom] link "Forgot password?" at (400,380)   affordances: type "Username" at (400,200) type "Password" at (400,280) click "Submit" at (640,480)

Performance
65×
faster than Tesseract on 4K screens
38ms warm · $0 · offline · N=223, ScreenSpot-Pro (MIT)
65× faster · 4K screens vs Tesseract · 38ms warm
~9× fewer tokens · 1080p vs Claude Vision (measured)
~16× fewer tokens · 4K screens vs Claude Vision · N=223
Speed
vs Tesseract 5.5 (4K) ~2,500ms 38ms 65×
vs Cloud Vision API ~2-5s 38ms 65–130×
Token usage per image
vs Claude Vision · 1080p 1,568 tokens ~175 tokens ~9×
vs Claude Vision · 4K (N=223) 1,568 tokens ~97 tokens ~16×
Cost · 10,000 images / day
Claude Sonnet 4.6 · $3/MTok $17,000/year $0
Claude Opus 4.7 · $5/MTok (4K) $90,000/year $0
Methodology → github.com/teles-forge/farscry/benchmarks

Automatic Diff

38ms instead of 5 seconds, every action.

After an action, farscry diffs before → after. No re-upload. No tokens wasted on pixels that didn't change.

without
action re-screenshot 1,568 tokens to cloud wait 2-5s read result
~$0.0047 · ~3 seconds
farscry
action farscry diff 38ms read result
$0

Silent Failure Detection

Agents now know when their actions fail.

farscry augment compares visual state before and after every action. When nothing changed, it tells the agent inline, before it wastes more steps.

Without farscry augment
1Agent calls click(450, 320)
2Tool returns {"success": true}
Screen didn't change. Agent has no idea.
8 more steps on the same broken state
~12,000 tokens wasted per loop
With farscry augment
1farscry_mark_action()
2Agent calls click(450, 320)
3farscry_extract(screenshot)
Agent receives inline warning. Pivots immediately.
detected at step 3 · zero extra tokens
farscry_extract response when action had no effect
=== farscry visual context === state_id: phash:8f4a2c9d1e3b7f6a ---   [middle-right] button "Save Changes" enabled:true   ⚠ SILENT_FAILURE DETECTED action had no visual effect state_id_before: phash:8f4a2c9d1e3b7f6a state_id_after: phash:8f4a2c9d1e3b7f6a recommendation: try a different approach
Three lines to enable farscry augment
// add to your MCP config, works with Claude Code, Cursor, any MCP client
"farscry": "command": "farscry", "args": ["serve", "--mcp"]

One binary. Four ways to use it.

Runs local for all four. No server, no GPU, no account.

mode 1: describe Any image → typed coordinates

The output is typed UI elements with exact pixel coordinates, not a description. Pipe to any agent, MCP client, or CLI tool.

farscry extract checkout.png
$ farscry extract checkout.png === farscry visual context === screen_type: form ---   [top-center] heading "Checkout" [middle-center] input "Name" empty:true [middle-center] input "Card number" empty:true [middle-center] input "Expiry" value:"12/26" [middle-right] input "CVV" masked:true [bottom] button "Pay $24.99" enabled:true [bottom] button "Cancel" enabled:true   affordances: type "Name" at (340,180) type "Card number" at (340,240) click "Pay $24.99" at (400,420)
mode 2: diff After every action, only what changed

In MCP mode, the daemon tracks state automatically, no extra command needed. After each farscry_extract call, the next call returns what changed since the last one. Via CLI, pass two screenshots directly.

farscry diff before.png after.png
$ farscry diff before.png after.png === farscry diff === from: phash:8f4a2c3d to: phash:3d9b1e7a ---   appeared: error "Card declined" at (0,48) changed: button "Submit" enabled:true → false changed: button "Pay $24.99" loading:true removed: spinner at (450,200)   38ms · 3 of 12 elements changed
mode 3: pipe Cmd+Shift+4, pipe, done

Capture a region with your system shortcut. Pipe straight to your agent. Zero files. Zero friction.

farscry extract --from-clipboard | claude -p "fix this"

Smart Paste

Cmd+V becomes image-aware.

After farscry setup, Cmd+V detects what's in your clipboard, image or text, and routes it. No command. No alias. Just paste.

farscry setup
$farscry setup
Configure smart Cmd+V? [y/N]: y
✓ ~/.farscry/smart-paste.sh created
iTerm2 · Warp · Kitty · Gnome Terminal · Windows Terminal
Screenshot → Cmd+V → image detected → sent to your agent.
Text in clipboard falls back to normal paste.

Visual Debug

See exactly what farscry sees.

Same screenshot, bounding boxes drawn. Proves accuracy. Shareable. Zero guessing.

mode 5: annotate Clipboard → annotated image

Every detected element gets a colored bounding box. Affordances get a thicker border.

farscry annotate --from-clipboard -o out.png
button input error heading label

Install

Up in 30 seconds.

npm npm install -g farscry
pip pip install farscry
homebrew brew install teles-forge/tap/farscry
cargo cargo install farscry
curl curl -fsSL https://farscry.dev/install | sh
$farscry setup
auto-detect MCP clients, wire up in 30 seconds

Models (~12MB English) download to ~/.farscry/models/ on first run. No account needed.


The open protocol behind farscry

VASP (Visual Application State Protocol) defines how workflows receive visual context - as typed coordinates with positions, not descriptions. farscry is the reference implementation.

Like MCP standardized tool connectivity, VASP standardizes visual context for workflows. One format, any framework, any workflow.

vasp-protocol.github.io/spec ->    Local docs ->

MCP = how workflows connect to tools
VASP = how workflows understand visual state
farscry setup
$ farscry setup farscry v0.4.0   Detected: Claude Code, Cursor   ── Claude Code ── Config file: ~/.claude/mcp.json "mcpServers": "farscry": "command": "farscry", "args": ["serve", "--mcp"]   farscry never modifies your config files automatically.

Coming soon

farscry cloud

Fleet-level AER visibility for AI Ops teams. Every agent deployment, one dashboard. Zero pixels ever leave your machine.

AER Dashboard

Deployments, not sessions

Track action effect rate per deployment, model, and environment over time. Spot regressions before users notice them.

Threshold Alerts

Your AER floor, your rules

Define the minimum AER you accept. Get notified the moment a deploy crosses the line. PagerDuty, Slack, or webhook.

Sector Baseline

Industry benchmark, opt-in

Compare your agent's AER against similar deployments in the field. Know if you're above or below the sector average.

What farscry cloud receives
metric payload — nothing else
{ "aer": 0.47, "sf_count": 18, "session_id": "abc123", "model": "claude-sonnet-4-5", "timestamp": 1716148440 }
Privacy model
No screenshots. No pixels. No user data. Everything is processed locally by farscry. Only aggregated metrics travel to the cloud. GDPR-safe and compliance-ready by design — teams in regulated industries can use it on day one.
Join early access → Free during early access  ·  no credit card

Roadmap

What comes next.

v0.5.0 ships farscry augment: silent failure detection inline in every MCP response. farscry cloud is on the horizon.

v0.4.0 current
  • Extract, diff, annotate: full VASP pipeline
  • MCP server: 38ms warm response
  • farscry hook: zero-friction terminal recording
  • VASF sessions: pack + timeline
  • Zero-copy pHash, 22MB daemon RSS (macOS)
  • Linux: Docker + X11, 11MB VmRSS
  • Global daemon: N terminals, one process
  • npm, pip, Homebrew, crates.io
v0.5.0 current
  • farscry augment: inline silent failure detection in MCP responses
  • farscry_mark_action: explicit action marker MCP tool
  • farscry analyze: AER and VLR metrics across sessions
  • farscry mark-action: CLI action marker for terminal hook
  • Headless Linux: Xvfb auto-start, works on any server
  • farscry diff --json: structured diff output for tooling
v0.6.0 planned
  • VASP adapters: native Playwright and OpenAI Vision support
  • farscry install-lang: multilingual OCR via CDN
  • Per-window capture when minimized (SCContentFilter)
  • Screen-lock awareness in farscry serve
v0.7.0 — a11y grounding

Queries, not guesses

OS accessibility tree scraped into local SQLite. Agents query elements by role and name. Coordinates are pixel-perfect — not OCR estimates.

farscry_query("SELECT x,y FROM elements
  WHERE role='button'
  AND name='Save'"
)
(640, 480) in <1ms
v0.8.0 — rollback

Automatic recovery, no inference

When farscry detects a silent failure and the action is reversible, it injects a deterministic recovery before returning control to the agent. No extra model call.

SF detected → modal opened
farscry injects Escape via a11y
agent receives clean state

Changelog and full history: github.com/teles-forge/farscry/CHANGELOG.md