
Roadmap

The foundation. Local OCR pipeline, typed VASP output, MCP server, smart paste.

Feature                                                    Status
farscry extract - screenshot to VASP text                  Released
farscry diff - semantic delta between two screenshots      Released
farscry serve --mcp - 38ms warm daemon                     Released
farscry setup - agent config + smart paste                 Released
npm, pip, Homebrew, crates.io distribution                 Released
VASP 1.0-draft open RFC                                    Released

Four targeted features. Each ships independently.

farscry install-lang por currently returns an error. v0.2.0 makes it work.

PP-OCRv5 has per-language ONNX recognition models. v0.2.0 downloads, verifies, and loads them on demand.

farscry install-lang por # Portuguese
farscry install-lang deu # German
farscry install-lang jpn # Japanese
farscry extract screen.png --lang por
farscry extract screen.png --lang eng+por
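
The "verifies" step can be illustrated with a checksum comparison. This is a minimal sketch, assuming each downloaded model file ships with a known SHA-256 digest; the roadmap does not say how farscry actually verifies models, and the names `sha256_of` and `verify_model` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a large model file is never read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model(path: Path, expected_sha256: str) -> bool:
    """Return True only if the file exists and its digest matches the expected one.

    Assumption: the expected digest comes from a trusted manifest (hypothetical).
    """
    if not path.is_file():
        return False
    return sha256_of(path) == expected_sha256.lower()
```

A loader built on this would refuse to open a recognition model whose digest does not match, and could download it again instead.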

farscry annotate takes a screenshot and returns the same image with a bounding box drawn over each detected element, annotated with its label and element type.

farscry annotate screen.png -o annotated.png

Each element type gets a distinct color. Affordances (clickable, typeable) are highlighted differently from labels and headings. The output image is shareable and self-documenting.

This is primarily a debugging and demo tool. When you can see the boxes, you can verify the coordinates are correct before sending them to your agent.
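
Visual inspection can be paired with a programmatic sanity check. The sketch below assumes boxes are given as pixel `x`, `y`, `width`, `height` values with the origin at the top-left corner; that layout and the `Box` type are assumptions for illustration, not the VASP schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    # Hypothetical box layout: top-left corner plus size, in pixels.
    x: int
    y: int
    width: int
    height: int


def box_in_bounds(box: Box, image_width: int, image_height: int) -> bool:
    """True if the box has positive area and lies fully inside the image."""
    return (
        box.width > 0
        and box.height > 0
        and box.x >= 0
        and box.y >= 0
        and box.x + box.width <= image_width
        and box.y + box.height <= image_height
    )


def out_of_bounds(boxes: List[Box], image_width: int, image_height: int) -> List[Box]:
    """Collect the boxes that an agent should not be asked to click."""
    return [b for b in boxes if not box_in_bounds(b, image_width, image_height)]
```

A box that fails this check points to a scaling or coordinate-origin mismatch, which is the kind of error the annotated image makes visible.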

farscry extract --from-clipboard is not yet implemented on Windows. v0.2.0 adds Windows support, the last platform still missing it.

Tools that convert other formats to VASP without requiring farscry’s OCR pipeline.

Teams already using Claude computer-use, Playwright, or OpenAI vision get VASP output without changing their extraction pipeline.

farscry convert --from claude-computer-use --input result.json
farscry convert --from playwright-a11y --input snapshot.json
farscry convert --from openai-vision --input response.json

This is the protocol adoption path. Other tools join VASP without rewriting their extraction layer.
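
To give a feel for what such an adapter does, here is a minimal sketch that flattens a Playwright-style accessibility tree (nested nodes with `role`, `name`, and `children`) into a flat element list. The output keys `type`, `label`, and `depth` are hypothetical placeholders, not the VASP schema, and this is not farscry's converter.

```python
from typing import Any, Dict, Iterator


def flatten_a11y(node: Dict[str, Any], depth: int = 0) -> Iterator[Dict[str, Any]]:
    """Walk the tree depth-first and yield one flat record per named node."""
    name = node.get("name", "")
    if name:
        # Output field names are illustrative assumptions only.
        yield {"type": node.get("role", "unknown"), "label": name, "depth": depth}
    for child in node.get("children", []):
        yield from flatten_a11y(child, depth + 1)


snapshot = {
    "role": "WebArea",
    "name": "Login",
    "children": [
        {"role": "textbox", "name": "Email"},
        {"role": "button", "name": "Sign in"},
        # Unnamed containers are skipped, but their children are kept.
        {"role": "generic", "name": "", "children": [
            {"role": "link", "name": "Forgot password"},
        ]},
    ],
}

elements = list(flatten_a11y(snapshot))
```

The point of the sketch is that the source tool keeps its own extraction step; only the output shape is translated.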


Monitors a screen region continuously. Emits a VASP diff each time something changes. No polling required from the agent.

farscry watch --region 0,0,1920,1080
# streams VASP diffs to stdout as UI state changes

The daemon tracks state_id history. If the same state appears twice, context_changed: false is emitted and the agent is notified it may be in a loop.

Useful for automation that gets stuck repeating the same action without effect.
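
The loop check described above can be sketched in a few lines. The field names `state_id` and `context_changed` come from the description; the `LoopDetector` class and the shape of the emitted record are assumptions for illustration, not the daemon's implementation.

```python
from typing import Dict, Set, Union


class LoopDetector:
    """Remembers every state_id seen so far and flags repeats."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()

    def observe(self, state_id: str) -> Dict[str, Union[str, bool]]:
        # A state seen before means the UI has returned to an earlier
        # state, so the agent may be repeating an action without effect.
        repeated = state_id in self._seen
        self._seen.add(state_id)
        return {"state_id": state_id, "context_changed": not repeated}


detector = LoopDetector()
events = [detector.observe(s) for s in ["a1", "b2", "a1"]]
```

Here the third event reports `context_changed: False` because state `a1` has already appeared in the history.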

The npm and pip SDKs currently wrap the CLI binary via subprocess. v0.3.0 turns them into proper async clients that connect directly to the daemon socket.

Lower latency. No subprocess overhead. Persistent connection.
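
The difference from a subprocess wrapper can be sketched with `asyncio`. This is an illustration under stated assumptions: the wire format (newline-delimited JSON), the `DaemonClient` class, and the socket path passed to `connect_unix` are all hypothetical, since the roadmap does not describe the daemon's protocol.

```python
import asyncio
import json
from typing import Any, Dict


class DaemonClient:
    """Keeps one connection open and exchanges newline-delimited JSON messages."""

    def __init__(self, reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
        self._reader = reader
        self._writer = writer
        self._lock = asyncio.Lock()  # allow one in-flight request at a time

    @classmethod
    async def connect_unix(cls, socket_path: str) -> "DaemonClient":
        # socket_path is a hypothetical location; Unix-like systems only.
        reader, writer = await asyncio.open_unix_connection(socket_path)
        return cls(reader, writer)

    async def request(self, payload: Dict[str, Any]) -> Dict[str, Any]:
        """Send one message and wait for the matching one-line reply."""
        async with self._lock:
            self._writer.write(json.dumps(payload).encode("utf-8") + b"\n")
            await self._writer.drain()
            line = await self._reader.readline()
        if not line:
            raise ConnectionError("daemon closed the connection")
        return json.loads(line)

    async def close(self) -> None:
        self._writer.close()
        await self._writer.wait_closed()
```

Because the connection is reused across calls, each request pays only for one write and one read rather than for starting a new process.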


  • VASP validator: verifies any VASP output against the spec schema
  • VASP stream: Server-Sent Events endpoint for real-time state monitoring
  • Third-party implementations: guide + badge + registry for VASP-compatible tools

  • Cloud inference (farscry is local-only by design)
  • GUI app (CLI and MCP are the interface)
  • Plugin ecosystem (premature until core protocol is stable)

Full spike documentation for v0.2.0 features: github.com/teles-forge/farscry/blob/main/docs/projects/roadmap-v0.2.0.md