Roadmap
v0.1.0 - Released
The foundation. Local OCR pipeline, typed VASP output, MCP server, smart paste.
| Feature | Status |
|---|---|
| `farscry extract` - screenshot to VASP text | Released |
| `farscry diff` - semantic delta between two screenshots | Released |
| `farscry serve --mcp` - 38ms warm daemon | Released |
| `farscry setup` - agent config + smart paste | Released |
| npm, pip, Homebrew, crates.io distribution | Released |
| VASP 1.0-draft open RFC | Released |
v0.2.0 - In planning
Four targeted features. Each ships independently.
Multi-language OCR
`farscry install-lang por` currently returns an error. v0.2.0 makes it work.
PP-OCRv5 has per-language ONNX recognition models. v0.2.0 downloads, verifies, and loads them on demand.
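How the verification step works is not specified here. One common approach, assumed for this sketch, is to compare the downloaded model file against a published SHA-256 digest before loading it. The function names and the idea of a per-model expected digest are hypothetical and are not farscry's actual implementation.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large model files are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model(path: Path, expected_sha256: str) -> bool:
    """Return True only if the file's digest matches the expected one.

    Assumption: the expected digest comes from some trusted manifest;
    where farscry would publish it is not stated in this roadmap.
    """
    return sha256_of(path) == expected_sha256.lower()
```

A loader built this way would refuse to load a language model whose digest does not match, which catches truncated or corrupted downloads.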

    farscry install-lang por   # Portuguese
    farscry install-lang deu   # German
    farscry install-lang jpn   # Japanese

    farscry extract screen.png --lang por
    farscry extract screen.png --lang eng+por

farscry annotate
Takes a screenshot and returns the same image with bounding boxes drawn over detected elements, labels, and element types.

    farscry annotate screen.png -o annotated.png

Each element type gets a distinct color. Affordances (clickable, typeable) are highlighted differently from labels and headings. The output image is shareable and self-documenting.
This is primarily a debugging and demo tool. When you can see the boxes, you can verify the coordinates are correct before sending them to your agent.
Windows clipboard
`farscry extract --from-clipboard` is not implemented on Windows.
v0.2.0 completes the platform story.
VASP adapters
Tools that convert other formats to VASP without requiring farscry’s OCR pipeline.
For teams already using Claude computer-use, Playwright, or OpenAI vision: they get VASP output without changing their extraction pipeline.

    farscry convert --from claude-computer-use --input result.json
    farscry convert --from playwright-a11y --input snapshot.json
    farscry convert --from openai-vision --input response.json

This is the protocol adoption path. Other tools join VASP without rewriting their extraction layer.
v0.3.0 - Planned
farscry watch
Monitors a screen region continuously. Emits a VASP diff each time something changes. No polling required from the agent.

    farscry watch --region 0,0,1920,1080
    # streams VASP diffs to stdout as UI state changes

Loop detection in daemon
The daemon tracks `state_id` history. If the same state appears twice, `context_changed: false` is emitted and the agent is notified that it may be in a loop.
Useful for automation that gets stuck repeating the same action without effect.
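The idea can be sketched as a small history tracker. This is an illustration only: the bounded history window, the class name, and the `possible_loop` field are assumptions made for the example, and only `state_id` and `context_changed` come from the description above.

```python
from collections import deque


class LoopDetector:
    """Remembers recent state_ids and flags any state seen before."""

    def __init__(self, history_size: int = 32):
        # Hypothetical bound: the oldest state_ids fall out of the window.
        self._history = deque(maxlen=history_size)

    def observe(self, state_id: str) -> dict:
        """Record a state and report whether it repeats an earlier one."""
        seen_before = state_id in self._history
        self._history.append(state_id)
        return {
            "state_id": state_id,
            "context_changed": not seen_before,
            "possible_loop": seen_before,  # hypothetical notification flag
        }
```

An agent loop could call `observe()` after every action and back off or change strategy when `context_changed` comes back false.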
SDK native clients
The npm and pip SDKs currently wrap the CLI binary via subprocess. v0.3.0 turns them into proper async clients that connect directly to the daemon socket.
Lower latency. No subprocess overhead. Persistent connection.
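To make the difference concrete, here is a rough sketch of what a persistent async client might look like. Everything about the wire format is an assumption: the roadmap does not say whether the daemon listens on a Unix socket, nor what messages it accepts. The newline-delimited JSON framing, the `method` and `params` fields, and the `DaemonClient` class are all hypothetical.

```python
import asyncio
import json


class DaemonClient:
    """Keeps one connection open and reuses it for every request."""

    def __init__(self, socket_path: str):
        self._socket_path = socket_path
        self._reader = None
        self._writer = None

    async def connect(self) -> None:
        # Assumption: the daemon exposes a Unix domain socket.
        self._reader, self._writer = await asyncio.open_unix_connection(
            self._socket_path
        )

    async def request(self, method: str, params: dict) -> dict:
        # Assumption: one JSON object per line in each direction.
        payload = json.dumps({"method": method, "params": params}) + "\n"
        self._writer.write(payload.encode())
        await self._writer.drain()
        line = await self._reader.readline()
        return json.loads(line)

    async def close(self) -> None:
        self._writer.close()
        await self._writer.wait_closed()
```

Because the connection stays open between calls, each request costs one socket round trip rather than a process spawn, which is the latency gain described above.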
v1.0.0 - Future
- VASP validator: verifies any VASP output against the spec schema
- VASP stream: Server-Sent Events endpoint for real-time state monitoring
- Third-party implementations: guide + badge + registry for VASP-compatible tools
What is NOT on the roadmap
- Cloud inference (farscry is local-only by design)
- GUI app (CLI and MCP are the interface)
- Plugin ecosystem (premature until core protocol is stable)
Full spike documentation for v0.2.0 features: github.com/teles-forge/farscry/blob/main/docs/projects/roadmap-v0.2.0.md