Quick Start

Install farscry
Use any one of the following:

npm install -g farscry
pip install farscry
brew install teles-forge/tap/farscry
cargo install farscry
curl -fsSL https://farscry.dev/install | sh

Wire up your agent
```
farscry setup
```
Detects Claude Code, Devin, Codex, and Aider. Shows MCP config for Claude Code, Cursor, Windsurf, and Zed. farscry never modifies your files.

Extract from a screenshot

farscry extract screenshot.png

Output:

=== farscry visual context ===
source: screenshot.png
screen_type: config
state_id: phash:3d9b1e7a...
confidence: high
lang: eng
agent_context: "Payment Settings - 3 editable fields, Save available"
---

[top-left]      heading  "Payment Settings"
[middle-left]   label    "Max Value:"
[middle-center] input    "1500"
                value="1500"
                enabled:true
[middle-right]  button   "Save Changes"
                enabled:true
[bottom-left]   error    "Value must be <= 10000"

affordances:
  click: "Save Changes"   enabled: true
  type:  input "Max Value"  current: "1500"
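
In the sample above, the lines before the `---` separator are plain `key: value` pairs, so a script can pick out one field before handing the rest to an agent. A minimal sketch, assuming the text layout stays as shown in the sample; `vasp_field` is a hypothetical helper, not part of farscry:

```shell
# vasp_field NAME (hypothetical helper): read farscry text output on stdin
# and print the value of one header field, i.e. a "key: value" line that
# appears before the first "---" separator.
vasp_field() {
  awk -v key="$1" '
    /^---$/ { exit }                     # header ends at the separator
    index($0, key ":") == 1 {            # line starts with "key:"
      val = substr($0, length(key) + 2)  # text after the colon
      sub(/^[ \t]+/, "", val)            # drop padding after the colon
      sub(/^"/, "", val)                 # drop a leading quote, if any
      sub(/"$/, "", val)                 # drop a trailing quote, if any
      print val
      exit
    }
  '
}
```

For example, `farscry extract screenshot.png | vasp_field agent_context` would print only the one-line summary.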

Diff two screenshots

farscry diff before.png after.png

Output:

=== farscry diff ===
state_id:   phash:3d9b1e7a...
delta_from: phash:8f4a2c3d...
context_similarity: 0.847
context_changed: true
---

appeared:  error    "Card declined"
changed:   button   "Submit" -> disabled
removed:   label    "spinner"
unchanged: [9 elements]

Token savings: ~312 tokens saved vs re-sending both images
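
The `context_changed` header makes it possible to skip an agent call when nothing meaningful moved between two screenshots. A small sketch, assuming the diff header keeps the layout shown above; `diff_changed` is a hypothetical helper name:

```shell
# diff_changed (hypothetical helper): read `farscry diff` text output on
# stdin and exit 0 only when the header reports "context_changed: true".
diff_changed() {
  awk '
    /^---$/ { exit }                                  # stop at end of header
    /^context_changed:[ \t]*true[ \t]*$/ { found = 1; exit }
    END { exit(found ? 0 : 1) }                       # status for the caller
  '
}
```

It can then gate a follow-up step, for example `farscry diff before.png after.png | diff_changed && echo "screen changed"`.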

Pipe to an agent

farscry extract screen.png | claude -p "fix this"
farscry extract --from-clipboard | claude -p "fix this"

farscry writes to stdout. Pipe anywhere.
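
In a plain pipe, the agent still runs when extraction fails and simply receives empty input. One way to guard against that is to capture the output first. This is a sketch only: `ask_agent` is a hypothetical wrapper, and it assumes farscry exits non-zero on failure, which this page does not state:

```shell
# ask_agent IMAGE PROMPT (hypothetical wrapper): run the extraction first
# and call the agent only when farscry succeeded and printed something.
ask_agent() {
  _ctx=$(farscry extract "$1") || {
    echo "farscry extract failed for $1" >&2
    return 1
  }
  if [ -z "$_ctx" ]; then
    echo "farscry produced no output for $1" >&2
    return 1
  fi
  printf '%s\n' "$_ctx" | claude -p "$2"
}
```

Usage: `ask_agent screen.png "fix this"`.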

Visual debugging with annotate

farscry annotate shows you exactly what farscry sees: the same screenshot, with a bounding box drawn over each detected element.

farscry annotate screenshot.png -o annotated.png
# or from clipboard:
farscry annotate --from-clipboard -o /tmp/out.png

Use this to:

- Verify farscry is detecting elements correctly before wiring your agent
- Debug agent failures: did farscry miss the button?
- Share annotated screenshots with your team

Add the fannot alias for a one-command workflow (this version assumes zsh and the macOS `open` command):

echo "alias fannot='farscry annotate --from-clipboard -o /tmp/farscry_annotated.png && open /tmp/farscry_annotated.png'" >> ~/.zshrc && source ~/.zshrc

Then: screenshot → fannot → the annotated image opens automatically.
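
The alias relies on `open`, which exists on macOS only. A shell function can pick the viewer per platform instead. A sketch, assuming `xdg-open` is available on Linux desktops; it reuses the name `fannot`, so define either the alias or the function, not both:

```shell
# fannot [OUTPUT] (sketch): annotate the clipboard image, then open the
# result with the platform's default viewer.
fannot() {
  _out=${1:-/tmp/farscry_annotated.png}
  farscry annotate --from-clipboard -o "$_out" || return 1
  case "$(uname -s)" in
    Darwin) open "$_out" ;;       # macOS
    *)      xdg-open "$_out" ;;   # assumption: Linux desktop with xdg-utils
  esac
}
```

Put the function in `~/.zshrc` or `~/.bashrc` in place of the alias.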

Zero-friction workflow

The fastest way to use farscry. One command, every time.

farscry setup

Detects Claude Code, Devin, Codex, and Aider. Shows the alias to add and the MCP config to paste. Offers to create ~/.farscry/smart-paste.sh and to show key-binding instructions for your terminal.

Then add the short alias:

echo "alias fp='farscry paste'" >> ~/.zshrc && source ~/.zshrc

Now: screenshot → fp → done.

Smart paste: Cmd+V auto-detects images

After running farscry setup, answer y to “Configure smart paste?” to create the script and see instructions for your terminal.

The script (~/.farscry/smart-paste.sh) checks whether the clipboard contains an image:

- Image in clipboard → runs farscry paste → sends to your agent
- Text in clipboard → falls back to normal paste (pbpaste / xclip / Get-Clipboard)
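
The branching described above can be sketched as follows. This is not the script that `farscry setup` generates, which may differ. The function names are hypothetical, and the detection commands (`osascript -e 'clipboard info'` on macOS, `xclip -t TARGETS` on Linux) are one possible way to list clipboard types:

```shell
# Hypothetical sketch of the smart-paste logic; the generated
# ~/.farscry/smart-paste.sh may work differently.

# clipboard_targets: list the data types currently on the clipboard.
clipboard_targets() {
  case "$(uname -s)" in
    Darwin) osascript -e 'clipboard info' ;;
    *)      xclip -selection clipboard -t TARGETS -o ;;
  esac
}

# has_image: succeed if the type listing on stdin mentions an image type.
has_image() {
  grep -Eq 'image/|PNGf|TIFF|JPEG'
}

# plain_paste: print the clipboard text, as a normal paste would.
plain_paste() {
  case "$(uname -s)" in
    Darwin) pbpaste ;;
    *)      xclip -selection clipboard -o ;;
  esac
}

# smart_paste: image on the clipboard -> farscry paste, otherwise plain text.
smart_paste() {
  if clipboard_targets 2>/dev/null | has_image; then
    farscry paste
  else
    plain_paste
  fi
}
```

Keeping detection in its own function makes it easy to swap in another clipboard tool without touching the branch itself.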

macOS (iTerm2):

Preferences → Keys → Key Bindings → +
Shortcut: Cmd+V
Action: Run Command
Command: ~/.farscry/smart-paste.sh

macOS (Warp):

Settings → Features → Custom Key Bindings
Key: Cmd+V
Action: Run Command: ~/.farscry/smart-paste.sh

macOS (Terminal.app): Not supported natively. Use fp instead.

Linux (GNOME Terminal, Bash): add to ~/.bashrc:

bind -x '"\C-v": ~/.farscry/smart-paste.sh'

Linux (Kitty) (~/.config/kitty/kitty.conf):

map ctrl+v launch --stdin-source=@last_cmd_output ~/.farscry/smart-paste.sh

Windows Terminal:

Settings → Actions → Add new
Command: wt.exe new-tab powershell -Command ~/.farscry/smart-paste.ps1
Keys: ctrl+v

Result: take a screenshot with any tool → press Cmd+V (Ctrl+V on Linux and Windows) in your terminal → farscry detects the image and sends it to your agent. No command to type.

Agent integrations

Claude Code

farscry extract screen.png | claude -p "fix this"
farscry extract --from-clipboard | claude -p "fix this"

Devin

devin -p "$(farscry extract screen.png): fix this"
devin -p "$(farscry extract --from-clipboard): fix this"

Codex

farscry extract screen.png | codex exec "fix this:"
farscry extract --from-clipboard | codex exec "fix this:"

MCP (all agents, recommended)

farscry serve --mcp

Supports multiple images via the `image_paths` parameter.

Supported image formats

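PNG, JPEG, GIF, WEBP, and TIFF. Input can come from the clipboard, a file, or stdin. On macOS, an image reaches the clipboard via Cmd+Ctrl+Shift+4 (plain Cmd+Shift+4 saves to a file by default), Shottr, or Cmd+C on an image file in Finder.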

Common flags

| Flag | Description |
| --- | --- |
| `--json` | JSON output instead of VASP text |
| `--affordances` | Show only interactive elements |
| `--context` | One-line `agent_context` summary |
| `--lang por` | Explicit language (default: auto-detect) |
| `-v` | Verbose, show processing steps |

Known limitations in v0.1.0

| Scenario | Status | Notes |
| --- | --- | --- |
| Text-heavy UIs (terminal, config, forms) | Works well | Core use case |
| Icon-only toolbars | Partial | Buttons without text labels are missed |
| Charts, graphs, images | Not supported | OCR extracts no structured data |
| `--from-clipboard` on Linux | Requires `xclip` | `apt install xclip` |
| Windows | Untested in v0.1.0 | Binary ships, not CI-validated |
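
Because `--from-clipboard` needs `xclip` on Linux, a script can check for it up front. A small sketch; `clipboard_ready` is a hypothetical helper, and the install hint assumes a Debian-style system:

```shell
# clipboard_ready [OS] (hypothetical helper): succeed when the clipboard
# prerequisite is met. OS defaults to `uname -s`; only Linux needs xclip.
clipboard_ready() {
  _os=${1:-$(uname -s)}
  if [ "$_os" = "Linux" ] && ! command -v xclip >/dev/null 2>&1; then
    echo "xclip not found; install it first (e.g. apt install xclip)" >&2
    return 1
  fi
  return 0
}
```

Usage: `clipboard_ready && farscry extract --from-clipboard`.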

Next steps

- CLI Reference: extract
- CLI Reference: diff
- MCP Server: keep OCR warm and integrate with MCP-compatible agents
- VASP Format: the output schema