Quick Start

  1. Install farscry

    npm install -g farscry
  2. Wire up your agent

    farscry setup

    Detects Claude Code, Devin, Codex, and Aider, and prints the MCP config for Claude Code, Cursor, Windsurf, and Zed so you can paste it yourself. farscry never modifies your files.

  3. Extract from a screenshot

    farscry extract screenshot.png

    Output:

    === farscry visual context ===
    source: screenshot.png
    screen_type: config
    state_id: phash:3d9b1e7a...
    confidence: high
    lang: eng
    agent_context: "Payment Settings - 3 editable fields, Save available"
    ---
    [top-left] heading "Payment Settings"
    [middle-left] label "Max Value:"
    [middle-center] input "1500"
    value="1500"
    enabled:true
    [middle-right] button "Save Changes"
    enabled:true
    [bottom-left] error "Value must be <= 10000"
    affordances:
    click: "Save Changes" enabled: true
    type: input "Max Value" current: "1500"
  4. Diff two screenshots

    farscry diff before.png after.png

    Output:

    === farscry diff ===
    state_id: phash:3d9b1e7a...
    delta_from: phash:8f4a2c3d...
    context_similarity: 0.847
    context_changed: true
    ---
    appeared: error "Card declined"
    changed: button "Submit" -> disabled
    removed: label "spinner"
    unchanged: [9 elements]
    Token savings: ~312 tokens saved vs re-sending both images
  5. Pipe to an agent

    farscry extract screen.png | claude -p "fix this"
    farscry extract --from-clipboard | claude -p "fix this"

    farscry writes to stdout. Pipe anywhere.
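
Because the text output opens with key: value header lines followed by a --- separator, a single field can be pulled out with standard tools. The following is a minimal sketch that assumes the header layout in the sample outputs above holds in general; farscry_field is a hypothetical helper, not part of farscry.

```shell
# Print the value of one header field from farscry text output on stdin.
# Assumes header lines look like "key: value" and end at the first "---" line.
farscry_field() {
  awk -v key="$1" '
    # Stop at the separator so element lines are never matched.
    /^[[:space:]]*---[[:space:]]*$/ { exit }
    {
      line = $0
      sub(/^[[:space:]]+/, "", line)        # drop leading indentation
      if (index(line, key ": ") == 1) {     # line starts with "key: "
        print substr(line, length(key) + 3) # everything after "key: "
        exit
      }
    }'
}
```

For example, farscry diff before.png after.png | farscry_field context_changed would print true for the sample diff above, which is enough to decide whether the agent needs a fresh extract. Quoted values such as agent_context come back with their quotes intact.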

farscry annotate shows you exactly what farscry sees: the same screenshot with bounding boxes drawn over each detected element.

farscry annotate screenshot.png -o annotated.png
# or from clipboard:
farscry annotate --from-clipboard -o /tmp/out.png

Use this to:

  • Verify farscry is detecting elements correctly before wiring your agent
  • Debug agent failures: did farscry miss the button?
  • Share annotated screenshots with your team

Add the fannot alias for a one-command workflow:

echo "alias fannot='farscry annotate --from-clipboard -o /tmp/farscry_annotated.png && open /tmp/farscry_annotated.png'" >> ~/.zshrc && source ~/.zshrc

Then: screenshot → fannot → the annotated image opens automatically.
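
The alias relies on open, which is the macOS launcher; on Linux the usual equivalent is xdg-open. A portable variant can be written as a shell function. This is only a sketch: the function body and the xdg-open fallback are assumptions, and only the farscry annotate flags come from the commands above.

```shell
# Hypothetical replacement for the fannot alias; place it in ~/.zshrc or ~/.bashrc.
fannot() {
  # Output path: first argument, or the same default the alias uses.
  _fannot_out="${1:-/tmp/farscry_annotated.png}"

  # Stop if annotation fails (for example, no image in the clipboard).
  farscry annotate --from-clipboard -o "$_fannot_out" || return 1

  # Prefer xdg-open (Linux), fall back to open (macOS), else just report the path.
  if command -v xdg-open >/dev/null 2>&1; then
    xdg-open "$_fannot_out"
  elif command -v open >/dev/null 2>&1; then
    open "$_fannot_out"
  else
    echo "annotated image written to $_fannot_out"
  fi
}
```

If the fannot alias is already defined in the current shell, run unalias fannot before defining the function, otherwise the shell expands the alias inside the definition.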

The fastest way to use farscry. One command, every time.

farscry setup

Detects claude, devin, codex, and aider. Shows the alias to add and the MCP config to paste, then offers to create ~/.farscry/smart-paste.sh and show key binding instructions for your terminal.

Then add the short alias:

echo "alias fp='farscry paste'" >> ~/.zshrc && source ~/.zshrc

Now: screenshot → fp → done.

After running farscry setup, answer y to “Configure smart paste?” to create the script and see instructions for your terminal.

The script (~/.farscry/smart-paste.sh) checks whether the clipboard contains an image:

  • Image in clipboard → runs farscry paste → sends to your agent
  • Text in clipboard → falls back to normal paste (pbpaste / xclip / Get-Clipboard)
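
The branching described above can be sketched as follows. This is not the script farscry generates, only an illustration of the idea: the function names are hypothetical, and the clipboard type names being matched are assumptions about what osascript and xclip typically report.

```shell
# Hypothetical sketch of the smart-paste logic; the generated script may differ.

# List the data types currently on the clipboard.
clipboard_types() {
  if command -v osascript >/dev/null 2>&1; then
    osascript -e 'clipboard info' 2>/dev/null              # macOS
  elif command -v xclip >/dev/null 2>&1; then
    xclip -selection clipboard -t TARGETS -o 2>/dev/null   # Linux (X11)
  fi
}

# Succeed if the type listing on stdin mentions an image format.
has_image_type() {
  grep -Eq 'image/(png|jpeg|gif|webp|tiff)|PNGf|(TIFF|JPEG|GIF) picture'
}

smart_paste() {
  if clipboard_types | has_image_type; then
    farscry paste                    # image: hand it to farscry
  elif command -v pbpaste >/dev/null 2>&1; then
    pbpaste                          # text on macOS
  else
    xclip -selection clipboard -o    # text on Linux
  fi
}
```

Calling smart_paste as the last line of a script gives the behaviour in the two bullets. Keeping the type check in its own function makes it easy to try against a saved type listing without touching the real clipboard.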

macOS (iTerm2):

Preferences → Keys → Key Bindings → +
Shortcut: Cmd+V
Action: Run Command
Command: ~/.farscry/smart-paste.sh

macOS (Warp):

Settings → Features → Custom Key Bindings
Key: Cmd+V
Action: Run Command: ~/.farscry/smart-paste.sh

macOS (Terminal.app): Not supported natively. Use fp instead.

Linux (Gnome Terminal): Add to ~/.bashrc:

bind -x '"\C-v": ~/.farscry/smart-paste.sh'

Linux (Kitty) (~/.config/kitty/kitty.conf):

map ctrl+v launch --stdin-source=@last_cmd_output ~/.farscry/smart-paste.sh

Windows Terminal:

Settings → Actions → Add new
Command: wt.exe new-tab powershell -Command ~/.farscry/smart-paste.ps1
Keys: ctrl+v

Result: take a screenshot with any tool, press Cmd+V (Ctrl+V on Linux and Windows) in your terminal, and farscry detects the image and sends it to your agent. No command to type.

Claude Code:

farscry extract screen.png | claude -p "fix this"
farscry extract --from-clipboard | claude -p "fix this"

Devin:

devin -p "$(farscry extract screen.png): fix this"
devin -p "$(farscry extract --from-clipboard): fix this"

Codex:

farscry extract screen.png | codex exec "fix this:"
farscry extract --from-clipboard | codex exec "fix this:"

MCP server:

farscry serve --mcp

The MCP server accepts multiple images through the image_paths parameter.

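Supported image formats: PNG, JPEG, GIF, WEBP, and TIFF. Input can come from the clipboard, a file, or stdin. To put an image on the clipboard on macOS, use Cmd+Ctrl+Shift+4 (adding Ctrl copies the capture to the clipboard instead of saving a file), Shottr, or Cmd+C on an image file in Finder.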

Flag             Description
--json           JSON output instead of VASP text
--affordances    Show only interactive elements
--context        One-line agent_context summary
--lang por       Explicit language (default: auto-detect)
-v               Verbose, show processing steps

Scenario                                    Status               Notes
Text-heavy UIs (terminal, config, forms)    Works well           Core use case
Icon-only toolbars                          Partial              Buttons without text labels are missed
Charts, graphs, images                      Not supported        OCR extracts no structured data
--from-clipboard on Linux                   Requires xclip       apt install xclip
Windows                                     Untested in v0.1.0   Binary ships, not CI-validated
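
The xclip requirement noted above can be checked before calling --from-clipboard. A small sketch: clipboard_preflight is a hypothetical helper, and its optional OS argument exists only so the check is easy to try on any machine.

```shell
# Hypothetical preflight check for --from-clipboard.
# Usage: clipboard_preflight [os-name]   (defaults to the output of uname -s)
clipboard_preflight() {
  _os="${1:-$(uname -s)}"
  # On Linux, farscry needs xclip to read the clipboard.
  if [ "$_os" = "Linux" ] && ! command -v xclip >/dev/null 2>&1; then
    echo "xclip not found; install it with: apt install xclip" >&2
    return 1
  fi
  echo "ok"
}
```

A guarded call then looks like clipboard_preflight >/dev/null && farscry extract --from-clipboard.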