image-to-text

Extract text and structured data from images using Vision AI (OCR). Use when: reading text from screenshots, extracting data from scanned documents, converting images of tables/forms/charts to structured text.

#ocr #image-to-text #vision-ai #text-extraction #document
terminal-skills v1.0.0
Works with: claude-code, openai-codex, gemini-cli, cursor
Usage

$
✓ Installed image-to-text v1.0.0

Getting Started

  1. Install the skill using the command above
  2. Open your AI coding agent (Claude Code, Codex, Gemini CLI, or Cursor)
  3. Reference the skill in your prompt
  4. The AI will use the skill's capabilities automatically

Information

Version: 1.0.0
Author: terminal-skills
Category: Data & AI
License: MIT

Documentation

Overview

Extract all readable text from an image using OCR (Tesseract). The skill returns the full text content along with word- and line-level bounding boxes and confidence scores. Typical uses:

  • Reading text content from a screenshot or design mockup
  • Extracting UI copy (labels, buttons, headings) so you don't have to retype it
  • Getting text positions and bounding boxes from a design image

Instructions

  1. The script passes the image to Tesseract.js for optical character recognition
  2. Tesseract segments the image into lines and words
  3. The script returns the full text plus word- and line-level details (position, confidence)

Run the extraction script:

bash
bash <skill-path>/scripts/image-to-text.sh <image-path> [language]

Arguments:

  • image-path — Path to the image file (required)
  • language — OCR language code (optional, defaults to eng). Common: eng, fra, deu, spa, chi_sim, jpn
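The invocation above can also be assembled programmatically. The following is a minimal sketch in Python, assuming the script lives at `scripts/image-to-text.sh` under the skill directory as shown; the helper names `build_command` and `run_extraction` are hypothetical and not part of the skill.

```python
import subprocess
from pathlib import Path


def build_command(skill_path, image_path, language="eng"):
    """Build the argv list for the extraction script (does not run it)."""
    script = Path(skill_path) / "scripts" / "image-to-text.sh"
    return ["bash", str(script), str(image_path), language]


def run_extraction(skill_path, image_path, language="eng"):
    """Run the script and return its stdout; raises if the script exits non-zero."""
    result = subprocess.run(
        build_command(skill_path, image_path, language),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Only builds and prints the command; the skill path here is a placeholder.
print(build_command("/path/to/skill", "invoice-scan.png", "fra"))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when an image path contains spaces.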

The script outputs JSON with extracted text and metadata:

json
{
  "text": "Request work\nSuggestions\nPlumbing\nHVAC\nCleaning\nElectrical",
  "confidence": 87.4,
  "words": [
    {
      "text": "Request",
      "confidence": 94.2,
      "bbox": { "x0": 142, "y0": 180, "x1": 268, "y1": 204 }
    }
  ],
  "lines": [
    {
      "text": "Request work",
      "confidence": 95.1,
      "bbox": { "x0": 142, "y0": 180, "x1": 332, "y1": 204 }
    }
  ]
}
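
A short sketch of how this JSON might be consumed, written in Python and using the sample output above as input. The 80.0 confidence threshold and the helper names are assumptions for illustration, and the bounding box is assumed to be in image pixel coordinates.

```python
import json

# Sample output copied from the JSON shown above.
RAW = r"""
{
  "text": "Request work\nSuggestions\nPlumbing\nHVAC\nCleaning\nElectrical",
  "confidence": 87.4,
  "words": [
    {"text": "Request", "confidence": 94.2,
     "bbox": {"x0": 142, "y0": 180, "x1": 268, "y1": 204}}
  ],
  "lines": [
    {"text": "Request work", "confidence": 95.1,
     "bbox": {"x0": 142, "y0": 180, "x1": 332, "y1": 204}}
  ]
}
"""


def low_confidence_words(result, threshold=80.0):
    """Return words whose confidence is below the (assumed) threshold."""
    return [w for w in result["words"] if w["confidence"] < threshold]


def bbox_size(bbox):
    """Width and height computed from the corner coordinates."""
    return bbox["x1"] - bbox["x0"], bbox["y1"] - bbox["y0"]


result = json.loads(RAW)
first = result["words"][0]
print(first["text"], bbox_size(first["bbox"]))  # Request (126, 24)
print(low_confidence_words(result))             # []
```

Words flagged by `low_confidence_words` are good candidates for a manual check before the text is reused.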

After extracting the text, present it grouped by line. When implementing UI copy from a design, use the extracted text directly rather than retyping it.

Examples

Example 1: Extract text from a mobile app screenshot

bash
bash <skill-path>/scripts/image-to-text.sh ./screenshot.png

Output:

Extracted text (87.4% confidence):

  Request work
  Suggestions
  Plumbing
  HVAC
  Cleaning
  Electrical

Found 6 lines, 7 words.
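
A summary in this shape could be produced from the script's JSON along the following lines. This is a sketch, not the skill's own formatter: `format_summary` is a hypothetical helper, and it counts words by splitting each line on whitespace, so "Request work" counts as two words.

```python
def format_summary(result):
    """Render the OCR result as a short, line-grouped summary."""
    lines = [ln for ln in result["text"].splitlines() if ln.strip()]
    word_count = sum(len(ln.split()) for ln in lines)
    body = "\n".join(f"  {ln}" for ln in lines)
    return (
        f"Extracted text ({result['confidence']}% confidence):\n\n"
        f"{body}\n\n"
        f"Found {len(lines)} lines, {word_count} words."
    )


# Text and confidence taken from the sample JSON output.
sample = {
    "text": "Request work\nSuggestions\nPlumbing\nHVAC\nCleaning\nElectrical",
    "confidence": 87.4,
}
print(format_summary(sample))
```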

Example 2: Extract French text from a scanned invoice

bash
bash <skill-path>/scripts/image-to-text.sh ./invoice-scan.png fra

Tesseract uses the French language model to correctly recognize accented characters and French-specific formatting. The extracted text can then be parsed for invoice fields like total, date, and line items.
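
As an illustration of that parsing step, here is a minimal Python sketch. The sample text, the field labels ("Date", "Total TTC"), and the day/month/year date order are assumptions about a typical French invoice, not output from the skill; real invoices will need patterns tuned to their layout, and line items are not handled here.

```python
import re

# Hypothetical OCR text for illustration only.
SAMPLE = "Facture N° 2024-017\nDate : 12/03/2024\nTotal TTC : 1 234,56 €\n"

DATE_RE = re.compile(r"Date\s*:\s*(\d{2})/(\d{2})/(\d{4})")
# Amount with optional space-like thousands separators and a decimal comma.
TOTAL_RE = re.compile(r"Total(?:\s+TTC)?\s*:\s*([\d \u00a0\u202f]+,\d{2})")


def parse_invoice(text):
    """Extract the date (as ISO 8601) and total (as float) when present."""
    fields = {}
    match = DATE_RE.search(text)
    if match:
        day, month, year = match.groups()
        fields["date"] = f"{year}-{month}-{day}"
    match = TOTAL_RE.search(text)
    if match:
        # Drop thousands separators, then swap the decimal comma for a point.
        digits = re.sub(r"\s", "", match.group(1)).replace(",", ".")
        fields["total"] = float(digits)
    return fields


print(parse_invoice(SAMPLE))  # {'date': '2024-03-12', 'total': 1234.56}
```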

Guidelines

  • Tesseract works best with clean, high-contrast text. Screenshots of rendered UI work well. Photos of text at angles or with noise may produce poor results.
  • Pass the correct language code as the second argument when processing non-English text. Tesseract needs the right language model to recognize characters.
  • The first run is slow because Tesseract downloads language data (~4 MB for English). Subsequent runs are faster.
  • For structured documents (tables, forms), post-process the extracted text to parse it into JSON or CSV format.
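
For the last guideline, one possible post-processing approach is to rebuild rows from the word bounding boxes and write them out as CSV. The Python sketch below assumes each word is one table cell and that words on the same row have `y0` values within a fixed tolerance; the sample words, the tolerance of 10, and the function names are all hypothetical.

```python
import csv
import io

# Hypothetical words in the same shape as the script's "words" array.
WORDS = [
    {"text": "Item", "bbox": {"x0": 10, "y0": 20, "x1": 60, "y1": 40}},
    {"text": "Price", "bbox": {"x0": 200, "y0": 22, "x1": 260, "y1": 40}},
    {"text": "12.50", "bbox": {"x0": 200, "y0": 61, "x1": 260, "y1": 80}},
    {"text": "Filter", "bbox": {"x0": 10, "y0": 60, "x1": 70, "y1": 80}},
]


def words_to_rows(words, row_tolerance=10):
    """Group words into rows by vertical position, ordered left to right."""
    rows = []
    for word in sorted(words, key=lambda w: w["bbox"]["y0"]):
        y0 = word["bbox"]["y0"]
        # Same row if the word's top edge is close to the row's first word.
        if rows and abs(y0 - rows[-1][0]["bbox"]["y0"]) <= row_tolerance:
            rows[-1].append(word)
        else:
            rows.append([word])
    return [
        [w["text"] for w in sorted(row, key=lambda w: w["bbox"]["x0"])]
        for row in rows
    ]


def rows_to_csv(rows):
    """Serialize rows of cell strings as CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


print(rows_to_csv(words_to_rows(WORDS)))
```

Cells that contain several words would need an extra step, such as merging neighbouring words whose horizontal gap is small.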