firecrawl
Convert any website into clean, structured data with Firecrawl — API-first web scraping service. Use when someone asks to "turn a website into markdown", "scrape website for LLM", "Firecrawl", "extract website content as clean text", "crawl and convert to structured data", or "scrape website for RAG". Covers single-page scraping, full-site crawling, structured extraction, and LLM-ready output.
Usage
Getting Started
- Install the skill
- Open your AI coding agent (Claude Code, Codex, Gemini CLI, or Cursor)
- Reference the skill in your prompt
- The AI will use the skill's capabilities automatically
Example Prompts
- "Turn this website into markdown"
- "Scrape this website for RAG and convert it to structured data"
Documentation
Overview
Firecrawl is an API that scrapes websites and returns clean, LLM-ready content. Point it at any URL and get back markdown, HTML, or structured data — no selectors to write, no anti-bot handling, no browser management. It handles JavaScript rendering, proxy rotation, and content extraction automatically. Built for feeding web content into LLMs, RAG pipelines, and data workflows.
When to Use
- Extracting website content for RAG (Retrieval-Augmented Generation)
- Converting web pages to clean markdown for LLM consumption
- Crawling entire sites and getting structured content
- Scraping without managing browsers, proxies, or anti-bot measures
- Extracting structured data (products, articles) with LLM-powered extraction
Instructions
Setup
npm install @mendable/firecrawl-js
# Or Python: pip install firecrawl-py
# Self-hosted: docker run -p 3002:3002 mendableai/firecrawl
Single Page Scrape
// scrape.ts — Convert any URL to clean markdown
import FirecrawlApp from "@mendable/firecrawl-js";

const firecrawl = new FirecrawlApp({
  apiKey: process.env.FIRECRAWL_API_KEY,
  // apiUrl: "http://localhost:3002" // For self-hosted
});

// Scrape a single page
const result = await firecrawl.scrapeUrl("https://docs.example.com/getting-started", {
  formats: ["markdown", "html"], // Get both formats
});
// The response is either a success or an error object; check before reading fields
if (!result.success) {
  throw new Error(`Scrape failed: ${result.error}`);
}
console.log(result.markdown); // Clean markdown content
console.log(result.metadata); // Title, description, language, etc.
Full Site Crawl
// crawl.ts — Crawl an entire site
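import FirecrawlApp from "@mendable/firecrawl-js";

const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });

const crawlResult = await firecrawl.crawlUrl("https://docs.example.com", {
  limit: 100, // Max pages to crawl
  scrapeOptions: {
    formats: ["markdown"],
  },
});
if (!crawlResult.success) {
  throw new Error(`Crawl failed: ${crawlResult.error}`);
}

// Process all pages; title and markdown can be missing on some pages
for (const page of crawlResult.data) {
  const title = page.metadata?.title ?? "(untitled)";
  console.log(`${title}: ${page.markdown?.length ?? 0} chars`);
  // Feed into your RAG pipeline, vector DB, etc.
}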
Structured Data Extraction
// extract.ts — Extract structured data using LLM
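import FirecrawlApp from "@mendable/firecrawl-js";
import { z } from "zod";

const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });

// Schema describing the fields to extract from the page
const ProductSchema = z.object({
  name: z.string(),
  price: z.number(),
  currency: z.string(),
  rating: z.number().optional(),
  inStock: z.boolean(),
  features: z.array(z.string()),
});

const result = await firecrawl.scrapeUrl("https://shop.example.com/product/123", {
  formats: ["extract"],
  extract: {
    schema: ProductSchema,
  },
});
if (!result.success) {
  throw new Error(`Extraction failed: ${result.error}`);
}
console.log(result.extract);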
// { name: "Widget Pro", price: 49.99, currency: "USD", rating: 4.5, inStock: true, features: [...] }
Build a RAG Knowledge Base
// rag-ingest.ts — Crawl docs site and ingest into vector DB
import FirecrawlApp from "@mendable/firecrawl-js";
import { ChromaClient } from "chromadb";

const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
const chroma = new ChromaClient();
const collection = await chroma.getOrCreateCollection({ name: "docs" });

// Crawl documentation site
const crawl = await firecrawl.crawlUrl("https://docs.myproduct.com", {
  limit: 500,
  scrapeOptions: { formats: ["markdown"] },
});
if (!crawl.success) {
  throw new Error(`Crawl failed: ${crawl.error}`);
}

// Chunk and store in vector DB
for (const page of crawl.data) {
  const source = page.metadata?.sourceURL;
  // Skip pages with no content or no URL to build stable chunk ids from
  if (!page.markdown || !source) continue;
  const chunks = splitIntoChunks(page.markdown, 1000); // 1000 char chunks
  await collection.add({
    ids: chunks.map((_, i) => `${source}-chunk-${i}`),
    documents: chunks,
    metadatas: chunks.map(() => ({
      source,
      title: page.metadata?.title ?? "",
    })),
  });
}

// Split text into fixed-size chunks (no overlap)
function splitIntoChunks(text: string, size: number): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += size) {
    chunks.push(text.slice(i, i + size));
  }
  return chunks;
}
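The splitIntoChunks helper above cuts at fixed offsets, so a sentence that straddles a boundary is split between two chunks. A minimal sketch of an overlapping variant follows. The name splitWithOverlap and its parameters are hypothetical, not part of Firecrawl or Chroma, and suitable size and overlap values depend on your content.

```typescript
// Hypothetical helper: fixed-size chunks where each chunk repeats the
// last `overlap` characters of the previous one.
function splitWithOverlap(text: string, size: number, overlap: number): string[] {
  if (size <= 0) throw new RangeError("size must be positive");
  if (overlap < 0 || overlap >= size) {
    throw new RangeError("overlap must be in the range [0, size)");
  }
  const chunks: string[] = [];
  const step = size - overlap; // How far the window advances each iteration
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    // Stop once a chunk has reached the end of the text, so no
    // trailing chunk made only of already-covered characters is emitted.
    if (start + size >= text.length) break;
  }
  return chunks;
}

console.log(splitWithOverlap("abcdefghij", 4, 2));
// prints [ 'abcd', 'cdef', 'efgh', 'ghij' ]
```

It could replace the splitIntoChunks call in the ingest loop, for example with a size of 1000 and an overlap of 100.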
Examples
Example 1: Build a docs chatbot
User prompt: "I want a chatbot that answers questions about my product documentation."
The agent will use Firecrawl to crawl the docs site, convert to markdown, chunk the content, store in a vector database, and build a RAG query pipeline.
Example 2: Monitor competitor content changes
User prompt: "Track when our competitor updates their pricing page."
The agent will schedule periodic Firecrawl scrapes, compare markdown diffs between runs, and alert on significant changes.
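One way to detect a change between runs is to fingerprint the scraped markdown and compare it with the fingerprint from the previous run. The sketch below assumes the markdown has already been fetched (for instance with scrapeUrl). The helpers fingerprint and hasChanged are hypothetical names, and the in-memory Map stands in for whatever persistent store a real monitor would use.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: SHA-256 fingerprint of a page's markdown.
// Whitespace runs are collapsed so formatting-only differences are ignored.
function fingerprint(markdown: string): string {
  const normalized = markdown.replace(/\s+/g, " ").trim();
  return createHash("sha256").update(normalized).digest("hex");
}

// Last fingerprint seen per URL. In-memory only (an assumption for this
// sketch); persist it to disk or a database to survive restarts.
const lastSeen = new Map<string, string>();

// Records the new fingerprint and returns true when the content differs
// from the previous run. A URL seen for the first time counts as changed.
function hasChanged(url: string, markdown: string): boolean {
  const current = fingerprint(markdown);
  const previous = lastSeen.get(url);
  lastSeen.set(url, current);
  return previous !== current;
}
```

A scheduled job would call hasChanged after each scrape and only compute a full diff or send an alert when it returns true. The same check can be used to skip re-embedding pages whose content is unchanged.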
Guidelines
- scrapeUrl for single pages: fast, returns markdown + metadata
- crawlUrl for entire sites: follows links, respects limits
- Markdown is the best LLM format: cleaner than HTML, preserves structure
- Structured extraction for data: use Zod/JSON schema to extract typed data
- Self-host for privacy: docker run mendableai/firecrawl for sensitive data
- Rate limits on cloud API: 500 pages/min on free tier
- Chunk markdown for RAG: 500-1500 char chunks with overlap work best
- Cache results: don't re-scrape unchanged pages
- formats array: request only what you need (markdown, html, extract)
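Because the cloud API is rate limited, a bulk job may need to slow down and retry when requests are rejected. The following is a generic sketch and not part of the Firecrawl SDK: withBackoff is a hypothetical helper, and it assumes the wrapped call rejects when a request is throttled. It retries on any error, so a real version would likely inspect the error before retrying.

```typescript
// Hypothetical helper: retry an async call with exponential backoff.
// Waits baseDelayMs, then 2x, 4x, ... between attempts.
async function withBackoff<T>(
  fn: () => Promise<T>,
  maxAttempts = 4,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt === maxAttempts - 1) break; // No wait after the final attempt
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  // All attempts failed: surface the last error to the caller
  throw lastError;
}
```

A scrape could then be wrapped as `withBackoff(() => firecrawl.scrapeUrl(url, { formats: ["markdown"] }))`, where firecrawl and url are assumed to be defined as in the examples above.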
Information
- Version: 1.0.0
- Author: terminal-skills
- Category: Data & AI
- License: Apache-2.0