Scrape the web for AI with Firecrawl
Firecrawl turns any website into LLM-ready markdown. It handles JavaScript rendering, anti-bot evasion, and sitemap discovery.
Prerequisites
- Firecrawl account and API key
- Node 20+ or Python 3.10+
Step-by-Step
- 1
Install the SDK
The TypeScript and Python SDKs are both first-class and expose the same API surface.
    pnpm add @mendable/firecrawl-js
- 2
Scrape a single page
A scrape returns clean markdown for a single URL. JavaScript-rendered SPAs work out of the box, with no Puppeteer setup.
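    import FirecrawlApp from '@mendable/firecrawl-js';

    // Read the API key from the environment rather than hard-coding it
    const app = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });

    const res = await app.scrapeUrl('https://docs.example.com/quickstart', { formats: ['markdown'] });
    // Responses carry a success flag; stop early if the scrape failed
    if (!res.success) throw new Error(`Scrape failed: ${res.error}`);
    console.log(res.markdown);
- 3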
Map a site
Map returns the URLs Firecrawl can discover on a domain. Filter by path before crawling to keep costs down.
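    // `app` is the client created in step 2
    const map = await app.mapUrl('https://docs.example.com', { search: 'api' });
    if (!map.success) throw new Error(`Map failed: ${map.error}`);
    console.log(map.links);
- 4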
Crawl in bulk
Crawl extracts content from every reachable page under a path. Set maxDepth and limit to avoid runaway jobs.
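    const job = await app.crawlUrl('https://docs.example.com', {
      limit: 100,   // hard cap on the number of pages crawled
      maxDepth: 3,  // how many links deep to follow from the start URL
      scrapeOptions: { formats: ['markdown'] },
    });
    if (!job.success) throw new Error(`Crawl failed: ${job.error}`);
    console.log(job.data.length, 'pages');
- 5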
Extract structured data
Pass a Zod schema and Firecrawl's LLM extraction returns typed JSON instead of markdown.
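    import { z } from 'zod';

    // Fields the LLM extraction should return
    const schema = z.object({ title: z.string(), price: z.number() });

    // `url` is the page to extract from; `app` is the client created in step 2
    const res = await app.scrapeUrl(url, { formats: ['extract'], extract: { schema } });
- 6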
Pipe into your vector store
Take the markdown, chunk it, embed it, store it. Now you have a RAG pipeline over a site you do not own.
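The chunking step can be sketched as a small pure function. This is a minimal example, not part of the Firecrawl SDK: the name `chunkMarkdown`, the character-based budget, and the 1000-character default are all assumptions made for illustration. It splits on blank lines so headings and paragraphs stay intact where possible, and hard-splits any single block that exceeds the budget.

```typescript
// Hypothetical helper (not a Firecrawl API): split markdown into chunks of at
// most `maxChars` characters, breaking on blank lines where possible.
function chunkMarkdown(markdown: string, maxChars = 1000): string[] {
  const blocks = markdown
    .split(/\n\s*\n/)
    .map((block) => block.trim())
    .filter((block) => block.length > 0);

  const chunks: string[] = [];
  let current = "";

  for (const block of blocks) {
    // A single block larger than the budget is hard-split by character count.
    if (block.length > maxChars) {
      if (current) {
        chunks.push(current);
        current = "";
      }
      for (let i = 0; i < block.length; i += maxChars) {
        chunks.push(block.slice(i, i + maxChars));
      }
      continue;
    }

    // Otherwise, append the block to the current chunk if it still fits.
    const candidate = current ? `${current}\n\n${block}` : block;
    if (candidate.length > maxChars) {
      chunks.push(current);
      current = block;
    } else {
      current = candidate;
    }
  }

  if (current) chunks.push(current);
  return chunks;
}

// Usage: feed in the markdown from a scrape or crawl result.
const sample = "# Title\n\nFirst paragraph.\n\nSecond paragraph.";
const chunks = chunkMarkdown(sample, 30);
console.log(chunks.length, "chunks");
```

Each chunk can then be passed to whichever embedding model and vector store you use. A token-based budget would likely track model limits more closely than a character count, at the cost of a tokenizer dependency.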
Common Pitfalls
- Crawling without a limit racks up bills fast. Always cap.
- Some sites block aggressive crawlers. Respect robots.txt.
- Markdown is great for prose, terrible for table-heavy pages.
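One way to make the "always cap" rule hard to forget is to route every set of crawl options through a guard before the crawl call. The sketch below is an illustration only: `withCrawlCaps` is a hypothetical helper, not an SDK function, and the ceilings of 100 pages and depth 3 are assumptions borrowed from the crawl example above. The option names `limit` and `maxDepth` are the ones that example uses.

```typescript
// Crawl options as used in this guide; any other keys are passed through untouched.
interface CrawlOptions {
  limit?: number;
  maxDepth?: number;
  [key: string]: unknown;
}

interface CappedCrawlOptions extends CrawlOptions {
  limit: number;
  maxDepth: number;
}

// Hypothetical guard (not a Firecrawl API): guarantees that `limit` and
// `maxDepth` are always set and never exceed the given ceilings.
function withCrawlCaps(
  options: CrawlOptions,
  maxLimit = 100,
  maxDepthCap = 3,
): CappedCrawlOptions {
  // Missing or invalid values fall back to the ceiling; larger values are clamped.
  const clamp = (value: number | undefined, cap: number): number =>
    value === undefined || !Number.isFinite(value) || value < 1
      ? cap
      : Math.min(Math.floor(value), cap);

  return {
    ...options,
    limit: clamp(options.limit, maxLimit),
    maxDepth: clamp(options.maxDepth, maxDepthCap),
  };
}

// Usage: an oversized limit is clamped and a missing maxDepth is filled in.
const safeOptions = withCrawlCaps({ limit: 5000, scrapeOptions: { formats: ["markdown"] } });
console.log(safeOptions.limit, safeOptions.maxDepth);
```

The result can be passed as the options argument of the crawl call in place of a hand-written object.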
What's Next
- Schedule recurring crawls so your RAG index stays fresh.
- Combine with /interact for sites behind logins or pagination.
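For recurring crawls, one possible way to keep the index fresh without re-embedding everything is to hash each page's markdown and compare it with the hash from the previous run. The sketch below assumes each crawled page has been reduced to a `{ url, markdown }` pair; `CrawledPage` and `findChangedPages` are hypothetical names, not part of the SDK, and the scheduler itself (cron, a job queue, and so on) is left out.

```typescript
import { createHash } from "node:crypto";

// Assumed shape: one entry per crawled page, reduced to the fields needed here.
interface CrawledPage {
  url: string;
  markdown: string;
}

interface ChangeSet {
  changed: CrawledPage[];      // new or modified pages to re-chunk and re-embed
  removed: string[];           // URLs seen last run but missing from this one
  hashes: Map<string, string>; // url -> content hash, to persist for the next run
}

// Hypothetical helper (not a Firecrawl API): compare this run's pages against
// the hashes stored from the previous run.
function findChangedPages(
  pages: CrawledPage[],
  previousHashes: Map<string, string>,
): ChangeSet {
  const hashes = new Map<string, string>();
  const changed: CrawledPage[] = [];

  for (const page of pages) {
    const hash = createHash("sha256").update(page.markdown).digest("hex");
    hashes.set(page.url, hash);
    // A page counts as changed if it is new or its content hash differs.
    if (previousHashes.get(page.url) !== hash) changed.push(page);
  }

  const removed: string[] = [];
  previousHashes.forEach((_hash, url) => {
    if (!hashes.has(url)) removed.push(url);
  });

  return { changed, removed, hashes };
}

// Usage: the first run has no stored hashes, so every page counts as changed.
const firstRun = findChangedPages(
  [{ url: "https://docs.example.com/quickstart", markdown: "# Quickstart" }],
  new Map(),
);
console.log(firstRun.changed.length, "pages to embed");
```

Persist `hashes` between runs, re-embed only `changed`, and delete the vectors for `removed`.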
