Scrape the web for AI with Playwright
Playwright is Microsoft's browser automation library. It is overkill for static pages, where a plain HTTP request and an HTML parser are enough, but well suited to sites with logins, heavy client-side JavaScript, or multi-step DOM interactions.
Prerequisites
- Node.js 20 or later
- Comfortable with async/await
- A target site you have permission to scrape
Step-by-Step
- 1
Install Playwright
The init command also downloads browser binaries. Pass --browser=chromium to install only Chromium, which keeps the download small and cold starts fast.
pnpm dlx create-playwright@latest --quiet --browser=chromium
- 2
Launch and navigate
Always create a browser context rather than working from the browser alone: each context isolates its own cookies and storage.
import { chromium } from 'playwright';

// Launch the browser, then open an isolated context and a page inside it.
const browser = await chromium.launch();
const context = await browser.newContext();
const page = await context.newPage();
await page.goto('https://example.com/login');
- 3
Authenticate once, reuse
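storageState saves the context's cookies and local storage to disk. Pass the same file to browser.newContext({ storageState: 'auth.json' }) on later runs and they start logged in, which avoids repeated logins that can trigger rate limits.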
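await page.fill('#email', process.env.EMAIL!);
await page.fill('#password', process.env.PASSWORD!);
await page.click('button[type=submit]');
// Assumes the site redirects away from /login on success; wait for that so the auth cookies exist before saving.
await page.waitForURL((url) => !url.pathname.startsWith('/login'));
await context.storageState({ path: 'auth.json' });
- 4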
Extract data
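Use locators - their actions auto-wait and retry, which avoids most of the race conditions that plague raw selectors. One caveat: locator.all() returns whatever matches at that instant, so wait for the table to render before calling it.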
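// Keep only rows that contain data cells, so a header row of <th> cells is skipped.
const rowLocator = page.locator('table tr').filter({ has: page.locator('td') });
// locator.all() does not auto-wait, so wait for the first data row to appear.
await rowLocator.first().waitFor();
const rows = await rowLocator.all();
const data = await Promise.all(rows.map(async (r) => ({
  name: await r.locator('td:nth-child(1)').innerText(),
  price: await r.locator('td:nth-child(2)').innerText(),
})));
- 5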
Handle pagination
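Loop until the Next button disappears. Waiting for the network to go idle after each click is the simplest sync point; if you know which API request the page makes, waitForResponse keeps you in tighter lockstep with it.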
// Extract the first page (step 4) before entering the loop.
const next = page.locator('button:has-text("Next")');
while (await next.isVisible()) {
  await next.click();
  await page.waitForLoadState('networkidle');
  // Extract the newly loaded page here.
}
- 6
Run headless in CI
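Playwright launches browsers headless by default, and the official Docker image (mcr.microsoft.com/playwright) ships with the browsers and their system dependencies preinstalled. Use it as the job container in a GitHub Actions workflow with a schedule trigger for recurring scrapes.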
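Back in step 3, a saved auth.json only helps while its cookies are still valid. The sketch below is one way to decide whether to log in again. It assumes the site tracks the session in a single cookie; the cookie name `session` and the helper `needsLogin` are hypothetical, and the cookie shape (expires in Unix seconds, -1 for session cookies) follows the file that storageState writes.

```typescript
// The parts of a storageState file that this check reads.
interface StoredCookie {
  name: string;
  expires: number; // Unix time in seconds; -1 marks a session cookie
}

interface StorageStateFile {
  cookies: StoredCookie[];
}

// Returns true when the login flow should run again.
// `cookieName` is an assumption: use whatever cookie the target site sets on login.
function needsLogin(
  state: StorageStateFile | null,
  cookieName: string,
  nowSeconds: number = Date.now() / 1000,
): boolean {
  if (state === null) return true; // no saved state yet
  const cookie = state.cookies.find((c) => c.name === cookieName);
  if (cookie === undefined) return true; // cookie missing from the saved state
  if (cookie.expires === -1) return false; // session cookie: no expiry recorded, so try it
  return cookie.expires <= nowSeconds; // an expired cookie forces a fresh login
}
```

Parse auth.json with JSON.parse, call needsLogin(state, 'session'), and either run the login step or pass storageState: 'auth.json' to browser.newContext(). The server can still end a session early, so treat a redirect back to the login page as a signal to authenticate again.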
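The rows collected in step 4 are raw innerText strings, which usually need cleaning before they feed a model or a dataset. A minimal sketch, assuming US-style prices such as "$1,299.50"; `cleanRow` and `toJsonl` are hypothetical helper names, not Playwright APIs.

```typescript
interface RawRow {
  name: string;
  price: string;
}

interface CleanRow {
  name: string;
  price: number | null; // null when the cell holds no parseable number
}

// Collapse whitespace in the name and turn the price text into a number.
// Assumes "." is the decimal separator and "," only groups thousands.
function cleanRow(raw: RawRow): CleanRow {
  const name = raw.name.replace(/\s+/g, ' ').trim();
  const digits = raw.price.replace(/[^0-9.\-]/g, '');
  const value = digits === '' ? NaN : Number(digits);
  return { name, price: Number.isFinite(value) ? value : null };
}

// One JSON object per line, a common input format for data pipelines.
function toJsonl(rows: RawRow[]): string {
  return rows.map((r) => JSON.stringify(cleanRow(r))).join('\n');
}
```

Calling toJsonl(data) on the array built in step 4 gives text you can append to a file. Sites that format prices differently (for example "1.299,50") need a different parsing rule.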
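Step 5 mentions waitForResponse; here is a sketch of what that loop could look like, with a page cap so a Next button that never disappears cannot loop forever. `PagerPage` is a hand-written slice of Playwright's Page (locator, waitForResponse) so the sketch runs without the package installed, and a real Page should satisfy it. The `/api/items` fragment, `paginate`, and `maxPages` are assumptions for illustration.

```typescript
// Minimal structural stand-ins for the Playwright methods this loop uses.
interface NextButton {
  isVisible(): Promise<boolean>;
  click(): Promise<void>;
}

interface PagerPage {
  locator(selector: string): NextButton;
  waitForResponse(
    predicate: (response: { url(): string; ok(): boolean }) => boolean,
  ): Promise<unknown>;
}

// Visits pages until Next disappears or maxPages is reached.
// Calls onPage once per page (including the first) and returns the page count.
async function paginate(
  page: PagerPage,
  apiPathFragment: string, // hypothetical, e.g. '/api/items'
  onPage: () => Promise<void>,
  maxPages: number = 50,
): Promise<number> {
  const next = page.locator('button:has-text("Next")');
  await onPage();
  let visited = 1;
  while (visited < maxPages && (await next.isVisible())) {
    // Start waiting before the click so a fast response is not missed.
    const response = page.waitForResponse(
      (r) => r.url().includes(apiPathFragment) && r.ok(),
    );
    await next.click();
    await response;
    await onPage();
    visited += 1;
  }
  return visited;
}
```

With a real page this would be called as paginate(page, '/api/items', extractRows), where extractRows is your step 4 logic. If the site disables the Next button on the last page instead of removing it, check isEnabled() rather than isVisible().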
Common Pitfalls
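- Using waitForTimeout instead of locator auto-waiting makes scrapes flaky: a fixed sleep is either too short or wastefully long.
- Running headed in production wastes resources. Headless is the default, so only override it while debugging.
- Not respecting robots.txt or the site's terms of service. Know your legal exposure before you scrape.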
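For the robots.txt pitfall, a pre-flight check can be sketched as below. This is a deliberately simplified reading of the format, not a complete parser: it only looks at the `User-agent: *` group, matches rules by plain path prefix (no `*` or `$` wildcards), and lets the longest matching rule win. `isPathAllowed` is a hypothetical helper, and passing the check says nothing about a site's terms of service.

```typescript
// Returns false when the "*" group of robots.txt disallows the given path.
function isPathAllowed(robotsTxt: string, path: string): boolean {
  let inStarGroup = false;
  let lastWasAgent = false;
  let best = { length: -1, allow: true }; // no matching rule means allowed

  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, '').trim(); // drop comments
    const sep = line.indexOf(':');
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();

    if (field === 'user-agent') {
      // Consecutive User-agent lines share one group of rules.
      inStarGroup = lastWasAgent ? inStarGroup || value === '*' : value === '*';
      lastWasAgent = true;
      continue;
    }
    lastWasAgent = false;
    if (!inStarGroup || value === '') continue;

    if ((field === 'allow' || field === 'disallow') && path.startsWith(value)) {
      const allow = field === 'allow';
      // Longest prefix wins; on a tie, prefer Allow.
      if (value.length > best.length || (value.length === best.length && allow)) {
        best = { length: value.length, allow };
      }
    }
  }
  return best.allow;
}
```

Fetch the site's /robots.txt once per run, then call isPathAllowed(text, new URL(target).pathname) before each page.goto and skip targets that return false.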
What's Next
- Distribute work via Playwright's parallel test runner for high-volume scrapes.
- Pair with a queue (BullMQ) for resumable jobs.
