
TL;DR
Transformers.js lets you run machine learning models in the browser with zero backend. Here is how to use it for text generation, speech recognition, image classification, and semantic search.
Every AI workflow you have seen runs on a server somewhere. You send a prompt, wait for a response, and pay per token. Transformers.js flips that model. It runs machine learning models directly in the browser using WebAssembly and WebGPU. No API keys. No server. No per-token billing.
The library is built by Hugging Face and mirrors their Python transformers library. Transformers.js v3 shipped in October 2024 with WebGPU support (up to 100x faster than WASM), 120 supported architectures, and over 1,200 pre-converted models on the Hugging Face Hub. V4 is now available with even more models - the community has already shipped browser demos for LFM2.5 1.2B reasoning models, Voxtral real-time speech transcription, and Nemotron Nano.
Under the hood, Transformers.js uses the ONNX runtime to run models. Any model converted to ONNX format works, and Hugging Face Hub has thousands of compatible models tagged with transformers.js.
This guide covers the practical use cases that matter for web developers.
npm install @huggingface/transformers

That is it. No Python, no Docker, no GPU drivers. The models are downloaded as ONNX files and cached in the browser on first use.
Every task in Transformers.js starts with pipeline(). You pick a task type, specify a model, and call the resulting function with your input.
import { pipeline } from "@huggingface/transformers";
const classifier = await pipeline(
  "sentiment-analysis",
  "Xenova/distilbert-base-uncased-finetuned-sst-2-english"
);

const result = await classifier("I love building with AI tools.");
// [{ label: "POSITIVE", score: 0.9998 }]
The first call downloads and caches the model. Subsequent calls are instant. Models range from 5MB to 500MB+ depending on the architecture.
WebGPU gives you GPU-accelerated inference in the browser. Add device: "webgpu" to your pipeline options.
const extractor = await pipeline(
  "feature-extraction",
  "mixedbread-ai/mxbai-embed-xsmall-v1",
  { device: "webgpu" }
);
WebGPU support is around 70% globally. Chrome and Edge support it natively. Firefox requires the dom.webgpu.enabled flag. Safari requires the WebGPU feature flag. The library falls back to WebAssembly automatically when WebGPU is not available, so your code works everywhere - it just runs faster with WebGPU.
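Since the library handles the fallback for you, explicit detection is only needed when you want to adapt the UI, for example to warn users that CPU inference will be slower. A minimal sketch, assuming you pass in the global `navigator` object (`pickDevice` is an illustrative helper name):

```javascript
// Decide which device to request based on WebGPU availability.
// Accepting `nav` as a parameter keeps the logic testable outside a browser.
function pickDevice(nav) {
  return nav && "gpu" in nav ? "webgpu" : "wasm";
}

// In the browser:
//   const device = pickDevice(navigator);
//   const extractor = await pipeline("feature-extraction", modelId, { device });
```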
This is the killer feature for web developers. Instead of keyword matching with libraries like fuse.js, you can embed your content and search by meaning.
import { pipeline } from "@huggingface/transformers";
const extractor = await pipeline(
  "feature-extraction",
  "mixedbread-ai/mxbai-embed-xsmall-v1",
  { device: "webgpu" }
);
// Embed your content (do this once, cache the vectors)
const docs = [
  "How to set up Claude Code with CLAUDE.md",
  "Building REST APIs with Express and TypeScript",
  "Running Whisper locally for speech recognition",
];

const docEmbeddings = await extractor(docs, {
  pooling: "mean",
  normalize: true,
});
// Embed the search query
const query = "configure AI coding agent";
const queryEmbedding = await extractor([query], {
  pooling: "mean",
  normalize: true,
});
// Compute cosine similarity and rank. Because the embeddings are
// normalized, the dot product equals cosine similarity.
function cosineSimilarity(a: number[], b: number[]): number {
  return a.reduce((sum, val, i) => sum + val * b[i], 0);
}
const queryVec = queryEmbedding.tolist()[0];
const scores = docEmbeddings.tolist().map((vec: number[], i: number) => ({
  doc: docs[i],
  score: cosineSimilarity(queryVec, vec),
}));

scores.sort((a, b) => b.score - a.score);
// "How to set up Claude Code with CLAUDE.md" ranks first
The user searches for "configure AI coding agent" and the Claude Code article ranks first, even though no keywords match. That is semantic search.
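For a larger corpus, it helps to pull the ranking into a reusable helper that also truncates to the top K results. A minimal sketch (`rankBySimilarity` is an illustrative name; it assumes all vectors are already L2-normalized, as in the example above):

```javascript
// Rank documents by similarity to a query vector and keep the top K.
// With normalized vectors, the dot product is the cosine similarity.
function rankBySimilarity(queryVec, docVecs, docs, k = 3) {
  return docVecs
    .map((vec, i) => ({
      doc: docs[i],
      // Dot product of the document vector with the query vector
      score: vec.reduce((sum, v, j) => sum + v * queryVec[j], 0),
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```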
Run OpenAI's Whisper model in the browser. Users record audio, and you transcribe it without sending anything to a server.
const transcriber = await pipeline(
  "automatic-speech-recognition",
  "onnx-community/whisper-tiny.en",
  { device: "webgpu" }
);
// The pipeline accepts a URL or a Float32Array of audio samples,
// so convert the recorded Blob to an object URL first.
const result = await transcriber(URL.createObjectURL(audioBlob));
console.log(result.text);
// "The quick brown fox jumps over the lazy dog"
The whisper-tiny.en model is 40MB. For better accuracy, use whisper-small.en at 240MB. Both run in real time on modern hardware with WebGPU.
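Whisper expects mono 16 kHz audio. If you decode the user's recording with the Web Audio API and get a stereo AudioBuffer, you can average the two channels before handing the samples to the pipeline. A sketch (`downmixToMono` is an illustrative name):

```javascript
// Average left and right channels into a single mono Float32Array,
// e.g. from audioBuffer.getChannelData(0) and audioBuffer.getChannelData(1).
function downmixToMono(left, right) {
  const mono = new Float32Array(left.length);
  for (let i = 0; i < left.length; i++) {
    mono[i] = (left[i] + right[i]) / 2;
  }
  return mono;
}
```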
Classify images without uploading them to a server. Useful for content moderation, auto-tagging, or building visual search.
const classifier = await pipeline(
  "image-classification",
  "onnx-community/mobilenetv4_conv_small.e2400_r224_in1k",
  { device: "webgpu" }
);
const result = await classifier(imageElement);
// [{ label: "laptop", score: 0.87 }, { label: "keyboard", score: 0.06 }]
The MobileNet model is under 20MB and classifies images in milliseconds.
Run small language models directly in the browser. This is not GPT-4 class, but it is useful for autocomplete, content suggestions, and creative features that do not need to be perfect.
import { pipeline } from "@huggingface/transformers";
const generator = await pipeline(
  "text-generation",
  "HuggingFaceTB/SmolLM2-360M-Instruct"
);
const output = await generator("Explain WebGPU in one sentence:", {
  max_new_tokens: 50,
  do_sample: true, // required for temperature to take effect
  temperature: 0.7,
});
// output[0].generated_text contains the completion
SmolLM2 at 360M parameters is small enough for the browser and smart enough for light tasks. For the Vercel AI SDK, there is a dedicated provider:
import { streamText } from "ai";
import { transformersJS } from "@browser-ai/transformers-js";
const result = streamText({
  model: transformersJS("HuggingFaceTB/SmolLM2-360M-Instruct"),
  prompt: "Explain WebGPU in one sentence.",
});
Classify text into categories you define at runtime, without any training data.
const classifier = await pipeline(
  "zero-shot-classification",
  "Xenova/mobilebert-uncased-mnli"
);

const result = await classifier(
  "How do I deploy a Next.js app to Vercel?",
  ["deployment", "authentication", "database", "testing"]
);
// { labels: ["deployment", ...], scores: [0.94, ...] }
This is useful for auto-routing support questions, categorizing user feedback, or building smart content filters.
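For routing, you usually want a confidence floor so ambiguous messages fall through to a default queue instead of being misfiled. A minimal sketch over the pipeline output above (`routeQuestion` and the "other" fallback label are illustrative):

```javascript
// Pick the top label from a zero-shot result, or a fallback when the
// best score is below the threshold. `result` has the shape
// { labels: [...], scores: [...] } with scores sorted descending.
function routeQuestion(result, threshold = 0.5, fallback = "other") {
  return result.scores[0] >= threshold ? result.labels[0] : fallback;
}
```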
Model size matters. A 50MB model download on first visit is fine for a tool page. It is not fine for a landing page. Lazy-load models after the page renders, and show a loading state.
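Transformers.js surfaces download progress through a progress_callback option on pipeline(), which receives events carrying a status, file name, and percentage. One way to turn those events into a loading message (`progressMessage` is an illustrative helper; exact event fields may vary by library version):

```javascript
// Map a transformers.js progress event to a user-facing status string.
function progressMessage(event) {
  if (event.status === "progress") {
    return `Downloading ${event.file}: ${Math.round(event.progress)}%`;
  }
  if (event.status === "done") {
    return `Loaded ${event.file}`;
  }
  return "Preparing model...";
}
```

Wire it up when constructing the pipeline, e.g. `pipeline(task, model, { progress_callback: (e) => setStatus(progressMessage(e)) })`, where `setStatus` is whatever updates your loading UI.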
Cache aggressively. Models are cached in the browser's Cache API after first download. Subsequent visits load from cache in milliseconds. Set proper cache headers if you are self-hosting models.
WebGPU is not everywhere. Always provide a WebAssembly fallback. Transformers.js does this automatically, but inference will be slower on CPU.
Quantization reduces size. Most models on Hugging Face Hub have quantized variants (q4, q8, fp16). Use the smallest quantization that meets your accuracy needs.
const pipe = await pipeline("feature-extraction", "model-name", {
  dtype: "q4", // Quantized to 4-bit
});
Do not replace your API for complex tasks. Transformers.js is excellent for embeddings, classification, and small generative tasks. For complex multi-step reasoning, you still want Claude or GPT on the server. That said, V4 demos are pushing the boundary - Hugging Face's community has shipped 1.2B parameter reasoning models and real-time speech transcription running entirely in the browser.
The pattern that works for production web apps is hybrid: keep complex reasoning on the server, and run embeddings, classification, and other small tasks in the browser. This gives you the best of both worlds: powerful reasoning from cloud APIs and instant, private, zero-cost inference for everything else.
Does it work with Next.js? Yes. Import it in client components ("use client") and load models after the component mounts. Server-side rendering will fail since the library needs browser APIs; use dynamic imports with ssr: false for pages that depend on it.
How large are the models? They range from 5MB (tiny classifiers) to 500MB+ (large language models). For most browser use cases, you want models under 100MB. Embedding models like mxbai-embed-xsmall-v1 are around 30MB. Whisper tiny is 40MB. There are over 1,200 pre-converted models on the Hugging Face Hub ready to use.
Do you need WebGPU? No. Transformers.js falls back to WebAssembly automatically. WebGPU makes inference faster (often 5-10x), but everything works without it. Chrome and Edge support WebGPU today.
Can you train or fine-tune models in the browser? No. Transformers.js is inference-only. Fine-tune your model with the Python transformers library, convert it to ONNX format using Optimum, then load it in Transformers.js for inference. Many models on Hugging Face Hub are already converted and tagged with transformers.js.
How does it compare to TensorFlow.js? Transformers.js focuses specifically on transformer models from Hugging Face Hub, while TensorFlow.js is a general-purpose ML framework. If you want to run pretrained NLP, vision, or audio models, Transformers.js is simpler and has better model support. If you need custom model architectures or training in the browser, use TensorFlow.js.