# Build a RAG pipeline with LlamaIndex
LlamaIndex is a data framework for LLM apps. It excels at structured ingestion: parsing PDFs, websites, Notion pages, and SQL tables into a queryable index. This guide builds a TypeScript RAG pipeline with the LlamaIndex.TS SDK.
## Prerequisites

- Node 20+
- An OpenAI API key
- A directory of source documents
## Step-by-Step
### 1. Install LlamaIndex.TS

The TypeScript SDK ships ESM-first, so make sure your tsconfig sets `"module": "nodenext"`.

```bash
pnpm add llamaindex
```
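LlamaIndex.TS reads `OPENAI_API_KEY` from the environment, so exporting the key is usually enough. If you want to pin a specific model rather than rely on the SDK default, set it on the global `Settings` object. A minimal sketch; the model name here is just an example:

```ts
// Optional: pin the LLM instead of relying on the SDK default.
// The API key is read from the OPENAI_API_KEY environment variable.
import { OpenAI, Settings } from 'llamaindex';

Settings.llm = new OpenAI({ model: 'gpt-4o-mini' }); // example model, swap for your own
```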
### 2. Load documents
`SimpleDirectoryReader` auto-detects file types. For PDFs, install `pdf-parse` alongside it.

```ts
import { SimpleDirectoryReader, VectorStoreIndex } from 'llamaindex';

const reader = new SimpleDirectoryReader();
const documents = await reader.loadData({ directoryPath: './docs' });
```
### 3. Build the index
`VectorStoreIndex.fromDocuments` handles chunking, embedding, and vector storage in one call. Persist the index to disk so you do not pay to re-embed on every run.

```ts
import { storageContextFromDefaults } from 'llamaindex';

// Persist embeddings under ./storage so later runs can reload them.
const storageContext = await storageContextFromDefaults({ persistDir: './storage' });
const index = await VectorStoreIndex.fromDocuments(documents, { storageContext });
```
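On later runs you can rebuild the index from the persisted store instead of re-embedding everything. A minimal sketch, assuming `VectorStoreIndex.init` as the loader for an existing storage context; verify the call against your installed version:

```ts
// Later runs: load the persisted index rather than re-embedding ./docs.
// VectorStoreIndex.init({ storageContext }) is assumed here; check your version's API.
import { storageContextFromDefaults, VectorStoreIndex } from 'llamaindex';

const storageContext = await storageContextFromDefaults({ persistDir: './storage' });
const index = await VectorStoreIndex.init({ storageContext });
```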
### 4. Create a query engine
The query engine wires the retriever, response synthesizer, and node post-processors together. Use `similarityTopK` to control how many chunks are retrieved, and therefore recall.

```ts
const queryEngine = index.asQueryEngine({ similarityTopK: 5 });

const response = await queryEngine.query({ query: 'Summarize the onboarding flow' });
console.log(response.toString());
```
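If you need more control than `asQueryEngine` offers, the same wiring can be done by hand. A sketch, assuming the `RetrieverQueryEngine` class that LlamaIndex.TS exports; the exact constructor shape varies across versions:

```ts
// Explicit wiring: equivalent to asQueryEngine above, but lets you swap or
// configure each piece. RetrieverQueryEngine's constructor is an assumption
// to verify against your installed version.
import { RetrieverQueryEngine } from 'llamaindex';

const retriever = index.asRetriever({ similarityTopK: 5 });
const engine = new RetrieverQueryEngine(retriever);
const res = await engine.query({ query: 'Summarize the onboarding flow' });
```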
### 5. Add metadata filters
Tag documents at ingest time so you can scope queries to a subset of the corpus. This is where LlamaIndex shines compared with raw vector stores.

```ts
// Tag before building the index so the metadata is stored on every node.
documents.forEach((d) => {
  d.metadata = { ...d.metadata, tenant: 'acme' };
});
```
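To actually scope a query, pass the tag as a pre-filter when building the engine. A sketch following the `preFilters` shape from the LlamaIndex.TS docs; the filter schema has changed across versions, so check your installed API:

```ts
// Restrict retrieval to nodes tagged tenant: 'acme'. The filter shape below
// (key/value/filterType) is an assumption; newer versions use an operator field.
const scopedEngine = index.asQueryEngine({
  similarityTopK: 5,
  preFilters: {
    filters: [{ key: 'tenant', value: 'acme', filterType: 'ExactMatch' }],
  },
});

const answer = await scopedEngine.query({ query: 'Summarize onboarding for acme' });
```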
### 6. Stream responses
Streaming makes the app feel instant. Pipe tokens to your UI as they arrive.

```ts
const stream = await queryEngine.query({ query: 'Explain pricing', stream: true });

for await (const chunk of stream) {
  process.stdout.write(chunk.response);
}
```
## Common Pitfalls

- Re-loading documents on every dev run wastes embedding spend. Always persist the storage context.
- PDFs made of scanned images need OCR first; LlamaIndex will not magically read them.
- Setting `similarityTopK` too high (above ~10) drowns the LLM in irrelevant context. One mitigation is sketched below.
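One way to keep context lean without starving recall is to retrieve a few extra nodes and drop the low-scoring ones. A sketch, assuming the `SimilarityPostprocessor` exported by LlamaIndex.TS; the name and options are worth confirming in your version:

```ts
// Retrieve top-8, then discard nodes whose similarity score falls below the
// cutoff. SimilarityPostprocessor and the nodePostprocessors option are
// assumptions to check against your installed version.
import { SimilarityPostprocessor } from 'llamaindex';

const leanEngine = index.asQueryEngine({
  similarityTopK: 8,
  nodePostprocessors: [new SimilarityPostprocessor({ similarityCutoff: 0.7 })],
});
```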
## What's Next

- Switch storage to Postgres + pgvector for production (see the sketch below).
- Add a routing query engine to fan queries out across multiple indices.
- Wrap the engine in a chat agent for multi-turn use.
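For the Postgres route, the change is confined to the storage context: swap the default vector store for `PGVectorStore`. A hypothetical sketch; the import path and constructor options below are assumptions that vary by LlamaIndex.TS version:

```ts
// Back the index with Postgres + pgvector instead of the default store.
// Import path (@llamaindex/postgres in recent releases) and the clientConfig
// option are assumptions; confirm against your installed version.
import { PGVectorStore } from '@llamaindex/postgres';
import { storageContextFromDefaults, VectorStoreIndex } from 'llamaindex';

const vectorStore = new PGVectorStore({
  clientConfig: { connectionString: process.env.PG_CONNECTION_STRING },
});

const storageContext = await storageContextFromDefaults({ vectorStore });
const index = await VectorStoreIndex.fromDocuments(documents, { storageContext });
```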
