# Build a RAG pipeline with LlamaIndex
LlamaIndex is a data framework for LLM apps. It excels at structured ingestion: parsing PDFs, websites, Notion pages, and SQL tables into a queryable index. This guide builds a TypeScript RAG pipeline with the LlamaIndex.TS SDK.
## Prerequisites

- Node 20+
- An OpenAI API key
- A directory of source documents
## Step-by-Step
### 1. Install LlamaIndex.TS

The TypeScript SDK ships ESM-first, so make sure your tsconfig sets `"module": "nodenext"`.

```bash
pnpm add llamaindex
```
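LlamaIndex.TS reads `OPENAI_API_KEY` from the environment, so exporting the key is usually enough. If you want to pin a specific model rather than rely on the SDK default, set it on the global `Settings` object. A minimal sketch; the model name here is just an example:

```ts
// Optional: pin the LLM instead of relying on the SDK default.
// The API key is read from the OPENAI_API_KEY environment variable.
import { OpenAI, Settings } from 'llamaindex';

Settings.llm = new OpenAI({ model: 'gpt-4o-mini' }); // example model, swap for your own
```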
### 2. Load documents
`SimpleDirectoryReader` auto-detects file types. For PDFs, install `pdf-parse` alongside it.

```ts
import { SimpleDirectoryReader, VectorStoreIndex } from 'llamaindex';

const reader = new SimpleDirectoryReader();
const documents = await reader.loadData({ directoryPath: './docs' });
```
### 3. Build the index
`VectorStoreIndex.fromDocuments` handles chunking, embedding, and vector storage in one call. Persist the index to disk so you do not pay to re-embed on every run.

```ts
import { storageContextFromDefaults } from 'llamaindex';

// Persist embeddings under ./storage so later runs can reload them.
const storageContext = await storageContextFromDefaults({ persistDir: './storage' });
const index = await VectorStoreIndex.fromDocuments(documents, { storageContext });
```
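On later runs you can rebuild the index from the persisted store instead of re-embedding everything. A minimal sketch, assuming `VectorStoreIndex.init` as the loader for an existing storage context; verify the call against your installed version:

```ts
// Later runs: load the persisted index rather than re-embedding ./docs.
// VectorStoreIndex.init({ storageContext }) is assumed here; check your version's API.
import { storageContextFromDefaults, VectorStoreIndex } from 'llamaindex';

const storageContext = await storageContextFromDefaults({ persistDir: './storage' });
const index = await VectorStoreIndex.init({ storageContext });
```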
### 4. Create a query engine
The query engine wires the retriever, response synthesizer, and node post-processors together. Use `similarityTopK` to control how many chunks are retrieved, and therefore recall.

```ts
const queryEngine = index.asQueryEngine({ similarityTopK: 5 });

const response = await queryEngine.query({ query: 'Summarize the onboarding flow' });
console.log(response.toString());
```
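If you need more control than `asQueryEngine` offers, the same wiring can be done by hand. A sketch, assuming the `RetrieverQueryEngine` class that LlamaIndex.TS exports; the exact constructor shape varies across versions:

```ts
// Explicit wiring: equivalent to asQueryEngine above, but lets you swap or
// configure each piece. RetrieverQueryEngine's constructor is an assumption
// to verify against your installed version.
import { RetrieverQueryEngine } from 'llamaindex';

const retriever = index.asRetriever({ similarityTopK: 5 });
const engine = new RetrieverQueryEngine(retriever);
const res = await engine.query({ query: 'Summarize the onboarding flow' });
```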
### 5. Add metadata filters
Tag documents at ingest time so you can scope queries to a subset of the corpus. This is where LlamaIndex shines compared with raw vector stores.

```ts
// Tag before building the index so the metadata is stored on every node.
documents.forEach((d) => {
  d.metadata = { ...d.metadata, tenant: 'acme' };
});
```
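To actually scope a query, pass the tag as a pre-filter when building the engine. A sketch following the `preFilters` shape from the LlamaIndex.TS docs; the filter schema has changed across versions, so check your installed API:

```ts
// Restrict retrieval to nodes tagged tenant: 'acme'. The filter shape below
// (key/value/filterType) is an assumption; newer versions use an operator field.
const scopedEngine = index.asQueryEngine({
  similarityTopK: 5,
  preFilters: {
    filters: [{ key: 'tenant', value: 'acme', filterType: 'ExactMatch' }],
  },
});

const answer = await scopedEngine.query({ query: 'Summarize onboarding for acme' });
```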
### 6. Stream responses
Streaming makes the app feel instant. Pipe tokens to your UI as they arrive.

```ts
const stream = await queryEngine.query({ query: 'Explain pricing', stream: true });

for await (const chunk of stream) {
  process.stdout.write(chunk.response);
}
```
## Common Pitfalls

- Re-loading documents on every dev run wastes embedding spend. Always persist the storage context.
- PDFs made of scanned images need OCR first; LlamaIndex will not magically read them.
- Setting `similarityTopK` too high (above ~10) drowns the LLM in irrelevant context. One mitigation is sketched below.
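One way to keep context lean without starving recall is to retrieve a few extra nodes and drop the low-scoring ones. A sketch, assuming the `SimilarityPostprocessor` exported by LlamaIndex.TS; the name and options are worth confirming in your version:

```ts
// Retrieve top-8, then discard nodes whose similarity score falls below the
// cutoff. SimilarityPostprocessor and the nodePostprocessors option are
// assumptions to check against your installed version.
import { SimilarityPostprocessor } from 'llamaindex';

const leanEngine = index.asQueryEngine({
  similarityTopK: 8,
  nodePostprocessors: [new SimilarityPostprocessor({ similarityCutoff: 0.7 })],
});
```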
## What's Next

- Switch storage to Postgres + pgvector for production (see the sketch below).
- Add a routing query engine to fan queries out across multiple indices.
- Wrap the engine in a chat agent for multi-turn use.
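For the Postgres route, the change is confined to the storage context: swap the default vector store for `PGVectorStore`. A hypothetical sketch; the import path and constructor options below are assumptions that vary by LlamaIndex.TS version:

```ts
// Back the index with Postgres + pgvector instead of the default store.
// Import path (@llamaindex/postgres in recent releases) and the clientConfig
// option are assumptions; confirm against your installed version.
import { PGVectorStore } from '@llamaindex/postgres';
import { storageContextFromDefaults, VectorStoreIndex } from 'llamaindex';

const vectorStore = new PGVectorStore({
  clientConfig: { connectionString: process.env.PG_CONNECTION_STRING },
});

const storageContext = await storageContextFromDefaults({ vectorStore });
const index = await VectorStoreIndex.fromDocuments(documents, { storageContext });
```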
