Build a RAG pipeline with LangChain
LangChain gives you battle-tested abstractions for retrieval-augmented generation: document loaders, text splitters, vector stores, and retrievers. This walkthrough builds a working RAG pipeline in TypeScript that ingests your docs and answers questions over them with citations.
Prerequisites
- Node 20+ and pnpm or npm
- An OpenAI or Anthropic API key in .env (loading sketched below)
- A folder of markdown or PDF documents to index
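Node does not read .env on its own; a minimal sketch assuming the dotenv package (on Node 20.6+, `node --env-file=.env` works without it):

```ts
// Load .env into process.env before anything reads the API key.
// Assumes `dotenv` is installed: pnpm add dotenv
import 'dotenv/config';

if (!process.env.OPENAI_API_KEY) {
  throw new Error('Set OPENAI_API_KEY in .env before running the pipeline');
}
```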
Step-by-Step
1. Install LangChain and a vector store

Pull in the core packages plus a local vector store so you can run end-to-end without external infra.

```bash
pnpm add langchain @langchain/openai @langchain/community
pnpm add hnswlib-node
```
2. Load and split your documents
Document loaders normalize input formats; the example below maps only .md files, so PDFs from the prerequisites need a dedicated loader such as LangChain's PDFLoader. The recursive splitter keeps semantic chunks roughly even and preserves overlap so citations stay coherent.
```ts
import { DirectoryLoader } from 'langchain/document_loaders/fs/directory';
import { TextLoader } from 'langchain/document_loaders/fs/text';
import { RecursiveCharacterTextSplitter } from 'langchain/text_splitter';

// Map each file extension to a loader; every .md file becomes a Document.
const loader = new DirectoryLoader('./docs', {
  '.md': (p) => new TextLoader(p),
});
const docs = await loader.load();

// ~1000-char chunks with 150-char overlap keep context intact across boundaries.
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,
  chunkOverlap: 150,
});
const chunks = await splitter.splitDocuments(docs);
```
3. Embed and store vectors
Embed each chunk and persist the index to disk. Cache the index so repeat dev runs do not re-embed the same content; a load-or-build sketch follows the snippet.
```ts
import { OpenAIEmbeddings } from '@langchain/openai';
import { HNSWLib } from '@langchain/community/vectorstores/hnswlib';

const embeddings = new OpenAIEmbeddings({ model: 'text-embedding-3-small' });
const store = await HNSWLib.fromDocuments(chunks, embeddings);
await store.save('./index');
```
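To make the cache actually skip re-embedding, load the saved index when it exists and rebuild only when it is missing. A minimal sketch that replaces the fromDocuments call above (the './index' path matches the save call):

```ts
import { existsSync } from 'node:fs';

// Reuse the on-disk index when present; re-embed only when it is missing.
let store: HNSWLib;
if (existsSync('./index')) {
  store = await HNSWLib.load('./index', embeddings);
} else {
  store = await HNSWLib.fromDocuments(chunks, embeddings);
  await store.save('./index');
}
```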
4. Wire up the retrieval chain
Compose the retriever with a chat model. Pass top-k chunks into the prompt and ask the LLM to cite source filenames inline.
```ts
import { ChatOpenAI } from '@langchain/openai';
import { createRetrievalChain } from 'langchain/chains/retrieval';
import { createStuffDocumentsChain } from 'langchain/chains/combine_documents';
import { ChatPromptTemplate } from '@langchain/core/prompts';

// Retrieve the 4 most similar chunks for each query.
const retriever = store.asRetriever({ k: 4 });
const llm = new ChatOpenAI({ model: 'gpt-4o-mini', temperature: 0 });
const prompt = ChatPromptTemplate.fromTemplate(
  'Answer using only the context. Cite sources by filename.\n\nContext: {context}\n\nQ: {input}'
);

// "Stuff" all retrieved chunks into one prompt, then wrap with retrieval.
const combine = await createStuffDocumentsChain({ llm, prompt });
const chain = await createRetrievalChain({ retriever, combineDocsChain: combine });
```
5. Query and verify citations
Run a real question and inspect the returned context. If citations look off, tune chunk size and k before reaching for a reranker.
```ts
const res = await chain.invoke({ input: 'What is our refund policy?' });
console.log(res.answer);
// Each retrieved Document carries its source file path in metadata.
console.log(res.context.map((d) => d.metadata.source));
```
6. Productionize
Swap HNSWLib for a hosted vector DB once your corpus exceeds a few hundred MB. Add an embeddings cache (sketched below) and a reranker for harder queries.
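One way to cache embeddings is LangChain's cache-backed wrapper over a byte store; a minimal sketch assuming LocalFileStore, with the cache path and namespace as illustrative choices:

```ts
import { CacheBackedEmbeddings } from 'langchain/embeddings/cache_backed';
import { LocalFileStore } from 'langchain/storage/file_system';
import { OpenAIEmbeddings } from '@langchain/openai';

// Persist raw embedding vectors on disk, keyed by a hash of each chunk's text.
const cacheStore = await LocalFileStore.fromPath('./embedding-cache');
const cachedEmbeddings = CacheBackedEmbeddings.fromBytesStore(
  new OpenAIEmbeddings({ model: 'text-embedding-3-small' }),
  cacheStore,
  { namespace: 'text-embedding-3-small' } // keep caches from different models separate
);
// Drop-in replacement: HNSWLib.fromDocuments(chunks, cachedEmbeddings)
```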
Common Pitfalls
- Chunks that are too small (under 300 characters) wreck recall. Aim for 800-1200 characters with 100-200 of overlap.
- Forgetting to deduplicate near-identical chunks bloats your index and confuses retrieval; a hash-based dedupe sketch follows this list.
- Use the same embedding model for indexing and for queries. Mixing models silently degrades results.
- No eval set means you cannot measure quality. Build a 50-question gold set before tuning (a minimal hit-rate check is also sketched below).
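For the dedupe pitfall, a minimal sketch that drops exact and whitespace-level duplicates by hashing normalized chunk text (fuzzier near-duplicates would need MinHash or embedding similarity):

```ts
import { createHash } from 'node:crypto';
import type { Document } from '@langchain/core/documents';

// Keep the first chunk per normalized-content hash; run before embedding.
function dedupeChunks(chunks: Document[]): Document[] {
  const seen = new Set<string>();
  return chunks.filter((chunk) => {
    const key = createHash('sha256')
      .update(chunk.pageContent.replace(/\s+/g, ' ').trim().toLowerCase())
      .digest('hex');
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}
```

And for the gold set, a small retrieval hit-rate check; the GoldItem shape and evalRetrieval helper are hypothetical, and `chain` is the retrieval chain from step 4:

```ts
// Hypothetical gold-set format: a question plus the filename it should cite.
type GoldItem = { question: string; expectedSource: string };

async function evalRetrieval(gold: GoldItem[]): Promise<number> {
  let hits = 0;
  for (const item of gold) {
    const res = await chain.invoke({ input: item.question });
    const sources = res.context.map((d) => d.metadata.source as string);
    if (sources.some((s) => s.endsWith(item.expectedSource))) hits++;
  }
  return hits / gold.length; // fraction of questions whose source was retrieved
}
```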
What's Next
- Add a reranker like Cohere Rerank for multi-hop queries (a sketch follows this list).
- Move to a hosted vector store once you cross 100k chunks.
- Layer in conversational memory for multi-turn chats.
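A common reranking setup in LangChain JS wraps the base retriever in a ContextualCompressionRetriever with CohereRerank as the compressor; a sketch assuming the @langchain/cohere package and a COHERE_API_KEY env var (model name is illustrative):

```ts
import { CohereRerank } from '@langchain/cohere';
import { ContextualCompressionRetriever } from 'langchain/retrievers/contextual_compression';

// Over-fetch from the vector store, then let the reranker pick the best 4.
const reranker = new CohereRerank({
  model: 'rerank-english-v3.0', // illustrative; check Cohere's current model list
  topN: 4,
});
const rerankedRetriever = new ContextualCompressionRetriever({
  baseCompressor: reranker,
  baseRetriever: store.asRetriever({ k: 20 }),
});
// Pass rerankedRetriever to createRetrievalChain in place of the plain retriever.
```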
