Build a RAG pipeline with LangChain
LangChain gives you battle-tested abstractions for retrieval-augmented generation: document loaders, text splitters, vector stores, and retrievers. This walkthrough builds a working RAG pipeline in TypeScript that ingests your docs and answers questions over them with citations.
Prerequisites
- Node 20+ and pnpm or npm
- An OpenAI or Anthropic API key in .env (loading sketched below)
- A folder of markdown or PDF documents to index
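Node does not read .env on its own; a minimal sketch assuming the dotenv package (on Node 20.6+, `node --env-file=.env` works without it):

```ts
// Load .env into process.env before anything reads the API key.
// Assumes `dotenv` is installed: pnpm add dotenv
import 'dotenv/config';

if (!process.env.OPENAI_API_KEY) {
  throw new Error('Set OPENAI_API_KEY in .env before running the pipeline');
}
```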
Step-by-Step
1. Install LangChain and a vector store

Pull in the core packages plus a local vector store so you can run end-to-end without external infra.

```bash
pnpm add langchain @langchain/openai @langchain/community
pnpm add hnswlib-node
```
2. Load and split your documents
Document loaders normalize input formats; the example below maps only .md files, so PDFs from the prerequisites need a dedicated loader such as LangChain's PDFLoader. The recursive splitter keeps semantic chunks roughly even and preserves overlap so citations stay coherent.
```ts
import { DirectoryLoader } from 'langchain/document_loaders/fs/directory';
import { TextLoader } from 'langchain/document_loaders/fs/text';
import { RecursiveCharacterTextSplitter } from 'langchain/text_splitter';

// Map each file extension to a loader; every .md file becomes a Document.
const loader = new DirectoryLoader('./docs', {
  '.md': (p) => new TextLoader(p),
});
const docs = await loader.load();

// ~1000-char chunks with 150-char overlap keep context intact across boundaries.
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,
  chunkOverlap: 150,
});
const chunks = await splitter.splitDocuments(docs);
```
3. Embed and store vectors
Embed each chunk and persist the index to disk. Cache the index so repeat dev runs do not re-embed the same content; a load-or-build sketch follows the snippet.
```ts
import { OpenAIEmbeddings } from '@langchain/openai';
import { HNSWLib } from '@langchain/community/vectorstores/hnswlib';

const embeddings = new OpenAIEmbeddings({ model: 'text-embedding-3-small' });
const store = await HNSWLib.fromDocuments(chunks, embeddings);
await store.save('./index');
```
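To make the cache actually skip re-embedding, load the saved index when it exists and rebuild only when it is missing. A minimal sketch that replaces the fromDocuments call above (the './index' path matches the save call):

```ts
import { existsSync } from 'node:fs';

// Reuse the on-disk index when present; re-embed only when it is missing.
let store: HNSWLib;
if (existsSync('./index')) {
  store = await HNSWLib.load('./index', embeddings);
} else {
  store = await HNSWLib.fromDocuments(chunks, embeddings);
  await store.save('./index');
}
```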
4. Wire up the retrieval chain
Compose the retriever with a chat model. Pass top-k chunks into the prompt and ask the LLM to cite source filenames inline.
```ts
import { ChatOpenAI } from '@langchain/openai';
import { createRetrievalChain } from 'langchain/chains/retrieval';
import { createStuffDocumentsChain } from 'langchain/chains/combine_documents';
import { ChatPromptTemplate } from '@langchain/core/prompts';

// Retrieve the 4 most similar chunks for each query.
const retriever = store.asRetriever({ k: 4 });
const llm = new ChatOpenAI({ model: 'gpt-4o-mini', temperature: 0 });
const prompt = ChatPromptTemplate.fromTemplate(
  'Answer using only the context. Cite sources by filename.\n\nContext: {context}\n\nQ: {input}'
);

// "Stuff" all retrieved chunks into one prompt, then wrap with retrieval.
const combine = await createStuffDocumentsChain({ llm, prompt });
const chain = await createRetrievalChain({ retriever, combineDocsChain: combine });
```
5. Query and verify citations
Run a real question and inspect the returned context. If citations look off, tune chunk size and k before reaching for a reranker.
```ts
const res = await chain.invoke({ input: 'What is our refund policy?' });
console.log(res.answer);
// Each retrieved Document carries its source file path in metadata.
console.log(res.context.map((d) => d.metadata.source));
```
6. Productionize
Swap HNSWLib for a hosted vector DB once your corpus exceeds a few hundred MB. Add an embeddings cache (sketched below) and a reranker for harder queries.
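One way to cache embeddings is LangChain's cache-backed wrapper over a byte store; a minimal sketch assuming LocalFileStore, with the cache path and namespace as illustrative choices:

```ts
import { CacheBackedEmbeddings } from 'langchain/embeddings/cache_backed';
import { LocalFileStore } from 'langchain/storage/file_system';
import { OpenAIEmbeddings } from '@langchain/openai';

// Persist raw embedding vectors on disk, keyed by a hash of each chunk's text.
const cacheStore = await LocalFileStore.fromPath('./embedding-cache');
const cachedEmbeddings = CacheBackedEmbeddings.fromBytesStore(
  new OpenAIEmbeddings({ model: 'text-embedding-3-small' }),
  cacheStore,
  { namespace: 'text-embedding-3-small' } // keep caches from different models separate
);
// Drop-in replacement: HNSWLib.fromDocuments(chunks, cachedEmbeddings)
```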
Common Pitfalls
- Chunks that are too small (under 300 characters) wreck recall. Aim for 800-1200 characters with 100-200 of overlap.
- Forgetting to deduplicate near-identical chunks bloats your index and confuses retrieval; a hash-based dedupe sketch follows this list.
- Use the same embedding model for indexing and for queries. Mixing models silently degrades results.
- No eval set means you cannot measure quality. Build a 50-question gold set before tuning (a minimal hit-rate check is also sketched below).
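For the dedupe pitfall, a minimal sketch that drops exact and whitespace-level duplicates by hashing normalized chunk text (fuzzier near-duplicates would need MinHash or embedding similarity):

```ts
import { createHash } from 'node:crypto';
import type { Document } from '@langchain/core/documents';

// Keep the first chunk per normalized-content hash; run before embedding.
function dedupeChunks(chunks: Document[]): Document[] {
  const seen = new Set<string>();
  return chunks.filter((chunk) => {
    const key = createHash('sha256')
      .update(chunk.pageContent.replace(/\s+/g, ' ').trim().toLowerCase())
      .digest('hex');
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}
```

And for the gold set, a small retrieval hit-rate check; the GoldItem shape and evalRetrieval helper are hypothetical, and `chain` is the retrieval chain from step 4:

```ts
// Hypothetical gold-set format: a question plus the filename it should cite.
type GoldItem = { question: string; expectedSource: string };

async function evalRetrieval(gold: GoldItem[]): Promise<number> {
  let hits = 0;
  for (const item of gold) {
    const res = await chain.invoke({ input: item.question });
    const sources = res.context.map((d) => d.metadata.source as string);
    if (sources.some((s) => s.endsWith(item.expectedSource))) hits++;
  }
  return hits / gold.length; // fraction of questions whose source was retrieved
}
```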
What's Next
- Add a reranker like Cohere Rerank for multi-hop queries (a sketch follows this list).
- Move to a hosted vector store once you cross 100k chunks.
- Layer in conversational memory for multi-turn chats.
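A common reranking setup in LangChain JS wraps the base retriever in a ContextualCompressionRetriever with CohereRerank as the compressor; a sketch assuming the @langchain/cohere package and a COHERE_API_KEY env var (model name is illustrative):

```ts
import { CohereRerank } from '@langchain/cohere';
import { ContextualCompressionRetriever } from 'langchain/retrievers/contextual_compression';

// Over-fetch from the vector store, then let the reranker pick the best 4.
const reranker = new CohereRerank({
  model: 'rerank-english-v3.0', // illustrative; check Cohere's current model list
  topN: 4,
});
const rerankedRetriever = new ContextualCompressionRetriever({
  baseCompressor: reranker,
  baseRetriever: store.asRetriever({ k: 20 }),
});
// Pass rerankedRetriever to createRetrievalChain in place of the plain retriever.
```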
