RAG

Build a RAG pipeline with Pinecone

Pinecone is a managed vector database built for scale. It advertises sub-100ms queries over billions of vectors, and you don't operate any infrastructure yourself.

Prerequisites

+Pinecone account and API key
+Node 20+
+OpenAI API key for embeddings

Step-by-Step

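1
Create a serverless index
Serverless indexes scale automatically and bill by usage (read units, write units, and storage) rather than by provisioned capacity. Set the index dimension to match your embedding model (1536 for text-embedding-3-small). You can create the index in the Pinecone console or with the SDK's createIndex call. Install the Pinecone and OpenAI SDKs first:
```
pnpm add @pinecone-database/pinecone openai
```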
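
The index-creation step itself can be sketched as follows. This is a minimal sketch under stated assumptions, not Pinecone's reference setup: the helper name `buildServerlessIndexSpec`, the `cosine` metric, and the `aws` / `us-east-1` cloud and region are illustrative choices. The returned object is meant to follow the shape the SDK's `createIndex` call accepts, so check it against the SDK version you install.

```typescript
// Default output dimensions of common OpenAI embedding models.
const EMBEDDING_DIMENSIONS: Record<string, number> = {
  'text-embedding-3-small': 1536,
  'text-embedding-3-large': 3072,
  'text-embedding-ada-002': 1536,
};

interface ServerlessIndexSpec {
  name: string;
  dimension: number;
  metric: 'cosine' | 'dotproduct' | 'euclidean';
  spec: { serverless: { cloud: 'aws' | 'gcp' | 'azure'; region: string } };
}

// Hypothetical helper: derives the dimension from the model name
// so the index and the embedding model cannot silently disagree.
function buildServerlessIndexSpec(name: string, model: string): ServerlessIndexSpec {
  const dimension = EMBEDDING_DIMENSIONS[model];
  if (dimension === undefined) {
    throw new Error(`Unknown embedding model: ${model}`);
  }
  return {
    name,
    dimension,
    metric: 'cosine', // assumption: cosine similarity
    spec: { serverless: { cloud: 'aws', region: 'us-east-1' } }, // assumption: pick your own
  };
}

const docsIndexSpec = buildServerlessIndexSpec('docs', 'text-embedding-3-small');
// With a Pinecone client `pc`, this would be passed as:
//   await pc.createIndex(docsIndexSpec);
```

Deriving the dimension from a lookup table keeps the one setting you cannot change later tied to the model you actually call.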

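2
Initialize the client

Use a single client instance for the lifetime of your app.

```
import { Pinecone } from '@pinecone-database/pinecone';

// Create one client at startup and reuse it everywhere.
const pc = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
// Handle to the index named 'docs'; the index must already exist.
const index = pc.index('docs');
```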
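
One way to enforce the single-instance rule is a lazily initialized singleton that also fails fast when the API key is missing, instead of relying on the `!` non-null assertion. The sketch below is illustrative: `requireEnv` and `lazySingleton` are hypothetical helper names, not part of the Pinecone SDK.

```typescript
// Reads a required variable from an environment-like object and
// fails fast with a clear message when it is missing or empty.
function requireEnv(name: string, env: Record<string, string | undefined>): string {
  const value = env[name];
  if (!value) {
    throw new Error(`Missing environment variable: ${name}`);
  }
  return value;
}

// Wraps a factory so it runs at most once; later calls return the same instance.
function lazySingleton<T>(factory: () => T): () => T {
  let cached: { value: T } | undefined;
  return () => {
    if (cached === undefined) {
      cached = { value: factory() };
    }
    return cached.value;
  };
}

// Hypothetical wiring, assuming the Pinecone import shown in this step:
//   export const getPinecone = lazySingleton(
//     () => new Pinecone({ apiKey: requireEnv('PINECONE_API_KEY', process.env) }),
//   );
```

Every module then calls `getPinecone()` rather than constructing its own client, and a missing key surfaces as one readable error at first use.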

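3
Embed and upsert

Batch upserts in groups of 100 vectors. Include the metadata fields you plan to filter on at query time, such as source, and keep them compact.

```
import OpenAI from 'openai';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
// `chunks` is assumed to be an array of { id, text, source } holding one batch (up to 100 items).
const { data } = await openai.embeddings.create({ model: 'text-embedding-3-small', input: chunks.map((c) => c.text) });
// Embeddings come back in input order, so data[i] belongs to chunks[i].
await index.upsert(
  data.map((d, i) => ({ id: chunks[i].id, values: d.embedding, metadata: { source: chunks[i].source } }))
);
```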
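
The batching advice can be sketched with a small generic helper. This is a sketch under assumptions: `toBatches` and `upsertInBatches` are hypothetical names, and `upsert` stands for any function that writes one batch, for example `(batch) => index.upsert(batch)`.

```typescript
// Splits items into consecutive batches of at most `size` elements.
function toBatches<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError('size must be a positive integer');
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Writes records batch by batch and returns how many records were sent.
async function upsertInBatches<T>(
  records: readonly T[],
  upsert: (batch: T[]) => Promise<unknown>,
  size = 100,
): Promise<number> {
  let written = 0;
  for (const batch of toBatches(records, size)) {
    await upsert(batch); // sequential on purpose: simple and gentle on rate limits
    written += batch.length;
  }
  return written;
}
```

Running the batches sequentially is the conservative choice. If ingestion speed matters more, a few batches could be sent concurrently, at the cost of handling rate-limit errors yourself.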

4
Query with filters

Metadata filters narrow the candidate set as part of the vector search rather than after it, which matters when one index serves several sources or tenants.

```
// Embed the question with the same model used for the documents.
const q = await openai.embeddings.create({ model: 'text-embedding-3-small', input: 'reset password' });
const res = await index.query({
  vector: q.data[0].embedding,
  topK: 5,
  filter: { source: { $eq: 'help-docs' } },
  includeMetadata: true,
});
```
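
For the multi-tenant case, it can help to build the filter in one place so that no query forgets the tenant clause. The sketch below is illustrative: `buildFilter` is a hypothetical helper, and the `tenant` metadata field is an assumption (the original records only carry `source`). It uses the `$eq`, `$in`, and `$and` filter operators.

```typescript
type Clause = Record<string, { $eq: string } | { $in: string[] }>;
type MetadataFilter = Clause | { $and: Clause[] };

// Every query is scoped to one tenant and optionally narrowed to some sources.
function buildFilter(tenantId: string, sources: string[] = []): MetadataFilter {
  const tenantClause: Clause = { tenant: { $eq: tenantId } }; // `tenant` field is hypothetical
  if (sources.length === 0) {
    return tenantClause;
  }
  const sourceClause: Clause = { source: { $in: sources } };
  return { $and: [tenantClause, sourceClause] };
}

const tenantFilter = buildFilter('acme', ['help-docs', 'faq']);
// Would be passed as the `filter` option of index.query(...).
```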

5
Add hybrid search
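Pinecone supports sparse-dense hybrid search: alongside each dense embedding you store a sparse vector produced by a model such as SPLADE or BM25. Use it when exact keyword matches matter (proper nouns, product codes, error codes).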
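
A common way to balance the two signals at query time is a convex combination controlled by a weight `alpha`. The sketch below assumes you already have a dense embedding and a sparse vector for the query. `weightHybridQuery` is a hypothetical helper, and the `vector` / `sparseVector` field names are intended to mirror the SDK's query options, which you should verify for your SDK version. As far as we know, sparse-dense vectors in a single index also require the `dotproduct` metric.

```typescript
interface SparseValues {
  indices: number[];
  values: number[];
}

// Convex combination: alpha = 1 is pure dense (semantic), alpha = 0 is pure sparse (keyword).
function weightHybridQuery(
  dense: number[],
  sparse: SparseValues,
  alpha: number,
): { vector: number[]; sparseVector: SparseValues } {
  if (!(alpha >= 0 && alpha <= 1)) {
    throw new RangeError('alpha must be between 0 and 1');
  }
  return {
    vector: dense.map((v) => v * alpha),
    sparseVector: {
      indices: [...sparse.indices],
      values: sparse.values.map((v) => v * (1 - alpha)),
    },
  };
}

// Lean towards semantic similarity while still rewarding exact keyword hits.
const weighted = weightHybridQuery([1, 2], { indices: [10], values: [4] }, 0.75);
```

Treat `alpha` as a tuning knob: start near the dense end and move it towards sparse if queries containing identifiers or product names return poor matches.
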
6
Monitor and tune
Watch the Pinecone dashboard for read units consumed. If costs balloon, lower topK or add a cache layer in front.
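
A cache layer in front of the index can be as small as an in-memory map with a time-to-live. The sketch below is one possible shape, not a production cache: `TtlCache` and `cacheKey` are hypothetical names, and a multi-process deployment would more likely use a shared store.

```typescript
// Minimal in-memory TTL cache. `now` is injectable so expiry can be tested without waiting.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired: drop it and report a miss
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// The same question with the same filter and topK maps to the same cache entry.
function cacheKey(query: string, filter: unknown, topK: number): string {
  return JSON.stringify([query.trim().toLowerCase(), filter, topK]);
}
```

On a hit you skip both the embedding call and the Pinecone query, so even a short TTL can cut read units noticeably for repeated questions.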

Common Pitfalls

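!Wrong dimension at index creation. The dimension is fixed once the index exists, so changing it means creating a new index and upserting everything again.
!Putting raw text in metadata can exceed the 40KB per-record metadata limit. Store a short snippet or a reference to the text instead.
!Free pods pause after 7 days of inactivity. Use serverless if you need always-on.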
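
The first two pitfalls can be caught before a record ever reaches Pinecone. The following is a minimal sketch with hypothetical helper names. It treats 40KB as 40 * 1024 bytes of serialized JSON, which is an assumption about how the limit is measured.

```typescript
// Assumption: the limit is measured on the JSON-serialized metadata, 40KB = 40 * 1024 bytes.
const MAX_METADATA_BYTES = 40 * 1024;

// Size of the metadata in UTF-8 bytes once serialized as JSON.
function metadataBytes(metadata: Record<string, unknown>): number {
  return new TextEncoder().encode(JSON.stringify(metadata)).length;
}

// Throws if the vector has the wrong dimension or the metadata is too large.
function assertRecordFits(
  record: { id: string; values: number[]; metadata?: Record<string, unknown> },
  expectedDimension: number,
): void {
  if (record.values.length !== expectedDimension) {
    throw new Error(
      `Record ${record.id}: expected ${expectedDimension} dimensions, got ${record.values.length}`,
    );
  }
  const size = metadataBytes(record.metadata ?? {});
  if (size > MAX_METADATA_BYTES) {
    throw new Error(`Record ${record.id}: metadata is ${size} bytes, over ${MAX_METADATA_BYTES}`);
  }
}
```

Calling `assertRecordFits` on each record before upserting turns an opaque API error into a message that names the offending record.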

What's Next

->Add a reranker to reorder the topK results by relevance before they reach the LLM.
->Set up a nightly job to re-embed docs whose content has changed since they were indexed.
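
For the nightly job, one way to detect which docs need re-embedding is to compare a content hash against the hash stored at indexing time. This sketch reads "drifted" as "the text changed". `findStaleDocs` and the `storedHashes` map are hypothetical, and where you persist the hashes (record metadata, a database table) is left open.

```typescript
import { createHash } from 'node:crypto';

interface Doc {
  id: string;
  text: string;
}

// Stable fingerprint of a document's text.
function contentHash(text: string): string {
  return createHash('sha256').update(text, 'utf8').digest('hex');
}

// Returns docs that are new or whose text no longer matches the stored hash.
function findStaleDocs(docs: Doc[], storedHashes: Map<string, string>): Doc[] {
  return docs.filter((doc) => storedHashes.get(doc.id) !== contentHash(doc.text));
}
```

Only the returned docs need to go through embedding and upsert again, which keeps the nightly run cheap when most content is unchanged.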
