How to Use LangChain with Pinecone: Step-by-Step Guide for Retrieval-Augmented Generation
LangChain and Pinecone are two powerful tools in the generative AI ecosystem. When combined, they enable developers to build advanced Retrieval-Augmented Generation (RAG) systems that allow large language models (LLMs) to reference external knowledge stored in vector databases.
This guide walks you through how to use LangChain with Pinecone to build scalable, context-aware applications.
Pinecone is a managed vector database that enables fast and scalable similarity search. It's ideal for storing and retrieving document embeddings in applications like semantic search, chatbots, and recommendation systems.
LangChain is a framework that simplifies LLM workflows. By integrating Pinecone as a vector store, you can:
- Store semantic embeddings of your documents
- Retrieve relevant information based on user queries
- Feed retrieved content into an LLM for more accurate responses
This architecture is central to building Retrieval-Augmented Generation (RAG) pipelines.
Before getting started, you’ll need:
- An OpenAI API key (or another embedding model key)
- A Pinecone API key and environment
- The following packages installed:
```bash
pip install langchain langchain-openai pinecone-client openai langchain-pinecone
```
Start by securely setting your API keys:
```python
import os

os.environ["OPENAI_API_KEY"] = "your-openai-key"
os.environ["PINECONE_API_KEY"] = "your-pinecone-key"
os.environ["PINECONE_ENVIRONMENT"] = "your-pinecone-environment"
```
Next, initialize Pinecone and create an index for your embeddings:

```python
import pinecone

pinecone.init(
    api_key=os.getenv("PINECONE_API_KEY"),
    environment=os.getenv("PINECONE_ENVIRONMENT"),
)

index_name = "langchain-index"
if index_name not in pinecone.list_indexes():
    pinecone.create_index(name=index_name, dimension=1536, metric="cosine")

index = pinecone.Index(index_name)
```
Note: OpenAI's `text-embedding-ada-002` model produces 1536-dimensional vectors, so the index dimension must be set to 1536.
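If you're unsure what dimension your embedding model produces, a quick sanity check is to embed a short string and measure the vector length. This is a minimal sketch assuming the `OPENAI_API_KEY` set above:

```python
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()  # defaults to text-embedding-ada-002

# The Pinecone index dimension must match this value (1536 for ada-002).
print(len(embeddings.embed_query("dimension check")))
```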
Next, create the embedding model and connect it to the index:

```python
from langchain_openai import OpenAIEmbeddings
from langchain.vectorstores import Pinecone

embeddings = OpenAIEmbeddings()
vectorstore = Pinecone(index, embeddings.embed_query, "text")
```
This wraps Pinecone in a LangChain-compatible interface.
Split your documents into chunks and upload them to Pinecone:
```python
from langchain.text_splitter import CharacterTextSplitter
from langchain.schema import Document

raw_texts = [
    Document(page_content="Quantum computing uses qubits...", metadata={"source": "doc1"}),
    Document(page_content="Entanglement is a core quantum principle...", metadata={"source": "doc2"}),
]

splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=50)
docs = splitter.split_documents(raw_texts)

vectorstore.add_documents(docs)
```
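Before wiring up an LLM, it can help to confirm that retrieval works on its own. This is a small sketch using the `vectorstore` defined above:

```python
# Return the two most similar chunks for an ad-hoc query.
results = vectorstore.similarity_search("What are qubits?", k=2)
for doc in results:
    print(doc.metadata.get("source"), "->", doc.page_content[:80])
```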
Now build a retrieval-based QA chain that feeds the retrieved chunks into an LLM:

```python
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI

retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
llm = ChatOpenAI(model="gpt-4o", temperature=0)  # gpt-4o is a chat model, so use ChatOpenAI

qa_chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever, chain_type="stuff")

response = qa_chain.run("What is quantum entanglement?")
print(response)
```
LangChain uses Pinecone to retrieve the top matching documents and injects them into the LLM prompt for a more informed answer.
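To see exactly what gets injected into the prompt, you can call the retriever directly and inspect the returned chunks (a quick sketch using the `retriever` defined above):

```python
# Inspect the chunks the retriever would hand to the LLM.
retrieved = retriever.get_relevant_documents("What is quantum entanglement?")
for i, doc in enumerate(retrieved, start=1):
    print(f"[{i}] {doc.metadata.get('source')}: {doc.page_content[:100]}")
```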
Control the relevance of results using score thresholds:
```python
retriever = vectorstore.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"score_threshold": 0.4, "k": 5},
)
```
This prevents low-relevance documents from polluting your LLM prompt.
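If you're unsure what cutoff to use, one way to calibrate it is to print raw similarity scores for a few representative queries; the 0.4 above is only a starting point. A minimal sketch using the `vectorstore` from earlier:

```python
# Print similarity scores so you can pick a sensible threshold.
scored = vectorstore.similarity_search_with_score("What is quantum entanglement?", k=5)
for doc, score in scored:
    print(round(score, 3), doc.metadata.get("source"))
```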
Here is a recap of the steps so far:

| Step | What You Did |
|---|---|
| Set Environment Variables | Secured API keys for OpenAI and Pinecone |
| Initialized Pinecone | Created and accessed a vector index |
| Created Embeddings | Used OpenAI to embed documents |
| Added Documents to Index | Split and stored document chunks |
| Built QA Chain | Used LangChain to create a retrieval-based chain |
| Queried System | Used LLM + retriever for accurate responses |
To integrate Pinecone with LangChain, follow these essential steps:
Step 1: Initialize Pinecone
```python
import pinecone

pinecone.init(api_key="YOUR_API_KEY", environment="us-east-1")
```
Step 2: Create a Vector Index
Choose a name, a dimension (e.g., 1536 for OpenAI's `text-embedding-ada-002`), and a similarity metric.
```python
pinecone.create_index("langchain-index", dimension=1536, metric="cosine")
```
Step 3: Connect to the Index
```python
index = pinecone.Index("langchain-index")
```
Step 4: Use with LangChain
```python
from langchain.vectorstores import Pinecone
from langchain_openai import OpenAIEmbeddings

embed_model = OpenAIEmbeddings()
vectorstore = Pinecone(index, embed_model.embed_query, "text")
```
Choosing the right embedding model depends on your use case:
| Model | Dimension | Best For | Cost |
|---|---|---|---|
| `text-embedding-ada-002` (OpenAI) | 1536 | General-purpose, high quality | Low |
| `e5-base-v2` (Hugging Face) | 768 | Open-source, good performance | Free (self-hosted) |
| Cohere `embed-v3` | 1024 | High-quality semantic embeddings | Medium |
Tips:
- Use `text-embedding-ada-002` if you're already on OpenAI.
- Use E5 or Cohere if you want to reduce vendor dependency or cost.
- Ensure your Pinecone index dimension matches the model's embedding dimension (see the sketch below).
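For example, here is a rough sketch of swapping in an open-source E5 model through LangChain's Hugging Face integration. It assumes the `langchain-community` and `sentence-transformers` packages are installed; note that the Pinecone index would then need `dimension=768`:

```python
from langchain_community.embeddings import HuggingFaceEmbeddings

# intfloat/e5-base-v2 produces 768-dimensional vectors, so the Pinecone
# index must be created with dimension=768 rather than 1536.
e5_embeddings = HuggingFaceEmbeddings(model_name="intfloat/e5-base-v2")

# E5 models expect a "query: " prefix on search queries.
vector = e5_embeddings.embed_query("query: what is quantum entanglement?")
print(len(vector))  # 768
```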
Key options that influence speed and relevance:
- Similarity Metric: `cosine` (recommended), `dotproduct`, or `euclidean`.
- Dimension Size: Higher dimensions may increase retrieval quality but affect performance.
- Index Type:
  - Serverless: Fast, scalable, great for real-time use.
  - Pod-based: Customizable performance with higher throughput.
- k (top_k): Number of results returned (e.g., `k=5`); see the sketch below.
- Score Threshold: Filter out weak matches by setting a minimum similarity score.
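To see how `top_k` behaves at the Pinecone level, you can also query the index directly, bypassing LangChain. This is a sketch assuming the `index` and `embeddings` objects created earlier and the classic pinecone-client API used throughout this guide:

```python
# Embed the query and ask Pinecone for the 5 nearest vectors.
query_vector = embeddings.embed_query("What is quantum entanglement?")
results = index.query(vector=query_vector, top_k=5, include_metadata=True)

# Each match carries its similarity score and the stored metadata,
# including the "text" field written by the LangChain wrapper.
for match in results.matches:
    print(round(match.score, 3), match.metadata.get("text"))
```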
To control cost effectively:
- Batch Embeddings: Group multiple documents in a single API call (see the sketch below).
- Avoid Reprocessing: Don't re-embed unchanged content.
- Use Short Chunks: An optimal chunk size (e.g., 300–500 tokens) balances accuracy and API usage.
- Consider Open-Source Models: Use E5 or MiniLM as lower-cost or free alternatives.
- Limit Retrievals: Use a low `k` and set score thresholds to reduce Pinecone usage.
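As an illustration of the first two tips, here is a minimal sketch that embeds chunks in one batched call and skips chunks whose content hash has already been indexed. The hash cache is a hypothetical example, not part of LangChain or Pinecone:

```python
import hashlib

# Hypothetical cache of content hashes already present in the index.
already_indexed: set[str] = set()

def embed_new_chunks(chunks, embeddings):
    """Embed only unseen chunks, using a single batched API call."""
    new_chunks = []
    for chunk in chunks:
        digest = hashlib.sha256(chunk.page_content.encode("utf-8")).hexdigest()
        if digest not in already_indexed:
            already_indexed.add(digest)
            new_chunks.append(chunk)
    if not new_chunks:
        return []
    # embed_documents sends the texts as a batch instead of one call per chunk.
    vectors = embeddings.embed_documents([c.page_content for c in new_chunks])
    return list(zip(new_chunks, vectors))
```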
- Dimension Mismatch: The vector dimension must match the embedding model's output (e.g., 1536 for ada-002).
- Incorrect API Init: Always initialize with `pinecone.init()` before using any index.
- Forgetting to Upsert: Ensure data is inserted (via `.upsert()` or `add_documents()`) before retrieval.
- Model Drift: Use the same embedding model for indexing and querying.
- Index Not Ready: Wait for the index to become active before querying (see the sketch below).
- Overloaded Index: Avoid uploading too many documents at once; batch them.
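Two of these checks are easy to automate. The sketch below, which assumes the classic pinecone-client API and the `embeddings` object from earlier, waits for the index to report ready and then confirms its dimension matches the embedding model:

```python
import time

index_name = "langchain-index"

# Wait until the index reports ready before querying it.
while not pinecone.describe_index(index_name).status["ready"]:
    time.sleep(1)

# Confirm the index dimension matches the embedding model's output.
description = pinecone.describe_index(index_name)
assert int(description.dimension) == len(embeddings.embed_query("dimension check"))
```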
Best practices at a glance:

| Task | Tip/Recommendation |
|---|---|
| Index Setup | Use the `cosine` metric and a 1536-dimension index for ada-002 |
| Embedding Model | Use OpenAI `text-embedding-ada-002` or `e5-base-v2` |
| Performance Tuning | Adjust `k`, use score thresholds, consider serverless |
| Cost Optimization | Batch embeddings, avoid re-embedding |
| Avoid Pitfalls | Ensure correct dimension, upsert data, keep models in sync |
By integrating LangChain with Pinecone, you gain access to a powerful RAG architecture that:
- Enhances LLM accuracy with external knowledge
- Enables semantic search and personalized retrieval
- Scales reliably with your growing data
This combination is ideal for building chatbots, AI assistants, knowledge systems, and production-grade LLM apps.