Pinecone Adapter¶
A fully managed, serverless vector database for production, with no infrastructure to manage.
Overview¶
The Pinecone adapter provides persistent vector storage using Pinecone, a fully managed serverless vector database. It is a good fit for production systems that need performance without the operational burden of running their own infrastructure.
Key Features:

- ✓ Fully managed service
- ✓ Serverless architecture
- ✓ Global deployment
- ✓ Auto-scaling
- ✓ High availability built-in
- ✓ No infrastructure management
- ✓ Production-ready out of the box
Installation¶
# Install Pinecone client
pip install "pinecone-client>=3.0.0"  # quotes keep the shell from treating >= as redirection
# Or with axon-sdk
pip install "axon-sdk[all]"
# Get API key from: https://app.pinecone.io
Basic Usage¶
from axon import MemorySystem
from axon.core.config import MemoryConfig
from axon.core.policies import PersistentPolicy
config = MemoryConfig(
    persistent=PersistentPolicy(
        adapter_type="pinecone",
        compaction_threshold=20000
    )
)
memory = MemorySystem(config)
# Store with serverless scaling
await memory.store("Production knowledge", importance=0.8)
Configuration¶
API Key Setup¶
from axon.adapters.pinecone import PineconeAdapter
# Basic configuration
adapter = PineconeAdapter(
    api_key="your-api-key",
    index_name="memories",
    environment="us-east1-gcp"  # or your preferred region
)
Environment Variables¶
export PINECONE_API_KEY=your-api-key
export PINECONE_ENVIRONMENT=us-east1-gcp
export PINECONE_INDEX=memories
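To build the adapter from those variables explicitly, a minimal sketch (the fallback defaults here are illustrative, not SDK behavior):

import os

from axon.adapters.pinecone import PineconeAdapter

# Read the variables exported above; os.environ raises KeyError if the
# required API key is missing, which fails fast at startup
adapter = PineconeAdapter(
    api_key=os.environ["PINECONE_API_KEY"],
    index_name=os.environ.get("PINECONE_INDEX", "memories"),
    environment=os.environ.get("PINECONE_ENVIRONMENT", "us-east1-gcp"),
)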
Using with Templates¶
from axon.core.templates import PINECONE_CONFIG
# PINECONE_CONFIG uses Pinecone for persistent tier
memory = MemorySystem(PINECONE_CONFIG)
Features¶
Serverless Architecture¶
No servers to manage:
# Pinecone handles all infrastructure
# - Auto-scaling
# - Load balancing
# - High availability
# - Backups
# - Monitoring
# You just use it
results = await memory.recall(
    "query",
    k=50,
    tier="persistent"
)
Global Deployment¶
Choose your region:
# US East (GCP)
adapter = PineconeAdapter(
    api_key="key",
    index_name="memories",
    environment="us-east1-gcp"
)

# EU West (AWS)
adapter = PineconeAdapter(
    api_key="key",
    index_name="memories",
    environment="eu-west1-aws"
)

# Asia Pacific (GCP)
adapter = PineconeAdapter(
    api_key="key",
    index_name="memories",
    environment="asia-southeast1-gcp"
)
Metadata Filtering¶
Advanced filtering capabilities:
from axon.models.filter import Filter
results = await memory.recall(
    "query",
    filter=Filter(
        tags=["verified"],
        min_importance=0.7,
        metadata={"category": "technical", "verified": True}
    ),
    k=20
)
Namespaces¶
Multi-tenancy support:
# Different namespaces for isolation
tenant1_adapter = PineconeAdapter(
    api_key="key",
    index_name="memories",
    namespace="tenant_123"
)

tenant2_adapter = PineconeAdapter(
    api_key="key",
    index_name="memories",
    namespace="tenant_456"
)
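Namespaces scope every read and write, so the same query against two namespaces returns disjoint results. A sketch with the official client (query_embedding is again a stand-in for your embedded query):

from pinecone import Pinecone

pc = Pinecone(api_key="key")
index = pc.Index("memories")

# The same query hits each namespace independently
for ns in ("tenant_123", "tenant_456"):
    response = index.query(
        vector=query_embedding,  # hypothetical embedded query
        top_k=5,
        namespace=ns,
    )
    print(ns, len(response.matches))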
Use Cases¶
✅ Perfect For¶
- Persistent Tier: Production knowledge base
- Startups and MVPs (fast time-to-market)
- Applications requiring global reach
- Teams without DevOps resources
- High-availability requirements
- Unpredictable scaling needs
- Cost-effective at small-medium scale
❌ Not Suitable For¶
- Cost-sensitive workloads at massive scale (>10M vectors)
- On-premise requirements
- Teams that need full control over their infrastructure
- Ephemeral/session tiers (use Redis instead)
Examples¶
Production Knowledge Base¶
# Serverless knowledge base (`content` and `category` come from your
# own data pipeline; shown here as placeholders)
for i in range(100000):
    await memory.store(
        f"Knowledge entry {i}: {content}",
        importance=0.8,
        tier="persistent",
        tags=["knowledge", category]
    )

# Global fast search
results = await memory.recall(
    "What is machine learning?",
    k=50,
    tier="persistent"
)
Multi-Tenant SaaS¶
from axon import MemorySystem
from axon.adapters.pinecone import PineconeAdapter
from axon.core.config import MemoryConfig
from axon.core.policies import PersistentPolicy

# Tenant isolation with namespaces
class TenantMemory:
    def __init__(self, tenant_id: str):
        self.adapter = PineconeAdapter(
            api_key="key",
            index_name="saas_memories",
            namespace=f"tenant_{tenant_id}"
        )
        config = MemoryConfig(
            persistent=PersistentPolicy(adapter=self.adapter)
        )
        self.memory = MemorySystem(config)

    async def store(self, text: str, **kwargs):
        return await self.memory.store(text, **kwargs)

    async def recall(self, query: str, **kwargs):
        return await self.memory.recall(query, **kwargs)

# Usage
tenant_123 = TenantMemory("123")
await tenant_123.store("Tenant-specific data")
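Constructing a TenantMemory per request would re-create adapters constantly; a minimal per-tenant cache (the registry is illustrative, not part of the SDK):

# Hypothetical registry: one TenantMemory instance per tenant
_tenant_cache: dict[str, TenantMemory] = {}

def get_tenant_memory(tenant_id: str) -> TenantMemory:
    if tenant_id not in _tenant_cache:
        _tenant_cache[tenant_id] = TenantMemory(tenant_id)
    return _tenant_cache[tenant_id]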
RAG Application¶
# Serverless RAG with Pinecone (assumes `memory` and an `llm` client
# are initialized elsewhere in your application)
async def build_rag_system(documents: list[str]):
    # Ingest documents
    for doc in documents:
        await memory.store(
            doc,
            importance=0.8,
            tier="persistent"
        )

    # Query function
    async def answer(question: str) -> str:
        # Retrieve context (serverless, auto-scaled)
        context = await memory.recall(
            question,
            k=5,
            tier="persistent"
        )
        # Generate answer
        context_text = "\n".join([c.text for c in context])
        return await llm.generate(
            f"Context:\n{context_text}\n\nQuestion: {question}"
        )

    return answer
Performance¶
| Operation | Latency | Throughput | Scale |
|---|---|---|---|
| save() | 20-150ms | 200-1000 ops/sec | Unlimited |
| query() | 20-100ms | 100-500 ops/sec | Unlimited |
| get() | 20-100ms | 200-1000 ops/sec | Unlimited |
| delete() | 20-100ms | 200-1000 ops/sec | Unlimited |
Note: Latency includes network overhead. Auto-scales to handle traffic spikes.
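Your numbers will vary by region and payload size; a quick way to measure from your own environment (measure_recall_latency is illustrative and assumes an initialized memory system):

import time

# Hypothetical helper: average recall() latency in milliseconds
async def measure_recall_latency(query: str, runs: int = 20) -> float:
    start = time.perf_counter()
    for _ in range(runs):
        await memory.recall(query, k=10, tier="persistent")
    return (time.perf_counter() - start) / runs * 1000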
Production Deployment¶
Application Setup¶
# app.py - Production configuration
import os

from axon import MemorySystem
from axon.core.templates import PINECONE_CONFIG

# Fail fast if the API key is not set in the environment
if not os.getenv('PINECONE_API_KEY'):
    raise RuntimeError('PINECONE_API_KEY is not set')

# Initialize memory system
memory = MemorySystem(PINECONE_CONFIG)

# Use in your application (assumes a Flask `app`; Flask 2+ supports async views)
@app.route('/store', methods=['POST'])
async def store_memory():
    text = request.json['text']
    await memory.store(text, importance=0.8)
    return {'status': 'success'}

@app.route('/recall', methods=['POST'])
async def recall_memories():
    query = request.json['query']
    results = await memory.recall(query, k=10)
    return {'results': [r.dict() for r in results]}
Environment Configuration¶
# .env file
PINECONE_API_KEY=your-production-api-key
PINECONE_ENVIRONMENT=us-east1-gcp
PINECONE_INDEX=prod-memories
# Load in application
from dotenv import load_dotenv
load_dotenv()
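It is worth failing fast when a variable is missing rather than at the first request; a minimal sketch:

import os

from dotenv import load_dotenv

load_dotenv()

# Validate everything the adapter needs before the app starts serving
REQUIRED = ("PINECONE_API_KEY", "PINECONE_ENVIRONMENT", "PINECONE_INDEX")
missing = [name for name in REQUIRED if not os.getenv(name)]
if missing:
    raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")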
Docker Deployment¶
# Dockerfile
FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
# PINECONE_API_KEY is supplied at runtime (see docker-compose below);
# avoid baking secrets into the image with ENV
CMD ["python", "app.py"]

# docker-compose.yml
version: '3.8'
services:
  app:
    build: .
    environment:
      - PINECONE_API_KEY=${PINECONE_API_KEY}
      - PINECONE_ENVIRONMENT=us-east1-gcp
      - PINECONE_INDEX=memories
    ports:
      - "8000:8000"
Best Practices¶
1. Use for Persistent Tier¶
# ✓ Good: Managed persistent storage
persistent=PersistentPolicy(adapter_type="pinecone")
# ✗ Bad: Expensive for ephemeral (use Redis)
ephemeral=EphemeralPolicy(adapter_type="pinecone")
2. Optimize Batch Operations¶
# Batch writes for better throughput
entries = [
    MemoryEntry(text=f"Entry {i}", ...)
    for i in range(100)
]

# Pinecone batches upserts internally, but chunking on your end keeps
# request sizes predictable (`chunks` is sketched below)
for batch in chunks(entries, 100):
    for entry in batch:
        await adapter.save(entry)
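`chunks` is not part of the SDK; a minimal sketch of such a helper:

from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

def chunks(items: Iterable[T], size: int) -> Iterator[list[T]]:
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch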
3. Use Namespaces for Isolation¶
# Multi-tenant isolation
tenant_adapter = PineconeAdapter(
api_key="key",
index_name="memories",
namespace=f"tenant_{tenant_id}" # Isolated namespace
)
4. Monitor Usage¶
# Check index stats
from pinecone import Pinecone
pc = Pinecone(api_key="key")
index = pc.Index("memories")
stats = index.describe_index_stats()
print(f"Total vectors: {stats.total_vector_count}")
print(f"Namespaces: {stats.namespaces}")
Troubleshooting¶
API Key Issues¶
# Test connection
from pinecone import Pinecone
pc = Pinecone(api_key="your-key")
print(pc.list_indexes()) # Should list indexes
# If fails, check:
# 1. API key is correct
# 2. API key has permissions
# 3. Network allows outbound HTTPS
Index Not Found¶
# Create index if needed
from pinecone import Pinecone, ServerlessSpec
pc = Pinecone(api_key="key")
# Check if exists
indexes = pc.list_indexes()
if "memories" not in [i.name for i in indexes]:
# Create index
pc.create_index(
name="memories",
dimension=1536, # Match your embedding model
metric="cosine",
spec=ServerlessSpec(
cloud="aws",
region="us-east-1"
)
)
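create_index returns before the index is ready to serve; continuing the example above, poll until it reports ready:

import time

# Block until the new index can accept reads and writes
while not pc.describe_index("memories").status["ready"]:
    time.sleep(1)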
Slow Queries¶
# Add more specific filters
results = await memory.recall(
"query",
filter=Filter(
tags=["specific"], # Reduces search space
metadata={"category": "narrow"}
),
k=10 # Fewer results
)
# Or use namespaces for partitioning
adapter = PineconeAdapter(
api_key="key",
index_name="memories",
namespace="specific_partition"
)
Migration¶
From Qdrant to Pinecone¶
# Export from Qdrant
qdrant_memory = MemorySystem(QDRANT_CONFIG)
entries = await qdrant_memory.export(tier="persistent")
# Import to Pinecone
pinecone_config = MemoryConfig(
    persistent=PersistentPolicy(adapter_type="pinecone")
)
pinecone_memory = MemorySystem(pinecone_config)
await pinecone_memory.import_data(entries, tier="persistent")
From ChromaDB to Pinecone¶
# Export from ChromaDB
chroma_memory = MemorySystem(STANDARD_CONFIG)
entries = await chroma_memory.export(tier="persistent")
# Import to Pinecone (with batching)
pinecone_memory = MemorySystem(PINECONE_CONFIG)
for batch in chunks(entries, 100):
    await pinecone_memory.import_data(batch, tier="persistent")
    print(f"Imported {len(batch)} entries")
Cost Analysis¶
Pricing Model (as of 2024)¶
| Plan | Monthly Cost | Included | Per Vector/Month |
|---|---|---|---|
| Starter | $0 | 100K vectors | Free |
| Standard | $70 | 100K vectors | $0.0012 |
| Enterprise | Custom | Custom | Discounted |
Cost Optimization¶
# Estimate costs
def estimate_pinecone_cost(num_vectors: int) -> float:
    """Estimate monthly Pinecone cost in USD."""
    if num_vectors <= 100000:
        return 0  # Free tier

    # Standard tier
    base_cost = 70  # First 100K included
    extra_vectors = num_vectors - 100000
    extra_cost = extra_vectors * 0.0012
    return base_cost + extra_cost

# Examples
print(f"1M vectors: ${estimate_pinecone_cost(1_000_000):,.0f}/month")
# Output: 1M vectors: $1,150/month
print(f"10M vectors: ${estimate_pinecone_cost(10_000_000):,.0f}/month")
# Output: 10M vectors: $11,950/month
Cost vs Qdrant¶
| Scale | Pinecone | Qdrant (Self-Hosted) | Winner |
|---|---|---|---|
| 100K vectors | $0 | $50 | Pinecone |
| 1M vectors | $1,150 | $100 | Qdrant |
| 10M vectors | $11,950 | $500 | Qdrant |
Recommendation:

- < 1M vectors: Pinecone (simplicity + free tier)
- > 1M vectors: consider Qdrant (cost-effective at scale)
Comparison¶
Pinecone vs Other Adapters¶
| Feature | Pinecone | Qdrant | ChromaDB | Redis |
|---|---|---|---|---|
| Management | Fully managed | Self-hosted | Embedded | Self/Managed |
| Setup Time | < 5 min | 30+ min | < 1 min | 10-30 min |
| Scaling | Auto | Manual | Single node | Manual |
| Cost (1M) | $1,150/mo | $100/mo | Free | $50/mo |
| Global | Yes | Manual | No | Yes |
| Best For | Startups | Large scale | Development | Caching |
Next Steps¶
- Custom Adapter: Build your own storage adapter.
- Performance Comparison: Compare all adapter performance.
- Production Deployment: Deploy to production with best practices.