LangChain Integration
arango-typed provides seamless integration with LangChain, enabling you to build powerful RAG (Retrieval-Augmented Generation) systems, vector stores, and AI applications using ArangoDB as your vector database backend.
What is LangChain Integration?
LangChain is a framework for developing applications powered by language models. arango-typed's LangChain integration provides:
- VectorStore Implementation: Compatible with LangChain's VectorStore interface for storing and retrieving embeddings
- RAG Support: Built-in RAG (Retrieval-Augmented Generation) implementation for context-aware AI applications
- MCP Support: Model Context Protocol implementation for unified LLM interactions
- Retriever Interface: LangChain-compatible retrievers for use in chains
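All three classes ship from the same module path, matching the imports used in the examples below:
import {
  ArangoLangChainStore, // LangChain VectorStore backed by ArangoDB
  ArangoRAG,            // retrieval pipeline with reranking, hybrid search, and filtering
  ArangoMCP             // Model Context Protocol interface for LLM interactions
} from 'arango-typed/integrations/langchain';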
Prerequisites
Before using LangChain integration, ensure you have:
- arango-typed and arangojs installed
- An embeddings provider (OpenAI, HuggingFace, local models, etc.)
- LangChain core packages installed
- ArangoDB instance running and connected
npm install arango-typed arangojs
npm install @langchain/core @langchain/textsplitters
# Optional: For OpenAI embeddings
npm install @langchain/openai
# Optional: For other embedding providers
npm install @langchain/community
Installation and Setup
First, connect to your ArangoDB database:
import { connect, getDatabase } from 'arango-typed';
// Connect to ArangoDB
await connect({
url: 'http://localhost:8529',
database: 'myapp',
username: 'root',
password: ''
});
const db = getDatabase();
ArangoLangChainStore - VectorStore Implementation
ArangoLangChainStore is a LangChain-compatible VectorStore implementation that uses ArangoDB for storing and retrieving document embeddings.
Creating a VectorStore
You can create a vector store in several ways:
import { ArangoLangChainStore } from 'arango-typed/integrations/langchain';
import { OpenAIEmbeddings } from '@langchain/openai';
import { getDatabase } from 'arango-typed';
const db = getDatabase();
// Option 1: Create from texts
const store = await ArangoLangChainStore.fromTexts(
['Document 1 text', 'Document 2 text'],
[{ source: 'doc1' }, { source: 'doc2' }],
new OpenAIEmbeddings({ openAIApiKey: 'your-key' }),
{ database: db, collectionName: 'documents' }
);
// Option 2: Create from documents
const documents = [
{ pageContent: 'Document 1', metadata: { source: 'doc1' } },
{ pageContent: 'Document 2', metadata: { source: 'doc2' } }
];
const store2 = await ArangoLangChainStore.fromDocuments(
documents,
new OpenAIEmbeddings({ openAIApiKey: 'your-key' }),
{ database: db, collectionName: 'documents' }
);
// Option 3: Create with custom options
const store3 = new ArangoLangChainStore(
new OpenAIEmbeddings({ openAIApiKey: 'your-key' }),
{ database: db, collectionName: 'documents' },
{
vectorField: 'embedding', // Field name for embeddings (default: 'embedding')
textField: 'text', // Field name for text content (default: 'text')
metadataFields: ['source', 'author'] // Fields to preserve as metadata
}
);
Adding Documents
Add documents to the vector store:
// Add documents (embeddings generated automatically)
const ids = await store.addDocuments([
{ pageContent: 'New document text', metadata: { source: 'new-doc' } }
]);
// Add vectors directly (if you already have embeddings)
const vectors = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]];
const ids2 = await store.addVectors(vectors, [
{ pageContent: 'Document 1', metadata: {} },
{ pageContent: 'Document 2', metadata: {} }
]);
Similarity Search
Search for similar documents:
// Basic similarity search
const results = await store.similaritySearch('query text', 5);
// Similarity search with scores
const resultsWithScores = await store.similaritySearchWithScore('query text', 5);
// Similarity search with metadata filtering
const filteredResults = await store.similaritySearch(
'query text',
5,
{ source: 'doc1', category: 'tech' }
);
Using with Models
You can use arango-typed Models with the vector store for automatic validation and hooks:
import { model, Schema } from 'arango-typed';
const DocumentSchema = new Schema({
text: { type: String, required: true },
embedding: { type: Array, required: true },
source: String,
metadata: Object
});
const Document = model('documents', DocumentSchema);
const storeWithModel = new ArangoLangChainStore(
new OpenAIEmbeddings({ openAIApiKey: 'your-key' }),
{
database: db,
collectionName: 'documents',
model: Document // Use model for automatic validation
}
);
ArangoRAG - RAG Implementation
ArangoRAG provides a complete RAG (Retrieval-Augmented Generation) implementation with support for reranking, hybrid search, and metadata filtering.
Creating a RAG Instance
import { ArangoRAG } from 'arango-typed/integrations/langchain';
import { OpenAIEmbeddings } from '@langchain/openai';
const rag = new ArangoRAG(
new OpenAIEmbeddings({ openAIApiKey: 'your-key' }),
db,
{
collectionName: 'documents',
vectorField: 'embedding',
textField: 'text',
topK: 5, // Number of documents to retrieve
scoreThreshold: 0.7, // Minimum similarity score
reranker: async (docs) => { // Optional reranker function
// Custom reranking logic; see the Reranking section below for a full example
return docs;
}
}
);
Retrieving Documents
// Basic retrieval
const documents = await rag.retrieve('user query');
// Retrieval with metadata filtering
const filteredDocs = await rag.retrieveWithMetadata(
'user query',
{ category: 'tech', author: 'John' }
);
// Hybrid search (vector + keyword)
const hybridResults = await rag.hybridRetrieve(
'user query',
'keyword search terms',
{ category: 'tech' }
);
Creating a Retriever for LangChain Chains
// Create a retriever
const retriever = rag.createRetriever({ category: 'tech' });
// Use in a LangChain chain: retrieve context first, then fill the prompt
import { ChatPromptTemplate } from '@langchain/core/prompts';
import { ChatOpenAI } from '@langchain/openai';
const prompt = ChatPromptTemplate.fromTemplate(`
Use the following context to answer the question:
{context}
Question: {question}
`);
const llm = new ChatOpenAI({ modelName: 'gpt-4' });
const question = 'What is TypeScript?';
const docs = await retriever.getRelevantDocuments(question);
const context = docs.map(d => d.pageContent).join('\n\n');
const result = await prompt.pipe(llm).invoke({ context, question });
ArangoMCP - Model Context Protocol
ArangoMCP provides a unified interface for LLMs to interact with ArangoDB, combining vector search and graph capabilities.
Creating an MCP Instance
import { ArangoMCP } from 'arango-typed/integrations/langchain';
// Basic MCP (vector search only)
const mcp = new ArangoMCP(db);
// MCP with graph support
const mcpWithGraph = new ArangoMCP(db, 'myGraph');
Getting Context
// Get context with embeddings
const context = await mcp.getContext({
query: 'user question',
embeddings: [0.1, 0.2, 0.3, ...], // Query embeddings
metadata: { category: 'tech' },
graphTraversal: true // Enable graph context if graph is configured
});
// Get context with graph paths
const pathContext = await mcpWithGraph.getContextWithPaths(
'users/123', // Start vertex
'users/456', // End vertex (optional)
3 // Max depth
);
Storing Context
// Store single context
const docId = await mcp.storeContext(
'context_collection',
'document text',
[0.1, 0.2, 0.3, ...], // Embeddings
{ source: 'doc1', category: 'tech' }
);
// Batch store contexts
const docIds = await mcp.storeContexts(
'context_collection',
[
{ text: 'Doc 1', embeddings: [0.1, 0.2], metadata: { source: 'doc1' } },
{ text: 'Doc 2', embeddings: [0.3, 0.4], metadata: { source: 'doc2' } }
]
);
// Update context
await mcp.updateContext('context_collection', docId, {
updatedAt: new Date(),
category: 'updated-category'
});
// Delete context
await mcp.deleteContext('context_collection', docId);
Complete RAG Example
Here's a complete example of building a RAG system:
import { connect, getDatabase, model, Schema } from 'arango-typed';
import { ArangoRAG, ArangoLangChainStore } from 'arango-typed/integrations/langchain';
import { OpenAIEmbeddings, ChatOpenAI } from '@langchain/openai';
import { ChatPromptTemplate } from '@langchain/core/prompts';
import { RecursiveCharacterTextSplitter } from '@langchain/textsplitters';
// 1. Connect to database
await connect({
url: 'http://localhost:8529',
database: 'rag_app',
username: 'root',
password: ''
});
const db = getDatabase();
// 2. Create embeddings instance
const embeddings = new OpenAIEmbeddings({ openAIApiKey: process.env.OPENAI_API_KEY });
// 3. Create document schema
const DocumentSchema = new Schema({
text: { type: String, required: true },
embedding: { type: Array, required: true },
source: String,
metadata: Object,
createdAt: { type: Date, default: Date.now }
});
const Document = model('documents', DocumentSchema);
// 4. Index documents
async function indexDocuments(texts: string[], metadata: Record<string, any>[]) {
const splitter = new RecursiveCharacterTextSplitter({
chunkSize: 1000,
chunkOverlap: 150
});
const documents: { pageContent: string; metadata: Record<string, any> }[] = [];
for (let i = 0; i < texts.length; i++) {
const chunks = await splitter.splitText(texts[i]);
// Each chunk inherits the metadata of the source text it was split from
for (const chunk of chunks) {
documents.push({ pageContent: chunk, metadata: metadata[i] || {} });
}
}
const store = await ArangoLangChainStore.fromDocuments(
documents,
embeddings,
{ database: db, collectionName: 'documents', model: Document }
);
return store;
}
// 5. Create RAG instance
const rag = new ArangoRAG(embeddings, db, {
collectionName: 'documents',
topK: 5,
scoreThreshold: 0.7
});
// 6. Create LangChain chain
const prompt = ChatPromptTemplate.fromTemplate(`
You are a helpful assistant. Use the following context to answer the question.
If you don't know the answer, say so.
Context:
{context}
Question: {question}
Answer:
`);
const llm = new ChatOpenAI({
modelName: 'gpt-4',
temperature: 0.7
});
const retriever = rag.createRetriever();
// 7. Query function
async function askQuestion(question: string) {
// Retrieve relevant documents
const docs = await retriever.getRelevantDocuments(question);
const context = docs.map(d => d.pageContent).join('\n\n');
// Generate answer
const result = await prompt.pipe(llm).invoke({ context, question });
return result.content;
}
// Usage
const answer = await askQuestion('What is TypeScript?');
console.log(answer);
Advanced Usage
Custom Embeddings Provider
You can use any embeddings provider that implements the LangChainEmbeddings interface:
interface LangChainEmbeddings {
embedDocuments(texts: string[]): Promise<number[][]>;
embedQuery(text: string): Promise<number[]>;
}
// Example: Custom embeddings
class CustomEmbeddings implements LangChainEmbeddings {
async embedDocuments(texts: string[]): Promise<number[][]> {
// Your embedding logic
return texts.map(text => this.embed(text));
}
async embedQuery(text: string): Promise<number[]> {
return this.embed(text);
}
private embed(text: string): number[] {
// Your embedding implementation
return [];
}
}
const store = new ArangoLangChainStore(
new CustomEmbeddings(),
{ database: db, collectionName: 'documents' }
);
Reranking
Implement custom reranking for better relevance:
const rag = new ArangoRAG(embeddings, db, {
collectionName: 'documents',
topK: 10, // Retrieve more initially
reranker: async (docs) => {
// Custom reranking logic
// Example: Boost documents with certain metadata
return docs.sort((a, b) => {
const scoreA = a.metadata.priority || 0;
const scoreB = b.metadata.priority || 0;
return scoreB - scoreA;
}).slice(0, 5); // Return top 5 after reranking
}
});
Multi-Tenancy Support
Combine with multi-tenancy for isolated document storage:
import { tenantMiddleware } from 'arango-typed/integrations/express';
// Enable multi-tenancy
app.use(tenantMiddleware({ extractFrom: 'header' }));
const Document = model('documents', DocumentSchema, { tenantEnabled: true });
// Documents are automatically filtered by tenant
const store = new ArangoLangChainStore(
embeddings,
{ database: db, collectionName: 'documents', model: Document }
);
// Retrieval automatically respects tenant context
const docs = await store.similaritySearch('query', 5);
Best Practices
- Chunking Strategy: Use 800-1200 tokens per chunk with 10-20% overlap for optimal retrieval quality
- Metadata Filtering: Add rich metadata (source, category, date, etc.) and use filters during retrieval for better relevance
- Hybrid Search: Use hybridRetrieve when keyword precision is important alongside semantic similarity
- Score Thresholds: Set appropriate scoreThreshold values to filter out low-relevance results
- Reranking: Implement custom reranking for domain-specific relevance improvements
- Performance: Enable precomputed magnitudes for vector search, index metadata fields, and use connection pooling
- Freshness: Maintain updatedAt fields and periodically re-embed changed content
- Error Handling: Always handle embedding generation failures and database connection errors gracefully
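The sketch below ties several of these recommendations together: character-based chunking with roughly 15% overlap, a score threshold, and metadata filtering at query time, reusing the embeddings and db handles from the setup above. The ensureIndex call at the end is plain arangojs and assumes db exposes collection access; adjust it to however you reach the underlying Database handle.
import { RecursiveCharacterTextSplitter } from '@langchain/textsplitters';
import { ArangoRAG } from 'arango-typed/integrations/langchain';

// Chunking: ~1000 characters with 15% overlap (within the 10-20% guideline)
const splitter = new RecursiveCharacterTextSplitter({ chunkSize: 1000, chunkOverlap: 150 });

// Score threshold drops low-relevance results at retrieval time
const rag = new ArangoRAG(embeddings, db, {
  collectionName: 'documents',
  topK: 5,
  scoreThreshold: 0.7
});

// Metadata filtering narrows the candidate set before ranking
const docs = await rag.retrieveWithMetadata('user query', { category: 'tech' });

// Index the metadata fields you filter on (assumption: db is an arangojs Database)
await db.collection('documents').ensureIndex({
  type: 'persistent',
  fields: ['category', 'source']
});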
Common Use Cases
- Document Q&A Systems: Build question-answering systems over large document collections
- Code Search: Semantic search over codebases for finding similar code patterns
- Customer Support: RAG-powered chatbots with knowledge base retrieval
- Research Assistants: Academic paper search and summarization systems
- Content Recommendations: Similar content discovery based on semantic similarity
Troubleshooting
Common Issues
- Embedding Dimension Mismatch: Ensure all embeddings use the same dimension (e.g., 1536 for OpenAI)
- Collection Not Found: Make sure the collection exists or enable auto-creation in connection options
- Low Retrieval Quality: Try adjusting chunk size, overlap, or implementing reranking
- Performance Issues: Enable precomputed magnitudes, add indexes on metadata fields, and use connection pooling
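For the dimension-mismatch issue, a cheap guard before insertion catches mixed batches early. A minimal sketch reusing the embeddings and store instances from earlier; the 1536 default matches OpenAI's text-embedding-ada-002 and is an assumption to adjust for your provider:
// Verify every vector has the expected dimension before writing to the store.
// expectedDim is an assumption: set it to your provider's output size.
function assertDimensions(vectors: number[][], expectedDim = 1536): void {
  vectors.forEach((v, i) => {
    if (v.length !== expectedDim) {
      throw new Error(`Embedding ${i} has dimension ${v.length}, expected ${expectedDim}`);
    }
  });
}

const vectors = await embeddings.embedDocuments(['Doc 1', 'Doc 2']);
assertDimensions(vectors);
await store.addVectors(vectors, [
  { pageContent: 'Doc 1', metadata: {} },
  { pageContent: 'Doc 2', metadata: {} }
]);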
API Reference
For detailed API reference, see LangChain Module API Reference.