Performance Optimization
Best practices and built-in features that keep arango-typed fast and efficient. In the benchmarks on this page, arango-typed stays within roughly 5-15% of the raw arangojs driver.
Overview
arango-typed includes multiple performance optimizations:
- Connection Caching: Reuses database connections automatically
- Query Caching: Caches compiled AQL queries for reuse
- Compiled Validators: Validators compiled once and cached
- Direct DB Access: Bypasses wrapper when no hooks are needed
- Lean Queries: Returns plain objects instead of Document instances
- Batch Operations: Optimized bulk operations
- Indexing: Automatic index creation and management
- Vector Magnitude Caching: Pre-computed magnitudes for vector search
Connection Caching and Reuse
Connections are automatically cached and reused, eliminating connection overhead.
How It Works
When you call connect() with the same parameters, arango-typed reuses the cached connection:
import { connect, getDatabase } from 'arango-typed';
// First call - creates connection and caches it
await connect({
url: 'http://localhost:8529',
database: 'myapp',
username: 'root',
password: ''
});
// Subsequent calls with same parameters - reuses cached connection (fast!)
await connect({
url: 'http://localhost:8529',
database: 'myapp',
username: 'root',
password: ''
});
// Get the cached database instance
const db = getDatabase();
Cache Key
Connections are cached based on: url + database + username
Different combinations create separate cached connections:
// These create separate cached connections
await connect({ url: 'http://localhost:8529', database: 'app1', username: 'root' });
await connect({ url: 'http://localhost:8529', database: 'app2', username: 'root' });
await connect({ url: 'http://localhost:8529', database: 'app1', username: 'admin' });
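The caching rule above can be illustrated with a minimal sketch. This is not arango-typed's actual implementation; `ConnectOptions`, `FakeConnection`, `openConnection`, and `cachedConnect` are hypothetical names, and the connection itself is a stand-in object. It only shows how a key built from url + database + username decides between reuse and creation.

```typescript
// Hypothetical shapes for illustration; not arango-typed's real internals.
interface ConnectOptions {
  url: string;
  database: string;
  username: string;
  password?: string;
}

interface FakeConnection {
  id: number;
  options: ConnectOptions;
}

const connectionCache = new Map<string, Promise<FakeConnection>>();
let nextId = 1;

// The key mirrors the documented rule: url + database + username.
// The password is deliberately left out of the key.
function cacheKey(o: ConnectOptions): string {
  return JSON.stringify([o.url, o.database, o.username]);
}

// Stand-in for the expensive step (handshake, authentication, and so on).
async function openConnection(options: ConnectOptions): Promise<FakeConnection> {
  return { id: nextId++, options };
}

// Caching the promise (not the resolved value) lets concurrent callers
// with the same key share one in-flight connection attempt.
function cachedConnect(options: ConnectOptions): Promise<FakeConnection> {
  const key = cacheKey(options);
  let pending = connectionCache.get(key);
  if (!pending) {
    pending = openConnection(options);
    connectionCache.set(key, pending);
  }
  return pending;
}
```

A production version would also need to evict entries whose connection attempt failed or went stale, which corresponds to the validation step described in the next section.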
Connection Validation
Cached connections are automatically validated before reuse. If a connection is invalid, a new one is created:
// Connection is validated automatically
const db = getDatabase();
// If connection is stale, it's recreated automatically
Performance Impact
- ✅ Eliminates connection overhead (saves ~10-50ms per request)
- ✅ Reuses existing connections efficiently
- ✅ Validates cached connections automatically
- ✅ Reduces database connection pool pressure
Query Caching
AQL queries are compiled once and cached for reuse, significantly improving performance for repeated queries.
How It Works
Query structure (not values) is cached. Different values with the same structure reuse the cached query:
const User = model('users', UserSchema);
// First call - compiles and caches query structure
await User.find({ name: 'John' }).all();
// Subsequent calls - uses cached query (faster!)
await User.find({ name: 'Jane' }).all(); // Same structure, different value
await User.find({ name: 'Bob' }).all(); // Same structure, different value
Cache Key Generation
Cache keys are based on:
- Collection name
- Query structure (where, select, limit, skip, sort)
Query values are not part of the key.
// Same cache key (same structure)
User.find({ name: 'John' }) // Structure: { where: { name: ... } }
User.find({ name: 'Jane' }) // Same structure, different value
// Different cache key (different structure)
User.find({ name: 'John' }) // Has where
User.find({}).limit(10) // Has limit, no where
User.find({ name: 'John' }).sort({ createdAt: -1 }) // Has sort
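One way to picture a structure-only cache key is the sketch below. It is an assumption about how such a key could be derived, not the library's code: `QueryShape` and `structureKey` are hypothetical, only flat equality filters are handled, and limit/skip values are assumed to travel as bind variables.

```typescript
// Hypothetical query description; field names are illustrative only.
interface QueryShape {
  collection: string;
  where?: Record<string, unknown>;
  select?: string[];
  sort?: Record<string, 1 | -1>;
  limit?: number;
  skip?: number;
}

// Builds a key from the shape of the query: which fields are filtered,
// selected, or sorted on, and whether limit/skip are present.
// Filter values and the limit/skip numbers are left out of the key.
function structureKey(q: QueryShape): string {
  const where = q.where ?? {};
  const sort = q.sort ?? {};
  return JSON.stringify({
    c: q.collection,
    w: Object.keys(where).sort(),
    s: [...(q.select ?? [])].sort(),
    // Sort direction is part of the query text, so it belongs in the key.
    o: Object.keys(sort).map((field) => `${field}:${sort[field]}`),
    l: q.limit !== undefined,
    k: q.skip !== undefined,
  });
}
```

With this scheme, `{ name: 'John' }` and `{ name: 'Jane' }` map to the same key, while adding a limit or a sort produces a different one.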
Performance Impact
- ⚡ 20-30% faster for repeated queries
- 💾 Reduced CPU usage (no recompilation)
- 🚀 Significant speedup for high-frequency queries
Cache Strategy
- ✅ Caches query structure (not values)
- ✅ Reuses compiled AQL
- ✅ Separate bindVars for each execution
- ✅ No cache invalidation needed: keys depend only on query structure, never on data
Indexing Strategies
Proper indexing is crucial for query performance. arango-typed supports automatic index creation and management.
Single Field Indexes
const UserSchema = new Schema({
email: { type: String, unique: true, index: true },
name: String,
age: Number
});
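// The field options above (unique: true, index: true) and the calls below are
// alternative ways to index email; in practice, declare each index once.
// Create index on email field
UserSchema.index('email');
// Create unique index
UserSchema.index('email', { unique: true });
// Create sparse index (skips documents where the field is missing or null)
UserSchema.index('email', { sparse: true });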
Compound Indexes
For queries filtering on multiple fields, use compound indexes:
// Compound index for multi-field queries
UserSchema.index(['tenantId', 'email']); // For multi-tenancy queries
UserSchema.index(['status', 'createdAt']); // For status + date queries
UserSchema.index(['category', 'price', 'rating']); // For complex filters
// Query using compound index
const users = await User.find({
tenantId: 'tenant123',
email: 'user@example.com'
}).all(); // Uses compound index efficiently
Index Types
- Persistent Index: Default, for equality and range queries
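- Fulltext Index: For text search (deprecated since ArangoDB 3.10; ArangoSearch Views or inverted indexes are the suggested replacement)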
- Geo Index: For geographic queries
- TTL Index: For automatic document expiration
// Fulltext index for text search
UserSchema.index('bio', { type: 'fulltext' });
// Geo index for location queries
LocationSchema.index('coordinates', { type: 'geo' });
// TTL index for automatic expiration
SessionSchema.index('expiresAt', { type: 'ttl', expireAfter: 0 });
Index Best Practices
- Index Frequently Queried Fields: Create indexes on fields used in WHERE clauses
- Use Compound Indexes: For queries filtering on multiple fields
- Index Sort Fields: Create indexes on fields used in SORT clauses
- Avoid Over-Indexing: Too many indexes slow down writes
- Index Foreign Keys: For relationship queries
- Index Tenant Fields: For multi-tenant applications
Lean Queries
Lean queries return plain JavaScript objects instead of Document instances, providing better performance and lower memory usage.
When to Use Lean Queries
- Read-only operations
- When you don't need Document methods (save, remove, etc.)
- High-frequency queries
- Large result sets
Using Lean Queries
// Regular query (returns Document instances)
const users = await User.find({ active: true }).all();
await users[0].save(); // Document methods available
// Lean query (returns plain objects)
const leanUsers = await User.findLean({ active: true }).all();
// leanUsers[0].save(); // Error: plain object, no Document methods
// Lean queries are faster and use less memory
const summaries = await User.findLean({})
  .select(['name', 'email']) // Only fetch needed fields
  .limit(100)
  .all();
Performance Comparison
- ⚡ 15-25% faster than regular queries
- 💾 30-40% less memory usage
- 🚀 Better for high-frequency read operations
Batch Operations
Batch operations are optimized for bulk inserts, updates, and deletes.
Batch Create
// ❌ Slow: Individual creates
for (const userData of usersData) {
await User.create(userData); // Multiple round trips
}
// ✅ Fast: Batch create
await User.create(usersData); // Single round trip
// Example
const users = [
{ name: 'Alice', email: 'alice@example.com' },
{ name: 'Bob', email: 'bob@example.com' },
{ name: 'Charlie', email: 'charlie@example.com' }
];
await User.create(users); // Creates all in one operation
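For very large inputs, it may be worth splitting the array into several batches so that no single request grows without bound. The source does not state any payload limit, so treat the chunk size as an assumption to tune. In this sketch `chunk` and `createInChunks` are hypothetical helpers, and `createBatch` is a stand-in for a bulk call such as `User.create(array)`.

```typescript
// Splits an array into consecutive slices of at most `size` items.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// `createBatch` is a stand-in for a bulk insert such as User.create(array).
// Returns the number of batches sent.
async function createInChunks<T>(
  items: readonly T[],
  size: number,
  createBatch: (batch: T[]) => Promise<unknown>,
): Promise<number> {
  let batches = 0;
  for (const batch of chunk(items, size)) {
    await createBatch(batch); // one round trip per chunk
    batches++;
  }
  return batches;
}
```

Something like `await createInChunks(usersData, 1000, (batch) => User.create(batch))` would then send 1,000 documents per round trip; the figure 1000 is illustrative, not a recommendation from the library.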
Batch Update
// Batch update multiple documents
await User.updateMany(
{ status: 'inactive' },
{ status: 'active', updatedAt: new Date() }
);
Batch Delete
// Batch delete
await User.deleteMany({ status: 'deleted' });
Performance Impact
- ⚡ 5-10x faster than individual operations
- 💾 Reduced network round trips
- 🚀 Better for bulk data operations
Compiled Validators
Schema validators are compiled once and cached, providing faster validation on subsequent calls.
How It Works
const UserSchema = new Schema({
name: { type: String, required: true, minLength: 2 },
email: { type: String, required: true, unique: true },
age: { type: Number, min: 0, max: 150 }
});
// First call - compiles validator
UserSchema.validateSync({ name: 'John', email: 'john@example.com', age: 30 });
// Subsequent calls - uses compiled validator (fast!)
UserSchema.validateSync({ name: 'Jane', email: 'jane@example.com', age: 25 });
UserSchema.validateSync({ name: 'Bob', email: 'bob@example.com', age: 35 });
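The compile-once idea can be sketched as follows. This is a simplified illustration under assumptions, not arango-typed's validator: the `FieldRule` format, `compile`, and this standalone `validateSync` are hypothetical. The rules object is turned into a list of closures on first use and looked up from a `WeakMap` afterwards.

```typescript
// Hypothetical rule format, loosely modeled on the schema options above.
interface FieldRule {
  type: 'string' | 'number';
  required?: boolean;
  minLength?: number;
  min?: number;
  max?: number;
}
type Rules = Record<string, FieldRule>;
type Check = (doc: Record<string, unknown>) => string | null;

const compiled = new WeakMap<Rules, Check[]>();
let compileCount = 0; // exposed only to show that compilation happens once

// Turns declarative rules into a flat list of closures, once per rules object.
function compile(rules: Rules): Check[] {
  const cached = compiled.get(rules);
  if (cached) return cached;
  compileCount++;
  const checks: Check[] = Object.keys(rules).map((field) => {
    const rule = rules[field];
    return (doc: Record<string, unknown>): string | null => {
      const v = doc[field];
      if (v === undefined || v === null) {
        return rule.required ? `${field} is required` : null;
      }
      if (typeof v !== rule.type) return `${field} must be a ${rule.type}`;
      if (typeof v === 'string' && rule.minLength !== undefined && v.length < rule.minLength) {
        return `${field} is too short`;
      }
      if (typeof v === 'number') {
        if (rule.min !== undefined && v < rule.min) return `${field} is below ${rule.min}`;
        if (rule.max !== undefined && v > rule.max) return `${field} is above ${rule.max}`;
      }
      return null;
    };
  });
  compiled.set(rules, checks);
  return checks;
}

// Returns every error message; an empty array means the document is valid.
function validateSync(rules: Rules, doc: Record<string, unknown>): string[] {
  const errors: string[] = [];
  for (const check of compile(rules)) {
    const err = check(doc);
    if (err !== null) errors.push(err);
  }
  return errors;
}
```

Repeated calls skip the rule-walking step and only run the prebuilt closures, which is where a compiled validator saves time.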
Performance Impact
- ⚡ 40-50% faster validation
- 💾 Reduced CPU usage
- 🚀 Significant speedup for high-frequency validation
Sync vs Async Validation
// Synchronous validation (faster, no async overhead)
UserSchema.validateSync(data);
// Asynchronous validation (for async validators)
await UserSchema.validate(data);
Direct DB Access
When no hooks are defined, arango-typed uses direct database access, bypassing the Document wrapper for better performance.
How It Works
// Schema without hooks - uses direct DB access
const SimpleSchema = new Schema({
name: String,
email: String
});
const SimpleModel = model('simple', SimpleSchema);
// Direct DB access (fast!)
await SimpleModel.create({ name: 'John', email: 'john@example.com' });
// Schema with hooks - uses Document wrapper
const HookedSchema = new Schema({
name: String,
email: String
});
HookedSchema.pre('save', async function() {
// Hook logic
});
const HookedModel = model('hooked', HookedSchema);
// Uses Document wrapper (slightly slower but necessary for hooks)
await HookedModel.create({ name: 'John', email: 'john@example.com' });
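The hook check described above can be pictured with a toy model. This is only a sketch of the decision, under the assumption that the fast path is chosen by checking whether any hooks are registered; `TinyModel` and its in-memory `store` are hypothetical and stand in for the real model and database.

```typescript
type Doc = Record<string, unknown>;
type Hook = (doc: Doc) => void;

class TinyModel {
  private preSave: Hook[] = [];
  // Records which path the last create() took, for demonstration only.
  public lastPath: 'direct' | 'wrapped' | null = null;

  constructor(private store: Doc[]) {}

  pre(hook: Hook): void {
    this.preSave.push(hook);
  }

  create(data: Doc): Doc {
    if (this.preSave.length === 0) {
      // Fast path: nothing to run, so write the plain object as-is.
      this.lastPath = 'direct';
      this.store.push(data);
      return data;
    }
    // Slow path: copy the data, run each hook, then write.
    this.lastPath = 'wrapped';
    const doc: Doc = { ...data };
    for (const hook of this.preSave) hook(doc);
    this.store.push(doc);
    return doc;
  }
}
```

The check costs one array-length comparison per call, which is why the optimization can be applied automatically.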
Performance Impact
- ⚡ 10-15% faster when no hooks
- 💾 Lower memory usage
- 🚀 Automatic optimization
Query Optimization Techniques
1. Use Projections (Select Only Needed Fields)
// ❌ Slow: Fetch all fields
const users = await User.find({}).all();
// ✅ Fast: Fetch only needed fields
const users = await User.find({})
.select(['name', 'email'])
.all();
2. Limit Results
// ❌ Slow: Fetch all documents
const users = await User.find({}).all();
// ✅ Fast: Limit results
const users = await User.find({})
.limit(100)
.all();
3. Use Pagination
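// Offset pagination (skip + limit). Works well for shallow pages; cost grows
// with the offset because skipped documents are still read before being discarded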
const page = 1;
const pageSize = 20;
const skip = (page - 1) * pageSize;
const users = await User.find({})
.skip(skip)
.limit(pageSize)
.sort({ createdAt: -1 })
.all();
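When page and page size come from user input, a small helper can keep the skip/limit arithmetic in one place and clamp bad values. `pageWindow` is a hypothetical helper, and the default cap of 100 items per page is an assumption, not a library setting.

```typescript
interface PageWindow {
  skip: number;
  limit: number;
}

// Converts a 1-based page number into skip/limit, clamping invalid input:
// pages below 1 (or NaN) become page 1, and the page size is kept
// between 1 and maxPageSize.
function pageWindow(page: number, pageSize: number, maxPageSize = 100): PageWindow {
  const safePage = Math.max(1, Math.floor(page) || 1);
  const limit = Math.min(maxPageSize, Math.max(1, Math.floor(pageSize) || 1));
  return { skip: (safePage - 1) * limit, limit };
}
```

The result can be passed straight to the query builder, for example `.skip(w.skip).limit(w.limit)` where `w = pageWindow(page, pageSize)`.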
4. Filter Early
// ✅ Good: Filter before other operations
const users = await User.find({ active: true })
.sort({ createdAt: -1 })
.limit(10)
.all();
// ❌ Bad: Fetch all, then filter in application
const allUsers = await User.find({}).all();
const activeUsers = allUsers.filter(u => u.active);
5. Use Indexes for Sort
// Create index on sort field
UserSchema.index('createdAt');
// Sort uses index efficiently
const users = await User.find({})
.sort({ createdAt: -1 })
.limit(10)
.all();
6. Avoid N+1 Queries
// ❌ Bad: N+1 queries
const posts = await Post.find({}).all();
for (const post of posts) {
const author = await User.findById(post.userId); // N queries!
}
// ✅ Good: Batch fetch or use populate
const posts = await Post.find({}).all();
const userIds = [...new Set(posts.map(p => p.userId))];
const users = await User.find({ _id: { $in: userIds } }).all();
const userMap = new Map(users.map(u => [u._id, u]));
posts.forEach(post => {
post.author = userMap.get(post.userId);
});
Vector Search Optimization
Vector search includes several optimizations for better performance.
Precomputed Magnitudes
For cosine similarity, precompute vector magnitudes to avoid recalculating them:
import { VectorSearch } from 'arango-typed';
const vectorSearch = new VectorSearch(db, 'documents');
// Store documents with precomputed magnitudes
await vectorSearch.storeDocument('doc1', {
content: 'Hello world',
embedding: [0.1, 0.2, 0.3],
magnitude: 0.374 // Precomputed magnitude
});
// Search is faster with precomputed magnitudes
const results = await vectorSearch.similaritySearch(
[0.1, 0.2, 0.3],
{ method: 'cosine', limit: 10 }
);
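For reference, the magnitude stored above is the Euclidean norm of the embedding: for `[0.1, 0.2, 0.3]` it is sqrt(0.01 + 0.04 + 0.09) = sqrt(0.14), which rounds to 0.374. The sketch below shows one possible `computeMagnitude` and how a stored magnitude removes work from the cosine calculation; both function names are illustrative and not taken from the library.

```typescript
// Euclidean (L2) norm of a vector.
function computeMagnitude(v: readonly number[]): number {
  let sum = 0;
  for (const x of v) sum += x * x;
  return Math.sqrt(sum);
}

// Cosine similarity when both magnitudes are already known, so only
// the dot product has to be computed per comparison.
function cosineWithStoredMagnitude(
  query: readonly number[],
  queryMagnitude: number,
  stored: readonly number[],
  storedMagnitude: number,
): number {
  if (query.length !== stored.length) throw new RangeError('dimension mismatch');
  if (queryMagnitude === 0 || storedMagnitude === 0) return 0; // zero vector: define as 0
  let dot = 0;
  for (let i = 0; i < query.length; i++) dot += query[i] * stored[i];
  return dot / (queryMagnitude * storedMagnitude);
}
```

The query magnitude is computed once per search, while each stored magnitude is computed once at write time instead of once per comparison.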
Batch Embedding Generation
// Generate embeddings in batch
const texts = ['text1', 'text2', 'text3'];
const embeddings = await generateEmbeddings(texts); // Batch API call
// Store with batch operations
const documents = texts.map((text, i) => ({
content: text,
embedding: embeddings[i],
magnitude: computeMagnitude(embeddings[i])
}));
await DocumentModel.create(documents);
Index Vector Fields
// Create index on vector field for faster searches
// (needs an ArangoDB version with vector index support)
DocumentSchema.index('embedding', { type: 'vector' });
Graph Traversal Optimization
For graph operations, use appropriate traversal strategies.
Limit Traversal Depth
// ✅ Good: Limit depth
const traversal = new GraphTraversal(db, 'social', 'users/alice')
.direction('outbound')
.depth(1, 3) // Limit to 3 levels
.limit(100);
// ❌ Bad: Unlimited depth
const traversal = new GraphTraversal(db, 'social', 'users/alice')
.direction('outbound')
.depth(1, 100); // Too deep!
Use Appropriate Direction
// Use specific direction when possible
const friends = await UserGraph.getOutbound('users/alice', 'friends');
// Faster than 'any' direction
Index Edge Collections
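// ArangoDB maintains an edge index on _from and _to for every edge collection,
// so plain traversals are already indexed.
// An extra index pays off when it combines _from or _to with an edge attribute
// used in traversal filters (a vertex-centric index).
// `type` below is a hypothetical edge attribute.
const edgeCollection = db.collection('friends');
await edgeCollection.ensureIndex({
  type: 'persistent',
  fields: ['_from', 'type']
});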
Connection Pooling
For high-concurrency applications, consider connection pooling:
import { ConnectionPool } from 'arango-typed';
const pool = new ConnectionPool({
url: 'http://localhost:8529',
database: 'myapp',
username: 'root',
password: '',
maxConnections: 10,
minConnections: 2
});
// Get connection from pool
const db = await pool.acquire();
try {
// Use database
const users = await User.find({}).all();
} finally {
// Release connection back to pool
await pool.release(db);
}
Performance Monitoring
Monitor query performance to identify bottlenecks:
Query Profiling
// Enable query profiling in ArangoDB
// Check slow queries in ArangoDB web interface
// Or use query timing
const start = Date.now();
const users = await User.find({}).all();
const duration = Date.now() - start;
console.log(`Query took ${duration}ms`);
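The timing pattern above can be wrapped in a small reusable helper so that every measured query is timed the same way. `timed` is a hypothetical utility, not part of arango-typed; the clock is injectable, which keeps the helper testable.

```typescript
interface Timed<T> {
  result: T;
  ms: number;
}

// Runs an async operation and reports how long it took in milliseconds.
// `now` defaults to Date.now and can be replaced with a fake clock in tests.
async function timed<T>(
  fn: () => Promise<T>,
  now: () => number = Date.now,
): Promise<Timed<T>> {
  const start = now();
  const result = await fn();
  return { result, ms: now() - start };
}
```

Usage would look like `const { result: users, ms } = await timed(() => User.find({}).all());`. Note that a measurement taken this way includes network and deserialization time, not only server-side execution.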
Index Usage
Check if queries are using indexes:
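// Explain the query to see index usage.
// Assumes `db` is an arangojs Database, which exposes explain(query, bindVars)
const explain = await db.explain(`
  FOR doc IN users
    FILTER doc.email == @email
    RETURN doc
`, { email: 'user@example.com' });
// An IndexNode in the plan means an index is used;
// an EnumerateCollectionNode means a full collection scan
console.log(explain.plan.nodes);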
Best Practices
Do's ✅
- ✅ Use indexes on frequently queried fields
- ✅ Use lean queries for read-only operations
- ✅ Batch operations when possible
- ✅ Limit results to reasonable sizes
- ✅ Use projections to fetch only needed fields
- ✅ Reuse connections (automatic with connect())
- ✅ Filter early in queries
- ✅ Use compound indexes for multi-field queries
- ✅ Precompute vector magnitudes for cosine similarity
- ✅ Limit graph traversal depth
Don'ts ❌
- ❌ Don't create connections in request handlers
- ❌ Don't fetch all documents without limit
- ❌ Don't skip indexes on queried fields
- ❌ Don't use Document wrapper unnecessarily (use lean queries)
- ❌ Don't create too many indexes (slows down writes)
- ❌ Don't use N+1 query patterns
- ❌ Don't fetch unnecessary fields
- ❌ Don't use unlimited graph traversals
- ❌ Don't ignore query profiling
- ❌ Don't run long synchronous work (large loops, heavy computation) in hot paths; it blocks the event loop
Performance Benchmarks
Relative cost compared with the raw arangojs driver (raw driver = 100%; lower is better):
| Operation | Raw arangojs | arango-typed | Overhead |
|---|---|---|---|
| Simple Find | 100% | 110% | +10% |
| Find with Tenant | 100% | 115% | +15% |
| Document Create | 100% | 105% | +5% |
| Create with Tenant | 100% | 110% | +10% |
| Vector Search | 100% | 105% | +5% |
| Graph Traversal | 100% | 112% | +12% |
| Lean Query | 100% | 108% | +8% |
Note: The 5-15% overhead is the price of the convenience, type safety, and features that arango-typed adds on top of the driver.
Performance Improvements Summary
- ✅ Connection caching: Eliminates connection overhead (~10-50ms per request)
- ✅ Query caching: ~20-30% faster for repeated queries
- ✅ Compiled validators: ~40-50% faster validation
- ✅ Direct DB access: ~10-15% faster when no hooks
- ✅ Lean queries: ~15-25% faster, 30-40% less memory
- ✅ Batch operations: 5-10x faster than individual operations
- ✅ Indexing: Dramatically faster queries (10-1000x depending on data size)
Common Performance Pitfalls
1. Missing Indexes
// ❌ Bad: No index on queried field
const users = await User.find({ email: 'user@example.com' }).all();
// Full collection scan - very slow!
// ✅ Good: Index on queried field
UserSchema.index('email');
const users = await User.find({ email: 'user@example.com' }).all();
// Uses index - fast!
2. Fetching Too Much Data
// ❌ Bad: Fetch all fields and all documents
const users = await User.find({}).all();
// ✅ Good: Limit and select only needed fields
const users = await User.find({})
.select(['name', 'email'])
.limit(100)
.all();
3. Not Using Lean Queries
// ❌ Bad: Using Document instances when not needed
const users = await User.find({}).all();
// Creates Document instances with overhead
// ✅ Good: Use lean queries for read-only operations
const users = await User.findLean({}).all();
// Plain objects - faster and less memory
4. Creating Connections in Request Handlers
// ❌ Bad: Call connect() on every request
app.get('/users', async (req, res) => {
  await connect({ ... }); // Repeats the cache lookup and validation on every request
  const users = await User.find({}).all();
  res.json(users);
});
// ✅ Good: Connect once at startup
await connect({ ... }); // At app startup
app.get('/users', async (req, res) => {
  const users = await User.find({}).all(); // Fast!
  res.json(users);
});
Real-World Performance Tips
1. Use Aggregations for Statistics
// ❌ Bad: Fetch all, then calculate in application
const orders = await Order.find({}).all();
const total = orders.reduce((sum, o) => sum + o.amount, 0);
// ✅ Good: Use aggregation
const stats = await Order.aggregate()
.aggregate({ total: { $sum: 'amount' } })
.execute();
2. Cache Frequently Accessed Data
// Cache user roles. Entries never expire in this version, so clear or
// refresh them whenever a user's roles change
const cache = new Map();
async function getUserRoles(userId: string) {
if (cache.has(userId)) {
return cache.get(userId);
}
const roles = await UserRoles.find({ userId }).all();
cache.set(userId, roles);
return roles;
}
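A plain `Map` grows without bound and can serve stale data. A minimal sketch of a cache with expiry and a size cap is shown below; `TtlCache` is a hypothetical class, and the TTL and maximum size are values to choose per application.

```typescript
class TtlCache<K, V> {
  private entries = new Map<K, { value: V; expiresAt: number }>();

  constructor(
    private ttlMs: number,
    private maxEntries: number,
    private now: () => number = Date.now, // injectable clock for tests
  ) {}

  get(key: K): V | undefined {
    const hit = this.entries.get(key);
    if (!hit) return undefined;
    if (hit.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: drop it and report a miss
      return undefined;
    }
    return hit.value;
  }

  set(key: K, value: V): void {
    // Map preserves insertion order, so the first key is the oldest entry.
    if (!this.entries.has(key) && this.entries.size >= this.maxEntries) {
      const oldest = this.entries.keys().next();
      if (!oldest.done) this.entries.delete(oldest.value);
    }
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  delete(key: K): void {
    this.entries.delete(key);
  }

  get size(): number {
    return this.entries.size;
  }
}
```

In `getUserRoles`, the `Map` could be swapped for something like `new TtlCache(60_000, 10_000)`, with `delete(userId)` called whenever that user's roles are updated.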
3. Use Transactions for Related Operations
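// Use transactions for multiple related operations.
// Note: in raw arangojs, db.beginTransaction(collections) returns a transaction
// object and commit()/abort() are called on that object; the db-level calls
// below assume a wrapper that tracks the active transaction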
await db.beginTransaction();
try {
await User.create(userData);
await Profile.create(profileData);
await db.commit();
} catch (error) {
await db.abort();
throw error;
}