Performance Optimization

Best practices and built-in features to make arango-typed fast and efficient. arango-typed is optimized to stay within 10-15% of the raw arangojs driver's performance.

Overview

arango-typed includes multiple performance optimizations:

  • Connection Caching: Reuses database connections automatically
  • Query Caching: Caches compiled AQL queries for reuse
  • Compiled Validators: Validators compiled once and cached
  • Direct DB Access: Bypasses wrapper when no hooks are needed
  • Lean Queries: Return plain objects instead of Document instances
  • Batch Operations: Optimized bulk operations
  • Indexing: Automatic index creation and management
  • Vector Magnitude Caching: Pre-computed magnitudes for vector search

Connection Caching and Reuse

Connections are automatically cached and reused, eliminating connection overhead.

How It Works

When you call connect() with the same parameters, arango-typed reuses the cached connection:

import { connect, getDatabase } from 'arango-typed';

// First call - creates connection and caches it
await connect({
  url: 'http://localhost:8529',
  database: 'myapp',
  username: 'root',
  password: ''
});

// Subsequent calls with same parameters - reuses cached connection (fast!)
await connect({
  url: 'http://localhost:8529',
  database: 'myapp',
  username: 'root',
  password: ''
});

// Get the cached database instance
const db = getDatabase();

Cache Key

Connections are cached based on: url + database + username

Different combinations create separate cached connections:

// These create separate cached connections
await connect({ url: 'http://localhost:8529', database: 'app1', username: 'root' });
await connect({ url: 'http://localhost:8529', database: 'app2', username: 'root' });
await connect({ url: 'http://localhost:8529', database: 'app1', username: 'admin' });
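
For illustration, a key of the following shape would distinguish the three connections above. The format is an assumption; the library's actual key derivation may differ:

// Illustrative only: combine the three identifying fields into a cache key
function connectionCacheKey(opts: { url: string; database: string; username: string }): string {
  return `${opts.url}|${opts.database}|${opts.username}`;
}

connectionCacheKey({ url: 'http://localhost:8529', database: 'app1', username: 'root' });
// => 'http://localhost:8529|app1|root'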

Connection Validation

Cached connections are automatically validated before reuse. If a connection is invalid, a new one is created:

// Connection is validated automatically
const db = getDatabase();
// If connection is stale, it's recreated automatically
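
How validation might work under the hood: a lightweight round trip such as db.version() confirms the connection is still usable. This is a sketch against the underlying arangojs Database, not arango-typed's exact implementation:

// Sketch: validate a cached connection before reuse (assumes an arangojs Database instance)
import { Database } from 'arangojs';

async function isConnectionAlive(db: Database): Promise<boolean> {
  try {
    await db.version();  // Cheap round trip; throws if the connection is broken
    return true;
  } catch {
    return false;  // Stale connection - a new one should be created
  }
}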

Performance Impact

  • ✅ Eliminates connection overhead (saves ~10-50ms per request)
  • ✅ Reuses existing connections efficiently
  • ✅ Validates cached connections automatically
  • ✅ Reduces database connection pool pressure

Query Caching

AQL queries are compiled once and cached for reuse, significantly improving performance for repeated queries.

How It Works

Query structure (not values) is cached. Different values with the same structure reuse the cached query:

const User = model('users', UserSchema);

// First call - compiles and caches query structure
await User.find({ name: 'John' }).all();

// Subsequent calls - uses cached query (faster!)
await User.find({ name: 'Jane' }).all();  // Same structure, different value
await User.find({ name: 'Bob' }).all();    // Same structure, different value

Cache Key Generation

Cache keys are based on:

  • Collection name
  • Query structure (where, select, limit, skip, sort)
  • Not on query values

// Same cache key (same structure)
User.find({ name: 'John' })  // Structure: { where: { name: ... } }
User.find({ name: 'Jane' })  // Same structure, different value

// Different cache key (different structure)
User.find({ name: 'John' })       // Has where
User.find({}).limit(10)           // Has limit, no where
User.find({ name: 'John' }).sort({ createdAt: -1 })  // Has sort

Performance Impact

  • ⚡ 20-30% faster for repeated queries
  • 💾 Reduced CPU usage (no recompilation)
  • 🚀 Significant speedup for high-frequency queries

Cache Strategy

  • ✅ Caches query structure (not values)
  • ✅ Reuses compiled AQL
  • ✅ Separate bindVars for each execution
  • ✅ No cache invalidation needed (keys are based on structure, not values; see the sketch below)
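
For illustration, the two find() calls from the earlier example share one cached query; only the bind variables differ per execution. The AQL text below is an assumption about the generated query, not the library's verbatim output:

// Assumed shape of a cached entry (illustrative)
const cachedAql = 'FOR doc IN users FILTER doc.name == @name RETURN doc';

// Each execution reuses the cached AQL with its own bindVars
const firstRun  = { query: cachedAql, bindVars: { name: 'John' } };
const secondRun = { query: cachedAql, bindVars: { name: 'Jane' } };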

Indexing Strategies

Proper indexing is crucial for query performance. arango-typed supports automatic index creation and management.

Single Field Indexes

const UserSchema = new Schema({
  email: { type: String, unique: true, index: true },
  name: String,
  age: Number
});

// Create index on email field
UserSchema.index('email');

// Create unique index
UserSchema.index('email', { unique: true });

// Create sparse index (ignores null values)
UserSchema.index('email', { sparse: true });

Compound Indexes

For queries filtering on multiple fields, use compound indexes:

// Compound index for multi-field queries
UserSchema.index(['tenantId', 'email']);  // For multi-tenancy queries
UserSchema.index(['status', 'createdAt']); // For status + date queries
UserSchema.index(['category', 'price', 'rating']); // For complex filters

// Query using compound index
const users = await User.find({
  tenantId: 'tenant123',
  email: 'user@example.com'
}).all(); // Uses compound index efficiently

Index Types

  • Persistent Index: Default, for equality and range queries
  • Fulltext Index: For text search
  • Geo Index: For geographic queries
  • TTL Index: For automatic document expiration

// Fulltext index for text search
UserSchema.index('bio', { type: 'fulltext' });

// Geo index for location queries
LocationSchema.index('coordinates', { type: 'geo' });

// TTL index for automatic expiration
SessionSchema.index('expiresAt', { type: 'ttl', expireAfter: 0 });

Index Best Practices

  • Index Frequently Queried Fields: Create indexes on fields used in WHERE clauses
  • Use Compound Indexes: For queries filtering on multiple fields
  • Index Sort Fields: Create indexes on fields used in SORT clauses
  • Avoid Over-Indexing: Too many indexes slow down writes
  • Index Foreign Keys: For relationship queries
  • Index Tenant Fields: For multi-tenant applications

Lean Queries

Lean queries return plain JavaScript objects instead of Document instances, providing better performance and lower memory usage.

When to Use Lean Queries

  • Read-only operations
  • When you don't need Document methods (save, remove, etc.)
  • High-frequency queries
  • Large result sets

Using Lean Queries

// Regular query (returns Document instances)
const users = await User.find({ active: true }).all();
users[0].save(); // Document methods available

// Lean query (returns plain objects)
const users = await User.findLean({ active: true }).all();
// users[0].save(); // Error: plain object, no Document methods

// Lean queries are faster and use less memory
const users = await User.findLean({})
  .select(['name', 'email'])  // Only fetch needed fields
  .limit(100)
  .all();

Performance Comparison

  • ⚡ 15-25% faster than regular queries
  • 💾 30-40% less memory usage
  • 🚀 Better for high-frequency read operations

Batch Operations

Batch operations are optimized for bulk inserts, updates, and deletes.

Batch Create

// ❌ Slow: Individual creates
for (const userData of usersData) {
  await User.create(userData);  // Multiple round trips
}

// ✅ Fast: Batch create
await User.create(usersData);  // Single round trip

// Example
const users = [
  { name: 'Alice', email: 'alice@example.com' },
  { name: 'Bob', email: 'bob@example.com' },
  { name: 'Charlie', email: 'charlie@example.com' }
];

await User.create(users);  // Creates all in one operation

Batch Update

// Batch update multiple documents
await User.updateMany(
  { status: 'inactive' },
  { status: 'active', updatedAt: new Date() }
);

Batch Delete

// Batch delete
await User.deleteMany({ status: 'deleted' });

Performance Impact

  • ⚡ 5-10x faster than individual operations
  • 💾 Reduced network round trips
  • 🚀 Better for bulk data operations

Compiled Validators

Schema validators are compiled once and cached, providing faster validation on subsequent calls.

How It Works

const UserSchema = new Schema({
  name: { type: String, required: true, minLength: 2 },
  email: { type: String, required: true, unique: true },
  age: { type: Number, min: 0, max: 150 }
});

// First call - compiles validator
UserSchema.validateSync({ name: 'John', email: 'john@example.com', age: 30 });

// Subsequent calls - uses compiled validator (fast!)
UserSchema.validateSync({ name: 'Jane', email: 'jane@example.com', age: 25 });
UserSchema.validateSync({ name: 'Bob', email: 'bob@example.com', age: 35 });

Performance Impact

  • ⚡ 40-50% faster validation
  • 💾 Reduced CPU usage
  • 🚀 Significant speedup for high-frequency validation

Sync vs Async Validation

// Synchronous validation (faster, no async overhead)
UserSchema.validateSync(data);

// Asynchronous validation (for async validators)
await UserSchema.validate(data);

Direct DB Access

When no hooks are defined, arango-typed uses direct database access, bypassing the Document wrapper for better performance.

How It Works

// Schema without hooks - uses direct DB access
const SimpleSchema = new Schema({
  name: String,
  email: String
});
const SimpleModel = model('simple', SimpleSchema);

// Direct DB access (fast!)
await SimpleModel.create({ name: 'John', email: 'john@example.com' });

// Schema with hooks - uses Document wrapper
const HookedSchema = new Schema({
  name: String,
  email: String
});
HookedSchema.pre('save', async function() {
  // Hook logic
});
const HookedModel = model('hooked', HookedSchema);

// Uses Document wrapper (slightly slower but necessary for hooks)
await HookedModel.create({ name: 'John', email: 'john@example.com' });

Performance Impact

  • ⚡ 10-15% faster when no hooks
  • 💾 Lower memory usage
  • 🚀 Automatic optimization

Query Optimization Techniques

1. Use Projections (Select Only Needed Fields)

// ❌ Slow: Fetch all fields
const users = await User.find({}).all();

// ✅ Fast: Fetch only needed fields
const users = await User.find({})
  .select(['name', 'email'])
  .all();

2. Limit Results

// ❌ Slow: Fetch all documents
const users = await User.find({}).all();

// ✅ Fast: Limit results
const users = await User.find({})
  .limit(100)
  .all();

3. Use Pagination

// Efficient pagination
const page = 1;
const pageSize = 20;
const skip = (page - 1) * pageSize;

const users = await User.find({})
  .skip(skip)
  .limit(pageSize)
  .sort({ createdAt: -1 })
  .all();

4. Filter Early

// ✅ Good: Filter before other operations
const users = await User.find({ active: true })
  .sort({ createdAt: -1 })
  .limit(10)
  .all();

// ❌ Bad: Fetch all, then filter in application
const allUsers = await User.find({}).all();
const activeUsers = allUsers.filter(u => u.active);

5. Use Indexes for Sort

// Create index on sort field
UserSchema.index('createdAt');

// Sort uses index efficiently
const users = await User.find({})
  .sort({ createdAt: -1 })
  .limit(10)
  .all();

6. Avoid N+1 Queries

// ❌ Bad: N+1 queries
const posts = await Post.find({}).all();
for (const post of posts) {
  const author = await User.findById(post.userId);  // N queries!
}

// ✅ Good: Batch fetch or use populate
const posts = await Post.find({}).all();
const userIds = [...new Set(posts.map(p => p.userId))];
const users = await User.find({ _id: { $in: userIds } }).all();
const userMap = new Map(users.map(u => [u._id, u]));
posts.forEach(post => {
  post.author = userMap.get(post.userId);
});

Vector Search Optimization

Vector search includes several optimizations for better performance.

Precomputed Magnitudes

For cosine similarity, precompute vector magnitudes to avoid recalculating them:

import { VectorSearch } from 'arango-typed';

const vectorSearch = new VectorSearch(db, 'documents');

// Store documents with precomputed magnitudes
await vectorSearch.storeDocument('doc1', {
  content: 'Hello world',
  embedding: [0.1, 0.2, 0.3],
  magnitude: 0.374  // Precomputed magnitude
});

// Search is faster with precomputed magnitudes
const results = await vectorSearch.similaritySearch(
  [0.1, 0.2, 0.3],
  { method: 'cosine', limit: 10 }
);

Batch Embedding Generation

// Generate embeddings in batch
const texts = ['text1', 'text2', 'text3'];
const embeddings = await generateEmbeddings(texts);  // Batch API call

// Store with batch operations
const documents = texts.map((text, i) => ({
  content: text,
  embedding: embeddings[i],
  magnitude: computeMagnitude(embeddings[i])
}));

await DocumentModel.create(documents);
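
The snippet above references a computeMagnitude helper. A minimal implementation is the Euclidean (L2) norm, which is what cosine similarity uses:

// Euclidean (L2) norm of a vector
function computeMagnitude(vector: number[]): number {
  return Math.sqrt(vector.reduce((sum, v) => sum + v * v, 0));
}

computeMagnitude([0.1, 0.2, 0.3]);  // ≈ 0.374, matching the earlier example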

Index Vector Fields

// Create index on vector field for faster searches
DocumentSchema.index('embedding', { type: 'vector' });

Graph Traversal Optimization

For graph operations, use appropriate traversal strategies.

Limit Traversal Depth

// ✅ Good: Limit depth
const traversal = new GraphTraversal(db, 'social', 'users/alice')
  .direction('outbound')
  .depth(1, 3)  // Limit to 3 levels
  .limit(100);

// ❌ Bad: Unlimited depth
const traversal = new GraphTraversal(db, 'social', 'users/alice')
  .direction('outbound')
  .depth(1, 100);  // Too deep!

Use Appropriate Direction

// Use specific direction when possible
const friends = await UserGraph.getOutbound('users/alice', 'friends');
// Faster than 'any' direction

Index Edge Collections

// Edge collections already index _from and _to via ArangoDB's built-in edge index,
// which traversals use. An additional combined persistent index can help queries
// that filter on both _from and _to at once.
const edgeCollection = db.collection('friends');
await edgeCollection.ensureIndex({
  type: 'persistent',
  fields: ['_from', '_to']
});

Connection Pooling

For high-concurrency applications, consider connection pooling:

import { ConnectionPool } from 'arango-typed';

const pool = new ConnectionPool({
  url: 'http://localhost:8529',
  database: 'myapp',
  username: 'root',
  password: '',
  maxConnections: 10,
  minConnections: 2
});

// Get connection from pool
const db = await pool.acquire();
try {
  // Use database
  const users = await User.find({}).all();
} finally {
  // Release connection back to pool
  await pool.release(db);
}
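
To avoid repeating the acquire/try/finally/release pattern, the usage above can be wrapped in a small helper. This is a sketch built on the acquire() and release() calls shown above:

// Sketch: run a callback with a pooled connection and always release it
async function withConnection<T>(fn: (db: unknown) => Promise<T>): Promise<T> {
  const db = await pool.acquire();
  try {
    return await fn(db);
  } finally {
    await pool.release(db);
  }
}

// Usage
const users = await withConnection(() => User.find({}).all());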

Performance Monitoring

Monitor query performance to identify bottlenecks:

Query Profiling

// Enable query profiling in ArangoDB
// Check slow queries in ArangoDB web interface

// Or use query timing
const start = Date.now();
const users = await User.find({}).all();
const duration = Date.now() - start;
console.log(`Query took ${duration}ms`);
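
If you time queries in more than one place, a small wrapper keeps the measurement consistent. This helper is a sketch, not part of arango-typed:

// Sketch: measure and log how long an async operation takes
async function timed<T>(label: string, fn: () => Promise<T>): Promise<T> {
  const start = Date.now();
  try {
    return await fn();
  } finally {
    console.log(`${label} took ${Date.now() - start}ms`);
  }
}

// Usage
const activeUsers = await timed('find active users', () => User.find({ active: true }).all());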

Index Usage

Check if queries are using indexes:

// Explain the query (via the underlying arangojs database) to see index usage
const explanation = await db.explain(`
  FOR doc IN users
  FILTER doc.email == @email
  RETURN doc
`, { email: 'user@example.com' });

console.log(explanation.plan);  // Shows which indexes the optimizer will use

Best Practices

Do's ✅

  • ✅ Use indexes on frequently queried fields
  • ✅ Use lean queries for read-only operations
  • ✅ Batch operations when possible
  • ✅ Limit results to reasonable sizes
  • ✅ Use projections to fetch only needed fields
  • ✅ Reuse connections (automatic with connect())
  • ✅ Filter early in queries
  • ✅ Use compound indexes for multi-field queries
  • ✅ Precompute vector magnitudes for cosine similarity
  • ✅ Limit graph traversal depth

Don'ts ❌

  • ❌ Don't create connections in request handlers
  • ❌ Don't fetch all documents without limit
  • ❌ Don't skip indexes on queried fields
  • ❌ Don't use Document wrapper unnecessarily (use lean queries)
  • ❌ Don't create too many indexes (slows down writes)
  • ❌ Don't use N+1 query patterns
  • ❌ Don't fetch unnecessary fields
  • ❌ Don't use unlimited graph traversals
  • ❌ Don't ignore query profiling
  • ❌ Don't use synchronous operations in hot paths unnecessarily

Performance Benchmarks

Performance comparison with raw arangojs driver:

Operation            Raw arangojs   arango-typed   Overhead
Simple Find          100%           110%           +10%
Find with Tenant     100%           115%           +15%
Document Create      100%           105%           +5%
Create with Tenant   100%           110%           +10%
Vector Search        100%           105%           +5%
Graph Traversal      100%           112%           +12%
Lean Query           100%           108%           +8%

Note: Overhead is minimal and worth it for the convenience, type safety, and features provided.

Performance Improvements Summary

  • Connection caching: Eliminates connection overhead (~10-50ms per request)
  • Query caching: ~20-30% faster for repeated queries
  • Compiled validators: ~40-50% faster validation
  • Direct DB access: ~10-15% faster when no hooks
  • Lean queries: ~15-25% faster, 30-40% less memory
  • Batch operations: 5-10x faster than individual operations
  • Indexing: Dramatically faster queries (10-1000x depending on data size)

Common Performance Pitfalls

1. Missing Indexes

// ❌ Bad: No index on queried field
const users = await User.find({ email: 'user@example.com' }).all();
// Full collection scan - very slow!

// ✅ Good: Index on queried field
UserSchema.index('email');
const users = await User.find({ email: 'user@example.com' }).all();
// Uses index - fast!

2. Fetching Too Much Data

// ❌ Bad: Fetch all fields and all documents
const users = await User.find({}).all();

// ✅ Good: Limit and select only needed fields
const users = await User.find({})
  .select(['name', 'email'])
  .limit(100)
  .all();

3. Not Using Lean Queries

// ❌ Bad: Using Document instances when not needed
const users = await User.find({}).all();
// Creates Document instances with overhead

// ✅ Good: Use lean queries for read-only operations
const users = await User.findLean({}).all();
// Plain objects - faster and less memory

4. Creating Connections in Request Handlers

// ❌ Bad: Create connection per request
app.get('/users', async (req, res) => {
  await connect({ ... });  // Slow!
  const users = await User.find({}).all();
});

// ✅ Good: Connect once at startup
await connect({ ... });  // At app startup
app.get('/users', async (req, res) => {
  const users = await User.find({}).all();  // Fast!
});

Real-World Performance Tips

1. Use Aggregations for Statistics

// ❌ Bad: Fetch all, then calculate in application
const orders = await Order.find({}).all();
const total = orders.reduce((sum, o) => sum + o.amount, 0);

// ✅ Good: Use aggregation
const stats = await Order.aggregate()
  .aggregate({ total: { $sum: 'amount' } })
  .execute();

2. Cache Frequently Accessed Data

// Cache user roles
const cache = new Map();
async function getUserRoles(userId: string) {
  if (cache.has(userId)) {
    return cache.get(userId);
  }
  const roles = await UserRoles.find({ userId }).all();
  cache.set(userId, roles);
  return roles;
}
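
The Map above never expires entries, so cached roles can go stale. If the underlying data changes, a simple time-based expiry keeps the cache reasonably fresh. This is a sketch; the 60-second TTL is an assumption, not a recommendation from arango-typed:

// Sketch: cache entries with a timestamp and re-fetch after a TTL
const roleCache = new Map<string, { roles: unknown[]; fetchedAt: number }>();
const TTL_MS = 60_000;  // assumed freshness window

async function getUserRolesCached(userId: string) {
  const entry = roleCache.get(userId);
  if (entry && Date.now() - entry.fetchedAt < TTL_MS) {
    return entry.roles;
  }
  const roles = await UserRoles.find({ userId }).all();
  roleCache.set(userId, { roles, fetchedAt: Date.now() });
  return roles;
}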

3. Use Transactions for Related Operations

// Use a transaction for multiple related operations.
// db.beginTransaction() returns a transaction handle; declare the collections
// involved ('users' and 'profiles' are the assumed collection names here).
const trx = await db.beginTransaction({ write: ['users', 'profiles'] });
try {
  await User.create(userData);
  await Profile.create(profileData);
  await trx.commit();
} catch (error) {
  await trx.abort();
  throw error;
}

📚 Related Documentation: Learn more about Queries, Connection Management, and Models & Schemas.
Next: Learn about Queries for advanced querying capabilities.