Sanctum Deployment Guide
This guide covers deployment scenarios for Sanctum's production-ready Qdrant adapter across various environments.
Table of Contents
- Prerequisites
- Local Development
- Docker Compose
- Kubernetes
- Cloud Deployments
- Production Best Practices
- Monitoring
- Backup and Recovery
- Troubleshooting
- Cost Optimization
Prerequisites
For Qdrant Deployment
- Docker 20.10+ (for Docker deployments)
- Kubernetes 1.21+ (for K8s deployments)
- Minimum 2GB RAM for Qdrant
- Sufficient disk space (a 1536-dimension float32 vector takes about 6 KB raw, or about 1.5 KB with int8 quantization, before payload and index overhead)
Resource Estimation
| Entries | Dimension | Estimated Storage | Recommended RAM |
|---|---|---|---|
| 10,000 | 1536 | ~15 MB | 512 MB |
| 100,000 | 1536 | ~150 MB | 1 GB |
| 1,000,000 | 1536 | ~1.5 GB | 4 GB |
| 10,000,000 | 1536 | ~15 GB | 16 GB |
Local Development
Using InMemory Adapter
The simplest option for development - no infrastructure needed:
# config.yml
sanctum:
  enabled: true
  adapter_type: "in_memory"

use paladin::infrastructure::adapters::sanctum::InMemorySanctum;

#[tokio::main]
async fn main() {
    let sanctum = InMemorySanctum::new();
    // Ready to use immediately
}
Local Qdrant Instance
For testing Qdrant locally:
# Pull and run Qdrant
docker run -p 6333:6333 -p 6334:6334 \
  -v $(pwd)/qdrant_storage:/qdrant/storage \
  qdrant/qdrant:latest
# config.yml
sanctum:
  enabled: true
  adapter_type: "qdrant"
  qdrant:
    url: "http://localhost:6334"
    collection_name: "dev_memories"
    vector_dimension: 1536
Access Qdrant dashboard at: http://localhost:6333/dashboard
Docker Compose
Basic Setup
# docker-compose.yml
version: '3.8'
services:
  qdrant:
    image: qdrant/qdrant:v1.7.4
    container_name: paladin-qdrant
    ports:
      - "6333:6333"  # HTTP API
      - "6334:6334"  # gRPC API
    volumes:
      - qdrant_data:/qdrant/storage
    environment:
      QDRANT__SERVICE__HTTP_PORT: 6333
      QDRANT__SERVICE__GRPC_PORT: 6334
    restart: unless-stopped
  paladin:
    build: .
    container_name: paladin-app
    depends_on:
      - qdrant
    environment:
      APP_SANCTUM_ENABLED: "true"
      APP_SANCTUM_ADAPTER_TYPE: "qdrant"
      APP_SANCTUM_QDRANT_URL: "http://qdrant:6334"
      APP_SANCTUM_QDRANT_COLLECTION_NAME: "paladin_memories"
      APP_SANCTUM_QDRANT_VECTOR_DIMENSION: "1536"
    volumes:
      - ./config.yml:/app/config.yml
    restart: unless-stopped
volumes:
  qdrant_data:
    driver: local
Start services:
docker-compose up -d
Verify Qdrant health:
curl http://localhost:6333/healthz
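In the basic setup, depends_on only orders container startup, so the application can come up before Qdrant is accepting connections. One way to handle this on the application side is a bounded wait loop at startup. The following is a minimal sketch under assumptions: the name wait_until_ready is hypothetical, it is not part of Sanctum's API, and the readiness check is passed in as a closure so that any probe (a TCP connect, an HTTP request, a client call) can be plugged in.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Calls `is_ready` up to `max_attempts` times, sleeping `delay` between attempts.
/// Returns the 1-based attempt number that succeeded, or None if every attempt failed.
/// (Hypothetical helper for illustration; not part of Sanctum.)
fn wait_until_ready<F: FnMut() -> bool>(
    mut is_ready: F,
    max_attempts: u32,
    delay: Duration,
) -> Option<u32> {
    for attempt in 1..=max_attempts {
        if is_ready() {
            return Some(attempt);
        }
        // Do not sleep after the final failed attempt.
        if attempt < max_attempts {
            sleep(delay);
        }
    }
    None
}
```

A caller might pass a closure that attempts to connect to qdrant:6334 and abort startup when the function returns None. The production Compose file in the next section avoids the problem on the Compose side by waiting on a health check instead.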
Production Docker Compose
Enhanced with resource limits and monitoring:
# docker-compose.prod.yml
version: '3.8'
services:
  qdrant:
    image: qdrant/qdrant:v1.7.4
    container_name: paladin-qdrant-prod
    ports:
      - "6333:6333"
      - "6334:6334"
    volumes:
      - qdrant_data:/qdrant/storage
      - ./qdrant-config.yaml:/qdrant/config/production.yaml
    environment:
      QDRANT__SERVICE__HTTP_PORT: 6333
      QDRANT__SERVICE__GRPC_PORT: 6334
      QDRANT__LOG_LEVEL: INFO
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 8G
        reservations:
          cpus: '2'
          memory: 4G
    restart: always
    healthcheck:
      # Requires curl inside the container; confirm that your Qdrant image provides it
      test: ["CMD", "curl", "-f", "http://localhost:6333/healthz"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
  paladin:
    build:
      context: .
      dockerfile: Dockerfile.prod
    container_name: paladin-app-prod
    depends_on:
      qdrant:
        condition: service_healthy
    environment:
      APP_SANCTUM_ENABLED: "true"
      APP_SANCTUM_ADAPTER_TYPE: "qdrant"
      APP_SANCTUM_QDRANT_URL: "http://qdrant:6334"
      APP_SANCTUM_QDRANT_COLLECTION_NAME: "production_memories"
      APP_SANCTUM_QDRANT_VECTOR_DIMENSION: "1536"
      RUST_LOG: "info,paladin=debug"
    volumes:
      - ./config.prod.yml:/app/config.yml:ro
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 4G
        reservations:
          cpus: '1'
          memory: 2G
    restart: always
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
volumes:
  qdrant_data:
    driver: local
Kubernetes
Qdrant StatefulSet
# k8s/qdrant-statefulset.yaml
apiVersion: v1
kind: Service
metadata:
  name: qdrant
  namespace: paladin
spec:
  selector:
    app: qdrant
  ports:
    - name: http
      port: 6333
      targetPort: 6333
    - name: grpc
      port: 6334
      targetPort: 6334
  clusterIP: None
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: qdrant
  namespace: paladin
spec:
  serviceName: qdrant
  replicas: 1
  selector:
    matchLabels:
      app: qdrant
  template:
    metadata:
      labels:
        app: qdrant
    spec:
      containers:
        - name: qdrant
          image: qdrant/qdrant:v1.7.4
          ports:
            - containerPort: 6333
              name: http
            - containerPort: 6334
              name: grpc
          env:
            - name: QDRANT__SERVICE__HTTP_PORT
              value: "6333"
            - name: QDRANT__SERVICE__GRPC_PORT
              value: "6334"
            - name: QDRANT__LOG_LEVEL
              value: "INFO"
          volumeMounts:
            - name: qdrant-storage
              mountPath: /qdrant/storage
          resources:
            requests:
              memory: "2Gi"
              cpu: "500m"
            limits:
              memory: "8Gi"
              cpu: "4000m"
          livenessProbe:
            httpGet:
              path: /healthz
              port: 6333
            initialDelaySeconds: 30
            periodSeconds: 30
          readinessProbe:
            httpGet:
              path: /readyz
              port: 6333
            initialDelaySeconds: 10
            periodSeconds: 5
  volumeClaimTemplates:
    - metadata:
        name: qdrant-storage
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: "standard"
        resources:
          requests:
            storage: 50Gi
Paladin Deployment
# k8s/paladin-deployment.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: paladin-config
  namespace: paladin
data:
  config.yml: |
    sanctum:
      enabled: true
      adapter_type: "qdrant"
      qdrant:
        url: "http://qdrant:6334"
        collection_name: "k8s_memories"
        vector_dimension: 1536
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: paladin
  namespace: paladin
spec:
  replicas: 3
  selector:
    matchLabels:
      app: paladin
  template:
    metadata:
      labels:
        app: paladin
    spec:
      containers:
        - name: paladin
          image: paladin:latest
          ports:
            - containerPort: 8080
          env:
            - name: APP_SANCTUM_ENABLED
              value: "true"
            - name: APP_SANCTUM_ADAPTER_TYPE
              value: "qdrant"
            - name: APP_SANCTUM_QDRANT_URL
              value: "http://qdrant:6334"
            - name: APP_SANCTUM_QDRANT_COLLECTION_NAME
              value: "k8s_memories"
            - name: APP_SANCTUM_QDRANT_VECTOR_DIMENSION
              value: "1536"
          volumeMounts:
            - name: config
              mountPath: /app/config.yml
              subPath: config.yml
          resources:
            requests:
              memory: "1Gi"
              cpu: "500m"
            limits:
              memory: "4Gi"
              cpu: "2000m"
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 30
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
      volumes:
        - name: config
          configMap:
            name: paladin-config
Deploy to Kubernetes:
# Create namespace
kubectl create namespace paladin
# Apply configurations
kubectl apply -f k8s/qdrant-statefulset.yaml
kubectl apply -f k8s/paladin-deployment.yaml
# Verify deployment
kubectl get pods -n paladin
kubectl logs -n paladin -l app=paladin
Cloud Deployments
AWS (EKS + Qdrant)
Option 1: Self-Hosted on EKS
Use the Kubernetes manifests above with EKS-specific storage class:
# Use AWS EBS for storage
volumeClaimTemplates:
  - metadata:
      name: qdrant-storage
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: "gp3"  # AWS EBS GP3
      resources:
        requests:
          storage: 100Gi
Option 2: Qdrant Cloud
# config.yml
sanctum:
  enabled: true
  adapter_type: "qdrant"
  qdrant:
    url: "https://your-cluster.qdrant.io:6334"
    collection_name: "aws_memories"
    vector_dimension: 1536
Set API key via environment:
export QDRANT_API_KEY=your_api_key_here
GCP (GKE + Qdrant)
Use GCP persistent disk:
volumeClaimTemplates:
  - metadata:
      name: qdrant-storage
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: "standard-rwo"  # GCP persistent disk
      resources:
        requests:
          storage: 100Gi
Azure (AKS + Qdrant)
Use Azure managed disk:
volumeClaimTemplates:
  - metadata:
      name: qdrant-storage
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: "managed-premium"  # Azure premium SSD
      resources:
        requests:
          storage: 100Gi
Production Best Practices
1. High Availability
Qdrant Cluster Mode (v1.2.0+):
# qdrant-config.yaml
cluster:
  enabled: true
  consensus:
    tick_period_ms: 100
  p2p:
    port: 6335
Deploy multiple Qdrant replicas:
replicas: 3 # Minimum for HA
2. Resource Allocation
CPU Guidelines:
- Development: 0.5-1 CPU
- Production: 2-4 CPUs
- High load: 4-8 CPUs
Memory Guidelines:
- Base: 2 GB + (vectors × dimension × 4 bytes for float32)
- Example: 1M vectors × 1536 dimensions × 4 bytes ≈ 6 GB, plus the 2 GB base ≈ 8 GB
Storage:
- Use SSD for production (NVMe preferred)
- Plan for 2x growth capacity
- Enable compression (built into Qdrant)
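The memory guideline in this section can be turned into a quick calculation. The following is a minimal sketch, assuming decimal gigabytes and treating the 2 GB base as a fixed overhead; the function names are hypothetical, and the estimate ignores payload and HNSW index overhead, so treat the result as a lower bound.

```rust
/// Fixed base overhead from the guideline: 2 GB (decimal gigabytes).
const BASE_BYTES: u64 = 2_000_000_000;

/// Estimates RAM in bytes as: base + vectors * dimension * bytes_per_component.
/// Use 4 bytes per component for float32 vectors and 1 for int8-quantized ones.
/// (Hypothetical helper; payload and index overhead are not included.)
fn estimate_ram_bytes(vectors: u64, dimension: u64, bytes_per_component: u64) -> u64 {
    BASE_BYTES + vectors * dimension * bytes_per_component
}

/// Converts a byte count to decimal gigabytes for display.
fn to_gb(bytes: u64) -> f64 {
    bytes as f64 / 1e9
}
```

For 1M float32 vectors of dimension 1536, estimate_ram_bytes(1_000_000, 1536, 4) gives 8,144,000,000 bytes, which matches the roughly 8 GB figure in the example above. Passing 1 as the last argument shows the effect of int8 quantization on the vector portion.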
3. Network Configuration
Firewall Rules:
- Port 6333: HTTP API (internal only)
- Port 6334: gRPC API (application access)
- Port 6335: P2P cluster communication (Qdrant cluster only)
TLS Configuration:
service:
  http_port: 6333
  grpc_port: 6334
  enable_tls: true
tls:
  cert: /path/to/cert.pem
  key: /path/to/key.pem
4. Collection Configuration
Optimal Settings:
use qdrant_client::prelude::*;

// Configure collection for production
let collection_config = CreateCollection {
    collection_name: "production_memories".to_string(),
    vectors_config: Some(VectorsConfig {
        params: Some(VectorParams {
            size: 1536,
            distance: Distance::Cosine,
            hnsw_config: Some(HnswConfig {
                m: 16,             // Number of edges per node (higher = better recall, more memory)
                ef_construct: 200, // Build-time accuracy (higher = better quality, slower build)
                full_scan_threshold: 10000,
            }),
            quantization_config: Some(QuantizationConfig {
                scalar: Some(ScalarQuantization {
                    type_: ScalarType::Int8, // Reduce memory by 4x
                    quantile: 0.99,
                    always_ram: true,
                }),
            }),
            on_disk: false, // Keep vectors in RAM for speed
        }),
    }),
    // ... other settings
};
5. Security
Authentication:
# qdrant-config.yaml
service:
  api_key: your_api_key_here  # Placeholder; to keep the key out of the file, set the QDRANT__SERVICE__API_KEY environment variable instead
Network Policies (Kubernetes):
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: qdrant-network-policy
  namespace: paladin
spec:
  podSelector:
    matchLabels:
      app: qdrant
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: paladin
      ports:
        - protocol: TCP
          port: 6334
6. Backup Strategy
Automated Snapshots:
# Create snapshot
curl -X POST 'http://localhost:6333/collections/paladin_memories/snapshots'
# List snapshots
curl 'http://localhost:6333/collections/paladin_memories/snapshots'
# Download snapshot
curl -O 'http://localhost:6333/collections/paladin_memories/snapshots/snapshot-2024-01-30.snapshot'
Kubernetes CronJob:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: qdrant-backup
  namespace: paladin
spec:
  schedule: "0 2 * * *"  # Daily at 2 AM
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: backup
              image: curlimages/curl:latest
              command:
                - sh
                - -c
                - |
                  curl -X POST http://qdrant:6333/collections/paladin_memories/snapshots
                  # Upload to S3/GCS/Azure Storage
          restartPolicy: OnFailure
Monitoring
Metrics to Track
Qdrant Metrics:
- Collection size (number of vectors)
- Search latency (p50, p95, p99)
- Memory usage
- CPU utilization
- Disk I/O
Application Metrics:
- Store operation latency
- Search operation latency
- Error rates
- Cache hit rates
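Latency percentiles such as p50, p95, and p99 are normally computed by the metrics backend, but it helps to see what they mean on raw samples. The following is a minimal sketch using the nearest-rank method; the function name is hypothetical, and production systems typically use histograms or streaming estimators rather than sorting every sample.

```rust
/// Returns the p-th percentile (0 < p <= 100) of `samples` using the nearest-rank method.
/// Returns None for an empty slice or an out-of-range percentile.
/// (Illustrative helper; not how Prometheus or Qdrant compute percentiles.)
fn percentile(samples: &[f64], p: f64) -> Option<f64> {
    if samples.is_empty() || !(p > 0.0 && p <= 100.0) {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    // Nearest rank: ceil(p / 100 * n), then converted to a zero-based index.
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    Some(sorted[rank.max(1) - 1])
}
```

With ten search latencies where two are slow outliers, the p50 stays near the typical value while the p95 lands on an outlier, which is why tail percentiles are tracked alongside the median.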
Prometheus Integration
# prometheus-config.yaml
scrape_configs:
  - job_name: 'qdrant'
    static_configs:
      - targets: ['qdrant:6333']
    metrics_path: '/metrics'
Grafana Dashboard
Key panels:
- Search Performance: p95 latency over time
- Storage Growth: Collection size trend
- Resource Usage: CPU/Memory utilization
- Error Rates: Failed operations per minute
Backup and Recovery
Full Backup
#!/bin/bash
# backup-qdrant.sh
# Stop on the first failed command, unset variable, or failed pipeline stage
set -euo pipefail
COLLECTION="paladin_memories"
BACKUP_DIR="/backups/$(date +%Y%m%d)"
QDRANT_URL="http://localhost:6333"
# Make sure the dated backup directory exists before downloading into it
mkdir -p "${BACKUP_DIR}"
# Create snapshot and capture its name from the JSON response
SNAPSHOT=$(curl -sf -X POST "${QDRANT_URL}/collections/${COLLECTION}/snapshots" | jq -r '.result.name')
# Download snapshot
curl -sf -o "${BACKUP_DIR}/${SNAPSHOT}" \
  "${QDRANT_URL}/collections/${COLLECTION}/snapshots/${SNAPSHOT}"
# Upload to S3
aws s3 cp "${BACKUP_DIR}/${SNAPSHOT}" \
  "s3://paladin-backups/qdrant/${COLLECTION}/${SNAPSHOT}"
Restore from Backup
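#!/bin/bash
# restore-qdrant.sh
COLLECTION="paladin_memories"
SNAPSHOT_FILE="$1"
QDRANT_URL="http://localhost:6333"
# Option A: upload a local snapshot file; Qdrant recovers the collection from the upload
curl -X POST "${QDRANT_URL}/collections/${COLLECTION}/snapshots/upload" \
  -F "snapshot=@${SNAPSHOT_FILE}"
# Option B: recover from a location the Qdrant server itself can read.
# Use this instead of Option A when SNAPSHOT_FILE is an http(s):// URL or a file:// URI on the server.
# curl -X PUT "${QDRANT_URL}/collections/${COLLECTION}/snapshots/recover" \
#   -H "Content-Type: application/json" \
#   -d "{\"location\": \"${SNAPSHOT_FILE}\"}"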
Disaster Recovery Plan
- Regular Backups: Daily automated snapshots
- Off-site Storage: Copy to cloud storage (S3/GCS/Azure)
- Test Restores: Monthly restore validation
- RPO/RTO: Define acceptable data loss and recovery time
- Runbook: Document recovery procedures
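Once an RPO is defined, it can be checked automatically against the backup history. The following is a minimal sketch, assuming backup completion times are available as timestamps (for example, from object storage metadata); the function name and inputs are hypothetical.

```rust
use std::time::{Duration, SystemTime};

/// Returns true when the most recent backup is no older than the recovery point objective.
/// An empty backup list never satisfies the RPO.
/// (Hypothetical helper for illustration.)
fn rpo_satisfied(backup_times: &[SystemTime], now: SystemTime, rpo: Duration) -> bool {
    backup_times
        .iter()
        .max()
        .map(|latest| match now.duration_since(*latest) {
            Ok(age) => age <= rpo,
            // A timestamp in the future (clock skew) is treated as a fresh backup.
            Err(_) => true,
        })
        .unwrap_or(false)
}
```

With daily snapshots and a 24-hour RPO, this check fails as soon as one scheduled backup is missed, which makes it a reasonable basis for an alert.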
Troubleshooting
High Memory Usage
Symptoms: OOM kills, swapping
Solutions:
- Enable quantization to reduce memory 4x:
  quantization_config: Some(QuantizationConfig {
      scalar: Some(ScalarQuantization {
          type_: ScalarType::Int8,
      }),
  })
- Move vectors to disk:
  on_disk: true // Slower but uses less RAM
- Increase node resources
Slow Search Performance
Symptoms: Search > 500ms consistently
Solutions:
- Increase HNSW ef parameter:
  ef_construct: 200 // Higher = better accuracy
- Tune search parameters:
  search_params: Some(SearchParams {
      hnsw_ef: Some(128), // Higher = more accurate but slower
      exact: false,
  })
- Add filters to reduce search space
Connection Timeouts
Symptoms: "Failed to connect to Qdrant"
Solutions:
- Verify Qdrant is running:
  curl http://localhost:6333/healthz
- Check network connectivity:
  telnet qdrant 6334
- Increase timeouts:
  QdrantClient::builder()
      .with_timeout(Duration::from_secs(30))
      .build()
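The connectivity check above can also be done from application code, which is useful in containers that do not ship telnet. The following is a minimal sketch using only the standard library; the function name is hypothetical, and a successful TCP connect only shows that the port is open, not that Qdrant is healthy.

```rust
use std::net::{TcpStream, ToSocketAddrs};
use std::time::Duration;

/// Returns true if a TCP connection to `addr` (for example "qdrant:6334")
/// succeeds within `timeout`. (Hypothetical helper for illustration.)
fn port_reachable(addr: &str, timeout: Duration) -> bool {
    match addr.to_socket_addrs() {
        // Try every resolved address; succeed on the first one that accepts a connection.
        Ok(mut addrs) => addrs.any(|a| TcpStream::connect_timeout(&a, timeout).is_ok()),
        // The address could not be parsed or resolved.
        Err(_) => false,
    }
}
```

A false result for qdrant:6334 from inside the application pod points at DNS, a NetworkPolicy, or the service definition rather than at client timeouts.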
Cost Optimization
Resource Right-Sizing
Start Small:
- 2 GB RAM for <100K vectors
- 4 GB RAM for <1M vectors
- Scale based on metrics
Storage Optimization
Techniques:
- Quantization: Reduce memory/storage by 75%
- Compression: Built into Qdrant (ZSTD)
- Pruning: Delete old/unused memories
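The 75% figure for quantization follows from storing one byte per component instead of a 4-byte float32. The following sketch illustrates the idea with a simple min/max linear mapping. It is not Qdrant's implementation (which, for example, uses a quantile setting to trim outliers), and the type and function names are hypothetical.

```rust
/// A vector compressed to one byte per component, plus the range needed to decode it.
/// (Illustrative type; not Qdrant's internal representation.)
struct QuantizedVector {
    min: f32,
    scale: f32,
    codes: Vec<u8>,
}

/// Linearly maps each f32 component onto 0..=255 using the vector's own min and max.
fn quantize(vector: &[f32]) -> QuantizedVector {
    let min = vector.iter().copied().fold(f32::INFINITY, f32::min);
    let max = vector.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    // Avoid dividing by zero when all components are equal or the vector is empty.
    let scale = if max > min { (max - min) / 255.0 } else { 1.0 };
    let codes = vector
        .iter()
        .map(|&x| ((x - min) / scale).round().clamp(0.0, 255.0) as u8)
        .collect();
    QuantizedVector { min, scale, codes }
}

/// Reconstructs approximate f32 components from the stored codes.
fn dequantize(q: &QuantizedVector) -> Vec<f32> {
    q.codes.iter().map(|&c| q.min + c as f32 * q.scale).collect()
}
```

A 1536-dimension vector shrinks from 6144 bytes to 1536 bytes of codes plus two floats, and each reconstructed component differs from the original by at most about half of one quantization step. That small error is the accuracy cost traded for the memory saving.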
Cloud Cost Management
Tips:
- Use spot/preemptible instances for non-critical workloads
- Scale down non-prod environments off-hours
- Use Qdrant Cloud for predictable costs
- Monitor and set budget alerts