Executive Summary
Vaani is a multi-tenant SaaS platform for building, deploying, and managing AI-powered voice agents. It enables organizations to create virtual phone agents that handle inbound and outbound calls using configurable LLM, Speech-to-Text, and Text-to-Speech providers.
Key Capabilities
- 🤖 Configurable AI agents with ~50 settings per agent (LLM, STT, TTS, prompts, tools)
- 📞 Dual telephony (Twilio + Vonage) with LiveKit SIP gateway
- 🔄 Warm and cold call transfers with failover logic
- 📊 Batch calling (CSV upload, Celery-based concurrency, per-item tracking)
- 📈 Analytics dashboard with call summaries, dispositions, and Prometheus metrics
- 🏢 Multi-tenant workspaces with RBAC (admin/developer/member)
- 📚 Knowledge base upload with RAG (LlamaIndex-powered retrieval)
- 💬 Web chat interface for text-based agent interaction
- 🎙️ Call recording (LiveKit Egress → S3) with pre-signed URL playback
Document Index
Repository Overview
Codebase structure, technology stack, key files, and component map
System Architecture
Component interactions, runtime flows, auth/RBAC, data storage, queues
Features Catalog
10 features mapped to files, APIs, models, validation, and limitations
API Endpoints
All REST endpoints with methods, auth, schemas, and error codes
Data Model
All database entities, fields, relationships, migrations, and lifecycle
Integrations
Internal/external dependencies, SDKs, secrets, failure handling
Deployment Guide
Local setup, 30+ env vars, Docker, logging, monitoring, troubleshooting
Risks & Recommendations
20 risks categorized P0–P3 with actionable fixes
Maintenance Plan
Doc update strategy, automation ideas, ownership, versioning
Developer FAQ
20 common questions for dev, support, product teams
Glossary
35+ domain terms and abbreviations
Recommended Onboarding Path
A reading path that gets a new engineer up to speed on the entire system in 2–3 hours.
Hour 1: Orientation
- Read this Overview page (executive summary + top insights)
- Read Repository Overview — understand the codebase structure
- Read System Architecture — understand how components connect
- Scan Glossary — learn domain terminology
Hour 2: Deep Dive
- Read Features Catalog — understand every feature
- Read API Endpoints — know the API surface
- Read Data Model — understand the database
Hour 3: Operations
- Read Integrations — external service dependencies
- Read Deployment Guide — set up local environment
- Read Risks & Recommendations — know the pitfalls
- Scan FAQ — common questions answered
Top 10 Critical Insights
1. The Agents Model is the System’s Heart
The Agents table has ~50 columns controlling every aspect of agent behavior — from LLM provider choice to silence timeout thresholds. Understanding this model is essential.
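A minimal sketch of what that model looks like, assuming SQLAlchemy; the column names below are illustrative stand-ins, not the actual schema:

```python
# Illustrative SQLAlchemy sketch of the Agents model. Column names are examples
# only; the real table has ~50 columns covering LLM/STT/TTS, prompts, and tools.
from datetime import datetime, timezone

from sqlalchemy import Column, DateTime, Float, Integer, String, Text
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class Agent(Base):
    __tablename__ = "agents"

    id = Column(Integer, primary_key=True)
    workspace_id = Column(Integer, index=True, nullable=False)  # tenant scoping
    name = Column(String(255), nullable=False)

    # Provider selection, resolved at call time by the LLM/STT/TTS factories
    llm_provider = Column(String(64))
    stt_provider = Column(String(64))
    tts_provider = Column(String(64))

    # Behavior
    system_prompt = Column(Text)
    silence_timeout_seconds = Column(Float)

    # Callable default so the timestamp is evaluated per row, not at import time
    created_at = Column(DateTime, default=lambda: datetime.now(timezone.utc))
```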
2. agent.py:entrypoint() is Where the Magic Happens
This 877-line function orchestrates the entire real-time call lifecycle: connection → participant detection → config loading → conversation loop → post-call analytics.
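A simplified sketch of that flow, assuming the livekit-agents SDK; the three helper functions are placeholders for logic that actually lives in agent.py:

```python
# Simplified sketch of the entrypoint lifecycle. The helpers below are
# placeholders for the real implementation in agent.py.
from livekit.agents import JobContext, WorkerOptions, cli


async def load_agent_config(room_name: str) -> dict:
    return {}  # placeholder: real code loads the Agents row for this call


async def run_conversation(ctx: JobContext, participant, config: dict) -> None:
    pass  # placeholder: real code wires STT -> LLM -> TTS into the voice pipeline


async def post_call_analytics(room_name: str) -> None:
    pass  # placeholder: real code writes the summary, disposition, and metrics


async def entrypoint(ctx: JobContext):
    await ctx.connect()                               # join the LiveKit room
    participant = await ctx.wait_for_participant()    # detect the (SIP) caller
    config = await load_agent_config(ctx.room.name)   # load ~50 per-agent settings
    await run_conversation(ctx, participant, config)  # conversation loop
    await post_call_analytics(ctx.room.name)          # post-call analytics


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```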
3. Factory Pattern Powers Provider Flexibility
LLMFactory, STTFactory, TTSFactory abstract away 10+ AI providers. Adding a new provider means adding one conditional branch — no architectural changes needed.
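A rough sketch of the pattern, assuming the LiveKit plugin packages; the provider names and config keys shown are illustrative:

```python
# Illustrative sketch of one factory. The real LLMFactory, STTFactory, and
# TTSFactory cover 10+ providers and forward the agent's configured model,
# voice, and language settings.
class LLMFactory:
    @staticmethod
    def create(provider: str, config: dict):
        if provider == "openai":
            from livekit.plugins import openai  # livekit-plugins-openai
            return openai.LLM(model=config["llm_model"])
        if provider == "groq":
            from livekit.plugins import groq    # livekit-plugins-groq
            return groq.LLM(model=config["llm_model"])
        # Adding a new provider means adding one more branch here.
        raise ValueError(f"Unsupported LLM provider: {provider}")
```

Callers resolve a provider with something like LLMFactory.create(agent.llm_provider, config), so the rest of the call pipeline never depends on a concrete vendor SDK.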
4. Warm Transfer Has Complex Failover Logic
The transfer implementation in AgentCaller.py includes SIP participant creation, handoff text delivery, retry logic, and participant disconnection.
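The failover shape, roughly; the helpers below are placeholders for the SIP and room operations in AgentCaller.py, and the retry/backoff values are assumptions:

```python
# Rough sketch of the warm-transfer failover loop. The three helpers stand in
# for the SIP / room operations implemented in AgentCaller.py.
import asyncio

MAX_ATTEMPTS = 3  # assumed retry budget


async def create_sip_participant(room: str, number: str) -> str:
    return "sip-target"  # placeholder: real code calls the LiveKit SIP API


async def speak_handoff(room: str, target: str, text: str) -> None:
    pass  # placeholder: real code delivers the handoff summary via TTS


async def remove_participant(room: str, identity: str) -> None:
    pass  # placeholder: real code disconnects a participant from the room


async def warm_transfer(room: str, transfer_to: str, handoff_text: str) -> bool:
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            target = await create_sip_participant(room, transfer_to)  # dial human agent
            await speak_handoff(room, target, handoff_text)           # brief them
            await remove_participant(room, identity="ai-agent")       # drop the AI agent
            return True
        except Exception:
            await asyncio.sleep(2 ** attempt)  # back off, then retry or fall back
    return False  # caller falls back to a cold transfer or stays with the AI agent
```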
5. Two Parallel Calling Systems Exist
Batch Jobs (Celery-based, persistent, with concurrency control) and Campaigns (BackgroundTasks-based, simpler). They should probably be unified.
6. Security Needs Immediate Attention
Hardcoded DB credentials, wildcard CORS, and no rate limiting are production concerns. The datetime.now() default bug affects data integrity.
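The bug pattern and its fix in SQLAlchemy terms, assuming the usual evaluate-once pitfall (column names are illustrative):

```python
# The datetime.now() default bug: calling the function inside Column() evaluates
# it once at import time, so every row written afterwards shares that timestamp.
from datetime import datetime, timezone
from sqlalchemy import Column, DateTime

# Buggy: the timestamp is frozen when the model module is imported.
created_at_buggy = Column(DateTime, default=datetime.now())

# Fixed: pass a callable so SQLAlchemy evaluates it per INSERT.
created_at_fixed = Column(DateTime, default=lambda: datetime.now(timezone.utc))
```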
7. Everything is Workspace-Scoped
Multi-tenancy runs through require_workspace_access() — every data query filters by workspace_id.
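A minimal sketch of the pattern in FastAPI; the dependency signature, membership check, and query helper are illustrative, and the real require_workspace_access() also enforces the admin/developer/member role:

```python
# Illustrative sketch of workspace scoping as a FastAPI dependency.
from fastapi import Depends, FastAPI, HTTPException

app = FastAPI()


def get_current_user() -> dict:
    return {"id": 1, "workspace_ids": [1]}  # placeholder: real code decodes the JWT


async def require_workspace_access(
    workspace_id: int, user: dict = Depends(get_current_user)
) -> int:
    if workspace_id not in user["workspace_ids"]:
        raise HTTPException(status_code=403, detail="No access to this workspace")
    return workspace_id


def query_agents_by_workspace(workspace_id: int) -> list:
    return []  # placeholder: real code runs a DB query filtered by workspace_id


@app.get("/workspaces/{workspace_id}/agents")
async def list_agents(workspace_id: int = Depends(require_workspace_access)):
    # Every data query filters by workspace_id, so there are no cross-tenant reads.
    return query_agents_by_workspace(workspace_id)
```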