Executive Summary

Vaani is a multi-tenant SaaS platform for building, deploying, and managing AI-powered voice agents. It enables organizations to create virtual phone agents that handle inbound and outbound calls using configurable LLM, Speech-to-Text (STT), and Text-to-Speech (TTS) providers.

Key Capabilities

  • 🤖 Configurable AI agents with ~50 settings per agent (LLM, STT, TTS, prompts, tools)
  • 📞 Dual telephony (Twilio + Vonage) with LiveKit SIP gateway
  • 🔄 Warm and cold call transfers with failover logic
  • 📊 Batch calling (CSV upload, Celery-based concurrency, per-item tracking)
  • 📈 Analytics dashboard with call summaries, dispositions, and Prometheus metrics
  • 🏢 Multi-tenant workspaces with RBAC (admin/developer/member)
  • 📚 Knowledge base upload with RAG (LlamaIndex-powered retrieval)
  • 💬 Web chat interface for text-based agent interaction
  • 🎙️ Call recording (LiveKit Egress → S3) with pre-signed URL playback

Architecture: a 4-service monorepo (FastAPI backend, LiveKit agent worker, Next.js dashboard, embeddable VUI widget), with PostgreSQL as the primary database, Redis for cache and queueing, and S3 for recording storage.
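
The recording capability above ends in pre-signed URL playback. As a concrete illustration, here is a minimal sketch of generating a time-limited playback link with boto3; the bucket name and key layout are assumptions, not Vaani's actual configuration.

```python
import boto3

# Hypothetical bucket and key layout; the real values live in Vaani's config.
RECORDINGS_BUCKET = "vaani-call-recordings"

s3 = boto3.client("s3")

def recording_playback_url(call_id: str, expires_in: int = 3600) -> str:
    """Return a time-limited URL for a recording uploaded by LiveKit Egress."""
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": RECORDINGS_BUCKET, "Key": f"recordings/{call_id}.ogg"},
        ExpiresIn=expires_in,
    )
```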

Document Index


Goal: enable a new engineer to understand the entire system in 2–3 hours.

Hour 1: Orientation

  1. Read this Overview page (executive summary + top insights)
  2. Read Repository Overview — understand the codebase structure
  3. Read System Architecture — understand how components connect
  4. Scan Glossary — learn domain terminology

Hour 2: Deep Dive

  1. Read Features Catalog — understand every feature
  2. Read API Endpoints — know the API surface
  3. Read Data Model — understand the database

Hour 3: Operations

  1. Read Integrations — external service dependencies
  2. Read Deployment Guide — set up local environment
  3. Read Risks & Recommendations — know the pitfalls
  4. Scan FAQ — common questions answered

Top 10 Critical Insights

1. The Agents Model is the System’s Heart

The Agents table has ~50 columns controlling every aspect of agent behavior — from LLM provider choice to silence timeout thresholds. Understanding this model is essential.
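
A hypothetical slice of what such a model might look like in SQLAlchemy; the column names here are illustrative, not the actual schema.

```python
from sqlalchemy import Column, Float, ForeignKey, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Agent(Base):
    """Illustrative slice of the ~50-column Agents table; names are assumed."""
    __tablename__ = "agents"

    id = Column(Integer, primary_key=True)
    workspace_id = Column(Integer, ForeignKey("workspaces.id"), nullable=False)
    llm_provider = Column(String, default="openai")    # selects the LLMFactory branch
    stt_provider = Column(String, default="deepgram")
    tts_provider = Column(String, default="elevenlabs")
    system_prompt = Column(String)
    silence_timeout_s = Column(Float, default=10.0)    # end-of-turn silence threshold
    # ...roughly 40 more columns: prompts, tools, transfer targets, webhooks...
```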

2. agent.py:entrypoint() is Where the Magic Happens

This 877-line function orchestrates the entire real-time call lifecycle: connection → participant detection → config loading → conversation loop → post-call analytics.
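
A minimal skeleton of that lifecycle using the public livekit-agents API; the stage helpers below are placeholders for the real logic in agent.py.

```python
from livekit.agents import JobContext, WorkerOptions, cli

# Placeholder stages; the real implementations live in agent.py.
async def load_agent_config(participant): ...
async def run_conversation(ctx, config): ...
async def run_post_call_analytics(ctx): ...

async def entrypoint(ctx: JobContext):
    """Connection -> participant detection -> config -> conversation -> analytics."""
    await ctx.connect()                             # join the LiveKit room
    participant = await ctx.wait_for_participant()  # wait for the SIP caller
    config = await load_agent_config(participant)   # pull the agent's ~50 settings
    await run_conversation(ctx, config)             # STT -> LLM -> TTS turn loop
    await run_post_call_analytics(ctx)              # costs, summary, recording URL

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```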

3. Factory Pattern Powers Provider Flexibility

LLMFactory, STTFactory, TTSFactory abstract away 10+ AI providers. Adding a new provider means adding one conditional branch — no architectural changes needed.
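
A sketch of the branch-per-provider shape such a factory typically has; the provider set and constructor details are assumptions, not Vaani's exact code.

```python
from typing import Any

class LLMFactory:
    """Sketch of the branch-per-provider factory; signatures are illustrative."""

    @staticmethod
    def create(provider: str, **settings: Any):
        if provider == "openai":
            from livekit.plugins import openai  # assumed plugin layout
            return openai.LLM(**settings)
        if provider == "groq":
            from livekit.plugins import groq
            return groq.LLM(**settings)
        # ...one branch per supported provider; STTFactory/TTSFactory mirror this...
        raise ValueError(f"unknown LLM provider: {provider}")
```

Under this shape, onboarding a new provider really is one more branch plus its plugin import.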

4. Warm Transfer Has Complex Failover Logic

The transfer implementation in AgentCaller.py includes SIP participant creation, handoff text delivery, retry logic, and participant disconnection.
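
The failover structure might look roughly like this sketch; the helpers and exception type are stand-ins for the real room/SIP calls in AgentCaller.py.

```python
import asyncio

class TransferError(Exception):
    """Stand-in for whatever the SIP layer raises on a failed leg."""

# Assumed helpers standing in for the real room/SIP calls in AgentCaller.py.
async def dial_sip_participant(room, number: str): ...
async def speak_to(participant, text: str): ...
async def disconnect_agent(room): ...
async def announce_failure(room): ...

MAX_ATTEMPTS = 3

async def warm_transfer(room, target: str, handoff_text: str) -> bool:
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            supervisor = await dial_sip_participant(room, target)  # new SIP leg
            await speak_to(supervisor, handoff_text)  # deliver the handoff summary
            await disconnect_agent(room)  # leave caller and supervisor bridged
            return True
        except TransferError:
            if attempt == MAX_ATTEMPTS:
                await announce_failure(room)   # fall back: agent stays on the call
                return False
            await asyncio.sleep(2 ** attempt)  # back off before retrying
    return False
```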

5. Two Parallel Calling Systems Exist

Batch Jobs are Celery-based and persistent, with concurrency control; Campaigns are BackgroundTasks-based and simpler. The two should probably be unified.
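
The split looks roughly like this; the broker URL, rate limit, and route are illustrative.

```python
from celery import Celery
from fastapi import BackgroundTasks, FastAPI

celery_app = Celery("vaani", broker="redis://localhost:6379/0")  # assumed broker URL
app = FastAPI()

# Batch Jobs path: persisted tasks, survivable restarts, worker-side concurrency.
@celery_app.task(rate_limit="10/m")  # illustrative concurrency control
def place_batch_call(item_id: int): ...

# Campaigns path: fire-and-forget inside the API process; lost if it dies.
@app.post("/campaigns/{campaign_id}/start")
async def start_campaign(campaign_id: int, background_tasks: BackgroundTasks):
    background_tasks.add_task(place_campaign_calls, campaign_id)
    return {"status": "started"}

async def place_campaign_calls(campaign_id: int): ...
```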

6. Security Needs Immediate Attention

Hardcoded DB credentials, a wildcard CORS policy, and missing rate limiting are production concerns, and a datetime.now() column default affects data integrity.
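
Assuming the datetime.now() issue is the common SQLAlchemy pitfall, the bug and its fixes look like this:

```python
from datetime import datetime, timezone
from sqlalchemy import Column, DateTime, func

# Bug: datetime.now() runs once at import, so every row shares the
# process-startup timestamp.
created_at = Column(DateTime, default=datetime.now())

# Fixes: pass a callable (evaluated per insert), or defer to the database.
created_at = Column(DateTime, default=lambda: datetime.now(timezone.utc))
created_at = Column(DateTime, server_default=func.now())
```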

7. Everything is Workspace-Scoped

Multi-tenancy runs through require_workspace_access() — every data query filters by workspace_id.
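
A sketch of how such a guard is commonly wired as a FastAPI dependency; the auth and DB stubs stand in for Vaani's real plumbing, and Agent refers to the model sketched under insight 1.

```python
from fastapi import APIRouter, Depends, HTTPException

router = APIRouter()

# Stubs for the auth/DB plumbing; Vaani's real dependencies go here.
async def get_current_user(): ...
async def get_db(): ...

async def require_workspace_access(
    workspace_id: int, user=Depends(get_current_user)
) -> int:
    """Reject the request unless the caller belongs to the workspace."""
    if workspace_id not in getattr(user, "workspace_ids", []):
        raise HTTPException(status_code=403, detail="Not a workspace member")
    return workspace_id

@router.get("/workspaces/{workspace_id}/agents")
async def list_agents(
    workspace_id: int = Depends(require_workspace_access), db=Depends(get_db)
):
    # Every data query filters by workspace_id: no cross-tenant reads.
    return db.query(Agent).filter(Agent.workspace_id == workspace_id).all()
```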

8. The Report Router is the Largest File (1365 Lines)

It handles dashboards, call summaries, disposition analysis, and Prometheus metric parsing. It’s a strong refactoring candidate.
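
For the metric-parsing piece specifically, the standard approach uses prometheus_client's text parser; the metric name below is an assumption.

```python
from prometheus_client.parser import text_string_to_metric_families

def gauge_value(metrics_text: str, metric: str = "vaani_active_calls") -> float:
    """Pull one gauge out of a scraped /metrics payload (metric name assumed)."""
    for family in text_string_to_metric_families(metrics_text):
        if family.name == metric:
            for sample in family.samples:
                return sample.value
    return 0.0
```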

9. No Test Suite Exists

Zero automated tests were found. This is the single biggest risk for ongoing development velocity.
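
A first test could be as small as the following, assuming the FastAPI app is importable as main.app:

```python
# tests/test_agents_api.py
from fastapi.testclient import TestClient
from main import app  # assumption: the FastAPI app lives in main.py

client = TestClient(app)

def test_list_agents_requires_auth():
    # Workspace-scoped routes should reject unauthenticated callers.
    response = client.get("/workspaces/1/agents")
    assert response.status_code in (401, 403)
```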

10. Post-Call Pipeline is Comprehensive

After every call: cost computation, LLM-based conversation analysis, recording URL persistence, and optional webhook execution.
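
In outline, with helper names assumed:

```python
# Assumed helper names; the real implementations live in the agent worker.
def compute_call_cost(call) -> float: ...
async def analyze_transcript(call) -> dict: ...
def persist_recording_url(call) -> str: ...
async def post_webhook(url: str, payload: dict): ...

async def run_post_call_pipeline(call):
    call.cost = compute_call_cost(call)               # per-provider usage pricing
    call.analysis = await analyze_transcript(call)    # LLM summary + disposition
    call.recording_url = persist_recording_url(call)  # S3 key from LiveKit Egress
    if call.webhook_url:                              # optional customer callback
        await post_webhook(call.webhook_url, call.analysis)
```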