What Is Vaani?

Vaani is a multi-tenant AI voice agent SaaS platform that enables creation, configuration, and deployment of AI-driven voice agents capable of:
  • Making and receiving phone calls via SIP/PSTN (Twilio, Vonage)
  • Interacting through a web-based Voice UI (VUI) widget
  • Running batch outbound calling campaigns
  • Performing call transfers (cold and warm)
  • Executing custom functions (webhooks) during calls
  • Conducting post-call analysis using LLMs
  • Recording calls and computing cost/latency analytics
Vaani is built on LiveKit for real-time communication and supports pluggable AI providers (OpenAI, Groq, Deepgram, ElevenLabs, Rime, Cartesia, Sarvam, Murf, Inworld).
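
One common way to keep providers pluggable is a small registry that maps a provider name to a builder function. The sketch below illustrates that idea for TTS only. `TTSClient`, `register_tts`, and `create_tts` are hypothetical names chosen for this example, not identifiers taken from the Vaani codebase, and the stand-in client does not call any real provider SDK.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class TTSClient:
    """Stand-in for a real provider client; it only records its settings."""
    provider: str
    voice: str


# Hypothetical registry: provider name -> builder function.
_TTS_BUILDERS: Dict[str, Callable[[str], TTSClient]] = {}


def register_tts(name: str):
    """Decorator that registers a builder under a provider name."""
    def decorator(builder):
        _TTS_BUILDERS[name] = builder
        return builder
    return decorator


@register_tts("elevenlabs")
def _build_elevenlabs(voice: str) -> TTSClient:
    return TTSClient(provider="elevenlabs", voice=voice)


@register_tts("cartesia")
def _build_cartesia(voice: str) -> TTSClient:
    return TTSClient(provider="cartesia", voice=voice)


def create_tts(provider: str, voice: str) -> TTSClient:
    """Look up the builder for `provider` and fail loudly on unknown names."""
    try:
        builder = _TTS_BUILDERS[provider.lower()]
    except KeyError:
        raise ValueError(f"Unsupported TTS provider: {provider!r}") from None
    return builder(voice)
```

With this shape, adding a provider means adding one decorated builder, and an agent's stored configuration only needs to carry the provider name and its options.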

Repository Structure

The repository is a monorepo containing four independent subprojects:
Voice_ai/
├── agent-studio-backend/       # Python FastAPI backend API
├── agent-studio-livekit-agent/ # Python LiveKit agent worker
├── agent-studio-ui/            # Next.js admin/user dashboard
└── agent-studio-vui-widget/    # Vite + React embeddable VUI widget

Aspect        Detail
Language      Python 3.12+
Framework     FastAPI + Gunicorn + Uvicorn
ORM           SQLAlchemy (models + Alembic migrations)
Database      PostgreSQL
Task Queue    Celery + Redis
Auth          JWT (HS256) with access/refresh tokens
Key Packages  pydantic, python-jose, livekit-api, llama-index, boto3

agent-studio-backend/
├── app/
│   ├── main.py                     # FastAPI app entry, router registration
│   ├── core/
│   │   ├── auth.py                 # JWT user extraction
│   │   ├── config/settings.py      # Pydantic settings (env vars)
│   │   └── logging.py              # AppLogger configuration
│   ├── router/                     # API route handlers
│   │   ├── agent.py                # CRUD + file upload + RAG retrieval
│   │   ├── call.py                 # Outbound call initiation
│   │   ├── batch_call.py           # Batch job CRUD
│   │   ├── campaign.py             # Campaign management
│   │   ├── sip.py                  # SIP trunk provisioning
│   │   ├── report.py               # Call analytics & reports
│   │   ├── chat.py                 # Web chat log endpoints
│   │   ├── dynamic_data.py         # Dynamic data collections
│   │   ├── workspaces.py           # Workspace CRUD & membership
│   │   ├── admin.py                # Superuser admin endpoints
│   │   ├── auth/                   # Login, signup, refresh, OAuth
│   │   ├── user.py                 # User profile endpoints
│   │   ├── api_key.py              # API key management
│   │   └── webhooks.py             # Inbound telephony webhooks
│   ├── db/
│   │   ├── databases.py            # SQLAlchemy engine & session
│   │   ├── models/                 # ORM models (12 files)
│   │   └── schemas/                # Pydantic request/response schemas
│   ├── services/
│   │   ├── livekit/                # LiveKit API wrappers
│   │   ├── telephony/              # Twilio & Vonage service layers
│   │   ├── cache/                  # Redis client
│   │   ├── llama_index_integration.py  # RAG indexing/querying
│   │   └── llm.py                  # LLM utility functions
│   ├── workers/
│   │   ├── celery_app.py           # Celery configuration
│   │   ├── batch_dispatcher.py     # Batch call dispatch logic
│   │   └── batch_scheduler.py      # Periodic task scheduling
│   └── utils/                      # Helpers (crypto, CSV, HTTP, etc.)
├── alembic/                        # Database migrations
├── scripts/                        # Operational scripts
├── Dockerfile                      # Container build
└── docker-compose.yml              # Local dev environment

Technology Stack Summary

Layer             Technology
Backend API       Python 3.12+, FastAPI, SQLAlchemy, Alembic
Database          PostgreSQL (primary), MongoDB (supplementary), Redis (cache + queue)
Task Queue        Celery (Redis broker)
Agent Runtime     LiveKit Agents SDK, Silero VAD, Krisp Noise Cancellation
LLM Providers     OpenAI (GPT-4, GPT-4o, Realtime), Groq
STT Providers     Deepgram (Nova-3), Sarvam, Cartesia
TTS Providers     OpenAI, Deepgram, Rime, ElevenLabs, Inworld, Cartesia, Murf, Sarvam
Telephony         Twilio, Vonage (SIP trunking via LiveKit)
Frontend          Next.js 14+ (App Router), TailwindCSS
Widget            Vite, React, TailwindCSS
Storage           AWS S3 (call recordings)
Containerization  Docker, Docker Compose
CI/CD             GitLab CI (.gitlab-ci.yml in each subproject)
Auth              JWT (HS256), HTTP-only cookies, refresh tokens
RAG               LlamaIndex (vector embeddings for knowledge base)
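
To make the Auth row concrete, here is a minimal sketch of issuing and verifying HS256-signed access and refresh tokens. The backend lists python-jose among its packages; this example deliberately uses only the standard library so it runs without dependencies, and it is not the project's actual code. The `type` claim, the 15-minute and 7-day lifetimes, and all function names are assumptions made for illustration.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Restore the stripped padding before decoding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: str) -> bytes:
    return hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()


def encode_token(claims: dict, secret: str) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, claims)
    )
    return f"{signing_input}.{_b64url(_sign(signing_input, secret))}"


def decode_token(token: str, secret: str, expected_type: str) -> dict:
    """Verify signature, algorithm, expiry, and token type; return the claims."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    expected_sig = _sign(f"{header_b64}.{payload_b64}", secret)
    # Constant-time comparison avoids leaking signature bytes via timing.
    if not hmac.compare_digest(expected_sig, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    claims = json.loads(_b64url_decode(payload_b64))
    if claims.get("exp", 0) < time.time():
        raise ValueError("token expired")
    if claims.get("type") != expected_type:
        raise ValueError("wrong token type")
    return claims


def issue_pair(user_id: str, secret: str) -> dict:
    """Issue a short-lived access token and a longer-lived refresh token."""
    now = int(time.time())
    return {
        "access": encode_token(
            {"sub": user_id, "type": "access", "exp": now + 15 * 60}, secret
        ),
        "refresh": encode_token(
            {"sub": user_id, "type": "refresh", "exp": now + 7 * 86400}, secret
        ),
    }
```

Tagging each token with its type lets a refresh endpoint reject access tokens and vice versa, which is one reasonable way to keep the two roles separate when both are signed with the same secret.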

API Architecture

Router Prefixes

Router       Prefix         File                        Key Endpoints
Auth         /auth          app/router/auth/            Login, signup, refresh, OAuth
Admin        /admin         app/router/admin.py         User mgmt, system stats
User         /users         app/router/user.py          Profile CRUD
Agents       /agents        app/router/agent.py         CRUD, file upload, RAG query
SIP          /sip           app/router/sip.py           Trunk provisioning
Campaign     /campaign      app/router/campaign.py      Campaign CRUD + execution
Call         /call          app/router/call.py          Outbound call initiation
Chat         /chat          app/router/chat.py          Web chat logs
DynamicData  /dynamic-data  app/router/dynamic_data.py  Data collection CRUD
Workspaces   /workspaces    app/router/workspaces.py    Workspace CRUD + membership
Reports      /reports       app/router/report.py        Call analytics (46 KB file)
BatchCall    /batch-call    app/router/batch_call.py    Batch job CRUD + dispatch
ApiKey       /api-keys      app/router/api_key.py       API key management
Webhooks     /webhooks      app/router/webhooks.py      Inbound telephony webhooks
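
When tracing a request back to its handler, the prefix table can be turned into a small lookup. The helper below is a navigation aid built from the table's data, not a description of how FastAPI dispatches requests. It assumes the prefixes are mounted at the application root with no shared prefix in front of them, which the source does not state; `router_file_for` is a hypothetical name.

```python
from typing import Optional

# Prefix -> router file, copied from the Router Prefixes table.
ROUTER_FILES = {
    "/auth": "app/router/auth/",
    "/admin": "app/router/admin.py",
    "/users": "app/router/user.py",
    "/agents": "app/router/agent.py",
    "/sip": "app/router/sip.py",
    "/campaign": "app/router/campaign.py",
    "/call": "app/router/call.py",
    "/chat": "app/router/chat.py",
    "/dynamic-data": "app/router/dynamic_data.py",
    "/workspaces": "app/router/workspaces.py",
    "/reports": "app/router/report.py",
    "/batch-call": "app/router/batch_call.py",
    "/api-keys": "app/router/api_key.py",
    "/webhooks": "app/router/webhooks.py",
}


def router_file_for(path: str) -> Optional[str]:
    """Return the router file that owns `path`, or None if no prefix matches."""
    # Drop any query string, then compare whole path segments so that
    # "/callback" is not mistaken for the "/call" router.
    path_only = path.split("?", 1)[0]
    first_segment = "/" + path_only.lstrip("/").split("/", 1)[0]
    return ROUTER_FILES.get(first_segment)
```

For example, `router_file_for("/agents/42/files")` resolves to `app/router/agent.py`, while a path with an unknown first segment returns `None`.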

Agent Lifecycle (Runtime Flow)

The core runtime flow when a call connects:
1. LiveKit dispatches a job → agent.py entrypoint()
2. Connect to room, identify participants
3. Detect call type: phone (SIP) vs web (widget)
4. Determine call direction: inbound vs outbound
5. Load assistant_data from PostgreSQL (via phone number lookup)
6. Render system_prompt and first_message with dynamic variables
7. Initialize LLM/STT/TTS via Factory classes
8. Create AgentSession with VAD, turn detection, noise cancellation
9. Initialize AgentCaller with @function_tool methods
10. Start background tasks:
    - enforce_max_duration()
    - send_reminder_if_inactive()
    - handling_silence_timeout()
    - monitor_user_disconnect()
11. Start recording (LiveKit Egress → S3)
12. Deliver first_message (TTS or Realtime)
13. Conversation loop (LLM-driven with tool calls)
14. On disconnect:
    - Stop recording
    - Determine disconnection reason (Twilio/Vonage API)
    - Compute usage metrics and costs
    - Run post-call analysis (conversation_analysis)
    - Save call log to PostgreSQL
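
Two of the steps above lend themselves to a short illustration: rendering a prompt with dynamic variables (step 6) and running a max-duration watchdog alongside the conversation (step 10). The sketch below simulates both with the standard library only. The `{{name}}` placeholder syntax, the `hang_up` callback, and `run_call` are assumptions made for this example. `enforce_max_duration` borrows its name from the list above, but its body here is a guess at the general pattern, not the project's implementation.

```python
import asyncio
import re


def render_template(template: str, variables: dict) -> str:
    """Replace {{name}} placeholders; unknown names are left untouched."""
    def substitute(match):
        key = match.group(1)
        return str(variables[key]) if key in variables else match.group(0)
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


async def enforce_max_duration(max_seconds: float, hang_up) -> None:
    """Sleep for the allowed call length, then end the call."""
    await asyncio.sleep(max_seconds)
    await hang_up("max_duration_reached")


async def run_call(max_seconds: float, call_length: float) -> list:
    """Simulate one call: start the watchdog, converse, then clean up."""
    events = []

    async def hang_up(reason: str) -> None:
        events.append(f"hang_up:{reason}")

    watchdog = asyncio.create_task(enforce_max_duration(max_seconds, hang_up))
    # The sleep stands in for the conversation loop ending when the user leaves.
    conversation = asyncio.create_task(asyncio.sleep(call_length))

    done, pending = await asyncio.wait(
        {watchdog, conversation}, return_when=asyncio.FIRST_COMPLETED
    )
    # Whichever task lost the race is cancelled so nothing outlives the call.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)

    if conversation in done:
        events.append("user_disconnected")
    events.append("save_call_log")  # Stand-in for the on-disconnect steps.
    return events
```

Running `asyncio.run(run_call(0.01, 0.2))` exercises the path where the watchdog ends the call, and swapping the two arguments exercises a normal user disconnect. In both cases the cleanup step runs exactly once, which is the property the on-disconnect sequence depends on.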

Key Dependencies

  • fastapi, uvicorn, gunicorn — Web framework
  • sqlalchemy, alembic — ORM + migrations
  • pydantic, pydantic-settings — Data validation
  • celery[redis] — Background task processing
  • livekit-api — LiveKit server API
  • llama-index — RAG indexing
  • twilio — Twilio API
  • vonage — Vonage API (via HTTP)
  • boto3 — AWS S3