Experimental phone voice assistant powered by Azure OpenAI Realtime (SIP). Handles inbound calls, orchestrates tool calls, and provides comprehensive monitoring. Built to test latency and explore the Realtime API capabilities.
⚠️ Experimental Project: This is a research/experimental project built to explore Azure OpenAI Realtime capabilities and test latency. Not intended for production use without significant additional work on security, scalability, and reliability.
A Bun/Hono service that receives `realtime.call.incoming` webhooks from Azure OpenAI Realtime (SIP), configures voice sessions, handles tool calls, and can forward callers to human agents via SIP REFER. This project focuses on the control plane (call acceptance, WebSocket orchestration, tools, logging); media (RTP) is handled entirely by Azure/Twilio.
Built to explore:
- Voice-enabled interactions with Azure OpenAI Realtime
- Phone-based AI assistants with tool integration
- Hybrid AI/human call routing via SIP REFER
- Real-time call monitoring and analytics
- Low-latency voice interactions (tested when deployed in the same Azure region)
Monitor your voice assistant in real-time with a beautiful, auto-refreshing dashboard:
Access: http://localhost:8000/dashboard
The dashboard provides:
- Live statistics: Active calls, total sessions, tool usage, average duration
- Active calls: Real-time view of ongoing conversations with caller phone numbers, duration, tools used, and message counts
- Recent activity: Complete call history with caller information, sentiment analysis, and transcripts
- Tool analytics: Visual charts showing which tools are most frequently used
- Live updates: Server-Sent Events (SSE) for real-time, non-intrusive updates without page refreshes
Perfect for demos, monitoring, and understanding your voice assistant's performance at a glance.
- The dashboard now shows Avg Latency (ms), derived from user transcript completion to first assistant response per turn.
- Raw values are streamed via `/api/events` and exposed in `/api/stats` under `averageLatencyMs`.
- Console logs include `latency_measured_ms` lifecycle entries for deeper analysis.
- Expect lower latency when the webhook server and Azure Realtime resource run in the same Azure region (avoid hairpin across clouds).
- Observed in local tests (server on laptop, Azure SIP in-cloud): ~1.3s from user-stop to assistant audio start; co-locating the server in Azure should reduce this significantly.
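A minimal sketch of how a per-turn average like `averageLatencyMs` could be maintained, assuming the server records one timestamp when the user's turn ends and another when the first assistant response arrives. The `LatencyTracker` class and its method names are hypothetical and not taken from the project source.

```typescript
// Hypothetical tracker: names are illustrative, not the project's actual code.
class LatencyTracker {
  private turnEndedAt: number | null = null;
  private samples: number[] = [];

  // Call when the user's turn is considered finished.
  markUserTurnEnd(nowMs: number): void {
    this.turnEndedAt = nowMs;
  }

  // Call on the first assistant response of the turn; returns the sample, if any.
  markAssistantStart(nowMs: number): number | null {
    if (this.turnEndedAt === null) return null; // no open turn, or already measured
    const latency = nowMs - this.turnEndedAt;
    this.turnEndedAt = null; // count only the first response per turn
    this.samples.push(latency);
    return latency;
  }

  // Rounded mean over all measured turns, or null before the first sample.
  averageLatencyMs(): number | null {
    if (this.samples.length === 0) return null;
    const total = this.samples.reduce((sum, ms) => sum + ms, 0);
    return Math.round(total / this.samples.length);
  }
}
```

Passing timestamps in explicitly (rather than calling `Date.now()` inside the class) keeps the tracker easy to test.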
- Natural conversations powered by Azure OpenAI Realtime
- 9 built-in tools ready to use (order lookup, inventory, weather, store locator, and more)
- Barge-in support: Users can interrupt naturally
- SIP REFER: Seamless handoff to human agents
- Multi-language ready: Easy prompt customization
- Real-time dashboard with beautiful UI
- Comprehensive analytics: Call duration, tool usage, sentiment analysis
- Latency tracing: Per-turn latency (user speech → assistant reply) captured and averaged; visible on the dashboard and via `/api/stats`
- Admin API: RESTful endpoints for integrations
- Enhanced logging: Color-coded console output with call summaries
- Transcript viewer: Full conversation history per call
- TypeScript with full type safety
- Zod validation for all tool arguments
- Hot reload development mode
- PII redaction in logs
- Extensible architecture: Easy to add new tools
Quick Reference:
PSTN → Twilio Elastic SIP Trunk → Azure OpenAI Realtime (SIP)
  ↓ (Webhook: realtime.call.incoming)
Your Bun/Hono Server
├─ POST /accept (REST)
├─ wss /v1/realtime?call_id=... (control)
├─ tools (function calls)
├─ /dashboard (monitoring)
├─ /api/* (admin API)
└─ POST /refer (REST, on handoff)
Call Flow:
- Inbound call hits the Azure Realtime SIP connector configured for your Azure OpenAI resource
- Azure posts a `realtime.call.incoming` webhook to your server
- Your server accepts the call (REST), then attaches to the call control WebSocket
- The bot greets, listens, calls tools, and can REFER to a human queue
- All activity is tracked in real-time on the dashboard
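The accept step in the flow above can be sketched as a small request builder. This is an illustration under assumptions: `buildAcceptRequest` and the `AcceptRequest` type are hypothetical names, and the payload fields (`model`, `instructions`) are a guess at a minimal body rather than the project's actual payload. The path and the `api-key` header follow what this README describes for Azure.

```typescript
// Hypothetical helper: the payload fields are assumptions, not the project's code.
interface AcceptRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildAcceptRequest(
  base: string, // e.g. the OPENAI_BASE value
  callId: string, // call_id taken from the realtime.call.incoming webhook
  apiKey: string,
  session: { model: string; instructions: string },
): AcceptRequest {
  const trimmedBase = base.replace(/\/+$/, ""); // tolerate a trailing slash
  return {
    url: `${trimmedBase}/v1/realtime/calls/${encodeURIComponent(callId)}/accept`,
    method: "POST",
    headers: { "Content-Type": "application/json", "api-key": apiKey },
    body: JSON.stringify(session),
  };
}
```

The returned object can be handed to `fetch(req.url, req)`; once that call succeeds, the server would open the control WebSocket for the same `call_id`.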
- Azure subscription with an Azure OpenAI resource in a supported region (e.g., Sweden Central)
- A model deployment named `gpt-realtime` (Azure AI Foundry → Deployments)
- API key for your Azure OpenAI resource
- Public URL for webhook (local: use `ngrok` or Cloudflare Tunnel)
- (Optional) Twilio account for Elastic SIP Trunking
# Clone the repository
git clone <your-repo>
cd <your-repo>
# Install dependencies
bun install
# Copy environment template
cp env.template .env
Edit .env with your Azure OpenAI credentials:
# Azure OpenAI Configuration
OPENAI_API_KEY=YOUR_AZURE_OPENAI_KEY
OPENAI_WEBHOOK_SECRET= # Set after creating webhook endpoint
REALTIME_MODEL=gpt-realtime
REALTIME_VOICE=marin # Any supported voice (marin, alloy, etc.)
SIP_TARGET_URI= # Optional: tel:+1AAA BBB CCCC (for REFER)
PORT=8000
# Azure Endpoints (replace with your resource subdomain)
OPENAI_BASE=https://<your-resource>.openai.azure.com/openai
REALTIME_WS_BASE=wss://<your-resource>.openai.azure.com/openai/v1/realtime
# Leave empty for Azure Realtime (don't add api-version)
REALTIME_API_VERSION=
# Logging (optional)
LOG_FORMAT=pretty # 'pretty' for color-coded logs, 'json' for structured
DEBUG_LOGGING=0 # Set to '1' to enable verbose event logging
Important: Do not append `api-version` to Realtime WS/REST URLs. The Azure Realtime SIP control plane accepts `/v1/...` without it.
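One way to honor this rule in code is to add the `api-version` query parameter only when `REALTIME_API_VERSION` is non-empty. The sketch below assumes a hypothetical helper named `buildControlWsUrl`; it is not the project's implementation.

```typescript
// Hypothetical helper: builds the call-control WebSocket URL from REALTIME_WS_BASE.
function buildControlWsUrl(wsBase: string, callId: string, apiVersion = ""): string {
  const params = [`call_id=${encodeURIComponent(callId)}`];
  const version = apiVersion.trim();
  if (version !== "") {
    // Only reached when REALTIME_API_VERSION is explicitly set.
    params.push(`api-version=${encodeURIComponent(version)}`);
  }
  const separator = wsBase.includes("?") ? "&" : "?";
  return `${wsBase}${separator}${params.join("&")}`;
}
```

With the empty default recommended for Azure, the result is simply `<REALTIME_WS_BASE>?call_id=<id>`.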
# Development mode (hot reload)
bun run dev
# Production mode
bun run start
The server will print:
Server listening on http://localhost:8000/
Dashboard available at: /dashboard
Health checks:
- `GET /healthz` → `{ ok: true }`
- `GET /` → Service info
# Using ngrok
npx ngrok http 8000
# Note the https://<subdomain>.ngrok-free.app URL
# You'll use this in the next step
Create a webhook endpoint in Azure that points to your public server URL:
curl -sS -X POST "https://<your-resource>.openai.azure.com/openai/v1/dashboard/webhook_endpoints" \
-H "Content-Type: application/json" \
-H "api-key: <YOUR_AZURE_OPENAI_KEY>" \
-d '{
"name": "realtime-incoming",
"url": "https://<your-public-host>/openai/webhook",
"event_types": ["realtime.call.incoming"]
}'
After creating, Azure will show a Webhook Signing Secret. Add it to your .env:
OPENAI_WEBHOOK_SECRET=your_secret_here
In Azure AI Foundry → Deployments, deploy `gpt-realtime` (Global Standard). Ensure your .env uses `REALTIME_MODEL=gpt-realtime` (deployment name, not model family).
Twilio routes PSTN calls to Azure's SIP connector. Point your SIP trunk at the SIP URI Azure provides for your resource/project.
Step-by-step guide: see the Twilio Elastic SIP Trunking + OpenAI Realtime guide listed at the end of this README.
Quick steps:
- In Twilio, create an Elastic SIP Trunk
- Add an Origination URI pointing to your Azure OpenAI SIP connector URI (looks like `sip:proj_...@sip.api.openai.azure.com`)
- Assign a Twilio phone number to the trunk
- Call that number → Twilio routes to Azure → Azure triggers your webhook
Prefer the stock OpenAI cloud? The same service works there too; just swap the endpoints in .env.
1. Configure OpenAI webhooks + SIP:
- Navigate to OpenAI Platform Settings
- Create webhook: `https://<your-public-host>/openai/webhook`
- Create SIP connection: `sip:<project-id>@sip.api.openai.com;transport=tls`
- Copy webhook secret to `.env`
2. Update .env:
OPENAI_BASE=https://api.openai.com
REALTIME_WS_BASE=wss://api.openai.com/v1/realtime
REALTIME_API_VERSION=
3. Restart and test:
bun run dev
# Call your number → OpenAI posts webhook → same flow!
This project includes extensive demo and monitoring capabilities, well suited to showcasing your voice assistant.
Visit http://localhost:8000/dashboard to see:
- Live Statistics
  - Active calls counter with pulse animation
  - Total calls handled (all-time)
  - Completed calls
  - Total tool executions
  - Average call duration
  - System uptime
- Active Calls Section
  - Real-time view of ongoing conversations
  - Caller phone number (extracted from SIP headers)
  - Call duration counter
  - Tools used per call
  - Message count
  - Status indicators
- Recent Activity
  - Last 5 calls with complete details
  - Call status (active, completed, failed, transferred)
  - Duration, tool usage, transcripts
  - Color-coded by status
  - "View Transcript" button for each call
- Tool Usage Analytics
  - Visual bar charts showing tool popularity
  - Percentage breakdown of tool calls
  - Real-time updates
Demo Tip: Keep the dashboard open on a second screen during demos to wow your audience!
- `handoff_human`: Transfer to live agent (SIP REFER)
- `lookup_order`: Check order status by order number
- `check_inventory`: Product availability lookup
- `schedule_callback`: Book a callback appointment
- `get_weather`: Get weather for any location
- `check_company_hours`: Business hours lookup
- `search_products`: Product catalog search
- `find_store_location`: Nearest store finder
All tools include:
- Zod validation for type-safe arguments
- Error handling with graceful fallbacks
- Follow-up instructions for natural conversation flow
Beautiful color-coded logs (enable with LOG_FORMAT=pretty):
- CALL (Cyan): Call lifecycle events
- TOOL (Magenta): Tool executions with status
- TRANSCRIPT (White): User/assistant conversations
- SUCCESS (Green): Successful operations
- WARNING (Yellow): Important notices
- ERROR (Red): Problems and failures
- INFO (Blue): General information
Special features:
- Timestamps with millisecond precision
- Call ID tags for easy tracking
- Beautiful boxed call summaries
- Periodic system statistics display
- Startup banner
Comprehensive tracking of every aspect:
Call Metrics:
- Caller identification (phone number from SIP headers)
- Start/end times and duration
- Tool calls with execution timing
- Full conversation transcripts
- User sentiment analysis (positive/neutral/negative)
- Barge-in events (user interruptions)
- Response counts
System Statistics:
- Total/active/completed calls
- Failed and transferred calls
- Average call duration
- Tool usage patterns
- System uptime
RESTful API for integrations and monitoring:
| Endpoint | Description |
|---|---|
| `GET /api/stats` | System-wide statistics |
| `GET /api/calls` | Recent calls (use `?limit=N` to customize) |
| `GET /api/calls/active` | Currently active calls |
| `GET /api/calls/:callId` | Detailed metrics for a specific call |
| `GET /api/calls/:callId/transcript` | Full conversation transcript |
Example:
# Get system stats
curl http://localhost:8000/api/stats | jq
# Get recent calls
curl "http://localhost:8000/api/calls?limit=20" | jq
# Get specific call details
curl http://localhost:8000/api/calls/CALL_RTC_9E31 | jq
See DEMO_FEATURES.md for complete documentation and demo script suggestions.
| Variable | Required | Description |
|---|---|---|
| `OPENAI_API_KEY` | ✅ | Azure OpenAI resource API key |
| `OPENAI_WEBHOOK_SECRET` | ✅ | Webhook signing secret from Azure/OpenAI |
| `REALTIME_MODEL` | ✅ | Azure deployment name (e.g., `gpt-realtime`) |
| `REALTIME_VOICE` | ✅ | Voice name (e.g., `marin`, `alloy`) |
| `SIP_TARGET_URI` | ❌ | Destination for REFER (e.g., `tel:+1XXXXXXXXXX`) |
| `PORT` | ❌ | Server port (defaults to 8000) |
| `OPENAI_BASE` | ✅ (Azure) | `https://<resource>.openai.azure.com/openai` |
| `REALTIME_WS_BASE` | ✅ (Azure) | `wss://<resource>.openai.azure.com/openai/v1/realtime` |
| `REALTIME_API_VERSION` | ❌ | Leave empty for Azure Realtime |
| `TEST_MODE` | ❌ | `1` disables signature verification (local tests only) |
| `LOG_FORMAT` | ❌ | `pretty` (default) or `json` for structured logs |
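A configuration loader for these variables might look like the sketch below. It is illustrative only: `AppConfig` and `loadConfig` are hypothetical names, and which variables are treated as mandatory here is an assumption. The defaults (port 8000, `pretty` logs) and the `TEST_MODE=1` behavior follow the descriptions above.

```typescript
// Hypothetical config shape; the project's real loader may differ.
interface AppConfig {
  apiKey: string;
  webhookSecret: string | null; // null only when TEST_MODE=1 and no secret is set
  model: string;
  openaiBase: string;
  realtimeWsBase: string;
  port: number;
  logFormat: "pretty" | "json";
  testMode: boolean;
}

function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const required = (name: string): string => {
    const value = (env[name] ?? "").trim();
    if (value === "") throw new Error(`Missing required environment variable: ${name}`);
    return value;
  };

  const testMode = env["TEST_MODE"] === "1";
  const port = Number((env["PORT"] ?? "").trim() || "8000"); // default port
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env["PORT"]}`);
  }

  return {
    apiKey: required("OPENAI_API_KEY"),
    // Signature verification is skipped in TEST_MODE, so the secret may be absent there.
    webhookSecret: testMode
      ? (env["OPENAI_WEBHOOK_SECRET"] ?? null)
      : required("OPENAI_WEBHOOK_SECRET"),
    model: required("REALTIME_MODEL"),
    openaiBase: required("OPENAI_BASE").replace(/\/+$/, ""),
    realtimeWsBase: required("REALTIME_WS_BASE"),
    port,
    logFormat: env["LOG_FORMAT"] === "json" ? "json" : "pretty",
    testMode,
  };
}
```

In a Bun or Node process this would typically be called once at startup as `loadConfig(process.env)`, so a missing key fails fast instead of surfacing mid-call.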
# Development (hot reload)
bun run dev
# Production
bun run start
# Run tests
bun test
# Smoke test webhook (offline, no Azure/Twilio)
TEST_MODE=1 bun run scripts/smoke-sip-webhook.ts
Note: `smoke-realtime-ws.ts` targets `api.openai.com` and validates account-level realtime, not Azure SIP. For full Azure SIP verification, place a real call or use the webhook smoke test with `TEST_MODE=1`.
- Webhook Verification: `POST /openai/webhook` validates the request signature (or bypasses it with `TEST_MODE=1`)
- Call Acceptance: On `realtime.call.incoming`, the server POSTs `/v1/realtime/calls/{call_id}/accept` with your model, tools, and instructions
- WebSocket Connection: Connects to `wss://.../v1/realtime?call_id=...` using the `api-key` header (Azure) and sends `session.update` to configure voice/instructions
- Turn-Taking: Listens for speech start/stop and transcription events to request concise turn responses
- Tool Execution: Function-call items stream in; arguments are collected, validated with `zod`, and executed (`handoff_human`, `lookup_order`, `check_inventory`, etc.)
- Human Handoff: The `handoff_human` tool triggers a `/refer` to `SIP_TARGET_URI`
- Observability: Structured logs with PII redaction, real-time dashboard updates, and comprehensive analytics
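The "arguments are collected" part of the tool execution step can be sketched as a small buffer keyed by `call_id`. The event shapes below are assumptions modeled on the Realtime API's `response.function_call_arguments.delta` and `response.function_call_arguments.done` events; `ArgumentCollector` is a hypothetical name, not the project's class.

```typescript
// Assumed event shapes, modeled on the Realtime API's function-call argument events.
type ArgumentEvent =
  | { type: "response.function_call_arguments.delta"; call_id: string; delta: string }
  | { type: "response.function_call_arguments.done"; call_id: string; arguments: string };

class ArgumentCollector {
  private buffers = new Map<string, string>();

  // Returns the complete JSON argument string once the "done" event arrives, else null.
  handle(event: ArgumentEvent): string | null {
    if (event.type === "response.function_call_arguments.delta") {
      const soFar = this.buffers.get(event.call_id) ?? "";
      this.buffers.set(event.call_id, soFar + event.delta);
      return null;
    }
    const buffered = this.buffers.get(event.call_id) ?? "";
    this.buffers.delete(event.call_id); // free the buffer for this call
    // Prefer the final payload when present; fall back to the accumulated deltas.
    return event.arguments !== "" ? event.arguments : buffered;
  }
}
```

The returned string is what would then be parsed and validated before the tool handler runs.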
Implement a `ToolDefinition` in `src/tools.ts`:
{
  name: 'my_new_tool',
  description: 'What this tool does',
  schema: z.object({
    param: z.string().describe('Parameter description'),
  }),
  jsonSchema: { /* OpenAI function schema */ },
  handler: async (args, ctx) => {
    // Your logic here
    return {
      output: { result: 'success' },
      followUpInstructions: 'Tell the user about the result',
    };
  },
}
The tool automatically appears in `realtimeToolSchemas` and is available to the assistant.
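To show the validate-then-execute path end to end without pulling in dependencies, here is a sketch that swaps the zod schema for a hand-written `parse` function. `SketchTool`, `runTool`, the `orderNumber` parameter, and the mock order status are all hypothetical; the project's `ToolDefinition` uses zod as shown above.

```typescript
// Dependency-free stand-in for ToolDefinition; `parse` plays the role of the zod schema.
interface ToolResult {
  output: unknown;
  followUpInstructions?: string;
}

interface SketchTool<A> {
  name: string;
  parse: (raw: unknown) => A; // throws on invalid arguments
  handler: (args: A) => Promise<ToolResult>;
}

const lookupOrderSketch: SketchTool<{ orderNumber: string }> = {
  name: "lookup_order",
  parse: (raw) => {
    const value = raw as { orderNumber?: unknown } | null;
    if (!value || typeof value.orderNumber !== "string" || value.orderNumber === "") {
      throw new Error("orderNumber must be a non-empty string");
    }
    return { orderNumber: value.orderNumber };
  },
  handler: async (args) => ({
    output: { orderNumber: args.orderNumber, status: "shipped" }, // mock data
    followUpInstructions: "Tell the caller the order status.",
  }),
};

// Parses the raw JSON arguments, validates them, and runs the handler.
// Any failure becomes a structured error the assistant can explain to the caller.
async function runTool<A>(tool: SketchTool<A>, rawArguments: string): Promise<ToolResult> {
  try {
    const args = tool.parse(JSON.parse(rawArguments));
    return await tool.handler(args);
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    return {
      output: { error: message },
      followUpInstructions: "Apologize and ask the caller to repeat the details.",
    };
  }
}
```

Returning an error payload instead of throwing keeps the conversation going when the model sends malformed arguments.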
Edit `src/prompts.ts` (`systemPrompt`, `greetingPrompt`). Keep prompts concise: Realtime responds quickly to short, intentional instructions.
For real-world use, your tool handlers would query internal APIs/databases and return summarized payloads for the bot to explain naturally. Currently, the included tools use demo/mock data.
Running your control plane in the same Azure region as your Azure OpenAI resource minimizes WebSocket round-trips and avoids public-internet hairpins.
Deployment Options:
- Azure Container Apps or Azure App Service: Containerize or run Bun directly
- Azure VM / Scale Set: Maximum control and custom networking
- Azure Functions: Not ideal (HTTP + WS requires long-lived connections; prefer process-based hosts)
After Deployment:
- Update webhook endpoint with your Azure-hosted HTTPS domain:
  curl -sS -X POST "https://<your-resource>.openai.azure.com/openai/v1/dashboard/webhook_endpoints" \
    -H "Content-Type: application/json" \
    -H "api-key: <YOUR_AZURE_OPENAI_KEY>" \
    -d '{ "name": "realtime-incoming", "url": "https://<your-azure-host>/openai/webhook", "event_types": ["realtime.call.incoming"] }'
- Twilio stays the same (still points at Azure's SIP connector)
- Keep the same `OPENAI_BASE` and `REALTIME_WS_BASE` values
Networking Tips:
- Ensure your app can reach `https://<resource>.openai.azure.com/openai` and `wss://.../v1/realtime`
- If using Private Endpoints, put the app in the same VNet/subnet and update DNS
- Don't add `api-version` to the WebSocket URL
- Confirm `REALTIME_WS_BASE` is `wss://<resource>.openai.azure.com/openai/v1/realtime`
- Azure requires the `api-key` header (not `Authorization: Bearer`). The server sets this automatically when `OPENAI_BASE` is Azure.
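The header switch described in the last point could be implemented roughly as follows. This is a sketch under assumptions: `isAzureBase` and `buildAuthHeaders` are hypothetical names, and detecting Azure by the `.openai.azure.com` host suffix is a guess at how the server decides.

```typescript
// Hypothetical detection: treats any "<resource>.openai.azure.com" host as Azure.
function isAzureBase(base: string): boolean {
  const withoutScheme = base.replace(/^[a-z]+:\/\//i, "");
  const hostAndPort = withoutScheme.split("/")[0] ?? "";
  const host = (hostAndPort.split(":")[0] ?? "").toLowerCase();
  return host.endsWith(".openai.azure.com");
}

// Azure expects "api-key"; the stock OpenAI API expects a Bearer token.
function buildAuthHeaders(base: string, apiKey: string): Record<string, string> {
  if (isAzureBase(base)) {
    return { "api-key": apiKey };
  }
  return { Authorization: `Bearer ${apiKey}` };
}
```

The same helper works for both the REST base (`https://...`) and the WebSocket base (`wss://...`), since only the host is inspected.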
- The server sends `session.update` and applies voice on `session.updated`
- If Azure returns `unknown_parameter` for `audio.output.voice`, the server retries with `voice` (compatibility shim)
- Set a real `SIP_TARGET_URI` (E.164 for PSTN, or `sip:...@...` for PBX)
- Your carrier/PBX must accept REFER to that target
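A quick way to catch a malformed `SIP_TARGET_URI` before a handoff fails is to normalize it at startup. The sketch below is an assumption about what a reasonable check looks like; `normalizeReferTarget` is a hypothetical helper, and the E.164 pattern used is the common "plus sign followed by up to 15 digits" form.

```typescript
// Hypothetical validator: accepts sip:/sips: URIs as-is and normalizes phone numbers to tel: URIs.
function normalizeReferTarget(raw: string): string {
  const value = raw.trim();
  if (/^sips?:[^@\s]+@[^@\s]+$/i.test(value)) {
    return value; // PBX target, keep unchanged
  }
  // Drop an optional "tel:" prefix and visual separators such as spaces or dashes.
  const digits = value.replace(/^tel:/i, "").replace(/[\s().-]/g, "");
  if (/^\+[1-9]\d{1,14}$/.test(digits)) {
    return `tel:${digits}`; // E.164 number
  }
  throw new Error(`Unsupported REFER target: ${raw}`);
}
```

For example, `tel:+1 415 555 0100` becomes `tel:+14155550100`, while a bare `12345` is rejected because it lacks the leading `+`.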
- Check browser console for SSE connection errors
- Verify the `/api/events` endpoint is accessible
- Ensure the `/api/stats` endpoint returns data
- Check network tab for EventSource connection status
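When checking `/api/events` outside the browser (for example from captured `curl -N` output), it can help to decode the raw Server-Sent Events text. The parser below is a minimal sketch of the standard SSE framing and assumes the input contains only complete events; `parseSseChunk` is a hypothetical helper, and the event names the server actually emits are not documented here.

```typescript
interface SseEvent {
  event: string;
  data: string;
}

// Parses a block of SSE text into events. Assumes complete events (no partial frames).
function parseSseChunk(chunk: string): SseEvent[] {
  const events: SseEvent[] = [];
  for (const block of chunk.split(/\r?\n\r?\n/)) {
    let event = "message"; // default event type in the SSE format
    const dataLines: string[] = [];
    for (const line of block.split(/\r?\n/)) {
      if (line === "" || line.startsWith(":")) continue; // blank line or comment/keep-alive
      const colon = line.indexOf(":");
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? "" : line.slice(colon + 1);
      if (value.startsWith(" ")) value = value.slice(1); // strip one leading space
      if (field === "event") event = value;
      else if (field === "data") dataLines.push(value);
    }
    if (dataLines.length > 0) {
      events.push({ event, data: dataLines.join("\n") });
    }
  }
  return events;
}
```

If this returns an empty array for a live stream, the connection is likely delivering only keep-alive comments, which points at the server side rather than the dashboard.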
Known Limitation: Azure OpenAI Realtime (SIP) does not currently support user input transcription.
- Assistant transcription: ✅ Works (visible in dashboard and logs)
- User transcription: ❌ Not supported by Azure's SIP implementation
- Speech detection: ✅ Works (VAD detects speech, tools execute based on user input)
Why this happens:
- Azure rejects the `input_audio_transcription` config in both the `/accept` payload and `session.update` messages
- The API understands user speech and processes it correctly, but doesn't provide transcript events
- This appears to be a deliberate API limitation, not a configuration issue
Workarounds:
- Accept this limitation and track only assistant responses + tool calls
- Use Twilio Media Streams fork with external STT (Whisper/Deepgram)
- Wait for Azure to add this feature to their SIP implementation
Debug: Enable DEBUG_LOGGING=1 to see all events and confirm no conversation.item.input_audio_transcription.completed events arrive.
Note: This is an experimental project. For production use, consider:
- Secrets Management: Keep `.env` out of version control (already ignored). Store secrets in Azure Key Vault or your secret manager
- PII Redaction: Logs automatically redact phone numbers and emails. Add a SIEM/SOC sink (Datadog, etc.) for production deployments
- Data Residency: For PHI/PII, ensure data residency and DSR procedures align with your policies
- Signature Verification: Webhook signatures are verified by default (disable only with `TEST_MODE=1` for local testing)
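As an illustration of the PII redaction point above, a log filter for phone numbers and emails could be as simple as two regular expressions. This is a sketch, not the project's redaction code: `redactPii`, the patterns, and the `[email]` / `[phone]` placeholders are assumptions.

```typescript
// Simplified patterns; real-world redaction usually needs broader coverage.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
// Optional "+", then 8 to 15 digits with optional single separators between them.
const PHONE_PATTERN = /\+?\d(?:[\s().-]?\d){7,14}/g;

function redactPii(message: string): string {
  return message
    .replace(EMAIL_PATTERN, "[email]")
    .replace(PHONE_PATTERN, "[phone]");
}
```

Short digit runs such as the `12345` in an order number like `ACME-12345` fall below the 8-digit threshold and are left intact. Any identifier with eight or more consecutive digits would be redacted too, which is a trade-off to review for a given log format.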
When you place a call:
- Server logs show:
  CALL [CALL_RTC] Webhook received: realtime.call.incoming
  CALL [CALL_RTC] Call accepted
  CALL [CALL_RTC] WebSocket opened
  TRANSCRIPT [CALL_RTC] ASSISTANT: Hello! How can I help you today?
- Dashboard updates in real-time with call metrics
- Bot responds naturally to your voice
- Tools execute when requested (e.g., "check order ACME-12345")
- Handoff works if you say "transfer me to a person" (when `SIP_TARGET_URI` is set)
- Azure Realtime SIP: Microsoft Learn, "Use the GPT Realtime API via SIP"
- Realtime Conversations & Tools: OpenAI Platform Documentation
- Azure Realtime Audio Reference: Complete API reference
- Twilio Elastic SIP Trunking + OpenAI Realtime: Step-by-step Twilio guide
- Bandwidth → OpenAI Realtime SIP: Bandwidth integration guide
MIT
Built with ❤️ using Bun, Hono, and Azure OpenAI Realtime

