Alex or Mario:
"Thanks for coming. Quick logistics: this isn't a sales pitch. We're not demoing products or pushing you to sign anything. This is the first Voice AI Connect Sydney — a monthly forum where we talk honestly about what works and what breaks when you deploy voice AI in production.
We drew a good crowd because you're all dealing with the same problem: AI models are commoditized, but getting them to work reliably over a phone call is still black magic. Today we're unpacking why that is, what the real bottlenecks are, and how teams are solving them.
Format: we'll do some guided discussion, then I'll show you what we're seeing work in practice. Interrupt anytime. If you've shipped voice AI and hit these problems, share what you learned. If you're evaluating it now, ask the hard questions. Let's make this useful."
"How many of you have tried, or are currently using, at least one AI vendor? LLMs, turnkey offerings, voice AI platforms — all count."
(Most hands go up)
"Okay, keep your hands up if you've tried 2 vendors."
(Some hands drop)
"How about 3?" → "4?" → "5 or above?"
"New question: raise your hand if an LLM has sent something it created directly to production — no human review in between."
(Smaller group)
"Keep your hand up if you're comfortable with that."
(Laughs, some hands drop)
"By show of hands, how many of you are using a tool like Claude Code or Cursor or Copilot regularly?"
(Moderate number of hands)
"Now let's narrow to voice AI specifically. Raise your hand if you've evaluated a vendor in the last year for a voice AI pilot or PoC."
(Hands go up — probably 40-60% of room)
Progress through: 2 vendors → 3 vendors → More than 3
"Alright, final question: raise your hand if your voice AI application is actually in production right now."
(Smaller subset — expect 20-30%)
"Keep your hand raised if it's handling thousands of calls a month."
(Very small group — maybe 3-5 people)
"And just so I know who's in the room — quick shout-outs by industry. Healthcare? Fintech? Contact centers / customer support? E-commerce / logistics?"
(Note responses)
"Okay, good mix. Let's dig in."
Split into 3 rounds of ~10 minutes each:
Facilitate:
"I'll go first. We had a customer running voice AI for appointment confirmations. Worked perfectly in testing — 10 concurrent calls, sub-300ms latency, 95% accuracy. They launched to production: 500 calls/day. Within 2 hours, their media server crashed because they'd hit the rate limit on their STT provider, and the retry logic created a cascade failure. They didn't know until customers started complaining about dropped calls.
The AI model wasn't the problem. The infrastructure underneath it was. Anyone else hit something like this?"
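If the room asks how the retry cascade in that story could have been avoided, one standard pattern is exponential backoff with full jitter: each retry waits a random delay drawn from a growing, capped window, so a burst of failed calls doesn't hammer the STT provider in lockstep. A minimal sketch (the function name and parameters are illustrative, not from any vendor SDK):

```python
import random

def backoff_schedule(max_retries=5, base=0.5, cap=30.0, rng=random.random):
    """Exponential backoff with full jitter.

    Each delay is drawn uniformly from [0, min(cap, base * 2**attempt)],
    which spreads retries out instead of synchronizing them into the
    kind of cascade failure described above.
    """
    delays = []
    for attempt in range(max_retries):
        ceiling = min(cap, base * (2 ** attempt))
        delays.append(rng() * ceiling)
    return delays
```

Pair this with a retry budget (give up after N attempts and alert) so a provider outage degrades gracefully instead of piling up traffic.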
Guide toward: Latency, audio quality, vendor stitching, compliance, scaling, cost overruns
"67% of CIOs at the ADAPT conference yesterday said conversational AI hasn't delivered on its promise. Why do we think that is?"
Seed questions if discussion stalls:
"Right — and here's the thing nobody tells you: the LLM isn't usually the bottleneck. It's the carrier routing and audio transcoding. You've got 6-8 network hops before the LLM even sees the text."
"Yeah, and when a call fails at 2AM, whose fault is it? Twilio blames OpenAI, OpenAI blames ElevenLabs, and you're in the middle with an angry customer."
This naturally surfaces:
Common regrets:
"So I'm hearing three themes: latency, vendor fragmentation, and compliance. Those aren't separate problems — they're symptoms of the same root cause: most voice AI stacks weren't designed for production from day one. They were cobbled together from dev tools that work great at 10 calls/day but fall apart at 10,000."
"So here's what I'm hearing from this room — and it's the same thing I hear from teams in NZ, Dubai, across APAC: the AI model isn't the problem. It's everything underneath it."
Show or draw this diagram:
TYPICAL MULTI-VENDOR STACK:
Customer → Carrier A → SIP Provider B → Media Server C → STT (Deepgram) → LLM (OpenAI) → TTS (ElevenLabs) → Media Server C → SIP Provider B → Carrier A → Customer
= 8+ network hops
= 600-900ms round-trip latency
= 4 vendors to coordinate when things break

SINGLE-STACK APPROACH:
Customer → Telnyx (carrier + media + STT + TTS + LLM) → Customer
= 2 network hops
= 180-300ms round-trip latency
= 1 vendor, 1 SLA, 1 throat to choke
"Most teams don't realize they're paying a 400-600ms 'vendor tax' just in transport overhead. That's before the LLM even thinks. For conversational AI, that's the difference between 'this feels natural' and 'this feels broken.'"
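If anyone wants the "vendor tax" made concrete, a back-of-envelope calculation works on a slide. The per-hop transport latencies below are illustrative assumptions, not measurements — the point is that transport overhead alone lands in the 400-600ms range before any model runs:

```python
# Illustrative per-hop transport latencies in ms (assumptions only).
# Audio traverses each middleman twice: inbound and outbound.
MULTI_VENDOR_HOPS = {
    "carrier_a_in": 80, "sip_provider_b_in": 60, "media_server_c_in": 90,
    "media_server_c_out": 90, "sip_provider_b_out": 60, "carrier_a_out": 80,
}
SINGLE_STACK_HOPS = {"owned_network_in": 30, "owned_network_out": 30}

def transport_ms(hops):
    return sum(hops.values())

vendor_tax = transport_ms(MULTI_VENDOR_HOPS) - transport_ms(SINGLE_STACK_HOPS)
# With these assumed numbers: 460 - 60 = 400 ms of pure transport overhead,
# before STT, LLM, or TTS spend a single millisecond.
```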
"Most voice providers don't own the carrier network. They lease routes from wholesale carriers. That means unpredictable latency — sometimes 200ms, sometimes 800ms — depending on time of day, carrier congestion, and routing path.
For batch calls (appointment reminders), who cares? For conversational AI, 800ms round-trip feels broken."
What matters:
"AI models are trained on clean audio. Real phone calls are not clean. You've got codec compression, packet loss, background noise, accents, crosstalk.
If your STT model can't handle Australian accents or someone calling from a noisy cafe, your AI agent sounds dumb even if the LLM is GPT-4."
"At 10 concurrent calls, everything works. At 100, you hit rate limits. At 1,000, your media server crashes. At 10,000, your LLM provider throttles you and your TTS queue backs up 5 seconds.
Most teams don't discover this until launch day."
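A mitigation worth naming here: cap in-flight requests on your side, so a traffic spike queues inside your control instead of tripping the provider's rate limiter and dropping calls. A minimal asyncio sketch — the limit of 50 is an assumed quota, and `fake_stt_call` is a stand-in for the real provider call:

```python
import asyncio

STT_CONCURRENCY_LIMIT = 50  # assumed provider quota; check your actual plan
stt_slots = asyncio.Semaphore(STT_CONCURRENCY_LIMIT)

async def fake_stt_call(audio_chunk):
    # Stand-in for the real STT SDK call.
    await asyncio.sleep(0.01)
    return f"transcript:{audio_chunk}"

async def transcribe(audio_chunk):
    # Excess requests wait here, in your process, rather than
    # failing at the provider and triggering retries.
    async with stt_slots:
        return await fake_stt_call(audio_chunk)
```

Load-testing this path at 10x expected volume before launch is what separates the teams who find the ceiling in staging from the ones who find it on launch day.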
What breaks:
"You're stitching together 5-7 vendors: telephony, STT, TTS, LLM, media server, monitoring, billing. When a call fails, whose fault is it? Each vendor points at the other. You're spending 40% of eng time debugging vendor integrations instead of improving your product."
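One concrete answer to the finger-pointing problem: stamp every vendor request within a call with the same correlation ID, so a 2AM failure can be traced across providers instead of debated between them. A minimal sketch (the wrapper and field names are hypothetical, not a real SDK):

```python
import logging
import uuid

log = logging.getLogger("voice_ai")

def new_call_context():
    # One correlation ID per phone call, generated at call setup.
    return {"call_id": str(uuid.uuid4())}

def vendor_request(ctx, vendor, payload):
    # Illustrative wrapper around any vendor call (STT, LLM, TTS).
    # Logging the shared call_id on every hop lets you reconstruct
    # the full path of a failed call from your own logs.
    log.info("call_id=%s vendor=%s payload=%r", ctx["call_id"], vendor, payload)
    return {"vendor": vendor, "call_id": ctx["call_id"]}
```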
What matters:
"So what does 'enterprise-grade voice AI' actually mean? Not marketing fluff — what should you demand from your stack?"
| Requirement | Why It Matters |
|---|---|
| Sub-300ms end-to-end latency | Anything above 500ms feels broken |
| 99.99%+ uptime SLA | One hour of downtime = customer trust lost |
| Owned carrier network | Predictable routing, no wholesale middlemen |
| Single-vendor stack | Fewer handoffs, unified support |
| Real-time observability | Debug calls mid-conversation, not post-mortem |
| Compliance baked in | SOC2, HIPAA, PCI-DSS if needed |
"If you're building voice AI in-house, this is your RFP checklist. If you're buying, don't let vendors hand-wave these. Make them prove it."
Your response:
"Fair question. I can talk about it in the abstract, or I can just show you. Give me 90 seconds."
[Execute your prepared demo here]
Key callouts during/after demo:
"So that's the stack. Questions?"
Let them ask, then pivot back:
"Look, the point isn't 'use Telnyx.' The point is: if you're stitching together 4-5 vendors, you're paying a latency tax and a complexity tax. Whether you solve that with us or someone else, solve it before you scale to production. Otherwise you'll hit the same walls everyone in this room has hit."
Alex or Mario:
"We're doing this monthly. Next month's topic: [e.g., 'Interruption Handling in Real Conversations']. If there's something specific you want us to cover, let me know.
No pressure to use Telnyx. But if you want to dig deeper into what we discussed — latency benchmarks, architecture reviews, whatever — grab us after or shoot us an email. Otherwise, see you next month."
Hand out or email afterward:
This positions you as: Experts who've seen the hard problems, honest brokers (not just selling), community builders (not just vendors).