Twilio integrates conversational intelligence to offer deep behavioural insights.
AI is meant to automate and simplify tasks for humans. In some cases, its inherent traits can even improve performance and boost business outcomes, with results that have surprised many.
In a recent months-long experiment, a pizza brand tested AI-powered virtual agents, developed by Twilio, by allowing them to handle customer orders.
In a conversation with AIM, Christopher Connolly, director of solution engineering at Twilio, a customer engagement and communications platform, shared insights into how AI is not only keeping pace with customer interactions but also occasionally surpassing human agents.
Among the standout findings, AI agents sold more soft drinks than their human counterparts, not due to charm but because of their persistence and shameless upselling strategies.
Real-world data from Twilio's new observability tools reveals that virtual agents, when monitored correctly, can improve both upselling and customer satisfaction. The company shared that using AI agents resulted in responses that were 25 times faster for customers requesting to speak with Sales, 75% of service tickets resolved without escalation, and a conversion rate 3.1 times higher than before.
Twilio's new 'conversation relay' feature aims to tackle a longstanding blind spot in AI deployments: how virtual agents behave after going live. Unveiled first internally and now shared with the broader developer ecosystem, the system enables companies to observe and analyse AI agents in production in real time.
Connolly told AIM that the tool works across virtual agents, whether built on Twilio or external platforms, and integrates with conversational intelligence to offer deep behavioural insights.
Twilio's machine learning team has open-sourced six specialised language operators for AI agent observability, purpose-built for everyday use cases requested by the platform's customers. "These operators allow users to extract signals such as agent interruptions, hallucination events, conversation flow errors, and more," Connolly said.
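To make one of those signals concrete, here is a minimal sketch of how an "agent interruption" could be flagged from timestamped turns. It is an illustrative rule-based stand-in, not Twilio's operator: the `Turn` type, the `count_agent_interruptions` function, and the sample call are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str  # "agent" or "customer" (hypothetical labels)
    start: float  # seconds from the start of the call
    end: float    # seconds from the start of the call


def count_agent_interruptions(turns: list[Turn]) -> int:
    """Count agent turns that start before the preceding customer turn ends."""
    count = 0
    for prev, cur in zip(turns, turns[1:]):
        overlaps = cur.start < prev.end
        if prev.speaker == "customer" and cur.speaker == "agent" and overlaps:
            count += 1
    return count


# Made-up sample call, for illustration only.
sample = [
    Turn("customer", 0.0, 3.2),
    Turn("agent", 2.8, 6.0),     # starts while the customer is still speaking
    Turn("customer", 6.5, 9.0),
    Turn("agent", 9.1, 12.0),    # starts after the customer has finished
]

print(count_agent_interruptions(sample))
```

A production operator would presumably work from richer transcript data; the point here is only that an interruption can be expressed as an overlap between adjacent turns.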
To ensure safety and reliability, Twilio uses a multi-model setup, where one large language model keeps another in check.
"We've effectively got one LLM checking another," he highlighted. This internal network can detect hallucinations, ensure tasks are completed, and generate predictive customer satisfaction scores mid-call.
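The "one LLM checking another" pattern can be sketched roughly as follows. This is an assumption-laden illustration rather than Twilio's implementation: the `supervised_reply` function, the PASS/FAIL convention, and both stub models are hypothetical, and the stubs stand in for real model calls.

```python
from typing import Callable

# A "model" here is any function from prompt text to response text.
Model = Callable[[str], str]


def supervised_reply(user_message: str, agent: Model, judge: Model, fallback: str) -> str:
    """Return the agent's draft only if a second model approves it."""
    draft = agent(user_message)
    verdict = judge(
        f"Customer said: {user_message}\n"
        f"Agent draft: {draft}\n"
        "Answer PASS or FAIL."
    )
    approved = verdict.strip().upper().startswith("PASS")
    return draft if approved else fallback


# Stub models standing in for real LLM calls (hypothetical behaviour).
def stub_agent(message: str) -> str:
    return "Your pizza arrives in 30 minutes. You should also buy this stock."


def stub_judge(prompt: str) -> str:
    # Toy policy: reject anything that looks like financial advice.
    return "FAIL" if "stock" in prompt.lower() else "PASS"


FALLBACK = "Let me connect you with a human agent."
print(supervised_reply("Where is my order?", stub_agent, stub_judge, FALLBACK))
```

Keeping the judge separate from the agent means the check can be swapped or tightened without touching the agent's own prompt.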
Connolly noted that governments, banks, and insurers in regulated industries are concerned about AI becoming too autonomous. They worry that AI will offer inappropriate advice or get stuck in repetitive loops when performing assigned tasks. Twilio's solution, he said, addresses these concerns.
Unlike traditional call analytics, which mainly converts speech into searchable text, Twilio's system taps into the potential of LLMs to derive meaning and context from conversations.
From his observations, Connolly said, "The shift that we're seeing is we're not just using NLP to turn it into text and do tagging, we're now able to add the LLM capability into it to be a lot more abstract with our questioning."
The company's approach to virtual agents involves a flexible "bring your own language model" (BYOLM) strategy, delivered through its conversation relay feature. Twilio provides sample integrations for various platforms like Azure, OpenAI, Groq, and Hugging Face, allowing customers to plug in whichever LLM they prefer.
This approach is popular because the platform handles complex speech-related tasks, with Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) delivered through partners like ElevenLabs and Deepgram and sentence endpoints handled by Twilio itself, leaving the LLM choice open.
However, Twilio does provide an LLM option directly in some cases. In those scenarios, it currently offers support for OpenAI's models. The focus remains on simplifying communications infrastructure and connecting disparate technologies rather than developing its own language models.
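The "bring your own language model" idea amounts to keeping the speech pipeline fixed while the model stays swappable. A minimal sketch, assuming a hypothetical `LanguageModel` interface and stub speech functions (none of these names come from Twilio's API):

```python
from typing import Callable, Protocol


class LanguageModel(Protocol):
    """Anything with a complete() method can be plugged in (hypothetical interface)."""

    def complete(self, prompt: str) -> str: ...


class EchoModel:
    """Stand-in for one LLM provider."""

    def complete(self, prompt: str) -> str:
        return f"You said: {prompt}"


class ShoutModel:
    """Stand-in for a different LLM provider."""

    def complete(self, prompt: str) -> str:
        return prompt.upper()


def handle_turn(
    audio: bytes,
    transcribe: Callable[[bytes], str],  # speech-to-text step
    llm: LanguageModel,                  # customer-chosen model
    synthesize: Callable[[str], bytes],  # text-to-speech step
) -> bytes:
    """Run one conversational turn: speech in, model reply, speech out."""
    text = transcribe(audio)
    reply = llm.complete(text)
    return synthesize(reply)


# Stub speech functions: treat the "audio" as UTF-8 text for illustration.
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


print(handle_turn(b"one large pizza", fake_asr, EchoModel(), fake_tts))
print(handle_turn(b"one large pizza", fake_asr, ShoutModel(), fake_tts))
```

Swapping `EchoModel` for `ShoutModel` changes the reply without touching the speech steps, which is the separation the BYOLM approach relies on.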
All of this is governed by clear data usage policies, the company claims. "Customers really value consent, and that is in all of our data, all of our testing. Consent is always required," Connolly added.
To aid transparency, Twilio publishes AI "nutrition labels" explaining how models operate and where humans are involved in the loop.
To see how AI agents perform on the ground, the customer engagement platform partnered with a pizza delivery chain to compare virtual and human agents in a live ordering environment. The experiment ran for several months and revealed some unexpected advantages of automation.
The company measured how many conversational turns it takes to order a pizza. Connolly explained that the number of turns increased when the virtual agent was introduced. He attributed this to the upsell prompts and additional tasks the virtual agent was asked to perform. Despite the longer interaction, customer satisfaction remained stable, and revenue per order actually rose.
Most notably, one product stood out. "The virtual agent sells more soft drinks than the human agent," Connolly revealed. "The virtual agent is shameless," he noted, adding that it doesn't mind asking repeatedly, pushing for add-on drinks and snacks.
In contrast, human agents often tended to avoid repeated upsell prompts. "They'll just want to get through the order, move on to the next order," he said.
Beyond food brands like Domino's, Twilio also tested its AI-powered conversation relay feature with a financial services provider. For instance, it worked with Cedar on behalf of a healthcare client to streamline billing through automated patient communication and secure voice payments. Additionally, ING, a Dutch multinational banking and financial services company, integrates Twilio's voice, chat, and video tools with its APIs to enhance contact centre interactions.
Looking ahead, Connolly sees AI voice agents becoming more human-like and responsive. "OpenAI has a real-time API that we're working with. Gemini has the same. Google's got the same," he said.