Parlant enables an AI conversation modeling system that automatically tailors responses from a large and dynamically controlled selection of pre-approved "utterances".
The capabilities of generative AI have prompted businesses to explore its potential in customer service, but the underlying technical blockers remain significant. Recently dubbed "The Giant Engineering Problem That Nobody Else on Earth Has Been Able to Solve", Large Language Model "hallucinations" expose businesses deploying customer-facing AI to intolerable risks.
Hallucinations happen because, at their core, LLMs generate responses through a probabilistic, token-by-token, autoregressive process. The model continuously selects what it sees as the most likely tokens from an extensive "token vocabulary" that can span hundreds of thousands of tokens. For example, OpenAI's GPT-4o has a vocabulary size of nearly 200,000 tokens.
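The token-by-token loop described above can be sketched in a few lines. This is a toy illustration, not a real LLM: the vocabulary, scoring function, and sampling are stand-ins, but the structure of the loop (score every token, softmax into probabilities, sample, condition only on what has been generated so far) is the same one a real model runs over a vocabulary of roughly 200,000 tokens.

```python
import math
import random

# Toy sketch of autoregressive, token-by-token generation.
# All names here are illustrative stand-ins, not a real model.

VOCAB = ["the", "account", "balance", "is", "frozen", "<eos>"]

def score(context, token):
    # Stand-in for a neural network's logit: favour unseen tokens.
    return 1.0 if token not in context else -1.0

def softmax(xs):
    # Convert raw scores into a probability distribution.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def generate(max_len=5, seed=0):
    rng = random.Random(seed)
    context = []
    for _ in range(max_len):
        # Each prediction conditions ONLY on the preceding context --
        # the root cause of the error-proneness described above.
        probs = softmax([score(context, t) for t in VOCAB])
        token = rng.choices(VOCAB, weights=probs)[0]
        if token == "<eos>":
            break
        context.append(token)
    return context

print(" ".join(generate()))
```

Because every step is a probabilistic draw, a single unlucky sample early in the sequence can steer the entire rest of the response off course.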
This token selection process is inherently error-prone, as each probabilistic prediction relies solely on the preceding context. The result is a wide range of hallucinations and deviations from critical service protocols. Such unpredictability poses a serious challenge in high-stakes environments where consistent behaviour is non-negotiable.
Some teams try to tame this unpredictability by confining LLM responses to rigid flow charts, as seen in frameworks like LangFlow, LangGraph, or Rasa. These solutions guide interactions along linear paths, an approach already known to fail on real-world queries that involve multiple intents or conversational paths outside the flow designer's vision.
Moreover, adjusting responses in these setups typically requires tedious manual edits to flows and fragile prompt modifications, risking protocol breaches and unintended side effects. Even after all this, critical hallucinations still occur at an unacceptable rate.
For example, even if such frameworks get you to an unprecedented 99% accuracy, a bank servicing 1 million daily conversations is still exposed to 10,000 new customer-facing mistakes every day, some of which can be unbounded in scope and severity. This is why enterprises remain averse to deploying customer-facing GenAI. But with Parlant, a framework now embraced by some of the largest financial services companies in the world, this is finally starting to change.
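The error-rate arithmetic in the example above is worth making explicit: at scale, a small relative error rate still yields a large absolute number of daily mistakes.

```python
# 99% accuracy at 1 million daily conversations still means
# thousands of customer-facing mistakes per day.
daily_conversations = 1_000_000
accuracy = 0.99

mistakes_per_day = daily_conversations * (1 - accuracy)
print(int(mistakes_per_day))  # → 10000
```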
Parlant adopts a fundamentally different approach by developing an open-source conversational AI engine that allows developers to take control of their user-facing AI agents. Parlant is built by Emcie, an up-and-coming startup with leading software engineers from Microsoft, EverC, Check Point, and Dynamic Yield, along with natural language processing (NLP) researchers from the Weizmann Institute of Science, in collaboration with world-class Conversation Design experts from the Conversation Design Institute.
Parlant enables an AI Conversation Modeling system that automatically tailors responses from a large and dynamically controlled selection of pre-approved "utterances." Using this conversation modeling paradigm, organisations can precisely control GenAI communications while maintaining the naturalness and flexibility expected of LLMs: operators and designers manage and refine utterances with adjustable freedom levels, and Parlant's engine intelligently applies them at the right time based on situational awareness and the guidelines you provide.
To simplify authoring these utterances during prototyping, Parlant offers a 'Fluid Composition' mode in which the AI generates responses freely. Conversation designers can then extract and tweak these auto-suggested responses into approved utterances while iterating on their agents during development.
Once the utterance set is established, the system switches to 'strict' mode, constructing responses exclusively from pre-approved utterances. This ensures predictability and control while preserving the agent's ability to handle diverse inquiries, using the LLM's natural language capabilities to select the most fitting utterances from the approved set.
Parlant analyses the conversation context at runtime, determines the relevant set of utterance candidates, and dynamically applies them to produce a response. It also filters and selects guidelines based on the context, allowing the developer to achieve a high degree of behavioural control over their agents without sacrificing the ability to scale the agents' complexity. This runtime filtering of guidelines enables developers to support more conversational use cases while maintaining focused behaviour from their LLM in many different situations.
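The two runtime steps described above (filter the guidelines that match the context, then narrow the utterance pool to candidates associated with the matched guidelines) can be sketched as follows. This is an illustrative mock, not Parlant's actual API: the class names, tags, and keyword-based condition check are assumptions standing in for the LLM-driven evaluation Parlant performs.

```python
from dataclasses import dataclass, field

# Illustrative sketch only -- not Parlant's real API. A real engine
# would ask an LLM whether each guideline's condition holds for the
# conversation; here a simple keyword check stands in for that.

@dataclass
class Guideline:
    condition: str               # hypothetical condition, matched by keyword below
    action: str
    utterance_tags: set = field(default_factory=set)

GUIDELINES = [
    Guideline("refund", "explain the refund policy", {"refunds"}),
    Guideline("card lost", "offer to freeze the card", {"card_security"}),
]

# Pre-approved utterances, each tagged for the situations it serves.
UTTERANCES = {
    "Refunds are processed within 5 business days.": {"refunds"},
    "I can freeze your card right away. Shall I?": {"card_security"},
    "Is there anything else I can help you with?": {"general"},
}

def match_guidelines(message):
    # Step 1: keep only the guidelines relevant to this context.
    return [g for g in GUIDELINES if g.condition in message.lower()]

def candidate_utterances(message):
    # Step 2: narrow the utterance pool to candidates tagged for
    # the matched guidelines.
    matched = match_guidelines(message)
    tags = set().union(*(g.utterance_tags for g in matched)) if matched else set()
    return [u for u, utags in UTTERANCES.items() if utags & tags]

print(candidate_utterances("My card lost somewhere, help!"))
```

The point of the design is that only a focused subset of guidelines and utterances reaches the LLM at each turn, which is what lets agent complexity scale without diluting the model's attention.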
Moreover, Parlant lets you easily troubleshoot by tracing how and why each utterance was applied for any given response. This is made possible by using highly descriptive and explainable log outputs, produced by the LLM during the utterance selection process.
Parlant is open source and LLM-agnostic: it supports models from OpenAI, Google, Meta, and Anthropic, via multiple inference providers.
Parlant's ability to ensure aligned, predictable outcomes from LLMs stems from the team's research into techniques for gaining control over LLM behaviour.
Earlier this year, Emcie published a research study titled 'Attentive Reasoning Queries (ARQ): A Systematic Method for Optimising Instruction-Following in Large Language Models'.
Unlike free-form reasoning approaches such as Chain-of-Thought (CoT), Attentive Reasoning Queries (ARQs) guide LLMs through systematic, targeted queries that reinforce critical information and instructions and prevent hallucinations and attention drift.
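The contrast with free-form CoT can be made concrete with a minimal sketch of an ARQ-style prompt: the model must fill in explicit, targeted queries that re-surface the critical instructions before it commits to a response. The query names and JSON structure below are illustrative assumptions, not taken from the paper or from Parlant's implementation.

```python
import json

# Hypothetical ARQ-style template: each field is a targeted query the
# model must answer, in order, before producing its final response.
# Field names are illustrative, not from the ARQ paper.
ARQ_TEMPLATE = {
    "which_guidelines_apply": "<list the guideline numbers relevant to the last user message>",
    "restate_each_guideline": "<repeat each applicable guideline verbatim>",
    "does_draft_violate_any": "<yes/no, naming the violated guideline if yes>",
    "final_response": "<the response, produced only after the checks above>",
}

def build_arq_prompt(user_message, guidelines):
    # The structured queries draw the model's attention back to the
    # instructions at generation time, instead of free-form reasoning.
    return (
        "Guidelines:\n"
        + "\n".join(f"{i}. {g}" for i, g in enumerate(guidelines, 1))
        + f"\n\nUser: {user_message}\n\n"
        + "Answer by completing this JSON, field by field, in order:\n"
        + json.dumps(ARQ_TEMPLATE, indent=2)
    )

prompt = build_arq_prompt(
    "Can you waive my overdraft fee?",
    ["Never promise fee waivers; offer to escalate instead."],
)
print(prompt)
```

Because the queries are fixed and explicit, the reasoning trace is also easier to audit than a free-form chain of thought.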
In the study's evaluations, ARQs achieved a 90.2% success rate in correctly interpreting and applying instructions, outperforming both CoT reasoning and direct response generation. The study also found that, when carefully designed, ARQs can be more computationally efficient than free-form reasoning.