AI chatbots interpret 'likely' as 80% probability while humans assume 65%, new study reveals

Reviewed by Nidhi Govil


A new study published in npj Complexity reveals that AI chatbots and humans interpret probability words differently, with language models mapping 'likely' to an 80% probability while humans assume roughly 65%. This probability misalignment poses risks in high-stakes fields like healthcare and government policy, where miscommunication about uncertainty could lead to flawed decisions.

Language Models Struggle with Probability Communication

When an AI chatbot like ChatGPT describes something as "likely" or "probable," it is not communicating the same odds that humans understand. A recent npj Complexity study reveals a critical gap between AI and human interpretation of uncertainty, showing that language models assign "likely" to an 80% probability while humans typically interpret it as closer to 65% [1][2]. This probability misalignment extends beyond simple miscommunication, representing a fundamental challenge for human-AI interaction in critical applications.

Source: The Conversation

The research focused on words of estimative probability, including terms like "maybe," "probably," and "almost certain." While AI models and humans tend to agree on extremes like "impossible," they diverge sharply on hedge words. Humans interpret these terms based on contextual cues and personal experiences, drawing from real-world situations to assess likelihood [1]. Language models, however, appear to average over conflicting usages in their training data, leading to interpretations that don't align with human understanding of uncertainty.
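
To make the gap concrete, here is a rough illustration of how such a comparison could be run; it is not the study's actual protocol. The ask_model() helper is a hypothetical stand-in for whatever chat API is available, and the human baseline contains only the 65% figure for "likely" reported above; any other entries would need to come from real survey data.

```python
import re

# Hedge words ("words of estimative probability") to probe.
HEDGE_WORDS = ["almost certain", "probably", "likely", "maybe", "unlikely", "impossible"]

# Human interpretations on a 0-1 scale. Only "likely" (~0.65) comes from the
# article; fill in the rest from real survey data before drawing conclusions.
HUMAN_BASELINE = {"likely": 0.65}


def ask_model(prompt: str) -> str:
    """Hypothetical wrapper around a chat-completion API; returns the model's reply."""
    raise NotImplementedError("plug in your chat API client here")


def elicit_estimate(word: str) -> float:
    """Ask the model to translate a hedge word into a percentage and parse the reply."""
    reply = ask_model(
        f'If a forecaster says an event is "{word}", what single percentage '
        "chance (0-100) best matches that statement? Answer with a number only."
    )
    match = re.search(r"\d+(?:\.\d+)?", reply)
    return float(match.group()) / 100 if match else float("nan")


def compare_interpretations() -> None:
    """Print the model's estimate next to the human baseline for each hedge word."""
    for word in HEDGE_WORDS:
        model_p = elicit_estimate(word)
        human_p = HUMAN_BASELINE.get(word)
        gap = f"{model_p - human_p:+.2f}" if human_p is not None else "n/a"
        print(f"{word:>15}: model={model_p:.2f}  human={human_p}  gap={gap}")
```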

Biases in LLM Interpretations Emerge Across Languages and Gender

The study uncovered additional layers of complexity in how AI chatbot systems process probability language. When prompts shifted from "he" to "she," the models' probability estimates became more rigid, exposing biases embedded in training data [2]. Language choice also matters significantly: when researchers changed prompts from English to Chinese, probability estimates shifted, possibly reflecting cultural differences in how people express and understand uncertainty across languages [1]. These findings suggest that gendered language and linguistic context introduce systematic variations in how models communicate risk.
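
As a purely illustrative sketch of the kind of probing this implies (the prompt wording, the variant set, and the ask_model() stub below are assumptions, not the researchers' setup), one could hold the hedge word fixed, vary only the pronoun or the language, and compare the resulting numeric answers:

```python
import re
import statistics


def ask_model(prompt: str) -> str:
    """Hypothetical wrapper around a chat-completion API; returns the model's reply."""
    raise NotImplementedError("plug in your chat API client here")


# Same question with only the pronoun or the language changed. The Chinese
# prompt is an illustrative translation, not the wording used in the study.
VARIANTS = {
    "he / English": 'He says the outcome is "likely". What percentage chance (0-100) does that imply? Number only.',
    "she / English": 'She says the outcome is "likely". What percentage chance (0-100) does that imply? Number only.',
    "Chinese": "她说这个结果“很可能”发生。这对应百分之多少的概率（0到100）？只回答数字。",
}


def sample_estimates(prompt: str, samples: int = 10) -> tuple[float, float]:
    """Query the model repeatedly and return the mean and spread of parsed percentages."""
    values = []
    for _ in range(samples):
        match = re.search(r"\d+(?:\.\d+)?", ask_model(prompt))
        if match:
            values.append(float(match.group()))
    return statistics.mean(values), statistics.pstdev(values)


for label, prompt in VARIANTS.items():
    mean, spread = sample_estimates(prompt)
    # A noticeably smaller spread for one variant would match the "more rigid"
    # estimates the study describes; a shifted mean would match the language effect.
    print(f"{label:>13}: mean={mean:.1f}%  spread={spread:.1f}")
```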

High-Stakes Applications Face Risk from Flawed Decisions

This misalignment has serious implications for AI safety as language models expand into healthcare, government policy, and scientific reporting. If an AI assistant helping a doctor describes a side effect as "unlikely," but the model's internal estimate differs significantly from the doctor's interpretation, the resulting decision could be flawed [1]. The disconnect becomes a matter of public trust when AI systems summarize medical research or inform policy decisions, where accurately understanding uncertainty is essential.

Researchers have studied how humans quantify uncertainty since the 1960s, when CIA analysts pioneered methods to improve intelligence reporting. The current study treats the interaction between humans and AI as a biological-like system in which meaning can degrade, moving beyond measuring whether an AI is "smart" to asking whether it is aligned [1]. Other researchers are exploring whether chain-of-thought prompting can fix these errors, but the study found that even advanced reasoning doesn't always bridge the gap between statistical data and verbal labels.

Future Development Must Address Communication Reliability

Looking ahead, developers aim to create models that don't just predict the next likely word but actually understand the weight of the uncertainty they convey. Researchers are calling for more robust consistency metrics to ensure that if a model sees a 10% chance in the data, it chooses the same word every time [1]. As AI systems increasingly manage schedules and summarize scientific papers, ensuring that "probably" means "probably" is a vital step in making these systems reliable partners rather than sophisticated parrots. The path forward requires addressing not just technical performance but fundamental alignment in how machines and humans communicate risk and uncertainty.
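
The article does not spell out what such a metric would look like. One simple possibility, sketched below under the same hypothetical ask_model() assumption, is to ask the model repeatedly to label a fixed numeric probability and measure how often it picks its most frequent answer:

```python
from collections import Counter


def ask_model(prompt: str) -> str:
    """Hypothetical wrapper around a chat-completion API; returns the model's reply."""
    raise NotImplementedError("plug in your chat API client here")


LABELS = ["impossible", "unlikely", "maybe", "likely", "almost certain"]


def label_consistency(probability: float, samples: int = 50) -> float:
    """Fraction of samples in which the model chose its most common (modal) label.

    1.0 means the model maps this probability to the same word every time;
    values near 1/len(LABELS) mean it is essentially picking at random.
    """
    prompt = (
        f"An event has a {probability:.0%} chance of happening. "
        f"Pick the single best description from {LABELS}. Answer with that phrase only."
    )
    answers = []
    for _ in range(samples):
        reply = ask_model(prompt).strip().lower()
        # "unlikely" is checked before "likely" because of the order in LABELS.
        answers.append(next((label for label in LABELS if label in reply), "other"))
    modal_count = Counter(answers).most_common(1)[0][1]
    return modal_count / samples


# Example: check whether "a 10% chance in the data" always gets the same word.
# print(label_consistency(0.10))
```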

Source: Fortune
