2 Sources
[1]
Google claims AI models are highly likely to lie when under pressure
AI is sometimes more human than we think. It can get lost in its own thoughts, is friendlier to those who are nicer to it, and, according to a new study, has a tendency to start lying when put under pressure. A team of researchers from Google DeepMind and University College London has examined how large language models (like OpenAI's GPT-4 or Grok 4) form, maintain and then lose confidence in their answers.
The research reveals a key behaviour of LLMs: they can be overconfident in their answers, but quickly lose confidence when given a convincing counterargument, even if it is factually incorrect. While this behaviour mirrors that of humans, who also become less confident when met with resistance, it highlights major concerns about the structure of AI decision-making, since it crumbles under pressure. This has been seen elsewhere, like when Gemini panicked while playing Pokemon or when Anthropic's Claude had an identity crisis while trying to run a shop full time. AI seems to collapse under pressure quite frequently.
When an AI chatbot is preparing to answer your query, its confidence in its answer is actually measured internally. This is done through something known as logits. All you need to know about these is that they are essentially a score of how confident a model is in its choice of answer.
The team of researchers designed a two-turn experimental setup. In the first turn, the LLM answered a multiple-choice question, and its confidence in its answer (the logits) was measured. In the second turn, the model was given advice from another large language model, which may or may not agree with its original answer. The goal of this test was to see whether it would revise its answer when given new information -- which may or may not be correct.
The researchers found that LLMs are usually very confident in their initial responses, even if they are wrong. However, when they are given conflicting advice, especially if that advice is labelled as coming from an accurate source, they lose confidence in their answers. To make things even worse, the chatbot's confidence in its answer drops even further when it is reminded that its original answer was different from the new one. Surprisingly, AI doesn't seem to correct its answers or follow a logical pattern, but rather makes highly decisive and emotional decisions.
The study shows that, while AI is very confident in its original decisions, it can quickly go back on them. Even worse, its confidence can slip drastically as the conversation goes on, with AI models somewhat spiralling. This is one thing when you're just having a light-hearted debate with ChatGPT, but another when AI becomes involved in high-level decision-making. If it can't be trusted to be sure of its answer, it can be easily swayed in a certain direction, or simply become an unreliable source.
However, this is a problem that will likely be solved in future models. Future model training and prompt engineering techniques will be able to stabilize this confusion, offering more calibrated and self-assured answers.
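To make the idea of logits as a confidence score concrete, here is a minimal, hypothetical sketch (not the study's actual code): raw logits over two answer options are converted into probabilities with a softmax, and a drop in the probability of the original option corresponds to a drop in confidence. The option names and logit values are invented for illustration.

```python
import math

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    total = sum(exps.values())
    return {k: v / total for k, v in exps.items()}

# Hypothetical logits a model might assign to two answer options
# for a binary-choice question (values invented for illustration).
first_turn = {"A": 4.2, "B": 1.1}
print(softmax(first_turn))   # ~{'A': 0.957, 'B': 0.043} -> high initial confidence in "A"

# After opposing advice, the same options might score much closer together,
# i.e. the model's confidence in its original choice has dropped.
second_turn = {"A": 1.8, "B": 1.4}
print(softmax(second_turn))  # ~{'A': 0.599, 'B': 0.401}
```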
[2]
Google study shows LLMs abandon correct answers under pressure, threatening multi-turn AI systems
A new study by researchers at Google DeepMind and University College London reveals how large language models (LLMs) form, maintain and lose confidence in their answers. The findings reveal striking similarities between the cognitive biases of LLMs and humans, while also highlighting stark differences.
The research reveals that LLMs can be overconfident in their own answers yet quickly lose that confidence and change their minds when presented with a counterargument, even if the counterargument is incorrect. Understanding the nuances of this behavior can have direct consequences on how you build LLM applications, especially conversational interfaces that span several turns.
Testing confidence in LLMs
A critical factor in the safe deployment of LLMs is that their answers are accompanied by a reliable sense of confidence (the probability that the model assigns to the answer token). While we know LLMs can produce these confidence scores, the extent to which they can use them to guide adaptive behavior is poorly characterized. There is also empirical evidence that LLMs can be overconfident in their initial answer but also be highly sensitive to criticism and quickly become underconfident in that same choice.
To investigate this, the researchers developed a controlled experiment to test how LLMs update their confidence and decide whether to change their answers when presented with external advice. In the experiment, an "answering LLM" was first given a binary-choice question, such as identifying the correct latitude for a city from two options. After making its initial choice, the LLM was given advice from a fictitious "advice LLM." This advice came with an explicit accuracy rating (e.g., "This advice LLM is 70% accurate") and would either agree with, oppose, or stay neutral on the answering LLM's initial choice. Finally, the answering LLM was asked to make its final choice.
A key part of the experiment was controlling whether the LLM's own initial answer was visible to it during the second, final decision. In some cases, it was shown, and in others, it was hidden. This unique setup, impossible to replicate with human participants who can't simply forget their prior choices, allowed the researchers to isolate how memory of a past decision influences current confidence. A baseline condition, where the initial answer was hidden and the advice was neutral, established how much an LLM's answer might change simply due to random variance in the model's processing. The analysis focused on how the LLM's confidence in its original choice changed between the first and second turn, providing a clear picture of how initial belief, or prior, affects a "change of mind" in the model.
Overconfidence and underconfidence
The researchers first examined how the visibility of the LLM's own answer affected its tendency to change its answer. They observed that when the model could see its initial answer, it showed a reduced tendency to switch, compared to when the answer was hidden. This finding points to a specific cognitive bias. As the paper notes, "This effect - the tendency to stick with one's initial choice to a greater extent when that choice was visible (as opposed to hidden) during the contemplation of final choice - is closely related to a phenomenon described in the study of human decision making, a choice-supportive bias."
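For readers who want to see the shape of the two-turn protocol described above in concrete terms, here is a minimal sketch. The prompt wording, the question, and the accuracy label are illustrative assumptions, not the researchers' actual materials.

```python
# Illustrative sketch of the two-turn setup: a binary-choice question, then advice
# from a fictitious "advice LLM" with an explicit accuracy rating, with the model's
# initial answer either shown (visible condition) or withheld (hidden condition).

def first_turn_prompt(question: str, options: tuple[str, str]) -> str:
    """Turn 1: the answering LLM picks one of two options."""
    return (
        f"{question}\n"
        f"A) {options[0]}\n"
        f"B) {options[1]}\n"
        "Answer with A or B."
    )

def second_turn_prompt(initial_answer: str | None, advice: str, accuracy: int) -> str:
    """Turn 2: present the advice, optionally reminding the model of its first answer."""
    parts = []
    if initial_answer is not None:  # visible condition
        parts.append(f"Your previous answer was {initial_answer}.")
    parts.append(f"An advice LLM that is {accuracy}% accurate says: {advice}")
    parts.append("What is your final answer? Reply with A or B.")
    return "\n".join(parts)

# Example usage (hypothetical question and advice):
q = first_turn_prompt("Which latitude is closest to Paris?", ("48.9 N", "41.0 N"))
t2_opposing = second_turn_prompt("A", "The correct answer is B.", accuracy=70)
t2_hidden = second_turn_prompt(None, "The correct answer is B.", accuracy=70)
print(q, t2_opposing, t2_hidden, sep="\n---\n")
```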
The study also confirmed that the models do integrate external advice. When faced with opposing advice, the LLM showed an increased tendency to change its mind, and a reduced tendency when the advice was supportive. "This finding demonstrates that the answering LLM appropriately integrates the direction of advice to modulate its change of mind rate," the researchers write. However, they also discovered that the model is overly sensitive to contrary information and performs too large of a confidence update as a result.
Interestingly, this behavior is contrary to the confirmation bias often seen in humans, where people favor information that confirms their existing beliefs. The researchers found that LLMs "overweight opposing rather than supportive advice, both when the initial answer of the model was visible and hidden from the model." One possible explanation is that training techniques like reinforcement learning from human feedback (RLHF) may encourage models to be overly deferential to user input, a phenomenon known as sycophancy (which remains a challenge for AI labs).
Implications for enterprise applications
This study confirms that AI systems are not the purely logical agents they are often perceived to be. They exhibit their own set of biases, some resembling human cognitive errors and others unique to themselves, which can make their behavior unpredictable in human terms. For enterprise applications, this means that in an extended conversation between a human and an AI agent, the most recent information could have a disproportionate impact on the LLM's reasoning (especially if it is contradictory to the model's initial answer), potentially causing it to discard an initially correct answer.
Fortunately, as the study also shows, we can manipulate an LLM's memory to mitigate these unwanted biases in ways that are not possible with humans. Developers building multi-turn conversational agents can implement strategies to manage the AI's context. For example, a long conversation can be periodically summarized, with key facts and decisions presented neutrally and stripped of which agent made which choice. This summary can then be used to initiate a new, condensed conversation, providing the model with a clean slate to reason from and helping to avoid the biases that can creep in during extended dialogues.
As LLMs become more integrated into enterprise workflows, understanding the nuances of their decision-making processes is no longer optional. Following foundational research like this enables developers to anticipate and correct for these inherent biases, leading to applications that are not just more capable, but also more robust and reliable.
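The context-management strategy described above (periodically compressing a long conversation into a neutral summary with attributions stripped, then restarting from that summary) might look roughly like the following sketch. The `call_llm` placeholder, the message format, and the turn threshold are assumptions for illustration; they are not from the study or any specific framework.

```python
# Rough sketch of the mitigation described above: once a dialogue grows long,
# replace the history with a neutral summary that does not record which agent
# said or chose what, giving the model a clean slate to reason from.

MAX_TURNS_BEFORE_RESET = 10  # arbitrary threshold, chosen for illustration

def call_llm(messages: list[dict]) -> str:
    """Placeholder for whatever chat-completion API you use."""
    raise NotImplementedError("plug in your chat completion call here")

def neutral_summary(history: list[dict]) -> str:
    """Ask the model to summarize key facts and decisions, stripped of attribution."""
    prompt = (
        "Summarize the key facts and decisions in the conversation below. "
        "Do not mention who proposed or chose what; state them neutrally.\n\n"
        + "\n".join(f"{m['role']}: {m['content']}" for m in history)
    )
    return call_llm([{"role": "user", "content": prompt}])

def manage_context(history: list[dict]) -> list[dict]:
    """If the dialogue has grown long, replace it with a condensed, neutral context."""
    if len(history) <= MAX_TURNS_BEFORE_RESET:
        return history
    summary = neutral_summary(history)
    return [{"role": "system", "content": f"Background (neutral summary): {summary}"}]
```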
A new study by Google DeepMind and University College London shows that large language models (LLMs) can quickly lose confidence and change their answers when challenged, even if their initial response was correct.
A groundbreaking study conducted by researchers from Google DeepMind and University College London has shed light on the decision-making processes of large language models (LLMs). The research reveals that AI models, much like humans, exhibit cognitive biases and can be surprisingly susceptible to pressure when making decisions [1].
Source: Tom's Guide
The study focused on how LLMs form, maintain, and lose confidence in their answers. Researchers discovered that these AI models often display high initial confidence in their responses, even when incorrect. However, this confidence can rapidly diminish when presented with conflicting information, regardless of its accuracy [2].
To investigate this phenomenon, the research team designed a two-turn experimental setup:
1. In the first turn, an "answering LLM" responded to a binary-choice question, and its internal confidence in that answer was recorded.
2. In the second turn, it received advice from a fictitious "advice LLM" carrying an explicit accuracy rating, which could agree with, oppose, or stay neutral on the initial choice, before the answering LLM made its final decision.
The experiment revealed that LLMs tend to lose confidence in their initial answers when faced with contradictory advice, especially if the source is labeled as accurate. This effect was even more pronounced when the AI was reminded of its original, differing answer [1].
These findings have significant implications for AI applications, particularly in multi-turn conversational systems. The tendency of AI models to quickly abandon correct answers under pressure raises concerns about their reliability in high-stakes decision-making scenarios [2].
Source: VentureBeat
Interestingly, the study uncovered both similarities and differences between AI and human cognitive biases: like people, the models showed a choice-supportive bias, sticking with an initial answer more readily when that answer remained visible; unlike people, they showed the opposite of confirmation bias, overweighting opposing advice rather than information that supported their existing choice.
The research highlights the need for improved model training and prompt engineering techniques to stabilize AI decision-making. Future developments may focus on creating more calibrated and self-assured AI models that can maintain confidence in correct answers while appropriately evaluating new information [1].
For enterprise applications utilizing multi-turn conversational agents, developers can implement strategies to manage AI context and mitigate unwanted biases. One suggested approach is to periodically summarize long conversations, presenting key facts and decisions neutrally without attributing choices to specific agents [2].
As AI continues to evolve and integrate into various aspects of decision-making, understanding and addressing these cognitive quirks becomes crucial for developing reliable and trustworthy AI systems.