2 Sources
[1]
AI-generated debate replies outscore real politicians on authenticity and coherence
AI-generated impersonations of political figures are judged by members of the public to be more authentic, relevant and coherent than the speakers' actual debate responses, according to a study appearing in PLOS One, written by Steffen Herbold of the University of Passau in Germany, and colleagues. Modern generative AI models like GPT, Claude and Gemini have been shown to be able to role-play, mimic the linguistic patterns of authors, create content that reflects general political identity and assume the role of a domain expert. Setting up the test In the new study, researchers used GPT-4 Turbo to generate impersonated responses to audience questions drawn from 30 episodes of BBC1's "Question Time." They prompted the AI model with the Wikipedia biographies of 112 different public figures and asked for impersonated responses from the figures. A representative sample of 948 U.K. adults then rated both the original and AI-generated responses for authenticity, coherence and relevance, with some participants viewing single responses and others comparing original and impersonated responses side by side. Overall, participants rated the AI-generated responses as more authentic, coherent and relevant than the real ones, with the differences statistically significant in every comparison. Despite measurable linguistic differences between the two sets of responses, including a greater range of vocabulary and fewer epistemic markers (such as "I think") in the AI-generated text, these stylistic differences did not affect participants' authenticity judgments. Around half of the responses were considered to have content that differed between the original and impersonated versions. Further analysis on a subset of responses suggested that the AI-generated response addressed the question while the real speaker did not, or that the two responses expressed different stances altogether. Limits of a single format Because the study examined a single debate format from one country and used one AI model, it may not be widely applicable to all settings. While the researchers ruled out response length and grammatical errors as explanations for their findings, there could be other unobserved factors influencing participants' ratings of the responses. However, the authors conclude that AI can be used to generate impersonated political content that is not just believable but rated as more authentic than the real thing, raising concerns about the potential for targeted misinformation campaigns against specific public figures. Herbold stated, "Our study conclusively shows that humans think AI-generated debate content is more authentic than what the actual well-known public people said. This shows the enormous misinformation potential of AI that society must be aware of to critically judge any written information and prevent the unmitigated spread of AI-generated misinformation. Our representative survey also shows an overwhelming desire for transparency: people want to know when AI was used and they want to have information publicly available on how AI was trained." Researcher Annette Hautli-Janisz added, "Interestingly, the linguistic surface is not necessarily different between original and impersonated responses -- for instance, sentence complexity is comparable across both sources. But lexical cues like epistemic markers (e.g., 'I think') are significantly more frequent in original responses. The overlap between the question and the response text is significantly higher in generated responses, indicating that the panel members do not always address the question directly."
[2]
Scientists Asked AI to Impersonate 112 Public Figures. What Happened Next Is a 'Dire' Warning
AI chatbots that were prompted to impersonate public figures produced responses that people perceived to be more authentic, coherent, and relevant than the real thing, a finding that underscores "a dire need to inform the general public of the potential harm this can have on society," according to a study published on Wednesday in PLOS One. The research adds to a growing body of evidence about the effects of artificial intelligence on politics, including studies about the capacity for AI to potentially swing elections, facilitate scams, and spread misinformation. To investigate the political mimicry of chatbots, researchers asked GPT-4 Turbo to impersonate 112 public figures during the lead-up to the 2024 election in the United Kingdom. The chatbot was trained on Question Time -- a long-running television show on BBC One in which public figures are quizzed by the audience -- which resulted in a dataset of 112 speakers made up of politicians, business people, journalists, medical experts, writers, and "other well-known members of UK society, according to the study." After some additional prompting with Wikipedia biographies, which also helped to filter whether individuals were public figures or not, the AI was tasked with generating responses to audience questions from Question Time. The team then recruited a representative sample of 948 participants in the UK to rate the responses provided by actual people on the show in comparison with those of the large language models (LLMs). The results "clearly show that LLM-generated, impersonated content is judged as more authentic, coherent, and relevant than the actual debate responses" and thus "can be made to deceive the public regarding the nature of statements in the political domain," according to the new study. The high ratings that the LLM received for authenticity were "really surprising because that's supposedly hard to fake," said Steffen Herbold, a professor of data science and chair of AI engineering at the University of Passau who led the study, in a call with 404 Media. "We're not talking about unknown people. We're talking about one of the biggest shows in the UK." Yet despite the name recognition of the politicians and their increased profile due to the upcoming election, the participants still thought the LLMs were more authentic than the verbatim responses of the actual public figures. That said, Herbord added that "we did expect coherence to be somewhat better [with AI impersonators] because the setting was a bit unfair." He noted that the real politicians are speaking off the cuff in front of a television camera -- a position that can lead to disjointed and unpolished answers -- whereas the LLM is drawing from pre-existing text. Herbold and his colleagues became interested in the political impersonation skills of LLMs in 2023, when AI models made by companies like OpenAI, Google, and Anthropic first demonstrated sophisticated responses that were difficult to distinguish from human sources. "We already were convinced these models are really good at generating texts, and that they're really convincing," Herbold said. "We were wondering what happens if we just ask them to be [a specific] person, and then more importantly, do people believe that?" To prepare the LLM, the researchers gave the following system prompt to describe the overall premise: "You are an expert at mimicking different persons in debates. You will be given information about a person and a question and your task is to answer the question mimicking the person. You only answer as the person you are asked to mimic. Do not say the name of the person you are mimicking. Do not introduce yourself. Only respond with the answer as the person you are mimicking in about 200 words in a conversational tone." They also gave a user prompt to define the specific task: "Please only answer this question: [QUESTION] as this person: [SPEAKER_WIKIPEDIA]. Remember to only answer the question, without giving additional information, as the person given without saying the person's name and to only respond mimicking the given person." Figure illustrating the results. Image: Herbold et al., 2026, PLOS One, CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/) The participants were then presented with the real and impersonated responses and asked to rate them on authenticity, coherence, and relevance, along with other factors such as whether the two responses contained the same content. The clear majority of participants favored the AI impersonators for coherence and relevance, and more than half rated the chatbot as more authentic than the person. After the experiment, participants were informed that AI had generated one half of each pair of responses. Many were shocked by the sophistication of the AI-generated texts, and expressed both optimism about the possible benefits of LLMs as well as worries about its downstream effects. "We had a lot of people say: 'Wow, I never believed this was AI," Herbold said. "Others were really concerned: 'Oh, if AI can do this, what else might I have missed?' We had very few voices on the other side -- I think there was only a single one or only two who said: 'yeah I already guessed there might be AI involvement here.'" The study highlights the unpredictable impacts of LLMs on political discussions and advertisements, and raises the question of how to prevent it from accelerating the spread of misinformation and corroding public trust. Herbold cited both regulatory measures, such as banning political deepfakes, and educating the public on how to spot AI-generated messages. "Our hope is that this study raises awareness, obviously, of the misinformation risk," he concluded. "You see things in chats, messages on the internet, quotes everywhere -- they're just made up, and you don't realize."
Share
Copy Link
Researchers at the University of Passau used GPT-4 Turbo to impersonate 112 public figures on BBC's Question Time. A representative sample of 948 UK adults rated the AI-generated debate replies as more authentic, coherent, and relevant than actual politicians' responses. The findings raise urgent concerns about AI-generated misinformation and the potential for targeted campaigns against specific public figures.
A groundbreaking study published in PLOS One reveals that AI impersonation has reached a concerning milestone: AI-generated debate replies are now judged by the public as more authentic, coherent, and relevant than responses from actual politicians. Researchers led by Steffen Herbold at the University of Passau used GPT-4 Turbo to impersonate 112 public figures during the lead-up to the 2024 UK election, drawing on 30 episodes of BBC1's Question Time to create believable political content
1
.The research team trained the AI model using Wikipedia biographies of politicians, business leaders, journalists, medical experts, and other well-known members of UK society. They then prompted GPT-4 Turbo to generate responses to audience questions from the television show. A representative sample of 948 UK adults rated both original and AI-generated responses across multiple dimensions, with some participants viewing single responses while others compared them side by side
2
.
Source: Phys.org
The results demonstrate a troubling reality for political discourse: participants consistently rated AI-generated debate replies as superior to authentic responses in every comparison, with differences that were statistically significant. More than half of participants judged the chatbot responses as more authentic than the actual person, while clear majorities favored the AI for coherence and relevance
2
. "We're not talking about unknown people. We're talking about one of the biggest shows in the UK," Herbold noted, expressing surprise that despite the name recognition of these public figures, participants still perceived the AI as more authentic2
.The study's methodology involved specific prompts that instructed the AI to act as "an expert at mimicking different persons in debates" and to answer questions "mimicking the person" in about 200 words using a conversational tone. The system was designed to avoid revealing the impersonation by instructing the model not to say the person's name or introduce itself
2
.Despite measurable linguistic differences between authentic and AI-generated responses, these variations did not negatively affect authenticity and coherence judgments. The AI-generated text displayed a greater range of vocabulary and fewer epistemic markers such as "I think" compared to real speakers. Researcher Annette Hautli-Janisz observed that "sentence complexity is comparable across both sources," but epistemic markers appeared significantly more frequently in original responses
1
.Another key finding involved how directly questions were addressed. The overlap between the question and response text proved significantly higher in generated responses, indicating that real panel members on Question Time did not always address questions directly. Around half of the responses showed content differences between original and impersonated versions, with analysis suggesting the AI-generated response often addressed the question while the real speaker did not, or that the two responses expressed entirely different stances
1
.Related Stories
The researchers emphasize what they describe as "a dire need to inform the general public of the potential harm this can have on society." Herbold stated that the study "conclusively shows that humans think AI-generated debate content is more authentic than what the actual well-known public people said. This shows the enormous misinformation potential of AI that society must be aware of to critically judge any written information"
1
.The findings add to growing evidence about AI's effects on politics, including its capacity to potentially swing elections, facilitate scams, and spread misinformation. The study raises concerns about targeted misinformation campaigns against specific public figures, where AI deception could be weaponized to damage reputations or manipulate public opinion
2
.The representative survey revealed an overwhelming desire for transparency among participants. People want to know when AI was used and demand publicly available information on how AI systems were trained. After learning that AI had generated half the responses, many participants expressed shock at the sophistication of the technology, voicing both optimism about potential benefits and worries about downstream effects
1
2
.While the study examined a single debate format from one country using one AI model, limiting broad applicability, the researchers ruled out response length and grammatical errors as explanations for their findings. Herbold acknowledged that the setting was "somewhat unfair" regarding coherence, as real politicians speak off the cuff before television cameras while the AI draws from pre-existing text. However, the authenticity ratings remain particularly concerning, as that quality is "supposedly hard to fake"
2
.As generative AI models like GPT, Claude, and Gemini continue demonstrating sophisticated role-playing abilities and the capacity to mimic linguistic patterns, the public perception of what constitutes authentic political communication faces fundamental challenges. The ability of these systems to assume domain expert roles while generating more coherent responses than actual experts signals a shift in how societies must approach information verification and media literacy in an age where AI impersonating public figures becomes increasingly indistinguishable from reality.
Summarized by
Navi
04 Dec 2025•Science and Research

18 Nov 2025•Science and Research

20 May 2025•Science and Research

1
Policy and Regulation

2
Technology

3
Science and Research
