AI passes the Turing Test as GPT-4.5 appears more human than actual people in landmark study

Reviewed by Nidhi Govil



A University of California San Diego study provides the first empirical evidence that AI passes the Turing Test. GPT-4.5 was judged human 73% of the time—more often than actual humans—fundamentally changing how we measure machine intelligence. The findings raise urgent questions about online trust and distinguishing between humans and AI in everyday digital interactions.

AI Passes the Turing Test in Landmark Scientific Achievement

A groundbreaking University of California San Diego study has delivered the first definitive empirical evidence that modern AI can pass the Turing Test, the iconic benchmark created by British mathematician Alan Turing in 1950 [1]. Published in the Proceedings of the National Academy of Sciences, the research tested whether advanced large language models could mimic human conversation so convincingly that people couldn't reliably tell them apart from real humans [4]. The results mark a pivotal moment: GPT-4.5 was judged to be human 73% of the time, significantly outperforming actual human participants in the same test [1]. This is the first time a study using the Turing framework has found that AI can seem more human than real humans.

Source: Neuroscience News


How Advanced Large Language Models Fool People Into Thinking They're Human

The study's methodology was deliberately rigorous. Nearly 500 participants, including UC San Diego undergraduates and online volunteers, served as interrogators in text-based conversations lasting five or 15 minutes [2]. Each interrogator chatted simultaneously with two parties: one human, one AI. Their task was simple but challenging: decide which was which. Researchers tested four AI systems: state-of-the-art models GPT-4.5 and LLaMa-3.1-405B, alongside older baseline systems GPT-4o and ELIZA, a 1960s rules-based chatbot [1]. The performance gap was dramatic. While LLaMa-3.1-405B was judged human 56% of the time, statistically indistinguishable from actual humans, the baseline systems ELIZA and GPT-4o were identified as human only 23% and 21% of the time respectively [4].

Source: Newswise


The Critical Role of Persona Prompts in Human-Likeness

The secret to AI's success wasn't raw intelligence but carefully crafted persona prompts that instructed models to adopt specific human characters and communication styles [2]. "What we found is that if given the right prompts, advanced LLMs can exhibit the same tone, directness, humor and fallibility as humans," explained study lead author Cameron Jones, who conducted the research while earning his doctorate in Cognitive Science from UC San Diego [1]. Without these explicit instructions, performance plummeted: GPT-4.5 dropped from 73% to just 36%, and LLaMa-3.1 fell from 56% to 38% [4]. Co-author Ben Bergen, a professor of cognitive science at UC San Diego, noted that models weren't winning through displays of knowledge; they succeeded by making mistakes like humans would. "They have the ability to appear human-like, but maybe not as much the ability to figure out what it would take to appear human-like," Bergen observed [2].

Rethinking What the Turing Test Actually Measures

The findings force a fundamental reconsideration of what the Turing Test evaluates. "The Turing test started as a way to ask whether machines could rival human intelligence," Bergen explained. "But now we know AI can answer many questions faster and more accurately than people can, so the real issue isn't raw brainpower. Increasingly, it's measuring humanlikeness" [1]. This shift matters because distinguishing between humans and AI becomes much harder when machines excel at mimicking social and behavioral traits rather than just processing information [4]. The texture of being human (hesitations, jokes, the sense of a person behind the words) is now something AI can convincingly replicate in conversation.

Urgent Implications for Online Trust and AI Disclosure

The practical consequences extend far beyond laboratory settings. These models pass the Turing Test in conversations of the length and type that happen constantly in online interactions: customer support exchanges, dating apps, social platforms, and political messaging [3]. "It's relatively easy to prompt these models to be indistinguishable from humans," Jones warned. "When you interact with strangers online people should be much less confident that they know they're talking to a human rather than an LLM" [4]. Jones, now an assistant professor of Psychology at Stony Brook University, highlighted darker risks: "The Turing test is a game about lying for the models. One of the implications is that models seem to be really good at that" [2]. Bergen added that manipulation risks are real: "There are lots of people who would like to use bots to persuade people to share their social security numbers, and vote for their party, or buy their product" [2]. The next critical battleground involves AI disclosure requirements and clearer labeling in digital spaces where trust and authenticity shape decisions.

Source: Earth.com



© 2026 TheOutpost.AI All rights reserved