2 Sources
[1]
OpenAI unveils GPT-5, and its hyped 'PhD level' intelligence struggles with basic spelling and geography
Some users on social media found the model making basic errors. The chatbot repeatedly claimed there were three Bs in "blueberry".

The cutting edge of AI, dubbed "PhD level" intelligence by its creators, thinks there are three Rs in the words "Northern Territory". That is what users discovered after trying out OpenAI's latest advance, GPT-5.

During the GPT-5 launch event on Thursday, OpenAI CEO Sam Altman described the latest version of ChatGPT as like having "access to a PhD-level expert in your pocket", comparing the previous version to a college student and the one before that to a high school student.

When users on social media attempted to challenge GPT-5, however, they found the model making basic errors in its responses. One Bluesky user found the chatbot repeatedly claimed there were three Bs in "blueberry". "Yep - blueberry is one of those words where the middle almost trips you up, like it's saying 'b-b-better pay attention,'" the chatbot said in the posted chat. "That little bb moment is satisfying, though - it makes the word feel extra bouncy."

Another user found the chatbot unable to correctly identify the US states containing the letter R. When asked to produce a map, it misspelled states as "Krizona" and "Vermoni", double-listed California, and invented the states "New Jefst" and "Mitroinia".

When Guardian Australia asked the latest model of ChatGPT to count the Rs in Australia's states and territories, it could identify which ones contained the letter, but it believed Northern Territory had just three Rs rather than five, and when asked to produce a map it spelled the territory "Northan Territor".

OpenAI was contacted for comment. On launching the new product, the company said the latest model would produce fewer errors and AI hallucinations. The source of the issues may lie in the way GPT-5 combines its AI models.
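For context, the letter counts at issue are trivial to verify deterministically; a quick illustrative Python check (this is the author's illustration, not anything from OpenAI or the Guardian):

```python
def letter_count(word: str, letter: str) -> int:
    """Case-insensitive count of a letter in a word."""
    return word.lower().count(letter.lower())

print(letter_count("blueberry", "b"))           # 2, not the three GPT-5 claimed
print(letter_count("Northern Territory", "r"))  # 5, not three
```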
GPT-5 has a "real-time router" that quickly decides which of its models to use depending on the conversation type and intent. OpenAI has said that if you ask ChatGPT to "think hard about this", it uses the latest reasoning model. "The router is continuously trained on real signals, including when users switch models, preference rates for responses, and measured correctness, improving over time," OpenAI's announcement stated.

The chief executive of media and AI startup Every, Dan Shipper, found that the model would sometimes hallucinate on questions that should have been routed to the reasoning model. "If I take a picture of a passage in a novel and ask it to explain what's happening, GPT-5 will sometimes confidently make things up," he wrote. "If I ask it to 'think longer,' it will deliver an accurate answer."

After Thursday's launch, the latest ChatGPT model was immediately made available to the company's 700 million weekly users. Altman said the AI had not yet reached artificial general intelligence, but called it "generally intelligent" and a "significant step on the path to AGI".
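OpenAI has not published the router's internals. As a hedged sketch of the idea, a content-based dispatcher might look like the following, where the model names and trigger phrases are illustrative assumptions, not OpenAI's actual implementation:

```python
# Hypothetical sketch of a model router: sends a prompt to a deliberate
# "reasoning" tier when it signals it needs careful thought, else to a
# fast default tier. All names and trigger phrases are made up.

REASONING_TRIGGERS = ("think hard", "think longer", "step by step")

def route(prompt: str) -> str:
    """Return which (hypothetical) model tier should handle the prompt."""
    text = prompt.lower()
    if any(trigger in text for trigger in REASONING_TRIGGERS):
        return "reasoning-model"  # slower, more deliberate
    return "fast-model"           # quick default

print(route("Think hard about this proof"))      # reasoning-model
print(route("What's the capital of France?"))    # fast-model
```

Note the gap this sketch leaves: OpenAI says the real router is continuously trained on user signals such as model switches and measured correctness, whereas a static trigger list like this one cannot improve over time.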
[2]
GPT-5 Launch Demo Plagued With Catastrophically Dumb Errors
OpenAI's GPT-5 is finally here and already powering ChatGPT, but it hasn't made a great first impression. In a livestream dedicated to the release, OpenAI tried to show off its newest large language model, which CEO Sam Altman called a "significant step along the path to AGI" -- but instead turned heads with some catastrophically dumb errors.

Across several examples, bar graphs intended to show off GPT-5's awesome performance benchmarks, while professional-looking, turned out to be horribly inaccurate nonsense upon closer inspection. The gaffes were flagged on social media and highlighted by The Verge.

The most egregious example is a bar graph comparing coding benchmark scores for GPT-5 against older models. Somehow, the bar for GPT-5's score of 52.8 percent accuracy is nearly twice as tall as the bar for the o3 model's score of 69.1 percent. Even more bafflingly, the 69.1 percent bar is the exact same size as another bar representing 30.8 percent for GPT-4o. Make it make sense!

OpenAI hasn't confirmed whether it used GPT-5 to generate the graphs -- and at this point, it has every reason not to -- but it's an incredibly embarrassing mistake from a company that's being valued in the region of half a trillion smackeroos.

It's also a little poetic. Some research suggests that newer models could actually be getting dumber in key ways, hallucinating more frequently than earlier versions. One study even found that the longer these new reasoning models "think," the more their performance deteriorates. Other research implicates the AI slop that's increasingly poisoning the models' training data.

Circling back to GPT-5's bar graph, you have OpenAI trying to spin its lower score of 52.8 as actually being better than its predecessor's. Altman, playing it cool, tried to laugh off the blunder.
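The chart failure comes down to simple proportionality: in a correct bar chart, a larger value never gets a shorter bar. A minimal sanity check makes that concrete (the scores are the ones reported from the launch chart; the pixel heights are illustrative stand-ins for how the bars were drawn):

```python
def bars_consistent(bars: dict[str, tuple[float, float]]) -> bool:
    """bars maps label -> (value, drawn_height). A chart is consistent if
    drawn heights increase along with values, i.e. a larger score never
    gets a shorter bar."""
    ordered = sorted(bars.values())          # sort (value, height) by value
    heights = [height for _, height in ordered]
    return all(a <= b for a, b in zip(heights, heights[1:]))

# Scores from the launch chart; heights approximate how it was drawn.
launch_chart = {
    "GPT-4o": (30.8, 100),
    "GPT-5":  (52.8, 190),
    "o3":     (69.1, 100),  # drawn same size as 30.8, shorter than 52.8
}
print(bars_consistent(launch_chart))  # False: the chart was inconsistent
```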
Human error may or may not be to blame for the charts, but following GPT-5's release, users were quick to expose how error-prone its image- and diagram-generating capabilities remain. One asked ChatGPT to draw a map of two cities in Virginia with their neighborhoods labeled, prompting it to return names that were complete gobbledygook.

And in what should've been a layup for GPT-5, Ed Zitron of the "Where's Your Ed At?" newsletter found that the AI couldn't even nail a simple map of the US. Ever think of visiting "West Wigina," "Delsware," "Fiorata," or "Rhoder land"? Or maybe "Tonnessee" and "Mississipo"?

The irony is that OpenAI bragged back in March that an update to its previous GPT-4o model meant ChatGPT could now excel at generating text in images. "As you can tell now it's very good at text," one example generated image read. "Look at all this accurate text!" Sounds like they might've spoken too soon. Or maybe AI models really are going backwards.
OpenAI's highly anticipated GPT-5 launch has been overshadowed by a series of basic errors in spelling, geography, and data representation, challenging claims of its "PhD level" intelligence and sparking debates about the true progress of AI technology.
OpenAI, the artificial intelligence research laboratory, has launched its latest language model, GPT-5, amidst great fanfare and expectations. However, the release has been marred by a series of embarrassing errors that have cast doubt on the company's claims of "PhD level" intelligence [1].
During the launch event, OpenAI CEO Sam Altman described GPT-5 as providing "access to a PhD-level expert in your pocket." However, users quickly discovered that the model struggled with basic spelling and geography tasks: it repeatedly claimed there were three Bs in "blueberry", misspelled US states as "Krizona" and "Vermoni", invented states such as "New Jefst" and "Mitroinia", and miscounted the Rs in "Northern Territory" [1].
The launch presentation itself was not immune to errors. Bar graphs intended to showcase GPT-5's performance benchmarks contained glaring inaccuracies: the bar for GPT-5's 52.8 percent score was drawn nearly twice as tall as the bar for o3's 69.1 percent, while the 69.1 percent bar was the same size as GPT-4o's 30.8 percent bar [2].
These visualization errors raised questions about whether GPT-5 was used to generate the graphs, although OpenAI has not confirmed this.
The errors have sparked a debate about the true progress of AI technology. Some researchers suggest that newer models might actually be deteriorating in key areas: hallucinating more frequently than earlier versions, performing worse the longer reasoning models "think", and suffering from AI-generated content increasingly contaminating their training data [2].
OpenAI has been contacted for comment on these issues. The company had previously stated that GPT-5 would have fewer errors and AI hallucinations compared to earlier models. CEO Sam Altman attempted to downplay the blunders, but the AI community and users have been quick to expose the model's shortcomings [1][2].
Despite these setbacks, OpenAI maintains that GPT-5 represents a significant step towards Artificial General Intelligence (AGI). The company has made the model available to its 700 million weekly users, emphasizing its "real-time router" that decides which AI model to use based on the conversation type and intent [1].
As the AI community grapples with these unexpected challenges, the GPT-5 launch serves as a reminder of the complexities involved in advancing language models and the importance of rigorous testing and validation in AI development.