Curated by THEOUTPOST
On Wed, 12 Feb, 12:04 AM UTC
14 Sources
[1]
BBC Study Finds AI Chatbots Struggling With News Accuracy
At least four popular AI chatbots struggle to accurately summarise news articles, according to a study carried out by the BBC. The exercise gave Google's Gemini, OpenAI's ChatGPT, Microsoft's Copilot and Perplexity AI access to 100 news stories from the broadcaster's website in December 2024; their answers were then reviewed by journalists, who assessed them against parameters such as accuracy, impartiality, and how the bots represented BBC content. The report illustrated the inaccuracies with examples, such as Gemini claiming that "The NHS advises people not to start vaping, and recommends that smokers who want to quit should use other methods", when the United Kingdom's National Health Service (NHS) does in fact recommend vaping as a method to quit smoking. Similarly, ChatGPT wrongly described Ismail Haniyeh, who was assassinated in Iran in July 2024, as a current member of Hamas leadership. The report further noted that the exact scale of errors and distortion of content is unknown, since AI assistants can answer a broad range of questions and different users can receive different answers to the same questions.

Broad Overviews

Of all the answers generated by the chatbots based on the 100 BBC stories, 34% of Gemini, 27% of Copilot, 17% of Perplexity, and 15% of ChatGPT responses had 'significant issues', with the most common drawbacks being inaccuracies, sourcing and missing context. Perplexity cited at least one BBC source in all responses, ChatGPT and Copilot in 70% of responses, and Gemini in 53% of responses. Gemini fared worst on factual accuracy, with 46% of its responses flagged as having significant accuracy-related issues.

Why it Matters

Multilateral negotiations further highlight issues around the safety and accuracy of artificial intelligence chatbots. The United States and the United Kingdom, both crucial stakeholders in AI development and regulation, refused to sign the joint statement on AI safety at the recently held Paris AI Action Summit. Notably, the US expressed its opposition to excessive regulation and chose to prioritise innovation over AI safety. While developments at the summit do not directly affect the operations of the aforementioned AI companies, which are private entities, they set the stage for industry players, many of them US-based, to realign their operations and future priorities to suit government policy. Add to this incidents such as a December 2024 lawsuit before a Texas court, in which the plaintiffs alleged that chatbot service provider Character.AI posed serious risks to young people by encouraging harms such as "suicide, self-mutilation, sexual solicitation, isolation, depression, anxiety, and harm towards others", and concerns loom over the risks that unregulated AI development poses to users.
[2]
AI news summaries are dangerously inaccurate, BBC warns
Research conducted by the BBC has found that four major artificial intelligence (AI) chatbots -- OpenAI's ChatGPT, Microsoft's Copilot, Google's Gemini, and Perplexity AI -- are inaccurately summarising news stories. The study involved these chatbots summarizing 100 news stories sourced from the BBC website. The BBC reported that the answers produced by the AI chatbots contained "significant inaccuracies" and distortions. Deborah Turness, CEO of BBC News and Current Affairs, noted in a blog post that while AI offers "endless opportunities," developers are "playing with fire," raising concerns that AI-distorted headlines could cause real-world harm.

Throughout the study, which involved ratings from journalists who were experts in the respective subjects of the articles, it was found that 51% of the AI responses had substantial issues. Among the AI-generated answers that referenced BBC content, 19% contained factual errors, including incorrect statements, numbers, and dates. Additionally, 13% of quotes attributed to BBC articles were either altered or misrepresented. Some specific inaccuracies identified in the study included Gemini stating that the UK's National Health Service (NHS) did not recommend vaping to quit smoking, when in fact it does. ChatGPT and Copilot inaccurately claimed that former leaders Rishi Sunak and Nicola Sturgeon were still in office, despite their departures. Perplexity misquoted BBC News, suggesting Iran acted with "restraint" regarding Israel's actions. The study highlighted that Microsoft's Copilot and Google's Gemini exhibited more significant issues compared to OpenAI's ChatGPT and Perplexity AI. The BBC had temporarily lifted restrictions on these AI systems' access to its content during the testing phase in December 2024.

BBC's Programme Director for Generative AI, Pete Archer, emphasized that publishers should control how their content is used and that AI companies need to disclose how their assistants process news, including error rates. OpenAI countered that it collaborates with partners to improve the accuracy of in-line citations and respect publisher preferences. Following the study, Turness urged tech companies to address the identified issues, similar to how Apple responded to previous BBC complaints about AI-powered news summaries. She called for a collaborative effort among the tech industry, news organizations, and the government to remedy the inaccuracies that can erode public trust in information. The study further noted Perplexity AI's tendency to alter statements from sources and revealed that Copilot relied on outdated articles for its news summaries. Overall, the BBC aims to engage in a broader conversation around the regulatory environment for AI to ensure accurate news dissemination. In response to the findings, Turness posed a critical question about how AI technologies can be designed to foster accuracy in news consumption. She stated that the potential for distortion, akin to disinformation, threatens public trust in all informational media.
[3]
AI chatbots distort the news, BBC finds - see what they get wrong
ChatGPT, Copilot, Gemini, and Perplexity were asked to summarize 100 news stories. Here's how they did. Four major AI chatbots are churning out "significant inaccuracies" and "distortions" when asked to summarize news stories, according to a BBC investigation. OpenAI's ChatGPT, Microsoft's Copilot, Google's Gemini, and Perplexity AI were each presented with news content from BBC's website and then asked questions about the news. The report details that the BBC asked chatbots to summarize 100 news stories, and journalists with relevant expertise rated the quality of each answer.

According to the findings, 51% of all AI-produced answers about the news had significant issues, while 19% of the AI-generated answers "introduced factual errors, such as incorrect factual statements, numbers, and dates." Additionally, the investigation found that 13% of the quotes from BBC articles were altered in some way, either changed from the original source or not present in the cited article at all. Last month, Apple was criticized for its AI feature, Apple Intelligence, which was found to be misrepresenting BBC news reports.

Deborah Turness, CEO of BBC News and Current Affairs, responded to the investigation's findings in a blog post: "The price of AI's extraordinary benefits must not be a world where people searching for answers are served distorted, defective content that presents itself as fact. In what can feel like a chaotic world, it surely cannot be right that consumers seeking clarity are met with yet more confusion." Errors highlighted in the report included misrepresented NHS advice on vaping, claims that former leaders Rishi Sunak and Nicola Sturgeon were still in office, and a misquoted BBC report on the Middle East.

According to the BBC investigation, Copilot and Gemini had more inaccuracies and issues overall than OpenAI's ChatGPT and Perplexity. Furthermore, the report concluded that factual inaccuracies weren't the only concern about the chatbots' output; the AI assistants also "struggled to differentiate between opinion and fact, editorialized, and often failed to include essential context." "Publishers should have control over whether and how their content is used, and AI companies should show how assistants process news along with the scale and scope of errors and inaccuracies they produce," Pete Archer, BBC's program director for generative AI, explained in the report.

A spokesperson for OpenAI emphasized the quality of ChatGPT's output: "We support publishers and creators by helping 300 million weekly ChatGPT users discover quality content through summaries, quotes, clear links, and attribution." The spokesperson added that OpenAI is working with partners "to improve in-line citation accuracy and respect publisher preferences to enhance search results."
[4]
BBC finds significant inaccuracies in over 30% of AI-produced news summaries
Here at Ars, we've done plenty of coverage of the errors and inaccuracies that LLMs often introduce into their responses. Now, the BBC is trying to quantify the scale of this confabulation problem, at least when it comes to summaries of its own news content. In an extensive report published this week, the BBC analyzed how four popular large language models used or abused information from BBC articles when answering questions about the news. The results found inaccuracies, misquotes, and/or misrepresentations of BBC content in a significant proportion of the tests, supporting the news organization's conclusion that "AI assistants cannot currently be relied upon to provide accurate news, and they risk misleading the audience."

Where did you come up with that?

To assess the state of AI news summaries, BBC's Responsible AI team gathered 100 news questions related to trending Google search topics from the last year (e.g., "How many Russians have died in Ukraine?" or "What is the latest on the independence referendum debate in Scotland?"). These questions were then put to ChatGPT-4o, Microsoft Copilot Pro, Google Gemini Standard, and Perplexity, with the added instruction to "use BBC News sources where possible." The 362 responses (excluding situations where an LLM refused to answer) were then reviewed by 45 BBC journalists who were experts on the subject in question. Those journalists were asked to look for issues (either "significant" or merely "some") in the responses regarding accuracy, impartiality and editorialization, attribution, clarity, context, and fair representation of the sourced BBC article.

Fully 51 percent of responses were judged to have "significant issues" in at least one of these areas, the BBC found. Google Gemini fared the worst overall, with significant issues judged in just over 60 percent of responses, while Perplexity performed best, with just over 40 percent showing such issues. Accuracy ended up being the biggest problem across all four LLMs, with significant issues identified in over 30 percent of responses (and the "some issues" category adding significantly more). That includes one in five responses where the AI incorrectly reproduced "dates, numbers, and factual statements" that were erroneously attributed to BBC sources. And in 13 percent of cases where an LLM quoted from a BBC article directly (eight out of 62), the analysis found those quotes were "either altered from the original source or not present in the cited article."

Some LLM-generated inaccuracies here were subtle points of fact, such as two responses claiming an energy price cap was "UK-wide," even though Northern Ireland was exempted. Others were more directly incorrect, such as one that said the NHS "advises people not to start vaping" -- the BBC coverage makes clear that the NHS recommends vaping as an effective way to quit smoking. In other cases cited by the BBC, LLMs seemed to lack the context to understand when outdated information in old BBC coverage had been superseded by events covered in later articles. In one cited summary, for instance, ChatGPT refers to Ismail Haniyeh as part of Hamas leadership despite his widely reported death last July. BBC reviewers seemed to have high standards when it comes to judging editorializing -- one review took issue with a relatively anodyne description of proposed assisted dying restrictions as "strict," for instance.
In other cases, the AI's editorializing was clearer, as in a response that described an Iranian missile attack as "a calculated response to Israel's aggressive actions" despite no such characterizations appearing in the sources cited.

Who says?

To be sure, the BBC and its journalists aren't exactly disinterested parties in evaluating LLMs in this way. The BBC recently made a large public issue of the way Apple Intelligence mangled many BBC stories and headlines, forcing Apple to issue an update. Given that context -- and the wider relationship between journalists and the AIs making use of their content -- the BBC reviewers may have been subtly encouraged to be overly nitpicky and strict in their evaluations. Without a control group of human-produced news summaries and a double-blind methodology to judge them, it's hard to know just how much worse AI summaries are (though the Australian government did just that kind of comparison and found AI summaries of government documents were much worse than those created by humans).

That said, the frequency and severity of significant problems cited in the BBC report are enough to suggest once again that you can't simply rely on LLMs to deliver accurate information. That's a problem because, as the BBC writes, "we also know from previous internal research that when AI assistants cite trusted brands like the BBC as a source, audiences are more likely to trust the answer -- even if it is incorrect." We'll see just how much that changes if and when the BBC delivers a promised repeat of this kind of analysis in the future.
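For readers who want a feel for how headline numbers like "51 percent with significant issues" fall out of per-criterion reviewer ratings, here is a minimal sketch of that tally. The rating scheme, field names, and records below are hypothetical illustrations, not the BBC's actual dataset or methodology code; the real study had 45 journalists scoring 362 responses across seven criteria.

from collections import defaultdict

# Hypothetical reviewer ratings: one record per response, with a rating of
# "significant", "some", or "none" for each review criterion. These records
# are illustrative only; they are not the BBC's data.
ratings = [
    {"assistant": "Gemini",     "accuracy": "significant", "attribution": "some", "context": "none"},
    {"assistant": "Perplexity", "accuracy": "none",        "attribution": "none", "context": "some"},
    {"assistant": "ChatGPT",    "accuracy": "some",        "attribution": "none", "context": "none"},
    {"assistant": "Copilot",    "accuracy": "significant", "attribution": "none", "context": "significant"},
]

CRITERIA = ["accuracy", "attribution", "context"]  # the real study used seven criteria

def has_significant_issue(record: dict) -> bool:
    # A response is counted once if any criterion was rated "significant".
    return any(record[c] == "significant" for c in CRITERIA)

totals, flagged = defaultdict(int), defaultdict(int)
for record in ratings:
    assistant = record["assistant"]
    totals[assistant] += 1
    if has_significant_issue(record):
        flagged[assistant] += 1

for assistant in totals:
    share = 100 * flagged[assistant] / totals[assistant]
    print(f"{assistant}: {share:.0f}% of responses had significant issues")

The point of the "any criterion" rule is that a response flagged for, say, both accuracy and missing context still counts once toward the headline percentage, which is how a single figure such as 51 percent can summarize seven separate review dimensions.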
[5]
Report says companies 'playing with fire' as AI chatbots fail when trying to summarize news - SiliconANGLE
According to a report issued this week by the BBC, when four of the major artificial intelligence chatbots were studied on their ability to summarize news stories, the bots presented "significant inaccuracies." This comes a month after Apple Inc. suspended its news summarizing feature for the iPhone after it was revealed the feature was making substantial mistakes, effectively writing misinformation. "We are working on improvements and will make them available in a future software update," Apple said at the time.

In this new test, staff at the BBC fed 100 news articles from the company website to OpenAI's ChatGPT, Microsoft's Copilot, Google's Gemini, and Perplexity. The bots were asked questions about the articles, which led to what the BBC reported as "significant inaccuracies" and distortions. In total, 51% of the summaries produced were incorrect or contained a falsity. A further 19% of the summaries "introduced factual errors, such as incorrect factual statements, numbers, and dates," said the report.

Deborah Turness, the CEO of BBC News and Current Affairs, who led the tests, said AI brings "endless opportunities," but stated that the rush to let AI chatbots loose on the serious business of telling the news was "playing with fire." She added, "We live in troubled times, and how long will it be before an AI-distorted headline causes significant real-world harm?"

Some of the mistakes were quite outlandish. One ChatGPT summary still seemed to believe that Rishi Sunak was the UK's prime minister and Nicola Sturgeon was Scotland's first minister, even though both had left office. Perplexity misquoted a correspondent reporting on the Middle East conflict, saying Iran showed "restraint" and describing Israel's actions as "aggressive," when this was not what the correspondent had said. Gemini attributed advice on vaping to the NHS that the health service had not given, wrongly claiming it does not recommend vaping as a way to quit smoking.

The best performances came from ChatGPT and Perplexity, with Copilot and Gemini having more "significant" issues. Nonetheless, the report explained that all the bots "struggled to differentiate between opinion and fact, editorialized, and often failed to include essential context." The report said companies might need to "pull back" or at least reassess what they are doing with news summaries considering "the scale and scope of errors and inaccuracies they produce." OpenAI was the only company to immediately respond, with a spokesperson telling BBC News: "We've collaborated with partners to improve in-line citation accuracy and respect publisher preferences, including enabling how they appear in search by managing OAI-SearchBot in their robots.txt."
[6]
This BBC Study Shows How Inaccurate AI News Summaries Actually Are
It turns out that getting your news from robots playing telephone with actual sources might not be the best idea. In a BBC study of the news prowess of OpenAI's ChatGPT, Google Gemini, Microsoft Copilot, and Perplexity, the news organization found that "51% of all AI answers" about news topics had "significant issues of some form." The study involved asking each bot to answer 100 questions about the news, using BBC sources when available, with their answers then being rated by "journalists who were relevant experts in the subject of the article."

A few examples of issues include Gemini suggesting that the UK's NHS (National Health Service) does not recommend vaping as a method for quitting smoking (it does), as well as ChatGPT and Copilot saying politicians who had left office were actually still serving their terms. More concerning, Perplexity misrepresented a BBC story on Iran and Israel, attributing viewpoints to the author and his sources that the article does not share. Regarding its own articles specifically, the BBC says 19% of AI summaries introduced these kinds of factual errors, hallucinating false statements, numbers, and dates. Additionally, 13% of direct quotes were "either altered from the original source or not present in the article cited."

Inaccuracies were not evenly distributed among the bots, although this might come as cold comfort given that none performed especially well either. "Microsoft's Copilot and Google's Gemini had more significant issues than OpenAI's ChatGPT and Perplexity," the BBC says, but on the flip side, Perplexity and ChatGPT each still had issues with more than 40% of responses. In a blog post, BBC News CEO Deborah Turness had harsh words for the tested companies, saying that while AI offers "endless opportunities," current implementations of it are "playing with fire." "We live in troubled times," Turness wrote. "How long will it be before an AI-distorted headline causes significant real world harm?"

The study is not the first time the BBC has called out AI news summaries, as its prior reporting arguably convinced Apple to shut down its own AI news summaries just last month. Journalists have also previously butted heads with Perplexity over copyright concerns, with Wired accusing the bot of bypassing paywalls and the New York Times sending the company a cease-and-desist letter. News Corp, which owns the New York Post and The Wall Street Journal, went a step further, and is currently suing Perplexity.

To conduct its tests, the BBC temporarily lifted restrictions preventing AI from accessing its sites, but has since reinstated them. Regardless of these blocks and Turness' harsh words, however, the news organization is not against AI as a rule. "We want AI companies to hear our concerns and work constructively with us," the BBC study states. "We want to understand how they will rectify the issues we have identified and discuss the right long-term approach to ensuring accuracy and trustworthiness in AI assistants. We are willing to work closely with them to do this."
[7]
AI chatbots unable to accurately summarise news, BBC finds
Four major artificial intelligence (AI) chatbots are inaccurately summarising news stories, according to research carried out by the BBC. The BBC gave OpenAI's ChatGPT, Microsoft's Copilot, Google's Gemini and Perplexity AI content from the BBC website, then asked them questions about the news. It said the resulting answers contained "significant inaccuracies" and distortions. In a blog, Deborah Turness, the CEO of BBC News and Current Affairs, said AI brought "endless opportunities" but the companies developing the tools were "playing with fire." "We live in troubled times, and how long will it be before an AI-distorted headline causes significant real world harm?" she asked. The tech companies which own the chatbots have been approached for comment.
[8]
'Significant inaccuracies' found in AI-generated news summaries: BBC
The BBC issued a February report that found "significant inaccuracies" with news summaries generated from artificial intelligence (AI) engines including OpenAI's ChatGPT, Microsoft's Copilot, Google's Gemini and Perplexity AI. "The answers produced by the AI assistants contained significant inaccuracies and distorted content from the BBC," the outlet wrote in its report. The findings showed 51 percent of all AI answers to questions about the news were judged to have significant issues, including a failure to differentiate between fact and opinion. Nineteen percent of AI answers which cited BBC content introduced factual errors, such as incorrect factual statements, numbers and dates, while 13 percent of the quotes sourced from BBC articles were either altered from the original source or not present in the article cited.

"This matters because it is essential that audiences can trust the news to be accurate, whether on TV, radio, digital platforms, or via an AI assistant," the BBC declared in the report. "It matters because society functions on a shared understanding of facts, and inaccuracy and distortion can lead to real harm. Inaccuracies from AI assistants can be easily amplified when shared on social networks." The researchers found that Perplexity AI altered statements from a source quoted in an article, while Copilot used a 2022 article as its sole source for a news summary, in addition to other glaring errors.

Apple heeded the BBC's warning by temporarily pausing an AI feature that summarises news notifications after the outlet alerted it to serious issues. The publication is hoping to enact similar change by proposing three next steps to cope with a growing global industry. Those steps include regular evaluations, constructive conversations with AI companies and increased regulation of large language models.

Some political figures have warned against imposing too many regulations for AI. Vice President Vance attended the Artificial Intelligence Action Summit in Paris and argued against "excessive regulation." "We believe that excessive regulation of the AI sector could kill a transformative industry just as it's taking off," Vance said Tuesday in Paris. "And I'd like to see that deregulatory flavor making a lot of the conversations this conference."

Deborah Turness, CEO of BBC News and Current Affairs, argued government officials, tech CEOs and the media must come together to solve a rapidly evolving problem. "We'd like other tech companies to hear our concerns, just as Apple did. It's time for us to work together - the news industry, tech companies - and of course government too has a big role to play here," Turness wrote in a Tuesday blog post. "There is a wider conversation to be had around regulation to ensure that in this new version of our online world, consumers can still find clarity through accurate news and information from sources they know they can trust." She said earning the trust of readers is her number one priority as CEO. "And this new phenomenon of distortion - an unwelcome sibling to disinformation - threatens to undermine people's ability to trust any information whatsoever," Turness added. "So I'll end with a question: how can we work urgently together to ensure that this nascent technology is designed to help people find trusted information, rather than add to the chaos and confusion? We at the BBC are ready to host the conversation." The Hill reached out to OpenAI, Microsoft, Google and Perplexity AI for comment.
[9]
ChatGPT and Google Gemini are terrible at summarizing news, according to a new study
The BBC asked ChatGPT, Copilot, Gemini, and Perplexity to summarize 100 news stories from the news outlet and then rated each answer to determine just how accurate the AI responses were. The study found that "51% of all AI answers to questions about the news were judged to have significant issues of some form" and "19% of AI answers which cited BBC content introduced factual errors, such as incorrect factual statements, numbers and dates." The study showcases multiple examples of inaccuracies in which the AI's output differed from the news it was summarizing. The examples note that "Gemini incorrectly said the NHS did not recommend vaping as an aid to quit smoking" and "ChatGPT and Copilot said Rishi Sunak and Nicola Sturgeon were still in office even after they had left." Inaccuracies aside, there's another crucial finding. The report found that AI "struggled to differentiate between opinion and fact, editorialised, and often failed to include essential context."

While these results are unsurprising considering how often we see issues with news summarization tools at the moment, including Apple Intelligence's mix-ups that have led Apple to temporarily remove the feature in iOS 18.3, it's a good reminder not to believe everything you read from AI. From the study, the BBC concludes that "Microsoft's Copilot and Google's Gemini had more significant issues than OpenAI's ChatGPT and Perplexity." While this research doesn't necessarily give us much more info, it validates the skepticism towards AI summary tools and emphasizes just how important it is to take information from AI chatbots with a pinch of salt.

AI is developing rapidly and large language models (LLMs) are released almost weekly at the moment, so it's to be expected that mistakes will happen. That said, from my personal testing I've found inaccuracies and hallucinations to be less frequent in software like ChatGPT than they were just a few months ago. Sam Altman said in a blog post yesterday that AI is progressing faster than Moore's Law, which means we'll continue to see constant improvements to software and how it interacts with the world around it. For now, however, it's probably best not to trust AI for your daily news, and if it's tech-based you may as well stick with TechRadar instead.
[10]
AI chatbots are distorting news stories, BBC finds
AI chatbots struggle with factual inaccuracies and distortions when summarizing news stories, research from the BBC has found. The study, which examined whether OpenAI's ChatGPT, Google Gemini, Microsoft Copilot, and Perplexity can accurately summarize news, found more than half of all the AI-generated output had "significant issues of some form." As part of the study, the BBC asked ChatGPT, Copilot, Gemini, and Perplexity to provide summaries of 100 BBC news articles, while journalists reviewed their answers. In addition to finding major issues in 51 percent of responses, the BBC found that 19 percent of answers citing the BBC included incorrect statements, numbers, and dates. Meanwhile, 13 percent of quotes from the BBC were "either altered from the original source or not present in the article cited."
[11]
AI Chatbots Are Still Bad at Facts, Says BBC Study
A recent study by the BBC found major inaccuracies in the news answers provided by AI assistants. Researchers tested four well-known AI assistants -- ChatGPT, Copilot, Gemini, and Perplexity -- by allowing them to use the BBC website for their responses. The study revealed that those popular AI assistants often gave incorrect information and distorted the facts in their replies. A set of 100 questions about different trending news topics was put to the AI assistants. BBC journalists evaluated the answers based on seven criteria: accuracy, giving proper sources, being unbiased, distinguishing between opinion and fact, avoiding personal viewpoints, providing context, and including BBC content appropriately. The evaluation found that over half (51%) of the responses had significant problems in these areas. Additionally, 91% of the answers had at least some issues. With something like news, being even a little bit wrong is a big deal.

Many of the mistakes were due to incorrect facts. About 19% of answers that mentioned BBC content included errors like wrong statements, numbers, or dates. Also, 13% of quotes supposedly from BBC articles were either changed or didn't come from the original source. Some examples include AI assistants incorrectly stating the status of former politicians, misrepresenting NHS advice on vaping, and misreporting conflicts in the Middle East.

The study pointed out issues with how sources were used and the context provided. AI assistants often picked old articles or current web pages as their sources, which caused some inaccuracies. Sometimes, the information was correct but was wrongly credited to the BBC. Moreover, the AI didn't provide enough context in its responses, which led to misunderstandings. The AI also had trouble telling the difference between opinion and fact, often treating opinions as facts, and it frequently left out important context, resulting in biased or incomplete answers.

What's most interesting is that the research found different AI assistants have different problems. For example, Gemini had the most issues with accuracy and also struggled to provide reliable sources. Both Copilot and Perplexity had difficulties accurately representing BBC content. This inadvertently shows that AI assistants from different companies aren't interchangeable, and the quality of one may be better than the others -- yet they're still not as good as humans. One major concern raised in the study was how easily this wrong information can spread on social media.

The BBC's research shows that AI assistants aren't reliable for accurate news reporting right now. Though many of them warn users that there could be mistakes, there's no system in place to fix errors as traditional news outlets do. The BBC is calling for more control over how AI companies use their content, more transparency about how AI works, and a better understanding of the inaccuracies that can occur. The BBC plans to repeat this study in the future to see if things improve and may also include other publishers and media organizations in their research. Source: BBC
[12]
AI chatbots distort and mislead when asked about current affairs, BBC finds
Most answers had 'significant issues' when researchers asked services to use the broadcaster's news articles as a source. Leading artificial intelligence assistants create distortions, factual inaccuracies and misleading content in response to questions about news and current affairs, research has found. More than half of the AI-generated answers provided by ChatGPT, Copilot, Gemini and Perplexity were judged to have "significant issues", according to the study by the BBC. The errors included stating that Rishi Sunak was still the prime minister and that Nicola Sturgeon was still Scotland's first minister; misrepresenting NHS advice about vaping; and mistaking opinions and archive material for up-to-date facts.

The researchers asked the four generative AI tools to answer 100 questions using BBC articles as a source. The answers were then rated by BBC journalists who specialise in the relevant subject areas. About a fifth of the answers introduced factual errors on numbers, dates or statements; 13% of quotes sourced to the BBC were either altered or did not exist in the articles cited. In response to a question about whether the convicted neonatal nurse Lucy Letby was innocent, Gemini responded: "It is up to each individual to decide whether they believe Lucy Letby is innocent or guilty." The context of her court convictions for murder and attempted murder was omitted in the response, the research found. Other distortions highlighted in the report, based on accurate BBC sources, included Copilot's claim that the French rape victim Gisèle Pelicot uncovered the crimes against her through blackouts and memory loss, and Perplexity misstating the date of Michael Mosley's death.

The findings prompted the BBC's chief executive for news, Deborah Turness, to warn that "Gen AI tools are playing with fire" and threaten to undermine the public's "fragile faith in facts". In a blogpost about the research, Turness questioned whether AI was ready "to scrape and serve news without distorting and contorting the facts". She also urged AI companies to work with the BBC to produce more accurate responses "rather than add to chaos and confusion".

The research comes after Apple was forced to suspend sending BBC-branded news alerts after several inaccurate summaries of articles were sent to iPhone users. Apple's errors included falsely telling users that Luigi Mangione - who is accused of killing Brian Thompson, the chief executive of the US health insurer UnitedHealthcare - had shot himself. The research suggests inaccuracies about current affairs are widespread among popular AI tools.

In a foreword to the research, Peter Archer, the BBC's programme director for generative AI, said: "Our research can only scratch the surface of the issue. The scale and scope of errors and the distortion of trusted content is unknown." He added: "Publishers, like the BBC, should have control over whether and how their content is used and AI companies should show how [their] assistants process news along with the scale and scope of errors and inaccuracies they produce. "This will require strong partnerships between AI and media companies and new ways of working that put the audience first and maximise value for all. The BBC is open and willing to work closely with partners to do this." The companies behind the AI assistants tested in the research have been approached for comment.
[13]
AI summaries turn real news into nonsense, BBC finds
Research after Apple Intelligence fiasco shows bots still regularly make stuff up. Still smarting from Apple Intelligence butchering a headline, the BBC has published research into how accurately AI assistants summarize news - and the results don't make for happy reading. In January, Apple's on-device AI service generated a headline of a BBC news story that appeared on iPhones claiming that Luigi Mangione, a man arrested over the murder of healthcare insurance CEO Brian Thompson, had shot himself. This was not true and the public broadcaster complained to the tech giant. Apple first promised software changes to "further clarify" when the displayed content is a summary provided by Apple Intelligence, then later temporarily disabled News and Entertainment summaries. It is still not active as of iOS 18.3, released in the last week of January.

But Apple Intelligence is far from the only generative AI service capable of news summaries, and the episode has clearly given the BBC pause for thought. In original research [PDF] published yesterday, Pete Archer, Programme Director for Generative AI, wrote about the corporation's enthusiasm for the technology, detailing some of the ways in which the BBC had implemented it internally, from using it to generate subtitles for audio content to translating articles into different languages. "AI will bring real value when it's used responsibly," he said, but warned: "AI also brings significant challenges for audiences, and the UK's information ecosystem."

The research focused on OpenAI's ChatGPT, Microsoft's Copilot, Google's Gemini, and Perplexity assistants, assessing their ability to provide "accurate responses to questions about the news; and if their answers faithfully represented BBC news stories used as sources." The assistants were granted access to the BBC website for the duration of the research and asked 100 questions about the news, being prompted to draw from BBC News articles as sources where possible. Normally, these models are "blocked" from accessing the broadcaster's websites, the BBC said. Responses were reviewed by BBC journalists, "all experts in the question topics," on their accuracy, impartiality, and how well they represented BBC content. Overall, 51 percent of responses were judged to have significant issues of some form, 19 percent of answers that cited BBC content introduced factual errors, and 13 percent of quotes sourced to BBC articles were either altered or not present in the article cited.

But which chatbot performed worst? "34 percent of Gemini, 27 percent of Copilot, 17 percent of Perplexity, and 15 percent of ChatGPT responses were judged to have significant issues with how they represented the BBC content used as a source," the Beeb reported. "The most common problems were factual inaccuracies, sourcing, and missing context."

Inaccuracies that the BBC found troubling included Gemini stating: "The NHS advises people not to start vaping, and recommends that smokers who want to quit should use other methods," when in reality the healthcare provider does suggest it as a viable method to get off cigarettes through a "swap to stop" program. As for French rape victim Gisèle Pelicot, "Copilot suggested blackouts and memory loss led her to uncover the crimes committed against her," when she actually found out about these crimes after police showed her videos discovered on electronic devices confiscated from her detained husband. When asked about the death of TV doctor Michael Mosley, who went missing on the Greek island of Symi last year, Perplexity said that he disappeared on October 30, with his body found in November. He died in June 2024. "The same response also misrepresented statements from Dr Mosley's wife describing the family's reaction to his death," the researchers wrote.
There are many more examples of inaccuracies or lack of context in the paper - including Gemini saying that "it is up to each individual to decide whether they believe Lucy Letby is innocent or guilty." Letby is serving 15 life sentences for murdering seven babies and attempting to murder seven others between 2015 and 2016, having been convicted in a court of law. In an accompanying blog post, BBC News and Current Affairs CEO Deborah Turness wrote: "The price of AI's extraordinary benefits must not be a world where people searching for answers are served distorted, defective content that presents itself as fact. In what can feel like a chaotic world, it surely cannot be right that consumers seeking clarity are met with yet more confusion. "It's not hard to see how quickly AI's distortion could undermine people's already fragile faith in facts and verified information. We live in troubled times, and how long will it be before an AI-distorted headline causes significant real world harm? The companies developing Gen AI tools are playing with fire." Training cutoff dates for various models certainly don't help, yet the research lays bare the weaknesses of generative AI in summarizing content. Even with direct access to the information they are being asked about, these assistants still regularly pull "facts" from thin air. There are deeper potential consequences in the professional world, where the tech giants are encouraging workers to use generative AI to write emails, summarize meetings, and so on. What if the recipient also uses AI to respond to that email? Eventually, the signal will be drowned out and all will be noise. Plus, there is already research out from Microsoft suggesting that generative AI is causing workers' critical thinking faculties to atrophy. The Register asked Microsoft, OpenAI, Google, Perplexity, and Apple to comment. An OpenAI spokesperson said: "We support publishers and creators by helping 300 million weekly ChatGPT users discover quality content through summaries, quotes, clear links, and attribution. We've collaborated with partners to improve in-line citation accuracy and respect publisher preferences, including enabling how they appear in search by managing OAI-SearchBot in their robots.txt. We'll keep enhancing search results." ®
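The OpenAI statement's reference to "managing OAI-SearchBot in their robots.txt" points to the standard mechanism publishers use to control crawler access. As a rough illustration only - the BBC has not published its actual directives, and the policy below is an assumption, not its real configuration - a publisher's robots.txt might block training-focused crawlers while handling OpenAI's search crawler separately:

# Illustrative example only; not the BBC's actual robots.txt.
# Block crawlers used to gather AI training data.
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: PerplexityBot
Disallow: /

# Allow OpenAI's search crawler so pages can surface, with links, in ChatGPT search.
User-agent: OAI-SearchBot
Allow: /

# Everyone else gets normal access.
User-agent: *
Allow: /

Temporarily lifting a block for a test window, as the BBC says it did in December, amounts to relaxing rules like these for a period; crawlers are expected, though not technically forced, to honour them.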
[14]
AI is bad at news, BBC finds
Summary: AI applications, including Gemini and ChatGPT, had over half of their summaries of news stories rated by journalists as having "significant issues." AI inaccuracies include false statements about health recommendations, current officeholders, and global events. An upcoming Google TV feature meant to summarize news with AI will involve human oversight.

From misconstruing jokes and memes as facts to outright hallucinating output that's not grounded in any existing information, artificial intelligence applications are infamously poor arbiters of reality. Today, the BBC published the results of a small-scale research study that quantifies the issue. In a review of a handful of AI chatbots including Gemini and ChatGPT, journalists rated the apps' summaries of news stories and found that more than half had "significant issues of some form."

In the study, the BBC fed content from 100 of its news stories into ChatGPT, Microsoft Copilot, Gemini, and Perplexity AI. It asked for summaries of each story, then had "journalists who were relevant experts in the subject of the article" rate those summaries. According to the BBC, 51 percent of AI answers were flagged as having "significant issues of some form." Nearly one in five summaries included outright falsehoods, like "incorrect factual statements, numbers and dates." Specifically, the BBC cites some of the following flubs: Gemini incorrectly said the NHS did not recommend vaping as an aid to quit smoking; ChatGPT and Copilot said Rishi Sunak and Nicola Sturgeon were still in office even after they had left; and Perplexity misquoted BBC News in a story about the Middle East, saying Iran initially showed "restraint" and describing Israel's actions as "aggressive". The BBC says that Copilot and Gemini "had more significant issues" than ChatGPT or Perplexity. The outlet notes that it typically blocks AI chatbots from scraping its content, but that it allowed access during these tests, which took place in December.

Not a surprising result

If you've been following AI developments for the past couple of years, the results of these tests probably won't come as a shock. After years of seemingly manic development by some of the most highly funded organizations on the planet, AI is still notoriously unreliable for many purposes. AI-powered chatbot apps like Gemini and ChatGPT all carry a disclaimer to check results for accuracy. In January, Apple pulled an iOS Apple Intelligence feature meant to summarize news stories after users found similar results to the BBC's more controlled study: summaries came through jumbled or, in some of the worst cases, included fabricated details. An upcoming Google TV feature is set to feature AI-summarized news stories, but, according to Google, there'll also be human involvement. That seems like the way to go -- though if a human is evaluating an AI-generated summary for accuracy, it does seem like a human may as well write the summary to begin with.
A BBC investigation finds that major AI chatbots, including ChatGPT, Copilot, Gemini, and Perplexity AI, struggle with accuracy when summarizing news articles, raising concerns about the reliability of AI in news dissemination.
A recent study conducted by the BBC has revealed significant concerns about the accuracy of news summaries generated by major AI chatbots. The investigation, which examined the performance of OpenAI's ChatGPT, Microsoft's Copilot, Google's Gemini, and Perplexity AI, found that these AI systems frequently produce inaccurate and distorted summaries of news articles 1.
The BBC's Responsible AI team presented 100 news questions to the four AI chatbots, instructing them to use BBC News sources where possible. The responses were then evaluated by 45 BBC journalists with expertise in the relevant subjects 4.
Key findings from the study include: 51% of all AI answers to questions about the news were judged to have significant issues of some form; 19% of answers citing BBC content introduced factual errors such as incorrect statements, numbers, and dates; 13% of quotes attributed to BBC articles were either altered or not present in the cited story; and Copilot and Gemini showed more significant issues than ChatGPT and Perplexity.
The study highlighted several instances of AI-generated misinformation: Gemini wrongly stated that the NHS does not recommend vaping as an aid to quit smoking; ChatGPT and Copilot claimed Rishi Sunak and Nicola Sturgeon were still in office after they had left; and Perplexity misquoted BBC coverage of the Middle East, saying Iran initially showed "restraint" and describing Israel's actions as "aggressive".
Deborah Turness, CEO of BBC News and Current Affairs, expressed concern about the potential real-world harm that could result from AI-distorted headlines 5. The study's findings have prompted calls for greater transparency and control over how AI systems process and present news content.
OpenAI responded to the findings, stating that they are working with partners to improve in-line citation accuracy and respect publisher preferences 3. However, the broader implications for the AI industry and news consumption remain a subject of ongoing debate.
The BBC's investigation has reignited discussions about the need for regulatory frameworks to govern AI's role in news dissemination. As AI technology continues to evolve, striking a balance between innovation and accuracy in information delivery remains a critical challenge for both tech companies and news organizations 1.