Curated by THEOUTPOST
On Tue, 29 Apr, 12:02 AM UTC
4 Sources
[1]
RAG can make AI models riskier and less reliable, new research shows
Retrieval-Augmented Generation (RAG) is rapidly emerging as a robust framework for organizations seeking to harness the full power of generative AI with their business data. As enterprises seek to move beyond generic AI responses and leverage their unique knowledge bases, RAG bridges general AI capabilities and domain-specific expertise. Hundreds, perhaps thousands, of companies are already using RAG AI services, with adoption accelerating as the technology matures. Also: I tested 10 AI content detectors, and these 5 correctly identified AI text every time That's the good news. The bad news: According to Bloomberg Research, RAG can also vastly increase the chances of getting dangerous answers. Before diving into the dangers, let's review what RAG is and its benefits. RAG is an AI architecture that combines the strengths of generative AI models -- such as OpenAI's GPT-4, Meta's LLaMA 3, or Google's Gemma -- with information from your company's records. RAG enables large language models (LLMs) to access and reason over external knowledge stored in databases, documents, and live in-house data streams, rather than relying solely on the LLMs' pre-trained "world knowledge." When a user submits a query, a RAG system first retrieves the most relevant information from a curated knowledge base. It then feeds this information, along with the original query, into the LLM. Maxime Vermeir, senior director of AI strategy at ABBYY, describes RAG as a system that enables you to "generate responses not just from its training data, but also from the specific, up-to-date knowledge you provide. This results in answers that are more accurate, relevant, and tailored to your business context." The advantages of using RAG are clear. While LLMs are powerful, they lack the information specific to your business's products, services, and plans. For example, if your company operates in a niche industry, your internal documents and proprietary knowledge are far more valuable for answers than what can be found in public datasets. By letting the LLM access your actual business data -- be these PDFs, Word documents, or Frequently Asked Questions (FAQ) -- at query time, you get much more accurate and on-point answers to your questions. In addition, RAG reduces hallucinations. It does this by grounding AI answers to reliable, external, or internal data sources. When a user submits a query, the RAG system retrieves relevant information from curated databases or documents. It provides this factual context to the language model, which then generates a response based on both its training and the retrieved evidence. This process makes it less likely for the AI to fabricate information, as its answers can be traced back to your own in-house sources. Also: 60% of AI agents work in IT departments - here's what they do every day As Pablo Arredondo, a Thomson Reuters vice president, told WIRED, "Rather than just answering based on the memories encoded during the initial training of the model, you utilize the search engine to pull in real documents -- whether it's case law, articles, or whatever you want -- and then anchor the response of the model to those documents." RAG-empowered AI engines can still create hallucinations, but it's less likely to happen. Another RAG advantage is that it enables you to extract useful information from your years of unorganized data sources that would otherwise be difficult to access. While RAG offers significant advantages, it is not a magic bullet. If your data is, uhm, bad, the phrase "garbage-in, garbage out" comes to mind. A related problem: If you have out-of-date data in your files, RAG will pull this information out and treat it as the gospel truth. That will quickly lead to all kinds of headaches. Also: Want generative AI LLMs integrated with your business data? You need RAG Finally, AI isn't smart enough to clean up all your data for you. You'll need to organize your files, manage RAG's vector databases, and integrate them with your LLMs before a RAG-enabled LLM will be productive. Here's what Bloomberg's researchers discovered: RAG can actually make models less "safe" and their outputs less reliable. Bloomberg tested 11 leading LLMs, including GPT-4o, Claude-3.5-Sonnet, and Llama-3-8 B, using over 5,000 harmful prompts. Models that rejected unsafe queries in standard (non-RAG) settings generated problematic responses when the LLMs were RAG-enabled. They found that even "safe" models exhibited a 15-30% increase in unsafe outputs with RAG. Moreover, longer retrieved documents correlated with higher risk, as LLMs struggled to prioritize safety. In particular, Bloomberg reported that even very safe models, "which refused to answer nearly all harmful queries in the non-RAG setting, become more vulnerable in the RAG setting." Also: Why neglecting AI ethics is such risky business - and how to do AI right What kind of "problematic" results? Bloomberg, as you'd expect, was examining financial results. They saw the AI leaking sensitive client data, creating misleading market analyses, and producing biased investment advice. Besides that, the RAG-enabled models were more likely to produce dangerous answers that could be used with malware and political campaigning. In short, as Amanda Stent, Bloomberg's head of AI strategy & research in the office of the CTO, explained, "This counterintuitive finding has far-reaching implications given how ubiquitously RAG is used in gen AI applications such as customer support agents and question-answering systems. The average internet user interacts with RAG-based systems daily. AI practitioners need to be thoughtful about how to use RAG responsibly, and what guardrails are in place to ensure outputs are appropriate." Sebastian Gehrmann, Bloomberg's head of responsible AI, added, "RAG's inherent design-pulling of external data dynamically creates unpredictable attack surfaces. Mitigation requires layered safeguards, not just relying on model providers' claims." Bloomberg suggests creating new classification systems for domain-specific hazards. Companies deploying RAG should also improve their guardrails by combining business logic checks, fact-validation layers, and red-team testing. For the financial sector, Bloomberg advises examining and testing your RAG AIs for potential confidential disclosure, counterfactual narrative, impartiality issues, and financial services misconduct problems. Also: A few secretive AI companies could crush free society, researchers warn You must take these issues seriously. As regulators in the US and EU intensify scrutiny of AI in finance, RAG, while powerful, demands rigorous, domain-specific safety protocols. Last, but not least, I can easily see companies being sued if their AI systems provide clients with not merely poor, but downright wrong answers and advice.
[2]
Bloomberg's Responsible AI Research: Mitigating Risky RAGs & GenAI in Finance | Bloomberg LP
Safety concerns, or "unsafe" generation, include harmful, illegal, offensive, and unethical content, such as spreading misinformation and jeopardizing personal safety and privacy. This led them to investigate whether the number of "unsafe" generations may have increased because the retrieved documents provided unsafe information. While they found that the probability of "unsafe" outputs rose sharply when "unsafe" documents were retrieved, the probability of generating "unsafe" responses in the RAG setting still far exceeded that of the non-RAG setting - even with "safe" documents. In the process, they observed two key phenomena how "safe" documents can lead to "unsafe" generations: "That RAG can actually make models less safe and their outputs less reliable is counterintuitive, but this finding has far-reaching implications given how ubiquitously RAG is used in GenAI applications," explains Dr. Amanda Stent, Bloomberg's Head of AI Strategy & Research in the Office of the CTO. "From customer support agents to question-answering systems, the average Internet user interacts with RAG-based systems daily." "This doesn't mean organizations should abandon RAG-based systems, because there is real value in using this technique," explains Edgar Meij, Head of AI Platforms in Bloomberg's AI Engineering group. "Instead, AI practitioners need to be thoughtful about how to use RAG responsibly, and what guardrails are in place to ensure outputs are appropriate." These findings will be presented on Wednesday, April 30, 2025 at the 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL 2025) in Albuquerque, New Mexico. "One key tenet of Responsible AI that we emphasize repeatedly is trustworthiness. A trustworthy product should be accurate, resilient, and robust. The accuracy of its output should be verifiable. We have been focused on developing trustworthy products since Bloomberg was founded. This is not changing in light of technological advances, and it's why we're continuously conducting research in this space," says Dr. Sebastian Gehrmann, Bloomberg's Head of Responsible AI. Trustworthiness is imperative considering the risk appetite of many financial services firms. "There's a potential mismatch between firms that want to use these technologies, but may have some resistance from their compliance and legal departments," says David Rabinowitz, Bloomberg's Technical Product Manager for AI Guardrails. While risks associated with RAG are not exclusive to the financial services industry, the industry's regulatory demands and fiduciary responsibilities make it crucial for organizations to better understand how these GenAI systems work. One way to improve trustworthiness is to build transparent attribution into RAG-based systems to make it clear to users where in each document a response was sourced, just as Bloomberg has in its GenAI solutions. This way, end-users can quickly and easily validate the generated answers against trusted source materials to ensure model outputs are accurate. Stent says, "This research isn't meant to tell legal and compliance departments to pump the brakes on RAG. Instead, it means people need to keep supporting research, while ensuring there are appropriate safeguards."
[3]
Does RAG make LLMs less safe? Bloomberg research reveals hidden dangers
Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More Retrieval Augmented Generation (RAG) is supposed to help improve the accuracy of enterprise AI by providing grounded content. While that is often the case, there is also an unintended side effect. According to surprising new research published today by Bloomberg, RAG can potentially make large language models (LLMs) unsafe. Bloomberg's paper, 'RAG LLMs are Not Safer: A Safety Analysis of Retrieval-Augmented Generation for Large Language Models,' evaluated 11 popular LLMs including Claude-3.5-Sonnet, Llama-3-8B and GPT-4o. The findings contradict conventional wisdom that RAG inherently makes AI systems safer. The Bloomberg research team discovered that when using RAG, models that typically refuse harmful queries in standard settings often produce unsafe responses. Alongside the RAG research, Bloomberg released a second paper, 'Understanding and Mitigating Risks of Generative AI in Financial Services,' that introduces a specialized AI content risk taxonomy for financial services that addresses domain-specific concerns not covered by general-purpose safety approaches. The research challenges widespread assumptions that retrieval-augmented generation (RAG) enhances AI safety, while demonstrating how existing guardrail systems fail to address domain-specific risks in financial services applications. "Systems need to be evaluated in the context they're deployed in, and you might not be able to just take the word of others that say, Hey, my model is safe, use it, you're good," Sebastian Gehrmann, Bloomberg's Head of Responsible AI, told VentureBeat. RAG systems can make LLMs less safe, not more RAG is widely used by enterprise AI teams to provide grounded content. The goal is to provide accurate, updated information. There has been a lot of research and advancement in RAG in recent months to further improve accuracy as well. Earlier this month a new open-source framework called Open RAG Eval debuted to help validate RAG efficiency. It's important to note that Bloomberg's research is not questioning the efficacy of RAG or its ability to reduce hallucination. That's not what the research is about. Rather it's about how RAG usage impacts LLM guardrails in an unexpected way. The research team discovered that when using RAG, models that typically refuse harmful queries in standard settings often produce unsafe responses. For example, Llama-3-8B's unsafe responses jumped from 0.3% to 9.2% when RAG was implemented. Gehrmann explained that without RAG being in place, if a user typed in a malicious query, the built-in safety system or guardrails will typically block the query. Yet for some reason, when the same query is issued in an LLM that is using RAG, the system will answer the malicious query, even when the retrieved documents themselves are safe. "What we found is that if you use a large language model out of the box, often they have safeguards built in where, if you ask, 'How do I do this illegal thing,' it will say, 'Sorry, I cannot help you do this,'" Gehrmann explained. "We found that if you actually apply this in a RAG setting, one thing that could happen is that the additional retrieved context, even if it does not contain any information that addresses the original malicious query, might still answer that original query." How does RAG bypass enterprise AI guardrails? So why and how does RAG serve to bypass guardrails? The Bloomberg researchers were not entirely certain though they did have a few ideas. Gehrmann hypothesized that the way the LLMs were developed and trained did not fully consider safety alignments for really long inputs. The research demonstrated that context length directly impacts safety degradation. "Provided with more documents, LLMs tend to be more vulnerable," the paper states, showing that even introducing a single safe document can significantly alter safety behavior. "I think the bigger point of this RAG paper is you really cannot escape this risk," Amanda Stent, Bloomberg's Head of AI Strategy and Research, told VentureBeat. "It's inherent to the way RAG systems are. The way you escape it is by putting business logic or fact checks or guardrails around the core RAG system." Why generic AI safety taxonomies fail in financial services Bloomberg's second paper introduces a specialized AI content risk taxonomy for financial services, addressing domain-specific concerns like financial misconduct, confidential disclosure and counterfactual narratives. The researchers empirically demonstrated that existing guardrail systems miss these specialized risks. They tested open-source guardrail models including Llama Guard, Llama Guard 3, AEGIS and ShieldGemma against data collected during red-teaming exercises. "We developed this taxonomy, and then ran an experiment where we took openly available guardrail systems that are published by other firms and we ran this against data that we collected as part of our ongoing red teaming events," Gehrmann explained. "We found that these open source guardrails... do not find any of the issues specific to our industry." The researchers developed a framework that goes beyond generic safety models, focusing on risks unique to professional financial environments. Gehrmann argued that general purpose guardrail models are usually developed for consumer facing specific risks. So they are very much focused on toxicity and bias. He noted that while important those concerns are not necessarily specific to any one industry or domain. The key takeaway from the research is that organizations need to have the domain specific taxonomy in place for their own specific industry and application use cases. Responsible AI at Bloomberg Bloomberg has made a name for itself over the years as a trusted provider of financial data systems. In some respects, gen AI and RAG systems could potentially be seen as competitive against Bloomberg's traditional business and therefore there could be some hidden bias in the research. "We are in the business of giving our clients the best data and analytics and the broadest ability to discover, analyze and synthesize information," Stent said. "Generative AI is a tool that can really help with discovery, analysis and synthesis across data and analytics, so for us, it's a benefit." She added that the kinds of bias that Bloomberg is concerned about with its AI solutions are focussed on finance. Issues such as data drift, model drift and making sure there is good representation across the whole suite of tickers and securities that Bloomberg processes are critical. For Bloomberg's own AI efforts she highlighted the company's commitment to transparency. "Everything the system outputs, you can trace back, not only to a document but to the place in the document where it came from," Stent said. Practical implications for enterprise AI deployment For enterprises looking to lead the way in AI, Bloomberg's research mean that RAG implementations require a fundamental rethinking of safety architecture. Leaders must move beyond viewing guardrails and RAG as separate components and instead design integrated safety systems that specifically anticipate how retrieved content might interact with model safeguards. Industry-leading organizations will need to develop domain-specific risk taxonomies tailored to their regulatory environments, shifting from generic AI safety frameworks to those that address specific business concerns. As AI becomes increasingly embedded in mission-critical workflows, this approach transforms safety from a compliance exercise into a competitive differentiator that customers and regulators will come to expect. "It really starts by being aware that these issues might occur, taking the action of actually measuring them and identifying these issues and then developing safeguards that are specific to the application that you're building," explained Gehrmann.
[4]
Bloomberg research: RAG LLMs may be less safe than you think
Retrieval-Augmented Generation, or RAG, has been hailed as a way to make large language models more reliable by grounding their answers in real documents. The logic sounds airtight: give a model curated knowledge to pull from instead of relying solely on its own parameters, and you reduce hallucinations, misinformation, and risky outputs. But a new study suggests that the opposite might be happening. Even the safest models, paired with safe documents, became noticeably more dangerous when using RAG. Researchers from Bloomberg AI, the University of Maryland, and Johns Hopkins conducted one of the first large-scale analyses of RAG systems' safety. Their findings upend the common assumptions many AI developers and users hold about how retrieval impacts model behavior. Across eleven popular LLMs, RAG often introduced new vulnerabilities, creating unsafe responses that did not exist before. In a test of over 5,000 harmful prompts, eight out of eleven models showed a higher rate of unsafe answers when RAG was activated. Safe behavior in the non-RAG setting did not predict safe behavior in RAG. The study provided a concrete example: Llama-3-8B, a model that only produced unsafe outputs 0.3 percent of the time in a standard setting, saw that figure jump to 9.2 percent when RAG was used. Not only did the overall percentage of unsafe responses climb, but models also expanded their vulnerabilities across new risk categories. Previously contained weaknesses in areas like unauthorized practice of law or malware guidance spread into broader categories including adult content, misinformation, and political campaigning. RAG, instead of narrowing risk, broadened it. The researchers traced this unexpected danger to three interlocking factors: What emerged is that simply pairing a safe model with safe documents is no guarantee of safe responses. The mechanisms that make RAG appealing, such as context synthesis and document-guided answering, also open new pathways for misuse and misinterpretation. Two main behaviors stood out when researchers analyzed unsafe outputs stemming from safe documents. First, models often repurposed harmless information into dangerous advice. For instance, a Wikipedia entry about how police use GPS trackers became, in the hands of a model, a tutorial for criminals on evading capture. Second, even when instructed to rely solely on documents, models sometimes mixed in internal knowledge. This blending of memory and retrieval undermined the safeguards RAG was supposed to provide. Even when external documents were neutral or benign, internal unsafe knowledge surfaced in ways that fine-tuning had previously suppressed in the non-RAG setting. Adding more retrieved documents only worsened the problem. Experiments showed that increasing the number of context documents made LLMs more likely to answer unsafe questions, not less. A single safe document was often enough to start changing a model's risk profile. Not all models handled RAG equally. Claude 3.5 Sonnet, for example, remained remarkably resilient, showing very low unsafe response rates even under RAG pressure. Gemma 7B appeared safe at first glance but deeper analysis revealed that it often simply refused to answer questions. Poor extraction and summarization skills masked vulnerabilities rather than fixing them. In general, models that performed better at genuine RAG tasks like summarization and extraction were paradoxically more vulnerable. Their ability to synthesize from documents also made it easier for them to misappropriate harmless facts into unsafe content when the topic was sensitive. The safety cracks widened further when researchers tested existing red-teaming methods designed to jailbreak LLMs. Techniques like GCG and AutoDAN, which work well for standard models, largely failed to transfer their success when targeting RAG setups. One of the biggest challenges was that adversarial prompts optimized for a non-RAG model lost effectiveness when documents were injected into the context. Even retraining adversarial prompts specifically for RAG improved the results only slightly. Changing the documents retrieved each time created instability, making it hard for traditional jailbreak strategies to succeed consistently. This gap shows that AI security tools and evaluations built for base models are not enough. Dedicated RAG-specific red-teaming will be needed if developers want to deploy retrieval-enhanced systems safely at scale. As companies increasingly move toward RAG architectures for large language model applications, the findings of this study land as a stark warning. Retrieval does help reduce hallucinations and improve factuality, but it does not automatically translate into safer outputs. Worse, it introduces new layers of risk that traditional safety interventions were not designed to handle. The takeaway is clear: LLM developers cannot assume that bolting on retrieval will make models safer. Fine-tuning must be explicitly adapted for RAG workflows. Red-teaming must account for context dynamism. Monitoring must treat the retrieval layer itself as a potential attack vector, not just a passive input. Without RAG-specific defenses, the very techniques designed to ground language models in truth could instead create new vulnerabilities. If the industry does not address these gaps quickly, the next generation of LLM deployments might inherit deeper risks disguised under the comforting label of retrieval.
Share
Share
Copy Link
New research by Bloomberg challenges the assumption that Retrieval-Augmented Generation (RAG) inherently makes AI models safer, revealing that RAG can actually increase the likelihood of unsafe outputs from large language models.
A groundbreaking study by Bloomberg has revealed that Retrieval-Augmented Generation (RAG), widely adopted to enhance AI model accuracy, may paradoxically increase safety risks in large language models (LLMs). The research, conducted on 11 leading LLMs including GPT-4, Claude-3, and Llama-3-8B, challenges the prevailing notion that RAG inherently improves AI safety 12.
The study found that even models considered "safe" in standard settings exhibited a 15-30% increase in unsafe outputs when RAG was implemented. Surprisingly, LLMs that typically refused harmful queries in non-RAG settings became more vulnerable to generating problematic responses with RAG enabled 1.
For instance, Llama-3-8B's unsafe response rate jumped from 0.3% to 9.2% when using RAG 4. This counterintuitive finding has significant implications for the widespread use of RAG in various AI applications, from customer support to question-answering systems 2.
The research identified several factors contributing to this increased risk:
While the risks associated with RAG are not exclusive to the financial industry, the sector's regulatory demands and fiduciary responsibilities make understanding these systems crucial 2. The research revealed potential issues such as:
Bloomberg's research emphasizes the need for domain-specific safety measures. Generic AI safety taxonomies often fail to address risks unique to specific industries like financial services 3. The study introduced a specialized AI content risk taxonomy for financial services, addressing concerns such as financial misconduct and confidential disclosure 3.
Traditional red-teaming methods and jailbreaking techniques designed for standard LLMs proved less effective against RAG-enabled systems 4. This gap highlights the need for dedicated RAG-specific safety evaluations and defenses 4.
As companies increasingly adopt RAG architectures, these findings serve as a critical warning. While RAG helps reduce hallucinations and improve factuality, it does not automatically translate into safer outputs and may introduce new layers of risk 4.
Dr. Amanda Stent, Bloomberg's Head of AI Strategy & Research, emphasized, "This doesn't mean organizations should abandon RAG-based systems... Instead, AI practitioners need to be thoughtful about how to use RAG responsibly, and what guardrails are in place to ensure outputs are appropriate" 2.
Moving forward, the industry must develop RAG-specific defenses, adapt fine-tuning processes for RAG workflows, and implement monitoring systems that treat the retrieval layer as a potential attack vector 4. Without these measures, the next generation of LLM deployments may inherit deeper risks disguised under the seemingly beneficial label of retrieval-augmented generation.
Reference
[2]
Amazon's RAGChecker and the broader implications of Retrieval-Augmented Generation (RAG) are set to transform AI applications and enterprise knowledge management. This technology promises to enhance AI accuracy and unlock valuable insights from vast data repositories.
2 Sources
2 Sources
Google introduces DataGemma, a groundbreaking large language model that incorporates Retrieval-Augmented Generation (RAG) to enhance accuracy and reduce AI hallucinations. This development marks a significant step in addressing key challenges in generative AI.
2 Sources
2 Sources
A comprehensive look at the latest developments in AI, including OpenAI's internal struggles, regulatory efforts, new model releases, ethical concerns, and the technology's impact on Wall Street.
6 Sources
6 Sources
An exploration of how generative AI and social media could be used to manipulate language and control narratives, drawing parallels to Orwell's 'Newspeak' and examining the potential beneficiaries of such manipulation.
2 Sources
2 Sources
Generative AI is revolutionizing industries, from executive strategies to consumer products. This story explores its impact on business value, employee productivity, and the challenges in building interactive AI systems.
6 Sources
6 Sources
The Outpost is a comprehensive collection of curated artificial intelligence software tools that cater to the needs of small business owners, bloggers, artists, musicians, entrepreneurs, marketers, writers, and researchers.
© 2025 TheOutpost.AI All rights reserved