Curated by THEOUTPOST
On Wed, 9 Apr, 12:02 AM UTC
2 Sources
[1]
The RAG reality check: New open-source framework lets enterprises scientifically measure AI performance
Enterprises are spending time and money building out retrieval-augmented generation (RAG) systems. The goal is an accurate enterprise AI system, but are those systems actually working? The inability to objectively measure whether RAG systems are working is a critical blind spot.

One potential solution to that challenge is launching today with the debut of the Open RAG Eval open-source framework. The new framework was developed by enterprise RAG platform provider Vectara together with Professor Jimmy Lin and his research team at the University of Waterloo. Open RAG Eval transforms the currently subjective 'this looks better than that' comparison approach into a rigorous, reproducible evaluation methodology that can measure retrieval accuracy, generation quality and hallucination rates across enterprise RAG deployments.

The framework assesses response quality using two major metric categories: retrieval metrics and generation metrics. It allows organizations to apply this evaluation to any RAG pipeline, whether using Vectara's platform or custom-built solutions. For technical decision-makers, this means finally having a systematic way to identify exactly which components of their RAG implementations need optimization.

"If you can't measure it, you can't improve it," Jimmy Lin, professor at the University of Waterloo, told VentureBeat in an exclusive interview. "In information retrieval and dense vectors, you could measure lots of things, NDCG [normalized discounted cumulative gain], precision, recall...but when it came to right answers, we had no way, that's why we started on this path."

Why RAG evaluation has become the bottleneck for enterprise AI adoption

Vectara was an early pioneer in the RAG space. The company launched in October 2022, before ChatGPT was a household name. Vectara debuted technology it originally referred to as grounded AI back in May 2023, as a way to limit hallucinations, before the RAG acronym was in common use.

Over the last few months, for many enterprises, RAG implementations have grown increasingly complex and difficult to assess. A key challenge is that organizations are moving beyond simple question answering to multi-step agentic systems.

"In the agentic world, evaluation is doubly important, because these AI agents tend to be multi-step," Amr Awadallah, Vectara CEO and cofounder, told VentureBeat. "If you don't catch hallucination the first step, then that compounds with the second step, compounds with the third step, and you end up with the wrong action or answer at the end of the pipeline."

How Open RAG Eval works: Breaking the black box into measurable components

The Open RAG Eval framework approaches evaluation through a nugget-based methodology. Lin explained that the nugget approach breaks responses down into essential facts, then measures how effectively a system captures those nuggets. The framework evaluates RAG systems across four specific metrics. Importantly, it evaluates the entire RAG pipeline end to end, providing visibility into how embedding models, retrieval systems, chunking strategies and LLMs interact to produce final outputs.

The technical innovation: Automation through LLMs

What makes Open RAG Eval technically significant is how it uses large language models to automate what was previously a manual, labor-intensive evaluation process.
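The article does not reproduce the framework's code, but the nugget idea Lin describes can be illustrated in a few lines. The following is a minimal sketch, not Open RAG Eval's actual implementation: call_llm is a placeholder for whatever chat-completion client is in use, and the prompts and scoring rule are assumptions made for the example.

```python
# Minimal sketch of LLM-automated, nugget-based evaluation.
# All names here (call_llm, the prompts, the scoring rule) are illustrative
# placeholders, not Open RAG Eval's actual implementation.
import json

def call_llm(prompt: str) -> str:
    """Stand-in for any chat-completion client (hosted API or local model)."""
    raise NotImplementedError("wire up your LLM provider here")

def extract_nuggets(reference_answer: str) -> list[str]:
    """Ask the LLM to break a reference answer into atomic facts ('nuggets')."""
    prompt = (
        "List the essential, atomic facts in the following answer as a JSON "
        f"array of short strings.\n\nAnswer:\n{reference_answer}"
    )
    return json.loads(call_llm(prompt))

def nugget_coverage(system_answer: str, nuggets: list[str]) -> float:
    """Score: fraction of nuggets that the RAG system's answer supports."""
    supported = 0
    for nugget in nuggets:
        verdict = call_llm(
            "Does the answer below support this fact? Reply YES or NO.\n"
            f"Fact: {nugget}\nAnswer:\n{system_answer}"
        )
        supported += verdict.strip().upper().startswith("YES")
    return supported / len(nuggets) if nuggets else 0.0
```

It is exactly this kind of repetitive, fact-by-fact judging across many answers that previously fell to human annotators.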
"The state of the art before we started, was left versus right comparisons," Lin explained. "So this is, do you like the left one better? Do you like the right one better? Or they're both good, or they're both bad? That was sort of one way of doing things." Lin noted that the nugget-based evaluation approach itself isn't new, but its automation through LLMs represents a breakthrough. The framework uses Python with sophisticated prompt engineering to get LLMs to perform evaluation tasks like identifying nuggets and assessing hallucinations, all wrapped in a structured evaluation pipeline. Competitive landscape: How Open RAG Eval fits into the evaluation ecosystem As enterprise use of AI continues to mature, there is a growing number of evaluation frameworks. Just last week, Hugging Face launched Yourbench to test models against the company's internal data. At the end of January, Galileo launched its Agentic Evaluations technology. The Open RAG Eval is different in that it is strongly focussed on the RAG pipeline, not just LLM outputs.. The framework also has a strong academic foundation and is built on established information retrieval science rather than ad-hoc methods. The framework builds on Vectara's previous contributions to the open-source AI community, including its Hughes Hallucination Evaluation Model (HHEM), which has been downloaded over 3.5 million times on Hugging Face and has become a standard benchmark for hallucination detection. "We're not calling it the Vectara eval framework, we're calling it the Open RAG Eval framework because we really want other companies and other institutions to start helping build this out," Awadallah emphasized. "We need something like that in the market, for all of us, to make these systems evolve in the right way." What Open RAG Eval means in the real world While still an early stage effort, Vectara at least already has multiple users interested in using the Open RAG Eval framework. Among them is Jeff Hummel, SVP of Product and Technology at real estate firm Anywhere.re. Hummel expects that partnering with Vectara will allow him to streamline his company's RAG evaluation process. Hummel noted that scaling his RAG deployment introduced significant challenges around infrastructure complexity, iteration velocity and rising costs. "Knowing the benchmarks and expectations in terms of performance and accuracy helps our team be predictive in our scaling calculations," Hummel said. "To be frank, there weren't a ton of frameworks for setting benchmarks on these attributes; we relied heavily on user feedback, which was sometimes objective and did translate to success at scale." From measurement to optimization: Practical applications for RAG implementers For technical decision-makers, Open RAG Eval can help answer crucial questions about RAG deployment and configuration: In practice, organizations can establish baseline scores for their existing RAG systems, make targeted configuration changes, and measure the resulting improvement. This iterative approach replaces guesswork with data-driven optimization. While this initial release focuses on measurement, the roadmap includes optimization capabilities that could automatically suggest configuration improvements based on evaluation results. Future versions might also incorporate cost metrics to help organizations balance performance against operational expenses. 
For enterprises looking to lead in AI adoption, Open RAG Eval means they can implement a scientific approach to evaluation rather than relying on subjective assessments or vendor claims. For those earlier in their AI journey, it provides a structured way to approach evaluation from the beginning, potentially avoiding costly missteps as they build out their RAG infrastructure.
[2]
Vectara launches open-source framework to evaluate enterprise RAG systems - SiliconANGLE
Artificial intelligence agent and assistant platform provider Vectara Inc. today announced the launch of Open RAG Eval, an open-source evaluation framework for retrieval-augmented generation.

RAG is a technique that enhances AI responses by retrieving relevant external data to inform its output. The approach improves accuracy and reduces hallucinations by grounding responses in real-time, trusted information.

Vectara's new Open RAG Eval framework, developed in conjunction with researchers from the University of Waterloo, allows enterprise users to evaluate response quality for each component and configuration of their RAG systems in order to quickly and consistently optimize the accuracy and reliability of their AI agents and other tools.

Open RAG Eval is designed to determine the accuracy and usefulness of the responses provided to user prompts, depending on the components and configuration of an enterprise RAG stack. The framework assesses response quality according to two major metric categories: retrieval metrics and generation metrics.

By surfacing insights across these two metric categories, Open RAG Eval enables developers to diagnose performance issues at a granular level. For example, low retrieval scores may signal the need for better document chunking or improved search strategies, while weak generation scores might point to suboptimal prompts or the use of an underperforming language model.

The framework is compatible with any RAG pipeline, including Vectara's own generative AI platform and other custom solutions. Early adopters can use Open RAG Eval to make informed decisions about whether to implement semantic chunking, adjust hybrid search parameters, or refine prompt engineering for better overall results.

Vectara notes that the framework's development was made possible through its collaboration with Professor Jimmy Lin and his team at the University of Waterloo, who are renowned for their contributions to information retrieval and evaluation benchmarks. Their research foundation helps ensure that Open RAG Eval delivers both scientific rigor and practical utility for enterprise applications.

"AI agents and other systems are becoming increasingly central to how enterprises operate today and how they plan to grow in the future," said Professor Lin. "In order to capitalize on the promise these technologies offer, organizations need robust evaluation methodologies that combine scientific rigor and practical utility in order to continually assess and optimize their RAG systems."

Vectara is a venture capital-backed startup that has raised $73.5 million over three rounds, including rounds of $28 million in May 2023 and $25 million last July. Investors in the company include FPV Ventures LP, Race Capital, Alumni Ventures, WVV Capital, Samsung NEXT, Fusion Fund, Green Sands Equity LP and Mack Ventures LP.
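The diagnostic pattern described above, with low retrieval scores pointing at chunking or search and weak generation scores pointing at prompts or the model, can be expressed as a simple triage rule. The sketch below is illustrative only; the metric names and the 0.6 threshold are assumptions for the example, not the framework's actual output schema.

```python
# Illustrative triage of aggregate metric-category scores into likely problem areas.
# Metric names and thresholds are assumptions, not Open RAG Eval's output schema.
def diagnose(scores: dict[str, float], threshold: float = 0.6) -> list[str]:
    """Map aggregate retrieval/generation scores to suggested areas to investigate."""
    suggestions = []
    if scores.get("retrieval", 1.0) < threshold:
        suggestions.append("Retrieval looks weak: revisit chunking, embeddings or hybrid-search settings.")
    if scores.get("generation", 1.0) < threshold:
        suggestions.append("Generation looks weak: revisit the prompt template or try a different LLM.")
    if scores.get("hallucination_rate", 0.0) > 1 - threshold:
        suggestions.append("Hallucination rate is high: tighten grounding and citation requirements.")
    return suggestions or ["Scores clear this threshold; compare against your baseline run instead."]

print(diagnose({"retrieval": 0.42, "generation": 0.78, "hallucination_rate": 0.31}))
```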
Vectara, in collaboration with the University of Waterloo, has launched Open RAG Eval, an open-source framework designed to objectively measure and improve the performance of enterprise Retrieval-Augmented Generation (RAG) systems.
In a significant development for the artificial intelligence industry, Vectara, an enterprise RAG platform provider, has unveiled Open RAG Eval, an open-source framework designed to scientifically measure AI performance 1. This innovative tool, developed in collaboration with Professor Jimmy Lin and his research team at the University of Waterloo, aims to transform the subjective comparison approach into a rigorous, reproducible evaluation methodology for enterprise Retrieval-Augmented Generation (RAG) systems 1.
The framework assesses response quality using two major metric categories: retrieval metrics and generation metrics. It employs a nugget-based methodology, breaking responses down into essential facts and measuring how effectively a system captures these nuggets 1. Open RAG Eval evaluates RAG systems across four specific metrics.
What sets Open RAG Eval apart is its use of large language models to automate what was previously a manual, labor-intensive evaluation process 1.
The framework allows organizations to apply this evaluation to any RAG pipeline, whether using Vectara's platform or custom-built solutions 2. For technical decision-makers, this means finally having a systematic way to identify exactly which components of their RAG implementations need optimization 1.
Amr Awadallah, Vectara CEO and cofounder, emphasized the importance of evaluation in the agentic world: "If you don't catch hallucination the first step, then that compounds with the second step, compounds with the third step, and you end up with the wrong action or answer at the end of the pipeline." 1
As enterprise use of AI continues to mature, there is a growing number of evaluation frameworks. Open RAG Eval distinguishes itself by focusing strongly on the RAG pipeline, not just LLM outputs. It also has a strong academic foundation and is built on established information retrieval science 1.
While still an early-stage effort, Vectara already has multiple users interested in using the Open RAG Eval framework. Jeff Hummel, SVP of Product and Technology at real estate firm Anywhere, expects that partnering with Vectara will allow him to streamline his company's RAG evaluation process 1.
Vectara, a venture capital-backed startup that has raised $73.5 million over three rounds, is calling for other companies and institutions to contribute to the framework's development. This collaborative approach aims to establish Open RAG Eval as a standard for evaluating and improving RAG systems across the industry 2.