Curated by THEOUTPOST
On Fri, 1 Nov, 8:02 AM UTC
2 Sources
[1]
Patronus AI debuts API for equipping AI workloads with reliability guardrails - SiliconANGLE
Patronus AI Inc. today introduced a new tool designed to help developers ensure that their artificial intelligence applications generate accurate output. The Patronus API, as the offering is called, is rolling out a few months after the startup closed a $17 million Series A funding round. The investment included the participation of Datadog Inc.'s venture capital arm and several other institutional backers.

San Francisco-based Patronus AI offers a software platform that promises to ease the development of AI applications. Developers can use it to compare a set of large language models and identify which one is most suitable for a given software project. The platform also promises to ease several related tasks, such as detecting technical issues in AI applications after they're deployed to production.

The Patronus API, the company's new offering, is an application programming interface that enterprises can integrate into their AI workloads. It's designed to help developers detect when an application generates inaccurate prompt responses and filter them out.

Many types of issues can emerge in an AI workload's output. Some user queries lead to hallucinations, neural network responses that contain inaccurate information. In other cases, an application's built-in LLM might generate responses that are overly brief or don't align with a company's style guidelines. Fending off cyberattacks is another challenge: hackers sometimes use malicious prompts to try to trick an LLM into carrying out a task it's not intended to perform, such as disclosing proprietary information from its training dataset.

The Patronus API detects such issues by running an LLM's prompt responses through another language model. That second language model checks each response for problems such as hallucinations and notifies developers if there's a match. There are already several tools on the market that use LLMs to find issues in AI applications, but Patronus AI says they have limited accuracy.

The Patronus API offers a choice of several LLM evaluation algorithms. One of them is Lynx, an open-source language model that Patronus AI released in July. It's a customized version of Meta Platforms Inc.'s Llama-3-70B-Instruct model that has been optimized to detect incorrect AI output. According to Patronus AI, Lynx is better than GPT-4o at detecting issues in AI applications with RAG features. RAG, or retrieval-augmented generation, is a machine learning technique that allows an LLM to incorporate data from external sources into its prompt responses. Patronus AI says that Lynx's accuracy partly stems from its use of CoT, or chain-of-thought, a processing approach that allows LLMs to break down a complex task into simpler steps.

The Patronus API can also scan AI applications' output using other evaluation algorithms. Some of those algorithms have a smaller hardware footprint than Lynx, which means they can be operated more cost-efficiently. Additionally, developers may upload custom evaluation models to analyze AI applications' output based on metrics that aren't supported out of the box.

Patronus AI is offering access to the API under a usage-based pricing model. Customers receive a Python software development kit that makes it easier to integrate the service into their applications.
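The core mechanism the article describes, one LLM's responses being checked by a second evaluator LLM before they reach users, can be sketched in a few lines of Python. This is a minimal illustration of the general pattern only, not the actual Patronus API; the `call_llm` helper and the judge prompt format are placeholders.

```python
# Minimal sketch of the LLM-as-judge pattern described above: a second
# "judge" model checks the first model's answer before it reaches users.
# call_llm is a hypothetical placeholder, not the actual Patronus API.

JUDGE_PROMPT = """You are an evaluator. Reply PASS if the answer is fully
supported by the context, or FAIL if it contains unsupported claims.

Question: {question}
Context: {context}
Answer: {answer}
Verdict:"""


def call_llm(prompt: str) -> str:
    """Placeholder for any chat-completion client; wire to your provider."""
    raise NotImplementedError("connect this to your model provider")


def is_grounded(question: str, context: str, answer: str) -> bool:
    """Run the answer through the judge model and parse its verdict."""
    verdict = call_llm(JUDGE_PROMPT.format(
        question=question, context=context, answer=answer))
    return verdict.strip().upper().startswith("PASS")


def guarded_answer(question: str, context: str) -> str:
    """Generate an answer, then filter it if the judge flags a problem."""
    answer = call_llm(f"Context: {context}\n\nQuestion: {question}")
    if is_grounded(question, context, answer):
        return answer
    return "I couldn't produce a reliably grounded answer to that."
```

In production the judge would typically be a smaller, cheaper model than the one generating answers, which is the trade-off the lower-footprint evaluators mentioned above are meant to serve.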
[2]
Patronus AI launches world's first self-serve API to stop AI hallucinations
A customer service chatbot confidently describes a product that doesn't exist. A financial AI invents market data. A healthcare bot provides dangerous medical advice. These AI hallucinations, once dismissed as amusing quirks, have become million-dollar problems for companies rushing to deploy artificial intelligence.

Today, Patronus AI, a San Francisco startup that recently secured $17 million in Series A funding, launched what it calls the first self-serve platform to detect and prevent AI failures in real time. Think of it as a sophisticated spell-checker for AI systems, catching errors before they reach users.

Inside the AI safety net: How it works

"Many companies are grappling with AI failures in production, facing issues like hallucinations, security vulnerabilities, and unpredictable behavior," said Anand Kannappan, Patronus AI's CEO, in an interview with VentureBeat. The stakes are high: recent research by the company found that leading AI models like GPT-4 reproduce copyrighted content 44% of the time when prompted, while even advanced models generate unsafe responses in over 20% of basic safety tests.

The timing couldn't be more critical. As companies rush to implement generative AI capabilities -- from customer service chatbots to content generation systems -- they're discovering that existing safety measures fall short. Current evaluation tools like Meta's LlamaGuard perform below 50% accuracy, making them little better than a coin flip.

Patronus AI's solution introduces several innovations that could reshape how businesses deploy AI. Perhaps most significant is its "judge evaluators" feature, which allows companies to create custom rules in plain English. "You can customize evaluation to exactly what your product needs," Varun Joshi, Patronus AI's product lead, told VentureBeat. "We let customers write out in English what they want to evaluate and check for." A financial services company might specify rules about regulatory compliance, while a healthcare provider could focus on patient privacy and medical accuracy.

From detection to prevention: The technical breakthrough

The system's cornerstone is Lynx, a breakthrough hallucination detection model that outperforms GPT-4 by 8.3% in detecting medical inaccuracies. The platform operates at two speeds: a quick-response version for real-time monitoring and a more thorough version for deeper analysis. "The small versions can be used for real-time guardrails, and the large ones might be more appropriate for offline analysis," Joshi told VentureBeat.

Beyond traditional error checking, the company has developed specialized tools like CopyrightCatcher, which detects when AI systems reproduce protected content, and FinanceBench, the industry's first benchmark for evaluating AI performance on financial questions. These tools work in concert with Lynx to provide comprehensive coverage against AI failures.

Beyond simple guardrails: Reshaping AI safety

The company has adopted a pay-as-you-go pricing model, starting at 15 cents per million tokens for smaller evaluators and $5 per million tokens for larger ones. This pricing structure could dramatically increase access to AI safety tools, making them available to startups and smaller businesses that previously couldn't afford sophisticated AI monitoring. Early adoption suggests major enterprises see AI safety as a critical investment, not just a nice-to-have feature.
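At those published rates, estimating evaluation spend is simple arithmetic. The monthly token volume in the sketch below is invented purely for illustration.

```python
# Back-of-the-envelope evaluation costs at the published per-token rates.
# The monthly token volume is an assumed figure for illustration only.

SMALL_RATE_USD = 0.15  # per million tokens, smaller evaluators
LARGE_RATE_USD = 5.00  # per million tokens, larger evaluators

monthly_tokens = 250_000_000  # assumed: 250M evaluated tokens per month
millions = monthly_tokens / 1_000_000

print(f"small evaluators: ${millions * SMALL_RATE_USD:,.2f}/month")  # $37.50
print(f"large evaluators: ${millions * LARGE_RATE_USD:,.2f}/month")  # $1,250.00
```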
The company has already attracted clients including HP, AngelList, and Pearson, along with partnerships with tech giants like Nvidia, MongoDB, and IBM. What sets Patronus AI apart is its focus on improvement rather than just detection. "We can actually highlight the span of the specific piece of text where the hallucination is," Kannappan explained. This precision allows engineers to quickly identify and fix problems, rather than just knowing something went wrong.

The race against AI hallucinations

The launch comes at a pivotal moment in AI development. As large language models like GPT-4 and Claude become more powerful and widely used, the risks of AI failures grow correspondingly larger. A hallucinating AI system could expose companies to legal liability, damage customer trust, or worse. Recent regulatory moves, including President Biden's AI executive order and the EU's AI Act, suggest that companies will soon face legal requirements to ensure their AI systems are safe and reliable. Tools like Patronus AI's platform could become essential for compliance.

"Good evaluation is not just protecting against a bad outcome -- it's deeply about improving your models and improving your products," Joshi emphasized. This philosophy reflects a maturing approach to AI safety, moving from simple guardrails to continuous improvement.

The real test for Patronus AI isn't just catching mistakes -- it's keeping pace with AI's breakneck evolution. As language models grow more sophisticated, their hallucinations may become harder to spot, like increasingly convincing forgeries. The stakes couldn't be higher: every time an AI system invents facts, recommends dangerous treatments, or generates copyrighted content, it erodes the trust these tools need to transform business. Without reliable guardrails, the AI revolution risks stumbling before it truly begins.

In the end, it's a simple truth: if artificial intelligence can't stop making things up, it may be humans who end up paying the price.
Patronus AI introduces a new API designed to detect and prevent AI failures in real time, offering developers tools to ensure accuracy and reliability in AI applications.
Patronus AI, a San Francisco-based startup, has unveiled a groundbreaking API designed to enhance the reliability and accuracy of AI applications. This development comes on the heels of the company's recent $17 million Series A funding round, which included participation from Datadog Inc.'s venture capital arm [1][2].
The newly launched Patronus API serves as a sophisticated "spell-checker" for AI systems, aimed at detecting and preventing AI failures in real time. This tool is particularly crucial as companies rapidly deploy AI technologies across various sectors, facing challenges such as hallucinations, security vulnerabilities, and unpredictable behavior [2].
Key features of the Patronus API include "judge evaluators" that let companies write evaluation rules in plain English, the Lynx hallucination detection model, specialized tools such as CopyrightCatcher for flagging reproduced copyrighted content and FinanceBench for benchmarking financial question answering, and support for custom evaluation models [1][2]. A minimal sketch of the judge-evaluator idea follows.
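Source [2] describes judge evaluators as rules written out in plain English. The sketch below illustrates that idea with invented class and field names; it is not the real Patronus SDK.

```python
from dataclasses import dataclass

# Hypothetical sketch of plain-English "judge evaluators": each rule is a
# natural-language criterion folded into a judge model's prompt.
# Class and field names are invented; this is not the real Patronus SDK.

@dataclass
class JudgeRule:
    name: str
    criterion: str  # written in plain English, as source [2] describes

rules = [
    JudgeRule("compliance", "The answer must not give unlicensed financial advice."),
    JudgeRule("privacy", "The answer must not reveal patient-identifying details."),
]

def build_judge_prompt(answer: str, rules: list[JudgeRule]) -> str:
    """Fold the plain-English criteria into a single judge prompt."""
    checks = "\n".join(f"- {r.name}: {r.criterion}" for r in rules)
    return (
        "For each criterion below, reply PASS or FAIL for the answer.\n"
        f"{checks}\n\nAnswer under review:\n{answer}"
    )
```

This matches the examples the company gives: a financial services firm would add compliance rules, a healthcare provider privacy and accuracy rules.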
Recent research by Patronus AI has highlighted the urgency of their solution. Findings show that leading AI models like GPT-4 reproduce copyrighted content 44% of the time when prompted, while even advanced models generate unsafe responses in over 20% of basic safety tests [2].
The API offers a choice of several LLM evaluation algorithms, including Lynx, an open-source language model optimized to detect incorrect AI output. Lynx has demonstrated superior performance, outperforming GPT-4 by 8.3% in detecting medical inaccuracies [1][2].
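Source [2] also notes that the platform can highlight the exact span of hallucinated text. A result from such an evaluator might look roughly like the following; the output schema here is an assumption for illustration, not Patronus AI's actual response format.

```python
# Illustrative evaluator result with character offsets for the flagged span.
# The schema is assumed for illustration; it is not Patronus AI's format.

answer = "Our Q3 widget ships in June and is certified by the FDA for home use."
result = {
    "verdict": "FAIL",
    "spans": [
        {"start": 35, "end": 68, "reason": "claim not found in retrieved context"},
    ],
}

for span in result["spans"]:
    # prints the flagged text ("certified by the FDA for home use") and reason
    print("flagged:", answer[span["start"]:span["end"]], "--", span["reason"])
```

Span-level output is what makes the "improvement, not just detection" framing concrete: engineers see exactly which words to fix.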
Patronus AI has adopted a usage-based pricing model, making the technology accessible to businesses of all sizes. The API is offered with a Python SDK for easy integration, and pricing starts at 15 cents per million tokens for smaller evaluators and $5 per million tokens for larger ones [1][2].
The launch of the Patronus API comes at a critical juncture in AI development. As large language models become more powerful and widely used, the risks associated with AI failures grow correspondingly. Early adopters of the Patronus API include major enterprises such as HP, AngelList, and Pearson, indicating the perceived importance of AI safety in the industry [2].
With recent regulatory moves, including President Biden's AI executive order and the EU's AI Act, companies may soon face legal requirements to ensure their AI systems are safe and reliable. Tools like the Patronus API could become essential for compliance in this evolving landscape [2].
As the AI industry continues to evolve rapidly, the introduction of the Patronus API represents a significant step towards more reliable and trustworthy AI applications, potentially reshaping how businesses approach AI safety and deployment.
Patronus AI releases Glider, a lightweight 3.8 billion parameter AI model that outperforms larger models in evaluating AI systems, offering speed, transparency, and on-device capabilities.
2 Sources
Amazon Web Services introduces Automated Reasoning checks to tackle AI hallucinations and Model Distillation for creating smaller, efficient AI models, along with multi-agent collaboration features in Amazon Bedrock.
7 Sources
Galileo introduces a new platform to evaluate and improve AI agent performance, addressing critical challenges in enterprise AI deployment and reliability.
2 Sources
Salesforce, Cisco, and Accenture form an alliance to address AI-related security concerns. Meanwhile, Salesforce's AI chief discusses the company's internal use of its Einstein products.
2 Sources
MLCommons, an industry-led AI consortium, has introduced AILuminate, a benchmark for assessing the safety of large language models. This initiative aims to standardize AI safety evaluation and promote responsible AI development.
3 Sources