Probably AI Raises $9M to Catch Hallucinations Using Smaller, Cheaper Models

Probably AI has secured $9 million in seed funding from Andreessen Horowitz and Accel to tackle AI hallucinations with a novel approach. Instead of building bigger models, the startup wraps smaller AI models in a deterministic validator system that catches errors before they reach users, aiming for 99.99% accuracy while running on local hardware.

Probably AI Secures Seed Funding to Tackle AI Hallucinations

Probably AI has raised $9 million in seed funding co-led by Andreessen Horowitz and Accel, with participation from Tokyo Black and Vermilion Cliffs Ventures, to address one of the most persistent challenges in artificial intelligence: reducing factual errors in large language models (LLMs) [1][2]. While most of the industry attempts to fix AI hallucinations by building larger, more powerful models, founder Peter Elias is betting on the opposite strategy. The company aims to achieve 99.99% accuracy, the kind of reliability common in deterministic systems but rarely seen in AI, by catching errors before they ever reach users [1].

Source: TechCrunch

A Verifiable Data Agent Built on Harness Engineering

Probably AI's first product is a verifiable data agent: a data science tool that produces quick answers from complex datasets. Each result includes a citation and an audit trail showing how it was derived [1]. What sets this approach apart is what Elias describes as a "data science mech suit," an elaborate harness that validates every answer. The LLM takes a first pass at answering a query; those results then run through a separate deterministic validator that checks them against the actual dataset and rejects anything that doesn't match [2]. The LLM has been trained against this validator, and the entire system is optimized for fast, high-accuracy responses.
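The propose-then-validate loop described above can be illustrated with a minimal sketch. This is not Probably AI's implementation, which the article does not detail; the `Claim` shape, the `METRICS` table, the toy `ROWS` dataset, and the function names are all hypothetical, and the model's output is simulated by a hand-written list of candidates.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Optional

# Hypothetical toy dataset standing in for the user's real data.
ROWS = [
    {"region": "north", "revenue": 120.0},
    {"region": "south", "revenue": 80.0},
    {"region": "north", "revenue": 100.0},
]


@dataclass
class Claim:
    """A candidate answer proposed by the model (assumed shape)."""
    metric: str   # "mean", "sum" or "count"
    column: str
    value: float


# Deterministic recomputations the validator trusts.
METRICS: dict[str, Callable[[list[float]], float]] = {
    "mean": mean,
    "sum": sum,
    "count": len,
}


def validate(claim: Claim, rows: list[dict], tol: float = 1e-9) -> bool:
    """Recompute the claimed metric from the data and compare the two."""
    compute = METRICS.get(claim.metric)
    if compute is None:
        return False  # unknown metric: reject rather than guess
    values = [row[claim.column] for row in rows if claim.column in row]
    if not values:
        return False  # nothing to check against, so nothing can pass
    return abs(compute(values) - claim.value) <= tol


def answer(candidates: list[Claim], rows: list[dict]) -> Optional[Claim]:
    """Return the first candidate that survives validation, else None."""
    for claim in candidates:
        if validate(claim, rows):
            return claim
    return None


# Simulated model output: a wrong first guess, then a correct one.
candidates = [Claim("sum", "revenue", 310.0), Claim("sum", "revenue", 300.0)]
result = answer(candidates, ROWS)
```

In this sketch the validator never trusts the model's number: it recomputes the value from the data, so a confident but wrong candidate is dropped instead of being shown to the user.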

Smaller, Cheaper Models That Run on Local Hardware

The implications for cost are striking. "What we learned building this was that the better your harness engineering is, the weaker the model can be," Elias explains. "If you can refine the context enough, the model does not have to work very hard to do the right thing. Basically, it's an exercise in reducing ambiguity" [1]. This approach allows Probably's data science tool to run on models that are "four classes weaker than the frontier models," meaning it can operate on local hardware like a desktop computer instead of requiring a data center. This sharply reduces the token costs associated with AI use, a welcome development as companies reassess their AI budgets amid rising expenses [2].

Privacy and Precision-Sensitive Fields

The tool runs locally on the open-source database DuckDB, and the company states that the model sees only metadata and statistics, never raw data, which remains on the user's machine [2]. This creates a compelling privacy pitch alongside the cost benefits. Elias envisions extending the same engine to precision-sensitive fields like accounting and medical services, or "any precision-sensitive use case" where confident wrong answers pose serious risks [1]. Researchers have repeatedly warned about AI hallucinations in science and critical applications, making this approach particularly relevant [2].
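To make the "metadata and statistics, never raw data" idea concrete, here is a small sketch of how a model-facing context could be built from column summaries alone. It is an assumption-laden illustration: it uses plain Python lists instead of DuckDB, and the `summarize` function and its output fields are hypothetical, not the product's actual format.

```python
import json
from statistics import mean


def summarize(rows: list[dict]) -> dict:
    """Build a per-column summary; no row is copied into the result."""
    columns = {}
    for name in (rows[0] if rows else {}):
        values = [row[name] for row in rows if row.get(name) is not None]
        info = {"non_null": len(values)}
        is_numeric = bool(values) and all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in values
        )
        if is_numeric:
            # Aggregate statistics only for numeric columns.
            info.update(type="number", min=min(values),
                        max=max(values), mean=mean(values))
        else:
            # For text, expose only how many distinct values exist.
            info.update(type="text", distinct=len(set(values)))
        columns[name] = info
    return {"row_count": len(rows), "columns": columns}


# Hypothetical local data that should never leave the machine.
rows = [
    {"customer": "Ada", "amount": 10.0},
    {"customer": "Grace", "amount": 30.0},
]

# Only this summary would be placed in the model's context.
context = json.dumps(summarize(rows))
```

Note that even aggregates such as `min` and `max` reveal individual values in small datasets, so a real system would need to decide which statistics are safe to expose.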

Why Major Labs Haven't Pursued This Approach

Elias offers a provocative explanation for why big AI labs haven't attempted this strategy: "They're incentivized not to, because they make money the more times you have to correct the model" [1]. While major labs do invest resources in cutting AI hallucinations, the observation highlights a potential misalignment between business models and user needs. The approach does have limitations: a validator only works when there is hard ground truth to check against, like a dataset, which is why Probably started with data rather than open-ended writing [2]. The product is currently in public preview at version 0.1, and the 99.99% accuracy figure remains a goal rather than a proven result. Still, in a market crowded with attempts to tame hallucinations, betting on smaller models wrapped in rigorous validation represents a distinct strategy, one that attracted backing from Andreessen Horowitz and Accel.

© 2026 TheOutpost.AI All rights reserved