2 Sources
[1]
Probably raises $9M to build a more reliable kind of AI
As LLMs have grown more powerful, hallucinations have proven stubbornly difficult to avoid. Errors pop up in even the smartest models, and while there are ways to catch those errors, the industry is still figuring out the best way to do it.

Probably, which just raised $9 million in seed funding from Andreessen Horowitz, is trying to build a more rigorous way to catch those errors. As founder Peter Elias puts it, the company's goal is to prevent hallucinations and simple factual errors from ever reaching the user, and achieve the kind of 99.99% accuracy that's common in deterministic systems but much more difficult to reach with AI. As it turns out, bringing LLMs to that level of accuracy requires rethinking many of the basic assumptions of AI engineering.

Probably's first product is a data science tool, built to produce quick answers from complex datasets. Each result comes with a citation and an audit trail for how it was developed, an increasingly common practice among AI tools. But keeping errors from creeping into those summaries required an elaborate harness system that Elias describes as a "data science mech suit."

The LLM's first-pass answers are checked against a deterministic validator system, which bounces back any results that don't match the dataset. Crucially, the LLM has been trained against the validator, and the whole system is optimized for fast and accurate answers, the company said.

"What we learned building this was that the better your harness engineering is, the weaker the model can be," Elias says. "If you can refine the context enough, the model does not have to work very hard to do the right thing. Basically, it's an exercise in reducing ambiguity."

That allows Probably's data science tool to run on significantly smaller AI models. Elias says the current version is running on a model that's "four classes weaker than the frontier models," which means it can be run on local hardware (that is, a desktop computer instead of a data center), cutting much of the token cost associated with AI use. It's a welcome idea at a time when token costs are rising and many customers are reassessing their AI budgets.

And Elias's idea doesn't end with data science: the same engine can be extended to cover use cases like accounting or medical services, or as Elias puts it, "any precision-sensitive use case."

"I think it's really interesting that the big AI labs have not even attempted to do this," Elias says. "They're incentivized not to, because they make money the more times you have to correct the model."
[2]
Probably raises $9M to fix AI hallucinations on cheap models
Probably has raised $9M to wrap small AI models in a deterministic 'harness' that catches hallucinations before they reach you. The payoff: near-perfect accuracy on a model cheap enough to run on a laptop.

Most of the AI industry is trying to fix hallucinations by building bigger, smarter models. A startup called Probably is betting on the opposite. The company has raised $9m in a seed round co-led by Andreessen Horowitz and Accel, with Tokyo Black and Vermilion Cliffs Ventures, to catch AI's factual errors before they ever reach a user. It is aiming for the 99.99% accuracy that ordinary software takes for granted but large language models rarely hit.

Its trick is to lean on the model less, not more. Probably's first product, a local 'verifiable data agent' that answers questions from messy datasets, runs each answer through what founder Peter Elias calls a 'data science mech suit'.

A harness, not a bigger brain

The model takes a first pass, then a separate, deterministic validator checks the answer against the actual data and bounces anything that does not match. The model is trained against that validator, and every result ships with a citation and an audit trail. 'The better your harness engineering is, the weaker the model can be,' Elias says. Reduce the ambiguity enough, the argument goes, and the AI barely has to think.

That has a striking consequence for cost. Probably's tool runs on a model Elias describes as 'four classes weaker' than the frontier, small enough to run on a desktop rather than a data centre, which strips out most of the token bill. It also doubles as a privacy pitch. The whole thing runs locally on the open-source database DuckDB, and the company says the model only ever sees metadata and statistics, never the raw data, which stays on your machine.

Aimed at the token-cost backlash

The timing is pointed. Companies are watching AI bills balloon even as per-token prices collapse, and a tool that delivers accuracy on cheap, local hardware speaks directly to that anxiety. It also lands where errors hurt most. Probably says the same engine could extend to accounting or medical work, any 'precision-sensitive' job, the kind where a confident wrong answer is the whole problem, as researchers warning about hallucinations in science keep pointing out.

A provocative claim, and the catch

Elias goes further, arguing the big labs have not built this because 'they make money the more times you have to correct the model'. It is a tidy sales line, and a contestable one: the major labs pour resources into cutting hallucinations, and a smaller player has every reason to cast itself as the honest broker. The bigger caveat is scope. A validator only works when there is a hard ground truth to check against, such as a dataset, which is why Probably started with data rather than open-ended writing.

It is a $9m seed, the product is in public preview at version 0.1, and the 99.99% figure is still a goal, not a result. But in a market crowded with attempts to tame hallucinations, betting on smaller models is at least a refreshingly different wager, and one a16z and Accel were willing to fund.
Probably AI has secured $9 million in seed funding from Andreessen Horowitz and Accel to tackle AI hallucinations with a novel approach. Instead of building bigger models, the startup wraps smaller AI models in a deterministic validator system that catches errors before they reach users, aiming for 99.99% accuracy while running on local hardware.
Probably AI has raised $9 million in seed funding co-led by Andreessen Horowitz and Accel, with participation from Tokyo Black and Vermilion Cliffs Ventures, to address one of the most persistent challenges in artificial intelligence: reducing factual errors in LLMs [1][2]. While most of the industry attempts to fix AI hallucinations by building larger, more powerful large language models, founder Peter Elias is betting on the opposite strategy. The company aims to achieve 99.99% accuracy, the kind of reliability common in deterministic systems but rarely seen in AI, by catching errors before they ever reach users [1].
Source: TechCrunch
Probably AI's first product is a verifiable data agent, a data science tool that produces quick answers from complex datasets. Each result includes a citation and an audit trail showing how it was developed [1]. What sets this approach apart is what Elias describes as a "data science mech suit": an elaborate harness system that validates every answer. The LLM takes a first pass at answering a query, then a separate deterministic validator checks that answer against the actual dataset and rejects anything that doesn't match [2]. The LLM has been trained against this validator, and the entire system is optimized for fast, accurate responses.

The implications for cost are striking. "What we learned building this was that the better your harness engineering is, the weaker the model can be," Elias explains. "If you can refine the context enough, the model does not have to work very hard to do the right thing. Basically, it's an exercise in reducing ambiguity" [1]. This approach allows Probably's data science tool to run on models that are "four classes weaker than the frontier models," meaning it can operate on local hardware like a desktop computer instead of requiring a data center. This sharply reduces the token costs associated with AI use, a welcome development as companies reassess their AI budgets amid rising expenses [2].
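The generate-then-validate loop described above can be illustrated with a minimal sketch. This is not Probably's implementation, which the sources do not describe in detail. The `Claim` schema, the `validate` and `answer_with_harness` functions, the retry limit, and the stub model are all hypothetical names introduced here for illustration. The sketch assumes the model returns a structured claim (an aggregate over a column) that plain code can recompute from the data.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Optional

# Toy dataset standing in for the user's local data (hypothetical values).
ROWS = [
    {"region": "north", "revenue": 120.0},
    {"region": "south", "revenue": 80.0},
    {"region": "north", "revenue": 100.0},
]

@dataclass
class Claim:
    """A structured answer proposed by the model (hypothetical schema)."""
    column: str
    aggregate: str  # one of "sum", "mean", "min", "max"
    value: float

AGGREGATES: dict[str, Callable[[list[float]], float]] = {
    "sum": sum, "mean": mean, "min": min, "max": max,
}

def validate(claim: Claim, rows: list[dict], tol: float = 1e-9) -> bool:
    """Deterministically recompute the aggregate and compare it to the claim."""
    values = [row[claim.column] for row in rows]
    truth = AGGREGATES[claim.aggregate](values)
    return abs(truth - claim.value) <= tol

def answer_with_harness(
    propose: Callable[[Optional[str]], Claim],
    rows: list[dict],
    max_attempts: int = 3,
) -> Optional[Claim]:
    """Ask the model for a claim; bounce it back until it matches the data."""
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        claim = propose(feedback)
        if validate(claim, rows):
            return claim  # only validated claims reach the user
        feedback = f"rejected: {claim.aggregate}({claim.column}) is not {claim.value}"
    return None  # nothing validated, so nothing is shown

def make_stub_model() -> Callable[[Optional[str]], Claim]:
    """Stand-in for an LLM: wrong on the first attempt, right on the second."""
    attempts = iter([
        Claim("revenue", "sum", 310.0),
        Claim("revenue", "sum", 300.0),
    ])
    return lambda feedback: next(attempts)

if __name__ == "__main__":
    print(answer_with_harness(make_stub_model(), ROWS))
```

The design point the sketch tries to capture is that the check is ordinary deterministic code with a ground truth to compare against, so an unvalidated answer is never returned; a real system would need a far richer claim format than a single aggregate.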
The tool runs locally on the open-source database DuckDB, and the company states that the model only sees metadata and statistics, never raw data, which remains on the user's machine [2]. This adds a privacy pitch to the cost benefits. Elias envisions extending the same engine to fields like accounting and medical services, "any precision-sensitive use case" where confident wrong answers pose serious risks [1]. Researchers have repeatedly warned about AI hallucinations in science and critical applications, making this approach particularly relevant [2].

Elias offers a provocative explanation for why big AI labs haven't attempted this strategy: "They're incentivized not to, because they make money the more times you have to correct the model" [1]. While major labs do invest resources in cutting AI hallucinations, the observation highlights a potential misalignment between business models and user needs. The approach does have limitations: a validator only works when there's hard ground truth to check against, like a dataset, which is why Probably started with data rather than open-ended writing [2]. The product is currently in public preview at version 0.1, and the 99.99% accuracy figure remains a goal rather than a proven result. Still, in a market crowded with attempts to tame hallucinations, betting on smaller models wrapped in rigorous validation represents a distinct strategy that attracted backing from Andreessen Horowitz and Accel.

Summarized by Navi