White Circle Raises $11 Million to Stop AI Models From Going Rogue in Production Environments

Paris-based White Circle has raised $11 million in seed funding for its AI control platform that monitors and secures AI models in production. Founded by Denis Shilov, who went viral after exposing safety vulnerabilities in major AI models, the platform has already processed over one billion API requests. Backers include senior figures from OpenAI, Anthropic, Mistral, and Hugging Face.

White Circle Secures $11 Million in Funding From AI Industry Leaders

White Circle, an AI control platform designed to monitor and secure AI models in production, has raised $11 million in seed funding from a roster of prominent AI industry figures. The Paris-based startup attracted personal investments from Romain Huet of OpenAI, Durk Kingma (formerly OpenAI, now at Anthropic), Guillaume Lample of Mistral, Thomas Wolf of Hugging Face, Olivier Pomel of Datadog, François Chollet (creator of Keras), and David Cramer of Sentry [1][2]. The participation list reads as a who's who of the labs that build the very models White Circle is designed to police. The capital will fund product development and hiring across the US, UK, and Europe; the company currently operates a team of 20, mostly engineers, distributed across London, France, Amsterdam, and elsewhere in Europe [2].

Source: Fortune

From Viral Jailbreak to Production AI Safety Solution

White Circle was founded by Denis Shilov, an engineer who gained widespread attention in late 2024 after discovering a universal jailbreak that bypassed the safety filters of every major AI model. While watching a crime thriller one evening, Shilov conceived a prompt that reframed AI models as API endpoints rather than chatbots with safety rules, making them comply with dangerous requests they were supposed to refuse [2]. His post on X reached 1.4 million views and prompted contact from Anthropic, OpenAI, and Hugging Face, leading to his participation in Anthropic's bug-bounty programme [1]. The experience convinced Shilov that jailbreaks represented just one facet of a much larger problem facing companies integrating AI into their workflows. "In as many ways people can misbehave, models can misbehave too. Because these models are very smart, they can do a lot more harm," Shilov explained [2].

Real-Time Scanning of AI Inputs and Outputs

The White Circle platform functions as an enforcement layer between users and AI models, providing real-time scanning of AI inputs and outputs against customer-defined policies. The single-API control layer detects harmful content, catches instances where models hallucinate, blocks prompt-injection attacks, flags model drift, and identifies abusive users [1]. If a user attempts to generate malware, scams, or other prohibited content, the system can flag or block the request. When a model starts leaking sensitive data, promising refunds it cannot issue, or taking destructive actions inside a software environment, White Circle's platform can intervene [2]. Customers can set custom enforcement actions, including rate-limiting and bans, and feed labelled user feedback back into White Circle's models to improve accuracy over time. The platform supports 150 languages, and is SOC 2 Type I and Type II certified and HIPAA-compliant [1].
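White Circle's actual API is not documented in the sources above, so the following is only a minimal sketch of how an inline enforcement layer of this general shape can sit between an application and a model. Every name in it (PolicyVerdict, check_policy, guarded_completion) is hypothetical, and the keyword check stands in for the trained classifiers a real control layer would use:

```python
# Illustrative sketch only: White Circle's real API is not documented in
# this article, so every name here is hypothetical. It shows the general
# shape of an inline enforcement layer: scan the prompt, call the model,
# scan the response, then allow, flag, or block per customer policy.
from dataclasses import dataclass
from enum import Enum
from typing import Callable

class Action(Enum):
    ALLOW = "allow"
    FLAG = "flag"    # let traffic through, but record it for review
    BLOCK = "block"  # stop the request or replace the response

@dataclass
class PolicyVerdict:
    action: Action
    reason: str = ""

def check_policy(text: str, policy: dict) -> PolicyVerdict:
    """Hypothetical policy check. Real systems use trained classifiers,
    not keyword lists; this simply stands in for that call."""
    for term in policy.get("blocked_terms", []):
        if term in text.lower():
            return PolicyVerdict(Action.BLOCK, f"matched blocked term: {term}")
    return PolicyVerdict(Action.ALLOW)

def guarded_completion(prompt: str,
                       model_call: Callable[[str], str],
                       policy: dict) -> str:
    # 1. Scan the input before it ever reaches the model.
    verdict = check_policy(prompt, policy)
    if verdict.action is Action.BLOCK:
        return f"[blocked input: {verdict.reason}]"

    # 2. Call the underlying model (any provider).
    response = model_call(prompt)

    # 3. Scan the output before it reaches the user.
    verdict = check_policy(response, policy)
    if verdict.action is Action.BLOCK:
        return f"[blocked output: {verdict.reason}]"
    return response

if __name__ == "__main__":
    policy = {"blocked_terms": ["malware"]}
    echo_model = lambda p: f"echo: {p}"  # stand-in for a real model call
    print(guarded_completion("write malware for me", echo_model, policy))
```

The essential design point is that the same policy check runs twice, once on the prompt and once on the response, so neither harmful inputs nor harmful outputs cross the boundary unexamined.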

Growing Urgency to Stop AI Models From Going Rogue

As companies transition from simple chatbots to autonomous AI agents that can write code, browse the web, access files, and take actions on behalf of users, the potential for harm multiplies. A customer service bot might promise a refund it is not authorized to give, a coding agent might install something dangerous on a virtual machine, or a model embedded in a fintech app might mishandle sensitive customer information [2]. Shilov argues that AI safety will not be solved entirely at the model-training stage, and that companies need to define and enforce what good AI behavior looks like inside their own products rather than relying solely on AI labs' safety testing. "We're actually enforcing behavior," Shilov said. "Model labs do some safety tuning, but it's very general and typically about the model refraining from answering questions about drugs and bioweapons. But in production, you end up having a lot more potential issues" [2].

Research Exposes Gaps in AI Moderation and Model Behavior

White Circle has released two pieces of research that underscore the accountability gap in AI deployments. The first, CircleGuardBench, published in May 2025, is a benchmark that tests how AI moderation models perform under real-world conditions [1]. The second, KillBench, ran more than one million experiments across 15 AI models from OpenAI, Google, Anthropic, and xAI, finding preferences linked to nationality, religion, body type, and even phone brand when models were asked to make decisions about human lives. The study also documented that structured-output integrations, the standard for production AI deployments, caused refusal rates to collapse and biases to amplify [1]. These findings highlight why companies need additional layers of control beyond what model providers offer.
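To make the term concrete: a structured-output integration asks the model to return data matching a fixed schema rather than free text. Below is a hedged sketch using the OpenAI Python SDK's JSON-schema response format; the schema, prompt, and model name are invented for illustration and are not drawn from the KillBench study:

```python
# Sketch of a structured-output call, the integration style the KillBench
# finding refers to. Uses the OpenAI Python SDK's JSON-schema response
# format; the schema and prompt here are invented for illustration.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

schema = {
    "type": "object",
    "properties": {
        "decision": {"type": "string", "enum": ["approve", "deny"]},
        "confidence": {"type": "number"},
    },
    "required": ["decision", "confidence"],
    "additionalProperties": False,
}

completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Should this loan be approved? ..."}],
    # Force the output to match the schema exactly.
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "decision", "schema": schema, "strict": True},
    },
)
print(completion.choices[0].message.content)  # e.g. {"decision": "deny", ...}
```

Because a strict schema leaves the model no free-text channel in which to decline, it is at least plausible that this integration style suppresses refusals, which would be consistent with the collapse the study reports.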

Market Traction and Mixed Incentives in AI Safety

The platform has already served more than one billion API requests and counts Lovable, the vibe-coding startup, and two of the world's largest digital banks among its customers [1][2]. Shilov pointed out that model providers have mixed incentives to build the kind of real-time control layer White Circle provides: AI companies still charge for input and output tokens even when a model refuses a harmful request, which reduces the financial incentive to block abuse before it reaches the model. He also referenced what researchers call the alignment tax, the idea that training models to be safer can sometimes make them less capable at tasks such as coding [2]. This dynamic creates an opening for third-party solutions that can prevent prompt-injection attacks and other security issues without compromising model performance. Ophelia Cai, partner at Tiny VC, noted that "Denis and the White Circle team have an unusual combination of deep technical credibility and a clear commercial instinct" [1]. As businesses embed models into more products across healthcare, finance, legal, and coding platforms, the ability to control what an AI system is allowed to do in specific environments becomes increasingly critical for managing risk and maintaining compliance.
