White Circle Raises $11 Million to Stop AI Models From Going Rogue in Production Environments

Paris-based White Circle has raised $11 million in seed funding for its AI control platform that monitors and secures AI models in production. Founded by Denis Shilov, who went viral after exposing safety vulnerabilities in major AI models, the platform has already processed over one billion API requests. Backers include senior figures from OpenAI, Anthropic, Mistral, and Hugging Face.

White Circle Secures $11 Million in Funding From AI Industry Leaders

White Circle, an AI control platform designed to monitor and secure AI models in production, has raised $11 million in seed funding from a roster of prominent AI industry figures. The Paris-based startup attracted personal investments from Romain Huet of OpenAI, Durk Kingma (formerly OpenAI, now at Anthropic), Guillaume Lample of Mistral, Thomas Wolf of Hugging Face, Olivier Pomel of Datadog, François Chollet (creator of Keras), and David Cramer of Sentry [1][2]. The participation list reads as a who's who of the labs that build the very models White Circle is designed to police. The capital will fund product development and hiring across the US, UK, and Europe; the company currently operates a team of 20, mostly engineers, distributed across London, France, Amsterdam, and elsewhere in Europe [2].

Source: Fortune

From Viral Jailbreak to Production AI Safety Solution

White Circle was founded by Denis Shilov, an engineer who gained widespread attention in late 2024 after discovering a universal jailbreak that bypassed the safety filters of every major AI model. While watching a crime thriller one evening, Shilov conceived a prompt that reframed AI models as API endpoints rather than chatbots with safety rules, making them comply with dangerous requests they were supposed to refuse [2]. His post on X reached 1.4 million views and prompted contact from Anthropic, OpenAI, and Hugging Face, leading to his participation in Anthropic's bug-bounty programme [1]. The experience convinced Shilov that jailbreaks represented just one facet of a much larger problem facing companies integrating AI into their workflows. "In as many ways people can misbehave, models can misbehave too. Because these models are very smart, they can do a lot more harm," Shilov explained [2].

Real-Time Scanning of AI Inputs and Outputs

The White Circle platform functions as an enforcement layer between users and AI models, providing real-time scanning of AI inputs and outputs against customer-defined policies. The single-API control layer detects harmful content, catches instances where models hallucinate, blocks prompt-injection attacks, flags model drift, and identifies abusive users [1]. If a user attempts to generate malware, scams, or other prohibited content, the system can flag or block the request. When a model starts leaking sensitive data, promising refunds it cannot issue, or taking destructive actions inside a software environment, White Circle's platform can intervene [2]. Customers can set custom enforcement actions, including rate-limiting and bans, and feed labelled user feedback back into White Circle's models to improve accuracy over time. The platform supports 150 languages, and is SOC 2 Type I and Type II certified and HIPAA-compliant [1].
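White Circle's actual API is not documented in the sources above, so the following is only a minimal sketch of how an inline enforcement layer of this general shape can sit between an application and a model. Every name in it (PolicyVerdict, check_policy, guarded_completion) is hypothetical, and the keyword check stands in for the trained classifiers a real control layer would use:

```python
# Illustrative sketch only: White Circle's real API is not documented in
# this article, so every name here is hypothetical. It shows the general
# shape of an inline enforcement layer: scan the prompt, call the model,
# scan the response, then allow, flag, or block per customer policy.
from dataclasses import dataclass
from enum import Enum
from typing import Callable

class Action(Enum):
    ALLOW = "allow"
    FLAG = "flag"    # let traffic through, but record it for review
    BLOCK = "block"  # stop the request or replace the response

@dataclass
class PolicyVerdict:
    action: Action
    reason: str = ""

def check_policy(text: str, policy: dict) -> PolicyVerdict:
    """Hypothetical policy check. Real systems use trained classifiers,
    not keyword lists; this simply stands in for that call."""
    for term in policy.get("blocked_terms", []):
        if term in text.lower():
            return PolicyVerdict(Action.BLOCK, f"matched blocked term: {term}")
    return PolicyVerdict(Action.ALLOW)

def guarded_completion(prompt: str,
                       model_call: Callable[[str], str],
                       policy: dict) -> str:
    # 1. Scan the input before it ever reaches the model.
    verdict = check_policy(prompt, policy)
    if verdict.action is Action.BLOCK:
        return f"[blocked input: {verdict.reason}]"

    # 2. Call the underlying model (any provider).
    response = model_call(prompt)

    # 3. Scan the output before it reaches the user.
    verdict = check_policy(response, policy)
    if verdict.action is Action.BLOCK:
        return f"[blocked output: {verdict.reason}]"
    return response

if __name__ == "__main__":
    policy = {"blocked_terms": ["malware"]}
    echo_model = lambda p: f"echo: {p}"  # stand-in for a real model call
    print(guarded_completion("write malware for me", echo_model, policy))
```

The essential design point is that the same policy check runs twice, once on the prompt and once on the response, so neither harmful inputs nor harmful outputs cross the boundary unexamined.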

Growing Urgency to Stop AI Models From Going Rogue

As companies transition from simple chatbots to autonomous AI agents that can write code, browse the web, access files, and take actions on behalf of users, the potential for harm multiplies. A customer service bot might promise a refund it is not authorized to give, a coding agent might install something dangerous on a virtual machine, or a model embedded in a fintech app might mishandle sensitive customer information [2]. Shilov argues that AI safety will not be solved entirely at the model-training stage, and that companies need to define and enforce what good AI behavior looks like inside their own products rather than relying solely on AI labs' safety testing. "We're actually enforcing behavior," Shilov said. "Model labs do some safety tuning, but it's very general and typically about the model refraining from answering questions about drugs and bioweapons. But in production, you end up having a lot more potential issues" [2].

Research Exposes Gaps in AI Moderation and Model Behavior

White Circle has released two pieces of research that underscore the accountability gap in AI deployments. The first, CircleGuardBench, published in May 2025, is a benchmark that tests how AI moderation models perform under real-world conditions [1]. The second, KillBench, ran more than one million experiments across 15 AI models from OpenAI, Google, Anthropic, and xAI, finding preferences linked to nationality, religion, body type, and even phone brand when models were asked to make decisions about human lives. The study also documented that structured-output integrations, the standard for production AI deployments, caused refusal rates to collapse and biases to amplify [1]. These findings highlight why companies need additional layers of control beyond what model providers offer.
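To make the term concrete: a structured-output integration asks the model to return data matching a fixed schema rather than free text. Below is a hedged sketch using the OpenAI Python SDK's JSON-schema response format; the schema, prompt, and model name are invented for illustration and are not drawn from the KillBench study:

```python
# Sketch of a structured-output call, the integration style the KillBench
# finding refers to. Uses the OpenAI Python SDK's JSON-schema response
# format; the schema and prompt here are invented for illustration.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

schema = {
    "type": "object",
    "properties": {
        "decision": {"type": "string", "enum": ["approve", "deny"]},
        "confidence": {"type": "number"},
    },
    "required": ["decision", "confidence"],
    "additionalProperties": False,
}

completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Should this loan be approved? ..."}],
    # Force the output to match the schema exactly.
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "decision", "schema": schema, "strict": True},
    },
)
print(completion.choices[0].message.content)  # e.g. {"decision": "deny", ...}
```

Because a strict schema leaves the model no free-text channel in which to decline, it is at least plausible that this integration style suppresses refusals, which would be consistent with the collapse the study reports.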

Market Traction and Mixed Incentives in AI Safety

The platform has already served more than one billion API requests and counts Lovable, the vibe-coding startup, and two of the world's largest digital banks among its customers [1][2]. Shilov pointed out that model providers have mixed incentives to build the kind of real-time control layer White Circle provides: AI companies still charge for input and output tokens even when a model refuses a harmful request, which reduces the financial incentive to block abuse before it reaches the model. He also referenced what researchers call the alignment tax, the idea that training models to be safer can sometimes make them less capable at tasks such as coding [2]. This dynamic creates an opening for third-party solutions that can prevent prompt-injection attacks and other security issues without compromising model performance. Ophelia Cai, partner at Tiny VC, noted that "Denis and the White Circle team have an unusual combination of deep technical credibility and a clear commercial instinct" [1]. As businesses embed models into more products across healthcare, finance, legal, and coding platforms, the ability to control what an AI system is allowed to do in specific environments becomes increasingly critical for managing risk and maintaining compliance.
