2 Sources
[1]
How AI could eat itself: Using LLMs to distill rivals
Two of the world's biggest AI companies, Google and OpenAI, both warned this week that competitors, including China's DeepSeek, are probing their models to steal the underlying reasoning and then copy those capabilities into their own AI systems.

"This is coming from threat actors throughout the globe," Google Threat Intelligence Group chief analyst John Hultquist told The Register, adding that the perpetrators are "private-sector companies." He declined to name specific companies or countries involved in this type of intellectual property theft.

"Your model is really valuable IP, and if you can distill the logic behind it, there's very real potential that you can replicate that technology - which is not inexpensive," Hultquist said. "This is such an important technology, and the list of interested parties in replicating it are endless."

Google calls this process of using prompts to clone its models "distillation attacks," and in a Thursday report said one campaign used more than 100,000 prompts to "try to replicate Gemini's reasoning ability in non-English target languages across a wide variety of tasks."

American tech giants have spent billions of dollars training and developing their own LLMs. Abusing legitimate access to mature models like Gemini, and then using the outputs to train newer models, makes it significantly cheaper and easier for competitors to develop their own AI chatbots and systems. Google says it detected this probe in real time and protected its internal reasoning traces. However, distillation appears to be yet another AI risk that is extremely difficult - if not impossible - to eliminate.

Distillation from Gemini models without permission violates Google's terms of service, and Google can block accounts that do this, or even take users to court. While the company says it continues to develop better ways to detect and stop these attempts, the very nature of LLMs makes them susceptible: public-facing AI models are widely accessible, and enforcement against abusive accounts can turn into a game of whack-a-mole.

Plus, as Hultquist warned, as other companies develop their own models and train them on internal, sensitive data, the risk from distillation attacks is going to spread. "We're on the frontier when it comes to this, but as more organizations have models that they provide access to, it's inevitable," he said. "As this technology is adopted and developed by businesses like financial institutions, their intellectual property could also be targeted in this way."

Meanwhile, OpenAI, in a Thursday memo [PDF] to the House Select Committee on China, blamed DeepSeek and other Chinese LLM providers and universities for copying ChatGPT and other US firms' frontier models. It also noted some occasional activity from Russia, and warned that illicit model distillation poses a risk to "American-led, democratic AI."

China's distillation methods have become more sophisticated over the last year, moving beyond chain-of-thought (CoT) extraction to multi-stage operations that include synthetic-data generation, large-scale data cleaning, and other stealthy methods.

OpenAI says it has invested in stronger detections to prevent unauthorized distillation. It bans accounts that violate its terms of service and proactively removes users who appear to be attempting to distill its models.
Still, the company admits that it alone can't solve the model distillation problem. It's going to take an "ecosystem security" approach to protect against distillation, and this will require some US government assistance, OpenAI says. "It is not enough for any one lab to harden its protection because adversaries will simply default to the least protected provider," according to the memo. The AI company also suggests that US government policy "may be helpful" when it comes to sharing information and intelligence, and working with the industry to develop best practices on distillation defenses. OpenAI also called on Congress to close API router loopholes that allow DeepSeek and other competitors to access US models, and to restrict "adversary" access to US compute and cloud infrastructure. ®
[2]
Google Says People Are Copying Its AI Without Its Permission, Much Like It Scraped Everybody's Data Without Asking to Create Its AI in the First Place
Google has relied on a tremendous amount of material, used without permission, to train its Gemini AI models. The company, alongside many of its competitors in the AI space, has been indiscriminately scraping the internet for content without compensating rightsholders, racking up many copyright infringement lawsuits along the way.

But when it comes to its own tech being copied, Google has no problem pointing fingers. This week, the company accused "commercially motivated" actors of trying to clone its Gemini AI. In a Thursday report, Google complained it had come under "distillation attacks," with agents querying Gemini more than 100,000 times to "extract" the underlying model -- the convoluted AI industry equivalent of copying somebody's homework, basically.

Google called the attacks a "method of intellectual property theft that violates Google's terms of service" -- which, let's face it, is a glaring double standard given its callous approach to scraping other IP without remuneration. Google remained vague on who it identified as the culprits, beyond pointing to "private sector entities" and "researchers seeking to clone proprietary logic."

The stakes are high, as companies continue to pour tens of billions of dollars into AI infrastructure to make models more powerful. It's no wonder Google is scared of losing its competitive edge as offerings start to converge at the head of the pack. The output of one pioneering model has become almost indistinguishable from another's, forcing companies to try to differentiate their products.

It's far from the first time the subject of model distillation has caused drama. Chinese startup DeepSeek rattled Silicon Valley to its core in early 2025 after showing off a far cheaper and more efficient AI model. At the time, OpenAI suggested DeepSeek may have broken its terms of service by distilling its AI models. The ChatGPT maker quickly became the subject of widespread mockery following the comments, with netizens accusing the company of hypocrisy and pointing out that OpenAI itself had indiscriminately ripped off other people's work for years.

Google's latest troubles likely won't be the last time we hear about smaller actors trying to extract mainstream AI models through distillation. Google Threat Intelligence Group chief analyst John Hultquist told NBC News that "we're going to be the canary in the coal mine for far more incidents." Whether AI companies will be able to defend themselves in the coming months and years remains uncertain; they are significantly exposed since their models are available for public use.

"Historically, adversaries seeking to steal high-tech capabilities used conventional computer-enabled intrusion operations to compromise organizations and steal data containing trade secrets," Google's report reads. "For many AI technologies where LLMs are offered as services, this approach is no longer required; actors can use legitimate API access to attempt to 'clone' select AI model capabilities."

Google outlined one case study after finding that attackers were using "over 100,000 prompts," suggesting an "attempt to replicate Gemini's reasoning ability in non-English target languages across a wide variety of tasks." However, the company's systems "recognized this attack in real time and lowered the risk of this particular attack."

It's a particularly vulnerable point in time, as AI companies are desperately trying to find a way of monetizing the tech through a variety of revenue drivers, from pricey subscription models to ads.
With far lower upfront costs, it's entirely possible that much smaller entities could break through, not unlike what we saw with DeepSeek in early 2025.
Google and OpenAI revealed that competitors are using distillation attacks to clone their AI models through legitimate access. One campaign used over 100,000 prompts to extract Gemini's reasoning capabilities. Both companies warn this intellectual property theft poses risks to the AI industry, though critics note the irony given their own data scraping practices.
Both Google and OpenAI issued warnings this week that competitors, including Chinese LLM providers like DeepSeek, are actively probing large language models to extract underlying reasoning capabilities and replicate them in their own systems. Google calls this practice "distillation attacks," describing it as a form of intellectual property theft that violates its terms of service. John Hultquist, chief analyst at Google Threat Intelligence Group, told The Register that threat actors from private-sector companies across the globe are targeting valuable model IP [1]. "Your model is really valuable IP, and if you can distill the logic behind it, there's very real potential that you can replicate that technology - which is not inexpensive," Hultquist explained.
Google detected one campaign that used over 100,000 prompts attempting to replicate Gemini's reasoning ability in non-English languages across various tasks [1]. The company's systems recognized this attack in real time and protected its internal reasoning traces [2]. This method of model distillation exploits legitimate API access to public-facing LLMs, making it significantly cheaper for competitors to develop their own chatbots without spending billions on training. As Google's report notes, adversaries no longer need conventional computer intrusion to steal trade secrets; they can simply use legitimate service access to clone AI model capabilities [2].
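To make the mechanism concrete: a distillation campaign of this kind is, at its core, bulk harvesting of a teacher model's outputs through an ordinary API key. Below is a minimal Python sketch of the general idea; the endpoint URL, API key, model name, response schema, and function names are all invented placeholders, not any vendor's actual API.

```python
# Hypothetical sketch of "distillation via legitimate API access".
# Everything below (endpoint, key, model name, response fields) is illustrative.
import json
import requests

TEACHER_API = "https://api.example-llm.com/v1/chat"  # placeholder endpoint
API_KEY = "sk-..."  # a legitimately issued key -- the whole point of the attack

def query_teacher(prompt: str) -> str:
    """Send one prompt to the public-facing 'teacher' model, return its answer."""
    resp = requests.post(
        TEACHER_API,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": "teacher-large",
            "messages": [{"role": "user", "content": prompt}],
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

def build_distillation_set(prompts, out_path):
    """Harvest (prompt, answer) pairs into a JSONL file that can later serve
    as supervised fine-tuning data for a cheaper 'student' model."""
    with open(out_path, "w", encoding="utf-8") as f:
        for prompt in prompts:
            pair = {"prompt": prompt, "completion": query_teacher(prompt)}
            f.write(json.dumps(pair, ensure_ascii=False) + "\n")

# A real campaign would loop over ~100,000 prompts spanning many tasks and
# languages -- exactly the high-volume, broad-coverage pattern defenders watch for.
```

The harvested JSONL file then becomes supervised fine-tuning data for a smaller "student" model, which is why distillation is so much cheaper than training a frontier model from scratch.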
OpenAI, in a Thursday memo to the House Select Committee on China, specifically blamed DeepSeek and other Chinese LLM providers and universities for copying ChatGPT and frontier models from U.S. firms [1]. The company noted that China's distillation methods have grown more sophisticated over the past year, evolving beyond chain-of-thought extraction to multi-stage operations involving synthetic-data generation and large-scale data cleaning. OpenAI warned that illicit model distillation poses a risk to "American-led, democratic AI" and called for U.S. government intervention to address adversary access to AI infrastructure [1].
Both companies acknowledge that individual labs cannot solve this problem alone. OpenAI argues that AI ecosystem security requires an industry-wide approach, stating that "it is not enough for any one lab to harden its protection because adversaries will simply default to the least protected provider" [1]. The company suggests U.S. government policy could help by sharing intelligence, developing best practices on distillation defenses, closing API router loopholes, and restricting adversary access to U.S. compute and cloud infrastructure. Hultquist warned that as more organizations develop models trained on internal, sensitive data, the risk spreads beyond tech giants to financial institutions and other businesses [1].
The complaints have sparked criticism given that both Google and OpenAI built their AI models by scraping vast amounts of internet content without permission or compensation, facing numerous copyright infringement lawsuits in the process [2]. Critics point out the double standard: while these companies characterize distillation as intellectual property theft, they have shown little regard for others' IP rights. The AI industry now faces a vulnerability that may be impossible to eliminate: public-facing models remain widely accessible, and enforcement against abusive accounts becomes a game of whack-a-mole [1]. Google can ban accounts for terms-of-service violations or pursue legal action, but the fundamental nature of LLMs makes them susceptible to probing. As smaller entities potentially break through with lower upfront costs, much as DeepSeek did in early 2025, the stakes for protecting proprietary reasoning capabilities continue to rise [2].
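Why is enforcement a game of whack-a-mole? Detection ultimately rests on usage-pattern heuristics over API logs, and determined actors can spread traffic across accounts. As a rough, hypothetical illustration (the thresholds, field names, and flagging logic here are all invented for the example, not Google's or OpenAI's actual systems), a defender might flag accounts whose query volume and task breadth look more like harvesting than normal use:

```python
# Toy heuristic, loosely inspired by the "100,000+ prompts across many tasks
# and languages" pattern described above. Every threshold is a placeholder.
from collections import defaultdict

VOLUME_THRESHOLD = 10_000   # prompts per account per day (assumed)
BREADTH_THRESHOLD = 50      # distinct task/language buckets touched (assumed)

def flag_distillation_suspects(query_log):
    """query_log: iterable of (account_id, task_label) pairs from an API log.
    Returns account IDs whose usage looks like bulk capability harvesting."""
    volume = defaultdict(int)
    buckets = defaultdict(set)
    for account, task in query_log:
        volume[account] += 1
        buckets[account].add(task)
    return [
        acct for acct, n in volume.items()
        if n > VOLUME_THRESHOLD and len(buckets[acct]) > BREADTH_THRESHOLD
    ]
```

An attacker who splits the same 100,000 prompts across a thousand throwaway accounts slips under both thresholds, which is precisely the whack-a-mole dynamic both companies describe.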
Summarized by Navi