2 Sources
[1]
AI companies race to use 'distillation' to produce cheaper models
San Francisco/London | Leading artificial intelligence firms including OpenAI, Microsoft and Meta are turning to a process called "distillation" in the global race to create AI models that are cheaper for consumers and businesses to adopt. The technique caught widespread attention after China's DeepSeek used it to build powerful and efficient AI models based on open-source systems released by competitors Meta and Alibaba.
[2]
AI firms follow DeepSeek's lead, create cheaper models with "distillation"
Leading artificial intelligence firms including OpenAI, Microsoft, and Meta are turning to a process called "distillation" in the global race to create AI models that are cheaper for consumers and businesses to adopt. The technique caught widespread attention after China's DeepSeek used it to build powerful and efficient AI models based on open-source systems released by competitors Meta and Alibaba. The breakthrough rocked confidence in Silicon Valley's AI leadership, leading Wall Street investors to wipe billions of dollars of value from US Big Tech stocks.

Through distillation, companies take a large language model -- dubbed a "teacher" model -- which generates the next likely word in a sentence. The teacher model generates data which then trains a smaller "student" model, helping to quickly transfer the knowledge and predictions of the bigger model to the smaller one. While distillation has been widely used for years, recent advances have led industry experts to believe the process will increasingly be a boon for start-ups seeking cost-effective ways to build applications based on the technology.

"Distillation is quite magical," said Olivier Godement, head of product for OpenAI's platform. "It's the process of essentially taking a very large smart frontier model and using that model to teach a smaller model . . . very capable in specific tasks that is super cheap and super fast to execute."

Large language models such as OpenAI's GPT-4, Google's Gemini, and Meta's Llama require massive amounts of data and computing power to develop and maintain. While the companies have not revealed precise figures for how much it costs to train large models, it is likely to be hundreds of millions of dollars. Thanks to distillation, developers and businesses can access these models' capabilities at a fraction of the price, allowing app developers to run AI models quickly on devices such as laptops and smartphones.

Developers can use OpenAI's platform for distillation, learning from the large language models that underpin products like ChatGPT. OpenAI's largest backer, Microsoft, used GPT-4 to distill its Phi family of small language models as part of a commercial partnership after investing nearly $14 billion in the company. However, the San Francisco-based start-up has said it believes DeepSeek distilled OpenAI's models to train a competitor, a move that would be against its terms of service. DeepSeek has not commented on the claims.

While distillation can be used to create high-performing models, experts add that they are more limited. "Distillation presents an interesting trade-off; if you make the models smaller, you inevitably reduce their capability," said Ahmed Awadallah of Microsoft Research, who said a distilled model can be designed to be very good at summarising emails, for example, "but it really would not be good at anything else."

David Cox, vice-president for AI models at IBM Research, said most businesses do not need a massive model to run their products, and distilled ones are powerful enough for purposes such as customer service chatbots or running on smaller devices like phones. "Any time you can [make it less expensive] and it gives you the right performance you want, there is very little reason not to do it," he added.

That presents a challenge to the business models of many leading AI firms. Even when developers build distilled models from companies like OpenAI, those models cost far less to create and to run, and therefore generate less revenue.
Model-makers like OpenAI often charge less for the use of distilled models, as they require less computational load. Yet OpenAI's Godement argued that large language models will still be required for "high intelligence and high stakes tasks" where "businesses are willing to pay more for a high level of accuracy and reliability." He added that large models will also be needed to discover new capabilities that can then be distilled into smaller ones.

Still, the company aims to prevent its large models from being distilled to train a competitor. OpenAI has teams monitoring usage and can remove access from users it suspects are generating vast amounts of data to export and train a rival, as it has apparently done with accounts it believes were linked to DeepSeek. Yet much of this action happens retroactively. "OpenAI has been trying to protect against distillation for a long time, but it is very hard to avoid it altogether," said Douwe Kiela, chief executive of Contextual AI, a start-up building information retrieval tools for enterprises.

Distillation is also a victory for advocates of open models, where the technology is made freely available for developers to build upon. DeepSeek has also made its recent models open for developers. "We're going to use [distillation] and put it in our products right away," said Yann LeCun, Meta's chief AI scientist. "That's the whole idea of open source. You profit from everyone and everyone else's progress as long as those processes are open."

Distillation also means that model-makers can spend billions of dollars to advance the capabilities of AI systems but still face competitors that often catch up quickly, as DeepSeek's recent releases demonstrate. This raises questions about the first-mover advantage in building LLMs when their capabilities can be replicated in a matter of months. "In a world where things are moving so fast . . . you could actually spend a lot of money, doing it the hard way, and then the rest of the field is right on your heels," IBM's Cox said. "So it is an interesting and tricky business landscape."

Additional reporting by Michael Acton in San Francisco. © 2025 The Financial Times Ltd. All rights reserved. Not to be redistributed, copied, or modified in any way.
Leading AI companies are adopting 'distillation' techniques to create more cost-effective AI models, following the success of China's DeepSeek. This shift could democratize AI access but challenges existing business models.
In a significant shift in the artificial intelligence landscape, leading AI companies including OpenAI, Microsoft, and Meta are turning to a process called "distillation" to create more affordable and efficient AI models. This move comes in response to the success of China's DeepSeek, which used the technique to build powerful models based on open-source systems from Meta and Alibaba [1].
Distillation involves using a large language model, termed the "teacher" model, to generate data that trains a smaller "student" model. This process effectively transfers knowledge and predictions from the larger model to the smaller one, resulting in more cost-effective and faster-to-execute AI systems [2].
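To make the teacher-student relationship concrete, here is a minimal sketch of one common form of distillation, soft-label matching, assuming PyTorch. The teacher, student, and distill_step names and the [batch, seq, vocab] logit shape are illustrative assumptions, not details disclosed by any of the companies above.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften both distributions with a temperature, then measure how far
    # the student's next-token distribution is from the teacher's.
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence; the T^2 factor keeps gradients on a scale comparable
    # to a standard cross-entropy loss (the classic soft-label recipe).
    return F.kl_div(log_student, soft_targets, reduction="batchmean") * temperature ** 2

def distill_step(teacher, student, optimizer, input_ids):
    with torch.no_grad():                    # the teacher is frozen
        teacher_logits = teacher(input_ids)  # assumed shape: [batch, seq, vocab]
    student_logits = student(input_ids)
    loss = distillation_loss(student_logits, teacher_logits)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Labs can also distill by having the teacher generate text outright and fine-tuning the student on that synthetic output, which is closer to the data-generation process described in the article.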
Olivier Godement, head of product for OpenAI's platform, describes distillation as "quite magical," enabling the creation of smaller models that are "super cheap and super fast to execute" while remaining capable of specific tasks [2].
The adoption of distillation techniques could significantly reduce the costs associated with AI model development and deployment. Large language models like GPT-4, Google's Gemini, and Meta's Llama require massive amounts of data and computing power, with estimated development costs in the hundreds of millions of dollars [2].
Distillation allows developers and businesses to access these models' capabilities at a fraction of the price, enabling AI applications to run quickly on devices such as laptops and smartphones. This democratization of AI technology could lead to more widespread adoption and innovation across various sectors [2].
While distillation offers numerous benefits, it also presents challenges:
Limited Capabilities: Distilled models, while high-performing, are more limited in their abilities compared to their larger counterparts. Ahmed Awadallah of Microsoft Research notes that a distilled model might excel at specific tasks like summarizing emails but would struggle with broader applications [2].
Business Model Disruption: The rise of cheaper, distilled models challenges the revenue streams of leading AI firms. Companies like OpenAI may need to adapt their pricing strategies, as distilled models require less computational power and are less expensive to create and run [2].
Intellectual Property Concerns: OpenAI has raised concerns about the potential misuse of distillation, claiming that DeepSeek may have used its models to train a competitor, violating its terms of service [2].
Major players in the AI industry are adapting to this new landscape.
The rise of distillation techniques is reshaping the AI landscape, potentially leveling the playing field between tech giants and smaller players. As David Cox from IBM Research notes, this rapid advancement raises questions about the first-mover advantage in building large language models when their capabilities can be replicated quickly [2].
As the AI industry continues to evolve, the balance between innovation, accessibility, and protecting intellectual property will remain a critical challenge for companies and policymakers alike.
Summarized by Navi