3 Sources
[1]
Apple reportedly trying to distill Google's multi-trillion-parameter Gemini AI to run on iPhone
It's impossible to totally avoid generative AI when interacting with technology anymore, but Apple has a bit less of it. That's not entirely by choice, though. The iPhone maker has delayed the AI-enhanced Siri multiple times since first promising it in 2024, but a deal with Google will merge the iconic assistant with Gemini later this year. As we approach WWDC, Apple has been working to bring big AI smarts to the modest processing environment of a smartphone. Apple fans may not like the outcome, though. Apple has long crowed about the privacy value of running AI locally, but a new report suggests that despite Apple's best efforts, the iPhone's Gemini makeover will lean heavily on Google and Nvidia in the cloud. The Information reports that Apple's Gemini-infused Siri will run both on-device and in the cloud, an apparent reversal of its privacy-focused preference for local AI. With every new chip announcement, we hear about how the silicon has been optimized for AI -- even Apple does this with its focus on Neural Engine upgrades. You may think from the grandiose language that smartphones are equipped to handle beefy AI models, but that's not necessarily the case. In fact, the GPUs in most phones can process more AI tokens than the AI-focused NPUs. Components like Apple's Neural Engine are designed for contextual, efficient AI processing. Even if phones had faster AI processing, they lack the RAM to keep enormous models in memory. Even the largest AI models are still middling assistants, and that makes local AI very challenging. The AI models that run on phones are physically smaller, featuring at most a few billion parameters. Compare that to Google's latest Gemini models, which have trillions of parameters, The Information reports. On-device AI models are also "quantized" to run at lower precision, making them faster but affecting the accuracy of token generation. 
This all adds up to AIs that feel less smart than their cloud brethren, and even big cloud-based models can be pretty dumb sometimes.
The amazing, shrinking Gemini
Google does have versions of Gemini optimized for mobile devices, which it calls Gemini Nano. However, these are designed for powering contextual features like Magic Cue and audio summarization. Siri, on the other hand, is supposed to be a conversational assistant -- you talk to it and it does things. That's a different experience that requires a different kind of model. On Android, Google doesn't even bother trying to do that locally. Talking to Gemini always goes straight to the cloud. After inking the Google deal, Apple apparently got to work distilling Google's giant cloud-based Gemini models. Distillation is a process in which a small, less resource-intensive model learns to mimic a large, expensive one. With enough time, this can reliably transfer useful capabilities while pruning less important weights from the model. That may enable Siri to handle some tasks with private local compute, but a cloud component looks inevitable. Processing users' AI data in the cloud could be a problem for Apple. At WWDC, the company will probably promote its years of experience designing chips and how well that positions it for AI. However, The Information claims that Apple has struggled to even get Google's massive undistilled Gemini models running on its custom Private Cloud Compute infrastructure, which runs on M-series Mac chips. When the smarter Siri rolls out, it will probably route more complex tasks to Google's cloud infrastructure instead of Apple's, but it won't be running on Google TPUs. Apple has reportedly signed a deal with Nvidia to use its Confidential Computing platform for this purpose. Confidential Computing keeps data encrypted on Nvidia GPUs while it's being processed in the cloud, which could help Apple claim it's still sensitive to user privacy concerns. 
It might even retain its own Private Cloud Compute branding for the system. The iPhone probably won't tell you which version of Gemini is handling individual Siri requests. Device makers designing hybrid systems that rely on local and cloud-based AI like to talk about making the experience feel "seamless." There might be clues, though. We're all familiar with the sluggishness of big AI models, which can churn for a long time while they generate tokens. Nvidia's fully encrypted Confidential Compute does slow processing compared to other AI options. Users may find it more noticeable when Siri has to talk to a remote server, but local AI will only get you so far when the best models can only run on multi-million-dollar servers.
[2]
New details on Apple-Google AI deal revealed, including Nvidia chips: report
Apple's big unveiling of iOS 27, the new Siri, and other WWDC reveals are little more than one week away. And today, a new report has fresh details on how Apple's partnership with Google for AI features is being implemented behind the scenes.
Report details how Apple plans to implement new AI features behind-the-scenes
Today The Information has a wide-ranging report from Aaron Tilley on what to expect from Apple's AI announcements at WWDC. It focuses on the core technologies behind the features Apple will announce, not the features themselves. The report stresses that Apple will continue touting on-device processing as a priority for its next wave of AI features. It says, "Apple is using a version of Google's large Gemini model to train a smaller version of the model that can run locally on Apple devices, a process known as distillation."
Tilley adds:
"Apple is also on the lookout to acquire smaller companies that can assist in the effort of shrinking down AI models to run on its devices, people familiar with the company said. One such company it has considered acquiring is Liquid AI, a Cambridge, Mass.-based startup specializing in running AI locally on devices, said people familiar with Apple's strategy."
However, many AI queries are expected to still need cloud support. That's because the full Gemini model provided by Google has "trillions of parameters" and "requires so much computing horsepower that Apple has struggled to get it to work on its own internal server infrastructure, called Private Cloud Compute." The solution reportedly involves turning to Google Cloud and Nvidia's AI chips.
"[S]ome user queries to a new version of Siri will run in Google Cloud on a licensed version of the search giant's Gemini model. Apple recently approved the use of a privacy technology from Nvidia in that setting, suggesting it will use Nvidia AI chips for at least some of its computing needs in Google Cloud, according to people familiar with the matter... Confidential compute is a security feature inside Nvidia graphics processing units that encrypts data and AI models as they are being processed. When enabled, it slightly slows down the processing of AI queries in the cloud, but it could help Apple keep its promises about protecting users' privacy."
Tilley says that Apple's decision to use Nvidia's confidential compute system is very fresh, happening "in recent weeks." And the company continues seeking more ways to handle AI features in the cloud while still upholding strong privacy protections. On that note, Tilley says Apple is expected to continue using the 'Private Cloud Compute' branding for its next wave of Apple Intelligence features, even though they will no longer run exclusively on Apple's own servers. What are your takeaways from this new report on Apple's plans for Apple Intelligence and its Google deal? Let us know in the comments.
[3]
Report: Apple Plans to Make On-Device AI a Key WWDC Focus
Apple reportedly plans to use next month's Worldwide Developers Conference (WWDC) to highlight its on-device AI capabilities as a competitive advantage, leaning on 15 years of custom silicon expertise to make the case for running AI models locally rather than in the cloud. People familiar with Apple's plans speaking to The Information say the company is expected to showcase how the chips designed for iPhones, Apple Watches, and Macs give it an edge in processing AI queries directly on devices. While cloud-based processing will remain necessary for complex queries, Apple will position local inference as a privacy-preserving, cost-saving alternative to the massive data center buildouts its rivals have pursued. As part of its agreement with Google, Apple is apparently set to use a large version of Google's Gemini model to train a smaller, distilled version capable of running locally on Apple hardware. Apple is also said to be scouting acquisitions to help advance its model-shrinking work, with one company it has reportedly considered being Liquid AI, a Massachusetts startup focused on running AI locally on devices. Some queries will still require cloud processing. Apple is believed to have approved the use of Nvidia's confidential compute technology within Google Cloud to handle processing of the larger Gemini-based model. The security feature encrypts data and AI models during processing, adding a modest performance cost but offering stronger privacy protections. The arrangement represents a noticeable departure from Apple's original Apple Intelligence announcement, in which the company said all cloud-bound queries would be handled exclusively by its own Private Cloud Compute infrastructure running on Apple silicon. Apple is likely to retain the Private Cloud Compute branding despite the change, people familiar with the partnership told The Information. There are also said to be material limits to how far Apple can push on-device processing. 
Google's full Gemini model runs into the trillions of parameters, and The Information claims that Apple has struggled to run it on its own Private Cloud Compute infrastructure, which uses the same Apple silicon chips found in Mac computers. Apple Intelligence was first announced at WWDC 2024, but the rollout has been hampered by a tepid response to initial features and a protracted delay to the more personal version of Siri. Apple is now expected to use WWDC 2026, which begins June 8, to reframe the narrative, reintroduce the delayed features, and debut new ones.
Apple is working to distill Google's trillion-parameter Gemini AI models to run locally on iPhones, but faces technical challenges that will force the company to rely heavily on Google Cloud and Nvidia's encrypted processing. The hybrid approach marks a shift from Apple's privacy-first stance, as the enhanced Siri will process complex queries on remote servers rather than exclusively on Apple's own infrastructure.
Apple is preparing to unveil a significantly enhanced Siri at WWDC, but the implementation reveals a notable departure from the company's long-standing privacy commitments. Under its deal with Google, Apple will bring Google Gemini to iPhones through a hybrid AI approach that splits processing between local devices and cloud infrastructure [1]. The AI-enhanced Siri was first promised in 2024 and has been delayed multiple times since [1]. Despite Apple's repeated emphasis on local AI processing as a privacy advantage, technical limitations are forcing the company to lean heavily on external cloud services for the Gemini-infused assistant.
Source: Ars Technica
The challenge stems from the sheer scale of modern AI models. Google's latest Gemini models feature trillions of parameters, while on-device AI models running on smartphones typically contain only a few billion parameters [1]. This massive gap in scale means that even Apple's custom silicon, optimized through 15 years of development, cannot handle the full conversational assistant experience locally.
Apple has been working to create a distilled version of Gemini that can run on its devices, using a process in which a smaller, less resource-intensive model learns to mimic Google's giant cloud-based systems [1]. While Google offers Gemini Nano for mobile devices, these versions are designed for contextual features rather than full conversational interactions. On Android, Google doesn't attempt to run conversational Gemini locally at all, routing all queries directly to the cloud [1].
The situation becomes more complex when examining Apple's infrastructure challenges. The company has struggled to get Google's massive undistilled Gemini models running on its Private Cloud Compute infrastructure, which operates on M-series Mac chips [1]. This technical roadblock has pushed Apple toward an unexpected solution involving its competitors.
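A rough back-of-the-envelope sketch can make the scale gap concrete. The figures below are illustrative assumptions, not numbers from Apple, Google, or the reports: 3 billion parameters stands in for "a few billion," 1 trillion is a conservative floor for "trillions," and the 8 GB phone RAM figure is hypothetical. The calculation counts only the memory needed to hold the weights and ignores everything else a running model needs.

```python
def model_memory_gb(num_parameters: float, bits_per_parameter: int) -> float:
    """Approximate memory needed to hold the weights alone, in GB (10**9 bytes)."""
    return num_parameters * bits_per_parameter / 8 / 1e9


# Illustrative assumptions only; none of these figures come from the reports.
ON_DEVICE_PARAMS = 3e9   # stands in for "a few billion" parameters
CLOUD_PARAMS = 1e12      # conservative floor for "trillions" of parameters
PHONE_RAM_GB = 8         # hypothetical phone RAM

for label, params in [("on-device", ON_DEVICE_PARAMS), ("cloud", CLOUD_PARAMS)]:
    for bits in (16, 4):  # 16-bit weights vs. a 4-bit quantized copy
        size = model_memory_gb(params, bits)
        verdict = "fits" if size < PHONE_RAM_GB else "does not fit"
        print(f"{label:9s} {bits:2d}-bit: {size:8.1f} GB ({verdict} in {PHONE_RAM_GB} GB)")
```

Under these assumptions, quantizing to lower precision shrinks a small model considerably, but a trillion-parameter model remains far beyond phone memory at either precision, which is consistent with the reports' point that a cloud component looks unavoidable.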
Source: 9to5Mac
To maintain some semblance of its privacy promises, Apple recently approved the use of Nvidia's confidential computing technology within Google Cloud [2]. This decision, made in recent weeks, suggests that at least some cloud-based AI queries will be processed on Nvidia graphics processing units rather than on Apple's own servers or Google TPUs [1].
Confidential computing keeps data encrypted on Nvidia GPUs while it is being processed in the cloud, though it does slightly slow down the processing of AI queries [2]. Apple is expected to retain its Private Cloud Compute branding for the system despite the fundamental shift away from exclusively using Apple silicon for cloud processing [2].
Apple continues searching for ways to improve its on-device AI capabilities. The company is actively seeking to acquire smaller firms that can assist in model-shrinking efforts, with Liquid AI, a Cambridge, Massachusetts-based startup specializing in running AI locally on devices, among the companies Apple has considered [2][3].
Source: MacRumors
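For readers curious what "a smaller model learns to mimic a larger one" looks like in practice, here is a minimal sketch of the textbook soft-target distillation objective. It is an assumption for illustration only: the reports do not say which training objective Apple uses, and the logits, the temperature value, and the function names below are all made up. The idea is that the student is penalized according to how far its output distribution diverges from the teacher's.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the student's."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical next-token scores over a three-word vocabulary.
teacher = [4.0, 1.0, 0.5]
close_student = [3.8, 1.1, 0.4]  # nearly agrees with the teacher
far_student = [0.5, 4.0, 1.0]    # prefers a different token

print(distillation_loss(teacher, close_student))
print(distillation_loss(teacher, far_student))
```

Training would repeatedly adjust the student's weights to push this loss toward zero across many examples. A student that already agrees with the teacher scores a lower loss than one that prefers a different token.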
At WWDC, Apple is expected to emphasize its custom silicon expertise and position local inference as a privacy-preserving, cost-saving alternative to massive data center buildouts pursued by rivals [3]. The company is expected to showcase how chips designed for iPhones, Apple Watches, and Macs provide advantages in processing AI queries directly on devices, even as it acknowledges that complex queries will still require cloud processing [3].
The rollout of Apple Intelligence has been troubled since its initial announcement at WWDC 2024, with a tepid response to early features and protracted delays to the more personal version of Siri. Users should watch for how Apple balances its messaging around privacy and on-device processing against the reality of cloud-based AI queries handled by Google and Nvidia infrastructure. The seamless experience Apple promises may come with performance trade-offs, as Nvidia's encrypted processing adds latency compared to other AI options [1].
Summarized by Navi