5 Sources
[1]
Apple reportedly trying to distill Google's multi-trillion-parameter Gemini AI to run on iPhone
It's impossible to totally avoid generative AI when interacting with technology anymore, but Apple has a bit less of it. That's not entirely by choice, though. The iPhone maker has delayed the AI-enhanced Siri multiple times since first promising it in 2024, but a deal with Google will merge the iconic assistant with Gemini later this year. As we approach WWDC, Apple has been working to bring big AI smarts to the modest processing environment of a smartphone. Apple fans may not like the outcome, though. Apple has long crowed about the privacy value of running AI locally, but a new report suggests that despite Apple's best efforts, the iPhone's Gemini makeover will lean heavily on Google and Nvidia in the cloud. The Information reports that Apple's Gemini-infused Siri will run both on-device and in the cloud, an apparent reversal of its privacy-focused preference for local AI. With every new chip announcement, we hear about how the silicon has been optimized for AI -- even Apple does this with its focus on Neural Engine upgrades. You may think from the grandiose language that smartphones are equipped to handle beefy AI models, but that's not necessarily the case. In fact, the GPUs in most phones can process more AI tokens than the AI-focused NPUs. Components like Apple's Neural Engine are designed for contextual, efficient AI processing. Even if phones had faster AI processing, they lack the RAM to keep enormous models in memory. Even the largest AI models are still middling assistants, and that makes local AI very challenging. The AI models that run on phones are physically smaller, featuring at most a few billion parameters. Compare that to Google's latest Gemini models, which have trillions of parameters, The Information reports. On-device AI models are also "quantized" to run at lower precision, making them faster but affecting the accuracy of token generation. 
This all adds up to AIs that feel less smart than their cloud brethren, and even big cloud-based models can be pretty dumb sometimes.
The amazing, shrinking Gemini
Google does have versions of Gemini optimized for mobile devices, which it calls Gemini Nano. However, these are designed for powering contextual features like Magic Cue and audio summarization. Siri, on the other hand, is supposed to be a conversational assistant -- you talk to it and it does things. That's a different experience that requires a different kind of model. On Android, Google doesn't even bother trying to do that locally. Talking to Gemini always goes straight to the cloud. After inking the Google deal, Apple apparently got to work distilling Google's giant cloud-based Gemini models. Distillation is a process in which a small, less resource-intensive model learns to mimic a large, expensive one. With enough time, this can reliably transfer useful capabilities while pruning less important weights from the model. That may enable Siri to handle some tasks with private local compute, but a cloud component looks inevitable. Processing users' AI data in the cloud could be a problem for Apple. At WWDC, the company will probably promote its years of experience designing chips and how well that positions it for AI. However, The Information claims that Apple has struggled to even get Google's massive undistilled Gemini models running on its custom Private Cloud Compute infrastructure, which runs on M-series Mac chips. When the smarter Siri rolls out, it will probably route more complex tasks to Google's cloud infrastructure instead of Apple's, but it won't be running on Google TPUs. Apple has reportedly signed a deal with Nvidia to use its Confidential Computing platform for this purpose. Confidential Computing keeps data encrypted on Nvidia GPUs while it's being processed in the cloud, which could help Apple claim it's still sensitive to user privacy concerns.
It might even retain its own Private Cloud Compute branding for the system. The iPhone probably won't tell you which version of Gemini is handling individual Siri requests. Device makers designing hybrid systems that rely on local and cloud-based AI like to talk about making the experience feel "seamless." There might be clues, though. We're all familiar with the sluggishness of big AI models, which can churn for a long time while they generate tokens. Nvidia's fully encrypted Confidential Compute does slow processing compared to other AI options. Users may find it more noticeable when Siri has to talk to a remote server, but local AI will only get you so far when the best models can only run on multi-million-dollar servers.
[2]
Apple may need Google's Gemini to make Siri smart enough to compete
Forward-looking: Apple's push to make Siri more capable is starting to look less like a purely in-house effort and more like a concession to the realities of modern AI. To close the gap, the company is expected to split Siri's workload between on-device processing and the cloud, including Google's Gemini models. Apple has spent years emphasizing the privacy benefits of keeping computation on-device. Its custom silicon, including the Neural Engine, has been steadily tuned for machine learning workloads. But even with those gains, phones remain limited by memory and processing ceilings. The largest AI models now operate at a scale that simply doesn't fit within those constraints. Smaller models designed for local use can help, but they come with trade-offs. On-device systems typically run with only a few billion parameters and are often compressed using techniques like quantization to improve speed and efficiency. That makes them usable on a phone, but it also reduces accuracy and depth. In practice, they tend to feel less capable than their cloud-based counterparts, especially in open-ended conversations. That gap is part of what Apple is now trying to bridge. After striking a deal with Google, Apple reportedly began working on distilling Gemini's larger models into smaller versions that could run on the iPhone. Distillation allows a compact model to mimic the behavior of a much larger one, capturing useful patterns without the full computational load. It's a way to bring some level of advanced AI onto the device, even if it cannot match the original model's performance. There are limits to how far that approach can go. Google itself doesn't attempt to run its full conversational Gemini experience locally on Android. Instead, those interactions are routed to the cloud, where far more powerful hardware can handle them.
Apple appears to be heading in a similar direction, even if it frames the experience differently. According to The Information, more complex Siri requests will likely be processed off-device, potentially using Google's infrastructure. At the same time, Apple is working to maintain some control over how that data is handled. The company has reportedly partnered with Nvidia to use its Confidential Computing platform, which keeps data encrypted even while it is being processed on cloud GPUs. That setup could allow Apple to continue emphasizing privacy, even as more user data leaves the device. Whether users notice the difference may come down to performance. Cloud-based AI systems, especially those running with added encryption layers, can introduce latency. In contrast, simpler on-device tasks should feel faster and more immediate. Apple is unlikely to surface those distinctions directly. Like other companies building hybrid AI systems, it is expected to present the experience as seamless, with requests automatically routed based on what the system determines is most efficient. Under the hood, though, the divide will remain. The broader issue is not unique to Apple. Across the industry, there is a growing gap between what edge devices can handle and what cutting-edge AI models require. Even as mobile chips improve, the most advanced systems still depend on massive infrastructure - clusters of GPUs and specialized hardware far beyond what any phone can support. Apple's evolving approach to Siri suggests that, for now, there is no clean way around that limitation. Keeping everything on-device may be ideal in theory, but in practice, delivering a competitive AI assistant increasingly means leaning on the cloud.
[3]
Google delivering on Gemini promises means Apple Intelligence can do the same
Tests to determine whether Google's agentic AI system Gemini Spark can deliver on the promises made on stage at last month's I/O event shows that, for the most part, it can. Since this is the model Apple will be using to power the new Siri, that's equally good news for the Cupertino company and its customers ... There was a huge argument a little over a year ago when Apple commentator John Gruber launched a blistering attack on the iPhone maker's failure to deliver on its new Siri promises. He said the company had done nothing more than show concept videos of Apple Intelligence features they couldn't actually demonstrate, even in carefully-controlled conditions. When Google introduced its agentic AI Gemini Spark, the company performed live demos on stage. That's a massive step forward from a video simulation, but there's still a sizeable gulf between a carefully-planned demo and real-life usage. The Verge's Jay Peters decided to try the demonstrated features for himself, turning them into real-life tasks on his own data, starting with this one. I asked Gemini to draft an email to my wife that compiles our total monthly average grocery spending in 2026. I figured this test would tell me a few things: Could Spark figure out who my wife was (without me giving Spark her name), could it determine where our budget spreadsheet is in Drive (which does not have "budget" in the file name), and could it actually draft an email in Gmail? People sometimes use the phrase "scarily good" in a colloquial way, but in this case, I think it applies rather literally. When I got the result from Spark shortly after, I really said: "Wow, that's actually nuts." Spark found my wife's email address, pulled the right information from our 2026 budget spreadsheet, grabbed the monthly grocery totals including the incomplete data from May (which still wasn't over when I ran the test), averaged the totals, and put it all in a draft email in my Gmail. 
The text of the email addressed my wife by her first name, even though her email address does not contain her first name. It even included a sign-off that we use just for each other. It didn't fully deliver on everything demonstrated, but he said that he was "floored by the results, though they were imperfect." The full piece discussing the other examples is definitely worth reading. What Google demonstrated - and Peters found to mostly work in real life - was exactly the type of features Apple showed off in its concept video. The new Siri may be taking an extremely long time to materialize, but this experience does suggest that it really will live up to Apple's promises, even if those promises are actually fulfilled by Google.
[4]
Report: Apple Plans to Make On-Device AI a Key WWDC Focus
Apple reportedly plans to use next month's Worldwide Developers Conference (WWDC) to highlight its on-device AI capabilities as a competitive advantage, leaning on 15 years of custom silicon expertise to make the case for running AI models locally rather than in the cloud. People familiar with Apple's plans speaking to The Information say the company is expected to showcase how the chips designed for iPhones, Apple Watches, and Macs give it an edge in processing AI queries directly on devices. While cloud-based processing will remain necessary for complex queries, Apple will position local inference as a privacy-preserving, cost-saving alternative to the massive data center buildouts its rivals have pursued. As part of its agreement with Google, Apple is apparently set to use a large version of Google's Gemini model to train a smaller, distilled version capable of running locally on Apple hardware. Apple is also said to be scouting acquisitions to help advance its model-shrinking work, with one company it has reportedly considered being Liquid AI, a Massachusetts startup focused on running AI locally on devices. Some queries will still require cloud processing. Apple is believed to have approved the use of Nvidia's confidential compute technology within Google Cloud to handle processing of the larger Gemini-based model. The security feature encrypts data and AI models during processing, adding a modest performance cost but offering stronger privacy protections. The arrangement represents a noticeable departure from Apple's original Apple Intelligence announcement, in which the company said all cloud-bound queries would be handled exclusively by its own Private Cloud Compute infrastructure running on Apple silicon. Apple is likely to retain the Private Cloud Compute branding despite the change, people familiar with the partnership told The Information. There are also said to be material limits to how far Apple can push on-device processing. 
Google's full Gemini model runs into the trillions of parameters, and The Information claims that Apple has struggled to run it on its own Private Cloud Compute infrastructure, which uses the same Apple silicon chips found in Mac computers. Apple Intelligence was first announced at WWDC 2024, but the rollout has been hampered by a tepid response to initial features and a protracted delay to the more personal version of Siri. Apple is now expected to use WWDC 2026, which runs from June 8 to reframe the narrative, reintroduce the delayed features, and debut new ones.
[5]
New details on Apple-Google AI deal revealed, including Nvidia chips: report
Apple's big unveiling of iOS 27, the new Siri, and other WWDC reveals are little more than one week away. And today, a new report has fresh details on how Apple's partnership with Google for AI features is being implemented behind the scenes. Report details how Apple plans to implement new AI features behind-the-scenes Today The Information has a wide-ranging report from Aaron Tilley on what to expect from Apple's AI announcements at WWDC. It focuses on the core technologies behind the features Apple will announce, not the features themselves. The report stresses that Apple will continue touting on-device processing as a priority for its next wave of AI features. It says, "Apple is using a version of Google's large Gemini model to train a smaller version of the model that can run locally on Apple devices, a process known as distillation." Tilley adds: Apple is also on the lookout to acquire smaller companies that can assist in the effort of shrinking down AI models to run on its devices, people familiar with the company said. One such company it has considered acquiring is Liquid AI, a Cambridge, Mass.-based startup specializing in running AI locally on devices, said people familiar with Apple's strategy. However, many AI queries are expected to still need cloud support. That's because the full Gemini model provided by Google has "trillions of parameters" and "requires so much computing horsepower that Apple has struggled to get it to work on its own internal server infrastructure, called Private Cloud Compute." The solution reportedly involves turning to Google Cloud and Nvidia's AI chips. some user queries to a new version of Siri will run in Google Cloud on a licensed version of the search giant's Gemini model. 
Apple recently approved the use of a privacy technology from Nvidia in that setting, suggesting it will use Nvidia AI chips for at least some of its computing needs in Google Cloud, according to people familiar with the matter...Confidential compute is a security feature inside Nvidia graphics processing units that encrypts data and AI models as they are being processed. When enabled, it slightly slows down the processing of AI queries in the cloud, but it could help Apple keep its promises about protecting users' privacy. Tilley says that Apple's decision to use Nvidia's confidential compute system is very fresh, happening "in recent weeks." And the company continues seeking more ways to handle AI features in the cloud while still upholding strong privacy protections. On that note, Tilley says Apple is expected to continue using the 'Private Cloud Compute' branding for its next wave of Apple Intelligence features, even though they will no longer run exclusively on Apple's own servers.
Apple is working to distill Google's multi-trillion-parameter Gemini model to run locally on iPhones, but complex queries will route to Google Cloud using Nvidia chips. The move marks a significant shift from Apple's privacy-focused, on-device AI stance as the company prepares to unveil its enhanced Siri at WWDC, blending local processing with cloud capabilities.
Apple is preparing to merge its Siri assistant with Google Gemini later this year, a fundamental shift in how the iPhone maker approaches artificial intelligence. According to reports from The Information, the Apple-Google partnership will result in a hybrid system that processes AI queries both on-device and in the cloud, departing from Apple's longstanding emphasis on local processing for user privacy [1]. The company has been working to distill Google's massive Gemini models, which feature trillions of parameters, into smaller versions capable of running on Apple's custom silicon. In distillation, a compact model learns to mimic the behavior of a much larger one, capturing useful capabilities without the full computational load. That could enable Siri to handle some tasks with private local compute [1][2].
Source: Ars Technica
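To make the idea of distillation concrete, here is a minimal sketch of a common textbook formulation, in which the small "student" model is trained to match the large "teacher" model's softened output probabilities. This is not Apple's or Google's actual method, which the reports do not describe; the function names, the temperature value, and the example scores are all hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw scores to probabilities; a higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the student's."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical scores for three candidate next tokens.
teacher = [4.0, 1.0, 0.5]        # large model
close_student = [3.8, 1.1, 0.4]  # small model that roughly agrees with the teacher
far_student = [0.5, 1.0, 4.0]    # small model that ranks the tokens in reverse

print(distillation_loss(teacher, close_student))
print(distillation_loss(teacher, far_student))
```

Training would adjust the student's weights to push this loss toward zero, so the student that already agrees with the teacher scores lower than the one that disagrees.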
Despite Apple's 15 years of custom silicon expertise and continuous Neural Engine upgrades, smartphones remain fundamentally constrained by memory and processing capabilities. On-device AI models typically feature only a few billion parameters and are quantized to run at lower precision, which makes them faster but reduces the accuracy of token generation [1][2]. Even Google doesn't attempt to run its full conversational Gemini experience locally on Android, instead routing those interactions to the cloud, where more powerful hardware handles them [1]. Apple has reportedly struggled to get Google's undistilled Gemini models running even on its Private Cloud Compute infrastructure, which operates on M-series Mac chips, underscoring how much computing power the largest models require [1][4].

To maintain its commitment to user privacy while leveraging cloud-based AI, Apple has recently approved the use of Nvidia's Confidential Computing platform within Google Cloud. This technology keeps data encrypted on Nvidia GPUs while it is being processed in the cloud, allowing Apple to claim sensitivity to privacy concerns even as more user data leaves the device [1][5]. The decision to use Nvidia's confidential computing system was made in recent weeks, according to people familiar with the matter [5]. While this security feature slightly slows AI query processing, it offers stronger privacy protections than standard cloud processing [4][5]. Apple is expected to retain its Private Cloud Compute branding even though queries will no longer be handled exclusively by Apple's own servers [1][5].

Source: 9to5Mac
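The accuracy cost of quantization mentioned above can be illustrated with a small sketch. It assumes a simple symmetric 8-bit scheme with one shared scale factor, chosen here only for illustration; the reports do not say which scheme on-device models actually use, and the weight values and function names are hypothetical.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    return [round(w / scale) for w in weights], scale

def dequantize(quantized, scale):
    """Recover approximate floats from the stored integers."""
    return [q * scale for q in quantized]

weights = [0.8123, -0.4051, 0.0032, -1.27]  # hypothetical full-precision weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
errors = [abs(w - r) for w, r in zip(weights, restored)]

print(q)            # small integers that need one byte each
print(max(errors))  # rounding error introduced by the lower precision
```

Each weight now fits in a single byte instead of four, which saves memory and speeds up arithmetic, but every restored value can be off by up to half the scale factor. Small errors like these, accumulated across billions of weights, are one source of the accuracy loss.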
At the upcoming WWDC, Apple plans to highlight its on-device AI capabilities as a competitive advantage, positioning local inference as a privacy-preserving, cost-saving alternative to the massive data center buildouts its rivals have pursued [4]. The company will showcase how chips designed for iPhones, Apple Watches, and Macs give it an edge in processing AI queries directly on devices, though cloud-based processing will remain necessary for complex queries [4]. The hybrid approach means users likely won't know which version of Gemini is handling individual Siri requests, as device makers building such systems aim to make the experience feel seamless [1]. However, users may notice latency differences: Nvidia's encrypted confidential compute slows processing compared to other AI options, and cloud-based requests will tend to feel less immediate than local processing [1][2].

Apple Intelligence was first announced at WWDC 2024, but the rollout has been hampered by a tepid response to initial features and a protracted delay to the more personal version of Siri [4]. Real-world testing of Google's agentic AI system Gemini Spark, which will power the new Siri, suggests the technology can largely deliver on promises made during demonstrations. The Verge's Jay Peters tested features such as drafting an email that compiles budget data from a spreadsheet, and described the result as "scarily good" and "actually nuts," though imperfect [3]. This real-life performance suggests the new Siri may finally live up to Apple's promises, even if those promises are fulfilled by Google [3]. Apple is also reportedly scouting acquisitions to advance its model-shrinking work; one company it has considered is Liquid AI, a Massachusetts startup focused on running AI locally on devices [4][5].
Source: 9to5Mac
Summarized by Navi