Apple turns to Google Gemini and Nvidia chips as on-device AI struggles with trillion-parameter models

Reviewed by Nidhi Govil

3 Sources


Apple is working to distill Google's trillion-parameter Gemini AI models to run locally on iPhones, but faces technical challenges that will force the company to rely heavily on Google Cloud and Nvidia's encrypted processing. The hybrid approach marks a shift from Apple's privacy-first stance, as the enhanced Siri will process complex queries on remote servers rather than exclusively on Apple's own infrastructure.

Apple AI Shifts Strategy with Google Gemini Partnership

Apple is preparing to unveil a significantly enhanced Siri at WWDC, but the implementation reveals a notable departure from the company's long-standing privacy commitments. The Apple Google AI deal, first announced in 2024, will bring Google Gemini to iPhones through a hybrid AI approach that splits processing between local devices and cloud infrastructure [1]. Despite Apple's repeated emphasis on local AI processing as a privacy advantage, technical limitations are forcing the company to lean heavily on external cloud services for the Gemini-infused assistant.

Source: Ars Technica


The challenge stems from the sheer scale of modern AI models. Google's latest Gemini models feature trillions of parameters, while on-device AI models running on smartphones typically contain only a few billion parameters [1]. This gap of roughly three orders of magnitude means that even Apple's custom silicon, optimized through 15 years of development, cannot handle the full conversational assistant experience locally.
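
To put that gap in rough numbers, the sketch below estimates how much memory the model weights alone would occupy. The specific figures are illustrative assumptions, not reported values: a 1-trillion-parameter cloud model stored at 16 bits per parameter, and a 3-billion-parameter on-device model quantized to 4 bits per parameter. The helper name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_parameters, bits_per_parameter):
    """Estimate weight storage in gigabytes (1 GB = 1e9 bytes).

    Counts only the weights; activations, caches, and runtime
    overhead are ignored, so real requirements are higher.
    """
    bytes_total = num_parameters * bits_per_parameter / 8
    return bytes_total / 1e9


# Assumed sizes and precisions, chosen only for illustration.
cloud_gb = weight_memory_gb(10**12, 16)      # 1 trillion parameters, 16-bit
device_gb = weight_memory_gb(3 * 10**9, 4)   # 3 billion parameters, 4-bit

print(f"cloud model weights:  {cloud_gb:,.1f} GB")
print(f"device model weights: {device_gb:,.1f} GB")
print(f"ratio: {cloud_gb / device_gb:,.0f}x")
```

Under these assumptions the cloud model's weights need about 2,000 GB against about 1.5 GB for the on-device model, which is why the larger model cannot simply be copied onto a phone.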

Technical Hurdles Force Cloud Reliance for Gemini AI on iPhone

Apple has been working to create a distilled version of Gemini that can run on its devices using a process where a smaller, less resource-intensive model learns to mimic Google's giant cloud-based systems [1]. While Google offers Gemini Nano for mobile devices, these versions are designed for contextual features rather than full conversational interactions. On Android, Google doesn't attempt to run conversational Gemini locally at all, routing all queries directly to the cloud [1].
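
The distillation idea described above can be sketched with a standard textbook formulation: the small "student" model is trained to match the large "teacher" model's output probabilities, usually softened with a temperature. This is a generic illustration of knowledge distillation, not Apple's or Google's actual training method; the function names, logits, and temperature value are assumptions made for the example.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert raw scores into probabilities, softened by temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's to the student's softened outputs.

    Lower values mean the student mimics the teacher more closely.
    The temperature**2 factor is the usual scaling that keeps gradient
    magnitudes comparable across temperatures.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# Hypothetical scores over three candidate next tokens.
teacher = [4.0, 1.0, 0.5]
close_student = [3.8, 1.1, 0.4]   # nearly agrees with the teacher
far_student = [0.5, 1.0, 4.0]     # prefers a different token

print(distillation_loss(teacher, close_student))
print(distillation_loss(teacher, far_student))
```

During training, a loss of this kind would be minimized over many examples so that the student's predictions drift toward the teacher's; the student that already agrees with the teacher scores a much smaller loss than the one that disagrees.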

The situation becomes more complex when examining Apple's infrastructure challenges. The company has struggled to get Google's massive undistilled Gemini models running on its Private Cloud Compute infrastructure, which operates on M-series Mac chips [1]. This technical roadblock has pushed Apple toward an unexpected solution involving its competitors.

Source: 9to5Mac


Nvidia Confidential Computing Addresses User Privacy Concerns

To maintain some semblance of its privacy promises, Apple recently approved the use of Nvidia's confidential computing technology within Google Cloud [2]. This decision, made in recent weeks, means that cloud-based AI queries will be processed on Nvidia graphics processing units rather than Apple's own servers or Google TPUs [1].

Confidential computing keeps data encrypted on Nvidia GPUs while it is being processed in the cloud, though it does slightly slow down the processing of AI queries [2]. Apple is expected to retain its Private Cloud Compute branding for the system despite the fundamental shift away from exclusively using Apple silicon for cloud processing [2].

Model-Shrinking Efforts and Potential Acquisitions

Apple continues searching for ways to improve its on-device AI capabilities. The company is actively seeking to acquire smaller firms that can assist in model-shrinking efforts, with Liquid AI, a Cambridge, Massachusetts-based startup specializing in running AI locally on devices, among the companies Apple has considered [2][3].

Source: MacRumors


At WWDC, Apple is expected to emphasize its custom silicon expertise and position local inference as a privacy-preserving, cost-saving alternative to massive data center buildouts pursued by rivals [3]. The company will showcase how chips designed for iPhones, Apple Watches, and Macs provide advantages in processing AI queries directly on devices, even as it acknowledges that complex queries will still require cloud processing [3].

The rollout of Apple Intelligence has faced delays since its initial announcement at WWDC 2024, with a tepid response to early features and protracted delays to the more personal version of Siri. Users should watch for how Apple balances its messaging around privacy and on-device processing against the reality of cloud-based AI queries handled by Google and Nvidia infrastructure. The seamless experience Apple promises may come with performance trade-offs, as Nvidia's encrypted processing adds latency compared to other AI options [1].

© 2026 TheOutpost.AI All rights reserved