Curated by THEOUTPOST
On Tue, 15 Apr, 12:02 AM UTC
16 Sources
[1]
Apple details how it plans to improve its AI models by privately analyzing user data | TechCrunch
In the wake of criticism over the underwhelming performance of its AI products, especially in areas like notification summaries, Apple on Monday detailed how it is trying to improve its AI models by analyzing user data privately with the aid of synthetic data. Using an approach called "differential privacy," the company said it would first generate synthetic data and then poll users' devices (provided they've opted in to share device analytics with Apple) with snippets of the generated synthetic data to compare how accurate its models are, and subsequently improve them. "Synthetic data are created to mimic the format and important properties of user data, but do not contain any actual user generated content," the company wrote in a blog post. "To curate a representative set of synthetic emails, we start by creating a large set of synthetic messages on a variety of topics [...] We then derive a representation, called an embedding, of each synthetic message that captures some of the key dimensions of the message like language, topic, and length." The company said these embeddings are then sent to a small number of user devices that have opted in to Device Analytics, and the devices then compare them with a sample of emails to tell Apple which embeddings are most accurate. The company said it is already using this approach to improve its Genmoji models, and would in the future use synthetic data for Image Playground, Image Wand, Memories Creation and Writing Tools, as well as Visual Intelligence. Apple said it would also poll the devices of users who opt in to share device analytics with synthetic data to improve email summaries.
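Apple has not published its embedding pipeline, but the first step it describes here (turning each synthetic message into a vector that captures language, topic, and length) can be sketched with an off-the-shelf model. The snippet below is a minimal illustration using the open-source sentence-transformers library as a stand-in; the messages and model choice are invented, not Apple's.

```python
# Sketch of step one: generate synthetic messages, derive an embedding for each.
# sentence-transformers is a stand-in; Apple's actual embedding model is not public.
from sentence_transformers import SentenceTransformer

# Hypothetical synthetic messages covering common email topics.
synthetic_messages = [
    "Would you like to play tennis tomorrow at 11:30AM?",
    "Reminder: dentist appointment on Friday at 3pm.",
    "Attached is the Q2 budget draft for your review.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
# Each message becomes a fixed-length vector capturing topic, style, and length cues.
synthetic_embeddings = model.encode(synthetic_messages, normalize_embeddings=True)
print(synthetic_embeddings.shape)  # (3, 384) for this particular model
```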
[2]
How Apple Will Analyze Your Data to Train Its AI (While Protecting Your Privacy)
Samantha Kelly is a freelance writer with a focus on consumer technology, AI, social media, Big Tech, emerging trends and how they impact our everyday lives. Her work has been featured on CNN, NBC, NPR, the BBC, Mashable and more. Apple said it will begin analyzing on-device user data as part of a broader push to strengthen its AI platform. In a blog post, the company outlined a new approach designed to expand its AI capabilities while safeguarding user privacy, especially as competitors like OpenAI and Google advance more quickly with fewer restrictions. Apple said it will train its AI models using synthetic data - information that mimics the format and characteristics of real-world messages without including any actual user-generated content. "When creating synthetic data, our goal is to produce synthetic sentences or emails that are similar enough in topic or style to the real thing to help improve our models for summarization, but without Apple collecting emails from the device," the company said in a blog post. For Apple Intelligence features including summarization and writing tools that handle longer content, the company said its usual methods, like those used for short-form prompts in Genmoji, aren't effective. Instead, its new approach will generate a large set of synthetic emails on various topics - such as, "Want to play tennis tomorrow?" - without referencing any actual user data. Each message is converted into what Apple calls an "embedding," a numerical summary capturing attributes including topic and length. The embeddings are sent only to opted-in devices, which then compare them to a small, private sample of recent user emails stored locally. "This process allows us to improve the topics and language of our synthetic emails, which helps us train our models to create better text outputs in features like email summaries, while protecting privacy," the company said. Apple said it will start using this approach "soon" with users who opt in to sharing device analytics. Jason Hong, a computer science professor at Carnegie Mellon University, said this type of "differential privacy" is a sophisticated approach for analyzing and using data aggregated from large numbers of people. "Apple could have taken the easy approach of just taking everyone's data and using it to build their AI models," he said. "Instead, Apple chose to deploy these differential privacy approaches for Apple Intelligence, and they should be applauded for putting their customers' privacy first." However, he said there will likely be tradeoffs, including the possibility that Apple Intelligence may not be as effective as some competitors because rivals will have more access to people's data. He also said Apple's models will likely be harder to debug and might take more battery power to deploy.
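The on-device comparison step described above can be illustrated in a few lines of NumPy. This is a minimal sketch under two assumptions that are not Apple's published spec: embeddings are unit-normalized (so a dot product is cosine similarity), and each synthetic variant is scored by its best match among the local emails.

```python
import numpy as np

def closest_variant(synthetic_embeddings: np.ndarray,
                    local_email_embeddings: np.ndarray) -> int:
    """Pick the synthetic variant nearest to this device's private sample of
    recent-email embeddings. Only the winning index ever needs to be reported;
    the emails themselves stay on the device."""
    # Cosine similarity of every synthetic variant to every local email
    # (valid because both sets of vectors are assumed unit-normalized).
    sims = synthetic_embeddings @ local_email_embeddings.T
    # Score each variant by its single best local match, then take the argmax.
    return int(sims.max(axis=1).argmax())
```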
[3]
Apple's complicated plan to improve its AI while protecting privacy
Emma Roth is a news writer who covers the streaming wars, consumer tech, crypto, social media, and much more. Previously, she was a writer and editor at MUO. Apple says it's found a way to make its AI models better without training on its users' data or even copying it from their iPhones and Macs. In a blog post first reported on by Bloomberg, the company outlined its plans to have devices compare a synthetic dataset to samples of recent emails or messages from users who have opted into its Device Analytics program. Apple devices will be able to determine which synthetic inputs are closest to real samples, relaying the result to the company by sending "only a signal indicating which of the variants is closest to the sampled data." That way, according to Apple, it doesn't access user data, and the data never leaves the device. Apple will then use the most frequently picked fake samples to improve its AI text outputs, such as email summaries. Currently, Apple trains its AI models on synthetic data only, potentially resulting in less helpful responses, according to Bloomberg's Mark Gurman. Apple has struggled with the launch of its flagship Apple Intelligence features, as it pushed back the launch of some capabilities and replaced the head of its Siri team. But now, Apple is trying to turn things around by introducing its new AI training system in a beta version of iOS and iPadOS 18.5 and macOS 15.5, according to Gurman. Apple has been talking up its use of a method called differential privacy to keep user data private since at least 2016, when it launched with iOS 10, and has already used it to improve the AI-powered Genmoji feature. The same technique underpins the company's new AI training plans: Apple says that introducing randomized information into a broader dataset will help prevent it from linking data to any one person.
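Apple hasn't disclosed its exact noise mechanism, but randomized response is the classic local-differential-privacy construction matching this description: each device perturbs its own answer before it ever leaves the phone, so any single report is plausibly deniable. A minimal sketch follows; the truth probability is an illustrative parameter, not Apple's.

```python
import random

def noisy_report(true_index: int, num_variants: int, p_truth: float = 0.75) -> int:
    """Randomized response: with probability p_truth report the variant this
    device actually selected; otherwise report a uniformly random variant.
    No individual report reveals what the device really chose."""
    if random.random() < p_truth:
        return true_index
    return random.randrange(num_variants)

# Example: a device that picked variant 2 out of 10 might report 2 - or any
# other index - and the server cannot tell which case occurred.
print(noisy_report(true_index=2, num_variants=10))
```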
[4]
Apple details on-device Apple Intelligence training system using user data - 9to5Mac
Last month, Apple delayed the rollout of its more personal and powerful Siri features. As it looks to right the ship for future Apple Intelligence updates, Bloomberg highlights a shift that Apple is making in how it trains its artificial intelligence models. The report points to a blog post from Apple's Machine Learning Research website, explaining how Apple generally uses synthetic data to train its AI models. There are limitations to this strategy, however, including the fact that it's hard for synthetic data to "understand trends" in features like summarization or writing tools that operate on longer sentences or entire email messages. To address this limitation, Apple describes a new technique it will soon start using that compares the synthetic data to a small sample of recent user emails, but without compromising user privacy: "To improve our models we need to generate a set of many emails that cover topics that are most common in messages. To curate a representative set of synthetic emails, we start by creating a large set of synthetic messages on a variety of topics. For example, we might create a synthetic message, "Would you like to play tennis tomorrow at 11:30AM?" This is done without any knowledge of individual user emails. We then derive a representation, called an embedding, of each synthetic message that captures some of the key dimensions of the message like language, topic, and length. These embeddings are then sent to a small number of user devices that have opted in to Device Analytics. Participating devices then select a small sample of recent user emails and compute their embeddings. Each device then decides which of the synthetic embeddings is closest to these samples. Using differential privacy, Apple can then learn the most-frequently selected synthetic embeddings across all devices, without learning which synthetic embedding was selected on any given device. These most-frequently selected synthetic embeddings can then be used to generate training or testing data, or we can run additional curation steps to further refine the dataset. For example, if the message about playing tennis is one of the top embeddings, a similar message replacing "tennis" with "soccer" or another sport could be generated and added to the set for the next round of curation (see Figure 1). This process allows us to improve the topics and language of our synthetic emails, which helps us train our models to create better text outputs in features like email summaries, while protecting privacy." Apple explains that these techniques allow it to "understand overall trends, without learning information about any individual." Bloomberg says that Apple will roll out this new system in a future beta of iOS 18.5 and macOS 15.5.
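The aggregation Apple describes - learning the most-frequently selected embeddings across all devices without learning any single device's selection - pairs naturally with the randomized-response reports sketched earlier. A minimal debiasing step, assuming the same illustrative p_truth parameter, might look like this:

```python
from collections import Counter

def estimate_frequencies(reports: list[int], num_variants: int,
                         p_truth: float = 0.75) -> list[float]:
    """Invert randomized-response noise to estimate how often each synthetic
    variant was genuinely the closest match across the fleet. Per-device
    answers remain hidden; only the aggregate histogram is recovered."""
    n = len(reports)
    observed = Counter(reports)
    p_noise = (1 - p_truth) / num_variants  # chance any index came from the noise branch
    return [
        max(0.0, (observed.get(k, 0) / n - p_noise) / p_truth)
        for k in range(num_variants)
    ]

# The top entries of this estimated histogram correspond to the
# "most-frequently selected synthetic embeddings" that seed the next round.
```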
[5]
Here's How Apple is Working to Improve Apple Intelligence
With its uncompromising focus on user privacy, Apple has faced challenges collecting enough data to train the large language models that power Apple Intelligence features and that will ultimately improve Siri. To improve Apple Intelligence, Apple has to come up with privacy-preserving options for AI training, and some of the methods the company is using have been outlined in a new Machine Learning Research blog post. Basically, Apple needs user data to improve summarization, writing tools, and other Apple Intelligence features, but it doesn't want to collect data from individual users. So instead, Apple has worked out a way to understand usage trends using differential privacy and data that's not linked to any one person. Apple is creating synthetic data that is representative of aggregate trends in real user data, and it is using on-device detection to make comparisons, providing the company with insight without the need to access sensitive information. It works like this: Apple generates multiple synthetic emails on topics that are common in user emails, such as an invitation to play a game of tennis at 3:00 p.m. Apple then creates an "embedding" from that email with specific language, topic, and length info. Apple might create several embeddings with varying email length and information. Those embeddings are sent to a small number of iPhone users who have Device Analytics turned on, and the iPhones that receive the embeddings select a sample of actual user emails and compute embeddings for those actual emails. The synthetic embeddings that Apple created are compared to the embeddings of the real emails, and the user's iPhone decides which of the synthetic embeddings is closest to the actual sample. Apple then uses differential privacy to determine which of the synthetic embeddings are most commonly selected across all devices, so it knows how emails are most commonly worded without ever seeing user emails and without knowing which specific devices selected which embeddings as the most similar. Apple says that the most frequently selected synthetic embeddings it collects can be used to generate training or testing data, or can be used as examples for further data refinement. The process provides Apple with a way to improve the topics and language of synthetic emails, which in turn trains models to create better text outputs for email summaries and other features, all without violating user privacy. Apple does something similar for Genmoji, using differential privacy to identify popular prompts and prompt patterns that can be used to improve the image generation feature. Apple uses a technique to ensure that it only receives Genmoji prompts that have been used by hundreds of people, and nothing specific or unique that could identify an individual person. Apple can't see Genmoji associated with a personal device, and all signals that are relayed are anonymized and include random noise to hide user identity. Apple also doesn't link any data with an IP address or ID that could be associated with an Apple Account. With both of these methods, only users who have opted in to send Device Analytics to Apple participate in the testing, so if you don't want to have your data used in this way, you can turn that option off.
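The Genmoji side described above - noisy, anonymized signals plus a popularity floor of hundreds of users - can be sketched as a thresholded noisy count. The threshold and noise scale below are invented for illustration; Apple has not published its parameters.

```python
from collections import Counter
import numpy as np

def popular_prompts(reported_prompts: list[str], threshold: int = 300,
                    noise_scale: float = 10.0, seed: int = 0) -> list[str]:
    """Surface only Genmoji prompts whose noisy count clears a high threshold,
    so rare or unique prompts - which could identify a person - never appear
    in the aggregate results."""
    rng = np.random.default_rng(seed)
    counts = Counter(reported_prompts)
    kept = []
    for prompt, count in counts.items():
        noisy_count = count + rng.laplace(0.0, noise_scale)  # randomize each tally
        if noisy_count >= threshold:
            kept.append(prompt)
    return kept
```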
[6]
How will Apple improve its AI while protecting your privacy?
Apple uses Differential Privacy to learn trends about how its user base is using AI. With all the problems we've heard about Apple Intelligence lately - delayed Siri improvements, bad news notification summaries, unimpressive image generation, and more - you might wonder what Apple is planning to do to right the ship. Obviously new and improved models are important, and so is increased training, but Apple has a particularly hard time here because its privacy policies are much stricter than those of other companies creating AI products. In a new post on Apple's Machine Learning Research site, the company explains a technique it will employ to help its AI be more relevant, more often, without training it on your personal data. Differential Privacy is a way to, as Apple puts it, "gain insight into what many Apple users are doing, while helping to preserve the privacy of individual users." Basically, whenever Apple collects data in a system like this, it first strips out any identifying information (device ID, IP address, and so on) and then slightly alters the data. When millions of users submit results, that "noise" cancels out. That's the Differential Privacy part: take enough samples with random noise and identifiers removed, and you can't possibly connect any particular bit of data with a user. It's a way to, for example, get a reliable statistical sample of which emoji are picked most often, or which autocorrect word is used the most after a particular misspelling - collecting data on user preferences without actually being able to trace any particular data point back to any user, even if they wanted to. Apple can generate synthetic text that is representative of common prompts, then use those differential privacy techniques to find out which synthetic samples are selected by users most often, or to determine which words and phrases are common in Genmoji prompts and which results users are most likely to pick. The AI system could generate common sentences used in emails, for example, and then send multiple variants out to different users. Then, using differential privacy techniques, Apple can find out which ones are selected most frequently (while having no ability to know what any one individual chose). Apple has been using this technique for years to gather data meant to improve QuickType suggestions, emoji suggestions, lookup hints, and more. As anonymous as it is, it is still opt-in. Apple doesn't collect this type of data unless you affirmatively enable device analytics. Techniques like this are already being used to improve Genmoji, and in an upcoming update, they'll be used for Image Playground, Image Wand, Memories Creation, Writing Tools, and Visual Intelligence. A Bloomberg report says the new system will come in a beta update to iOS 18.5, iPadOS 18.5, and macOS 15.5 (the second beta was released today). Of course, this is just data gathering, and it will take weeks or months of data collection and retraining to measurably improve Apple Intelligence features.
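The "noise cancels out" claim is easy to demonstrate numerically. This toy simulation (all numbers invented) adds zero-mean Laplace noise to each device's one-bit signal; any single report is meaningless, but the fleet-wide average converges on the true rate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Say 1,000,000 opted-in devices each send a 0/1 signal for one question
# (e.g. "did you pick this emoji?"), with zero-mean noise added on-device.
true_share = 0.23                       # true fraction answering "yes"
n = 1_000_000
signals = (rng.random(n) < true_share).astype(float)
noise = rng.laplace(0.0, 5.0, n)        # per-report noise (illustrative scale)

# Each noisy report is useless on its own; averaged, the noise cancels.
estimate = (signals + noise).mean()
print(f"estimated share: {estimate:.3f} vs true share: {true_share}")  # ~0.23
```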
[7]
Apple will soon train Apple Intelligence on select user data -- here's what you need to know
Apple has recently confirmed how it will start to use certain user data to help train its Apple Intelligence models. There's little doubt that Apple Intelligence has had a few issues lately, including delaying its Siri 2.0 feature launch. To help avoid similar issues in the future, Apple is introducing a change regarding how it trains its AI. This change was detailed in a recent blog post from Apple's Machine Learning Research website, via Bloomberg. The blog details how Apple trains its AI using synthetic data, but this method has limitations. This is because synthetic data struggles to understand trends in features like Summarization or Writing Tools. However, the new method detailed by Apple aims to compare its synthetic data with user data to help solve this issue. The process begins with Apple generating "a set of many emails that cover topics that are most common in messages. To curate a representative set of synthetic emails, we start by creating a large set of synthetic messages on a variety of topics." It is worth noting that Apple is adamant that these emails are not generated with any knowledge regarding individual user emails. This data is then distilled into a representation, called an embedding, that captures some of the key properties of the messages. This includes things like language, topic and length. These embeddings are then "sent to a small number of user devices that have opted in to Device Analytics." You can find more information on Device Analytics on Apple's website. When a device receives the data, it selects a small sample of recent emails and measures them against the embeddings to find which sample is closest. Apple will also use differential privacy to learn "the most-frequently selected synthetic embeddings across all devices, without learning which synthetic embedding was selected on any given device." According to the blog, the "most-frequently selected synthetic embeddings can then be used to generate training or testing data, or we can run additional curation steps to further refine the dataset." This process will, according to Apple, allow it to "improve the topics and language of our synthetic emails, which helps us train our models to create better text outputs in features like email summaries, while protecting privacy." Only time will tell if this will improve Apple Intelligence to the degree that is needed to help Apple catch up with the best AI chatbots and AI assistants. However, it will likely require more than just better training to compete with Gemini 2.0 or ChatGPT.
[8]
Apple has a plan for improving Apple Intelligence, but it needs your help - and your data
Apple Intelligence has not had the best year so far, but if you think Apple is giving up, you're wrong. It has big plans and is moving forward with new model training strategies that could vastly improve its AI performance. However, the changes do involve a closer look at your data - if you opt in. In a new technical paper from Apple's Machine Learning Research, "Understanding Aggregate Trends for Apple Intelligence Using Differential Privacy," Apple outlined new plans for combining data analytics with user data and synthetic data generation to better train the models behind many of Apple Intelligence's features. Up to now, Apple's been training its models on purely synthetic data, which tries to mimic what real data might be like, but there are limitations. In Genmoji, for instance, Apple's use of synthetic data doesn't always reflect how real users engage with the system. From the paper: "For example, understanding how our models perform when a user requests Genmoji that contain multiple entities (like "dinosaur in a cowboy hat") helps us improve the responses to those kinds of requests." Essentially, if users opt in, the system can poll the device to see if it has seen a data segment. However, your phone doesn't respond with the data; instead, it sends back a noisy and anonymized signal, which is apparently enough for Apple's model to learn. The process is somewhat different for models that work with longer texts like Writing Tools and Summarizations. In this case, Apple generates synthetic emails and sends representations of them (embeddings) to users who have opted into device analytics. On the device, the system then compares these representations against samples of recent emails. "These most-frequently selected synthetic embeddings can then be used to generate training or testing data, or we can run additional curation steps to further refine the dataset." It's complicated stuff. The key, though, is that Apple applies differential privacy to all the user data, which is the process of adding noise that makes it impossible to connect that data to a real user. Still, none of this works if you don't opt into Apple's Device Analytics, which usually happens when you first set up your iPhone, iPad, or MacBook. Doing so does not put your data or privacy at risk, but that training should lead to better models and, hopefully, a better Apple Intelligence experience on your iPhone and other Apple devices. It might also mean smarter and more sensible rewrites and summaries.
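The polling described here - asking a device whether it has seen a data segment and getting back only a noisy, anonymized signal - corresponds to a one-bit randomized response. A minimal sketch, with an invented truth probability:

```python
import random

def noisy_membership_poll(device_has_match: bool, p_truth: float = 0.75) -> bool:
    """Answer a yes/no poll with plausible deniability: respond honestly with
    probability p_truth, otherwise respond with a fair coin flip. A 'yes' from
    any one device therefore proves nothing about that device's data."""
    if random.random() < p_truth:
        return device_has_match
    return random.random() < 0.5

# Across many devices the true match rate is still recoverable:
# true_rate ~= (observed_yes_rate - (1 - p_truth) / 2) / p_truth
```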
[9]
On-device Apple Intelligence training methods seem to be based on a controversial technology
Apple Intelligence to be trained on anonymized user data on an opt-in basis On Monday, Apple shared its plans to allow users to opt into on-device Apple Intelligence training using Differential Privacy techniques that are incredibly similar to its failed CSAM detection system. Differential Privacy is a concept Apple embraced openly in 2016 with iOS 10. It is a privacy-preserving method of data collection that introduces noise to sample data to prevent the data collectors from figuring out where the data came from. According to a post on Apple's machine learning blog, Apple is working to implement Differential Privacy as a method to gather user data to train Apple Intelligence. The data is provided on an opt-in basis, anonymously, and in a way that can't be traced back to an individual user. The story was first covered by Bloomberg, which explained Apple's report on using synthetic data informed by real-world user information. However, it isn't as simple as grabbing user data off of an iPhone to analyze in a server farm. Instead, Apple will utilize a technique called Differential Privacy, which, if you've forgotten, is a system designed to introduce noise to data collection so individual data points cannot be traced back to the source. Apple takes it a step further by leaving user data on device -- only polling for accuracy and taking the poll results off of the user's device. These methods ensure that Apple's principles behind privacy and security are preserved. Users who opt in to sharing device analytics will participate in this system, but none of their data will ever leave their iPhone. Differential Privacy is rooted in research dating back to 2006, but Apple didn't make it part of its public identity until 2016. It started as a way to learn how people used emojis, to find new words for local dictionaries, to power deep links within apps, and as a Notes search tool. Apple says that starting with iOS 18.5, Differential Privacy will be used to analyze user data and train specific Apple Intelligence systems, beginning with Genmoji. It will be able to identify patterns of common prompts people use so Apple can better train the AI and get better results for those prompts. Basically, Apple provides artificial prompts it believes are popular, like "dinosaur in a cowboy hat," and it looks for pattern matches in user data analytics. Because of artificially injected noise and a threshold of needing hundreds of fragment matches, there isn't any way to surface unique or individual-identifying prompts. Plus, these searches for fragments of prompts only result in a positive or negative poll, so no user data is derived from the analysis. Again, no data can be isolated and traced back to a single person or identifier. The same technique will be used for analyzing Image Playground, Image Wand, Memories Creation, and Writing Tools. These systems rely on short prompts, so the analysis can be limited to simple prompt pattern matching. Apple wants to take these methods further by implementing them for text generation. Since text generation for email and other systems results in much longer prompts, and likely, more private user data, Apple took extra steps. Apple is using recent research into developing synthetic data that can be used to represent aggregate trends in real user data. Of course, this is done without removing a single bit of text from the user's device.
After generating synthetic emails that may represent real ones, Apple compares them to limited samples of recent user emails that devices have computed into embeddings locally. The synthetic embeddings closest to the samples across many devices reveal which of Apple's synthetic data is most representative of real human communication. Once a pattern is found across devices, that synthetic data and pattern matching can be refined to work across different topics. The process enables Apple to train Apple Intelligence to produce better summaries and suggestions. Again, the Differential Privacy method of Apple Intelligence training is opt-in and takes place on-device. User data never leaves the device, and gathered polling results have noise introduced, so even while user data isn't present, individual results can't be tied back to a single identifier. If Apple's methods here ring any bells, it's because they are nearly identical to the methods the company planned to implement for CSAM detection. The system would have converted user photos into hashes that were compared to a database of hashes of known CSAM. That analysis would occur either on-device for local photos, or in iCloud photo storage. In either instance, Apple was able to perform the photo hash matching without ever looking at a user photo or removing a photo from the device or iCloud. When enough instances of potential positive results for CSAM hash matches occurred on a single device, it would trigger a system that sent affected images to be analyzed by humans. If the discovered images were CSAM, the authorities were notified. The CSAM detection system preserved user privacy, data encryption, and more, but it also introduced many new attack vectors that may be abused by authoritarian governments. For example, if such a system could be used to find CSAM, people worried governments could compel Apple to use it to find certain kinds of speech or imagery. Apple ultimately abandoned the CSAM detection system. Advocates have spoken out against Apple's decision, suggesting the company is doing nothing to prevent the spread of such content. While the technology backbone is the same, it seems Apple has landed on a much less controversial use. Even so, there are those who would prefer not to offer data, privacy protected or not, to train Apple Intelligence. Nothing has been implemented yet, so don't worry, there's still time to ensure you are opted out. Apple says it will introduce the feature in iOS 18.5 and testing will begin in a future beta. To check if you're opted in or not, open Settings, scroll down and select Privacy & Security, then select Analytics & Improvements. Toggle the "Share iPhone & Watch Analytics" setting to opt out of AI training if you haven't already.
[10]
Apple hopes your emails will fix its misfiring AI
Apple's AI efforts haven't made the same kind of impact as Google's Gemini, Microsoft Copilot, or OpenAI's ChatGPT. The company's AI stack, dubbed Apple Intelligence, hasn't moved the functional needle for iPhone and Mac users, even triggering an internal management crisis at the company. It seems user data could rescue the sinking ship. Earlier today, the company published a Machine Learning research paper that details a new approach to train its onboard AI using data stored on your iPhone, starting with emails. These emails will be used to improve features such as email summarization and Writing Tools. A brief summary of AI training: Before we dig into the specifics, here's a brief rundown of how AI tools work. The first step is training, which essentially involves feeding a vast amount of human-created data to an "artificial brain." Think of books, articles, research papers, and more. The more data it is fed, the better its responses get. That's because chatbots, which are technically known as Large Language Models (LLMs), try to understand the pattern and relationship between words. Tools like ChatGPT, which are now integrated within Siri and Apple Intelligence, are essentially word predictors. But there is only so much data out there to train an AI, and the whole process is pretty time-consuming and expensive. So, why not use AI-generated data to train your AI? Well, as per research, it will technically "poison" the AI models. That means more inaccurate responses, spouting nonsense, and delivering misleading outputs. How is Apple planning to fix its AI? Instead of relying solely on synthetic data, one can improve the responses of an AI tool by refining and fine-tuning it. The best approach to train an AI assistant, however, is to give it more human data. The data stored on your phone is the richest source for such information, but a company can't simply do that. It would be a serious privacy violation and an open invitation to lawsuits. What Apple intends to do is take an indirect peek at your emails, without ever copying or sending them to its servers. In a nutshell, all your data remains on your phone. Moreover, Apple is not going to technically "read" your emails. Instead, it will simply compare them to a pile of synthetic emails. The secret sauce here is identifying which synthetic data is the closest match for an email written by a human. That would give Apple an idea about which kind of data is the most realistic way humans engage in a conversation. So far, Apple has "typically" used synthetic data for AI training, reports Bloomberg. "This synthetic data can then be used to test the quality of our models on more representative data and identify areas of improvement for features like summarization," the company explains. It could lead to tangible improvements for the responses you get from Siri and Apple Intelligence down the road. Based on learnings from realistic human data, Apple aims to improve its email summarization system and a few items in the Writing Tools kit. "The contents of the sampled emails never leave the device and are never shared with Apple," assures the company. Apple says it has already put similar privacy-first training systems in place for the Genmoji system. Why is it a crucial step forward?
Right now, the summaries you get courtesy of Apple Intelligence in Mail can often be quite confusing, and occasionally, downright gibberish. The status quo of app notifications is no different, and it got so bad that Apple had to temporarily pause it after drawing flak from the BBC for misrepresenting news articles. The situation is so bad that the summarized notifications have become a joke in our team chats. In its bid to summarize conversations or emails, Apple Intelligence often clubs together random sentences that either make no sense, or give an entirely different spin to what's really happening. The core problem is that AI still struggles with context and human intent. The best way to fix it is by training it on more situation-aware material with proper contextual understanding. Recently, AI models capable of reasoning have arrived on the scene, but they haven't quite been a magic pill. The method described by Apple sounds like the best of both worlds. "This process allows us to improve the topics and language of our synthetic emails, which helps us train our models to create better text outputs in features like email summaries, while protecting privacy," says the company. Now, here is the good part. Apple is not going to read all emails stored on iPhones and Macs across the world. Instead, it is taking an opt-in approach. Only users who have explicitly agreed to share Device Analytics data with Apple will be a part of the AI training process. You can enable it by following this path: Settings > Privacy & Security > Analytics & Improvements. The company will reportedly kick the plans into action with the upcoming iOS 18.5, iPadOS 18.5, and macOS 15.5 beta updates. A corresponding build targeted at developers has already been released.
[11]
New Training Methods to Save Apple Intelligence? | AIM Media House
On the company's Machine Learning Research blog, Apple outlined new privacy-preserving training methods to enhance its suite of AI features. Apple is introducing new methods to train and improve the performance of its Apple Intelligence features, the company announced in a blog post on Monday. The Cupertino giant employs synthetic data to enhance Apple Intelligence's features that handle long text chunks for summarisation or writing. It says the process will be improved further. For example, in the case of emails, it begins with Apple generating artificial mails that mimic real ones but contain no user information. Then, to make this data useful, user devices provide feedback privately. Each device compares these artificial examples to the user's genuine emails locally and anonymously, indicating which types of artificial emails are the closest match. Apple utilises this collective feedback from many users to enhance its synthetic training data, improving the AI without ever accessing anyone's personal email content. "We will soon begin using synthetic data with users who opt into device analytics to improve email summaries," said Apple. Apple stated it will apply the same privacy-preserving techniques used for enhancing Genmoji to improve features such as Image Playground, Image Wand, Memories Creation, and Writing Tools within Apple Intelligence and Visual Intelligence. To enhance its Genmoji feature, Apple analyses popular prompts from users who opt to share analytics. "For example, understanding how our models perform when a user requests Genmoji that contain multiple entities (like "dinosaur in a cowboy hat") helps us improve the responses to those kinds of requests," said Apple. This process employs privacy-preserving techniques, ensuring that individual data remains confidential, devices are untraceable, and rare prompts stay undisclosed. "These techniques allow Apple to understand overall trends, without learning information about any individual, like what prompts they use or the content of their emails," said the company. Apple Intelligence is in dire need of improvement, especially in terms of generating accurate summaries. Last year, Apple came under fire for inaccurate AI-generated summaries of news articles, particularly from BBC News. Moreover, there is an entire subreddit called r/AppleIntelligenceFail, where users share some of the most confusing and out-of-context results derived from Apple Intelligence. Recently, Bloomberg reported that Mike Rockwell, the Apple Vision Pro creator, will replace John Giannandrea as the AI head. As per the reports, CEO Tim Cook had "lost confidence" in Giannandrea's ability to develop products. Furthermore, Apple also announced that the release of a more personalised version of Siri had been delayed until 2026. Last month, it was reported that the company was struggling to mitigate various bugs and engineering problems within Siri.
[12]
Apple Is Analysing User Data Patterns to Improve Its AI Features
Apple is developing new techniques to analyse user data patterns and aggregated insights to improve its artificial intelligence (AI) features. The Cupertino-based tech giant shared these differential privacy techniques on Monday, highlighting that these methods will not breach users' privacy. Instead, the company is focusing on gathering data such as usage trends and data embeddings to measure and improve its text generation tools and Genmoji. Notably, Apple said that this information will be taken only from those devices that have opted in to share Device Analytics. In a post on its Machine Learning Research domain, the iPhone maker detailed the new technique it is developing to improve some of the Apple Intelligence features. The tech giant's AI offerings have been underwhelming so far, and the company suggests one of the reasons for that is its ethical practices around pretraining and sourcing data for its AI models. Apple says that its generative AI models are trained on synthetic data (data that is created by other AI models or digital sources and not by any human). While this is still a fair way to train large language models (LLMs), since it does provide them with knowledge about the world, the models are not learning from the human style of writing and presentation, so the output can come off as bland and generic. This is also known as AI slop. To fix these issues and to improve the output quality of its AI models, the tech giant is now looking at the option to learn from user data without really looking into users' private data. Apple calls this technique "differential privacy." For Genmoji, Apple will use differentially private methods to identify popular prompts and prompt patterns from users who have opted in to share Device Analytics with the company. The iPhone maker says it will provide a mathematical guarantee that unique or rare prompts will not be discovered and that specific prompts cannot be linked to any individual. Collecting this information will help the company evaluate the types of prompts that are "most representative of a real user engagement." Essentially, Apple will be looking into the kind of prompts that lead to satisfactory output and where users repeatedly add prompts to get to the desired result. One example shared in the post included the models' performance in generating multiple entities. Apple plans to expand this approach for Image Playground, Image Wand, Memories Creation, and Writing Tools in Apple Intelligence, as well as in Visual Intelligence with future releases. [Image: Differential privacy in Apple Intelligence's text generation feature. Photo credit: Apple] Another key area where the tech giant is using this technique is text generation. The approach is somewhat different from the one used with Genmoji. To assess the capability of its tools in email generation, the company created a set of emails that cover common topics. For each topic, the company generated multiple variations and then derived representations of the emails, which included key dimensions such as language, topic, and length. Apple calls these embeddings. These embeddings were then sent to a small number of users who have opted in to Device Analytics. The synthetic embeddings were then matched against a sample of the users' emails. "As a result of these protections, Apple can construct synthetic data that is reflective of aggregate trends, without ever collecting or reading any user email content," the tech giant said.
In essence, the company would not know the content of the emails but could still understand how people prefer their emails to be worded. Apple is currently using this method to improve text generation in emails, and says that in the future, it will also use the same approach for email summaries.
[13]
Apple Uses New Tech That Compares Synthetic Data With Real Emails To Train AI Models, Then Applies Embeddings And Privacy Tools To Improve Text Output Quality
Apple was supposed to release its highly anticipated Personalized Siri feature last month with the release of iOS 18.4. However, it was later confirmed that the new utility would be delayed until next year. A new report has emerged that shares details on how Apple trains the AI models behind Apple Intelligence. Even though Apple officially stated that the Personalized Siri features will be delayed until next year, employees within the company are growing confident that the feature will be ready for launch later this year. In a new report, Bloomberg highlights how Apple trains its AI models for Apple Intelligence. The report cites a blog post from Apple's Machine Learning Research website, describing how Apple uses synthetic data to train its AI models. We have previously reported on several occasions that Apple is lagging behind its competitors in the AI race, and the company's strategy to use synthetic data to train AI models is a bit unconventional and has limitations. For one, it is hard for synthetic data to "understand trends" when it comes to summarization or writing tools that require longer sentences or full-fledged emails. Apple took note of this and highlighted a new technology that will allow it to circumvent the limitations by comparing the synthetic data to a sample of recent user emails. However, the process does not compromise user privacy. To improve our models we need to generate a set of many emails that cover topics that are most common in messages. To curate a representative set of synthetic emails, we start by creating a large set of synthetic messages on a variety of topics. For example, we might create a synthetic message, "Would you like to play tennis tomorrow at 11:30AM?" This is done without any knowledge of individual user emails. We then derive a representation, called an embedding, of each synthetic message that captures some of the key dimensions of the message like language, topic, and length. These embeddings are then sent to a small number of user devices that have opted in to Device Analytics. Participating devices then select a small sample of recent user emails and compute their embeddings. Each device then decides which of the synthetic embeddings is closest to these samples. Using differential privacy, Apple can then learn the most-frequently selected synthetic embeddings across all devices, without learning which synthetic embedding was selected on any given device. These most-frequently selected synthetic embeddings can then be used to generate training or testing data, or we can run additional curation steps to further refine the dataset. For example, if the message about playing tennis is one of the top embeddings, a similar message replacing "tennis" with "soccer" or another sport could be generated and added to the set for the next round of curation (see Figure 1). This process allows us to improve the topics and language of our synthetic emails, which helps us train our models to create better text outputs in features like email summaries, while protecting privacy. While the company is aware of the limitations, it explains that the new technology will allow it to better understand overall trends without compromising user privacy or gathering personal information. Bloomberg also claims that the company will release the new technology in a new beta of iOS 18.5 and macOS 15.5. You can check out Apple's full post on the matter for more details.
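The curation round described in the quoted passage - taking a top-ranked message about tennis and spinning out a "soccer" variant for the next round - is simple to sketch. The template and topic list below are invented for illustration; Apple presumably generates variants with a language model rather than string substitution.

```python
# Sketch of one curation step: expand a highly ranked synthetic message into
# topic-swapped variants for the next polling round. Words are illustrative.
TOP_MESSAGE = "Would you like to play tennis tomorrow at 11:30AM?"
RELATED_SPORTS = ["soccer", "basketball", "badminton", "golf"]

def expand_variants(message: str, old_topic: str, new_topics: list[str]) -> list[str]:
    """Generate near-duplicate synthetic messages that vary only the topic word."""
    return [message.replace(old_topic, topic) for topic in new_topics]

next_round = [TOP_MESSAGE] + expand_variants(TOP_MESSAGE, "tennis", RELATED_SPORTS)
# next_round now also contains "Would you like to play soccer tomorrow at 11:30AM?"
```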
[14]
Apple to Tap User Data for LLM Training | PYMNTS.com
Apple is planning to analyze user data to improve its large language model (LLM) software while upholding user privacy. The company has been using synthetic data to train its artificial intelligence (AI) models but has found that method to be ineffective, Apple wrote in a Monday (April 14) blog post. Now, Apple will still use synthetic data as a starting point but will compare the generated text to a sample of emails from participating users to determine which generated output best lines up with real-world messages. "Only users who have opted-in to send Device Analytics information to Apple participate," Apple said in the blog post. "The contents of the sampled emails never leave the device and are never shared with Apple. A participating device will send only a signal indicating which of the variants is closest to the sampled data on the device, and Apple learns which selected synthetic emails are most often selected across all devices." The new technique aims to improve text-related features from the Apple Intelligence platform, like summaries in notifications, the ability to synthesize thoughts in its Writing Tools and recaps of user messages.
[15]
How Apple Plans to Improve AI Using Synthetic User Data
Apple intends to improve the functioning of its artificial intelligence (AI) models by comparing the synthetic data it trains its models on to real-life data samples from its users, the company announced in a recent blog post. Synthetic data is data created to mimic the format and important properties of user data but does not contain any actual user data. This comes after Apple introduced Apple Intelligence, an AI-based messaging, email, audio, and webpage summary generation feature as part of the iOS 18.1.1 update. The feature made a number of summarisation mistakes; most notably, it created a fake news headline attributed to the BBC, following which the company suspended the service. Apple explains that to improve the quality of the synthetic data that its models work on, it generates a set of synthetic email topics that users commonly mention in their emails. It then derives a representation (or embedding) of these emails, which includes the key dimensions of the message, like the language, topic, and length of the message. Apple sends these embeddings to devices that have opted in for device analytics. Users who opt in to provide Apple access to device analytics allow the company to access details about hardware and operating system specifications, performance statistics, and data about how they use the device/apps. Apple compares the synthetic embeddings to embeddings from a small sample of recent user emails from devices that have opted in for device analytics. Such devices then select which synthetic embeddings are closest to real user email samples. Apple explains that it uses 'differential privacy' to learn the synthetic samples that most devices have selected as closest to real user samples, without learning which synthetic embedding was selected on any given device. This implies that the company does not get to learn which synthetic sample corresponds to a specific user's actual emails. Apple says that it uses the closest matching synthetic samples as training or testing data and can even run curation steps to further refine the data. The company specifies that the real-user email samples that it compares synthetic data with never leave the user's device, and Apple also never gets access to this information. As companies seek to compete in the race to make the biggest and best-performing AI models, access to readily available data to train these models is becoming a challenge. In November last year, Reuters reported that AI companies are hitting a scaling wall. This meant that making models bigger and feeding them more data was no longer providing proportional capability improvements, with access to data reportedly being one of the developers' key challenges. While using synthetic data, similar to what Apple is doing, makes sense in this case, some, like OpenAI founder Sam Altman, find reliance on synthetic data strange. "It's really strange if the best way to train a model was to just generate a quadrillion tokens of synthetic data and feed that back in. You'd say that somehow that seems inefficient, and there ought to be something where you can just learn more from the data as you're training," Altman mentioned in a live interview during the AI for Good Global Summit in June 2024. At the same time, he also admitted that the company was experimenting with generating and training on synthetic data. Discussing the quality of output from a model trained on synthetic data, Altman said that what was important was that the training data was of high quality.
As such, if Apple can improve the quality of its synthetic training data through this comparative exercise, it should be able to improve the quality of Apple Intelligence outputs.
[16]
Your iPhone will now train Apple's AI, here's how it works
Apple is adopting a new strategy to train its AI models while upholding its strong privacy commitments, in order to enhance its AI-backed features. The company is altering how it uses artificial intelligence (AI) to improve tools like Siri and email summarisation in future software updates, according to a recent blog post on the Apple Machine Learning Research site and a Bloomberg report. Apple has historically trained its AI models using artificially generated content, or synthetic data. While this has removed the need to analyse actual user data, there are certain drawbacks, particularly when training models to perform complex tasks like long-form summarisation. Apple has created a new privacy-preserving system to address this issue, which will use tiny samples of recent user emails on devices that have chosen to participate in Device Analytics. According to Apple, the procedure doesn't give it access to user identities or specific emails. Rather, it employs a technique known as embeddings, which represents emails according to their language, topic, and length. These are compared with synthetic messages on the device. Without ever seeing the real emails or knowing which device chose what, Apple is able to determine which kinds of fake messages most closely resemble typical communication patterns thanks to differential privacy. In the end, this will enhance how AI features create or summarise content across all applications, including Mail and Notes, and assist Apple in improving its synthetic training data. This is expected to be released in the next beta versions of macOS 15.5, as well as iOS 18.5. The precise rollout timeline and other specifics of this "innovation," however, are still unknown.
Apple unveils a new strategy to enhance its AI models using differential privacy and synthetic data, aiming to improve features like email summaries and Genmoji without compromising user privacy.
Apple has unveiled an innovative approach to improve its AI models while maintaining its commitment to user privacy. The tech giant is addressing criticism over the performance of its AI products, particularly in areas like notification summaries, by implementing a new system that uses synthetic data and differential privacy [1].
At the core of Apple's new strategy is the use of synthetic data, which mimics the format and important properties of user data without containing any actual user-generated content. This approach allows Apple to train its AI models on a diverse range of topics and styles without accessing real user information [2].
Apple employs differential privacy techniques to ensure that the data collected cannot be linked to individual users. This method introduces randomized information into the broader dataset, preventing the identification of any single person's data [4].
The new system will be used to enhance various Apple Intelligence features, including email summaries, Genmoji, Image Playground, Image Wand, Memories Creation, Writing Tools, and Visual Intelligence.
Apple plans to roll out this new system in future beta versions of iOS 18.5 and macOS 15.5. Importantly, only users who have opted into the Device Analytics program will participate in this data collection process, ensuring user consent and control over their data usage [1].
Jason Hong, a computer science professor at Carnegie Mellon University, commends Apple's approach, stating that the company "should be applauded for putting their customers' privacy first." However, he also notes potential trade-offs, including the possibility that Apple's AI may not be as effective as competitors who have more direct access to user data [2].
Apple's new AI training strategy represents a significant step in balancing the need for improved AI performance with the company's longstanding commitment to user privacy. As the tech industry grapples with the ethical implications of AI development, Apple's approach could set a new standard for privacy-preserving AI advancement.