Curated by THEOUTPOST
On Tue, 15 Apr, 12:02 AM UTC
16 Sources
[1]
Apple details how it plans to improve its AI models by privately analyzing user data | TechCrunch
In the wake of criticism over the underwhelming performance of its AI products, especially in areas like notification summaries, Apple on Monday detailed how it is trying to improve its AI models by analyzing user data privately with the aid of synthetic data. Using an approach called "differential privacy," the company said it would first generate synthetic data and then poll users' devices (provided they've opted in to share device analytics with Apple) with snippets of the generated synthetic data to compare how accurate its models are, and subsequently improve them. "Synthetic data are created to mimic the format and important properties of user data, but do not contain any actual user generated content," the company wrote in a blog post. "To curate a representative set of synthetic emails, we start by creating a large set of synthetic messages on a variety of topics [...] We then derive a representation, called an embedding, of each synthetic message that captures some of the key dimensions of the message like language, topic, and length." The company said these embeddings are then sent to a small number of user devices that have opted in to Device Analytics, and the devices then compare them with a sample of emails to tell Apple which embeddings are most accurate. The company said it is already using this approach to improve its Genmoji models, and would in the future use synthetic data for Image Playground, Image Wand, Memories Creation and Writing Tools, as well as Visual Intelligence. Apple said it would also poll the devices of users who opt in to share device analytics with synthetic data to improve email summaries.
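Apple has not published its embedding pipeline, but the first step it describes here (turning each synthetic message into a vector that captures language, topic, and length) can be sketched with an off-the-shelf model. The snippet below is a minimal illustration using the open-source sentence-transformers library as a stand-in; the messages and model choice are invented, not Apple's.

```python
# Sketch of step one: generate synthetic messages, derive an embedding for each.
# sentence-transformers is a stand-in; Apple's actual embedding model is not public.
from sentence_transformers import SentenceTransformer

# Hypothetical synthetic messages covering common email topics.
synthetic_messages = [
    "Would you like to play tennis tomorrow at 11:30AM?",
    "Reminder: dentist appointment on Friday at 3pm.",
    "Attached is the Q2 budget draft for your review.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
# Each message becomes a fixed-length vector capturing topic, style, and length cues.
synthetic_embeddings = model.encode(synthetic_messages, normalize_embeddings=True)
print(synthetic_embeddings.shape)  # (3, 384) for this particular model
```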
[2]
How Apple Will Analyze Your Data to Train Its AI (While Protecting Your Privacy)
Samantha Kelly is a freelance writer with a focus on consumer technology, AI, social media, Big Tech, emerging trends and how they impact our everyday lives. Her work has been featured on CNN, NBC, NPR, the BBC, Mashable and more. Apple said it will begin analyzing on-device user data as part of a broader push to strengthen its AI platform. In a blog post, the company outlined a new approach designed to expand its AI capabilities while safeguarding user privacy, especially as competitors like OpenAI and Google advance more quickly with fewer restrictions. Apple said it will train its AI models using synthetic data - information that mimics the format and characteristics of real-world messages without including any actual user-generated content. "When creating synthetic data, our goal is to produce synthetic sentences or emails that are similar enough in topic or style to the real thing to help improve our models for summarization, but without Apple collecting emails from the device," the company said in a blog post. For Apple Intelligence features including summarization and writing tools that handle longer content, the company said its usual methods, like those used for short-form prompts in Genmoji, aren't effective. Instead, its new approach will generate a large set of synthetic emails on various topics - such as, "Want to play tennis tomorrow?" - without referencing any actual user data. Each message is converted into what Apple calls an "embedding," a numerical summary capturing attributes including topic and length. The embeddings are sent only to opted-in devices, which then compare them to a small, private sample of recent user emails stored locally. "This process allows us to improve the topics and language of our synthetic emails, which helps us train our models to create better text outputs in features like email summaries, while protecting privacy," the company said. Apple said it will start using this approach "soon" with users who opt in to sharing device analytics. Jason Hong, a computer science professor at Carnegie Mellon University, said this type of "differential privacy" is a sophisticated approach for analyzing and using data aggregated from large numbers of people. "Apple could have taken the easy approach of just taking everyone's data and using it to build their AI models," he said. "Instead, Apple chose to deploy these differential privacy approaches for Apple Intelligence, and they should be applauded for putting their customers' privacy first." However, he said there will likely be tradeoffs, including the possibility that Apple Intelligence may not be as effective as some competitors because rivals will have more access to people's data. He also said Apple's models will likely be harder to debug and might take more battery power to deploy.
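The on-device comparison step described above can be illustrated in a few lines of NumPy. This is a minimal sketch under two assumptions that are not Apple's published spec: embeddings are unit-normalized (so a dot product is cosine similarity), and each synthetic variant is scored by its best match among the local emails.

```python
import numpy as np

def closest_variant(synthetic_embeddings: np.ndarray,
                    local_email_embeddings: np.ndarray) -> int:
    """Pick the synthetic variant nearest to this device's private sample of
    recent-email embeddings. Only the winning index ever needs to be reported;
    the emails themselves stay on the device."""
    # Cosine similarity of every synthetic variant to every local email
    # (valid because both sets of vectors are assumed unit-normalized).
    sims = synthetic_embeddings @ local_email_embeddings.T
    # Score each variant by its single best local match, then take the argmax.
    return int(sims.max(axis=1).argmax())
```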
[3]
Apple's complicated plan to improve its AI while protecting privacy
Emma Roth is a news writer who covers the streaming wars, consumer tech, crypto, social media, and much more. Previously, she was a writer and editor at MUO. Apple says it's found a way to make its AI models better without training on its users' data or even copying it from their iPhones and Macs. In a blog post first reported on by Bloomberg, the company outlined its plans to have devices compare a synthetic dataset to samples of recent emails or messages from users who have opted into its Device Analytics program. Apple devices will be able to determine which synthetic inputs are closest to real samples, relaying the result to the company by sending "only a signal indicating which of the variants is closest to the sampled data." That way, according to Apple, it doesn't access user data, and the data never leaves the device. Apple will then use the most frequently picked fake samples to improve its AI text outputs, such as email summaries. Currently, Apple trains its AI models on synthetic data only, potentially resulting in less helpful responses, according to Bloomberg's Mark Gurman. Apple has struggled with the launch of its flagship Apple Intelligence features, as it pushed back the launch of some capabilities and replaced the head of its Siri team. But now, Apple is trying to turn things around by introducing its new AI training system in a beta version of iOS and iPadOS 18.5 and macOS 15.5, according to Gurman. Apple has been talking up its use of a method called differential privacy to keep user data private since at least 2016, when it launched with iOS 10, and has already used it to improve the AI-powered Genmoji feature. The same technique underpins the company's new AI training plans: Apple says that introducing randomized information into a broader dataset will help prevent it from linking data to any one person.
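Apple hasn't disclosed its exact noise mechanism, but randomized response is the classic local-differential-privacy construction matching this description: each device perturbs its own answer before it ever leaves the phone, so any single report is plausibly deniable. A minimal sketch follows; the truth probability is an illustrative parameter, not Apple's.

```python
import random

def noisy_report(true_index: int, num_variants: int, p_truth: float = 0.75) -> int:
    """Randomized response: with probability p_truth report the variant this
    device actually selected; otherwise report a uniformly random variant.
    No individual report reveals what the device really chose."""
    if random.random() < p_truth:
        return true_index
    return random.randrange(num_variants)

# Example: a device that picked variant 2 out of 10 might report 2 - or any
# other index - and the server cannot tell which case occurred.
print(noisy_report(true_index=2, num_variants=10))
```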
[4]
Apple details on-device Apple Intelligence training system using user data - 9to5Mac
Last month, Apple delayed the rollout of its more personal and powerful Siri features. As it looks to right the ship for future Apple Intelligence updates, Bloomberg highlights a shift that Apple is making in how it trains its artificial intelligence models. The report points to a blog post from Apple's Machine Learning Research website, explaining how Apple generally uses synthetic data to train its AI models. There are limitations to this strategy, however, including the fact that it's hard for synthetic data to "understand trends" in features like summarization or writing tools that operate on longer sentences or entire email messages. To address this limitation, Apple describes a new technique it will soon start using that compares the synthetic data to a small sample of recent user emails, but without compromising user privacy: "To improve our models we need to generate a set of many emails that cover topics that are most common in messages. To curate a representative set of synthetic emails, we start by creating a large set of synthetic messages on a variety of topics. For example, we might create a synthetic message, "Would you like to play tennis tomorrow at 11:30AM?" This is done without any knowledge of individual user emails. We then derive a representation, called an embedding, of each synthetic message that captures some of the key dimensions of the message like language, topic, and length. These embeddings are then sent to a small number of user devices that have opted in to Device Analytics. Participating devices then select a small sample of recent user emails and compute their embeddings. Each device then decides which of the synthetic embeddings is closest to these samples. Using differential privacy, Apple can then learn the most-frequently selected synthetic embeddings across all devices, without learning which synthetic embedding was selected on any given device. These most-frequently selected synthetic embeddings can then be used to generate training or testing data, or we can run additional curation steps to further refine the dataset. For example, if the message about playing tennis is one of the top embeddings, a similar message replacing "tennis" with "soccer" or another sport could be generated and added to the set for the next round of curation (see Figure 1). This process allows us to improve the topics and language of our synthetic emails, which helps us train our models to create better text outputs in features like email summaries, while protecting privacy." Apple explains that these techniques allow it to "understand overall trends, without learning information about any individual." Bloomberg says that Apple will roll out this new system in a future beta of iOS 18.5 and macOS 15.5.
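The aggregation Apple describes - learning the most-frequently selected embeddings across all devices without learning any single device's selection - pairs naturally with the randomized-response reports sketched earlier. A minimal debiasing step, assuming the same illustrative p_truth parameter, might look like this:

```python
from collections import Counter

def estimate_frequencies(reports: list[int], num_variants: int,
                         p_truth: float = 0.75) -> list[float]:
    """Invert randomized-response noise to estimate how often each synthetic
    variant was genuinely the closest match across the fleet. Per-device
    answers remain hidden; only the aggregate histogram is recovered."""
    n = len(reports)
    observed = Counter(reports)
    p_noise = (1 - p_truth) / num_variants  # chance any index came from the noise branch
    return [
        max(0.0, (observed.get(k, 0) / n - p_noise) / p_truth)
        for k in range(num_variants)
    ]

# The top entries of this estimated histogram correspond to the
# "most-frequently selected synthetic embeddings" that seed the next round.
```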
[5]
Here's How Apple is Working to Improve Apple Intelligence
With its uncompromising focus on user privacy, Apple has faced challenges collecting enough data to train the large language models that power Apple Intelligence features and that will ultimately improve Siri. To improve Apple Intelligence, Apple has to come up with privacy-preserving options for AI training, and some of the methods the company is using have been outlined in a new Machine Learning Research blog post. Basically, Apple needs user data to improve summarization, writing tools, and other Apple Intelligence features, but it doesn't want to collect data from individual users. So instead, Apple has worked out a way to understand usage trends using differential privacy and data that's not linked to any one person. Apple is creating synthetic data that is representative of aggregate trends in real user data, and it is using on-device detection to make comparisons, providing the company with insight without the need to access sensitive information. It works like this: Apple generates multiple synthetic emails on topics that are common in user emails, such as an invitation to play a game of tennis at 3:00 p.m. Apple then creates an "embedding" from that email with specific language, topic, and length info. Apple might create several embeddings with varying email length and information. Those embeddings are sent to a small number of iPhone users who have Device Analytics turned on, and the iPhones that receive the embeddings select a sample of actual user emails and compute embeddings for those actual emails. The synthetic embeddings that Apple created are compared to the embeddings of the real emails, and the user's iPhone decides which of the synthetic embeddings is closest to the actual sample. Apple then uses differential privacy to determine which of the synthetic embeddings are most commonly selected across all devices, so it knows how emails are most commonly worded without ever seeing user emails and without knowing which specific devices selected which embeddings as the most similar. Apple says that the most frequently selected synthetic embeddings it collects can be used to generate training or testing data, or can be used as examples for further data refinement. The process provides Apple with a way to improve the topics and language of synthetic emails, which in turn trains models to create better text outputs for email summaries and other features, all without violating user privacy. Apple does something similar for Genmoji, using differential privacy to identify popular prompts and prompt patterns that can be used to improve the image generation feature. Apple uses a technique to ensure that it only receives Genmoji prompts that have been used by hundreds of people, and nothing specific or unique that could identify an individual person. Apple can't see Genmoji associated with a personal device, and all signals that are relayed are anonymized and include random noise to hide user identity. Apple also doesn't link any data with an IP address or ID that could be associated with an Apple Account. With both of these methods, only users who have opted in to send Device Analytics to Apple participate in the testing, so if you don't want to have your data used in this way, you can turn that option off.
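The Genmoji side described above - noisy, anonymized signals plus a popularity floor of hundreds of users - can be sketched as a thresholded noisy count. The threshold and noise scale below are invented for illustration; Apple has not published its parameters.

```python
from collections import Counter
import numpy as np

def popular_prompts(reported_prompts: list[str], threshold: int = 300,
                    noise_scale: float = 10.0, seed: int = 0) -> list[str]:
    """Surface only Genmoji prompts whose noisy count clears a high threshold,
    so rare or unique prompts - which could identify a person - never appear
    in the aggregate results."""
    rng = np.random.default_rng(seed)
    counts = Counter(reported_prompts)
    kept = []
    for prompt, count in counts.items():
        noisy_count = count + rng.laplace(0.0, noise_scale)  # randomize each tally
        if noisy_count >= threshold:
            kept.append(prompt)
    return kept
```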
[6]
How will Apple improve its AI while protecting your privacy?
Apple uses Differential Privacy to learn trends about how its user base is using AI. With all the problems we've heard about Apple Intelligence lately - delayed Siri improvements, bad news notification summaries, unimpressive image generation, and more - you might wonder what Apple is planning to do to right the ship. Obviously new and improved models are important, and so is increased training, but Apple has a particularly hard time here because its privacy policies are much stricter than those of other companies creating AI products. In a new post on Apple's Machine Learning Research site, the company explains a technique it will employ to help its AI be more relevant, more often, without training it on your personal data. Differential Privacy is a way to, as Apple puts it, "gain insight into what many Apple users are doing, while helping to preserve the privacy of individual users." Basically, whenever Apple collects data in a system like this, it first strips out any identifying information (device ID, IP address, and so on) and then slightly alters the data. When millions of users submit results, that "noise" cancels out. That's the Differential Privacy part: take enough samples with random noise and identifiers removed, and you can't possibly connect any particular bit of data with a user. It's a way to, for example, get a reliable statistical sample of which emoji are picked most often, or which autocorrect word is used the most after a particular misspelling - collecting data on user preferences without actually being able to trace any particular data point back to any user, even if they wanted to. Apple can generate synthetic text that is representative of common prompts, then use those differential privacy techniques to find out which synthetic samples are selected by users most often, or to determine which words and phrases are common in Genmoji prompts and which results users are most likely to pick. The AI system could generate common sentences used in emails, for example, and then send multiple variants out to different users. Then, using differential privacy techniques, Apple can find out which ones are selected most frequently (while having no ability to know what any one individual chose). Apple has been using this technique for years to gather data meant to improve QuickType suggestions, emoji suggestions, lookup hints, and more. As anonymous as it is, it is still opt-in. Apple doesn't collect this type of data unless you affirmatively enable device analytics. Techniques like this are already being used to improve Genmoji, and in an upcoming update, they'll be used for Image Playground, Image Wand, Memories Creation, Writing Tools, and Visual Intelligence. A Bloomberg report says the new system will come in a beta update to iOS 18.5, iPadOS 18.5, and macOS 15.5 (the second beta was released today). Of course, this is just data gathering, and it will take weeks or months of data collection and retraining to measurably improve Apple Intelligence features.
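The "noise cancels out" claim is easy to demonstrate numerically. This toy simulation (all numbers invented) adds zero-mean Laplace noise to each device's one-bit signal; any single report is meaningless, but the fleet-wide average converges on the true rate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Say 1,000,000 opted-in devices each send a 0/1 signal for one question
# (e.g. "did you pick this emoji?"), with zero-mean noise added on-device.
true_share = 0.23                       # true fraction answering "yes"
n = 1_000_000
signals = (rng.random(n) < true_share).astype(float)
noise = rng.laplace(0.0, 5.0, n)        # per-report noise (illustrative scale)

# Each noisy report is useless on its own; averaged, the noise cancels.
estimate = (signals + noise).mean()
print(f"estimated share: {estimate:.3f} vs true share: {true_share}")  # ~0.23
```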
[7]
Apple will soon train Apple Intelligence on select user data -- here's what you need to know
Apple has recently confirmed how it will start to use certain user data to help train its Apple Intelligence models. There's little doubt that Apple Intelligence has had a few issues lately, including delaying its Siri 2.0 feature launch. To help avoid similar issues in the future, Apple is introducing a change regarding how it trains its AI. This change was detailed in a recent blog post from Apple's Machine Learning Research website, via Bloomberg. The blog details how Apple trains its AI using synthetic data, but this method has limitations. This is because synthetic data struggles to understand trends in features like Summarization or Writing Tools. However, the new method detailed by Apple aims to compare its synthetic data with user data to help solve this issue. The process begins with Apple generating "a set of many emails that cover topics that are most common in messages. To curate a representative set of synthetic emails, we start by creating a large set of synthetic messages on a variety of topics." It is worth noting that Apple is adamant that these emails are not generated with any knowledge regarding individual user emails. This data is then distilled into a representation, called an embedding, that captures some of the key properties of the messages. This includes things like language, topic and length. These embeddings are then "sent to a small number of user devices that have opted in to Device Analytics." You can find more information on Device Analytics on Apple's website. When a device receives the data, it selects a small sample of recent emails and measures them against the embeddings to find which sample is closest. Apple will also use differential privacy to learn "the most-frequently selected synthetic embeddings across all devices, without learning which synthetic embedding was selected on any given device." According to the blog, the "most-frequently selected synthetic embeddings can then be used to generate training or testing data, or we can run additional curation steps to further refine the dataset." This process will, according to Apple, allow it to "improve the topics and language of our synthetic emails, which helps us train our models to create better text outputs in features like email summaries, while protecting privacy." Only time will tell if this will improve Apple Intelligence to the degree that is needed to help Apple catch up with the best AI chatbots and AI assistants. However, it will likely require more than just better training to compete with Gemini 2.0 or ChatGPT.
[8]
Apple has a plan for improving Apple Intelligence, but it needs your help - and your data
Apple Intelligence has not had the best year so far, but if you think Apple is giving up, you're wrong. It has big plans and is moving forward with new model training strategies that could vastly improve its AI performance. However, the changes do involve a closer look at your data - if you opt in. In a new technical paper from Apple's Machine Learning Research, "Understanding Aggregate Trends for Apple Intelligence Using Differential Privacy," Apple outlined new plans for combining data analytics with user data and synthetic data generation to better train the models behind many of Apple Intelligence's features. Up to now, Apple's been training its models on purely synthetic data, which tries to mimic what real data might be like, but there are limitations. In Genmoji, for instance, Apple's use of synthetic data doesn't always reflect how real users engage with the system. From the paper: "For example, understanding how our models perform when a user requests Genmoji that contain multiple entities (like "dinosaur in a cowboy hat") helps us improve the responses to those kinds of requests." Essentially, if users opt in, the system can poll the device to see if it has seen a data segment. However, your phone doesn't respond with the data; instead, it sends back a noisy and anonymized signal, which is apparently enough for Apple's model to learn. The process is somewhat different for models that work with longer texts like Writing Tools and Summarizations. In this case, Apple generates synthetic emails and sends representations of them (embeddings) to users who have opted into device analytics. On the device, the system then compares these representations against samples of recent emails. "These most-frequently selected synthetic embeddings can then be used to generate training or testing data, or we can run additional curation steps to further refine the dataset." It's complicated stuff. The key, though, is that Apple applies differential privacy to all the user data, which is the process of adding noise that makes it impossible to connect that data to a real user. Still, none of this works if you don't opt into Apple's Device Analytics, which usually happens when you first set up your iPhone, iPad, or MacBook. Doing so does not put your data or privacy at risk, but that training should lead to better models and, hopefully, a better Apple Intelligence experience on your iPhone and other Apple devices. It might also mean smarter and more sensible rewrites and summaries.
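The polling described here - asking a device whether it has seen a data segment and getting back only a noisy, anonymized signal - corresponds to a one-bit randomized response. A minimal sketch, with an invented truth probability:

```python
import random

def noisy_membership_poll(device_has_match: bool, p_truth: float = 0.75) -> bool:
    """Answer a yes/no poll with plausible deniability: respond honestly with
    probability p_truth, otherwise respond with a fair coin flip. A 'yes' from
    any one device therefore proves nothing about that device's data."""
    if random.random() < p_truth:
        return device_has_match
    return random.random() < 0.5

# Across many devices the true match rate is still recoverable:
# true_rate ~= (observed_yes_rate - (1 - p_truth) / 2) / p_truth
```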
[9]
On-device Apple Intelligence training methods seem to be based on a controversial technology
Apple Intelligence to be trained on anonymized user data on an opt-in basis On Monday, Apple shared its plans to allow users to opt into on-device Apple Intelligence training using Differential Privacy techniques that are incredibly similar to its failed CSAM detection system. Differential Privacy is a concept Apple embraced openly in 2016 with iOS 10. It is a privacy-preserving method of data collection that introduces noise to sample data to prevent the data collectors from figuring out where the data came from. According to a post on Apple's machine learning blog, Apple is working to implement Differential Privacy as a method to gather user data to train Apple Intelligence. The data is provided on an opt-in basis, anonymously, and in a way that can't be traced back to an individual user. The story was first covered by Bloomberg, which explained Apple's report on using synthetic data informed by real-world user information. However, it isn't as simple as grabbing user data off of an iPhone to analyze in a server farm. Instead, Apple will utilize a technique called Differential Privacy, which, if you've forgotten, is a system designed to introduce noise to data collection so individual data points cannot be traced back to the source. Apple takes it a step further by leaving user data on device -- only polling for accuracy and taking the poll results off of the user's device. These methods ensure that Apple's principles behind privacy and security are preserved. Users who opt in to sharing device analytics will participate in this system, but none of their data will ever leave their iPhone. Differential Privacy is rooted in research dating back to 2006, but Apple didn't make it part of its public identity until 2016. It started as a way to learn how people used emojis, to find new words for local dictionaries, to power deep links within apps, and as a Notes search tool. Apple says that starting with iOS 18.5, Differential Privacy will be used to analyze user data and train specific Apple Intelligence systems, beginning with Genmoji. It will be able to identify patterns of common prompts people use so Apple can better train the AI and get better results for those prompts. Basically, Apple provides artificial prompts it believes are popular, like "dinosaur in a cowboy hat," and it looks for pattern matches in user data analytics. Because of artificially injected noise and a threshold of needing hundreds of fragment matches, there isn't any way to surface unique or individual-identifying prompts. Plus, these searches for fragments of prompts only result in a positive or negative poll, so no user data is derived from the analysis. Again, no data can be isolated and traced back to a single person or identifier. The same technique will be used for analyzing Image Playground, Image Wand, Memories Creation, and Writing Tools. These systems rely on short prompts, so the analysis can be limited to simple prompt pattern matching. Apple wants to take these methods further by implementing them for text generation. Since text generation for email and other systems results in much longer prompts, and likely, more private user data, Apple took extra steps. Apple is using recent research into developing synthetic data that can be used to represent aggregate trends in real user data. Of course, this is done without removing a single bit of text from the user's device.
After generating synthetic emails that may represent real ones, Apple compares them to limited samples of recent user emails that devices have computed into embeddings locally. The synthetic embeddings closest to the samples across many devices reveal which of Apple's synthetic data is most representative of real human communication. Once a pattern is found across devices, that synthetic data and pattern matching can be refined to work across different topics. The process enables Apple to train Apple Intelligence to produce better summaries and suggestions. Again, the Differential Privacy method of Apple Intelligence training is opt-in and takes place on-device. User data never leaves the device, and gathered polling results have noise introduced, so even while user data isn't present, individual results can't be tied back to a single identifier. If Apple's methods here ring any bells, it's because they are nearly identical to the methods the company planned to implement for CSAM detection. The system would have converted user photos into hashes that were compared to a database of hashes of known CSAM. That analysis would occur either on-device for local photos, or in iCloud photo storage. In either instance, Apple was able to perform the photo hash matching without ever looking at a user photo or removing a photo from the device or iCloud. When enough instances of potential positive results for CSAM hash matches occurred on a single device, it would trigger a system that sent affected images to be analyzed by humans. If the discovered images were CSAM, the authorities were notified. The CSAM detection system preserved user privacy, data encryption, and more, but it also introduced many new attack vectors that may be abused by authoritarian governments. For example, if such a system could be used to find CSAM, people worried governments could compel Apple to use it to find certain kinds of speech or imagery. Apple ultimately abandoned the CSAM detection system. Advocates have spoken out against Apple's decision, suggesting the company is doing nothing to prevent the spread of such content. While the technology backbone is the same, it seems Apple has landed on a much less controversial use. Even so, there are those who would prefer not to offer data, privacy protected or not, to train Apple Intelligence. Nothing has been implemented yet, so don't worry, there's still time to ensure you are opted out. Apple says it will introduce the feature in iOS 18.5 and testing will begin in a future beta. To check if you're opted in or not, open Settings, scroll down and select Privacy & Security, then select Analytics & Improvements. Toggle the "Share iPhone & Watch Analytics" setting to opt out of AI training if you haven't already.
[10]
Apple hopes your emails will fix its misfiring AI
Apple's AI efforts haven't made the same kind of impact as Google's Gemini, Microsoft Copilot, or OpenAI's ChatGPT. The company's AI stack, dubbed Apple Intelligence, hasn't moved the functional needle for iPhone and Mac users, even triggering an internal management crisis at the company. It seems user data could rescue the sinking ship. Earlier today, the company published a Machine Learning research paper that details a new approach to train its onboard AI using data stored on your iPhone, starting with emails. These emails will be used to improve features such as email summarization and Writing Tools. A brief summary of AI training: Before we dig into the specifics, here's a brief rundown of how AI tools work. The first step is training, which essentially involves feeding a vast amount of human-created data to an "artificial brain." Think of books, articles, research papers, and more. The more data it is fed, the better its responses get. That's because chatbots, which are technically known as Large Language Models (LLMs), try to understand the pattern and relationship between words. Tools like ChatGPT, which are now integrated within Siri and Apple Intelligence, are essentially word predictors. But there is only so much data out there to train an AI, and the whole process is pretty time-consuming and expensive. So, why not use AI-generated data to train your AI? Well, as per research, it will technically "poison" the AI models. That means more inaccurate responses, spouting nonsense, and delivering misleading outputs. How is Apple planning to fix its AI? Instead of relying solely on synthetic data, one can improve the responses of an AI tool by refining and fine-tuning it. The best approach to train an AI assistant, however, is to give it more human data. The data stored on your phone is the richest source for such information, but a company can't simply do that. It would be a serious privacy violation and an open invitation to lawsuits. What Apple intends to do is take an indirect peek at your emails, without ever copying or sending them to its servers. In a nutshell, all your data remains on your phone. Moreover, Apple is not going to technically "read" your emails. Instead, it will simply compare them to a pile of synthetic emails. The secret sauce here is identifying which synthetic data is the closest match for an email written by a human. That would give Apple an idea about which kind of data is the most realistic way humans engage in a conversation. So far, Apple has "typically" used synthetic data for AI training, reports Bloomberg. "This synthetic data can then be used to test the quality of our models on more representative data and identify areas of improvement for features like summarization," the company explains. It could lead to tangible improvements for the responses you get from Siri and Apple Intelligence down the road. Based on learnings from realistic human data, Apple aims to improve its email summarization system and a few items in the Writing Tools kit. "The contents of the sampled emails never leave the device and are never shared with Apple," assures the company. Apple says it has already put similar privacy-first training systems in place for the Genmoji system. Why is it a crucial step forward?
Right now, the summaries you get courtesy of Apple Intelligence in Mail can often be quite confusing, and occasionally, downright gibberish. The status quo of app notifications is no different, and it got so bad that Apple had to temporarily pause it after drawing flak from the BBC for misrepresenting news articles. The situation is so bad that the summarized notifications have become a joke in our team chats. In its bid to summarize conversations or emails, Apple Intelligence often clubs together random sentences that either make no sense, or give an entirely different spin to what's really happening. The core problem is that AI still struggles with context and human intent. The best way to fix it is by training it on more situation-aware material with proper contextual understanding. Recently, AI models capable of reasoning have arrived on the scene, but they haven't quite been a magic pill. The method described by Apple sounds like the best of both worlds. "This process allows us to improve the topics and language of our synthetic emails, which helps us train our models to create better text outputs in features like email summaries, while protecting privacy," says the company. Now, here is the good part. Apple is not going to read all emails stored on iPhones and Macs across the world. Instead, it is taking an opt-in approach. Only users who have explicitly agreed to share Device Analytics data with Apple will be a part of the AI training process. You can enable it by following this path: Settings > Privacy & Security > Analytics & Improvements. The company will reportedly kick the plans into action with the upcoming iOS 18.5, iPadOS 18.5, and macOS 15.5 beta updates. A corresponding build targeted at developers has already been released.
[11]
New Training Methods to Save Apple Intelligence? | AIM Media House
On the company's Machine Learning Research blog, Apple outlined new privacy-preserving training methods to enhance its suite of AI features. Apple is introducing new methods to train and improve the performance of its Apple Intelligence features, the company announced in a blog post on Monday. The Cupertino giant employs synthetic data to enhance Apple Intelligence's features that handle long text chunks for summarisation or writing. It says the process will be improved further. For example, in the case of emails, it begins with Apple generating artificial mails that mimic real ones but contain no user information. Then, to make this data useful, user devices provide feedback privately. Each device compares these artificial examples to the user's genuine emails locally and anonymously, indicating which types of artificial emails are the closest match. Apple utilises this collective feedback from many users to enhance its synthetic training data, improving the AI without ever accessing anyone's personal email content. "We will soon begin using synthetic data with users who opt into device analytics to improve email summaries," said Apple. Apple stated it will apply the same privacy-preserving techniques used for enhancing Genmoji to improve features such as Image Playground, Image Wand, Memories Creation, and Writing Tools within Apple Intelligence and Visual Intelligence. To enhance its Genmoji feature, Apple analyses popular prompts from users who opt to share analytics. "For example, understanding how our models perform when a user requests Genmoji that contain multiple entities (like "dinosaur in a cowboy hat") helps us improve the responses to those kinds of requests," said Apple. This process employs privacy-preserving techniques, ensuring that individual data remains confidential, devices are untraceable, and rare prompts stay undisclosed. "These techniques allow Apple to understand overall trends, without learning information about any individual, like what prompts they use or the content of their emails," said the company. Apple Intelligence is in dire need of improvement, especially in terms of generating accurate summaries. Last year, Apple came under fire for inaccurate AI-generated summaries of news articles, particularly from BBC News. Moreover, there is an entire subreddit called r/AppleIntelligenceFail, where users share some of the most confusing and out-of-context results derived from Apple Intelligence. Recently, Bloomberg reported that Mike Rockwell, the Apple Vision Pro creator, will replace John Giannandrea as the AI head. As per the reports, CEO Tim Cook had "lost confidence" in Giannandrea's ability to develop products. Furthermore, Apple also announced that the release of a more personalised version of Siri had been delayed until 2026. Last month, it was reported that the company was struggling to mitigate various bugs and engineering problems within Siri.
[12]
Apple Is Analysing User Data Patterns to Improve Its AI Features
Apple is developing new techniques to analyse user data patterns and aggregated insights to improve its artificial intelligence (AI) features. The Cupertino-based tech giant shared these differential privacy techniques on Monday, highlighting that these methods will not breach users' privacy. Instead, the company is focusing on gathering data such as usage trends and data embeddings to measure and improve its text generation tools and Genmoji. Notably, Apple said that this information will be taken only from those devices that have opted in to share Device Analytics. In a post on its Machine Learning Research domain, the iPhone maker detailed the new technique it is developing to improve some of the Apple Intelligence features. The tech giant's AI offerings have been underwhelming so far, and the company suggests one of the reasons for that is its ethical practices around pretraining and sourcing data for its AI models. Apple says that its generative AI models are trained on synthetic data (data that is created by other AI models or digital sources and not by any human). While this is still a fair way to train large language models (LLMs), since it does provide them with knowledge about the world, the models are not learning from the human style of writing and presentation, so the output can come off as bland and generic. This is also known as AI slop. To fix these issues and to improve the output quality of its AI models, the tech giant is now looking at the option to learn from user data without really looking into users' private data. Apple calls this technique "differential privacy." For Genmoji, Apple will use differentially private methods to identify popular prompts and prompt patterns from users who have opted in to share Device Analytics with the company. The iPhone maker says it will provide a mathematical guarantee that unique or rare prompts will not be discovered and that specific prompts cannot be linked to any individual. Collecting this information will help the company evaluate the types of prompts that are "most representative of a real user engagement." Essentially, Apple will be looking into the kind of prompts that lead to satisfactory output and where users repeatedly add prompts to get to the desired result. One example shared in the post included the models' performance in generating multiple entities. Apple plans to expand this approach for Image Playground, Image Wand, Memories Creation, and Writing Tools in Apple Intelligence, as well as in Visual Intelligence with future releases. [Image: Differential privacy in Apple Intelligence's text generation feature. Photo credit: Apple] Another key area where the tech giant is using this technique is text generation. The approach is somewhat different from the one used with Genmoji. To assess the capability of its tools in email generation, the company created a set of emails that cover common topics. For each topic, the company generated multiple variations and then derived representations of the emails, which included key dimensions such as language, topic, and length. Apple calls these embeddings. These embeddings were then sent to a small number of users who have opted in to Device Analytics. The synthetic embeddings were then matched against a sample of the users' emails. "As a result of these protections, Apple can construct synthetic data that is reflective of aggregate trends, without ever collecting or reading any user email content," the tech giant said.
In essence, the company would not know the content of the emails but could still understand how people prefer their emails to be worded. Apple is currently using this method to improve text generation in emails, and says that in the future, it will also use the same approach for email summaries.
[13]
Apple Uses New Tech That Compares Synthetic Data With Real Emails To Train AI Models, Then Applies Embeddings And Privacy Tools To Improve Text Output Quality
Apple was supposed to release its highly anticipated Personalized Siri feature last month with the release of iOS 18.4. However, it was later confirmed that the new utility would be delayed until next year. A new report has emerged that shares details on how Apple trains the AI models behind Apple Intelligence. Even though Apple officially stated that the Personalized Siri features will be delayed until next year, employees within the company are growing confident that the feature will be ready for launch later this year. In a new report, Bloomberg highlights how Apple trains its AI models for Apple Intelligence. The report cites a blog post from Apple's Machine Learning Research website, describing how Apple uses synthetic data to train its AI models. We have previously reported on several occasions that Apple is lagging behind its competitors in the AI race, and the company's strategy to use synthetic data to train AI models is a bit unconventional and has limitations. For one, it is hard for synthetic data to "understand trends" when it comes to summarization or writing tools that require longer sentences or full-fledged emails. Apple took note of this and highlighted a new technology that will allow it to circumvent the limitations by comparing the synthetic data to a sample of recent user emails. However, the process does not compromise user privacy. To improve our models we need to generate a set of many emails that cover topics that are most common in messages. To curate a representative set of synthetic emails, we start by creating a large set of synthetic messages on a variety of topics. For example, we might create a synthetic message, "Would you like to play tennis tomorrow at 11:30AM?" This is done without any knowledge of individual user emails. We then derive a representation, called an embedding, of each synthetic message that captures some of the key dimensions of the message like language, topic, and length. These embeddings are then sent to a small number of user devices that have opted in to Device Analytics. Participating devices then select a small sample of recent user emails and compute their embeddings. Each device then decides which of the synthetic embeddings is closest to these samples. Using differential privacy, Apple can then learn the most-frequently selected synthetic embeddings across all devices, without learning which synthetic embedding was selected on any given device. These most-frequently selected synthetic embeddings can then be used to generate training or testing data, or we can run additional curation steps to further refine the dataset. For example, if the message about playing tennis is one of the top embeddings, a similar message replacing "tennis" with "soccer" or another sport could be generated and added to the set for the next round of curation (see Figure 1). This process allows us to improve the topics and language of our synthetic emails, which helps us train our models to create better text outputs in features like email summaries, while protecting privacy. While the company is aware of the limitations, it explains that the new technology will allow it to better understand overall trends without compromising user privacy or gathering personal information. Bloomberg also claims that the company will release the new technology in a new beta of iOS 18.5 and macOS 15.5. You can check out Apple's full post on the matter for more details.
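The curation round described in the quoted passage - taking a top-ranked message about tennis and spinning out a "soccer" variant for the next round - is simple to sketch. The template and topic list below are invented for illustration; Apple presumably generates variants with a language model rather than string substitution.

```python
# Sketch of one curation step: expand a highly ranked synthetic message into
# topic-swapped variants for the next polling round. Words are illustrative.
TOP_MESSAGE = "Would you like to play tennis tomorrow at 11:30AM?"
RELATED_SPORTS = ["soccer", "basketball", "badminton", "golf"]

def expand_variants(message: str, old_topic: str, new_topics: list[str]) -> list[str]:
    """Generate near-duplicate synthetic messages that vary only the topic word."""
    return [message.replace(old_topic, topic) for topic in new_topics]

next_round = [TOP_MESSAGE] + expand_variants(TOP_MESSAGE, "tennis", RELATED_SPORTS)
# next_round now also contains "Would you like to play soccer tomorrow at 11:30AM?"
```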
[14]
Apple to Tap User Data for LLM Training | PYMNTS.com
Apple is planning to analyze user data to improve its large language model (LLM) software while upholding user privacy. The company has been using synthetic data to train its artificial intelligence (AI) models but has found that method to be ineffective, Apple wrote in a Monday (April 14) blog post. Now, Apple will still use synthetic data as a starting point but will compare the generated text to a sample of emails from participating users to determine which generated output best lines up with real-world messages. "Only users who have opted-in to send Device Analytics information to Apple participate," Apple said in the blog post. "The contents of the sampled emails never leave the device and are never shared with Apple. A participating device will send only a signal indicating which of the variants is closest to the sampled data on the device, and Apple learns which selected synthetic emails are most often selected across all devices." The new technique aims to improve text-related features from the Apple Intelligence platform, like summaries in notifications, the ability to synthesize thoughts in its Writing Tools and recaps of user messages.
[15]
How Apple Plans to Improve AI Using Synthetic User Data
Apple intends to improve the functioning of its artificial intelligence (AI) models by comparing the synthetic data it trains its models on to real-life data samples from its users, the company announced in a recent blog post. Synthetic data is data created to mimic the format and important properties of user data but does not contain any actual user data. This comes after Apple introduced Apple Intelligence, an AI-based messaging, email, audio, and webpage summary generation feature as part of the iOS 18.1.1 update. The feature made a number of summarisation mistakes; most notably, it created a fake news headline attributed to the BBC, following which the company suspended the service. Apple explains that to improve the quality of the synthetic data that its models work on, it generates a set of synthetic email topics that users commonly mention in their emails. It then derives a representation (or embedding) of these emails, which includes the key dimensions of the message, like the language, topic, and length of the message. Apple sends these embeddings to devices that have opted in for device analytics. Users who opt in to provide Apple access to device analytics allow the company to access details about hardware and operating system specifications, performance statistics, and data about how they use the device/apps. Apple compares the synthetic embeddings to embeddings from a small sample of recent user emails from devices that have opted in for device analytics. Such devices then select which synthetic embeddings are closest to real user email samples. Apple explains that it uses 'differential privacy' to learn the synthetic samples that most devices have selected as closest to real user samples, without learning which synthetic embedding was selected on any given device. This implies that the company does not get to learn which synthetic sample corresponds to a specific user's actual emails. Apple says that it uses the closest matching synthetic samples as training or testing data and can even run curation steps to further refine the data. The company specifies that the real-user email samples that it compares synthetic data with never leave the user's device, and Apple also never gets access to this information. As companies seek to compete in the race to make the biggest and best-performing AI models, access to readily available data to train these models is becoming a challenge. In November last year, Reuters reported that AI companies are hitting a scaling wall. This meant that making models bigger and feeding them more data was no longer providing proportional capability improvements, with access to data reportedly being one of the developers' key challenges. While using synthetic data, similar to what Apple is doing, makes sense in this case, some, like OpenAI founder Sam Altman, find reliance on synthetic data strange. "It's really strange if the best way to train a model was to just generate a quadrillion tokens of synthetic data and feed that back in. You'd say that somehow that seems inefficient, and there ought to be something where you can just learn more from the data as you're training," Altman mentioned in a live interview during the AI for Good Global Summit in June 2024. At the same time, he also admitted that the company was experimenting with generating and training on synthetic data. Discussing the quality of output from a model trained on synthetic data, Altman said that what was important was that the training data was of high quality.
As such, if Apple can improve the quality of its synthetic training data through this comparative exercise, it should be able to improve the quality of Apple Intelligence outputs.
[16]
Your iPhone will now train Apple's AI, here's how it works
Apple is adopting a new strategy to train its AI models while upholding its strong privacy commitments, in order to enhance its AI-backed features. The company is altering how it uses artificial intelligence (AI) to improve tools like Siri and email summarisation in future software updates, according to a recent blog post on the Apple Machine Learning Research site and a Bloomberg report. Apple has historically trained its AI models using artificially generated content, or synthetic data. While this has removed the need to analyse actual user data, there are certain drawbacks, particularly when training models to perform complex tasks like long-form summarisation. Apple has created a new privacy-preserving system to address this issue, which will use tiny samples of recent user emails on devices that have chosen to participate in Device Analytics. According to Apple, the procedure doesn't give it access to user identities or specific emails. Rather, it employs a technique known as embeddings, which represents emails according to their language, topic, and length. These are compared with synthetic messages on the device. Without ever seeing the real emails or knowing which device chose what, Apple is able to determine which kinds of fake messages most closely resemble typical communication patterns thanks to differential privacy. In the end, this will enhance how AI features create or summarise content across all applications, including Mail and Notes, and assist Apple in improving its synthetic training data. This is expected to be released in the next beta versions of macOS 15.5, as well as iOS 18.5. The precise rollout timeline and other specifics of this "innovation," however, are still unknown.
Apple unveils a new strategy to enhance its AI models using differential privacy and synthetic data, aiming to improve features like email summaries and Genmoji without compromising user privacy.
Apple has unveiled an innovative approach to improve its AI models while maintaining its commitment to user privacy. The tech giant is addressing criticism over the performance of its AI products, particularly in areas like notification summaries, by implementing a new system that uses synthetic data and differential privacy [1].
At the core of Apple's new strategy is the use of synthetic data, which mimics the format and important properties of user data without containing any actual user-generated content. This approach allows Apple to train its AI models on a diverse range of topics and styles without accessing real user information [2].
Apple employs differential privacy techniques to ensure that the data collected cannot be linked to individual users. This method introduces randomized information into the broader dataset, preventing the identification of any single person's data [4].
The new system will be used to enhance various Apple Intelligence features, including email summaries, Genmoji, Image Playground, Image Wand, Memories Creation, Writing Tools, and Visual Intelligence.
Apple plans to roll out this new system in future beta versions of iOS 18.5 and macOS 15.5. Importantly, only users who have opted into the Device Analytics program will participate in this data collection process, ensuring user consent and control over their data usage [1].
Jason Hong, a computer science professor at Carnegie Mellon University, commends Apple's approach, stating that the company "should be applauded for putting their customers' privacy first." However, he also notes potential trade-offs, including the possibility that Apple's AI may not be as effective as competitors who have more direct access to user data [2].
Apple's new AI training strategy represents a significant step in balancing the need for improved AI performance with the company's longstanding commitment to user privacy. As the tech industry grapples with the ethical implications of AI development, Apple's approach could set a new standard for privacy-preserving AI advancement.