Curated by THEOUTPOST
On Thu, 18 Jul, 8:01 AM UTC
14 Sources
[1]
[Update] Apple, Salesforce break silence on claims they used 'swiped YouTube videos' to train AI
UPDATE: Jul. 18, 2024, 4:44 PM EDT Salesforce reached out to Mashable with a comment in response to Wired's report.

A new report claimed that tech giants including Apple, Nvidia, Anthropic, and Salesforce used data from "thousands of YouTube videos" to train AI. The investigation, performed by Proof News and published by Wired, alleged that subtitles from 173,000 YouTube videos were swiped for the companies' AI models. Called "YouTube Subtitles," the dataset contains video transcripts from educational channels like Khan Academy, MIT, and Harvard, as well as the Wall Street Journal, NPR, and the BBC. Material from YouTube stars like PewDiePie, Marques Brownlee, and MrBeast was discovered, too.

We haven't heard back from Anthropic after reaching out for comment, but Apple and Salesforce have issued responses to Wired's report.

Will Apple use this data for Apple Intelligence and other AI services? The short answer is no, but here's the longer response for those who don't identify with the "TLDR" crowd: In an email to Mashable, Apple said that its open-source language model, OpenELM, indeed used the dataset, but not in the way some may be thinking. The OpenELM project is part of Apple's ongoing effort to benefit the broader research community. In other words, according to Apple, the OpenELM model was created for research purposes only and will not underpin any of Apple's machine learning-powered hardware or AI services, including Apple Intelligence.

For the uninitiated, Apple Intelligence is the company's new suite of AI features, which were revealed at WWDC 2024 (Apple's annual event where the company spills the beans on what's to come with its software offerings, including iOS and iPadOS). Apple Intelligence, for example, can help summarize text, whether it's an email or text message, for quicker interactions with friends, loved ones, coworkers, and more.
It will also underpin more entertainment-focused features like Genmoji, which generates new iOS emojis from a prompt. There's also Image Playground, which lets users create AI-generated images on the fly. When it comes to AI utilities for its consumers, Apple highlighted that it offers websites an option to opt out of having their content used for AI training. Apple assured that its generative models are built and fine-tuned using high-quality data, including licensed content from publishers and stock image companies, alongside publicly available data on the web. To put it succinctly, Apple doesn't deny that its open-source language model, OpenELM, used the dataset, but wants to make clear that it will not underpin any of its AI services, including Apple Intelligence.

Salesforce claims academic-based usage

In an email to Mashable, Salesforce also offered its side of the story: "The Pile dataset referred to in the research paper was used to train an AI model in 2021 for academic and research purposes," a Salesforce rep said. "The dataset was publicly available and released under a permissive license."

What does Nvidia have to say?

We also reached out to Nvidia for comment, but the company, known for bringing AI to many of its gaming hardware and services, declined to issue a statement. We will update this article if we hear anything from Anthropic.
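The opt-out Apple describes works through a site's robots.txt file: Apple has documented a separate `Applebot-Extended` user agent that controls whether content crawled by Applebot may be used to train its generative models, while ordinary `Applebot` crawling for search features is governed separately. As an illustrative sketch (assuming a site wants to withhold all of its content from training use), the rule looks like this:

```
# robots.txt (served at the site root)
# Applebot-Extended governs AI-training use of crawled content;
# this rule opts the entire site out of that use, without
# affecting regular Applebot crawling for search.
User-agent: Applebot-Extended
Disallow: /
```

A site that wants to remain searchable but keep its text out of model training would publish only this rule and leave its regular Applebot directives unchanged.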
[4]
Apple Intelligence Not Trained on YouTube Content, Says Apple
Apple on Thursday addressed concerns about its use of AI training data, following an investigation that revealed Apple, along with other major tech companies, had used YouTube subtitles to train their artificial intelligence models. The investigation by Wired earlier this week reported that over 170,000 videos from popular content creators were part of a dataset used to train AI models. Apple specifically used this dataset in the development of its open-source OpenELM models, which were made public in April. However, Apple has now confirmed to 9to5Mac that OpenELM does not power any of its AI or machine learning features, including the company's Apple Intelligence system. Apple clarified that OpenELM was created solely for research purposes, with the aim of advancing open-source large language model development. On releasing OpenELM on the Hugging Face Hub, a community for sharing AI code, Apple researchers described it as a "state-of-the-art open language model" that had been designed to "empower and enrich the open research community." The model is also available through Apple's Machine Learning Research website. Apple has stated that it has no plans to develop new versions of the OpenELM model. The company emphasized that since OpenELM is not integrated into Apple Intelligence, the "YouTube Subtitles" dataset is not being used to power any of its commercial AI features. Apple reiterated its previous statement that Apple Intelligence models are trained on "licensed data, including data selected to enhance specific features, as well as publicly available data collected by our web-crawler." The Wired report detailed how companies including Apple, Anthropic, and NVIDIA had used the "YouTube Subtitles" dataset for AI model training. This dataset is part of a larger collection known as "The Pile," which is compiled by the non-profit organization EleutherAI.
[5]
Apple confirms that Apple Intelligence has not been trained with YouTube videos - Softonic
With the presentation of Apple Intelligence during the last WWDC 2024, Apple claimed that Apple Intelligence models are trained with licensed data, including data selected to improve specific features, as well as publicly available data collected by its web crawler. This claim was questioned earlier this week, and now Apple is providing a defense. A few days ago, an investigation revealed that certain companies allegedly used YouTube subtitles to train their artificial-intelligence models. Not directly, but through a dataset compiled by a non-profit organization called EleutherAI. Knowingly or not, the report concluded, Anthropic, Nvidia, Salesforce, Apple, and others had therefore used the content of over 170,000 videos by popular creators such as MKBHD and MrBeast, generating considerable controversy regarding the ethics and legality of their methods. However, Apple has made an important clarification regarding the use of this data. Apple confirmed to 9to5Mac that its OpenELM model, although trained with this data, is not used to power any of the features of its artificial-intelligence suite, known as Apple Intelligence. According to the company, OpenELM was created solely for research purposes, to contribute to the scientific community and the development of open-source language models. The OpenELM model was published as open source and is still widely available, including on Apple's Machine Learning Research website. This allows researchers from around the world to access and use the model in their own research projects. Alongside this statement, Apple confirmed that it has no plans to develop new versions of the OpenELM model. As far as we know, this model has already fulfilled its purpose and will become less relevant as the rest of the Apple Intelligence products evolve without it. The whole episode is undoubtedly a reminder of the complexity and ethical challenges that companies face in the era of big data and AI.
The reliability and performance of an artificial intelligence depend to a large extent on the dataset used for its training. A careful and measured selection, as we see Apple doing, is definitely the way to go to achieve product efficiency and quality.
[6]
Apple Intelligence wasn't trained on YouTube videos
Apple has denied using unethically obtained data to train Apple Intelligence -- but it has acknowledged its use for another project. On Tuesday, it was learned that an AI research lab called EleutherAI had harvested subtitles from YouTube videos without express permission from the creators. It also gathered data from Wikipedia, the European Parliament, and Enron staff emails. The data was then added to a dataset called "the Pile." EleutherAI notes that its goal was to lower the barrier to AI development for those outside Big Tech. However, companies such as Nvidia, Salesforce, and Apple have all used the Pile to train various AI projects. Now, Apple has spoken out, saying that while it had used the Pile, the dataset was not used for Apple Intelligence. Instead, it was used to train its open-source OpenELM models, which it released in April. Apple has since confirmed to AppleInsider that OpenELM models don't power any of its AI or machine learning features. Instead, the tech giant claims that it created OpenELM to contribute to the research community, notes that OpenELM models were never intended to be used for Apple Intelligence, and says it has no plans to build any new versions of the model. Apple has repeatedly claimed that the sources for its artificial intelligence projects are ethical; it's known to have paid millions to publishers and licensed images from photo library firms.
[7]
Apple refutes claim it trained Apple Intelligence on stolen data -- here's what we know
Apple has clarified its position amid reports it used stolen YouTube video data to train Apple Intelligence. The company said that while it had used the data in the past, none of it was used to train Apple Intelligence. Recently it was revealed that an AI research lab called EleutherAI had harvested subtitles from YouTube videos without the creators' consent. This data was then combined with data from Wikipedia, the European Parliament, and Enron staff emails and added to a dataset called "the Pile." Apple had been accused of using the Pile's data to train Apple Intelligence alongside companies like Nvidia and Salesforce. Recently, Apple stated that, while it has used the dataset in the past, it was only to train the OpenELM models it released in April. Apple then confirmed to AppleInsider that OpenELM models do not power any AI or machine learning features and were created to contribute to the research community. Apple has stated that OpenELM models were never intended to be used for Apple Intelligence and that it has no plans to build any new versions of the model. Apple has repeatedly claimed that it only uses ethical sources for its artificial intelligence projects, including paying millions to publishers and licensing images from photo library firms. Apple Intelligence aims to revolutionize Apple's products, with major changes coming to Siri, including the ability to maintain conversation context, making it feel more natural. Added to this is the news that Siri will be smart enough to understand, and take action around, the app you have on screen at any time, including several new in-app actions. However, these features won't all release at once, and some won't be seen until at least 2025. AI production has shot up recently, leading to some concern about how data is gathered. The news that so many companies have used the Pile is concerning, but it is good to know that Apple is mostly focused on working with ethically sourced data.
For more information about everything coming with the next generation of Apple software, check out our full breakdown of Apple's WWDC presentation.
[8]
Amidst Ethical AI Controversy, Apple Clarifies Apple Intelligence Was Not Trained By Transcribed YouTube Videos
Earlier this week, we covered the ethical and infringement concerns raised by the tech community over big companies using transcribed YouTube videos to train their AI models without obtaining the consent of the content creators. OpenAI, Meta, and Google received criticism for violating the rules regarding the independent use of YouTube videos. Apple also recently found itself in hot water over the claim that it steals content for its OpenELM model. The Cupertino tech giant has now responded to the controversy by presenting its side of the story and clearing the air regarding unethical practices for training its LLM model. Google, Meta, and OpenAI have received backlash for using subtitles harvested from more than 170,000 videos of well-known YouTubers to train their AI models. An earlier report highlighted that Apple also used transcribed YouTube content for its OpenELM model, joining the other companies in the ongoing unethical AI practices being highlighted. The company has now come to its own defense and shed some light on the matter. As reported by 9to5Mac, Apple has confirmed to the outlet that the OpenELM model is not linked to its other AI initiatives. Apple Intelligence and its LLM models are trained on licensed data. Apple has explained to users and the tech community that OpenELM was part of a research initiative, and the company used the Pile dataset to train its open-source model. It was created to showcase open language model development to the public by making it readily available on Apple's Machine Learning Research site. Apple further clarifies that the OpenELM model released in April is not linked to Apple Intelligence or its AI-powered features. The tech giant also said that it has no plans to release any new versions of OpenELM and that it was specifically a research contribution. However, Apple Intelligence is claimed to rely on completely ethical practices for training, with millions being paid to publishers and for licensed data.
Apple explained this, along with the emphasis it places on responsible AI development, in detail in a research paper. Big companies, however, have been using the YouTube Subtitles dataset from the non-profit organization EleutherAI to train their AI models, raising serious concerns regarding permission, ethical AI, and copyright infringement. Even though Apple Intelligence is not part of the ongoing controversy, all the big companies should be more transparent about their material extraction techniques and AI training methods to avoid becoming part of such issues.
[9]
Apple Intelligence wasn't trained using data taken without permission, company claims
Apple has come forward to deny a report that Apple Intelligence may have been unwittingly trained using data taken from YouTube without permission, after it was found that data collected by EleutherAI included subtitles from thousands of YouTube videos. In a report earlier this week, it was revealed that an AI training dataset generated by EleutherAI had been created using subtitles from 173,536 YouTube videos from over 48,000 channels. The data, taken without permission and in violation of YouTube's rules, was then picked up by companies including Apple, which reportedly used it to train its OpenELM model. As you can imagine, the idea that Apple Intelligence might have been built using stolen YouTube content caused quite a stir, but Apple has issued a staunch rebuttal of the claims. According to 9to5Mac, Apple has confirmed "that OpenELM doesn't power any of its AI or machine learning features - including Apple Intelligence." Rather, OpenELM is a research project that's not used to power any of the new features coming to iPhone, iPad, and Mac later this year. Apple did confirm that Apple Intelligence had been trained "on licensed data, including data selected to enhance specific features, as well as publicly available data collected by our web-crawler," but not the pilfered YouTube subtitles it was unwittingly presented by EleutherAI. Understandably, YouTubers such as MKBHD have been angered by the outcome of the investigation. MKBHD released a video on July 17 to explain the situation so far. Although the dataset contains only transcripts of video subtitles rather than the videos themselves, harvesting them in this way is still believed to violate YouTube's terms of service. With Apple distancing itself from the findings, let's highlight the benefits of what Apple Intelligence will bring later this year as a beta for U.S. users.
These collections of AI features, set to debut with iOS 18, macOS Sequoia, and iPadOS 18 later this year, are meant to help you in various ways. For instance, Siri, Apple's assistant that first debuted in 2011 with the iPhone 4S, is getting an upgrade. OpenAI's ChatGPT will be integrated into Siri, allowing you to ask the AI model detailed queries, such as creating a 3-course meal or an in-depth fitness plan. The improved Siri will learn about you. For example, if you've forgotten about an event you spoke to your friend about, you can ask Siri what time the event starts. In addition to Apple's Photos app being redesigned in iOS 18 and iPadOS 18, a few Apple Intelligence features will also be available. The 'Clean Up' feature allows you to remove any objects in a photo, and 'Memory Movies' will use Apple's AI to create a movie with a few prompts. It's important to note, however, that recent reports suggest not all of the Apple Intelligence features showcased at WWDC in June will be made available later this year. This could include the updates to Siri. Nevertheless, we're hoping that AI features like those in the Photos app will still be available for U.S. users later this year.
[10]
Apple isn't using YouTube data in Apple Intelligence
After a report revealed that numerous companies relied in part on YouTube video transcription data to train their AIs, Apple is stepping forward to clarify its use of and plans for OpenELM, which was trained on the controversial Pile data. Apple contacted TechRadar after reading the report detailing how the company that provided Pile, EleutherAI, apparently used the YouTube Subtitles data set, an act that would be counter to the social video platform's data use policies. While not speaking directly to the issue of YouTube data, Apple reiterated its commitment to the rights of creators and publishers and added that it does offer websites the ability to opt out of their data being used to train Apple Intelligence, which Apple unveiled during WWDC 2024 and is expected to arrive in iOS 18. The company also confirmed that it trains its models, including those for its upcoming Apple Intelligence, using high-quality data that includes licensed data from publishers, stock images, and some publicly available data from the web. YouTube's transcription data is not intended to be a public resource but it's not clear if it's fully hidden from view. Apple also builds research models and that's essentially what OpenELM is, a tool for learning more about language models. In a paper on OpenELM (PDF), researchers note that they did train it on Pile data. Apple says, however, that OpenELM is for research purposes only and it's not used to power AI features in any Apple devices, which would include, among other things, the best iPhones, best iPads, and best Macs. What's more, it appears OpenELM's moment in the sun is almost done. Apple told us it has no plans to build future versions of the model. 
While all this may offer some solace to the YouTube creators (including TechRadar) whose data was scraped for the Pile and used in, among other models, Apple's OpenELM, it does not address the fact that EleutherAI apparently did the scraping without YouTube's or the creators' permission and then handed the data to companies like Apple. What remains to be seen is what YouTube does next. For now, though, Apple has made it clear that it was one and done with OpenELM and that the data will never be a part of Apple Intelligence.
[11]
Apple says its OpenELM model doesn't power Apple Intelligence amid YouTube controversy - 9to5Mac
Earlier this week, an investigation detailed that Apple and other tech giants had used YouTube subtitles to train their AI models. This included over 170,000 videos from the likes of MKBHD, Mr. Beast, and more. Apple then used this dataset to train its open-source OpenELM models, which were released back in April. Apple has now confirmed to 9to5Mac, however, that OpenELM doesn't power any of its AI or machine learning features - including Apple Intelligence. Apple says that it created the OpenELM model as a way of contributing to the research community and advancing open source large language model development. In the past, Apple researchers have described OpenELM as a "state-of-the-art open language model." According to Apple, OpenELM was created only for research purposes, not for use to power any of its Apple Intelligence features. The model was published open-source and is widely available, including on Apple's Machine Learning Research website. Because OpenELM isn't used as part of Apple Intelligence, this means the "YouTube Subtitles" dataset isn't used to power Apple Intelligence. In the past, Apple has said that Apple Intelligence models were trained "on licensed data, including data selected to enhance specific features, as well as publicly available data collected by our web-crawler." Finally, Apple also tells me that it has no plans to build any new versions of the OpenELM model. As Wired reported earlier this week, companies including Apple, Anthropic, and NVIDIA all used this "YouTube Subtitles" dataset to train their AI models. This dataset is part of a larger collection called "The Pile," from the non-profit EleutherAI.
[12]
Apple Denies Using YouTube Videos To Train Its AI Features After Report Claims Tim Cook's Company Used Creators' Content Without Consent - Apple (NASDAQ:AAPL)
Apple Inc. AAPL has stated that its OpenELM model is not used to power any of its AI or machine learning features, including Apple Intelligence. This clarification comes amid reports suggesting the use of YouTube subtitles for training. What Happened: In a statement to 9to5Mac on Thursday, Apple confirmed that the OpenELM model, launched in April, was developed solely for research purposes and to contribute to the open-source large language model development community. Apple's statement follows an investigation that revealed the use of subtitles from over 170,000 YouTube videos by Apple and other tech companies to train their AI models. However, Apple has now clarified that the OpenELM model, accessible on Apple's Machine Learning Research website, does not power any of its Apple Intelligence features. Apple has previously stated that its Intelligence models were trained "on licensed data, including data selected to enhance specific features, as well as publicly available data collected by our web crawler." The "YouTube Subtitles" dataset, part of a larger collection called "The Pile" from non-profit EleutherAI, is not used to power Apple Intelligence. Moreover, Apple has no plans to develop any new versions of the OpenELM model, the company added. Why It Matters: The issue of tech companies using YouTube videos to train their AI models without creators' consent has been a contentious one. Apple was accused of such practices, with tech YouTuber Marques Brownlee, or MKBHD, voicing concerns about Apple's use of YouTube content for AI training. Furthermore, AI startups like OpenAI and Anthropic have been accused of ignoring web scraping rules, leading to controversies. This led to platforms like Reddit Inc.
updating their policies to block automated website scraping.
[13]
Apple Trained Its AI on YouTube Transcripts Without Permission, Report Says
YouTube creators were unaware that tech companies were using transcripts of their content to train AI systems. An investigation from Proof News claims some of the world's largest tech companies, including Apple and Nvidia, are training AI systems with YouTube video transcripts without creators' permission. The report, which includes a search tool to determine if a YouTube channel is in the dataset, says "subtitles from 173,536 YouTube videos, siphoned from more than 48,000 channels, were used by Silicon Valley heavyweights, including Anthropic, Nvidia, Apple and Salesforce." Some of the YouTube channels included in the dataset are of late-night shows such as The Late Show with Stephen Colbert and Jimmy Kimmel Live as well as content from popular YouTube personalities including MrBeast, tech reviewer Marques Brownlee and PewDiePie. Proof News said that the dataset was part of a compilation called the Pile that came from a nonprofit, EleutherAI. In a 2020 research paper, the nonprofit described the Pile as containing 22 separate datasets. Apple, Anthropic, Nvidia and EleutherAI didn't immediately respond to requests for comment. In an email to CNET, a spokesperson from Google said the company stands by its previous statements on the subject, linking to a Bloomberg article from April. In the article, YouTube CEO Neal Mohan said he doesn't know if OpenAI did in fact use YouTube videos to train its text-to-video generator, but that if it did, that would be a violation of the platform's terms of service. He didn't address whether Google itself used the videos in this way. While AI continues to be a key technology pursued by tech titans including Apple, Google, Microsoft, Meta and IBM, evolving the technology requires feeding AI models gigantic amounts of data. Leaders in the space, including OpenAI, have acknowledged that it's getting harder and harder to find datasets to train AI systems.
That has led OpenAI, the creator of ChatGPT, to negotiate deals with content companies, including News Corp. and Reddit, in order to acquire content to feed the AI systems. The information in the report, however, suggests that tech companies such as Apple and Nvidia may be gobbling up datasets containing information that, at least in spirit, doesn't align with what content creators would expect from a platform like YouTube, which ostensibly prohibits data mining of videos or transcripts of videos. A spokesperson for Anthropic, a public benefit AI startup, told Proof News that it uses the Pile to train its AI assistant Claude and said, "The Pile includes a very small subset of YouTube subtitles." Spokesperson Jennifer Martinez said, "YouTube's terms cover direct use of its platform, which is distinct from use of The Pile dataset. On the point about potential violations of YouTube's terms of service, we'd have to refer you to The Pile authors." As the report points out, Google itself has been taken to task for mining YouTube content. The company told the New York Times that its agreement with content creators allows for YouTube content to be used for AI training.
[14]
Apple denies using YouTube videos for training Apple Intelligence: Report
Apple has clarified that the artificial intelligence features it collectively calls Apple Intelligence are not powered by the company's OpenELM AI model. According to a report by 9To5Mac, the Cupertino-based technology giant stated to the outlet that "OpenELM doesn't power any of its AI or machine learning features - including Apple Intelligence."
Apple and Salesforce have responded to allegations that they used YouTube videos without permission to train their AI models. Both companies deny these claims, stating that their AI systems were not trained on such content.
Apple has broken its silence regarding claims that it used YouTube videos without permission to train its AI models. The tech giant firmly stated that its AI system, known as Apple Intelligence, was not trained on YouTube content. This response comes amid growing concerns about the ethical use of data in AI development.

An Apple spokesperson emphasized, "Apple Intelligence has not been trained on YouTube videos or any other video content from the internet." The company's clear denial aims to address the allegations and maintain transparency about its AI training practices.

Salesforce, another tech company implicated in these claims, has also denied using YouTube videos for AI training. A Salesforce representative stated, "We have not used YouTube data to train our AI models." This unified front from both companies suggests a strong stance against the accusations.

The allegations stem from a broader conversation about AI companies potentially using copyrighted material without permission. YouTube CEO Neal Mohan had previously expressed concerns about AI companies "siphoning" content from the platform, which led to speculation about which companies might be engaging in such practices.

Apple's response sheds light on its approach to AI development. The company clarified that Apple Intelligence is designed to leverage on-device processing and cloud-based machine learning, an approach that aligns with Apple's long-standing commitment to user privacy and data protection.

These denials from Apple and Salesforce highlight the growing scrutiny faced by AI companies regarding their data sources and training methods. As AI technology continues to advance, questions about ethical data usage and intellectual property rights are becoming increasingly important.
The controversy surrounding the use of online content for AI training is likely to continue. It raises important questions about the balance between technological advancement and respect for intellectual property. As the AI industry evolves, companies may need to be more transparent about their training data sources to maintain public trust and comply with emerging regulations.
Major tech companies, including Apple, Nvidia, and Anthropic, are facing allegations of using thousands of YouTube videos to train their AI models without proper authorization, sparking controversy and frustration among content creators.
27 Sources
Apple's efforts to train its AI models using web content are meeting opposition from prominent publishers. The company's web crawler, Applebot, has been increasingly active, raising concerns about data usage and copyright issues.
3 Sources
A leaked document suggests that Runway, a Google-backed AI startup, may have used publicly available YouTube videos and copyrighted content to train its Gen-3 AI video generation tool without proper authorization.
4 Sources
Apple unveils a new strategy to enhance its AI models using synthetic data and differential privacy, aiming to improve features like email summaries while protecting user privacy.
22 Sources
Apple has reportedly opted for Google's Tensor Processing Units (TPUs) instead of Nvidia's GPUs for its AI training needs. This decision marks a significant shift in the tech industry's AI hardware landscape and could have far-reaching implications for future AI developments.
7 Sources
© 2025 TheOutpost.AI All rights reserved