5 Sources
[1]
Google is using YouTube videos to train its AI video generator
The tech company is turning to its catalog of 20 billion YouTube videos to train these new-age AI tools, according to a person who was not authorized to speak publicly about the matter. Google confirmed to CNBC that it relies on its vault of YouTube videos to train its AI models, but the company said it only uses a subset of its videos for the training and that it honors specific agreements with creators and media companies. "We've always used YouTube content to make our products better, and this hasn't changed with the advent of AI," said a YouTube spokesperson in a statement. "We also recognize the need for guardrails, which is why we've invested in robust protections that allow creators to protect their image and likeness in the AI era -- something we're committed to continuing." Such use of YouTube videos has the potential to lead to an intellectual property crisis for creators and media companies, experts said. While YouTube says it has shared this information previously, experts who spoke with CNBC said it's not widely understood by creators and media organizations that Google is training its AI models using its video library. YouTube didn't say how many of the 20 billion videos on its platform or which ones are used for AI training. But given the platform's scale, training on just 1% of the catalog would amount to 2.3 billion minutes of content, which experts say is more than 40 times the training data used by competing AI models. The company shared in a blog post published in September that YouTube content could be used to "improve the product experience ... including through machine learning and AI applications." Users who have uploaded content to the service have no way of opting out of letting Google train on their videos. 
"It's plausible that they're taking data from a lot of creators that have spent a lot of time and energy and their own thought to put into these videos," said Luke Arrigoni, CEO of Loti, a company that works to protect digital identity for creators. "It's helping the Veo 3 model make a synthetic version, a poor facsimile, of these creators. That's not necessarily fair to them." CNBC spoke with multiple leading creators and IP professionals, none were aware or had been informed by YouTube that their content could be used to train Google's AI models.
[2]
YouTube creators unaware Google uses their videos to train AI
A hot potato: When it comes to tech companies training their AI models, it seems everything is fair game. Google, for example, uses some of the billions of videos on YouTube to train Gemini and Veo 3, and many creators are unaware that it's happening. With more than 20 billion videos on the platform, YouTube is a treasure trove of data for AI companies to exploit - and many already have. YouTube owner Google is also using the content to train its AI models, reports CNBC. The company later confirmed that it does do this, but it only uses a subset of videos and that it honors specific agreements with creators and media companies. "We've always used YouTube content to make our products better, and this hasn't changed with the advent of AI," said a YouTube spokesperson in a statement. YouTube admitted that there was a need for safeguards in this area, which is why it has invested in protections to allow creators to protect their image and likeness. But many experts point out that most creators and companies don't know that Google is training its models on their content. There's also no way for people to opt out of having their creations used this way. The report notes that the size of YouTube's video library means that even if just 1% of the videos are used for training purposes, that amounts to 2.3 billion minutes of content, which is more than 40 times greater than the training data used by competing AI models, according to experts. The situation has become more relevant since Google announced its Veo 3 video model that can create incredibly realistic video clips. As with many industries, the irony is that the content people create is being used to train an AI that could eventually replace them, or at least impact their income in what is a competitive market. Some creators take a different point of view; they're using or planning to use Veo 3 to create content, even if it has been trained on their own original work. 
There have been cases of other companies using YouTube to train their AIs without creators' knowledge. It was reported last year that OpenAI has transcribed over a million hours of YouTube videos to train its LLMs. Nvidia did the same thing, and at one point was scraping 80 years of videos daily - the company argued this was in "the spirit of copyright law." Anthropic, Apple, and Salesforce also turned to YouTube for their AI training data. Google now allows creators to opt out of third-party training from AI companies such as Amazon and Nvidia, but there's no option to stop Google from doing the same.
[3]
YouTubers Surprised That Google Uses Their Videos to Train AI Models
Google is reportedly using its expansive library of YouTube videos to train its AI models like Gemini and Veo 3, shocking many content creators. Last month, Google launched its latest AI video model Veo 3, positioning it as one of the most advanced AI video generators on the market and stunning viewers with the incredibly impressive synthetic videos generated from it. According to a report by CNBC, Google is tapping into YouTube's library of 20 billion videos to train its AI models. The news outlet cited a source not authorized to speak publicly about the matter. Google later confirmed to CNBC that it does use YouTube videos to train its AI, but says it only relies on a subset of content and adheres to specific agreements with creators and media partners. "We've always used YouTube content to make our products better, and this hasn't changed with the advent of AI," a YouTube spokesperson tells the news outlet in a statement. "We also recognize the need for guardrails, which is why we've invested in robust protections that allow creators to protect their image and likeness in the AI era -- something we're committed to continuing." While YouTube says it has previously disclosed this practice, many creators and media organizations remain unaware that Google is using its video library to train AI models, according to CNBC. Creators interviewed by the outlet say they had not been informed or consulted, and were caught off guard by the revelation. When uploading a video to the platform, the user is agreeing that YouTube has a broad license to the content. But while YouTube allows creators to opt out of third-party AI training by companies like Amazon, Apple, and Nvidia, they reportedly cannot prevent Google from using their videos to train its own models. YouTube has not disclosed how many -- or which -- of its 20 billion videos are used for AI training. 
However, experts cited by CNBC note that even using just 1% of the library would provide around 2.3 billion minutes of footage -- more than 40 times the volume of training data reportedly used by some rival AI models. Digital rights advocates argue that years of work by YouTube creators are being used to develop AI systems without their consent or compensation. One example is Vermillio, which has developed a tool called Trace ID to detect similarities between AI-generated videos and original content. In some cases, the tool has found matches exceeding 90%. "We've seen a growing number of creators discover fake versions of themselves circulating across platforms -- new tools like Veo 3 are only going to accelerate the trend," Dan Neely, CEO of Vermillio, which helps individuals protect their likeness from being misused and also facilitates secure licensing of authorized content, tells CNBC. The latest news comes after Disney and Universal filed a joint lawsuit against generative AI company Midjourney, accusing it of widespread copyright infringement.
[4]
Google Used YouTube Videos to Train Veo 3, Creators Unaware: Report | AIM
YouTube confirms training the model using YouTube data, but only a subset and not in its entirety. Google is utilising the extensive library of videos uploaded on its platform, YouTube, to train Veo 3, the company's latest AI-enabled video generation tool. Google confirmed this to CNBC, stating that it uses only a subset and not the entirety of YouTube videos to train its AI models. The report added that several creators are "concerned that they may be unknowingly" helping train the model, which could "eventually compete with/replace them", and multiple leading creators were neither aware nor informed by YouTube that their content could be used to train the Veo models. While YouTube allows creators to opt out of providing their content for training to third-party AI companies, they cannot prevent Google from doing so. Third-party companies, including Apple, Anthropic, Amazon, Meta, and Microsoft, can use YouTube's data to train their generative AI models. Veo 3 is the company's third-generation AI video generation model and the latest one. It was announced last month at Google's I/O 2025 and can generate eight-second videos. Moreover, Veo 3 can add audio to the generated videos -- a capability not present in several competing platforms, including OpenAI's Sora. It was initially available in the United States alone, but the company has since expanded access to over 70 countries. Users on Google Gemini's Pro plan, which costs $20 a month, get limited access to the model. However, the higher-end Gemini Ultra plan, which is $249 per month, provides the highest usage for Veo 3.
[5]
Is Google Using YouTube's Video Library for AI Training?
Inside Google's Gemini and Veo 3: AI Models Powered by Billions of YouTube Videos
During the 2025 Google I/O developer conference, Google introduced Veo 3. This innovation has been recognized as the most advanced AI video-generating system to date. It can create cinematic, "entertainment-quality" videos that include both audio and dialogue, marking a significant advancement in the field of generative AI. According to reports from CNBC, the model was trained on an extensive collection of YouTube videos, which has raised concerns among content creators.
Google confirms using YouTube videos to train its AI models, including Veo 3, raising concerns about intellectual property rights and creator consent.
Google has confirmed that it is using its vast library of YouTube videos to train artificial intelligence models, including its latest video generation tool, Veo 3. This revelation has sparked controversy and raised concerns about intellectual property rights and creator consent in the AI era [1][2].
With over 20 billion videos on YouTube, Google's AI training dataset is potentially massive. Even if only 1% of the catalog is used, it would amount to 2.3 billion minutes of content, which experts say is more than 40 times the training data used by competing AI models [1]. Google has stated that it only uses a subset of videos for training and honors specific agreements with creators and media companies [3].
Many YouTube creators and media organizations were unaware that their content could be used to train Google's AI models. CNBC reported that multiple leading creators and IP professionals it spoke with had not been informed by YouTube about this practice [1]. This lack of transparency has led to concerns about the potential impact on creators' livelihoods and intellectual property rights [2].
While YouTube allows creators to opt out of third-party AI training by companies like Amazon and Nvidia, there is currently no option to prevent Google from using their videos to train its own models [2]. When uploading content to YouTube, users agree to grant the platform a broad license to their content [3]. However, the extent to which this license covers AI training has become a point of contention.
The use of creator content for AI training raises questions about fair compensation and the future of the creator economy. Some experts argue that years of work by YouTube creators are being used to develop AI systems without their consent or compensation [3]. There are concerns that AI models like Veo 3 could eventually compete with or replace human creators in certain aspects of content creation [4].
Google has acknowledged the need for safeguards in this area and claims to have invested in protections to allow creators to protect their image and likeness in the AI era [2]. However, the company has not disclosed specific details about how many or which videos are used for AI training [1].
This issue is not unique to Google. Other tech companies, including OpenAI, Nvidia, Anthropic, Apple, and Salesforce, have also reportedly used YouTube content for AI training [2]. AI training on copyrighted material more broadly has led to legal challenges, such as the recent lawsuit filed by Disney and Universal against AI company Midjourney for alleged copyright infringement [3].
As AI technology continues to advance, the debate over the use of publicly available content for AI training is likely to intensify. The situation highlights the need for clearer guidelines and regulations surrounding AI development and the use of creator content.
Summarized by Navi