3 Sources
[1]
Amazon Found 'High Volume' of Child Sex Abuse Material in AI Training Data
Amazon.com Inc. reported hundreds of thousands of pieces of content last year that it believed included child sexual abuse, which it found in data gathered to improve its artificial intelligence models. Though Amazon removed the content before training its models, child safety officials said the company has not provided information about its source, potentially hindering law enforcement from finding perpetrators and protecting victims.

Throughout last year, Amazon detected the material in its AI training data and reported it to the National Center for Missing and Exploited Children, or NCMEC. The organization, which was established by Congress to field tips about child sexual abuse and share them with law enforcement, recently started tracking the number of reports specifically tied to AI products and their development. In 2025, NCMEC saw at least a fifteen-fold increase in these AI-related reports, with "the vast majority" coming from Amazon. The findings haven't been previously reported.

An Amazon spokesperson said the training data was obtained from external sources, and the company doesn't have the details about its origin that could aid investigators. It's common for companies to use data scraped from publicly available sources, such as the open web, to train their AI models. Other large tech companies have also scanned their training data and reported potentially exploitative material to NCMEC. However, the clearinghouse pointed to "glaring differences" between Amazon and its peers. The other companies collectively made just "a handful of reports," and provided more detail on the origin of the material, a top NCMEC official said.

In an emailed statement, the Amazon spokesperson said that the company is committed to preventing child sexual abuse material across all of its businesses. "We take a deliberately cautious approach to scanning foundation model training data, including data from the public web, to identify and remove known [child sexual abuse material] and protect our customers," the spokesperson said.

The spike in Amazon's reports coincides with a fast-moving AI race that has left companies large and small scrambling to acquire and ingest huge volumes of data to improve their models. But that race has also complicated the work of child safety officials -- who are struggling to keep up with the changing technology -- and challenged regulators tasked with safeguarding AI from abuse. AI safety experts warn that quickly amassing large datasets without proper safeguards comes with grave risks.

Amazon accounted for most of the more than 1 million AI-related reports of child sexual abuse material submitted to NCMEC in 2025, the organization said. It marks a jump from the 67,000 AI-related reports that came from across the tech and media industry a year prior, and just 4,700 in 2023. This category of AI-related reports can include AI-generated photos and videos, or sexually explicit conversations with AI chatbots. It can also include photos of real victims of sexual abuse that were collected, even unintentionally, in an effort to improve AI models.

Training AI on illegal and exploitative content raises newfound concerns. It could risk shaping a model's underlying behaviors, potentially improving its ability to digitally alter and sexualize photos of real children or create entirely new images of sexualized children that never existed. It also raises the threat of continuing the circulation of the images that models were trained on -- re-victimizing children who have suffered abuse.
The Amazon spokesperson said that, as of January, the company is "not aware of any instances" of its models generating child sexual abuse material. None of its reports submitted to NCMEC were of AI-generated material, the spokesperson added. Instead, the content was flagged by an automatic detection tool that compared it against a database of known child abuse material involving real victims, a process called "hashing." Approximately 99.97% of the reports resulted from scanning "non-proprietary training data," the spokesperson said. Amazon believes it over-reported these cases to NCMEC to avoid accidentally missing something. "We intentionally use an over-inclusive threshold for scanning, which yields a high percentage of false positives," the spokesperson added.

The AI-related reports received last year are just a fraction of the total number submitted to NCMEC. The larger category of reports also includes suspected child sexual abuse material sent in private messages or uploaded to social media feeds and the cloud. In 2024, for example, NCMEC received more than 20 million reports from across industry, with most coming from Meta Platforms Inc. subsidiaries Facebook, Instagram and WhatsApp. Not all reports are ultimately confirmed as containing child sexual abuse material, referred to with the acronym CSAM.

Still, the volume of suspected CSAM that Amazon detected across its AI pipeline in 2025 stunned child safety experts interviewed by Bloomberg News. The hundreds of thousands of reports made to NCMEC marked a drastic surge for the company. In 2024, Amazon and all of its subsidiaries made a total of 64,195 reports.

"This is really an outlier," said Fallon McNulty, the executive director of NCMEC's CyberTipline, the entity to which US-based social media platforms, cloud providers and other companies are legally required to report suspected CSAM. "Having such a high volume come in throughout the year begs a lot of questions about where the data is coming from, and what safeguards have been put in place."

McNulty, speaking in an interview, said she has little visibility into what's driving the surge of sexually exploitative material in Amazon's initial training data sets. Amazon has provided "very little to almost no information" in its reports about where the illicit material originally came from, who had shared it, or if it remains actively available on the internet, she said. While Amazon is not required to share this level of detail, the lack of information makes it impossible for NCMEC to track down the material's origin and work to get it removed, McNulty said. It also limits relevant law enforcement agencies tasked with searching for sex offenders and children in active danger. "There's nothing then that can be done with those reports," she said. "Our team has been really clear with [Amazon] that those reports are inactionable."

When asked why the company didn't disclose information about the possible origin of the material, or other key details, the Amazon spokesperson replied, "because of how this data is sourced, we don't have the data that comprises an actionable report." The spokesperson did not explain how the third-party data was sourced or why the company did not have sufficient information to create actionable reports. "While our proactive safeguards cannot provide the same detail in NCMEC reports as consumer-facing tools, we stand by our commitment to responsible AI and will continue our work to prevent CSAM," the spokesperson said.
NCMEC, a nonprofit, receives funding both from the US government and private industry. Amazon is among its funders and holds a corporate seat on its board.

"There should be more transparency on how companies are gathering and analyzing the data to train their models -- and how they're training them," said David Thiel, the former chief technologist at the Stanford Internet Observatory, who has researched the prevalence of child sexual abuse material in AI training data. Such data can be licensed, purchased or scraped from the internet, or could be so-called synthetic data, which is text or images created by other AI tools. As AI companies seek to release new models quickly, "the rapid gathering of data is a much higher priority than doing safety analyses," Thiel said. He warned that there are "always some errors" when it comes to sifting out CSAM from training data, and believes the industry needs to be more open about where its data is coming from.

Amazon's Bedrock offering, which gives customers access to various AI models so they can build their own AI products, includes automated detection for known CSAM and rejects and reports positive matches. The company's consumer-facing generative AI products also allow users to report content that escapes its controls. The Seattle-based tech giant scans for CSAM across its other businesses, too, including its consumer photo storage service. Amazon's cloud computing division, Amazon Web Services, also removes CSAM when it's discovered on the web services it hosts. McNulty said AWS submitted far fewer reports than came from Amazon's AI efforts. Amazon declined to break out specific reporting data across its various business units, but noted it would share broad data in March.

Only recently have technology companies really begun to scrutinize their AI models and training data for CSAM, said David Rust-Smith, a data scientist at Thorn, a nonprofit organization that provides tools to companies, including Amazon, to detect the exploitative material. "There's definitely been a big shift in the last year of people coming to us asking for help cleaning data sets," said Rust-Smith. He noted that "some of the biggest players" have sought to apply Thorn's detection tools to their training data, but declined to speak about any individual company. Amazon did not use Thorn's technology to scan its training data, the spokesperson confirmed. Rust-Smith said AI-focused companies are approaching Thorn with a newfound urgency. "People are learning what we already knew, which is, if you hoover up a ton of the internet, you're going to get [child sexual abuse material]," he said.

Amazon was not the only company to spot and report potential CSAM from its AI workflows last year. Alphabet Inc.'s Google and OpenAI told Bloomberg News that they scan AI training data for exploitative material -- a process that has surfaced potential CSAM, which the companies then reported to NCMEC. Meta and Anthropic PBC said they, too, search training data for CSAM. Meta did not comment on whether it had identified the material, but said it would report to NCMEC if it did. Anthropic said it has not reported such material out of its training data. Meta and Google said that they've taken efforts to ensure that reports related to their AI workflows are distinguishable from those generated by other parts of their business.
McNulty said that, with the exception of Amazon, the AI-related reports it received last year came in "really, really small volumes," and included key details that allowed the clearinghouse to pass on actionable information to law enforcement. "Simply flagging that you came across something but not providing any type of actionable detail doesn't help the larger child safety space," McNulty said.
[2]
Amazon discovered a 'high volume' of CSAM in its AI training data but isn't saying where it came from
The National Center for Missing and Exploited Children said it received more than 1 million reports of AI-related child sexual abuse material (CSAM) in 2025. The "vast majority" of that content was reported by Amazon, which found the material in its training data, according to an investigation by Bloomberg. Amazon said only that it obtained the inappropriate content from external sources used to train its AI services and claimed it could not provide any further details about where the CSAM came from.

"This is really an outlier," Fallon McNulty, executive director of NCMEC's CyberTipline, told Bloomberg. The CyberTipline is where many types of US-based companies are legally required to report suspected CSAM. "Having such a high volume come in throughout the year begs a lot of questions about where the data is coming from, and what safeguards have been put in place." She added that the AI-related reports the organization received from other companies last year included actionable data that it could pass along to law enforcement for next steps. Since Amazon isn't disclosing sources, McNulty said its reports have proved "inactionable."

"We take a deliberately cautious approach to scanning foundation model training data, including data from the public web, to identify and remove known [child sexual abuse material] and protect our customers," an Amazon representative said in a statement to Bloomberg. The spokesperson also said that Amazon aimed to over-report its figures to NCMEC in order to avoid missing any cases. The company said that it removed the suspected CSAM content before feeding training data into its AI models.

Safety questions for minors have emerged as a critical concern for the artificial intelligence industry in recent months. AI-related CSAM reports have skyrocketed in NCMEC's records: the organization received more than 1 million last year, up from 67,000 in 2024 and just 4,700 in 2023. In addition to issues such as abusive content being used to train models, AI chatbots have also been implicated in several dangerous or tragic cases involving young users. OpenAI and Character.AI have both been sued after teenagers planned their suicides with those companies' platforms. Meta is also being sued for alleged failures to protect teen users from sexually explicit conversations with chatbots.
[3]
Amazon found 'high volume' of child sex abuse material in AI training data, center says
Amazon reported hundreds of thousands of pieces of content last year that it believed included child sexual abuse, which it found in data gathered to improve its artificial intelligence models. Though Amazon removed the content before training its models, child safety officials said the company has not provided information about its source, potentially hindering law enforcement from finding perpetrators and protecting victims. Throughout last year, Amazon detected the material in its AI training data and reported it to the National Center for Missing and Exploited Children. The organization, which was established by Congress to field tips about child sexual abuse and share them with law enforcement, recently started tracking the number of reports specifically tied to AI products and their development. In 2025, NCMEC saw at least a 15-fold increase in these AI-related reports, with "the vast majority" coming from Amazon. The findings haven't been previously reported. An Amazon spokesperson said the training data was obtained from external sources, and the company doesn't have the details about its origin that could aid investigators. It's common for companies to use data scraped from publicly available sources, such as the open web, to train their AI models. Other large tech companies have also scanned their training data and reported potentially exploitative material to NCMEC. However, the clearinghouse pointed to "glaring differences" between Amazon and its peers. The other companies collectively made just "a handful of reports," and provided more detail on the origin of the material, a top NCMEC official said. In an emailed statement, the Amazon spokesperson said that the company is committed to preventing child sexual abuse material across all of its businesses. "We take a deliberately cautious approach to scanning foundation model training data, including data from the public web, to identify and remove known (child sexual abuse material) and protect our customers," the spokesperson said. The spike in Amazon's reports coincides with a fast-moving AI race that has left companies large and small scrambling to acquire and ingest huge volumes of data to improve their models. But that race has also complicated the work of child safety officials -- who are struggling to keep up with the changing technology -- and challenged regulators tasked with safeguarding AI from abuse. AI safety experts warn that quickly amassing large data sets without proper safeguards comes with grave risks. Amazon accounted for most of the more than 1 million AI-related reports of child sexual abuse material submitted to NCMEC in 2025, the organization said. It marks a jump from the 67,000 AI-related reports that came from across the tech and media industry a year prior, and just 4,700 in 2023. This category of AI-related reports can include AI-generated photos and videos, or sexually explicit conversations with AI chatbots. It can also include photos of real victims of sexual abuse that were collected, even unintentionally, in an effort to improve AI models. Training AI on illegal and exploitative content raises newfound concerns. It could risk shaping a model's underlying behaviors, potentially improving its ability to digitally alter and sexualize photos of real children or create entirely new images of sexualized children that never existed. It also raises the threat of continuing the circulation of the images that models were trained on -- revictimizing children who have suffered abuse. 
The Amazon spokesperson said that, as of January, the company is "not aware of any instances" of its models generating child sexual abuse material. None of its reports submitted to NCMEC were of AI-generated material, the spokesperson added. Instead, the content was flagged by an automatic detection tool that compared it against a database of known child abuse material involving real victims, a process called "hashing." Approximately 99.97% of the reports resulted from scanning "nonproprietary training data," the spokesperson said. Amazon believes it overreported these cases to NCMEC to avoid accidentally missing something. "We intentionally use an over-inclusive threshold for scanning, which yields a high percentage of false positives," the spokesperson added. The AI-related reports received last year are just a fraction of the total number submitted to NCMEC. The larger category of reports also includes suspected child sexual abuse material sent in private messages or uploaded to social media feeds and the cloud. In 2024, for example, NCMEC received more than 20 million reports from across the industry, with most coming from Meta Platforms subsidiaries Facebook, Instagram and WhatsApp. Not all reports are ultimately confirmed as containing child sexual abuse material, referred to with the acronym CSAM. Still, the volume of suspected CSAM that Amazon detected across its AI pipeline in 2025 stunned child safety experts interviewed by Bloomberg News. The hundreds of thousands of reports made to NCMEC marked a drastic surge for the company. In 2024, Amazon and all of its subsidiaries made a total of 64,195 reports. "This is really an outlier," said Fallon McNulty, the executive director of NCMEC's CyberTipline, the entity to which U.S.-based social media platforms, cloud providers and other companies are legally required to report suspected CSAM. "Having such a high volume come in throughout the year begs a lot of questions about where the data is coming from, and what safeguards have been put in place." McNulty, speaking in an interview, said she has little visibility into what's driving the surge of sexually exploitative material in Amazon's initial training data sets. Amazon has provided "very little to almost no information" in its reports about where the illicit material originally came from, who had shared it or if it remains actively available on the internet, she said. While Amazon is not required to share this level of detail, the lack of information makes it impossible for NCMEC to track down the material's origin and work to get it removed, McNulty said. It also limits relevant law enforcement agencies tasked with searching for sex offenders and children in active danger. "There's nothing then that can be done with those reports," she said. "Our team has been really clear with (Amazon) that those reports are inactionable." When asked why the company didn't disclose information about the possible origin of the material, or other key details, the Amazon spokesperson replied, "because of how this data is sourced, we don't have the data that comprises an actionable report." The spokesperson did not explain how the third-party data was sourced or why the company did not have sufficient information to create actionable reports. "While our proactive safeguards cannot provide the same detail in NCMEC reports as consumer-facing tools, we stand by our commitment to responsible AI and will continue our work to prevent CSAM," the spokesperson said. 
NCMEC, a nonprofit, receives funding both from the U.S. government and private industry. Amazon is among its funders and holds a corporate seat on its board. "There should be more transparency on how companies are gathering and analyzing the data to train their models -- and how they're training them," said David Thiel, the former chief technologist at the Stanford Internet Observatory, who has researched the prevalence of child sexual abuse material in AI training data. Such data can be licensed, purchased or scraped from the internet, or could be so-called synthetic data, which is text or images created by other AI tools. As AI companies seek to release new models quickly, "the rapid gathering of data is a much higher priority than doing safety analyses," Thiel said. He warned that there are "always some errors" when it comes to sifting out CSAM from training data, and believes the industry needs to be more open about where its data is coming from. Amazon's Bedrock offering, which gives customers access to various AI models so they can build their own AI products, includes automated detection for known CSAM and rejects and reports positive matches. The company's consumer-facing generative AI products also allow users to report content that escapes its controls. The Seattle-based tech giant scans for CSAM across its other businesses, too, including its consumer photo storage service. Amazon's cloud computing division, Amazon Web Services, also removes CSAM when it's discovered on the web services it hosts. McNulty said AWS submitted far fewer reports than came from Amazon's AI efforts. Amazon declined to break out specific reporting data across its various business units, but noted it would share broad data in March. Only recently have technology companies really begun to scrutinize their AI models and training data for CSAM, said David Rust-Smith, a data scientist at Thorn, a nonprofit organization that provides tools to companies, including Amazon, to detect the exploitative material. "There's definitely been a big shift in the last year of people coming to us asking for help cleaning data sets," said Rust-Smith. He noted that "some of the biggest players" have sought to apply Thorn's detection tools to their training data, but declined to speak about any individual company. Amazon did not use Thorn's technology to scan its training data, the spokesperson confirmed. Rust-Smith said AI-focused companies are approaching Thorn with a newfound urgency. "People are learning what we already knew, which is, if you hoover up a ton of the internet, you're going to get (child sexual abuse material)," he said. Amazon was not the only company to spot and report potential CSAM from its AI workflows last year. Alphabet's Google and OpenAI told Bloomberg News that they scan AI training data for exploitative material -- a process that has surfaced potential CSAM, which the companies then reported to NCMEC. Meta and Anthropic said they, too, search training data for CSAM. Meta did not comment on whether it had identified the material, but said it would report to NCMEC if it did. Anthropic said it has not reported such material out of its training data. Meta and Google said that they've taken efforts to ensure that reports related to their AI workflows are distinguishable from those generated by other parts of their business. 
McNulty said that, with the exception of Amazon, the AI-related reports it received last year came in "really, really small volumes," and included key details that allowed the clearinghouse to pass on actionable information to law enforcement. "Simply flagging that you came across something but not providing any type of actionable detail doesn't help the larger child safety space," McNulty said. Bloomberg's Alexandra S. Levine contributed.
Amazon reported hundreds of thousands of suspected child sexual abuse material cases found in its AI training data to NCMEC in 2025, accounting for the vast majority of over 1 million AI-related reports. Child safety officials have expressed concern over Amazon's lack of detail about the material's origin, which hinders law enforcement efforts to identify perpetrators and protect victims.
Amazon detected and reported hundreds of thousands of suspected child sexual abuse material cases in its AI training data throughout 2025, according to a Bloomberg investigation [1]. The National Center for Missing and Exploited Children received more than 1 million AI-related CSAM reports last year, with the vast majority coming from Amazon [2]. This marks a dramatic 15-fold increase from the 67,000 AI-related reports NCMEC received in 2024 and just 4,700 in 2023 [3]. The scale of Amazon's reporting stands in stark contrast to other major tech companies, which collectively submitted only "a handful of reports" during the same period.
Source: Engadget
Child safety officials have identified "glaring differences" between Amazon and its peers in how they handle these discoveries. While Amazon removed the content before training its AI models, the company has not provided information about the source of the material, potentially hindering law enforcement from finding perpetrators and protecting victims [1]. An Amazon spokesperson stated that the training data was obtained from external sources and the company doesn't have details about its origin that could aid investigators. "This is really an outlier," said Fallon McNulty, executive director of NCMEC's CyberTipline. "Having such a high volume come in throughout the year begs a lot of questions about where the data is coming from, and what safeguards have been put in place." [2]

McNulty noted that reports from other companies included actionable data for law enforcement, while Amazon's reports have proved "inactionable." Amazon employs a detection tool that uses "hashing" to compare content against a database of known child abuse material involving real victims. The company stated that approximately 99.97% of the reports resulted from scanning "non-proprietary training data" [3]. In an emailed statement, an Amazon spokesperson said the company "takes a deliberately cautious approach to scanning foundation model training data, including data from the public web, to identify and remove known child sexual abuse material and protect our customers." The company indicated it intentionally uses an over-inclusive threshold for scanning, which yields a high percentage of false positives, believing it over-reported cases to avoid accidentally missing something. As of January, Amazon stated it is "not aware of any instances" of its AI models generating child sexual abuse material, and none of its reports to NCMEC were of AI-generated content.
Source: Bloomberg
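Neither Bloomberg nor Amazon describes the pipeline beyond the word "hashing," but the general technique is straightforward to sketch: compute a fingerprint for each file in a corpus and test it against a list of fingerprints of known, verified material supplied by a clearinghouse. The Python sketch below is a minimal illustration of that idea using exact SHA-256 matching; it is not Amazon's implementation, and the hash value, directory name, and function names are hypothetical. Production systems typically rely on perceptual hashes (for example, Microsoft's PhotoDNA or the matching in Thorn's Safer), which still match images after resizing or re-encoding.

```python
import hashlib
from pathlib import Path

# Hypothetical hash list: hex digests of known, verified abuse imagery.
# Real lists are maintained by clearinghouses such as NCMEC and partners
# like Thorn, and are distributed as hashes, never as the images themselves.
KNOWN_HASHES = {
    "3a7bd3e2360a3d29eea436fcfb7e44c735d117c42d1c1835420b6b9942dd4f1b",  # placeholder
}


def sha256_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def scan_training_files(paths):
    """Yield paths whose digest matches the hash list, so they can be
    excluded from the training set and reported."""
    for path in paths:
        if sha256_digest(path) in KNOWN_HASHES:
            yield path


if __name__ == "__main__":
    # Hypothetical directory of crawled images awaiting ingestion.
    flagged = list(scan_training_files(Path("crawl_shard_000").glob("*.jpg")))
    print(f"{len(flagged)} files flagged for removal and reporting")
```

An exact cryptographic hash only matches byte-identical copies, which is one reason real deployments favor perceptual hashing with a similarity threshold; tuning that threshold to be "over-inclusive," as Amazon describes, trades a higher false-positive rate for fewer missed matches.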
The spike in Amazon's reports coincides with a fast-moving AI race that has companies scrambling to acquire and ingest huge volumes of data to improve their models [3]. This rush has complicated the work of child safety officials who are struggling to keep up with changing technology and challenged regulators tasked with safeguarding AI from abuse. AI safety experts warn that quickly amassing large datasets without proper safeguards comes with grave risks. Training AI models on illegal and exploitative content raises concerns about shaping a model's underlying behaviors, potentially improving its ability to digitally alter and sexualize photos of real children or create entirely new images that never existed. It also raises the threat of continuing the circulation of images that models were trained on, re-victimizing children who have suffered abuse.

The AI-related reports represent a fraction of the total reports submitted to NCMEC. In 2024, NCMEC received more than 20 million reports from across the industry, with most coming from Meta Platforms subsidiaries including Facebook, Instagram, and WhatsApp [3]. Not all reports are ultimately confirmed as containing CSAM. The category of AI-related CSAM reports can include AI-generated photos and videos, sexually explicit conversations with AI chatbots, or photos of real victims collected unintentionally during efforts to improve AI models. Recent months have seen safety questions for minors emerge as a critical concern for the artificial intelligence industry. OpenAI and Character.AI have both faced lawsuits after teenagers planned their suicides using those companies' platforms, while Meta is being sued for alleged failures to protect teen users from sexually explicit conversations with chatbots [2].

Summarized by Navi