Amazon Found High Volume of CSAM in AI Training Data, Raising Questions About Safeguards


Amazon reported hundreds of thousands of suspected cases of child sexual abuse material (CSAM) found in its AI training data to the National Center for Missing and Exploited Children (NCMEC) in 2025, accounting for the vast majority of the more than 1 million AI-related reports the organization received. Child safety officials have expressed concern that Amazon provided no details about the material's origin, which prevents law enforcement from identifying perpetrators and protecting victims.

Amazon Dominates AI-Related CSAM Reports to NCMEC

Amazon detected and reported hundreds of thousands of suspected child sexual abuse material cases in its AI training data throughout 2025, according to a Bloomberg investigation [1]. The National Center for Missing and Exploited Children received more than 1 million AI-related CSAM reports last year, with the vast majority coming from Amazon [2]. This marks a dramatic 15-fold increase from the 67,000 AI-related reports NCMEC received in 2024 and just 4,700 in 2023 [3]. The scale of Amazon's reporting stands in stark contrast to other major tech companies, which collectively submitted only "a handful of reports" during the same period.

Source: Engadget

Lack of Origin Details Hampers Law Enforcement Efforts

Child safety officials have identified "glaring differences" between Amazon and its peers in how they handle these discoveries. While Amazon removed the content before training its AI models, the company has not provided information about the source of the material, potentially hindering law enforcement efforts to find perpetrators and protect victims [1]. An Amazon spokesperson stated that the training data was obtained from external sources and that the company does not have details about its origin that could aid investigators. "This is really an outlier," said Fallon McNulty, executive director of NCMEC's CyberTipline. "Having such a high volume come in throughout the year begs a lot of questions about where the data is coming from, and what safeguards have been put in place." [2] McNulty noted that reports from other companies included actionable data for law enforcement, while Amazon's reports have proved "inactionable."

Scanning Foundation Model Training Data Reveals High Volume of CSAM

Amazon employs a detection tool that uses "hashing" to compare content against a database of known child abuse material involving real victims. The company stated that approximately 99.97% of the reports resulted from scanning "non-proprietary training data" [3]. In an emailed statement, an Amazon spokesperson said the company "takes a deliberately cautious approach to scanning foundation model training data, including data from the public web, to identify and remove known child sexual abuse material and protect our customers." The company said it intentionally uses an over-inclusive scanning threshold that yields a high percentage of false positives, preferring to over-report cases rather than risk missing material. As of January, Amazon stated it is "not aware of any instances" of its AI models generating child sexual abuse material, and none of its reports to NCMEC were of AI-generated content.
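The report does not detail Amazon's detection pipeline beyond its use of hashing against a database of known material, but the general pattern is straightforward: compute a fingerprint for each item in a training corpus and check it against a set of hashes supplied by a clearinghouse, then remove and report any matches before training. The sketch below is only an illustration of that pattern; the file layout and function names are hypothetical, and production systems typically rely on perceptual hashes (which tolerate re-encoding and resizing) rather than the plain cryptographic digest used here.

```python
import hashlib
from pathlib import Path
from typing import Iterable, Iterator, Set

def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def scan_corpus(files: Iterable[Path], known_hashes: Set[str]) -> Iterator[Path]:
    """Yield files whose digest matches a set of known-bad hashes.

    `known_hashes` stands in for a hash list supplied by an organization
    such as NCMEC (an assumption made for illustration); matched files
    would be excluded from training and reported rather than ingested.
    """
    for path in files:
        if file_digest(path) in known_hashes:
            yield path

# Hypothetical usage: flag matches in a local image corpus before training.
# flagged = list(scan_corpus(Path("corpus").rglob("*.jpg"), known_hashes))
```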

Source: Bloomberg

Risks in AI Development and the Fast-Moving AI Race

The spike in Amazon's reports coincides with a fast-moving AI race that has companies scrambling to acquire and ingest huge volumes of data to improve their models [3]. This rush has complicated the work of child safety officials struggling to keep pace with the technology and of regulators tasked with safeguarding AI from abuse. AI safety experts warn that quickly amassing large datasets without proper safeguards carries grave risks. Training AI models on illegal and exploitative content can shape a model's underlying behaviors, potentially improving its ability to digitally alter and sexualize photos of real children or to create entirely new images that never existed. It also risks continuing the circulation of the images the models were trained on, re-victimizing children who have already suffered abuse.

Broader Context of AI-Related CSAM Reports

The AI-related reports represent a fraction of the total reports submitted to NCMEC. In 2024, NCMEC received more than 20 million reports from across the industry, with most coming from Meta Platforms subsidiaries including Facebook, Instagram, and WhatsApp [3]. Not all reports are ultimately confirmed as containing CSAM. The category of AI-related CSAM reports can include AI-generated photos and videos, sexually explicit conversations with AI chatbots, or photos of real victims collected unintentionally during efforts to improve AI models. In recent months, the safety of minors has emerged as a critical concern for the artificial intelligence industry: OpenAI and Character.AI have both faced lawsuits after teenagers planned their suicides using those companies' platforms, while Meta is being sued for alleged failures to protect teen users from sexually explicit conversations with chatbots [2].
