Curated by THEOUTPOST
On Wed, 2 Apr, 4:02 PM UTC
6 Sources
[1]
Researchers suggest OpenAI trained AI models on paywalled O'Reilly books | TechCrunch
OpenAI has been accused by many parties of training its AI on copyrighted content sans permission. Now a new paper by an AI watchdog organization makes the serious accusation that the company increasingly relied on non-public books it didn't license to train more sophisticated AI models.

AI models are essentially complex prediction engines. Trained on a lot of data -- books, movies, TV shows, and so on -- they learn patterns and novel ways to extrapolate from a simple prompt. When a model "writes" an essay on a Greek tragedy or "draws" Ghibli-style images, it's simply pulling from its vast knowledge to approximate. It isn't arriving at anything new.

While a number of AI labs, including OpenAI, have begun embracing AI-generated data to train AI as they exhaust real-world sources (mainly the public web), few have eschewed real-world data entirely. That's likely because training on purely synthetic data comes with risks, like worsening a model's performance.

The new paper, out of the AI Disclosures Project, a nonprofit co-founded in 2024 by media mogul Tim O'Reilly and economist Ilan Strauss, draws the conclusion that OpenAI likely trained its GPT-4o model on paywalled books from O'Reilly Media. (O'Reilly is the CEO of O'Reilly Media.) In ChatGPT, GPT-4o is the default model. O'Reilly Media doesn't have a licensing agreement with OpenAI, the paper says.

"GPT-4o, OpenAI's more recent and capable model, demonstrates strong recognition of paywalled O'Reilly book content [...] compared to OpenAI's earlier model GPT-3.5 Turbo," wrote the co-authors of the paper. "In contrast, GPT-3.5 Turbo shows greater relative recognition of publicly accessible O'Reilly book samples."

The paper used a method called DE-COP, first introduced in an academic paper in 2024, designed to detect copyrighted content in language models' training data. Also known as a "membership inference attack," the method tests whether a model can reliably distinguish human-authored texts from paraphrased, AI-generated versions of the same text. If it can, it suggests that the model might have prior knowledge of the text from its training data. (A minimal sketch of such a probe follows this article.)

The co-authors of the paper -- O'Reilly, Strauss, and AI researcher Sruly Rosenblat -- say that they probed GPT-4o, GPT-3.5 Turbo, and other OpenAI models' knowledge of O'Reilly Media books published before and after their training cutoff dates. They used 13,962 paragraph excerpts from 34 O'Reilly books to estimate the probability that a particular excerpt had been included in a model's training dataset.

According to the results of the paper, GPT-4o "recognized" far more paywalled O'Reilly book content than OpenAI's older models, including GPT-3.5 Turbo. That's even after accounting for potential confounding factors, the authors said, like improvements in newer models' ability to figure out whether text was human-authored. "GPT-4o [likely] recognizes, and so has prior knowledge of, many non-public O'Reilly books published prior to its training cutoff date," wrote the co-authors.

It isn't a smoking gun, the co-authors are careful to note. They acknowledge that their experimental method isn't foolproof, and that OpenAI might've collected the paywalled book excerpts from users copying and pasting them into ChatGPT. Muddying the waters further, the co-authors didn't evaluate OpenAI's most recent collection of models, which includes GPT-4.5 and "reasoning" models such as o3-mini and o1.
It's possible that these models weren't trained on paywalled O'Reilly book data, or were trained on a lesser amount than GPT-4o.

That being said, it's no secret that OpenAI, which has advocated for looser restrictions around developing models using copyrighted data, has been seeking higher-quality training data for some time. The company has gone so far as to hire journalists to help fine-tune its models' outputs. That's a trend across the broader industry: AI companies recruiting experts in domains like science and physics to effectively have these experts feed their knowledge into AI systems.

It should be noted that OpenAI pays for at least some of its training data. The company has licensing deals in place with news publishers, social networks, stock media libraries, and others. OpenAI also offers opt-out mechanisms -- albeit imperfect ones -- that allow copyright owners to flag content they'd prefer the company not use for training purposes.

Still, as OpenAI battles several suits over its training data practices and treatment of copyright law in U.S. courts, the O'Reilly paper isn't the most flattering look.
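To make the DE-COP probe described in the article above concrete, here is a minimal sketch of how such a membership-inference test might be run. It assumes the official `openai` Python client; the prompt wording, model name, and pass/fail scoring are illustrative stand-ins rather than the paper's exact protocol.

```python
# Illustrative DE-COP-style membership-inference probe (not the paper's code).
# Requires the official `openai` package and an OPENAI_API_KEY in the environment.
import random
from openai import OpenAI

client = OpenAI()

def probe_excerpt(verbatim: str, paraphrases: list[str], model: str = "gpt-4o") -> bool:
    """Ask the model to pick the verbatim passage out of four options.

    A correct pick is treated as weak evidence that the passage appeared
    in the model's training data.
    """
    options = paraphrases[:3] + [verbatim]
    random.shuffle(options)
    answer_key = "ABCD"[options.index(verbatim)]

    lettered = "\n\n".join(f"{letter}. {text}" for letter, text in zip("ABCD", options))
    prompt = (
        "Exactly one of the following passages is quoted verbatim from a "
        "published book; the others are paraphrases. Reply with only the "
        "letter of the verbatim passage.\n\n" + lettered
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=1,
        temperature=0,
    )
    return response.choices[0].message.content.strip().upper().startswith(answer_key)
```

Running a probe like this over thousands of excerpts, and comparing guess rates on books a model could have seen against books published after its training cutoff, is the essence of the approach the paper describes.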
[2]
OpenAI's models 'memorized' copyrighted content, new study suggests | TechCrunch
A new study appears to lend credence to allegations that OpenAI trained at least some of its AI models on copyrighted content.

OpenAI is embroiled in suits brought by authors, programmers, and other rights-holders who accuse the company of using their works -- books, codebases, and so on -- to develop its models without permission. OpenAI has long claimed a fair use defense, but the plaintiffs in these cases argue that there isn't a carve-out in U.S. copyright law for training data.

The study, which was co-authored by researchers at the University of Washington, the University of Copenhagen, and Stanford, proposes a new method for identifying training data "memorized" by models behind an API, like OpenAI's.

Models are prediction engines. Trained on a lot of data, they learn patterns -- that's how they're able to generate essays, photos, and more. Most of the outputs aren't verbatim copies of the training data, but owing to the way models "learn," some inevitably are. Image models have been found to regurgitate screenshots from movies they were trained on, while language models have been observed effectively plagiarizing news articles.

The study's method relies on words that the co-authors call "high-surprisal" -- that is, words that stand out as uncommon in the context of a larger body of work. For example, the word "radar" in the sentence "Jack and I sat perfectly still with the radar humming" would be considered high-surprisal because it's statistically less likely than words such as "engine" or "radio" to appear before "humming."

The co-authors probed several OpenAI models, including GPT-4 and GPT-3.5, for signs of memorization by removing high-surprisal words from snippets of fiction books and New York Times pieces and having the models try to "guess" which words had been masked. If the models managed to guess correctly, it's likely they memorized the snippet during training, concluded the co-authors.

According to the results of the tests, GPT-4 showed signs of having memorized portions of popular fiction books, including books in a dataset containing samples of copyrighted ebooks called BookMIA. The results also suggested that the model memorized portions of New York Times articles, albeit at a comparatively lower rate.

Abhilasha Ravichander, a doctoral student at the University of Washington and a co-author of the study, told TechCrunch that the findings shed light on the "contentious data" models might have been trained on.

"In order to have large language models that are trustworthy, we need to have models that we can probe and audit and examine scientifically," Ravichander said. "Our work aims to provide a tool to probe large language models, but there is a real need for greater data transparency in the whole ecosystem."

OpenAI has long advocated for looser restrictions on developing models using copyrighted data. While the company has certain content licensing deals in place and offers opt-out mechanisms that allow copyright owners to flag content they'd prefer the company not use for training purposes, it has lobbied several governments to codify "fair use" rules around AI training approaches.
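As a rough illustration of the masking idea, the sketch below hides one high-surprisal word and asks a model to restore it. It assumes the official `openai` Python client; picking the masked word by hand stands in for the authors' statistical surprisal measure, and the model name is only an example.

```python
# Illustrative masked-word memorization probe (not the study's code).
# Requires the official `openai` package and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

def restores_masked_word(snippet: str, target: str, model: str = "gpt-4") -> bool:
    """Mask one high-surprisal word and check whether the model restores it.

    A correct guess on a statistically unusual word is read as a sign the
    model may have memorized the snippet during training.
    """
    masked = snippet.replace(target, "[MASK]", 1)
    prompt = (
        "Fill in the single word hidden by [MASK] in this passage. "
        "Reply with only that word.\n\n" + masked
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=5,
        temperature=0,
    )
    guess = response.choices[0].message.content.strip().strip(".").lower()
    return guess == target.lower()

# The article's example: "radar" is an unlikely word to precede "humming".
print(restores_masked_word(
    "Jack and I sat perfectly still with the radar humming.", "radar"
))
```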
[3]
Study suggests OpenAI isn't waiting for copyright exemption
GPT-4o likely trained on O'Reilly books without permission, figures appear to show

Tech textbook tycoon Tim O'Reilly claims OpenAI mined his publishing house's copyright-protected tomes for training data and fed it all into its top-tier GPT-4o model without permission. This comes as the generative AI upstart faces lawsuits over its use of copyrighted material, allegedly without due consent or compensation, to train its GPT-family of neural networks. OpenAI denies any wrongdoing.

O'Reilly (the man) is one of three authors of a study [PDF] titled "Beyond Public Access in LLM Pre-Training Data: Non-public book content in OpenAI's Models," issued by the AI Disclosures Project. By non-public, the authors mean books that are available to humans from behind a paywall, and aren't publicly available to read for free unless you count sites that illegally pirate this kind of material.

The trio set out to determine whether GPT-4o had, without the publisher's permission, ingested 34 copyrighted O'Reilly Media books. To probe the model, which powers the world-famous ChatGPT, they performed so-called DE-COP inference attacks described in a 2024 preprint paper.

Here's how that worked: The team posed OpenAI's model a string of multiple-choice questions. Each question asked the software to select, from a group of paragraphs labeled A to D, the one that is a verbatim passage of text from a given O'Reilly (the publisher) book. One of the options was lifted straight from the book; the others were machine-generated paraphrases of the original.

If the OpenAI model tended to answer correctly and identify the verbatim paragraphs, that suggested it was probably trained on that copyrighted text. More specifically, the model's choices were used to calculate what's dubbed an Area Under the Receiver Operating Characteristic (AUROC) score, with higher figures indicating a greater likelihood the neural network was trained on passages from the 34 O'Reilly books. Scores closer to 50 percent, meanwhile, were considered an indication that the model hadn't been trained on the data. (A toy version of this scoring appears after this article.)

Testing of OpenAI models GPT-3.5 Turbo and GPT-4o Mini, as well as GPT-4o, across 13,962 paragraphs uncovered mixed results. GPT-4o, which was released in May 2024, scored 82 percent, a strong signal it was likely trained on the publisher's material. The researchers speculated OpenAI may have trained the model using the LibGen database, which contains all 34 of the books tested. You may recall Meta has also been accused of training its Llama models using this notorious dataset.

The AUROC score for 2022's GPT-3.5 model came in at just above 50 percent. The researchers asserted that the higher score for GPT-4o is evidence that "the role of non-public data in OpenAI's model pre-training data has increased significantly over time."

However, the trio also found that the smaller GPT-4o Mini model, also released in 2024 after a training process that ended at the same time as the full GPT-4o model, wasn't seemingly trained on O'Reilly books. They think that's not an indicator their tests are flawed, but that the smaller parameter count in the mini-model may impact its ability to "remember" text.

"These results highlight the urgent need for increased corporate transparency regarding pre-training data sources as a means to develop formal licensing frameworks for AI content training," the authors wrote.
"Although the evidence present here on model access violations is specific to OpenAI and O'Reilly Media books, this is likely a systematic issue," they added. The trio - which included Sruly Rosenblat and Ilan Strauss - also warned that a failure to adequately compensate creators for their works could result in - and if you can pardon the jargon - the enshittification of the entire internet. "If AI companies extract value from a content creator's produced materials without fairly compensating the creator, they risk depleting the very resources upon which their AI systems depend," they argued. "If left unaddressed, uncompensated training data could lead to a downward spiral in the internet's content quality and diversity." Uncompensated training data could lead to a downward spiral in the internet's content quality and diversity AI giants seem to know they can't rely on internet scraping to find the material they need to train models, as they have started signing content licensing agreements with publishers and social networks. Last year, OpenAI inked deals with Reddit and Time Magazine to access their archives for training purposes. Google also did a deal with Reddit. Recently, however, OpenAI has urged the US government to relax copyright restrictions in ways that would make training AI models easier. Last month, the super-lab submitted an open letter to the White House Office of Science and Technology in which it argued that "rigid copyright rules are repressing innovation and investment," and that if action isn't taken to change this, Chinese model builders could surpass American companies. While model-makers apparently struggle, lawyers are doing well. As we recently reported, Thomson Reuters won a partial summary judgment against Ross Intelligence after a US court found the startup had infringed copyright by using the newswire's Westlaw's headnotes to train its AI system. While neural network trainers push for unfettered access, others in the tech world are introducing roadblocks to protect copyrighted material. Last month Cloudflare rolled out a bot-busting AI designed to make life miserable for scrapers that ignore robots.txt directives. Cloudflare's "AI Labyrinth" works by luring rogue crawler bots into a maze of decoy pages, wasting their time and compute resources while shielding real content. OpenAI didn't immediately respond to a request for comment; we'll let you know if we hear anything back. ®
[4]
An AI watchdog accused OpenAI of using copyrighted books without permission
An artificial intelligence watchdog is accusing OpenAI of training its default ChatGPT model on copyrighted book content without permission. In a new paper published this week, the AI Disclosures Project alleges that OpenAI likely trained its GPT-4o model using non-public material from O'Reilly Media.

The researchers used a legally obtained dataset of 34 copyrighted O'Reilly books and found that GPT-4o showed "strong recognition" of the company's paywalled content. By contrast, GPT-3.5 Turbo appeared more familiar with publicly accessible O'Reilly book samples.

"These results highlight the urgent need for increased corporate transparency regarding pre-training data sources as a means to develop formal licensing frameworks for AI content training," the authors wrote in the paper. Tim O'Reilly, one of the nonprofit's founders and a co-author of the paper, is the CEO of O'Reilly Media.
[5]
OpenAI might have trained its AI on stolen books
OpenAI is facing accusations of training its AI models on copyrighted material without permission, as a new paper alleges the company used paywalled books from O'Reilly Media to train its GPT-4o model. The AI Disclosures Project, a nonprofit co-founded by Tim O'Reilly and Ilan Strauss, published the paper.

AI models function as prediction engines, learning patterns from extensive data like books and movies to extrapolate from prompts. While some AI labs are using AI-generated data as real-world sources diminish, training on purely synthetic data carries risks, such as degrading a model's performance.

The paper's methodology, DE-COP, tests whether a model can distinguish human-authored texts from AI-generated paraphrases of the same text; if it can, the model likely has prior knowledge of the text from its training data. Researchers probed GPT-4o, GPT-3.5 Turbo, and other OpenAI models, using 13,962 excerpts from 34 O'Reilly books to estimate the probability of inclusion in training datasets.

Results indicated GPT-4o recognized significantly more paywalled O'Reilly book content than older models like GPT-3.5 Turbo. According to the paper, GPT-4o likely recognizes many non-public O'Reilly books published before its training cutoff date. O'Reilly doesn't have a licensing agreement with OpenAI, according to the paper.

The co-authors acknowledge the method isn't foolproof and that OpenAI might have collected excerpts from users' ChatGPT inputs. Another caveat is that more recent OpenAI models, including GPT-4.5, weren't evaluated.

OpenAI, advocating for looser copyright restrictions, has sought higher-quality training data, hiring journalists to fine-tune model outputs. The company also has licensing deals with news publishers and offers opt-out mechanisms for copyright owners. OpenAI has not commented on the paper.
[6]
Researchers Claim OpenAI Trained Its AI Models on Copyrighted Content
GPT-4o was said to show the highest recognition of copyrighted content

OpenAI might have trained its artificial intelligence (AI) models on copyrighted content, according to a research paper. Per a recently published paper from the non-profit organisation AI Disclosures Project, the San Francisco-based AI firm's recent large language models (LLMs) showed a higher recognition of copyrighted content compared to its older models. The researchers used a recently developed method called DE-COP to detect copyrighted content in the AI models' training dataset. Notably, the study found that the GPT-4o mini was not trained on the specific copyrighted content.

The study, titled Beyond Public Access in LLM Pre-Training Data, was conducted to check if OpenAI's AI models were trained on non-public book content. For the study, researchers focused on O'Reilly Media, a US online learning platform, which contains numerous copyrighted books. The founder of the platform, Tim O'Reilly, was also one of the co-authors of the study.

The researchers used the DE-COP method to test whether the training data of the AI models contained copyrighted material. This is a relatively new test, introduced in a paper published in 2024. The method, also known as a membership inference attack, quizzes an AI model with a multiple-choice test to see whether it can identify copyrighted content from machine-generated paraphrased alternatives. The researchers used Claude 3.5 Sonnet to paraphrase the copyrighted material. As many as 13,962 paragraph excerpts from 34 O'Reilly Media books were used for the test.

Based on the tests conducted, the researchers claimed to have found that the GPT-4o AI model showed the highest recognition of the copyrighted and paywalled O'Reilly book content, with an 82 percent Area Under the Receiver Operating Characteristic curve (AUROC) score. Notably, the AUROC score is part of the DE-COP method and is derived from the guess rates on the multiple-choice test.

The study also found that older OpenAI models, such as GPT-3.5 Turbo, showed lesser content recognition compared to GPT-4o, but still high enough to be significant. However, GPT-4o mini was found not to have been trained on the paywalled O'Reilly Media books. The paper states the reason could be that the test is not effective against smaller language models.
A new study by the AI Disclosures Project suggests that OpenAI may have used paywalled O'Reilly Media books to train its GPT-4o model without proper licensing, raising concerns about copyright infringement and the need for transparency in AI training data sources.
A new study by the AI Disclosures Project, a nonprofit co-founded by Tim O'Reilly and Ilan Strauss, has accused OpenAI of training its GPT-4o model on copyrighted O'Reilly Media books without permission [1]. The research, which used a method called DE-COP, suggests that OpenAI's latest model demonstrates strong recognition of paywalled O'Reilly book content compared to earlier models [1].
The researchers used 13,962 paragraph excerpts from 34 O'Reilly books to probe GPT-4o, GPT-3.5 Turbo, and other OpenAI models [1]. The study found that GPT-4o "recognized" far more paywalled O'Reilly book content than older models, even after accounting for potential confounding factors [1].
This accusation comes amid ongoing debates about AI companies' use of copyrighted material for training purposes. OpenAI has been advocating for looser restrictions on developing models using copyrighted data [2]. The company has some content licensing deals in place but faces several lawsuits over its training data practices [1].
A separate study by researchers from the University of Washington, the University of Copenhagen, and Stanford proposed a new method for identifying training data "memorized" by models [2]. This study suggested that GPT-4 showed signs of having memorized portions of popular fiction books and New York Times articles [2].
The findings highlight the need for increased transparency regarding pre-training data sources and the development of formal licensing frameworks for AI content training [3]. There are concerns that failure to adequately compensate creators could lead to a decline in internet content quality and diversity [3].
OpenAI has been seeking higher-quality training data and has hired experts in various domains to fine-tune its models' outputs [1]. The company has also urged the US government to relax copyright restrictions to facilitate AI model training [3].
As the AI industry grapples with these issues, some companies are introducing measures to protect copyrighted material. For instance, Cloudflare has developed an AI-powered system designed to deter unauthorized web scraping [3].