OpenAI Accidentally Deletes Potential Evidence in Copyright Lawsuit with The New York Times

13 Sources

Share

OpenAI faces challenges in a copyright lawsuit as it accidentally erases crucial data during the discovery process, leading to delays and complications in the legal battle with The New York Times and Daily News.

News article

OpenAI's Accidental Data Deletion Complicates Copyright Lawsuit

In a significant development in the ongoing copyright lawsuit against OpenAI, the artificial intelligence company has accidentally deleted potential evidence, causing delays and complications in the legal proceedings. The New York Times and Daily News, plaintiffs in the case, have reported that OpenAI engineers inadvertently erased crucial data during the discovery process

1

.

The Incident and Its Implications

On November 14, 2024, OpenAI engineers erased programs and search result data stored on one of the dedicated virtual machines provided for the plaintiffs' counsel to perform searches for copyrighted content in OpenAI's training datasets

2

. While OpenAI managed to recover much of the data, the folder structure and file names were irretrievably lost, rendering the recovered data unusable for determining where the plaintiffs' copied articles were used in building OpenAI's models

3

.

Conflicting Accounts and Technical Challenges

The incident has led to conflicting accounts from both parties. The plaintiffs' counsel claims that over 150 hours of work has been lost, necessitating the recreation of their work from scratch

4

. OpenAI, however, denies deleting any evidence and attributes the issue to a system misconfiguration requested by the plaintiffs, which led to technical problems

2

.

Fair Use Defense and Licensing Deals

At the core of this legal battle is OpenAI's assertion that training AI models using publicly available data, including articles from The New York Times and Daily News, constitutes fair use

5

. The company maintains that it is not required to license or pay for the examples used in training its models, even if it profits from them

3

.

Broader Implications for AI and Copyright

This case highlights the complex intersection of AI technology and copyright law. As AI companies continue to develop large language models trained on vast amounts of data, questions about fair use, compensation, and the rights of content creators remain at the forefront of legal and ethical discussions in the tech industry

5

.

Moving Forward

The accidental deletion of data has underscored the challenges in conducting discovery in such technologically complex cases. The plaintiffs argue that this incident demonstrates that OpenAI is best positioned to search its own datasets for potentially infringing content

4

. As the legal proceedings continue, the outcome of this case could have significant implications for the future of AI development and copyright law in the digital age.

TheOutpost.ai

Your Daily Dose of Curated AI News

Don’t drown in AI news. We cut through the noise - filtering, ranking and summarizing the most important AI news, breakthroughs and research daily. Spend less time searching for the latest in AI and get straight to action.

© 2025 Triveous Technologies Private Limited
Instagram logo
LinkedIn logo