Reddit Sues Perplexity and Data Scrapers in AI Content Battle

Reviewed by Nidhi Govil

Reddit files a lawsuit against AI search engine Perplexity and three data scraping companies for allegedly harvesting its user-generated content without permission. The case highlights the growing value of quality data in AI training and the legal challenges in the industry.

Reddit Initiates Legal Battle Against AI Scrapers

Reddit, the prominent social media platform, has filed a lawsuit against AI search engine Perplexity and three data scraping companies, alleging that they illicitly harvested user-generated content for AI model training [1][2]. The action, filed in federal court in Manhattan, marks a significant escalation in the ongoing conflict over data rights in the AI industry. The lawsuit names Perplexity AI alongside data scraping firms Oxylabs UAB (Lithuania), AWMProxy (formerly a Russian botnet), and SerpApi (Texas) [3]. Reddit claims these entities circumvented its defenses to collect data from its platform and from Google search results, then used or sold that data for AI development [2].

Source: Digit

The Growing Value of AI Training Data

Ben Lee, Reddit's chief legal officer, highlighted the surging demand for quality human-generated content in AI training, describing it as an "arms race" fueling an "industrial-scale data laundering economy" [4]. Reddit's extensive user discussions make it a prime target for scrapers aiming to improve AI models [5]. The lawsuit alleges violations of the US Digital Millennium Copyright Act (DMCA), unfair competition, unjust enrichment, and civil conspiracy, and it seeks an injunction and unspecified damages [2][4].

Source: Financial Times News

Perplexity's Defense and Broader Implications

Perplexity has denied Reddit's allegations, stating that it only summarizes and cites public Reddit discussions and does not train AI models on the content [5]. The company accused Reddit of "extortion" and of opposing an open internet. The lawsuit is part of a wider trend of copyright disputes in which content owners are challenging AI firms over the unauthorized use of their material to train large language models [2][3]. Reddit has already monetized its content through licensing deals with major AI companies such as Google and OpenAI, which give those companies controlled access for training while generating revenue for Reddit [5]. The outcome of this case could set important precedents for how user-generated content may be used in AI development, and it raises questions about the balance between open access and content rights.

Source: AP NEWS

TheOutpost.ai

© 2025 Triveous Technologies Private Limited