3 Sources
[1]
AI scrapers would be forced to ask permission under bill
If it passes, the law would redefine the boundaries of fair use

A bipartisan pair of US Senators introduced a bill this week that would protect copyrighted content from being used for AI training without the owner's permission. Content creators from large media companies to individual bloggers could effectively block Google, Meta, OpenAI, Anthropic, and others from appropriating their work. If passed into law, the AI Accountability and Personal Data Protection Act [PDF] from Senators Josh Hawley (R-MO) and Richard Blumenthal (D-CT) would add a new federal tort allowing individuals to sue companies that use copyrighted works or personally identifiable information to train AI without the owner's express prior consent.

Arguably the most important question in the media industry today is whether AI companies' use of copyrighted training materials constitutes "fair use," a legal shield against infringement claims. Fair use allows third parties to use copyrighted works for criticism, news reporting, commentary, and research. AI makers claim that training their models is protected by this doctrine, and some courts have agreed. Last month, a group of authors lost in court when a judge accepted Anthropic's claim that the company has the right to use their books to train Claude AI, all without compensation or permission.

That kind of thing doesn't seem to sit well with Hawley. "AI companies are robbing the American people blind while leaving artists, writers, and other creators with zero recourse," the Republican Senator noted in a press release. "My bipartisan legislation would finally empower working Americans who now find their livelihoods in the crosshairs of Big Tech's lawlessness."

The AI Accountability and Personal Data Protection Act's text does not mention fair use. However, it does present both personally identifiable information and copyrighted material as types of "covered data" that require the data owner's prior consent to be used for training.
Blumenthal, a frequent legislative partner of Hawley's, agreed with his take, noting that AI safeguards are urgently needed. "Consumers must be given rights and remedies -- and legal tools to make them real -- not relying on government enforcement alone," Blumenthal added in the press release.

The bill spells out what it considers to be express prior consent, and those rules are strict, too. AI vendors have to clearly inform individuals of what their data is being used for and who will have access to it. Companies have to ask for consent explicitly, and can't tie it to the usability of a product if said data collection isn't reasonably necessary. Consent requests can't be mixed into other agreements, and they can't just link out to a full explanation, either - it's all gotta be stated up front to meet the terms of this legislation. The bill also proposes to make illegal any arbitration agreements that prevent individuals from suing companies that improperly collected or used their data, freeing victims up to lob sueballs at AI companies to their heart's content.

Covered data includes unique identifiers such as device IDs, IP addresses, advertising IDs, geolocation data, biometric identifiers, behavioral data (e.g., browsing history and purchase patterns), and even information companies use to build profiles.

If this bill redefines fair use in favor of content creators, the entire information economy could change. At present, online publishers are suffering from a "traffic apocalypse" as Google's AI Overviews compete with their content, depriving them of the ad impressions they need to stay in business. AI Overviews, ChatGPT, and almost every other LLM have been built by scraping huge portions of the web without permission.
Major AI companies like Google have long argued that AI scraping of websites constitutes fair use, but the matter is hardly settled, as demonstrated by a recent research paper commissioned by the EU Parliament that concluded AI scraping does not, in fact, constitute fair use, because AIs don't learn like humans do. The head of the US Copyright Office similarly said last month that AI scraping went beyond the limits of fair use, and while the opinion may have cost him his job, it seems that elected officials have been paying attention.

Introduced Monday and referred to committee, the bill may be a hard sell. There's no indication when it could come up for review by the Senate Judiciary Committee, nor whether it would pass muster for a full Senate vote after that. Neither Hawley's nor Blumenthal's office responded to our questions. ®
[2]
AI and copyright - vendors must not overreach for the prize, say US policymakers
The belief that the copyright wars have been won by vendors is mistaken. Granted, summary judgments in the Meta and Anthropic actions (see diginomica's report last month) went against the plaintiffs, but it was a Pyrrhic victory for defendants.

The federal judge in the Meta case was clear about two things: first, Judge Chhabria was not saying that scraping copyrighted works is legal, merely that the plaintiffs had taken the wrong approach against the social media giant; and second, that wealthy AI companies should license works from rightsholders and not scrape them from pirate libraries to save money. Indeed, he implied that Chief Executive Officer Mark Zuckerberg was personally responsible for the decision to scrape the free LibGen library for training data, rather than licensing those texts.

In the Anthropic case, Judge Alsup noted his belief that vendors' use of Artificial Intelligence (AI) was transformative - a claim contradicted by a US Copyright Office report earlier this year. However, had the authors claimed that Anthropic generated "infringing knockoffs", it would have been a "different case", Alsup explained, paving the way for lawsuits to proceed on that basis.

So, lawyers in the 50 or so other cases worldwide will learn from these judgments: wholesale scraping of pirated content, and the ability to generate direct digital competitors to copyrighted works, are claims that seem likely to succeed. The first successful case brought by rightsholders proved the point earlier this year: Thomson Reuters won its action against Ross Intelligence, with the judge rejecting the defendant's fair-use defense over its copying of the plaintiff's content to create a competing legal services product.

Even so, the sense that Big Tech companies and AI vendors are getting too big for their boots is hard to avoid, with some emboldened by a (partial) reading of those judgments.
Meta this month refused to sign the European Union's (EU) General-Purpose AI Code of Practice, which was released on 10 July, claiming it will "stunt growth". Bear in mind, the code is a non-binding voluntary statement, which merely calls for transparency, ethical use of copyrighted data, and users' safety and security. Yet apparently, Meta believes those non-binding terms are unreasonable and put Europe on "the wrong path" to AI.

Make no mistake, Meta thinks your data is its data, by default. We are merely products for Zuckerberg to pitch an endless stream of Suggested Content to, with no means of - ever - just saying no.

But even in US President Trump's America - backed by an alliance of tech vendors, whose Chief Executive Officers stood shoulder to shoulder with the President as he rolled back AI safety initiatives and drew up his AI Action Plan - not every voice is in favor of overreach for the prize, even among Trump's supporters.

Josh Hawley is both an attorney and the Republican Senator from Missouri. Together with fellow lawyer Richard Blumenthal, a former Marine and now the Democratic Senator from Connecticut, he has introduced a new Bill, the AI Accountability and Personal Data Protection Act. Writing on X yesterday, Senator Hawley explained:

Time for new approach on AI. Give every American the right to protect their own name, image & likeness & all their copyrighted material. How? Let them sue AI companies that take property w/o consent. AI isn't worth having if it doesn't protect our rights.

Though the focus is on protecting the individual, the Bill seeks to bar AI companies from training their models on copyrighted works, while also requiring vendors to disclose which third parties have been granted access to data where consent has been given.
In a US Senate hearing last week, Hawley accused the likes of Meta and ChatGPT maker OpenAI of scraping millions of pirated texts - an allegation we know to be true in Meta's case, as it was cited in Judge Chhabria's judgment in the class action against the company.

Earlier this year, the Rettigheds Alliancen - Denmark's Rights Alliance - released a 17-page document, 'Report on Pirated Content Used in the Training of Generative AI'. This provided evidence that Apple, Anthropic, DeepSeek, Meta, Microsoft, NVIDIA, OpenAI, Runway AI, and music platform Suno have scraped pirated data sources for training data. According to the Alliance, sources scraped by AI companies include (among others): The Pile dataset; academic platform ArXiv; Stack Exchange; Project Gutenberg; YouTube; OpenSubtitles.org (giving AI companies access to entire movie scripts); Netflix; and Wikipedia, while music AI company Suno has admitted to scraping nearly every high-res audio file off the internet.

In a prepared statement this week, Senator Hawley comments: AI companies are robbing the American people blind while leaving artists, writers, and other creators with zero recourse. [...] It's time for Congress to give the American worker their day in court to protect their personal data and creative works.

Senator Blumenthal adds: Tech companies must be held accountable - and liable legally - when they breach consumer privacy, collecting, monetizing or sharing personal information without express consent.

The Senators' focus on the individual, especially the American worker, seems likely to play well with both Trump supporters and Democrats, though it will be fiercely opposed by vendors. And in this sense, Denmark may prove to be an ally yet again.
Last month, the Danish Government moved ahead with a Bill that aims to grant all citizens intellectual property rights over their own faces and voices - a step I predicted several years ago as the logical outcome of a world in which actors' likenesses, for example, might be faked by studios without their consent.

While the Bill - likely to be passed into law this Fall - is designed to protect citizens against deep fakes and voice cloning by forcing vendors to take responsibility for any misuse of their systems, it will surely have implications for copyright in other fields. For example, generative music systems that can fake singers' performances seem likely to fall foul of such legislation, as will, in all probability, any generative image or video platform.

So, Denmark as the world's venue for multibillion-dollar lawsuits against US providers? That could be the case before Christmas.

Whether or not you agree with my perspective that the training of many AI models on copyrighted material at scale is little more than data laundering, one thing should be clear: generative AI seems to be a license for vendors to abdicate responsibility for harm, misuse, and other user actions. But the prevailing mood in Europe, and among artists in the UK - if not yet the British Government - and even among some US Republicans seems to be turning against Big Tech arrogance and overreach, despite AI's undoubted importance for US prosperity.

And something else has changed: it is no longer clear that Big Tech companies can be as certain of having the President's ear as they were in January. His relationship with X, Tesla, and SpaceX supremo Elon Musk has famously soured, and Chief Executive Officers may look with alarm at the damage done to Musk by his public association with the White House.
[3]
US senators introduce bipartisan bill to make it easier to sue tech...
Sens. Josh Hawley (R-Mo.) and Richard Blumenthal (D-Conn.) rolled out bipartisan legislation to make it easier for people to sue tech companies for pirating their data to train artificial intelligence models -- calling the rampant practice "the largest intellectual property theft in American history."

The proposed AI Accountability and Personal Data Protection Act -- which follows a recent hearing in which Hawley accused companies including Meta and OpenAI of pirating vast amounts of protected material -- would bar AI companies from training on personal data or copyrighted works.

"AI companies are robbing the American people blind while leaving artists, writers, and other creators with zero recourse," Hawley said in a statement. "It's time for Congress to give the American worker their day in court to protect their personal data and creative works."

The bill would allow people to sue over the use of their personal data or copyrighted works without their consent. It would also require companies to disclose which third parties will be given access to data if consent is granted, and provides for financial penalties and injunctive relief. The Post has sought comment from Meta and OpenAI.

Hawley added that the "bipartisan legislation would finally empower working Americans who now find their livelihoods in the crosshairs of Big Tech's lawlessness." Blumenthal, his Democratic partner on the bill, underscored privacy risks and the need for legal recourse. "Tech companies must be held accountable -- and liable legally -- when they breach consumer privacy, collecting, monetizing or sharing personal information without express consent," he said.

In recent years, tech firms have been sued by content creators and publishers who allege that their copyrighted material was "scraped" for use by AI models. Thomson Reuters successfully sued Ross Intelligence, saying Ross used Westlaw's copyrighted legal headnotes to build its legal research AI.
In February, a federal court agreed, ruling that Ross had committed copyright infringement. The news agency is seeking unspecified damages.

In December 2023, the New York Times filed suit against OpenAI and Microsoft, alleging that its articles were used to train systems such as GPT-4 without permission. That case is ongoing.

Last month, a federal judge said Anthropic's use of books to train its AI model was "highly transformative" and counted as fair use, but that keeping direct copies ("pirated" versions) in a central library was "direct infringement." The fight over damages and remedies is still ahead.

Authors including Richard Kadrey say Meta used their books without permission to train LLaMA and other large language models. A court said Meta's use was also "highly transformative" and fair use, but the case continues over whether any stored "pirated" materials create liability.
Senators Josh Hawley and Richard Blumenthal introduce the AI Accountability and Personal Data Protection Act, aiming to redefine fair use and allow individuals to sue AI companies for using copyrighted content without permission.
In a significant move that could reshape the landscape of AI development and copyright law, US Senators Josh Hawley (R-MO) and Richard Blumenthal (D-CT) have introduced the AI Accountability and Personal Data Protection Act. This bipartisan legislation aims to protect copyrighted content from being used for AI training without the owner's permission, potentially redefining the boundaries of fair use [1].
Source: New York Post
The bill introduces a new federal tort that would allow individuals to sue companies that use copyrighted works or personally identifiable information to train AI without express prior consent. This legislation could have far-reaching implications for major AI companies like Google, Meta, OpenAI, and Anthropic [1].
Key aspects of the bill include:

- A new federal tort allowing individuals to sue companies that train AI on copyrighted works or personal data without express prior consent
- Strict consent rules: requests must be explicit, stated up front, and not bundled into other agreements or tied to product usability
- A requirement that companies disclose which third parties will be given access to data if consent is granted
- A ban on arbitration agreements that prevent individuals from suing over improperly collected or used data
- Financial penalties and injunctive relief for violations
If passed, this bill could significantly alter the AI development landscape. Currently, many AI companies argue that their use of copyrighted material for training falls under fair use. However, this legislation challenges that assumption, potentially forcing companies to seek permission and possibly pay for the use of copyrighted content [2].
The bill's introduction comes in the wake of several high-profile legal cases involving AI companies and copyright infringement. While some recent judgments have favored AI companies, the landscape remains uncertain: Thomson Reuters, for instance, successfully sued Ross Intelligence for copyright infringement related to AI training data [2].
Source: diginomica
This legislative move reflects growing concerns about AI companies' use of copyrighted material and personal data. Senator Hawley has accused companies like Meta and OpenAI of scraping millions of pirated texts for AI training, a practice that has been documented by organizations such as Denmark's Rights Alliance [2].
The bill also addresses privacy concerns, requiring companies to disclose which third parties will have access to data if consent is granted. It provides for financial penalties and injunctive relief for violations [3].
The proposed legislation is likely to face strong opposition from AI companies and tech giants. Meta's recent refusal to sign the European Union's General-Purpose AI Code of Practice, citing concerns about stunting growth, indicates the industry's resistance to increased regulation [2].
As the bill moves through the legislative process, it will undoubtedly spark intense debate about the balance between AI innovation and the protection of intellectual property rights. The outcome could have profound implications for the future of AI development, content creation, and data privacy in the United States and beyond.
Summarized by
Navi