© 2025 TheOutpost.AI All rights reserved
Curated by THEOUTPOST
On Sat, 8 Feb, 4:02 PM UTC
11 Sources
[1]
Court filings show Meta paused efforts to license books for AI training | TechCrunch
New court filings in an AI copyright case against Meta add credence to earlier reports that the company "paused" discussions with book publishers on licensing deals to supply some of its generative AI models with training data. The filings are related to the case Kadrey v. Meta Platforms -- one of many such cases winding through the U.S. court system that have pitted AI companies against authors and other intellectual property holders. For the most part, the defendants in these cases -- AI companies -- have claimed that training on copyrighted content is "fair use." The plaintiffs -- copyright holders -- have vociferously disagreed. The new filings submitted to the court Friday, which include partial transcripts of Meta employee depositions taken by attorneys for plaintiffs in the case, suggest that certain Meta staff felt negotiating AI training data licenses for books might not be scalable. According to one transcript, Sy Choudhury, who leads Meta's AI partnership initiatives, said that Meta's outreach to various publishers was met with "very slow uptake in engagement and interest." "I don't recall the entire list, but I remember we had made a long list from initially scouring the Internet of top publishers, et cetera," Choudhury said, per the transcript, "and we didn't get contact and feedback from -- from a lot of our cold call outreaches to try to establish contact." Choudhury added, "There were a few, like, that did, you know, engage, but not many." According to the court transcripts, Meta paused certain AI-related book licensing efforts in early April 2023 after encountering "timing" and other logistical setbacks. Choudhury said some publishers, in particular fiction book publishers, turned out not to have the rights to the content that Meta was considering licensing, per a transcript.
"I'd like to point out that the -- in the fiction category, we quickly learned from the business development team that most of the publishers we were talking to, they themselves were representing that they did not have, actually, the rights to license the data to us," Choudhury said. "And so it would take a long time to engage with all their authors." Choudhury noted during his deposition that Meta has on at least one other occasion paused licensing efforts related to AI development, according to a transcript. "I am aware of licensing efforts such, for example, we tried to license 3D worlds from different game engine and game manufacturers for our AI research team," Choudhury said. "And in the same way that I'm describing here for fiction and textbook data, we got very little engagement to even have a conversation [...] We decided to -- in that case, we decided to build our own solution." Counsel for the plaintiffs, who include bestselling authors Sarah Silverman and Ta-Nehisi Coates, have amended their complaint several times since the case was filed in the U.S. District Court for the Northern District of California, San Francisco Division in 2023. The latest amended complaint submitted by plaintiffs' counsel alleges that Meta, among other offenses, cross-referenced certain pirated books with copyrighted books available for license to determine whether it made sense to pursue a licensing agreement with a publisher. The complaint also accuses Meta of using "shadow libraries" containing pirated ebooks to train several of the company's AI models, including its popular Llama series of "open" models. According to the complaint, Meta may have secured some of the libraries via torrenting. Torrenting, a way of distributing files across the web, requires that torrenters simultaneously "seed," or upload, the files they're trying to obtain -- which the plaintiffs asserted is a form of copyright infringement.
[2]
Meta purportedly trained its AI on more than 80TB of pirated content and then open-sourced Llama for the greater good
Court filings suggest Meta took steps to unsuccessfully mask its AI training activities
Meta is facing a class-action lawsuit alleging copyright infringement and unfair competition over the training of its AI model, Llama. According to court documents released by vx-underground, Meta allegedly downloaded nearly 82TB of pirated books from shadow libraries such as Anna's Archive, Z-Library, and LibGen to train its AI systems. Internal discussions reveal that some employees raised ethical concerns as early as 2022, with one researcher explicitly stating, "I don't think we should use pirated material" while another said, "Using pirated material should be beyond our ethical threshold." Despite these concerns, Meta appears to have not only ploughed on but also taken steps to avoid detection. In April 2023, an employee warned against using corporate IP addresses to access pirated content, while another said that "torrenting from a corporate laptop doesn't feel right," adding a laughing emoji. There are also reports that Meta employees discussed ways to prevent Meta's infrastructure from being directly linked to the downloads, raising questions about whether the company knowingly bypassed copyright laws. In January 2023, Meta CEO Mark Zuckerberg reportedly attended a meeting where he pushed for AI implementation at the company despite internal objections. Meta isn't alone in facing legal challenges over AI training. OpenAI has been sued multiple times for allegedly using copyrighted books without permission, including a case filed by The New York Times in December 2023. Nvidia is also under legal scrutiny for training its NeMo model on nearly 200,000 books, and a former employee disclosed that the company scraped over 426,000 hours of video daily for AI development. And in case you missed it, OpenAI recently claimed that DeepSeek unlawfully obtained data from its models, highlighting the ongoing ethical and legal dilemmas surrounding AI training practices.
[3]
Meta faces lawsuit for training AI with pirated books
In a recent lawsuit, Meta has been accused of using pirated books to train its AI models, with CEO Mark Zuckerberg's approval. As per Ars Technica, the lawsuit, filed by authors including Ta-Nehisi Coates and Sarah Silverman in a California federal court, cites internal Meta communications indicating that the company utilized the Library Genesis (LibGen) dataset -- a vast online repository known for hosting pirated books -- despite internal concerns about the legality of using such material. The authors argue that Meta's actions infringe upon their copyrights and could undermine the company's position with regulators. They claim that Meta's AI models, including Llama, were trained using their works without permission, potentially harming their livelihoods. Meta has defended its practices by invoking the "fair use" doctrine, asserting that using publicly available materials to train AI tools is legal in certain cases, such as "using text to statistically model language and generate original expression."
In a February 8, 2025 post, vx-underground (@vxunderground) wrote: "Unsealed court documents from February 5th, 2024, in Kadrey v. Meta show Meta (formerly Facebook) illegally torrented 81.7TB of data from 'shadow libraries' such as Anna's Archive, Z-Library, and LibGen to train Meta artificial intelligence. Highlights include: - A senior AI..."
One internal message highlighted in the lawsuit quotes an employee expressing discomfort, stating, "Torrenting from a corporate laptop doesn't feel right." In response to the lawsuit, U.S. District Judge Vince Chhabria dismissed some claims but allowed the authors to amend their complaint to include new allegations, including those related to the removal of copyright management information.
This case is part of a broader wave of legal challenges against tech companies like Meta, OpenAI, and Anthropic, where authors and creators are seeking to protect their intellectual property rights in the face of rapidly advancing AI technologies. The outcome of this lawsuit could have significant implications for the tech industry, particularly concerning the use of copyrighted materials in AI training. It raises important questions about the balance between technological innovation and the protection of creators' rights.
[4]
Meta's Llama AI in hot water: Alleged copyright theft leads to class action lawsuit, accused of pirating 82TB of books for AI training
Meta is being sued for allegedly using pirated books from shadow libraries to train its AI models, despite internal concerns and ethical warnings. The company reportedly downloaded 81.7TB of data through torrents and concealed its involvement to bypass copyright laws.
Meta's LLaMA AI is in the spotlight after being hit with a class action lawsuit accusing the company of illegally using pirated books to train its AI models, as per a report. According to court records shared by vx-underground, Meta allegedly downloaded 81.7TB of data from shadow libraries like Anna's Archive, Z-Library, and LibGen through torrents to feed its AI. According to a Tom's Hardware report, internal messages reveal that even some of Meta's own employees raised concerns over using pirated materials. One senior AI researcher warned as early as October 2022 about crossing ethical lines, stating, "I don't think we should use pirated material." Another employee agreed, saying that platforms like SciHub and LibGen were essentially pirating content. Despite these concerns, in January 2023, Meta pushed ahead. Mark Zuckerberg himself was involved in a meeting urging the team to "move this stuff forward." As per the report, an employee flagged Meta's IP addresses being used to access pirated content, even joking that "torrenting from a corporate laptop doesn't feel right." What makes the case more concerning is evidence that Meta took steps to hide its involvement by avoiding direct links between its corporate infrastructure and the illegal downloading activity, reported Tom's Hardware. This is seen as an attempt to bypass copyright laws, according to the lawsuit. Meta isn't the only tech company facing backlash for how it trains AI. OpenAI has been sued by novelists for allegedly using their books without permission, and Nvidia has faced legal action for using thousands of books and videos to train its models.
The lawsuit is ongoing, and even if Meta loses, the company could appeal, meaning the final ruling might take months or even years to settle.
How did Meta allegedly obtain the pirated content?
Meta is accused of using torrents to download large volumes of books, including copyrighted material, to train its AI. Court documents show that the company took steps to avoid direct links to its infrastructure to keep these actions under the radar.
Has this happened before with other AI companies?
Yes, Meta isn't the only company to face such accusations. OpenAI has been sued by authors for using their books to train its language models, and Nvidia was also sued for scraping books and videos for its own AI model training.
[5]
Meta used pirated books to train its AI models, and there are emails to prove it
Facepalm: A group of authors has sued Meta, alleging that the company used unauthorized copies of their books to train its generative AI models. While Meta has denied any wrongdoing, newly unsealed messages suggest that executives and engineers were well aware of their actions - and that they were violating copyright law. The lawsuit filed by Sarah Silverman, Richard Kadrey, and other writers and rights holders against Meta may be entering its most critical phase. The authors have obtained internal company emails in which Meta employees openly discussed "torrenting" well-known archives of pirated content to train more powerful AI models. Meta previously acknowledged using certain controversial datasets, arguing that such practices should be considered fair use. The company also admitted to downloading a massive dataset known as "LibGen," which contains millions of pirated books. However, the newly unsealed emails reveal deeper concerns within Meta about acquiring and distributing this data through the BitTorrent network. According to the emails, Meta downloaded and shared at least 81.7 terabytes of data across multiple contentious datasets, including 35.7 terabytes from Z-Library and LibGen archives. The plaintiffs allege that Meta engaged in an "astonishing" torrenting scheme, distributing pirated books at an unprecedented scale. In an April 2023 message, Meta researcher Nikolay Bashlykov wrote, "torrenting from a corporate laptop doesn't feel right." The message ended with a smiling emoji, but a few months later, his tone shifted significantly. In September 2023, Bashlykov stated that he was consulting Meta's legal team because using torrents - and thereby "seeding" terabytes of pirated data - was clearly "not OK" from a legal standpoint. Meta was apparently aware that its engineers were engaging in illegal torrenting to train AI models, and Mark Zuckerberg himself was reportedly aware of LibGen. 
To conceal this activity, the company attempted to mask its torrenting and seeding by using servers outside of Facebook's main network. In another internal message, Meta employee Frank Zhang referred to this approach as "stealth mode." Like other major tech firms, Meta is pouring massive amounts of money into AI development and generative AI services. The company, which aims to populate its aging social networks with AI-generated personas and bots, recently filed a motion to dismiss the lawsuit led by Silverman and other authors. However, the newly revealed emails detailing Meta's involvement in torrenting and distributing pirated books could significantly complicate its legal defense.
[6]
Meta staff torrented nearly 82TB of pirated books for AI training -- court records reveal copyright violations
Facebook parent company Meta is currently fighting a class action lawsuit alleging copyright infringement and unfair competition, among other claims, with regard to how it trained LLaMA. According to an X (formerly Twitter) post by vx-underground, court records reveal that the social media company used pirated torrents to download 81.7TB of data from shadow libraries including Anna's Archive, Z-Library, and LibGen. It then used this information to train its AI models. The evidence, in the form of written communication, shows the researchers' concerns about Meta's use of pirated materials. One senior AI researcher said way back in October 2022, "I don't think we should use pirated material. I really need to draw a line here." Another said, "Using pirated material should be beyond our ethical threshold," then added, "SciHub, ResearchGate, LibGen are basically like PirateBay or something like that, they are distributing content that is protected by copyright and they're infringing it." Then, in January 2023, Mark Zuckerberg himself attended a meeting where he said, "We need to move this stuff forward... we need to find a way to unblock all this." Some three months later, a Meta employee sent a message to another one saying they were concerned about Meta IP addresses being used "to load through pirate content." They also added, "torrenting from a corporate laptop doesn't feel right," followed by a laughing-out-loud emoji. Aside from those messages, documents also revealed that the company took steps to keep its infrastructure out of these downloading and seeding operations so that the activity wouldn't be traced back to Meta. The court documents say that this constitutes evidence of Meta's unlawful activity, with the company seemingly taking deliberate steps to circumvent copyright laws. However, this isn't the first time an AI company has been accused of stealing information off the internet to train its models.
OpenAI has been sued by novelists as far back as June 2023 for using their books to train its large language models, with The New York Times following suit in December. Nvidia has also been on the receiving end of a lawsuit filed by writers for using 196,640 books to train its NeMo model, which has since been taken down. A former Nvidia employee blew the whistle on the company in August of last year, saying that it scraped more than 426 thousand hours of videos daily for use in AI training. More recently, OpenAI is investigating if DeepSeek illegally obtained data from ChatGPT, which just shows how ironic things can get. The case against Meta is still ongoing, so we will have to wait until the court releases its decision to say if the company committed direct infringement. And even if the writers win this case, Meta, with its huge financial war chest, will likely appeal the decision, meaning we will have to wait for several months, if not years, to see the final court judgment.
[7]
Meta accused of pirating 82 terabytes of books from 'shadow libraries' to train AI
TL;DR: Leaked court documents indicate that Meta is accused of illegally downloading 82 terabytes of data to train its artificial intelligence systems.
For sophisticated AI chatbots to exist, they need to be trained on large swaths of data, but things get murky when the big question is posed to the companies behind these AI chatbots: where did you get this data from? And was it obtained legally? Since their massive rise in popularity, companies behind these AI chatbots have been accused of stealing copyrighted data, which is then used to train the AI, further increasing its sophistication and, ultimately, the price the company charges for access to it. Obtaining datasets legally means companies must pay a licensing fee for copyrighted material and agree to conditions set by the owner of the data. Why accept all that expense and those stipulations when the dataset can just be pirated in the same way a movie can be illegally downloaded? Companies such as OpenAI are currently embroiled in copyright lawsuits for these reasons. OpenAI isn't the only AI company facing such lawsuits, however: a lawsuit against Meta has recently been leaked online that accuses the Mark Zuckerberg-run company of obtaining 82 terabytes (TB) of books from an illegal source for AI training. The lawsuit states Meta illegally downloaded the contents of the books from "shadow libraries" such as Anna's Archive, Z-Library, and LibGen, with the suit quoting a Meta researcher who was against the use of pirated material.
- A senior AI researcher at Meta says, "I don't think we should use pirated material. I really need to draw a line there."
- Another AI researcher says, "using pirated material should be beyond our ethical threshold" ... "SciHub, ResearchGate, LibGen are basically like PirateBay or something like that, they are distributing content that is protected by copyright and they're infringing it".
- In January 2023, Mark Zuckerberg attends a meeting which is heavily redacted in court documents. However, he says "we need to move this stuff forward" and "we need to find a way to unblock all of this".
- Fast forward to April 2023: Meta employees discuss using a VPN to conceal Meta IP address ranges when torrenting data. Meta employees also discuss the need to involve lawyers if something goes astray. The unredacted court records show a Meta employee saying, "torrenting from a corporate laptop doesn't feel right 😂".
[8]
Unredacted Meta emails reveal scale of book piracy for AI training
Newly uncovered emails gave fresh impetus to a copyright case against Meta, raised by book authors who claim the tech giant utilised pirated books to train its AI models. Authors like Richard Kadrey, Sarah Silverman and Christopher Golden claimed that Meta torrented and processed their books to train AI models. The original lawsuit filed in 2023 alleged that the company used their work "without consent, without credit and without compensation". Last month, Meta owned up to torrenting a controversially large dataset called LibGen, which comprises 10 million pirated books. However, exact details about the torrenting were unclear until five days ago, when Meta's unredacted emails were released to the public for the first time. The authors' court filing says that the new evidence illustrates Meta torrented "at least 81.7 terabytes of data across multiple shadow libraries through the site Anna's Archive, including at least 35.7 terabytes of data from Z-Library and LibGen. Meta also previously torrented 80.6 terabytes of data from LibGen." The filing also alleges that the magnitude of Meta's illegal torrenting exercise was astonishing, adding that "vastly smaller acts of data piracy -- just 0.008 percent of the amount of copyrighted works Meta pirated -- have resulted in Judges referring the conduct to the US Attorneys' office for criminal investigation." Book authors had been demanding further information from Meta related to torrenting because of the evident copyright issue over Meta seeding, and consequently distributing, pirated books. The tech giant initially held out against these attempts, but to no avail, as the authors eventually dug up the information anyway, including an essential document that begins with at least one staffer joking about the potential legal hurdles.
Nikolay Bashlykov, a Meta research engineer, wrote in an April 2023 message, "Torrenting from a corporate laptop doesn't feel right," adding a smiley emoticon. Bashlykov apparently did away with the emoticons by September 2023, consulting the legal team and highlighting in an email that "using torrents would entail 'seeding' the files -- i.e., sharing the content outside, this could be legally not OK." Meta apparently tried to hide the seeding by not using Facebook servers to download the dataset to "avoid" the "risk" of anyone "tracing back the seeder/downloader" from Facebook servers, an internal message from Meta researcher Frank Zhang said. Michael Clark, a Meta executive heading project management, said in a deposition that the tech giant changed settings "so that the smallest amount of seeding possible could occur". Authors asked for Meta staff involved in using LibGen to train AI models to be deposed again as the new facts allegedly "contradict prior deposition testimony". Mark Zuckerberg, for instance, claimed to not be involved in decisions taken to use LibGen in training AI models. But the authors alleged that unredacted messages show the "decision to use LibGen occurred" after "a prior escalation to MZ".
[9]
Meta accused of downloading torrents of 81.7TB of pirated books to train its Llama AI models
Meta has been accused of torrenting an astonishing 81.7TB of pirated books to train its Llama AI models, according to a new filing in the US District Court for the Northern District of California. The social networking giant has been accused of illegally torrenting copyrighted materials from sources including Z-Library and LibGen. The plaintiffs, led by author Richard Kadrey and others representing a proposed class, have filed a motion objecting to a pre-trial discovery ruling that the authors argue limits their ability to gather critical evidence against Meta. The authors claim that Meta's last-minute disclosure of over 2000 documents on December 13, 2024, just hours before the close of fact discovery, revealed admissions from Meta employees about using pirated materials for its AI training. The newly-unsealed emails reveal damning evidence against Meta in a copyright lawsuit filed by book authors, who claim that Meta unlawfully trained its AI models using pirated books downloaded over torrents. The new evidence shows Meta torrented "at least 81.7 terabytes of data across multiple shadow libraries through the site Anna's Archive, including at least 35.7 terabytes of data from Z-Library and LibGen" according to the authors' court filing, which adds that "Meta also previously torrented 80.6 terabytes of data from LibGen". The authors' filing alleges: "the magnitude of Meta's unlawful torrenting scheme is astonishing", insisting that "vastly smaller acts of data piracy -- just .008 percent of the amount of copyrighted works Meta pirated -- have resulted in Judges referring the conduct to the US Attorneys' office for criminal investigation". One Meta staffer reportedly said: "I feel that using pirated material should be beyond our ethical threshold", while another document alleges that Meta's decision to use LibGen was escalated to Meta CEO Mark Zuckerberg.
The authors claim that internal emails about torrenting prove that Meta was well aware its actions were illegal, pointing to warnings from employees that they say were ignored. The plaintiffs are challenging several aspects of a recent discovery ruling:
[10]
Court documents show not only did Meta torrent terabytes of pirated books to train AI models, employees wouldn't stop emailing each other about it: 'Torrenting from a corporate laptop doesn't feel right'
First reported by Ars Technica, the copyright case against Facebook parent company Meta over its use of authors' work to train large language models has unearthed some embarrassing dirty laundry in discovery. Dozens of emails, allegedly between Meta employees, discuss torrenting massive amounts of pirated material -- and seeding those torrents to boot -- in order to train the company's AI models. It was revealed via court documents last month that Meta had obtained AI training data from LibGen, a large file sharing database that includes everything from paywalled news and academic articles, to whole books. The plaintiffs allege that Meta downloaded over 80 terabytes from LibGen and another so-called "shadow library" by the name of Z-Library. This is, to be clear, internet piracy on a scale that would make a Nintendo lawyer blush, and the lawsuit alleges the emails put in writing "Meta's decision to take and use copyrighted works without permission that it knew to be pirated, despite clear ethical concerns." One of the emails in evidence quotes an alleged Meta employee futilely advising that "using pirated material should be beyond our ethical threshold" before arguing that databases like LibGen "are basically like PirateBay or something like that, they are distributing content that is protected by copyright and they're infringing it." There are repeated examples of emails ascribed to Meta employees flagging the use of LibGen as a concern, either in failed "lone sane man fashion," or in the context of hiding the activity. One researcher proposed only accessing LibGen through a VPN, and later joked that "torrenting from a corporate laptop doesn't feel right 😂." Meta would ultimately operate in "stealth mode," to quote one AI researcher at the company, concealing the activity by only downloading and seeding the torrents outside official Facebook servers. As an aside: It was real neighborly of them to seed the torrents too! Wonder how good their ratios were.
The plaintiffs further argue that these discovery documents suggest that Meta executives up to and including Mark Zuckerberg were aware of the use of pirated material to train AI models at the company. Another detail that stands out to me: The emails filed as evidence indicate that Meta employees believed OpenAI used LibGen for its own models, framing the company's use of the database as a sort of arms race. If the Internet Archive isn't allowed to loan books as a digital library, I don't think companies like Meta should be allowed to swallow up terabytes of pirated material to train a chatbot that will lie to you about how many planets are in the solar system. In a twist of fate, our international copyright regime looks to be one of the most sturdy bulwarks against an AI future. I'm no fan of the Digital Millennium Copyright Act, but I say let them fight. One other thing I just can't escape is how low-rent this all is: Our Silicon Valley thought leaders and mavericks need unprecedented injections of capital in order to... do internet piracy and conquer a new frontier in cheating on your homework? The sheer body of written communication allegedly confirming it all is just the cherry on top of a schadenfreude sundae. "Subject: Forwarded: Re:Re:Re:Re: Crimes." I'm reminded of how Valve was saved from ruin by a similar disregard for opsec on the part of its former publisher Vivendi, or, indeed, that one I Think You Should Leave sketch.
[11]
81.7 terabytes of data: Meta downloads entire digital libraries without permission - Softonic
The information revealed suggests that among the 81.7 terabytes of downloaded data, at least 35.7 terabytes include library books
Meta, Mark Zuckerberg's company formerly known as Facebook, is facing serious accusations in the Kadrey case, in which it is accused of using copyrighted works to train artificial intelligence models. Recent revelations have highlighted the systematic nature of these activities, with evidence suggesting massive downloads of books from platforms like Z-Library and LibGen, reaching a total of 81.7 terabytes of data. Court documents have revealed internal emails that demonstrate Meta employees' awareness of the illegality of their actions. In October 2022, employee Melanie Kambadur expressed her doubts about the ethics of downloading books using torrents from a corporate computer. Later, in April 2023, Nikolay Bashlykov, a research engineer, was cautious about using the corporate network for these downloads, suggesting that it could involve legal risks. Despite these questions, Meta has filed a motion to dismiss the allegations, arguing that there is no concrete evidence that its employees have downloaded material illegally. However, the information revealed suggests that among the 81.7 terabytes of downloaded data, at least 35.7 terabytes include books from unauthorized digital libraries. These practices are not unique to Meta, as other artificial intelligence companies, such as Google and OpenAI, have also faced criticism for the misuse of protected content to train their models. The industry seems to be normalizing these copyright violations under the concept of "fair use," an argument that has been widely questioned in the context of AI growth. However, while "fair use" allows for the limited use of protected material, the constant reports of copyright violations in the field of generative AI suggest that this justification is losing relevance.
Meta is embroiled in a lawsuit alleging the company used pirated books to train its AI models, including Llama. Internal communications reveal ethical concerns and attempts to conceal the practice.
Meta, the parent company of Facebook, is facing a class-action lawsuit over allegations that it used pirated books to train its artificial intelligence models, including the popular Llama series. Court filings and internal communications have revealed that Meta allegedly downloaded and used vast amounts of copyrighted material without proper authorization [1].
According to the lawsuit, Meta is accused of downloading nearly 82 terabytes of pirated books from shadow libraries such as Anna's Archive, Z-Library, and LibGen [2]. The plaintiffs, including bestselling authors Sarah Silverman and Ta-Nehisi Coates, allege that Meta infringed upon their copyrights and potentially harmed their livelihoods [3].
Unsealed court documents reveal that some Meta employees raised ethical concerns as early as 2022. One researcher explicitly stated, "I don't think we should use pirated material," while another employee commented that "torrenting from a corporate laptop doesn't feel right" [4].
Despite these internal warnings, Meta allegedly took steps to conceal its activities. Employees discussed ways to prevent Meta's infrastructure from being directly linked to the downloads, including using servers outside of Facebook's main network in what was referred to as "stealth mode" [5].
Meta has defended its practices by invoking the "fair use" doctrine, asserting that using publicly available materials to train AI tools is legal in certain cases. The company argues that it uses text to statistically model language and generate original expression [3].
This case is part of a larger trend of legal challenges against tech companies developing AI technologies. OpenAI and Nvidia have also faced similar accusations regarding their use of copyrighted materials for AI training [2].
U.S. District Judge Vince Chhabria has dismissed some claims but allowed the authors to amend their complaint to include new allegations, including those related to the removal of copyright management information [3]. The outcome of this lawsuit could have significant implications for the tech industry, particularly concerning the use of copyrighted materials in AI training.
Meta is embroiled in a lawsuit accusing the company of using torrented copyrighted books to train its AI models, potentially setting a precedent for how courts view copyright law in AI development.
6 Sources
Meta CEO Mark Zuckerberg defends the use of copyrighted e-books to train AI models, comparing it to YouTube's content moderation challenges. The case raises questions about fair use in AI development.
17 Sources
Meta claims it didn't seed pirated books used for AI training, sparking debate on copyright infringement and data acquisition methods in AI development.
2 Sources
French publishing and authors' associations have filed a lawsuit against Meta, accusing the tech giant of using copyrighted content without permission to train its AI models. This marks the first such legal action against an AI company in France.
11 Sources
Court documents reveal Meta's intense focus on beating OpenAI's GPT-4 in AI development, highlighting the competitive landscape in the AI industry and raising questions about data usage practices.
2 Sources