Curated by THEOUTPOST
On Thu, 21 Nov, 4:02 PM UTC
13 Sources
[1]
Tech problems plague OpenAI court battles; judge rejects a key fair use defense
OpenAI keeps deleting data that could allegedly prove the AI company violated copyright laws by training ChatGPT on authors' works. The practice, apparently largely unintentional, is seemingly dragging out early court battles that could determine whether AI training is fair use. Most recently, The New York Times accused OpenAI of unintentionally erasing programs and search results that the newspaper believed could be used as evidence of copyright abuse. The NYT apparently spent more than 150 hours extracting training data while following a model inspection protocol that OpenAI set up precisely to avoid conducting potentially damning searches of its own database. This process began in October, but by mid-November, the NYT discovered that some of the data gathered had been erased due to what OpenAI called a "glitch." Looking to update the court about potential delays in discovery, the NYT asked OpenAI to collaborate on a joint filing admitting the deletion occurred. But OpenAI declined, instead filing a separate response calling the newspaper's accusation that evidence was deleted "exaggerated" and blaming the NYT for the technical problem that triggered the data deletion. OpenAI denied deleting "any evidence," admitting only that file-system information was "inadvertently removed" after the NYT requested a change that resulted in "self-inflicted wounds." According to OpenAI, the tech problem emerged because the NYT was hoping to speed up its searches and requested a change to the model inspection setup that OpenAI warned "would yield no speed improvements and might even hinder performance." The AI company accused the NYT of negligence during discovery, "repeatedly running flawed code" while conducting searches of URLs and phrases from various newspaper articles and failing to back up its data. 
Allegedly the change that NYT requested "resulted in removing the folder structure and some file names on one hard drive," which "was supposed to be used as a temporary cache for storing OpenAI data, but evidently was also used by Plaintiffs to save some of their search results (apparently without any backups)."
[2]
OpenAI accidentally deleted potential evidence in NY Times copyright lawsuit (updated)
Lawyers for The New York Times and Daily News, which are suing OpenAI for allegedly scraping their works to train its AI models without permission, say OpenAI engineers accidentally deleted data potentially relevant to the case. Earlier this fall, OpenAI agreed to provide two virtual machines so that counsel for The Times and Daily News could perform searches for their copyrighted content in its AI training sets. (Virtual machines are software-based computers that exist within another computer's operating system, often used for the purposes of testing, backing up data, and running apps.) In a letter, attorneys for the publishers say that they and experts they hired have spent over 150 hours since November 1 searching OpenAI's training data. But on November 14, OpenAI engineers erased all the publishers' search data stored on one of the virtual machines, according to the aforementioned letter, which was filed in the U.S. District Court for the Southern District of New York late Wednesday. OpenAI tried to recover the data -- and was mostly successful. However, because the folder structure and file names were "irretrievably" lost, the recovered data "cannot be used to determine where the news plaintiffs' copied articles were used to build [OpenAI's] models," per the letter. "News plaintiffs have been forced to recreate their work from scratch using significant person-hours and computer processing time," counsel for The Times and Daily News wrote. "The news plaintiffs learned only yesterday that the recovered data is unusable and that an entire week's worth of its experts' and lawyers' work must be re-done, which is why this supplemental letter is being filed today." The plaintiffs' counsel makes clear that they have no reason to believe the deletion was intentional. But they do say the incident underscores that OpenAI "is in the best position to search its own datasets" for potentially infringing content using its own tools. 
An OpenAI spokesperson declined to provide a statement. But late Friday, November 22, counsel for OpenAI filed a response to the letter sent by lawyers for The Times and Daily News on Wednesday. In their response, OpenAI's attorneys unequivocally denied that OpenAI deleted any evidence, and instead suggested that the plaintiffs were to blame for a system misconfiguration that led to a technical issue. "Plaintiffs requested a configuration change to one of several machines that OpenAI has provided to search training datasets," OpenAI's counsel wrote. "Implementing plaintiffs' requested change, however, resulted in removing the folder structure and some file names on one hard drive -- a drive that was supposed to be used as a temporary cache ... In any event, there is no reason to think that any files were actually lost." In this case and others, OpenAI has maintained that training models using publicly available data -- including articles from The Times and Daily News -- is fair use. In other words, in creating models like GPT-4o, which "learn" from billions of examples of e-books, essays, and more to generate human-sounding text, OpenAI believes that it isn't required to license or otherwise pay for the examples -- even if it makes money from those models. That being said, OpenAI has inked licensing deals with a growing number of new publishers, including the Associated Press, Business Insider owner Axel Springer, Financial Times, People parent company Dotdash Meredith, and News Corp. OpenAI has declined to make the terms of these deals public, but one content partner, Dotdash, is reportedly being paid at least $16 million per year. OpenAI has neither confirmed nor denied that it trained its AI systems on any specific copyrighted works without permission.
[3]
OpenAI accidentally deleted potential evidence in NY Times copyright lawsuit
Lawyers for The New York Times and Daily News, which are suing OpenAI for allegedly scraping their works to train its AI models without permission, say OpenAI engineers accidentally deleted data potentially relevant to the case. Earlier this fall, OpenAI agreed to provide two dedicated virtual machines so that counsel for The Times and Daily News could perform searches for copyrighted content in its training data sets. In a letter, attorneys for the publishers say that they and experts have spent over 150 hours since November 1 searching OpenAI's training data. But on November 14, OpenAI engineers erased all the publishers' search data stored on one of the virtual machines, according to the aforementioned letter, which was filed in the U.S. District Court for the Southern District of New York late Wednesday. OpenAI tried to recover the data -- and was somewhat successful. However, because the folder structure and file names were "irretrievably" lost, the recovered data "cannot be used to determine where the news plaintiffs' copied articles were used to build [OpenAI's] models," per the letter. "News plaintiffs have been forced to recreate their work from scratch using significant person-hours and computer processing time," counsel for The Times and Daily News wrote. "The news plaintiffs learned only yesterday that the recovered data is unusable and that an entire week's worth of its experts' and lawyers' work must be re-done, which is why this supplemental letter is being filed today." The plaintiffs' counsel makes clear that they have no reason to believe the deletion was intentional. But they do say the incident underscores that OpenAI "is in the best position to search its own datasets" for potentially infringing content using its own tools. We've reached out to OpenAI for comment and will update this piece if we hear back. 
In this case and others, OpenAI has maintained that training AI models using publicly available data -- including articles from The Times and Daily News -- is fair use. In other words, in creating models like GPT-4o, which "learn" from billions of examples of ebooks, essays, and more to generate human-sounding text, OpenAI believes that it isn't required to license or otherwise pay for the examples -- even if it makes money from those models. That being said, OpenAI has inked licensing deals with a growing number of new publishers, including The Associated Press, Business Insider owner Axel Springer, Financial Times, People parent company Dotdash Meredith, and News Corp. OpenAI has declined to make the terms of these deals public, but one content partner, Dotdash, is reportedly being paid at least $16 million per year.
[4]
NYT vs OpenAI case: OpenAI accidentally deleted case data
Attorneys representing The New York Times (NYT) and Daily News in their lawsuit against OpenAI, which alleges unauthorised use of their content to train AI models, claim that OpenAI engineers accidentally deleted data that could be significant to the case, reported TechCrunch. Earlier this year, OpenAI provided two virtual machines with computing resources. These machines were provided so that the counsel for NYT and Daily News could perform searches for their copyrighted content in its AI training sets. The counsel representing NYT and Daily News filed a letter in the U.S. District Court for the Southern District of New York. The letter is a status update regarding training data issues and a renewed request that OpenAI be ordered to identify and admit which of News Plaintiffs' (NYT and Daily News) work it used to train each of its GPT models. The letter stated that on November 14, 2024, OpenAI engineers erased programs and search result data stored on one of the dedicated virtual machines. However, counsel for the publishers added that they had no reason to believe that the deletion was intentional. The counsel for the publishers said that they "bear a significant burden and expense in searching for their copyrighted works in OpenAI's training datasets within a tightly controlled environment that this Court and the parties have previously referred to as 'the sandbox.'" Counsel for the publishers stated that they and the experts they hired have spent over 150 hours since November 1, 2024, searching OpenAI's training data. It adds that OpenAI was able to recover much of the data that "it erased." However, OpenAI has "irretrievably lost" the folder structure and file names of the publishers' work product. 
It added that without the folder structure and original file names, the recovered data becomes "unreliable" and cannot confirm whether OpenAI used the publishers' copied articles to build its models. Stating that the recovered data was "unusable," the counsel for the publishers argued that OpenAI was in the best position to search its own datasets for the publishers' works using its own tools and equipment. "The News Plaintiffs have also provided the information that OpenAI needs to run those searches -- all that is needed is for OpenAI to commit to doing so in a timely manner," it stated. The News Plaintiffs have provided OpenAI with detailed instructions to search for their content using specific URLs and "n-gram" analysis, which detects overlapping phrases from their works. However, OpenAI has yet to deliver results or confirm meaningful progress. According to the filing, OpenAI's counsel has only reported "promising meetings" with its engineers, but no tangible outcomes. Furthermore, OpenAI stated in response to the plaintiffs' formal requests for admission that it will "neither admit nor deny" using publishers' work in its training datasets or models. On November 22, 2024, OpenAI filed its response in the case. In their response, OpenAI's attorney denied that the company deleted any evidence, instead attributing the issue to a system misconfiguration by the publishers that led to a technical issue. "Plaintiffs requested a configuration change to one of several machines that OpenAI has provided to search training datasets. Implementing Plaintiffs' requested change, however, resulted in removing the folder structure and some file names on one hard drive -- a drive that was supposed to be used as a temporary cache for storing OpenAI data, but evidently was also used by Plaintiffs to save some of their search results (apparently without any backups). 
In any event, there is no reason to think that any files were actually lost, and Plaintiffs could re-run the searches to recreate the files with just a couple days of computing time," the company stated. "Plaintiffs' inspection efforts began with them repeatedly running flawed code that overwhelmed and crashed the file system," it added. OpenAI's attorney further stated that the company first made training data available for inspection in June but the publishers delayed their review until October. "Once they began, Plaintiffs triggered a series of technical issues due to their own errors. As a direct result of Plaintiffs' self-inflicted wounds, OpenAI has been forced to pour enormous resources into supporting Plaintiffs' inspection, much more than should be necessary," it added. The statement said that publishers want an order compelling OpenAI to respond to nearly 500 million requests for admission. The statement said that OpenAI is willing to collaborate with the publishers. "The core obstacle here is not technical; it is the Plaintiffs' unwillingness to collaborate," the company said in the response. OpenAI claimed that they have offered to take over publishers' searches, provided they provide "clear and reasonable proposals." "OpenAI also offered to run at least some of Plaintiffs' searches for them and asked Plaintiffs to make a comprehensive proposal. Despite OpenAI's support, Plaintiffs returned to their inefficient 'boil-the-ocean' searches, demanding ever increasing hardware performance," it stated. In this case, OpenAI has argued that using publicly available data, such as articles from NYT and Daily News, to train its models constitutes fair use. According to OpenAI, "learning" on billions of examples does not require licensing or compensation for the data. It says that this remains true even when the models use the data commercially.
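The "n-gram" analysis mentioned in the filing has a simple core: break each text into overlapping runs of n consecutive words and check which runs two texts share. A minimal sketch of the idea in Python (the function names and sample strings here are illustrative, not taken from the filings; the plaintiffs' actual tooling is not public):

```python
# Hypothetical sketch of n-gram overlap detection between an article
# and a sample of training data. Not the plaintiffs' actual code.

def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Return the set of n-word sequences in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap(article: str, training_sample: str, n: int = 3) -> set[tuple[str, ...]]:
    """Return the n-word phrases the two texts share."""
    return ngrams(article, n) & ngrams(training_sample, n)

shared = overlap(
    "the quick brown fox jumps over the lazy dog",
    "a quick brown fox jumps over a fence",
)
print(sorted(shared))  # three shared trigrams
```

Long, contiguous runs of shared n-grams between an article and a training set are the kind of signal such a search is reportedly meant to surface, since short incidental overlaps are common while long verbatim phrases are not.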
[5]
OpenAI accidentally deleted potential evidence in New York Times copyright lawsuit case
OpenAI may have accidentally deleted important data related to the ongoing copyright lawsuit brought by The New York Times. First reported by TechCrunch, counsel for the Times and its co-plaintiff Daily News sent a letter to the judge overseeing the case, detailing how "an entire week's worth of its experts' and lawyers' work" was "irretrievably lost." OpenAI had provided the plaintiffs with two dedicated virtual machines for researching alleged instances of copyright infringement. According to the letter, on Nov. 14, "programs and search result data stored on one of the dedicated virtual machines was erased by OpenAI engineers." The Times has accused OpenAI, and Microsoft, which uses OpenAI's models for its Bing AI chatbot, of copyright infringement by training models on paywalled and unauthorized content. The lawsuit detailed multiple instances of "near-verbatim" copying in ChatGPT responses. OpenAI has disputed this claim, saying its models were trained on publicly available data and that such training is therefore fair use under copyright law. The case hinges on the Times being able to prove that OpenAI's models copied and used its content without compensation or credit. OpenAI was able to recover most of the erased data, but the "folder structure and file names" of the work were unrecoverable, rendering the data unusable. Now, the plaintiffs' counsel must start their evidence gathering from scratch. In the letter, counsel affirmed that there's "no reason to believe [the erasure] was intentional," but also pointed out that "OpenAI is in the best position to search its own datasets." The AI company has avoided sharing any detail about its training data. Other similar copyright claims have been filed against OpenAI, but a lawsuit from Raw Story and AlterNet was recently dismissed because the plaintiffs could not prove enough harm to support their claims. 
Meanwhile, OpenAI has struck licensing deals with several media companies, to use their work for training and providing ChatGPT responses with citations. Recently, Adweek reported that OpenAI is paying publishing giant Dotdash Meredith at least $16 million a year to license its content.
[6]
OpenAI reportedly deleted evidence in NY Times copyright lawsuit
Lawyers for The New York Times and Daily News claim that OpenAI inadvertently deleted crucial data related to their copyright lawsuit against the company regarding unauthorized use of their content, according to a TechCrunch report. The incident occurred after OpenAI agreed to provide access to its training datasets to aid the plaintiffs in verifying the usage of their copyrighted materials. The lawsuit alleges that OpenAI scraped articles from The New York Times and Daily News without obtaining permission to train its models. In response to the suit, OpenAI provided two virtual machines for the publishers' attorneys to search its training data for their copyrighted content. Since November 1, the legal teams have dedicated more than 150 hours to this search. However, on November 14, OpenAI engineers mistakenly erased all search data stored on one of the virtual machines, as noted in a filing made in the U.S. District Court for the Southern District of New York. OpenAI's attempts to recover the deleted data were mostly successful, but the loss of the folder structure and file names rendered the recovered data unusable for tracking where the plaintiffs' articles were included in the AI's training. The letter filed by the plaintiffs' counsel emphasized that they had to reconstruct their work, consuming extensive resources and time. Despite the deletion of data, the counsel clarified that there is no indication the incident was intentional. They argued that OpenAI is ideally positioned to search its own datasets, suggesting an obligation to assist in the investigation of potential copyright infringement. OpenAI contends that using publicly available data for training its models falls under "fair use." The company maintains that it does not need to license or pay for this content, even as it profits from its AI products. 
Nonetheless, OpenAI has entered into licensing agreements with several publishers, including prominent names like the Associated Press and Financial Times. While the specific terms of these deals remain undisclosed, it is reported that Dotdash, one of the partners, receives at least $16 million annually. The potential implications of this case and others like it could reshape the landscape of content usage and licensing for AI training. OpenAI's approach to using news articles for model training without explicit permission raises questions about copyright law's applicability in the age of artificial intelligence. Investigations into the circumstances of the data deletion are ongoing, highlighting the complexities of the situation. OpenAI has yet to issue a statement addressing the incident or its implications for its relationship with the plaintiffs.
[7]
New York Times Says OpenAI Erased Potential Lawsuit Evidence
Lawsuits are never exactly a lovefest, but the copyright fight between The New York Times and both OpenAI and Microsoft is getting especially contentious. This week, the Times alleged that OpenAI's engineers inadvertently erased data the paper's team spent more than 150 hours extracting as potential evidence. OpenAI was able to recover much of the data, but the Times' legal team says it's still missing the original file names and folder structure. According to a declaration filed to the court Wednesday by Jennifer B. Maisel, a lawyer for the newspaper, this means the information "cannot be used to determine where the news plaintiffs' copied articles" may have been incorporated into OpenAI's artificial intelligence models. "We disagree with the characterizations made and will file our response soon," OpenAI spokesperson Jason Deutrom told WIRED in a statement. The New York Times declined to comment. The Times filed its copyright lawsuit against OpenAI and Microsoft last year, alleging that the companies had illegally used its articles to train artificial intelligence tools like ChatGPT. The case is one of many ongoing legal battles between AI companies and publishers, including a similar lawsuit filed by the Daily News being handled by some of the same lawyers. The Times' case is currently in discovery, which means both sides are turning over requested documents and information that could become evidence. As part of the process, OpenAI was required by the court to show the Times its training data, which is a big deal -- OpenAI has never publicly revealed exactly what information was used to build its AI models. To disclose it, OpenAI created what the court is calling a "sandbox" of two "virtual machines" that the Times' lawyers could sift through. In her declaration, Maisel said that OpenAI engineers had "erased" data organized by the Times' team on one of these machines. 
According to Maisel's filing, OpenAI acknowledged that the information had been deleted, and attempted to address the issue shortly after it was alerted to it earlier this month. But when the paper's lawyers looked at the "restored" data, it was too disorganized, forcing them "to recreate their work from scratch using significant person-hours and computer processing time," several other Times lawyers said in a letter filed to the judge the same day as Maisel's declaration. The lawyers noted that they had "no reason to believe" that the deletion was "intentional." In emails submitted as an exhibit along with Maisel's letter, OpenAI counsel Tom Gorman referred to the data erasure as a "glitch."
[8]
OpenAI Accused of Evidence Tampering, Accidentally Erased Key Evidence in New York Times Lawsuit
Lawyers say there was no evidence the erasure was done on purpose.

The legal battle between The New York Times and OpenAI has taken a new twist after the publisher accused the defendant of accidentally destroying evidence. Specifically, the Times said OpenAI engineers deleted information it had collected while reviewing ChatGPT training data in a court-ordered "sandbox" environment.

OpenAI vs. The New York Times

For the discovery phase of OpenAI vs. The New York Times, the court ordered OpenAI to create a virtual test environment where the plaintiffs can search through its training data for instances of copyrighted material. The sandbox was a compromise designed to let the Times' lawyers identify when copyrighted articles had been used without OpenAI having to hand over entire training datasets. Given the size of the dataset and the vast number of articles that may be included in it (including Times articles published elsewhere), lawyers for the plaintiffs said they spent over 150 hours combing through the training data in the first two weeks of November -- work they have had to repeat. "On November 14, all of News Plaintiffs' programs and search result data stored on one of the dedicated virtual machines was erased by OpenAI engineers." Although OpenAI managed to recover some data, the letter said the folder structure and file names were "irretrievably lost," a major setback for the investigation. The lawyers further allege that OpenAI has been uncooperative in performing requested searches or providing timely updates on progress. While they insisted they "have no reason to believe [the erasure] was intentional," the Times' legal team is clearly frustrated. The incident further underscores concerns about OpenAI's lack of transparency and raises questions about its handling of evidence.

Non-Cooperation Could Hurt OpenAI's Defense

Whether OpenAI intended to delete files or not, tampering with evidence is never a good look for a defendant. 
Courts often scrutinize patterns of non-cooperation. If a judge perceives OpenAI's actions as obstructive, sanctions could follow, potentially setting a precedent for how AI companies handle copyright challenges in training data. As a landmark case in the emerging field of AI copyright law, the legal battle between OpenAI and the New York Times could have important consequences for other ongoing litigation. Precedents established now could set the tone for years to come.
[9]
OpenAI "Accidentally" Deleted Evidence From Its New York Times Lawsuit
OpenAI made a major oopsie when its engineers accidentally deleted a bunch of evidence sought by the New York Times in its copyright lawsuit against the AI firm and its benefactor Microsoft. In a letter to the judge presiding over the suit, lawyers for the NYT and the New York Daily News said that a ton of evidentiary files went missing while the attorneys were perusing them. Earlier this fall, the firm gave the publishers' attorneys access to two massive caches of training data so they could search for the newspapers' articles -- articles OpenAI maintains it was fair to use to train AI models because they were publicly published. Since the beginning of November, the newspapers' attorneys had spent more than 150 hours sifting through those caches -- until, in the middle of the month, OpenAI engineers erased all of the search data in one of the caches. While the company managed to recover most of the data itself, the folder structure and file names were "irretrievably" lost, meaning that the newspapers' attorneys can't use them "to determine where the news plaintiffs' copied articles were used to build [OpenAI's] models." As a result, the lawyers for the NYT and the NYDN had to completely retrace their steps, losing a week of work in the process. Now, the attorneys are asking the judge to make OpenAI do the legwork caused by the apparent error because, as they put it, "OpenAI is in the best position to search its own datasets." While the attorneys noted that they have "no reason to believe" the erasure was intentional, the deletion has nevertheless set them back as they build their case against the firm. "The [newspapers] have also provided the information that OpenAI needs to run those searches," the letter reads. "All that is needed is for OpenAI to commit to doing so in a timely manner." Despite that seemingly reasonable request, it seems that OpenAI may be planning a rebuttal. 
"We disagree with the characterizations made," an OpenAI spokesperson told Wired, "and will file our response soon."
[10]
OpenAI Accidentally Deletes ChatGPT Training Data Amid Publisher Copyright Claims, Sparking Concerns Over Evidence Retention In Legal Cases
OpenAI has been in a bit of controversy with the press, as The New York Times and the Daily News have sued the AI giant and its investors, claiming that ChatGPT was trained using their copyrighted content. The research data that the publishers' lawyers had compiled from searches of OpenAI's training sets was deleted by OpenAI engineers, supposedly by accident. The deletion potentially destroyed the evidence The New York Times' lawyers had acquired against OpenAI. Tech giants are not shy about using copyrighted material to train different AI models with different sets of data. We have previously covered how AI companies not only used textual data but also YouTube videos, including MKBHD videos, to train their AI models. OpenAI previously agreed to open its AI platform to The New York Times and Daily News so they could search for their own copyrighted material in the AI training sets. The publishers' experts had spent a hefty amount of time since early November combing through the data that OpenAI used to train ChatGPT. While the evidence could have supported the publishers' claims, OpenAI accidentally erased the relevant search data. Kyle Wiggers from TechCrunch states: Earlier this fall, OpenAI agreed to provide two virtual machines so that counsel for The Times and Daily News could perform searches for their copyrighted content in its AI training sets...In a letter, attorneys for the publishers say that they and experts they hired have spent over 150 hours since November 1 searching OpenAI's training data. But on November 14, OpenAI engineers erased all the publishers' search data stored on one of the virtual machines, according to the aforementioned letter, which was filed in the U.S. District Court for the Southern District of New York late Wednesday. 
To put it simply, OpenAI is accused of deleting the evidence and research compiled by the experts working for The New York Times. You can check out the letter published online for more details. OpenAI was able to retrieve the deleted data, but only in a format that cannot be used in legal proceedings, making it unsuitable as evidence of copyright infringement. It remains to be seen how the publishers will respond to the mishap, whether any additional measures in the pipeline could allow them to proceed with their claims, and how the legal teams pursue their case against OpenAI and possibly other tech giants over copyrighted material. We will keep you posted on the latest updates on the story, so be sure to stick around.
[11]
OpenAI accidentally erases potential evidence in training data lawsuit
In a stunning misstep, OpenAI engineers accidentally erased critical evidence gathered by The New York Times and other major newspapers in their lawsuit over AI training data, according to a court filing Wednesday. The newspapers' legal teams had spent over 150 hours searching through OpenAI's AI training data to find instances where their news articles were included, the filing claims. But it doesn't explain how this mistake occurred or what precisely the data included. While the filing says OpenAI admitted to the error and tried to recover the data, what it managed to salvage was incomplete and unreliable -- so what was recovered cannot help properly trace how the news organizations' articles were used in building OpenAI's AI models. While OpenAI's lawyers characterized the data erasure as a "glitch," The New York Times' attorneys noted they had "no reason to believe" it was intentional.
[12]
OpenAI accidentally erased ChatGPT training findings as lawyers seek copyright violations - 9to5Mac
The New York Times and Daily News have sued OpenAI and its investor Microsoft over suspicions that ChatGPT was trained on their copyrighted works. Now, it turns out, the lawyers' research into the training data was erased last week by OpenAI engineers, presumably by accident. Kyle Wiggers writes for TechCrunch: Earlier this fall, OpenAI agreed to provide two virtual machines so that counsel for The Times and Daily News could perform searches for their copyrighted content in its AI training sets...In a letter, attorneys for the publishers say that they and experts they hired have spent over 150 hours since November 1 searching OpenAI's training data. But on November 14, OpenAI engineers erased all the publishers' search data stored on one of the virtual machines, according to the aforementioned letter, which was filed in the U.S. District Court for the Southern District of New York late Wednesday. The aforementioned letter has been published online for all to read. It seems that after NY Times lawyers spent significant time compiling data from ChatGPT's training set, their research was erased by OpenAI. The letter states that OpenAI was later able to recover much of the data -- but only in a form that makes it unusable in legal proceedings. Thus, it can't be deployed against OpenAI in the case, and the expensive and time-consuming work begins anew. The training data used by various AI companies unfortunately remains shrouded in a lot of vagueness. Not every publisher has the resources to pursue legal action against tech giants, but to then have your work accidentally deleted by OpenAI engineers? It's a bad look, to say the least. What do you make of this story? Let us know in the comments.
[13]
The New York Times says OpenAI deleted evidence in its copyright lawsuit
Most of it has been recovered, but key parts showing the AI's pattern of plagiarism are still missing. Astrophysicist Stephen Hawking told Last Week Tonight's John Oliver a chilling but memorable hypothetical story a decade ago about the potential dangers of AI. The gist is a group of scientists builds a superintelligent computer and asks it, "Is there a God?" The computer answers, "There is now," and a bolt of lightning zaps the plug, preventing it from being shut down. Let's hope that's not what happened with OpenAI and some missing evidence from the New York Times' plagiarism lawsuit. Wired reported that a court declaration filed by the New York Times on Wednesday says that OpenAI's engineers accidentally erased evidence of the AI's training data that took a long time to research and compile. OpenAI recovered some of the data, but "the original file names and folder structure" that show when the AI copied its articles into its models are still missing. OpenAI spokesperson Jason Deutrom disagreed with the NYT's claims and said the company "will file our response soon." The Times has been battling Microsoft and OpenAI over alleged copyright infringement with its AI models since December of last year. The lawsuit is still in its discovery phase, in which evidence is requested and delivered by both sides to build their cases for trial. OpenAI had to turn over its training data to the Times but hasn't publicly revealed the exact information it used to build the AI models. Instead, OpenAI created a "sandbox" of two virtual machines so the NYT's legal team could conduct its research. The NYT's legal team spent more than 150 hours sifting through the data on one of the machines before the data was deleted. OpenAI acknowledged the deletion, but the company's legal team called it a "glitch." Although OpenAI engineers tried to correct the mistake, the restored data was missing the NYT's work. This led the NYT to essentially recreate everything from scratch.
The NYT's lawyers said they had no reason to believe the deletion was intentional.
OpenAI faces challenges in a copyright lawsuit as it accidentally erases crucial data during the discovery process, leading to delays and complications in the legal battle with The New York Times and Daily News.
In a significant development in the ongoing copyright lawsuit against OpenAI, the artificial intelligence company has accidentally deleted potential evidence, causing delays and complications in the legal proceedings. The New York Times and Daily News, plaintiffs in the case, have reported that OpenAI engineers inadvertently erased crucial data during the discovery process 1.
On November 14, 2024, OpenAI engineers erased programs and search result data stored on one of the dedicated virtual machines provided for the plaintiffs' counsel to perform searches for copyrighted content in OpenAI's training datasets 2. While OpenAI managed to recover much of the data, the folder structure and file names were irretrievably lost, rendering the recovered data unusable for determining where the plaintiffs' copied articles were used in building OpenAI's models 3.
The incident has led to conflicting accounts from both parties. The plaintiffs' counsel claims that over 150 hours of work has been lost, necessitating the recreation of their work from scratch 4. OpenAI, however, denies deleting any evidence and attributes the issue to a system misconfiguration requested by the plaintiffs, which led to technical problems 2.
At the core of this legal battle is OpenAI's assertion that training AI models using publicly available data, including articles from The New York Times and Daily News, constitutes fair use 5. The company maintains that it is not required to license or pay for the examples used in training its models, even if it profits from them 3.
This case highlights the complex intersection of AI technology and copyright law. As AI companies continue to develop large language models trained on vast amounts of data, questions about fair use, compensation, and the rights of content creators remain at the forefront of legal and ethical discussions in the tech industry 5.
The accidental deletion of data has underscored the challenges in conducting discovery in such technologically complex cases. The plaintiffs argue that this incident demonstrates that OpenAI is best positioned to search its own datasets for potentially infringing content 4. As the legal proceedings continue, the outcome of this case could have significant implications for the future of AI development and copyright law in the digital age.
A federal judge has dismissed a copyright lawsuit against OpenAI, filed by news outlets Raw Story and AlterNet, citing lack of evidence of harm. The case centered on OpenAI's use of news articles for AI training without consent.
10 Sources
Suchir Balaji, a former OpenAI employee, speaks out against the company's data scraping practices, claiming they violate copyright law and pose a threat to the internet ecosystem.
6 Sources
OpenAI, the company behind ChatGPT, has responded to copyright infringement lawsuits filed by authors, denying allegations and asserting fair use. The case highlights the ongoing debate surrounding AI and intellectual property rights.
3 Sources
Major Canadian news organizations have filed a lawsuit against OpenAI, claiming copyright infringement and seeking billions in damages for the unauthorized use of their content in training AI models like ChatGPT.
22 Sources
OpenAI refutes claims of using Indian media content to train ChatGPT in a copyright lawsuit, stating it has no obligation to partner with media outlets for publicly available content. The case, initiated by ANI, now involves major Indian media groups.
7 Sources
© 2025 TheOutpost.AI All rights reserved