Curated by THEOUTPOST
On Tue, 19 Nov, 8:03 AM UTC
7 Sources
[1]
HarperCollins strikes AI training deal with unnamed company amid rising copyright tensions between publishers and AI firms
Publishing giants and generative artificial intelligence companies are striking deals that aim both to protect copyright and to meet the rapidly increasing needs of the AI industry. US publishing giant HarperCollins has signed a contract with an unnamed tech company allowing it to use some of its books to train its generative AI models. In a letter seen by AFP, the tech company is proposing a payment of $2,500 per selected book to train its so-called large language model (LLM) for up to three years. AI models need massive quantities of text to learn everyday language use. "HarperCollins has reached an agreement with an artificial intelligence technology company to allow limited use of select nonfiction backlist titles for training AI models to improve model quality and performance," the publisher said in a statement. It said the agreement has "limited scope and clear guardrails around model output that respects author's rights." Authors "have the choice to opt in to the agreement or to pass on the opportunity", it added. The offer has had a mixed reception in the publishing world, with writers such as Daniel Kibblesmith curtly declining. "I'd probably do it for a billion dollars. I'd do it for a sum of money that wouldn't require me to work anymore, since that's the ultimate goal of this technology," the author posted on the Bluesky social network. HarperCollins is one of the largest publishers to reach such an accord, but not the first. US scientific publisher Wiley said it has allowed "access to previously published academic and professional book content for specific use in training LLM models" in a $23 million contract with an unidentified "large tech company". The accords underscore the tension surrounding AI models, which collect huge quantities of content on the web, creating the risk of widespread copyright violations. 
Giada Pistilli, head of ethics at Hugging Face, a French-American open-access AI platform, said these agreements are a step forward since they involve payments to publishers. But she regrets that they leave little room for the authors to negotiate. "What we are going to see is a mechanism of bilateral agreements between new technology companies and publishers or copyright holders, whereas in my opinion, we need a broader conversation that includes stakeholders a little more," she said. Julien Chouraqui, legal director at the French publishing union (SNE), said the accords represented "progress". "An agreement means that there has been a dialogue and a desire to achieve a balance between the use of source data, which are subject to copyright and which will generate value," he said. The press is also organising to face the challenges created by AI. In late 2023, The New York Times sued OpenAI, creator of ChatGPT, as well as Microsoft, its main investor, for violating copyright protections. Other media groups have cut deals with OpenAI. Tech companies may have no choice but to pay out to improve their products, especially as they are starting to run out of new materials to power their models. "On the web, you find lots of licit and illicit stuff, and lots of pirated copy. That not only causes legal problems but also raises issues about the quality of the data," said Chouraqui at the SNE. "If we are committed to developing a market on a virtuous basis, we must involve all the players," he said.
[2]
To maintain growth, AI firms seek accords with publishing giants
Publishing giants and generative artificial intelligence companies are striking deals that aim both to protect copyright and to meet the rapidly increasing needs of the AI industry. US publishing giant HarperCollins has signed a contract with an unnamed tech company allowing it to use some of its books to train its generative AI models. In a letter seen by AFP, the tech company is proposing a payment of $2,500 per selected book to train its so-called large language model (LLM) for up to three years. AI models need massive quantities of text to learn everyday language use. "HarperCollins has reached an agreement with an artificial intelligence technology company to allow limited use of select nonfiction backlist titles for training AI models to improve model quality and performance," the publisher said in a statement. It said the agreement has "limited scope and clear guardrails around model output that respects author's rights." Authors "have the choice to opt in to the agreement or to pass on the opportunity", it added. The offer has had a mixed reception in the publishing world, with writers such as Daniel Kibblesmith curtly declining. "I'd probably do it for a billion dollars. I'd do it for a sum of money that wouldn't require me to work anymore, since that's the ultimate goal of this technology," the author posted on the Bluesky social network. HarperCollins is one of the largest publishers to reach such an accord, but not the first. US scientific publisher Wiley said it has allowed "access to previously published academic and professional book content for specific use in training LLM models" in a $23 million contract with an unidentified "large tech company". The accords underscore the tension surrounding AI models, which collect huge quantities of content on the web, creating the risk of widespread copyright violations. 
'A broader conversation'
Giada Pistilli, head of ethics at Hugging Face, a French-American open-access AI platform, said these agreements are a step forward since they involve payments to publishers. But she regrets that they leave little room for the authors to negotiate. "What we are going to see is a mechanism of bilateral agreements between new technology companies and publishers or copyright holders, whereas in my opinion, we need a broader conversation that includes stakeholders a little more," she said. Julien Chouraqui, legal director at the French publishing union (SNE), said the accords represented "progress". "An agreement means that there has been a dialogue and a desire to achieve a balance between the use of source data, which are subject to copyright and which will generate value," he said. The press is also organizing to face the challenges created by AI. In late 2023, The New York Times sued OpenAI, creator of ChatGPT, as well as Microsoft, its main investor, for violating copyright protections. Other media groups have cut deals with OpenAI. Tech companies may have no choice but to pay out to improve their products, especially as they are starting to run out of new materials to power their models. "On the web, you find lots of licit and illicit stuff, and lots of pirated copy. That not only causes legal problems but also raises issues about the quality of the data," said Chouraqui at the SNE. "If we are committed to developing a market on a virtuous basis, we must involve all the players," he said.
[3]
HarperCollins Is Asking Its Authors to Sell Books for A.I. Training
A new deal between the publisher and Microsoft will use nonfiction books to improve A.I. models.
Earlier this month, Daniel Kibblesmith received an emailed memo from HarperCollins, one of the world's largest publishing companies, offering $2,500 to license his 2017 children's book Santa's Husband over a three-year period. The catch? The title would be licensed to a tech company to help train an A.I. model. "Abominable," the author wrote of the offer in a post on the microblogging site Bluesky. With their troves of high-quality content, book publishers have emerged as an enticing target for A.I. companies in need of data to enhance the capabilities and knowledge of their A.I. systems. HarperCollins, a British-American publishing company and member of the "Big Five" publishing group, recently inked a partnership with Microsoft that will see some of its nonfiction books used to help the company train a new model, as reported by Bloomberg. In a statement to Observer, HarperCollins confirmed that it has "reached an agreement with an artificial intelligence technology company to allow limited use of select nonfiction backlist titles for training A.I. models." Microsoft (MSFT) declined requests for comment. HarperCollins noted that authors will be given the option to take or pass on the opportunity. "Part of our role is to present authors with opportunities for their consideration while simultaneously protecting the underlying value of their works and our shared revenue and royalty streams," the publisher said. 
"This agreement, with its limited scope and clear guardrails around model output that respects author's rights, does that." The deal's guardrails include limiting the output of A.I. models to no more than 5 percent of a book's text, according to a statement from the Authors Guild, the largest professional organization of writers in the U.S. HarperCollins' A.I. licensing partnership will result in a $5,000 fee per title split evenly between the publisher and the author, said the organization. Although the Authors Guild described this arrangement as giving "far too much to the publisher," it lauded the fact that HarperCollins will request individual permission from writers and described licensing as a way to "bring control over uses back to the authors and their partners." Alongside writers like George R.R. Martin, Jonathan Franzen and Jodi Picoult, the Authors Guild last year sued OpenAI for allegedly using their work to train models without permission. Various authors have also filed similar copyright lawsuits against the likes of Anthropic, Meta (META) and Microsoft for training A.I. models on datasets of pirated books.
Publishers' A.I. deals proliferate
These concerns haven't stopped publishers from striking lucrative deals with major tech companies. Academic publishers Wiley and Taylor & Francis earlier this year partnered with various A.I. developers to provide content for A.I. training, with Microsoft reportedly offering $10 million to the latter for access to its data. Oxford University Press has also said that it's working with A.I. companies, while MIT Press recently told 404 Media it has been approached with several A.I. training offers. As they run out of accessible high-quality data online, A.I. developers are increasingly seeking out new ways to get their hands on reliable and accurate content. 
News Corp, the parent company of HarperCollins, in May struck an agreement to provide stories from its news publications like the Wall Street Journal, Barron's and the New York Post to OpenAI, which has similar deals with a bevy of publications including the Atlantic, Vox Media, the Associated Press, the Financial Times and Time Magazine. Microsoft, too, has content licensing arrangements with the likes of Reuters, Hearst Magazines and Axel Springer. Microsoft's data access could soon expand significantly, as HarperCollins has already sent out requests to license books from thousands of writers, according to the Authors Guild. How many authors will actually opt in, however, is yet to be seen. In replies to his Bluesky post, Kibblesmith jokingly said he probably wouldn't take such a deal unless it was worth $1 billion. "I'd do it for an amount of money that wouldn't require me to work anymore, since that's the end goal of this technology," he wrote.
[4]
HarperCollins Inks AI Training Deal, But It Needs Authors to Opt-In
HarperCollins Publishers has struck a licensing deal with an AI tech company and is now asking its authors to opt in. As 404 Media reports, the AI company will pay HarperCollins for access to select "backlist titles," or books that have been out for at least a year. The content will be used to train its AI, but only if the authors agree to it. Terms of the HarperCollins deal were not disclosed, but one writer said he was offered $2,500 per book with a three-year license. "Abominable," author Daniel Kibblesmith wrote on Bluesky about the proposal, which sought permission to scrape his book Santa's Husband. A HarperCollins spokesperson told 404 Media that the offer is "part of our role ... to present authors with opportunities for their consideration while simultaneously protecting the underlying value of their works and our shared revenue and royalty streams. This agreement, with its limited scope and clear guardrails around model output that respects author's rights, does that." Last month, rival publishing house Penguin Random House took action to block firms from training AI systems on its content. It amended the copyright wording on all its titles worldwide and across all its imprints to read: "No part of this book may be used or reproduced in any manner to train artificial intelligence technologies or systems."
[5]
HarperCollins is asking authors to license their books for AI training
HarperCollins has agreed with an unnamed AI tech company to let the company use some nonfiction titles to train its models, 404 Media reports, but only if authors opt in to having their books used for training. Some authors are currently suing companies like OpenAI, accusing them of copyright infringement for training AI models on their works without permission. According to a statement HarperCollins gave to 404 Media, the agreement protects "the underlying value of their works and our shared revenue and royalty streams." Author Daniel Kibblesmith posted screenshots of an email showing that he would be paid $2,500 if he allowed one of his books to be licensed. Here is the full statement given to 404 Media: HarperCollins has reached an agreement with an artificial intelligence technology company to allow limited use of select nonfiction backlist titles for training AI models to improve model quality and performance. While we believe this deal is attractive, we respect the various views of our authors, and they have the choice to opt in to the agreement or to pass on the opportunity. HarperCollins has a long history of innovation and experimentation with new business models. Part of our role is to present authors with opportunities for their consideration while simultaneously protecting the underlying value of their works and our shared revenue and royalty streams. This agreement, with its limited scope and clear guardrails around model output that respects author's rights, does that. HarperCollins didn't immediately reply to a request for comment from The Verge.
[6]
HarperCollins to allow tech firms to use its books to train AI models
Some nonfiction backlist titles will be used to train artificial intelligence with authors' permission
Publisher HarperCollins will allow some of its titles to be used to train AI models, with the permission of authors. The company "has reached an agreement with an artificial intelligence technology company to allow limited use of select nonfiction backlist titles for training AI models to improve model quality and performance", it said in a statement shared with the Guardian. "While we believe this deal is attractive, we respect the various views of our authors, and they have the choice to opt in to the agreement or to pass on the opportunity," it added. The move comes after US children's author Daniel Kibblesmith revealed last week that he had been offered $2,500 (£2,000) for permission to use one of his books published by HarperCollins to train AI models. Kibblesmith published a series of screenshots of an email from the agency that represented his 2017 book Santa's Husband in a post on the social media site Bluesky. The email states that titles would be licensed for three years, "with certain protections concerning credit and limits of verbatim usage per AI response". According to the email, the terms are non-negotiable and have been "agreed to by several hundred authors". It also says that HarperCollins has been required to keep the "company's identify [sic] confidential", but that the agency has "good reason to believe it is a major and respected company". While HarperCollins says that the deal will involve nonfiction titles, the book Kibblesmith was approached about, Santa's Husband, is a children's fiction book. "HarperCollins has a long history of innovation and experimentation with new business models," the company said in its statement. "Part of our role is to present authors with opportunities for their consideration while simultaneously protecting the underlying value of their works and our shared revenue and royalty streams. 
This agreement, with its limited scope and clear guardrails around model output that respects authors' rights, does that." In a follow-up to the screenshots, Kibblesmith encouraged people to "direct any outrage toward the incredibly doable action of purchasing physical books by living authors from local bookstores." In April, it was announced that HarperCollins would partner with AI audio company ElevenLabs to produce audiobooks for its foreign language business that would not otherwise be created. At the time, the publisher said that while it would "continue to devote time and resources to voice actor-led productions", AI would be "leveraged as a complementary tool to enable a broader number of audiobooks for backlist series books in non-English markets".
[7]
HarperCollins Confirms It Has a Deal to Sell Authors' Work to AI Company
The Big Five publisher made a deal with an unnamed "artificial intelligence technology company" and is allowing authors to opt in if they want to join the agreement.
HarperCollins, one of the biggest publishers in the world, made a deal with an "artificial intelligence technology company" and is giving authors the option to opt in to the agreement or pass, 404 Media can confirm. A spokesperson for HarperCollins told 404 Media in a statement: "HarperCollins has reached an agreement with an artificial intelligence technology company to allow limited use of select nonfiction backlist titles for training AI models to improve model quality and performance. While we believe this deal is attractive, we respect the various views of our authors, and they have the choice to opt in to the agreement or to pass on the opportunity. "HarperCollins has a long history of innovation and experimentation with new business models. Part of our role is to present authors with opportunities for their consideration while simultaneously protecting the underlying value of their works and our shared revenue and royalty streams. This agreement, with its limited scope and clear guardrails around model output that respects author's rights, does that." On Friday, author Daniel Kibblesmith, who wrote the children's book Santa's Husband and published it with HarperCollins, posted screenshots on Bluesky of an email he received, seemingly from his agent, informing him that the agency was approached by the publisher about the AI deal. "Let me know what you think, positive or negative, and we can handle the rest of this for you," the screenshotted text in an email to Kibblesmith says. The screenshots show the agent telling Kibblesmith that HarperCollins was offering $2,500 (non-negotiable). 
"You are receiving this memo because we have been informed by HarperCollins that they would like permission to include your book in an overall deal that they are making with a large tech company to use a broad swath of nonfiction books for the purpose of providing content for the training of an AI language learning model," the screenshots say. "You are likely aware, as we all are, that there are controversies surrounding the use of copyrighted material in the training of AI models. Much of the controversy comes from the fact that many companies seem to be doing so without acknowledging or compensating the original creators. And of course there is concern that these AI models may one day make us all obsolete." "It seems like they think they're cooked, and they're chasing short money while they can. I disagree," Kibblesmith told the AV Club. "The fear of robots replacing authors is a false binary. I see it as the beginning of two diverging markets, readers who want to connect with other humans across time and space, or readers who are satisfied with a customized on-demand content pellet fed to them by the big computer so they never have to be challenged again."
HarperCollins has reached an agreement with an unnamed AI company to use select nonfiction books for AI model training, offering authors $2,500 per book. The deal highlights growing tensions between publishers, authors, and AI firms over copyright and compensation.
HarperCollins, one of the world's largest publishing companies, has entered into an agreement with an unnamed artificial intelligence technology company to allow the use of select nonfiction backlist titles for training AI models [1]. This move comes amid rising tensions between publishers, authors, and AI firms over copyright issues and the use of written content for AI training.
Under the terms of the agreement, the AI company is proposing a payment of $2,500 per selected book to train its large language model (LLM) for up to three years [2]. HarperCollins has emphasized that authors have the choice to opt in or pass on this opportunity, respecting the various views of its authors [1].
The publisher stated that the agreement has a "limited scope and clear guardrails around model output that respects author's rights" [3]. These guardrails include limiting the output of AI models to no more than 5% of a book's text, according to the Authors Guild [3].
The offer has received a mixed reception in the publishing world. Some authors, like Daniel Kibblesmith, have publicly declined the offer, describing it as "abominable" [4]. Kibblesmith jokingly stated he would only consider such a deal for a sum that would eliminate his need to work, highlighting the concerns many authors have about AI potentially replacing human writers [3].
HarperCollins is not the first publisher to reach such an accord. US scientific publisher Wiley has also allowed access to its academic and professional book content for AI training in a $23 million contract with an unidentified "large tech company" [2]. Academic publisher Taylor & Francis has also partnered with AI developers, and Oxford University Press has said it is working with AI companies [3].
The agreements underscore the ongoing tension surrounding AI models, which collect vast amounts of content from the web, raising concerns about potential copyright violations [2]. In response to these concerns, some authors and publishers have taken legal action. The New York Times, for instance, sued OpenAI and Microsoft in late 2023 for alleged copyright infringement [2].
Giada Pistilli, head of ethics at Hugging Face, views these agreements as a step forward since they involve payments to publishers. However, she expresses concern that they leave little room for authors to negotiate [2]. Julien Chouraqui, legal director at the French publishing union (SNE), sees the accords as progress, indicating a dialogue and desire to balance the use of copyrighted source data [2].
As AI companies face challenges in finding new, high-quality data to power their models, these deals may become increasingly common. The publishing industry is grappling with how to protect copyright while also potentially benefiting from the growing AI sector. The outcome of these early agreements and ongoing legal battles will likely shape the future relationship between the publishing world and AI technology [3][5].
© 2025 TheOutpost.AI All rights reserved