Curated by THEOUTPOST
On Thu, 1 Aug, 12:08 AM UTC
3 Sources
[1]
Reddit CEO wants Microsoft to pay for its content
Reddit's chief executive officer Steve Huffman didn't pull any punches with Microsoft in an interview with The Verge. He called out Microsoft and AI companies like Anthropic and Perplexity for not paying for the information they take from Reddit; some of them have already been blocked from the site. Reddit has deals in place with companies like Google and OpenAI to receive compensation for the use of its posts and information. Huffman says Microsoft, however, hasn't even come to the table to discuss its use of Reddit's content in its AI searches. "Without these agreements, we don't have any say or knowledge of how our data is displayed and what it's used for, which has put us in a position now of blocking folks who haven't been willing to come to terms with how we'd like our data to be used or not used," Huffman told The Verge's deputy editor Alex Heath. Huffman says if Microsoft and other AI search sites continue to use Reddit's information without proper compensation, they'll have to be blocked. He doesn't want to do that because it's "a real pain in the ass to block these companies." Reddit has started cracking down on search engines that scrape information from its various forums and communities. The website vowed to block unauthorized data scraping in June by updating its Robots Exclusion Protocol file (robots.txt), and it has already prevented Bing from accessing data from Reddit, a fact confirmed by Microsoft's head of search Jordi Ribas on X. Earlier this month, a source confirmed to Engadget's Will Shanklin that Microsoft's refusal to agree to Reddit's terms of service led to the blocking of Bing. A spokesperson from Reddit also said, "Anyone accessing Reddit content must abide by our policies, including those in place to protect redditors. We are selective about who we work with and trust with large-scale access to Reddit content."
[2]
Reddit CEO calls out Microsoft, others for scraping site without permission: report
Reddit (NYSE:RDDT) CEO Steve Huffman called out Microsoft (NASDAQ:MSFT), Anthropic and Perplexity for scraping his site's data without permission, in an interview with The Verge. Huffman said Microsoft, Anthropic and Perplexity have all used Reddit's data to train their artificial intelligence models, according to the report. "We've had Microsoft, Anthropic, and Perplexity act as though all of the content on the internet is free for them to use," Huffman said in the interview. "That's their real position." Other tech companies have established deals with Reddit before scraping data. Google (GOOG)(GOOGL) reached a deal with Reddit early this year that allowed access to its content to train the search giant's AI models. The contract was worth approximately $60M per year. OpenAI also entered into an agreement with Reddit in March, which allows ChatGPT to learn from Reddit content in real time. "Reddit has become one of the internet's largest open archives of authentic, relevant, and always up-to-date human conversations about anything and everything," said Huffman. Reddit was up 3.6% shortly before markets closed on Wednesday.
[3]
Reddit CEO stands by change that blocks most non-Google search engines
Reddit CEO Steve Huffman is standing by Reddit's decision to block companies from scraping the site without an AI agreement. Last week, 404 Media noticed that search engines that weren't Google were no longer listing recent Reddit posts in results. This was because Reddit updated its Robots Exclusion Protocol file (robots.txt) to block bots from scraping the site. The file reads: "Reddit believes in an open Internet, but not the misuse of public content." Since the news broke, OpenAI announced SearchGPT, which can show recent Reddit results. The change came a year after Reddit began its efforts to stop free scraping, which Huffman initially framed as an attempt to stop AI companies from making money off of Reddit content for free. This endeavor also led Reddit to begin charging for API access (the high pricing led to many third-party Reddit apps closing). In an interview with The Verge today, Huffman stood by the changes that led to Google temporarily being the only search engine able to show recent discussions from Reddit. Reddit and Google signed an AI training deal in February said to be worth $60 million a year. It's unclear how much Reddit's OpenAI deal is worth. Huffman said: "Without these agreements, we don't have any say or knowledge of how our data is displayed and what it's used for, which has put us in a position now of blocking folks who haven't been willing to come to terms with how we'd like our data to be used or not used." Per The Verge, Huffman claimed that Microsoft, Anthropic, and Perplexity haven't been negotiating. The three companies haven't commented on Huffman's interview. "[It's been] a real pain in the ass to block these companies," Huffman told The Verge. A person familiar with the matter previously told Ars that Microsoft has refused to enter an agreement that adheres to Reddit's data-privacy rules.
Speaking with The Verge, Huffman claimed Microsoft previously used data from Reddit for AI training and Bing result summaries but didn't tell Reddit. He also claimed that data from Reddit has "been sold through the Bing API to other search engines," per The Verge.
AI debate
A Microsoft spokesperson told me last week that "Microsoft respects the robots.txt standard and we honor the directions provided by websites that do not want content on their pages to be used with our generative AI models." But as The Verge pointed out, Jordi Ribas, corporate VP of search and AI at Microsoft, took to X on July 29 to emphasize how the changes to Reddit favor Google "impacting competition from Bing and Bing-powered engines." Huffman also reportedly made reference to a June CNBC interview where Mustafa Suleyman, CEO of Microsoft AI, said: "I think that with respect to content that is already on the open web, the social contract of that content since the '90s has been that it is fair use. Anyone can copy it, re-create with it, reproduce with it. That has been freeware, if you like. That's been the understanding." Suleyman added that his comment didn't refer to certain types of web content, like news organizations. "We've had Microsoft, Anthropic, and Perplexity act as though all of the content on the internet is free for them to use. That's their real position," Huffman said. Reddit hasn't disclosed how much money is needed for deals that would enable scraping from Microsoft, Perplexity, Anthropic, or smaller companies. Reddit spokesperson Tim Rathschmidt told Ars last week that Reddit has been speaking "with multiple search engines" and that Reddit's "open to working with partners big and small." It is likely that Reddit is targeting big AI deals, which it views as an important part of its business. Colin Hayhurst, CEO of search engine Mojeek, told Ars last week that Reddit didn't respond to his emails about Mojeek getting blocked until 404 Media's report came out.
Reddit's efforts to find new revenue streams as it attempts to become profitable for the first time have been riddled with hiccups, including a massive user protest in response to Reddit's API rule changes. The company is seeking to strike deals at a time when publishers, the music industry, and more are grappling with the legality of AI bots and looking to set precedent. Reddit's reliance on free, user-generated content brings further complications to the debate. Advance Publications, which owns Ars Technica parent Condé Nast, is the largest shareholder of Reddit.
Reddit CEO Steve Huffman calls out Microsoft and other companies for scraping the site's content without permission. He also stands by changes that block most non-Google search engines, a move that has sparked controversy in the tech industry.
Reddit CEO Steve Huffman has taken a firm stance against companies using Reddit's content without permission or compensation. In an interview with The Verge, Huffman specifically called out Microsoft, stating that the tech giant should pay for access to Reddit's content [1]. The move comes as Reddit, now a publicly traded company, seeks to monetize its vast store of user-generated content.
Huffman's primary concern is that Microsoft, along with Anthropic and Perplexity, has trained AI models on Reddit's data without authorization. He emphasized that while Reddit is open to partnerships, companies must obtain permission and compensate the platform for using its content [2]. This stance reflects a growing trend in the tech industry where content creators and platforms are becoming increasingly protective of their data.
In a controversial move, Reddit updated its robots.txt file in a way that effectively blocks most non-Google search engines from accessing its content [3]. Huffman defended the decision as necessary to prevent unauthorized scraping and protect the platform's content. As a result, Google was for a time the only search engine able to show recent Reddit discussions.
The decision to block non-Google search engines has sparked debate within the tech community. Critics argue that this move reduces competition and user options, potentially leading to a more monopolistic internet landscape. Supporters, however, see it as a necessary step to protect Reddit's intellectual property and ensure fair compensation for content creators.
As Reddit works toward profitability, the company is exploring various avenues to increase its revenue. By demanding payment for content usage and implementing stricter access controls, Reddit aims to capitalize on its vast repository of user-generated content. This strategy aligns with broader industry trends where social media platforms are seeking new ways to monetize their data and user base.
Huffman's statements highlight the ongoing debate surrounding the use of publicly available data for training AI models. As artificial intelligence and machine learning technologies continue to advance, questions about data ownership, fair use, and compensation are becoming increasingly complex. Reddit's stance could potentially influence how other platforms approach the issue of AI companies using their content for training purposes.
© 2025 TheOutpost.AI All rights reserved