2 Sources
[1]
Bright Data beat Elon Musk and Meta in court -- now its $100M AI platform is taking on Big Tech - VentureBeat
Bright Data, the Israeli web scraping company that defeated both Meta and Elon Musk's X in federal court, unveiled a comprehensive AI infrastructure suite Wednesday designed to give artificial intelligence systems unfettered access to real-time web data -- a capability the company argues Big Tech platforms are trying to monopolize. The announcement of Deep Lookup, Browser.ai, and enhanced data collection protocols represents a dramatic expansion for the decade-old company, which has transformed from a specialized web scraping service into what CEO Or Lenchner calls "a unique infrastructure layer for AI companies." The move comes as artificial intelligence companies increasingly struggle to access current web information needed to power chatbots, autonomous agents, and other AI applications. "The intelligence of today's LLMs is no longer its limiting factor; access is," Lenchner said in an exclusive interview with VentureBeat. "We've spent the last decade fighting for open access to public web data, and these new offerings bring us to the next chapter in our journey, one characterized by truly accessible data and the subsequent rise of contextually-aware agents." The launch follows Bright Data's high-profile legal victories in 2024, when federal judges dismissed lawsuits from both Meta and X alleging the company illegally scraped their platforms. Those rulings established crucial legal precedent defining what constitutes "public data" on the internet -- information that can be viewed without logging in and therefore can be legally collected and used.
Court wins against Meta and X establish legal precedent for web scraping rights
The court cases revealed that both Meta and X had been Bright Data customers even while suing the company, highlighting the contradictory stance many tech giants have taken toward web scraping. The rulings have broader implications for the AI industry, which relies heavily on web data to train and operate language models. "It was revealed in court that both of them were a Bright Data customer, because everyone needs data, everyone, especially those who are building models," Lenchner explained. "We are the only company that has the financial resources, and I would even say the courage to do that." Judge William Alsup, who presided over the X case, wrote that giving social media companies "free rein to decide, on any basis, who can collect and use data" risks creating "information monopolies that would disserve the public interest." The ruling established that data viewable without login credentials constitutes public information that can be legally scraped. Bright Data has now filed a countersuit against X, alleging the platform violated antitrust laws by trying to create a data monopoly to benefit Musk's AI company, xAI. "The only reason that X are trying to stop Bright Data from allowing its customers to scrape X is that they will be the only entity that can enjoy the relevant quality data that X produces," Lenchner said.
Deep Lookup and Browser.ai target AI companies struggling with data access
The company's new products address what Lenchner identifies as the three core requirements for AI systems: algorithms, compute power, and data access. While Bright Data doesn't develop AI algorithms or provide computing resources, it aims to become the definitive solution for the third requirement. Deep Lookup functions as a natural language research engine designed to answer complex, multi-layered business questions in real-time.
Unlike general-purpose search engines or AI chatbots that provide summaries, Deep Lookup specializes in comprehensive results for queries beginning with "find all." For example, users can ask for "all shipping companies that went through the Panama and Suez canals in 2023 whose Q3 revenues declined by over 2 percent." The system draws from Bright Data's massive web archive, which currently contains over 200 billion HTML pages and adds 15 billion monthly. By next year, the archive is expected to exceed 500 billion pages. "It's not just random web pages, it's actually what the world cares about, because our 20,000 customers represent billions of internet users," Lenchner noted. Browser.ai represents what the company calls "the industry's first unblockable, AI-native browser." Designed specifically for autonomous AI agents, the cloud-based service mimics human behavior to access websites without triggering bot detection systems. It supports natural language commands and can perform complex web interactions like booking flights or making restaurant reservations. The browser infrastructure already processes over 150 million web actions daily, according to the company. "Almost all of them are customers," Lenchner said of AI agent companies that have raised significant funding. "Because what we figured out, and they figured out, is that we solve that problem of entering a website without being blocked and executing web actions on the website." MCP Servers (Model Context Protocol) provides a low-latency control layer enabling AI agents to search, crawl, and extract live data in real-time. The protocol allows developers to build AI systems that can act on current information rather than relying solely on training data.
Patent portfolio and proxy network create competitive moat against blocking
Bright Data's competitive advantage stems from what Lenchner describes as an "obsession" with overcoming website blocking mechanisms.
The company holds over 5,500 patent claims on its technology and operates the world's largest proxy network with more than 150 million IP addresses across 195 countries. "We have such a good look into the internet," Lenchner explained. "For a long time now, we have been mapping the internet, and for a long time now, we're also archiving big chunks of the internet." The company's approach involves sophisticated techniques to mimic human behavior, using real devices, IP addresses, and browser fingerprints rather than simple automated scripts. This makes detection and blocking extremely difficult for websites. "The only way to block us, practically, is to put the data behind the login, then we won't even try," Lenchner said. "Sometimes there is a new blocking logic that we won't solve immediately. It will take our research team 12 hours, three days that's like the most it was, and we will unlock it."
Revenue surpasses $100 million as AI demand explodes post-ChatGPT
While Bright Data remains privately held by a private equity firm, Lenchner confirmed with VentureBeat the company's annual recurring revenue significantly exceeds $100 million. The business has experienced explosive growth since the launch of ChatGPT in late 2022, as AI companies scrambled to access training data and real-time information. "Starting March 2023, which is pretty much when GPT-3 changed the world, the AI, or what we call the data for AI, use case just absolutely exploded for us as a company," Lenchner said. "Everything else is also growing, because everyone needs more data, period. But this use case is just like nothing we've seen before." The company serves over 20,000 businesses, including Fortune 500 companies and major AI laboratories. Traditional customers include e-commerce platforms tracking competitor pricing, financial services firms seeking market intelligence, and enterprises conducting business research.
GDPR compliance and ethical practices differentiate from competitors
Bright Data has invested heavily in compliance infrastructure to address privacy concerns around data collection. The company follows European GDPR and California CCPA regulations, automatically notifying individuals when their personal information is collected from public sources and providing deletion options. "The regulation and the legislation are clear since the European GDPR and at least California and CCPA regulations came to play," Lenchner explained. "If we collected your email address, for example, we will automatically send you an email saying, 'Hey, this is who we are. We collected your personal information from the public domain. Here's a huge button you can click if you want to review it, and you can obviously ask to delete it.'" The company maintains a large compliance team and extensive documentation of its practices, which proved valuable during court proceedings. "We enterprises especially love us because we have our ethical stand that was scrutinized in US courts twice," Lenchner said.
Web access wars intensify as tech giants seek data monopolies
The battle over web data access reflects broader tensions in the AI industry about information control and competitive advantage. As AI systems become more sophisticated, access to current, comprehensive web data becomes increasingly valuable -- and contentious. Lenchner predicts the web will become "more closed" over time, similar to how Google maintains exclusive access to its web crawling capabilities while others must use alternative services. "A few tech giants are gonna get free access to every website with their agents," he said. "The rest will need to use our infrastructure or someone else's infrastructure." The company is also observing new trends, including businesses scraping AI chatbots for marketing purposes and the emergence of new protocols like MCP that enable AI agents to interact with web services more effectively. "All of these guys that are consuming massive amounts of data, and all of us are using them, it's all going towards building the brains of the robots," Lenchner said. "It's okay that you have a chatbot that is talking to a human, because that's eventually what a robot will do."
Robot brains and agent economy drive next phase of growth
Bright Data's transformation from web scraping service to AI infrastructure provider reflects the rapidly evolving needs of the artificial intelligence industry. As companies rush to deploy AI agents and autonomous systems, access to real-time web data becomes as crucial as computing power and algorithmic sophistication. The legal precedents established through Bright Data's court victories may prove as significant as its technical innovations, potentially shaping how the entire AI industry accesses and uses web information. With major tech platforms increasingly restricting data access while simultaneously developing their own AI systems, independent infrastructure providers like Bright Data may become essential for maintaining competitive balance in the AI ecosystem. "We're an infrastructure company," Lenchner emphasized. "We're very talented engineers that hardly go anywhere, just sit with our computers and write code. We're doing it well. We have no intentions to do anything else." The Deep Lookup beta launches Tuesday for business customers, with general public access available through a waitlist. Browser.ai and MCP Servers are already available to enterprise clients through Bright Data's existing platform.
[2]
Bright Data introduces new lineup of AI data tools - SiliconANGLE
Startup Bright Data Ltd today introduced a suite of software tools designed to help companies collect information from the public web. The first tool, Deep Lookup, lends itself to use cases such as helping salespeople find potential customers. The two other products in the suite are geared towards artificial intelligence developers. They make it easier to collect publicly-available data for AI training projects. "The intelligence of today's LLMs is no longer its limiting factor; access is," said Bright Data Chief Executive Officer Or Lenchner. Founded in 2014, Bright Data provides a web scraping platform that is used by over 20,000 organizations. Those customers spend more than $100 million per year on the software. The company says that it has helped users scan more than 200 billion webpages to date. The first component of Bright Data's new product lineup, Deep Lookup, is an AI-powered search engine. It allows workers to retrieve information about companies and other entities using natural language instructions. For example, an investor could ask Deep Lookup to create a list of AI startups that were founded in the past three years. The search engine organizes its output in a spreadsheet-like format. If necessary, users can enrich the data in the spreadsheet by entering additional instructions. Deep Lookup also provides citations that can be used to check the accuracy of the retrieved records. Bright Data's second new tool is called Browser.ai. It's a cloud-based browser that AI applications can use to interact with websites. A product recommendation engine, for example, could use the service to retrieve merchandise details from popular e-commerce platforms. Browser.ai is based on Chromium, the open-source browser engine that underpins Chrome. If an AI application's attempt to access a webpage fails, the service can automatically retry the request. There are also features for detecting and troubleshooting technical issues in webpage retrieval workflows. 
Rounding out Bright Data's new product suite is Bright Data MCP. It's an open-source tool that allows AI applications to access the company's software via an application programming interface. The tool can be used by chatbots, AI code editors such as OpenAI's Windsurf and other workloads. Bright Data MCP is based on an open-source technology called Model Context Protocol. Released by Anthropic PBC last year, it provides a standardized interface through which AI applications can interact with third-party systems. The software thereby removes the need for developers to build custom connectors.
Bright Data, fresh from legal victories against Meta and X, launches a comprehensive AI infrastructure suite including Deep Lookup, Browser.ai, and enhanced data collection protocols, aiming to democratize access to real-time web data for AI companies.
Bright Data, an Israeli web scraping company whose annual recurring revenue exceeds $100 million, has unveiled a comprehensive AI infrastructure suite, marking a significant expansion from its specialized web scraping services [1]. The launch follows the company's high-profile legal victories against tech giants Meta and X (formerly Twitter) in federal court, which established crucial legal precedents for web scraping rights [1].
In 2024, federal judges dismissed lawsuits from both Meta and X that alleged Bright Data had illegally scraped their platforms. These decisions have broader implications for the AI industry, which relies heavily on web data to train and operate language models [1]. Judge William Alsup, who presided over the X case, wrote that giving social media companies "free rein to decide, on any basis, who can collect and use data" risks creating "information monopolies that would disserve the public interest" [1].
Bright Data's new offering includes three key components:
Deep Lookup: An AI-powered search engine that allows users to retrieve complex, multi-layered business information using natural language queries. It draws from Bright Data's massive web archive, containing over 200 billion HTML pages [1][2].
Browser.ai: Described as "the industry's first unblockable, AI-native browser," this cloud-based service mimics human behavior to access websites without triggering bot detection systems. It supports natural language commands and can perform complex web interactions [1][2].
MCP Servers (Model Context Protocol): A low-latency control layer enabling AI agents to search, crawl, and extract live data in real-time [1][2].
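To make the MCP item concrete: Model Context Protocol messages are JSON-RPC 2.0 objects, and an agent invokes a tool exposed by a server with a `tools/call` request. The minimal sketch below only builds and prints such a request. The tool name `search_web` and its `query` argument are hypothetical placeholders, not names taken from Bright Data's MCP server, and nothing is sent over the network.

```python
import json
from itertools import count

# JSON-RPC requests carry an id so responses can be matched to them.
_request_ids = count(1)


def build_tool_call(tool_name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical tool name and argument, for illustration only.
request = build_tool_call("search_web", {"query": "container shipping rates"})
print(json.dumps(request, indent=2))
```

In practice a client would first ask the server which tools it offers (the `tools/list` method) and use the names and argument schemas it returns, rather than hard-coding them as this sketch does.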
Bright Data CEO Or Lenchner identifies three core requirements for AI systems: algorithms, compute power, and data access. The company aims to become the definitive solution for the third requirement [1]. Lenchner states, "The intelligence of today's LLMs is no longer its limiting factor; access is" [1][2].
Bright Data's competitive edge stems from its patent portfolio, with over 5,500 patent claims on its technology, and from a proxy network of more than 150 million IP addresses across 195 countries, which VentureBeat describes as the world's largest [1]. The company serves over 20,000 organizations [1][2], and its browser infrastructure processes over 150 million web actions daily [1].
The launch of Bright Data's AI infrastructure suite could broaden access to real-time web data for AI companies, challenging what the company describes as Big Tech's attempt to monopolize that data [1]. The move aligns with the growing need for AI systems to access current web information to power chatbots, autonomous agents, and other AI applications [1][2].
As Bright Data expands its offerings, the AI industry may see a shift in how data is accessed and used. The company's court victories and technical work could pave the way for more open access to public web data and, with it, more innovation and competition in the AI sector [1][2].