3 Sources
[1]
Silicon Valley Builds Amazon and Gmail Copycats to Train A.I. Agents
Cade Metz has reported on artificial intelligence for more than 15 years.

This summer, lawyers at United Airlines noticed that someone had built an almost perfect replica of the company's website. This digital clone offered all the same buttons and menus for booking flights, hotels and rental cars. It included the same blue links for tracking frequent flier miles and browsing discount deals. It even used the United brand name and logo. So United's lawyers sent a formal takedown notice accusing the site of violating its copyrights.

Div Garg, whose tiny company built the replica site, promptly changed the site's name to "Fly Unified" and removed the United logo. He was not interested in stepping on United's copyrights. He and his company built their United.com replica as a training ground for artificial intelligence.

Mr. Garg's company, AGI, is among a number of Silicon Valley start-ups that have spent the past several months recreating popular websites so that A.I. systems can learn to navigate the internet and complete specific tasks on their own, like booking flights. If an A.I. system learns to use a replica of United.com, it can use the real site, too.

These new shadow sites are a significant part of the tech industry's efforts to transform today's chatbots into A.I. agents, which are systems designed to book travel, schedule meetings, build bar charts and complete other computing tasks. In the coming years, many companies believe, A.I. agents will become increasingly sophisticated and could replace some white-collar workers.

"We want to build training environments that capture entire jobs that people do," said Robert Farlow, whose start-up, Plato, is among those recreating popular websites and other software applications.

The new trend, fueled by Silicon Valley venture capital, shows just how far the tech industry will go in search of the enormous amounts of digital data needed to advance artificial intelligence.
First, Silicon Valley hoovered up text, sounds and images from across the internet. When many sites blocked these efforts, companies found new ways of getting their hands on other people's data. Now, they are recreating websites as a way of generating new data from scratch.

In recent months, backed by $10 million in funding from Menlo Ventures and other investors, Mr. Garg and his company have also cloned sites like Amazon, Airbnb and Gmail. With names like Omnizon, Staynb and Go Mail, these replicas provide a way for A.I. systems to learn skills through trial and error -- a technique that researchers call reinforcement learning. Rather than learning from data that shows how humans use websites, they learn from vast amounts of data they generate on their own.

Silicon Valley researchers also have the option of training A.I. systems on real websites. But in many cases, that is not possible. Sites like Amazon and Airbnb often bar online bots, particularly when bots repeat the same tasks over and over again -- a process that is fundamental to reinforcement learning.

"When you're doing training, you want to run thousands of A.I. agents at the same time, so that they can explore the website and visit its different pages and do all sorts of different things," Mr. Garg said. "If you do that on a real website, you will get blocked."

Today's A.I. systems are driven by what scientists call neural networks, which are mathematical systems that can identify patterns in text, images and sounds. But about nine months ago, companies like OpenAI used up just about all the English language text on the internet. So they are leaning more heavily on reinforcement learning.

This process, which can extend over weeks or months, began in areas like math and computer programming. By working through thousands of math problems, for instance, A.I. systems can learn which actions lead to the right answer and which do not.
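The trial-and-error process described above can be illustrated with a toy sketch. Nothing below comes from the companies in the article: the miniature "replica site" (a graph of pages and clickable actions), the reward for reaching a confirmation page, and the use of tabular Q-learning are all illustrative assumptions, chosen only to show how an agent can learn a task by repeated attempts in a simulated environment rather than from recordings of human behavior.

```python
import random

# Hypothetical "replica website" as a graph: each page offers actions
# (clicks) leading to other pages. Reaching "confirmation" -- a completed
# booking -- is the only rewarded outcome.
SITE = {
    "home":     {"search_flights": "results", "view_deals": "deals"},
    "deals":    {"back": "home"},
    "results":  {"select_flight": "checkout", "back": "home"},
    "checkout": {"pay": "confirmation", "back": "results"},
}
GOAL = "confirmation"

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning: the agent discovers which clicks lead to a
    completed task purely by trial and error on the simulated site."""
    rng = random.Random(seed)
    q = {(page, a): 0.0 for page, acts in SITE.items() for a in acts}
    for _ in range(episodes):
        page = "home"
        for _ in range(20):                      # cap episode length
            actions = list(SITE[page])
            if rng.random() < epsilon:           # explore a random click
                a = rng.choice(actions)
            else:                                # exploit current estimates
                a = max(actions, key=lambda x: q[(page, x)])
            nxt = SITE[page][a]
            reward = 1.0 if nxt == GOAL else 0.0
            future = 0.0 if nxt == GOAL else max(q[(nxt, b)] for b in SITE[nxt])
            # Standard Q-learning update toward reward + discounted future value
            q[(page, a)] += alpha * (reward + gamma * future - q[(page, a)])
            if nxt == GOAL:
                break
            page = nxt
    return q

def greedy_path(q):
    """Follow the learned policy from the home page to see what it does."""
    page, path = "home", ["home"]
    while page != GOAL and len(path) < 10:
        a = max(SITE[page], key=lambda x: q[(page, x)])
        page = SITE[page][a]
        path.append(page)
    return path
```

After training, the agent's greedy policy clicks straight through home, search results, and checkout to the confirmation page, and it values the booking flow more highly than the dead-end deals page. The point of the sketch is the one the article makes: this only works because the agent can hammer the simulated site thousands of times, something a real site would block.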
Now, companies like OpenAI, Google, Amazon and Anthropic are using the technique to build A.I. agents. They started by using recordings of people using real websites. By analyzing the way these hired hands used their mouse and keyboard to order lunch on DoorDash or type numbers into Microsoft Excel, the systems learned to use the sites on their own.

To make that work go faster, A.I. companies are paying little-known start-ups like AGI and Plato to build replica websites where bots can learn through extreme trial and error. "You want the A.I. to be able to experiment with all the possible ways of completing each task," said John Qian, whose start-up, Matrices, builds replica websites for A.I. training.

Most of this work happens behind the scenes. But in some cases, start-ups have posted their replica websites to the public internet as a way of advertising their work to the big A.I. companies like OpenAI, Google and Amazon.

After removing company names and logos from the replica sites built by his start-up, Mr. Garg said, he is not worried about further legal action from copyright holders like United Airlines. Mr. Qian said much the same, though he acknowledged that A.I. research had ventured into new legal territory that was not completely settled. Mr. Farlow declined to comment.

Robin Feldman, a professor at U.C. Law San Francisco and the author of the book "AI Versus IP," said that using these shadow sites to train A.I. technologies could violate the copyrights of companies like United Airlines. But the courts may eventually find, she added, that this is permitted under copyright law.

"These companies are shooting first and asking questions later," Ms. Feldman said. "The field is expanding much faster than the legal system can keep up with. Some of the decisions being made along the way end up biting the companies that have made those decisions."
Companies like OpenAI and Anthropic have already released experimental technologies that can shop on Instacart or take notes using online word processors like Google Docs. But these technologies frequently make mistakes. Sometimes, this prevents them from completing the requested task.

(The New York Times has sued OpenAI and Microsoft, claiming copyright infringement of news content related to A.I. systems. The two companies have denied the suit's claims.)

"There is a big gap between what companies want these agents to do and what they are capable of today," said Rayan Krishnan, chief executive of Vals AI, a company that tests the performance of the latest A.I. technologies. "Today, these systems are way too slow for them to be useful. You can just do the clicks yourself."

Experts disagree on how quickly this work will progress, whether consumers and businesses want or need this kind of automation and whether popular websites will even allow it to happen. Last month, Amazon sued a start-up called Perplexity over A.I. that aimed to automate shopping on the Amazon site.

But the goal is to build systems that automate almost any white-collar work. "If you can recreate all the software and websites that people use, you can train A.I. to do the jobs and start to do them even better than a human," Mr. Farlow said.
[2]
How AI Companies Are Simulating the Robot Takeover
One of the promises underwriting the AI boom is that software will be able to use computers on our behalf. At the low end, this looks like the "deep research" tools AI firms have already released that search the web and synthesize information on your behalf, attempting to automate the task of using Google. More theoretically, it might mean models trained to use productivity software in a work context, which may then be used to attempt to automate increasingly complicated jobs. In between, you've got the stuff that companies like Google and OpenAI keep touting in demos: AI that can book flights for you, AI that can get you a reservation, AI that can comparison-shop for you. As a category of "agentic" behavior, this sort of stuff is both ambitious and conceptually funny -- the great big intelligence in the sky is being deployed, first, to click-farm the entire economy.

So far, the results of such an approach have been mixed: Computer-use agents and self-clicking AI browsers make for great demos but struggle to offer obvious utility to most people. One might argue for a different path to broadly useful and flexible AI agents, one that doesn't take a long detour through web interfaces designed for human cursors and thumbs. Or, as many in the AI industry would have it, you might argue instead that this is an obvious first step and that these tools don't yet work well only because they haven't been adequately trained and because they don't have enough access to the types of sites they're intended to use.

That's how you get, as reported by Cade Metz at the New York Times, AI start-ups making training-ready replicas of websites, including Gmail, Airbnb, and United Airlines:

These new shadow sites are a significant part of the tech industry's efforts to transform today's chatbots into A.I. agents, which are systems designed to book travel, schedule meetings, build bar charts and complete other computing tasks. In the coming years, many companies believe, A.I. agents will become increasingly sophisticated and could replace some white-collar workers. "We want to build training environments that capture entire jobs that people do," said Robert Farlow, whose start-up, Plato, is among those recreating popular websites and other software applications.

The pitch for these companies is basically: We'll build replicas of popular websites so your AI can train on them and your agentic tools can attempt to use them with no limitations. Sending your half-baked AI agent out to try to book concert tickets will look, to the company selling them, a lot like spam, scraping, or sniping. Better, the thinking goes, to do most of this practice in simulation, where the airline might not even know you're planning to automate its ticketing interface.

There's a lot of similar stuff going on in AI, from "enterprise simulation," as Salesforce calls it -- in which companies can use "virtual environments that mirror the noise, accents, crosstalk, and complexity of real-world operations" to train, for example, customer-service agents -- to virtual environments where robotics companies can collect data from thousands of scenarios in gamelike environments that can then be used to train, for example, humanoid machines. Right now, on trade-show floors around the world, you've got companies touting robots that have been "trained in simulation," often using the same software from Nvidia.

As with the software agents training on airline and e-commerce sites, there's an interesting tension at the core of humanoid-robot efforts: Does it make sense to automate tasks, or groups of like tasks, or to try to automate the entire person who typically does those tasks? But the singularity-in-a-sandbox parallel is illuminating in another way, too. Setting aside the question of how well such broad simulations map to the real world, the demand for such a thing is obvious.
Running a bunch of tests with robot prototypes is prohibitively time-consuming and expensive, and doing the same thing in something akin to a video-game engine might get you to a productive feedback loop much more quickly. The reason you need a simulated Airbnb interface, though, is different: It's not that training a bot to use a website is particularly time- or resource-intensive. It's that Airbnb probably doesn't want you to build what you're building at all.

AI-training replicas are an early expression of a crucial question about the near future of AI agents, at least as companies like OpenAI have been talking about them: How will the internet, and the world in general, react to the presence of a bunch of freelance, mercenary machines? (Or, to put a finer point on it, to the replacement of human customers with bots transacting on their behalf?)

Companies like Plato and AGI exist because one of the more obvious ways for a company to train a computer-use agent, aside from monitoring users' screens or ingesting years of screen-captured videos -- having it try a whole bunch to see what works -- is interpreted as antagonistic or threatening by the companies whose interfaces they're training on. Amazon is already suing an automated-browser company for "unauthorized access and trespass" to its site, referring to a shopping agent as a sort of "intruder."

This is understandable but also revealing: To Amazon, customers are worth a lot more than the products they buy, particularly as audiences for advertising. Replica sites promise to give AI companies a means to get at least part of the way to a working product without interacting with companies like Amazon at all (although, as the Times notes, these companies are already getting sued too).
Much in the way that the appeal of using ChatGPT instead of conventional search rests partly on the fact that it just doesn't have many ads (yet), a chatbot shopping routine might be appealing because it lets you spend less time in Amazon's intentionally disorienting, ad-laden, hypermonetized interface.

AI companies don't spend a lot of time dwelling, at least publicly, on what might happen after their agents-in-progress get deployed en masse. This is partly because they don't know. But the most obvious paths forward are all pretty bumpy. At one extreme, you have a situation in which every other company in the world has a severe immune response to what you're doing and sees AI agents as a generalized attempt to take over their relationship with their customers. At another, they happily accept that AI companies can provide them with a bunch of business, albeit under different terms than they're used to, and try to work something out.

Anything short of the latter scenario leaves AI firms with a fight. They could compromise, working out partnerships with, say, airlines, which, among other things, might mean that ChatGPT doesn't have to pretend to be a person using their websites but which would let airlines keep more control over how their tickets are sold through chatbots. But the rise of replica sites suggests that AI companies don't particularly want to ask for permission or partnership and that they know their approaches will be treated with suspicion. (Also, they're all building chatbot booking and shopping interfaces of their own.) They'd rather force the issue -- see if these features work, see if users want them -- then approach would-be partners with more leverage.

The replicas are also a consequence of the AI industry's all-or-nothing ethos at the moment.
Gradually incorporating a bunch of commerce partners into popular new chat interfaces is a plausible and familiar business plan, but at this stage of AI narrative cycle -- and at this level of investment -- plausible and familiar won't cut it. Partnering with Airbnb and getting commission on bookings probably doesn't add up to a trillion-dollar start-up valuation. Standing between your captive users and the entirety of the online economy, on the other hand, might. So that's what they're going to try, first in simulation, then in real life.
[3]
Silicon Valley Builds Amazon and Gmail Copycats to Train AI Agents
Silicon Valley startups are cloning major websites to train AI agents through reinforcement learning. These replicas let AI practice tasks like booking flights without being blocked by real sites. Backed by venture capital, companies pursue vast synthetic data despite legal uncertainties, aiming to automate complex white-collar work.
Silicon Valley startups are building replica websites of Amazon, Gmail, and Airbnb to train AI agents through reinforcement learning. Backed by $10 million in venture capital, companies like AGI and Plato create shadow sites that let AI systems practice tasks like booking flights without being blocked by real websites, raising questions about copyright infringement and the future of automated work.
Silicon Valley startups are building replica websites to train AI systems, a development that reveals how far the tech industry will go to secure data for artificial intelligence advancement. When United Airlines lawyers discovered an almost perfect clone of their website this summer, complete with the same buttons, menus, and branding for booking flights and tracking frequent flier miles, they sent a copyright infringement takedown notice [1]. The replica was created by Div Garg's company AGI, which promptly rebranded the site as "Fly Unified" and removed the logo. But the purpose wasn't to deceive customers; it was to create training environments where AI agents could learn to navigate websites and complete computing tasks autonomously.

These shadow sites enable a technique called reinforcement learning, in which AI systems learn through trial and error rather than by mimicking human behavior. Backed by $10 million in funding from Menlo Ventures and other investors, AGI has cloned sites like Amazon, Airbnb, and Gmail, giving them names like Omnizon, Staynb, and Go Mail [1].

"When you're doing training, you want to run thousands of AI agents at the same time, so that they can explore the website and visit its different pages and do all sorts of different things," Garg explained. "If you do that on a real website, you will get blocked" [1]. Real sites like Amazon and Airbnb actively bar bots that repeat tasks over and over, making simulation the only viable path forward for companies seeking to develop sophisticated AI agents.

Companies like OpenAI, Google, Amazon, and Anthropic are using these techniques to build AI agents capable of booking travel, scheduling meetings, and creating bar charts, tasks that could eventually automate white-collar work [1]. The shift toward reinforcement learning accelerated about nine months ago, when companies like OpenAI exhausted the available English-language text on the internet [3]. Initially, AI companies trained systems by recording people using real websites, analyzing how hired workers used their mouse and keyboard on DoorDash or Microsoft Excel. Now they're paying little-known Silicon Valley startups like AGI, Plato, and Matrices to build replica websites where bots can experiment with all possible ways of completing each task through extreme trial and error [1].

Robert Farlow, whose startup Plato recreates popular websites and software applications, stated: "We want to build training environments that capture entire jobs that people do" [1][2]. This raises questions about how the internet will react to autonomous AI agents acting as mercenary machines on behalf of human users. The approach mirrors developments in robotics, where companies train humanoid machines in simulation using software from Nvidia before deploying them in real-world scenarios [2]. The existence of companies like Plato and AGI reflects a tension: one obvious way to train AI systems would be to have them practice on real websites, but many companies don't want these tools built at all. After removing company names and logos from replica sites, Garg said he's not worried about further legal action from copyright holders [1]. The venture capital-fueled trend nonetheless demonstrates the industry's determination to generate synthetic data from scratch, even as legal uncertainties around copyright remain unresolved.