Silicon Valley Startups Clone Amazon and Gmail to Train AI Agents on Shadow Sites

Silicon Valley startups are building replica websites of Amazon, Gmail, and Airbnb to train AI agents through reinforcement learning. Companies like AGI, which has raised $10 million in venture capital, and Plato create shadow sites that let AI systems practice tasks like booking flights without being blocked by real websites, raising questions about copyright infringement and the future of automated work.

Silicon Valley Startups Build Shadow Sites for AI Training

Silicon Valley startups are building replica websites to train AI systems, a development that reveals how far the tech industry will go to secure data for artificial intelligence advancement. When United Airlines lawyers discovered an almost perfect clone of their website this summer, complete with the same buttons, menus, and branding for booking flights and tracking frequent flier miles, they sent a copyright infringement takedown notice [1]. The replica was created by Div Garg's company AGI, which promptly rebranded the site as "Fly Unified" and removed the logo. But the purpose wasn't to deceive customers; it was to create training environments where AI agents could learn to navigate websites and complete computing tasks autonomously.

Reinforcement Learning Drives Demand for Replica Websites

These shadow sites enable a technique called reinforcement learning, where AI systems learn through trial and error rather than by mimicking human behavior. Backed by $10 million in funding from Menlo Ventures and other investors, AGI has cloned sites like Amazon, Airbnb, and Gmail, giving them names like Omnizon, Staynb, and Go Mail [1].

Source: ET

"When you're doing training, you want to run thousands of AI agents at the same time, so that they can explore the website and visit its different pages and do all sorts of different things," Garg explained. "If you do that on a real website, you will get blocked" [1]. Real sites like Amazon and Airbnb actively bar bots that repeat tasks over and over, making simulation the only viable path forward for companies seeking to develop sophisticated AI agents.
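To make the trial-and-error idea concrete, here is a minimal sketch of reinforcement learning on a toy stand-in for a shadow site. Everything in it is an assumption made for illustration: the page names, the four click actions, the reward values, and the choice of tabular Q-learning are hypothetical and do not describe how AGI, Plato, or any other company actually trains its agents.

```python
import random

# Hypothetical toy "shadow site": pages are states, clicks are actions.
ACTIONS = ["search", "select", "pay", "back"]
TRANSITIONS = {
    ("home", "search"): "results",
    ("results", "select"): "checkout",
    ("results", "back"): "home",
    ("checkout", "pay"): "done",
    ("checkout", "back"): "results",
}


def step(state, action):
    """Return (next_state, reward); clicks that do nothing leave the page unchanged."""
    next_state = TRANSITIONS.get((state, action), state)
    # Assumed reward scheme: +1 for completing the booking, a small cost per click.
    reward = 1.0 if next_state == "done" else -0.01
    return next_state, reward


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning: the agent learns only from its own attempts."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in ("home", "results", "checkout") for a in ACTIONS}
    for _episode in range(episodes):
        state = "home"
        for _click in range(20):  # cap the number of clicks per attempt
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore: try a random click
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
            next_state, reward = step(state, action)
            future = 0.0 if next_state == "done" else max(
                q[(next_state, a)] for a in ACTIONS
            )
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = next_state
            if state == "done":
                break
    return q


def greedy_path(q, limit=10):
    """Follow the highest-valued click from the home page until the booking completes."""
    state, path = "home", []
    while state != "done" and len(path) < limit:
        action = max(ACTIONS, key=lambda a: q[(state, a)])
        path.append(action)
        state, _ = step(state, action)
    return path


if __name__ == "__main__":
    print(greedy_path(train()))
```

No human demonstration appears anywhere in the sketch: the agent is never shown the correct sequence of clicks and only receives a reward when a booking goes through. Because the site is simulated, the training loop can repeat the task hundreds of times, the kind of repetitive traffic that Garg says would get a bot blocked on a real website.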

Source: NYMag

Major AI Companies Pursue Autonomous Agents

Companies like OpenAI, Google, Amazon, and Anthropic are using these techniques to build AI agents capable of booking travel, scheduling meetings, and creating bar charts, tasks that could eventually automate white-collar work [1]. The shift toward reinforcement learning accelerated about nine months ago when companies like OpenAI exhausted available English language text on the internet [3]. Initially, AI companies trained systems by recording people using real websites, analyzing how hired workers used their mouse and keyboard on DoorDash or Microsoft Excel. Now they're paying little-known Silicon Valley startups like AGI, Plato, and Matrices to build replica websites where bots can experiment with all possible ways of completing each task through extreme trial and error [1].

Questions About Copyright Law and Future Implications

Robert Farlow, whose startup Plato recreates popular websites and software applications, stated: "We want to build training environments that capture entire jobs that people do" [1][2]. This raises questions about how the internet will react to autonomous AI agents acting as mercenary machines on behalf of human users. The approach mirrors developments in robotics, where companies train humanoid machines in simulation using software from Nvidia before deploying them in real-world scenarios [2]. The existence of companies like Plato and AGI reflects a tension: one obvious way to train AI systems would be to have them practice on real websites, but many companies don't want these tools built at all. After removing company names and logos from replica sites, Garg said he's not worried about further legal action from copyright holders [1]. However, the venture capital-fueled trend demonstrates the industry's determination to generate synthetic data from scratch, even as legal uncertainties around copyright law remain unresolved.

TheOutpost.ai

© 2025 Triveous Technologies Private Limited