Silicon Valley Startups Build Amazon and Gmail Replicas to Train AI Agents for Complex Tasks

4 Sources

Silicon Valley startups are creating replica websites of Amazon, Gmail, and United Airlines to train AI agents. These shadow sites allow AI systems to learn through reinforcement learning without getting blocked by real websites. Backed by venture capital, companies like AGI and Plato aim to build training environments that capture entire jobs, though the approach raises questions about copyright infringement and the future of automation.

Silicon Valley Startups Create Shadow Sites for AI Training

Silicon Valley startups are building replica websites of major platforms like Amazon, Gmail, and United Airlines to train AI agents capable of performing complex computing tasks. This summer, United Airlines lawyers discovered an almost perfect clone of the airline's website, complete with booking menus, frequent flyer tracking, and the company's branding [1]. After receiving a copyright infringement notice, Div Garg's company AGI promptly renamed the site "Fly Unified" and removed the logo, explaining that the replica had been built solely as a training ground for artificial intelligence [2].

These shadow sites represent a significant shift in how the tech industry develops AI agents: systems designed to book travel, schedule meetings, build bar charts, and automate other white-collar work. Backed by $10 million in funding from Menlo Ventures and other investors, AGI has cloned sites including Amazon, Airbnb, and Gmail, giving the replicas names like Omnizon, Staynb, and Go Mail [1]. Other startups, including Plato and Matrices, are pursuing similar strategies. "We want to build training environments that capture entire jobs that people do," said Robert Farlow of Plato [2].

Reinforcement Learning Drives the Need for Replica Websites

The replica websites enable AI systems to learn through reinforcement learning, a technique in which bots practice tasks through extreme trial and error over weeks or months. Unlike traditional AI training, which relies on recordings of humans using websites, reinforcement learning lets AI agents generate their own synthetic data by exploring different ways to complete tasks such as booking flights or managing email [4]. "When you're doing training, you want to run thousands of AI agents at the same time, so that they can explore the website and visit its different pages and do all sorts of different things," Garg explained. "If you do that on a real website, you will get blocked" [1].

Real websites like Amazon and Airbnb actively block online bots, especially those that repeat the same tasks over and over, which is exactly what reinforcement learning requires. That makes dedicated training environments essential for companies like OpenAI, Google, Amazon, and Anthropic, which are racing to build chatbots that can navigate web interfaces designed for human users [3]. John Qian, whose startup Matrices builds replica websites for AI training, noted, "You want the AI to be able to experiment with all the possible ways of completing each task" [2].

Data Scarcity Fuels New Training Approaches

The trend reflects Silicon Valley's relentless pursuit of digital data to advance artificial intelligence. About nine months ago, companies like OpenAI exhausted virtually all of the English-language text available on the internet, forcing them to lean more heavily on reinforcement learning and generate new data from scratch [1]. The technique was first applied to areas like math and computer programming, where AI systems could work through thousands of problems to identify correct approaches; it now extends to web navigation and automation tasks.

Venture capital is fueling this expansion, with startups receiving substantial funding to build training environments that mirror real-world operations. Some have posted their replica websites publicly to advertise their capabilities to major AI companies like OpenAI, Google, and Amazon [4]. The approach also raises a strategic question: does it make more sense to automate individual tasks or to replicate entire human roles? Many companies are betting that AI agents will eventually replace white-collar workers in various industries [3].

Legal and Practical Implications for the Future

Garg expressed confidence that removing company names and logos protects his startup from further legal action, but the practice ventures into uncertain legal territory regarding copyright and intellectual property [2]. The development also highlights a fundamental tension: how will companies react when AI agents trained on these replica websites begin interacting with real platforms? Websites may view automated booking systems and comparison-shopping bots as spam or scraping attempts, potentially leading to widespread blocking of AI agents [3]. Whether the internet and the economy can accommodate masses of freelance, mercenary machines transacting on behalf of humans remains an open question, even as Silicon Valley continues building the infrastructure to make such automation possible.

TheOutpost.ai

© 2026 Triveous Technologies Private Limited