8 Sources
[1]
Go, Claude! Twitch Fans Cheer on an AI Playing Pokémon Red Surprisingly Well
A livestream of an AI model playing Pokémon Red on Twitch is captivating audiences this week. The model is Anthropic's latest release, Claude 3.7 Sonnet, which is navigating the classic Game Boy game with no prior training. "HE'S DOING IT," says one onlooker in the live chat. "Let's see what happens now," another adds. "GO, CLAUDE, GO!" Although the livestream page claims the experiment is "a passion project made by a person who loves Claude and loves Pokémon," it was actually set up by Claude's creator, Anthropic. The idea to unleash Claude on Pokémon Red began internally at Anthropic in 2024, with an earlier model called Claude 3.5 Sonnet. The project "gained a cult following within the company," David Hershey, Anthropic technical staff member, tells PCMag. "The livestream on Twitch was a natural extension of that internal enthusiasm...Our team quickly created the ongoing livestream so anyone could watch Claude attempt to catch 'em all." Claude 3.7 is getting further in the game than its predecessor Claude 3.5. While Claude 3.5 could catch Pokémon and leave the starting area of Pallet Town, the "real breakthrough" with Claude 3.7 Sonnet is that it can complete challenges, collecting three badges from Pokémon gym leaders, Hershey says. Video game progress is a lot easier to understand than the typical AI improvement metrics that OpenAI, Grok, Google, and all AI companies release with each new model. That's why Anthropic included its new model's gaming chops in the 3.7 Sonnet announcement. "We're slowly moving away from traditional benchmarks in favor of more 'accessible' tests that can be understood by a larger group of people," says Dianne Penn, lead product manager of research at Anthropic. "We're at a point where standard evaluations don't tell the full story of how much more capable each version of these models are." Measuring the nuances of AI model improvement is a difficult task.
This week, OpenAI admitted it struggled to measure the improvements of its latest model, GPT-4.5, and had to develop its own testing scale for "vibes," or humanlike behavior. When playing Pokémon Red, Claude can perform actions with the main game buttons (A, B, Up, Down, Left, Right, Start, Select) and navigate to specific coordinates on the screen. It takes screenshots and processes the images to understand its surroundings. As it plays, it updates its knowledge base with new information and keeps building upon it. It's not perfect, and sometimes gets confused by the navigation and where it is. It's not always successful, either, but human onlookers are finding its solutions to challenges creative. In that sense, it's providing a fresh perspective on how to beat the game that humans may not have thought of, along with some good internet fun.
[2]
Watching One of the World's Most Advanced AIs Try to Beat Pokémon Red Is Strangely Fascinating
To prepare to take over the real world, AI models are first conquering virtual ones. On Tuesday, Anthropic kicked off its Twitch livestream titled "Claude Plays Pokémon." Without human intervention, the Google-backed startup's latest AI model, Claude 3.7 Sonnet, explores the world of Pokémon Red, doing its best to beat Nintendo's classic RPG for the Game Boy, released in the halcyon days of 1998. And it's not doing too badly, either. So far, Claude 3.7 has managed to clinch three Gym Leader badges, most recently besting Lt. Surge at the Vermilion City Gym. That's considerably better than Claude 3.5, which had stalled at Pallet Town, the game's starting area. Endearingly, Claude 3.7 even gives nicknames to its roster of battling creatures, christening its choice of starter Pokémon, Squirtle, as "Shell." In the case of Pokémon, the game's turn-based combat, not to mention simple dialog options, make an ideal testing ground for the LLM's newly-boasted "reasoning" capabilities. There's a limited number of options available to the player, making the challenge approachable. Viewers of the livestream can witness Claude's real-time thought process in a window next to the gameplay, providing some amusing insights. "It appears a wild Pokémon encounter has started when I moved!" reads the AI's ersatz stream of consciousness. "Let me press 'a' to advance through this unusual dialogue... and prepare for battle. I'll lead with SPIKE who is at full health." That said, the AI's thought process as it navigates the game's open world portions can be painstakingly circuitous. TechCrunch notes an instance where Claude was stumped by a rock wall that it kept trying to walk through, taking forever to realize that it could simply path around the minor obstacle. According to Anthropic, Claude mainly sees the world by analyzing a constant stream of screenshots of the game -- though often erroneously, the startup admits. 
It also can read the game's memory, gleaning information like the player's coordinates. And in what is the biggest upgrade from its predecessor, Claude 3.7 keeps an ever-changing "knowledge base" to store notes about its gameplay as it goes along, like where things are, or what sequence to press buttons in to execute certain game mechanics. Actually controlling the game, meanwhile, is accomplished by a custom interface that lets Claude press virtual buttons, Anthropic said, along with a pathfinding tool that helps the model determine how to move from location to location. Clunkiness and laborious pace notwithstanding, watching the AI model stumble around and occasionally succeed can be an oddly fascinating spectacle. If nothing else, it's a nostalgic trip down memory lane -- or an excuse to keep some good old Pokémon music in the background.
[3]
Anthropic's Claude AI is playing Pokémon on Twitch... slowly | TechCrunch
On Tuesday afternoon, Anthropic launched Claude Plays Pokémon on Twitch, a live stream of Anthropic's newest AI model, Claude 3.7 Sonnet, playing a game of Pokémon Red. It's become a fascinating experiment of sorts, showcasing the capabilities of today's AI tech and people's reactions to them. AI researchers have used all sorts of video games, from Street Fighter to Pictionary, to test new models -- often more for amusement than utility. But Anthropic said that Pokémon proved to be a useful benchmark for Claude 3.7 Sonnet, which can effectively "think" through the sorts of puzzles the game contains. Like OpenAI's o3-mini and DeepSeek's R1, Claude 3.7 Sonnet can "reason" its way through tough challenges, like playing a video game designed for children. While the model's non-reasoning predecessor, Claude 3.5 Sonnet, failed the very beginning of Pokémon Red -- exiting the player's home in Pallet Town -- Claude 3.7 Sonnet managed to win three gym leader badges. The newest Claude still runs into trouble, though. Hours into the Twitch stream, the model was deterred by a rock wall, which it couldn't walk through no matter how hard it tried. One Twitch user summed up the situation this way: "who would win, a computer AI with thousands of hours put into programming it, or 1 rock wall?" Eventually, Claude realized that it could navigate around the wall. On the one hand, it's frustrating to watch Claude traverse Pokémon Red with the speed of a Slowpoke, reasoning through each and every step with excruciating contemplation. Yet it's also oddly compelling. The left of the stream shows Claude's "thought process," while the right shows real-time gameplay. At one point, Claude attempted to locate Professor Oak inside his laboratory, but got confused, because there were other NPCs in the scene. "I notice a new character has appeared below me -- a character with black hair and what appears to be a white coat at coordinates (2, 10)," Claude wrote. "This might be Professor Oak! 
Let me go down and talk to him." Claude then proceeded to mistakenly talk to an NPC other than the Professor -- an NPC the model had spoken with several times before. Some of the thousand-odd people in the Twitch chat started to get antsy. Others, particularly those who'd been watching the stream for more than a few minutes, were less worried. "Guys chill," one person wrote in the chat. "Before we exited and entered Oak's lab like 10 times before understanding how to move on." For longtime Twitch users, the format of Anthropic's stream might feel nostalgic. Over a decade ago, millions of people tried to play Pokémon Red at once in a first-of-its-kind online social experiment called Twitch Plays Pokémon. Each user could control the player character via Twitch chat, resulting in predictably chaotic gameplay. Some AI researchers have cited Twitch Plays Pokémon as an inspiration for their work. In October 2023, Seattle-based software engineer Peter Whidden published a YouTube video detailing how he trained a reinforcement learning algorithm to play Pokémon. His AI spent over 50,000 hours playing the game before it learned to successfully navigate it. One challenge was that the AI preferred to admire the pixelated scenery instead of actually playing the game. AI-powered "reenactments" of Twitch Plays Pokémon like Whidden's and Anthropic's are entertaining, but a little bittersweet at the same time. The original stream was such a pivotal moment in Twitch history because it brought people together in an unexpected way. Everyone was on the same team, working toward the goal of getting the player character to stop running in circles and actually progress through the game. In 2025, it seems we're no longer teammates, but spectators, watching an AI model try to play a game many of us got the hang of when we were five years old. It's an AI-motivated microcosm of a larger trend: our experiences online are moving from shared, communal activities to more solitary ones.
[4]
I Can't Stop Watching This AI Chatbot Play Pokémon
Have you ever wondered if an AI chatbot can play Pokémon? The latest version of Anthropic's Claude is currently playing Pokémon live on Twitch, and I have to admit, it's more engrossing than I expected. Claude Plays Pokémon Live on Twitch Anthropic launched the latest version of its Claude AI chatbot on 24 February 2025, upgrading it to version 3.7. It's the first Claude version to feature a "hybrid reasoning model," enabling it to solve complex problems, outperforming its previous models significantly. It's a massive step forward from the first version of Claude 3, launched back in March 2024. After launching the update, Anthropic posted on X that its research team had been progressively pushing Claude to learn how to play Pokémon. Starting with the previous version, Claude 3.5, it was a slow start. Claude even asked to reset the game at one point. But now, with the launch of Claude 3.7, the AI chatbot is in full flow and attempting to complete Pokémon Red live on Twitch, on the Claude Plays Pokémon channel. Claude Plays Pokémon Is a Surprisingly Interesting Watch Claude's new-found ability to play Pokémon is interesting for a few reasons. You're watching an advanced AI tool reason its way through a game many of us have known and played over the years. It may be progressing through the game at a glacial, Slowpoke-like speed, but seeing the scrolling window of Claude's thought process makes it intriguing. At the time of writing, Claude has been stuck in Mt. Moon for over 18.5 hours, struggling to find its way to Cerulean City and frequently running out of active Pokémon. I also like that Claude updates its Context Window periodically, compounding its knowledge of the game. Claude is learning about the game and its own capabilities live, becoming more accustomed to the processes and quirks of the game. At one point, I spotted Claude's reasoning explaining to itself why the opposing Pokémon had received an extra turn (after swapping out its own Pokémon).
At the start of each battle, it takes a moment to assess its chances of winning, considering its Pokémon health and the opposing Pokémon -- then selects Run if it knows it can't win. And if you look closely, Claude has given each of its Pokémon cute names: Its Pikachu is named Bolt, its Wartortle is named Shell, and its Spearow is named Swift. It's a Nice Little Nostalgia Hit, Too Over 11 years ago, the gaming world was transfixed by Twitch Plays Pokémon, the social experiment in which millions of users spammed inputs in the hope of completing Pokémon Red. It took more than 16 days, but eventually, the combined efforts pushed through to see Twitch beat Blue (the champion of Pokémon Red). I have no idea how long Claude will end up playing Pokémon Red, but it's worth checking out every now and then to see how it's faring up.
[5]
An AI is trying to 'teach' itself Pokemon Red after its previous versions weren't smart enough to solve it, and it's very slowly working its way through the Kanto gym leaders
The frankly terrifying rise of AI would be enough to convince anyone that a robot takeover is imminent, but one large language model's attempts to play through Pokemon Red have reassured me that it might still be a ways off yet. Over on Twitch, Claude Plays Pokemon (Claude being the name of the AI, which was developed by Anthropic) has been wandering through Pokemon Red for the last 22 hours, and the end is nowhere in sight. With a badge in hand, a Squirtle named Shell, a Nidoran named Spike, a Pikachu called Bolt, and a Spearow dubbed Swift, Claude has only recently made its way out of Pewter City after spending a bit too long repeatedly walking into the fence at the side of Brock's gym. According to the channel description on Twitch, this latest version of Claude (3.7 Sonnet, to be exact) previously got past Lt. Surge's gym, which is quite a feat when you consider that even humans tend to struggle with his frustrating trash can puzzle. It's progress that only this version of Claude has managed to make, which makes sense as developer Anthropic's prior attempts at the same project found that "previous models wandered aimlessly or got stuck in loops." This seems to be a different run which obviously hasn't got that far as Lt. Surge yet, but it's clearly got the capability to do so, despite the fact that Claude "has no special training for Pokemon - it's using its general reasoning abilities to navigate the world." As for how it all works, Claude tries to work out what's happening by analyzing screenshots of the game - it uses "a pathfinding tool" to spot paths, helping it find its bearings. On top of that, it also "maintains a dynamic set of notes" of information from everything about the mechanics of the game to the Pokemon themselves, and has access to some parts of the game's memory to check things like the status of its party. But, as you've probably gathered by this point, progress is rather slow. 
You can actually watch the AI's reasoning for each decision it makes to see how it processes the information, including its, uh, feelings? "This is concerning!" it declares as a Caterpie knocks Bolt the Pikachu to 1HP. Despite accumulating notes, the AI sometimes appears to forget things, including what it's just done - this was seen in full force when it was trying to navigate away from Pewter City gym, but kept looping its actions and getting stuck by the fence. To watch it without the AI's thought process running, the whole thing feels very similar to Twitch Plays Pokemon, which is rather funny when you consider that, at times, there were over 121,000 people simultaneously trying to control the same game (with many deliberately trying to halt progress, at that). They say two brains are better than one, but can thousands be better than one AI model? I'd like to think so, and hey, if Claude takes longer than 16 days to see this through, I'd say the collective hive mind of Twitch circa 2014 wins.
[6]
Claude AI Can Now Play Pokémon -- And It's Winning - Decrypt
It turns out robot lawnmowers and ChatGPT are not the only ones that can play video games. Anthropic said on Tuesday that Claude's latest version, 3.7 Sonnet, can play the classic video game Pokémon. In a thread posted to X, Anthropic said an early version of Claude 3.7 Sonnet could defeat opponents within hours of playing Pokémon. "The results were striking. Within hours, Claude defeated Brock. Days later, it trounced Misty. Progress that older models had little hope of achieving," Anthropic wrote. "Turns out extended thinking is super effective." According to Anthropic, Claude 3.7 Sonnet keeps notes in its knowledge base, observes the screen, and employs function calls to click buttons and navigate the game. In addition to screenshots, Anthropic linked to a Twitch channel called "ClaudePlaysPokemon" showing Claude playing the game. What made defeating the Pokémon opponents possible, Anthropic said, was Claude 3.7 Sonnet's ability to plan its next moves and adapt its strategies, where previous models like Claude 3.5 Sonnet would wander or get stuck in a loop. "With a few tools to help it see the screen a bit better, Claude acts as an agent, applying its abilities to a novel task," Anthropic wrote. "In this, we start to see glimmers of AI systems that tackle challenges with increasing competence, not just through training but with generalized reasoning." Claude 3.7 Sonnet is the latest AI model to play video games successfully. Last March, researchers used ChatGPT to play classic first-person shooter Doom, managing to get to the last room in the game once. That same month, Google DeepMind launched its Scalable Instructable Multiworld Agent (SIMA). This generalist AI, capable of performing various tasks such as text generation, image analysis, and translation, was trained to play video games such as No Man's Sky, Teardown, and Valheim. "Our AI agent doesn't need access to a game's source code, nor bespoke APIs," Google DeepMind wrote. 
"It requires just two inputs: the images on screen and simple, natural-language instructions provided by the user."
[7]
New Anthropic AI Model Plays Pokemon Red
United States-based Artificial Intelligence startup Anthropic has done something interesting while training its newest Large Language Model (LLM), Claude 3.7 Sonnet. In its blog post dated February 24, the company stated that it had tested Claude with playing the game Pokémon Red, in an attempt to map what it refers to as 'extended thinking'. Anthropic equipped the model with basic memory, screen pixel input, and function calls to press buttons and navigate around the screen prior to the test, the AI startup noted. The company showcased the gameplay results in a chart displaying comparative performances of previous models and 3.7 Sonnet. While Claude 3.0 Sonnet, a previous model, was unsuccessful in completing the initial level and finishing a task which requires the player to leave a house, the latest model notably completed 11 levels, and even participated in battles with, and defeated, three Pokémon Gym Leaders (more difficult fights requiring a higher level of skillfulness). The experiment tallied the model's in-game advancement against the number of tasks it carried out to complete all levels. As per the chart, Claude 3.7 Sonnet took 35,000 actions to reach and complete the last level. The gameplay highlighted the 'extended thinking mode' characteristic that Claude 3.7 Sonnet comes enabled with, according to the report, which described it as a feature that users will be able to toggle on and off - one which essentially provides the model the option to allow itself more time, and put in more effort, to come up with an answer. Interestingly, Anthropic stated that the previous versions of Claude Sonnet, against which 3.7's performance while playing the game was tallied, did not have access to the feature, which can potentially raise questions on the efficacy of the experiment.
"When Claude 3.7 Sonnet is using its extended thinking capability, it could be described as benefiting from "serial test-time compute". That is, it uses multiple, sequential reasoning steps before producing its final output, adding more computational resources as it goes", the AI startup further explained. Anthropic further added that developers could also set a "thinking budget" to control precisely how long a period Claude spends on a problem. Despite the purportedly promising results, what might attract developers' and the general users' attention to test out Claude 3.7 Sonnet's responsiveness and accuracy is Anthropic's integration of quick answers and reasoning within a single model, something the company calls 'Hybrid Reasoning'. This would be a departure from a so far widespread strategy of developing two separate models for use case variance, and can also help users by allowing them to "control speed and cost by choosing when to use reasoning capabilities."
[8]
Anthropic's Newest AI Wants to Be a Pokémon Master. Here's Why That's a Big Deal
David Hershey, a member of Anthropic's technical team, tells Inc. that staffers were inspired by a YouTube video in which an original reinforcement learning model was trained to play Pokémon, so they created a virtual environment in which Claude could attempt to play the game. Eventually, around June 2024, Hershey (a self-proclaimed Pokémon fan) took up the idea as a side project, first using it to test the capabilities of Claude 3.5 Sonnet, the new model at the time. He found that while earlier versions of Claude would immediately get stuck, Claude 3.5 could progress further, successfully catching a Pokémon and leaving the starting area of Pallet Town. For the uninitiated, the goal of Pokémon Red is to catch adorable creatures, train them by battling against non-playable characters, and win badges from powerful enemies called Gym Leaders. Pokemon is, of course, wildly popular, and is considered the highest-grossing media franchise of all time.
Anthropic's latest AI model, Claude 3.7 Sonnet, is playing Pokémon Red on Twitch, showcasing AI capabilities and captivating audiences with its slow but determined progress through the classic game.
Anthropic, the AI research company, has launched an intriguing experiment that has captured the attention of both AI enthusiasts and gaming fans alike. Their latest AI model, Claude 3.7 Sonnet, is currently playing the classic Game Boy game Pokémon Red live on Twitch, showcasing the capabilities of advanced AI in navigating a complex virtual world [1].
Claude 3.7 Sonnet has made significant strides in the game, surpassing its predecessor Claude 3.5. While the earlier version struggled to leave the starting area of Pallet Town, Claude 3.7 has managed to collect three gym badges, including defeating Lt. Surge at the Vermilion City Gym [2]. This progress demonstrates a marked improvement in the AI's ability to understand and navigate the game world.
The AI has shown some endearing behaviors, such as nicknaming its Pokémon. For instance, it named its starter Squirtle "Shell," adding a touch of personality to its virtual adventure [2].
Claude interacts with the game through a custom interface that allows it to press virtual buttons corresponding to the Game Boy controls. The AI processes the game environment by analyzing screenshots and accessing certain game memory data, such as the player's coordinates [3].
One of the key features of Claude 3.7 is its ability to maintain a dynamic knowledge base. As it plays, it updates its understanding of the game mechanics, locations, and strategies, allowing it to build upon its experiences [1].
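The interface described here (function calls that press buttons, plus a running set of notes) can be pictured as a small dispatcher. The sketch below is illustrative only and is not Anthropic's harness: `FakeEmulator`, `Agent`, `handle_call`, and the tool names `press_buttons` and `update_knowledge_base` are all hypothetical names made up for this example, and the emulator stand-in tracks nothing but a position.

```python
from dataclasses import dataclass, field

# The eight Game Boy inputs named in the coverage above.
BUTTONS = {"a", "b", "up", "down", "left", "right", "start", "select"}

# How each direction button changes (x, y); y grows downward, as on a screen.
DIRECTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


@dataclass
class FakeEmulator:
    """Hypothetical stand-in for a real emulator; it only tracks a position."""
    x: int = 0
    y: int = 0
    pressed: list = field(default_factory=list)

    def press(self, button: str) -> None:
        self.pressed.append(button)
        dx, dy = DIRECTIONS.get(button, (0, 0))
        self.x += dx
        self.y += dy


@dataclass
class Agent:
    """Routes model-issued function calls to the emulator or the notes store."""
    emulator: FakeEmulator
    knowledge_base: dict = field(default_factory=dict)

    def handle_call(self, name: str, args: dict) -> str:
        """Run one tool call and return a text result for the model to read."""
        if name == "press_buttons":
            unknown = [b for b in args["buttons"] if b not in BUTTONS]
            if unknown:
                # Reject the whole call so no partial input reaches the game.
                return f"error: unknown buttons {unknown}"
            for button in args["buttons"]:
                self.emulator.press(button)
            return f"position is now ({self.emulator.x}, {self.emulator.y})"
        if name == "update_knowledge_base":
            self.knowledge_base[args["key"]] = args["note"]
            return f"saved note under {args['key']!r}"
        return f"error: unknown tool {name!r}"


agent = Agent(FakeEmulator())
print(agent.handle_call("press_buttons", {"buttons": ["right", "right", "down", "a"]}))
print(agent.handle_call("update_knowledge_base",
                        {"key": "pewter_gym", "note": "fence blocks the side exit"}))
```

Returning a text result from every call, including errors, mirrors the way a language model consumes tool output: the next reasoning step can read what happened and adjust, and the notes persist outside the conversation so they can be carried forward.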
Despite its progress, Claude's gameplay is not without challenges. The AI often moves at a glacial pace, carefully reasoning through each action. It can get stuck in loops or confused by simple obstacles, such as repeatedly trying to walk through a rock wall before realizing it needs to go around [2].
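Routing around an obstacle like that rock wall is a classic grid-search problem. The sources mention a pathfinding tool but do not say how it works, so the following is only a sketch under assumptions: it uses breadth-first search (my choice, not a confirmed detail), and the function `find_path`, the `#` wall marker, and the tiny map are hypothetical.

```python
from collections import deque

# Button name -> (dx, dy) step; y grows downward, as on a screen.
MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


def find_path(grid, start, goal):
    """Breadth-first search over walkable tiles.

    grid is a list of equal-length strings where '#' marks a wall.
    start and goal are (x, y) tuples. Returns the shortest list of
    button presses from start to goal, or None if goal is unreachable.
    """
    queue = deque([start])
    came_from = {start: None}  # tile -> (previous tile, button used to get here)
    while queue:
        current = queue.popleft()
        if current == goal:
            break
        x, y = current
        for button, (dx, dy) in MOVES.items():
            nx, ny = x + dx, y + dy
            in_bounds = 0 <= ny < len(grid) and 0 <= nx < len(grid[ny])
            if in_bounds and grid[ny][nx] != "#" and (nx, ny) not in came_from:
                came_from[(nx, ny)] = (current, button)
                queue.append((nx, ny))
    if goal not in came_from:
        return None
    # Walk backwards from the goal, collecting the buttons, then reverse.
    path = []
    node = goal
    while came_from[node] is not None:
        node, button = came_from[node]
        path.append(button)
    path.reverse()
    return path


# A wall tile sits directly between the player (left) and the target (right).
GRID = [
    "...",
    ".#.",
    "...",
]
print(find_path(GRID, start=(0, 1), goal=(2, 1)))
```

Walking straight right is blocked here, so the search returns a four-step detour instead of repeatedly pressing into the wall. Breadth-first search is enough for small tile maps with uniform step cost; a larger map might justify A* instead.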
These moments of confusion provide insight into the AI's decision-making process and highlight the complexities involved in teaching an AI to navigate a game designed for human players [4].
The Twitch stream has garnered significant attention, with thousands of viewers tuning in to watch Claude's progress. The stream displays Claude's "thought process" alongside the gameplay, offering a unique glimpse into the AI's reasoning [3].
Viewers have reacted with a mix of amusement, frustration, and fascination. Some cheer on the AI's successes, while others express impatience with its slower moments. The experiment has sparked discussions about AI capabilities and limitations in gaming contexts [5].
Anthropic sees this experiment as more than just entertainment. It represents a shift towards more accessible and understandable benchmarks for AI capabilities. Traditional metrics often fail to capture the nuanced improvements in AI models, whereas progress in a familiar game like Pokémon is easier for the general public to grasp [1].
This approach to showcasing AI abilities through gaming aligns with a broader trend in the field. Other companies, like OpenAI, have also been exploring new ways to measure and demonstrate the capabilities of their AI models [1].