12 Sources
[1]
DeepMind reveals Genie 3 "world model" that creates real-time interactive simulations
While no one has figured out how to make money from generative artificial intelligence, that hasn't stopped Google DeepMind from pushing the boundaries of what's possible with a big pile of inference. The capabilities (and costs) of these models have been on an impressive upward trajectory, a trend exemplified by the reveal of Genie 3. A mere seven months after showing off the Genie 2 "foundational world model," which was itself a significant improvement over its predecessor, Google now has Genie 3. With Genie 3, all it takes is a prompt or image to create an interactive world. Since the environment is continuously generated, it can be changed on the fly. You can add or change objects, alter weather conditions, or insert new characters -- DeepMind calls these "promptable events." The ability to create alterable 3D environments could make games more dynamic for players and offer developers new ways to prove out concepts and level designs. However, many in the gaming industry have expressed doubt that such tools would help. It's tempting to think of Genie 3 simply as a way to create games, but DeepMind sees this as a research tool, too. Games play a significant role in the development of artificial intelligence because they provide challenging, interactive environments with measurable progress. That's why DeepMind previously turned to games like Go and StarCraft to expand the bounds of AI. World models take that to the next level, generating an interactive world frame by frame. This provides an opportunity to refine how AI models -- including so-called "embodied agents" -- behave when they encounter real-world situations. One of the primary limitations as companies work toward the goal of artificial general intelligence (AGI) is the scarcity of reliable training data. After piping basically every webpage and video on the planet into AI models, researchers are turning toward synthetic data for many applications. 
DeepMind believes world models could be a key part of this effort, as they can be used to train AI agents with essentially unlimited interactive worlds. DeepMind says Genie 3 is an important advancement because it offers much higher visual fidelity than Genie 2, and it's truly real-time. Using keyboard input, it's possible to navigate the simulated world in 720p resolution at 24 frames per second. Perhaps even more importantly, Genie 3 can remember the world it creates.
[2]
DeepMind thinks its new Genie 3 world model presents a stepping stone towards AGI | TechCrunch
Google DeepMind has revealed Genie 3, its latest foundation world model that can be used to train general-purpose AI agents, a capability that the AI lab says makes for a crucial stepping stone on the path to "artificial general intelligence," or human-like intelligence. "Genie 3 is the first real-time interactive general purpose world model," Shlomi Fruchter, a research director at DeepMind, said during a press briefing. "It goes beyond narrow world models that existed before. It's not specific to any particular environment. It can generate both photo-realistic and imaginary worlds, and everything in between." Still in research preview and not publicly available, Genie 3 builds on both its predecessor Genie 2 (which can generate new environments for agents) and DeepMind's latest video generation model Veo 3 (which is said to have a deep understanding of physics). With a simple text prompt, Genie 3 can generate multiple minutes of interactive 3D environments at 720p resolution and 24 frames per second -- a significant jump from the 10 to 20 seconds Genie 2 could produce. The model also features "promptable world events," or the ability to use a prompt to change the generated world. Perhaps most importantly, Genie 3's simulations stay physically consistent over time because the model can remember what it previously generated -- a capability that DeepMind says its researchers didn't explicitly program into the model. Fruchter said that while Genie 3 has implications for educational experiences, gaming or prototyping creative concepts, its real unlock will manifest in training agents for general purpose tasks, which he said is essential to reaching AGI. "We think world models are key on the path to AGI, specifically for embodied agents, where simulating real world scenarios is particularly challenging," Jack Parker-Holder, a research scientist on DeepMind's open-endedness team, said during the briefing. Genie 3 is supposedly designed to solve that bottleneck.
Like Veo, it doesn't rely on a hard-coded physics engine; instead, DeepMind says, the model teaches itself how the world works - how objects move, fall, and interact - by remembering what it has generated and reasoning over long time horizons. "The model is auto-regressive, meaning it generates one frame at a time," Fruchter told TechCrunch in an interview. "It has to look back at what was generated before to decide what's going to happen next. That's a key part of the architecture." That memory, the company says, lends consistency to Genie 3's simulated worlds, which in turn allows it to develop a grasp of physics, similar to how humans understand that a glass teetering on the edge of a table is about to fall, or that they should duck to avoid a falling object. Notably, DeepMind says the model also has the potential to push AI agents to their limits -- forcing them to learn from their own experience, similar to how humans learn in the real world. As an example, DeepMind shared its test of Genie 3 with a recent version of its generalist Scalable Instructable Multiworld Agent (SIMA), instructing it to pursue a set of goals. In a warehouse setting, they asked the agent to perform tasks like "approach the bright green trash compactor" or "walk to the parked red forklift." "In all three cases, the SIMA agent is able to achieve the goal," Parker-Holder said. "It just receives the actions from the agent. So the agent takes the goal, sees the world simulated around it, and then takes the actions in the world. Genie 3 simulates forward, and the fact that it's able to achieve it is because Genie 3 remains consistent." That said, Genie 3 has its limitations. For example, while the researchers claim it can understand physics, the demo showing a skier barreling down a mountain didn't reflect how snow would move in relation to the skier. Additionally, the range of actions an agent can take is limited.
For example, the promptable world events allow for a wide range of environmental interventions, but they're not necessarily performed by the agent itself. And it's still difficult to accurately model complex interactions between multiple independent agents in a shared environment. Genie 3 can also only support a few minutes of continuous interaction, when hours would be necessary for proper training. Still, the model presents a compelling step forward in teaching agents to go beyond reacting to inputs, letting them potentially plan, explore, seek out uncertainty, and improve through trial and error - the kind of self-driven, embodied learning that many say is key to moving towards general intelligence. "We haven't really had a Move 37 moment for embodied agents yet, where they can actually take novel actions in the real world," Parker-Holder said, referring to the legendary moment in the 2016 game of Go between DeepMind's AI agent AlphaGo and world champion Lee Sedol, in which AlphaGo played an unconventional and brilliant move that became symbolic of AI's ability to discover new strategies beyond human understanding. "But now, we can potentially usher in a new era," he said.
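Fruchter's description of the auto-regressive architecture, where each new frame is generated by looking back at previously generated frames, can be illustrated with a toy loop. Everything below is a hypothetical sketch: Genie 3's implementation is not public, and the class, placeholder frame strings, and window size are invented stand-ins, with the window chosen to match the reported 24 fps and roughly one minute of visual memory.

```python
# Hypothetical sketch of an auto-regressive world-model loop: each frame
# is generated from a rolling window of past frames plus the latest
# control input. Genie 3's real architecture is not public; this only
# illustrates the frame-by-frame dependency Fruchter describes.

FPS = 24
MEMORY_SECONDS = 60            # "visual memory extending as far back as one minute"
WINDOW = FPS * MEMORY_SECONDS  # 1440 past frames available as context

class ToyWorldModel:
    def __init__(self):
        self.history = []      # previously generated frames (placeholder strings)

    def step(self, action):
        # Condition the next frame on the most recent WINDOW frames and the action.
        context = self.history[-WINDOW:]
        frame = f"frame{len(self.history)}<-({action}, ctx={len(context)})"
        self.history.append(frame)  # the new frame becomes context for later frames
        return frame

world = ToyWorldModel()
for act in ["forward", "forward", "turn_left"]:
    print(world.step(act))
```

Because every generated frame feeds back into the context for the next one, an error in any frame can propagate to all later frames, which is why sustaining a consistent auto-regressive simulation for minutes is harder than generating a fixed-length clip in one pass.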
[3]
DeepMind reveals Genie 3, a world model that could be the key to reaching AGI | TechCrunch
Google DeepMind has revealed Genie 3, its latest foundation world model that the AI lab says presents a crucial stepping stone on the path to artificial general intelligence, or human-like intelligence. "Genie 3 is the first real-time interactive general purpose world model," Shlomi Fruchter, a research director at DeepMind, said during a press briefing. "It goes beyond narrow world models that existed before. It's not specific to any particular environment. It can generate both photo-realistic and imaginary worlds, and everything in between." Genie 3, which is still in research preview and not publicly available, builds on both its predecessor Genie 2 - which can generate new environments for agents - and DeepMind's latest video generation model Veo 3 - which exhibits a deep understanding of physics. With a simple text prompt, Genie 3 can generate multiple minutes - up from 10 to 20 seconds in Genie 2 - of diverse, interactive, 3D environments at 24 frames per second with a resolution of 720p. The model also features "promptable world events," or the ability to use a prompt to change the generated world. Perhaps most importantly, Genie 3's simulations stay physically consistent over time because the model is able to remember what it had previously generated - an emergent capability that DeepMind researchers didn't explicitly program into the model. Fruchter said that while Genie 3 clearly has implications for educational experiences and new generative media like gaming or prototyping creative concepts, its real unlock will manifest in training agents for general purpose tasks, which he said is essential to reaching AGI. "We think world models are key on the path to AGI, specifically for embodied agents, where simulating real world scenarios is particularly challenging," Jack Parker-Holder, a research scientist on DeepMind's open-endedness team, said during a briefing. Genie 3 is designed to solve that bottleneck.
Like Veo, it doesn't rely on a hard-coded physics engine. Instead, it teaches itself how the world works - how objects move, fall, and interact - by remembering what it has generated and reasoning over long time horizons. "The model is auto-regressive, meaning it generates one frame at a time," Fruchter told TechCrunch in a separate interview. "It has to look back at what was generated before to decide what's going to happen next. That's a key part of the architecture." That memory creates consistency in its simulated worlds, and that consistency allows it to develop a kind of intuitive grasp of physics, similar to how humans understand that a glass teetering on the edge of a table is about to fall, or that they should duck to avoid a falling object. This ability to simulate coherent, physically plausible environments over time makes Genie 3 much more than a generative model. It becomes an ideal training ground for general-purpose agents. Not only can it generate endless, diverse worlds to explore, but it also has the potential to push agents to their limits - forcing them to adapt, struggle, and learn from their own experience in a way that mirrors how humans learn in the real world. Currently, the range of actions an agent can take is still limited. For example, the promptable world events allow for a wide range of environmental interventions, but they're not necessarily performed by the agent itself. Similarly, it's still difficult to accurately model complex interactions between multiple independent agents in a shared environment. Genie 3 can also only support a few minutes of continuous interaction, when hours would be necessary for proper training. Still, Genie 3 presents a compelling step forward in teaching agents to go beyond reacting to inputs so they can plan, explore, seek out uncertainty, and improve through trial and error - the kind of self-driven, embodied learning that's key in moving towards general intelligence. 
"We haven't really had a Move 37 moment for embodied agents yet, where they can actually take novel actions in the real world," Parker-Holder said, referring to the legendary moment in the 2016 game of Go between DeepMind's AI agent AlphaGo and world champion Lee Sedol, in which AlphaGo played an unconventional and brilliant move that became symbolic of AI's ability to discover new strategies beyond human understanding. "But now, we can potentially usher in a new era," he said.
[4]
Google's new AI model creates video game worlds in real time
Google DeepMind is releasing a new version of its AI "world" model, called Genie 3, capable of generating 3D environments that users and AI agents can interact with in real time. The company is also promising that users will be able to interact with the worlds for much longer than before and that the model will actually remember where things are when you look away from them. World models are a type of AI system that can simulate environments for purposes like education, entertainment, or to help train robots or AI agents. With world models, you give them a prompt and they generate a space that you can move around in like you would in a video game, but instead of the world being handcrafted with 3D assets, it's all being generated with AI. It's an area Google is putting a lot of effort into; the company showed off Genie 2 in December, which could create interactive worlds based off of an image, and it's building a world models team led by a former co-lead of OpenAI's Sora video generation tool. But the models currently have a lot of drawbacks. Genie 2 worlds were only playable up to a minute, for example. I recently tried "interactive video" from a company backed by Pixar's cofounder, and it felt like walking through a blurry version of Google Street View where things morphed and changed in ways that I didn't expect as I looked around. Genie 3 seems like it could be a notable step forward. Users will be able to generate worlds with a prompt that supports a "few" minutes of continuous interaction, which is up from the 10-20 seconds of interaction possible with Genie 2, according to a blog post. Google says that Genie 3 can keep spaces in visual memory for about a minute, meaning that if you turn away from something in a world and then turn back to it, things like paint on a wall or writing on a chalkboard will be in the same place. The worlds will also have a 720p resolution and run at 24fps. 
DeepMind is adding what it calls "promptable world events" into Genie 3, too. Using a prompt, you'll be able to do things like change weather conditions in a world or add new characters. However, this probably isn't a model you'll be able to try for yourself. It's launching as "a limited research preview" that will be available to "a small cohort of academics and creators" so its developers can better understand the risks and how to appropriately mitigate them, according to Google. There are also plenty of restrictions, like the limited ways users can interact with generated worlds and that legible text is "often only generated when provided in the input world description." Google says it's "exploring" how to bring Genie 3 to "additional testers" down the line.
[5]
Google's Genie 3 Hints at a Future Where AI Builds the Video Games We Play
Using a text prompt, Google's latest Genie 3 model can create interactive 3D worlds that can be navigated with a mouse and keyboard. New advancements from Google suggest AI-generated video games might become a reality faster than previously thought. On Tuesday, the company's DeepMind research lab introduced Genie 3, an AI program that goes beyond image or video generation by letting you create 3D interactive worlds based on a text prompt. DeepMind introduced an earlier version, Genie 2, in December. But at the time, it could only create 3D worlds at 360p resolution, and you could only play with it for 10 to 20 seconds. In contrast, Genie 3 levels up the resolution to 720p. It can maintain visual consistency for a few minutes, letting the user navigate the 3D world for a longer time. The environment will also react to your actions. For example, you can paint a wall with a brush. In addition, the user can also type in new prompts to change the 3D environment, or what's called a "promptable event." For instance, you can ask for a man wearing a chicken suit or a flying dragon to be added, and the program will do just that. The result is a bit like a Star Trek holodeck, but within your PC. "We're excited to see how Genie 3 can be used for next-generation gaming and entertainment," Google's DeepMind team adds in a video. "And that's just the beginning." Other applications could include using Genie 3 for education and to train workers, including robots. The technology is especially impressive since it's able to create a huge diversity of fictional and real-world 3D environments while faithfully rendering the physics within them. Google introduced the technology as other companies, including Microsoft, have been experimenting with using generative AI for video game creation. It's clear generative AI could take character interactions and procedural generation in video games to new levels. 
But the topic has also become controversial since some fear any adoption of AI will lead to layoffs and dilute game quality. In the meantime, Google's DeepMind still needs to resolve certain limitations facing Genie 3. This includes maintaining the 3D world's consistency beyond a few minutes and raising the visual quality. DeepMind itself notes: "Accurately modeling complex interactions between multiple independent agents in shared environments is still an ongoing research challenge." The research lab also didn't mention the hardware needed to run Genie 3 or how long it needs to generate a 3D world. For now, the technology remains in the testing phase. The DeepMind team has only given early access "to a small cohort of academics and creators," it said in the announcement. But the goal is to expand the number of testers over time.
[6]
Google DeepMind's Genie 3 can dynamically alter the state of its simulated worlds
Promptable world events will allow the lab to train AI agents in new, more sophisticated ways. At the start of December, Google DeepMind released Genie 2. The Genie family of AI systems belongs to a category known as world models. They're capable of generating images as the user -- either a human or, more likely, an automated AI agent -- moves through the world the software is simulating. The resulting video of the model in action may look like a video game, but DeepMind has always positioned Genie 2 as a way to train other AI systems to be better at what they're designed to accomplish. With its new Genie 3 model, which the lab announced on Tuesday, DeepMind believes it has made an even better system for training AI agents. At first glance, the jump between Genie 2 and 3 isn't as dramatic as the one the model made last year. With Genie 2, DeepMind's system became capable of generating 3D worlds, and could accurately reconstruct part of the environment even after the user or an AI agent left it to explore other parts of the generated scene. Environmental consistency was often a weakness of prior world models. For instance, Decart's Oasis system had trouble remembering the layout of the Minecraft levels it would generate. By comparison, the enhancements offered by Genie 3 seem more modest, but in a press briefing Google held ahead of today's official announcement, Shlomi Fruchter, research director at DeepMind, and Jack Parker-Holder, research scientist at DeepMind, argued they represent important stepping stones on the road toward artificial general intelligence. So what exactly does Genie 3 do better? To start, it outputs footage at 720p, instead of 360p like its predecessor. It's also capable of sustaining a "consistent" simulation for longer. Genie 2 had a theoretical limit of up to 60 seconds, but in practice the model would often start to hallucinate much earlier.
By contrast, DeepMind says Genie 3 is capable of running for several minutes before it starts producing artifacts. Also new to the model is a capability DeepMind calls "promptable world events." Genie 2 was interactive insofar as the user or an AI agent was able to input movement commands and the model would respond after it had a few moments to generate the next frame. Genie 3 does this work in real-time. Moreover, it's possible to tweak the simulation with text prompts that instruct Genie to alter the state of the world it's generating. In a demo DeepMind showed, the model was told to insert a herd of deer into a scene of a person skiing down a mountain. The deer didn't move in the most realistic manner, but this is the killer feature of Genie 3, says DeepMind. As mentioned before, the lab primarily envisions the model as a tool for training and evaluating AI agents. DeepMind says Genie 3 could be used to teach AI systems to tackle "what if" scenarios that aren't covered by their pre-training. "There are a lot of things that have to happen before a model can be deployed in the real world, but we do see it as a way to more efficiently train models and increase their reliability," said Fruchter, pointing to, for example, a scenario where Genie 3 could be used to teach a self-driving car how to safely avoid a pedestrian that walks in front of it. Despite the improvements DeepMind has made to Genie, the lab acknowledges there's much work to be done. For instance, the model can't generate real-world locations with perfect accuracy, and it struggles with text rendering. Moreover, for Genie to be truly useful, DeepMind believes the model needs to be able to sustain a simulated world for hours, not minutes. Still, the lab feels Genie is ready to make a real-world impact. 
"We're already at the point where you wouldn't use [Genie] as your sole training environment, but you can certainly find things you wouldn't want agents to do, because if they act unsafely in some settings, even if those settings aren't perfect, it's still good to know," said Parker-Holder. "You can already see where this is going. It will get increasingly useful as the models get better." For the time being, Genie 3 isn't available to the general public. However, DeepMind says it's working to make the model available to additional testers.
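The hallucinations and artifacts described above are the standard failure mode of auto-regressive generation: small per-frame errors compound. A toy calculation, with a per-frame error rate invented purely for illustration, shows why sustaining minutes at 24 fps is so much harder than sustaining seconds.

```python
# Toy compounding-error calculation. The 0.1% per-frame drift rate is
# invented for illustration; it is not a measured property of Genie.

FPS = 24

def accumulated_drift(seconds, per_frame_error=0.001):
    frames = FPS * seconds
    # Small errors compound multiplicatively across generated frames.
    return (1 + per_frame_error) ** frames - 1

for seconds in (10, 60, 300):
    print(f"{seconds:>4}s: ~{accumulated_drift(seconds):.0%} accumulated drift")
```

Even at this optimistic hypothetical rate, drift roughly quadruples over one minute of frames, which is consistent with the article's point that extending consistency from seconds to minutes (let alone the hours needed for training) is the hard part.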
[7]
Genie 3 creates real-time AI worlds from simple text prompts
Google DeepMind has introduced Genie 3, its most advanced world simulation model to date. The AI system can generate interactive, dynamic environments in real time from text prompts. Users can explore these generated worlds at 720p resolution and 24 frames per second, with consistency sustained over a few minutes. This release builds on years of research at DeepMind, where AI agents have been trained in simulated environments for games, robotics, and open-ended learning. Genie 3 marks a significant leap from earlier iterations, Genie 1 and Genie 2, by supporting real-time navigation and improved realism.
[8]
Watch this: Google Genie 3 can create a 3D world, let you explore it, and interact with it in real time
This means you can generate an environment, explore it, and change it on the fly.

Google's AI world model has just received a significant upgrade, as the technology giant, specifically Google DeepMind, is introducing Genie 3. This is the latest AI world model, and it kicks things into the proverbial high gear by letting the user generate a 3D world at 720p quality, explore it, and feed it new prompts to interact with or change the environment, all in real time. It's really neat, and I highly recommend you watch the announcement video from DeepMind that's embedded below. Genie 3 is also keenly different from, say, the still-impressive Veo 3, whose video with audio is capped at an 8-second limit. Genie 3 offers multiple minutes of what Google calls the 'interaction horizon,' allowing you to interact with the environment in real-time and make adjustments as needed. It's sort of like if AI and VR merged; it lets you build a world off a prompt, add new items in, and explore it all. Genie 3 appears to be an improvement over Genie 2, which was introduced in late 2024. In a chart shared within Google's DeepMind post, you can see the progression from GameNGen to Genie 2 to Genie 3, and even a comparison to Veo. Google's also shared a number of demos, including a few that you can try within the blog post, and it's giving us choose-your-own-adventure vibes. There are a few different scenes you can try, like a snowy hill, or even a goal you'd want the AI to achieve within a museum environment. Google sums it up as, "Genie 3 is our first world model to allow interaction in real-time, while also improving consistency and realism compared to Genie 2." And while my mind, and my colleague Lance Ulanoff's, went to interacting in this environment in a VR headset to explore somewhere new or even as a big boon for game developers to test out environments and maybe even characters, Google views this as - no surprise - a step towards AGI.
That's Artificial General Intelligence, and the view here from DeepMind is that it can train various AI agents in an unlimited number of deeply immersive environments within Genie 3. Another key improvement with Genie 3 is its ability to persist objects within the world - for instance, we observed a set of arms and hands using a paint roller to apply blue paint to a wall. In the clip, we saw a few wide stripes of rolled blue paint on the wall, then turned away and looked back to see the paint marks still in the correct spots. It's neat, and similar to some of the object permanence that Apple's set to achieve with visionOS 26 - of course, that's overlaying onto your real-world environment, so maybe not quite as impressive. DeepMind lays out the limitations of Genie 3, noting that in its current version, the world model cannot "simulate real-world locations with perfect geographic accuracy" and that it only supports a few minutes of interaction. Genie 3's few minutes of interaction are still a significant jump over Genie 2, but it's not enabling hours of use. You also can't jump into the world of Genie 3 right now. It's available to a small set of testers. Google does note it's hoping to make Genie 3 available to other testers, but it's figuring out the best way to do so. It's unclear what the interface to interact with Genie 3 looks like at this stage, but from the shared demos, it's pretty clear that this is some compelling tech. Whether Google restricts its use to AI research and training, or it explores generating media, I have no doubt we'll see Genie 4 here in short order ... or at least an expansion of Genie 3. For now, I'll go back to playing with Veo 3.
[9]
Genie 3: A New Frontier for World Models
Given a text prompt, Genie 3 can generate dynamic worlds that you can navigate in real time at 24 frames per second, retaining consistency for a few minutes at a resolution of 720p. At Google DeepMind, we have been pioneering research in simulated environments for over a decade, from training agents to master real-time strategy games to developing simulated environments for open-ended learning and robotics. This work motivated our development of world models, which are AI systems that can use their understanding of the world to simulate aspects of it, enabling agents to predict both how an environment will evolve and how their actions will affect it. World models are also a key stepping stone on the path to AGI, since they make it possible to train AI agents in an unlimited curriculum of rich simulation environments. Last year we introduced the first foundation world models with Genie 1 and Genie 2, which could generate new environments for agents. We have also continued to push the state of the art in video generation with our models Veo 2 and Veo 3, which exhibit a deep understanding of intuitive physics. Each of these models marks progress along different capabilities of world simulation. Genie 3 is our first world model to allow interaction in real-time, while also improving consistency and realism compared to Genie 2.
[10]
Google outlines latest step towards creating artificial general intelligence
Genie 3 world model's ability to simulate real environments means it can be used to train robots

Google has outlined its latest step towards artificial general intelligence (AGI) with a new model that allows AI systems to interact with a convincing simulation of the real world. The Genie 3 "world model" could be used to train robots and autonomous vehicles as they engage with realistic recreations of environments such as warehouses, according to Google. The US technology and search company's AI division, Google DeepMind, argues that world models are a key step to achieving AGI, a hypothetical level of AI where a system can carry out most tasks on a par with humans - not just individual tasks such as playing chess or translating languages - and potentially do someone's job. DeepMind said such models would play an important role in the development of AI agents, or systems that carry out tasks autonomously. "We expect this technology to play a critical role as we push toward AGI, and agents play a greater role in the world," DeepMind said. However, Google said Genie 3 is not yet ready for full public release and did not give a date for its launch, adding that the model had a range of limitations. The announcement comes amid ever-increasing competition in the AI market, with the chief executive of the ChatGPT developer, OpenAI, Sam Altman, sharing a screenshot on Sunday of what appeared to be the company's latest AI model, GPT-5. Google said the model could also help humans experience a range of simulations for training or exploring, replicating experiences such as skiing or walking around a mountain lake. Genie 3 creates its scenarios immediately from text prompts, according to DeepMind, and the simulated environment can be altered quickly - by, for instance, introducing a herd of deer on to a ski slope - with further text prompts.
The tech company showed the Genie 3-created skiing and warehouse scenarios to journalists on Monday but is not yet releasing the model to the public. The quality of the simulations seen by the Guardian is on a par with Google's latest video creation model, Veo 3, but they last minutes rather than the eight seconds offered by Veo 3. While AGI has been viewed through the prism of potentially eliminating white collar jobs, as autonomous systems carry out an array of jobs from sales agent to lawyer or accountant, world models are viewed by Google as a key technology for developing robots and autonomous vehicles. For instance, a recreation of a warehouse with realistic physics and people could help train a robot, with the simulation serving as the environment it "learns" from during training. Google has also created a virtual agent, SIMA, that can carry out tasks in video game settings, although like Genie 3, it is not publicly available. Andrew Rogoyski of the Institute for People-Centred AI at the University of Surrey in the UK said world models could also help large language models - the technology that underpins chatbots such as ChatGPT. "If you give a disembodied AI the ability to be embodied, albeit virtually, then the AI can explore the world, or a world - and grow in capabilities as a result," he said. "While AIs are trained on vast quantities of internet data, allowing an AI to explore the world physically will add an important dimension to the creation of more powerful and intelligent AIs." In a research note accompanying the SIMA announcement last year, Google researchers said world models are important because large language models are effective at tasks such as planning but not at taking action on a human's behalf.
[11]
Google DeepMind debuts Genie 3 model for generating interactive virtual worlds - SiliconANGLE
Alphabet Inc.'s Google DeepMind lab today detailed Genie 3, an artificial intelligence model designed to generate virtual worlds. The AI is the third iteration of an algorithm series that the company introduced last February. According to DeepMind, it could help advance machine learning research by creating higher-quality training environments for AI models. Genie 3 generates virtual worlds based on natural language prompts. It can simulate forests, Alpine landscapes and other outdoor locations along with indoor spaces. In one internal test, DeepMind researchers had Genie 3 generate an elaborate underwater environment complete with a large jellyfish. Users can change a virtual world with prompts. Genie 3 may be instructed to modify the weather, adjust the camera angle or add objects to an environment. It's also possible to simulate interactions between those objects. The algorithm's predecessor, Genie 2, could render virtual environments for up to 20 seconds at a time. DeepMind says that Genie 3 can manage up to several minutes. The Alphabet unit's researchers also boosted rendering quality from 360p to 720p, which corresponds to a resolution of 1280 by 720 pixels. Rendering consistency is another area where Genie 3 offers improvements. During user sessions, the model analyzes past frames to determine how future frames should be generated. "Generating an environment auto-regressively is generally a harder technical problem than generating an entire video, since inaccuracies tend to accumulate over time," DeepMind researchers Jack Parker-Holder and Shlomi Fruchter explained in a blog post today. "Despite the challenge, Genie 3 environments remain largely consistent for several minutes, with visual memory extending as far back as one minute ago." DeepMind believes that Genie 3 could lend itself to training embodied agents.
Those are AI agents designed to power autonomous systems such as industrial robots. Often, such algorithms are trained in simulations of the real-world environments they will be expected to navigate. DeepMind put Genie 3's embodied agent training features to the test using an AI model called SIMA. The latter algorithm is designed to autonomously perform tasks in virtual environments. During the test, DeepMind researchers successfully instructed SIMA to perform a series of actions in environments generated by Genie 3. "In each world we instructed the agent to pursue a set of distinct goals, which it aims to achieve by sending navigation actions to Genie 3," Parker-Holder and Fruchter wrote. "Like any other environment, Genie 3 is not aware of the agent's goal, instead it simulates the future based on the agent's actions."
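The agent-training setup described above can be sketched as a simple loop: the agent sends navigation actions to the world model, which simulates the consequences frame by frame without ever seeing the agent's goal. Neither Genie 3 nor SIMA exposes a public API, so the class and method names below are illustrative assumptions, not DeepMind's actual interface.

```python
class WorldModel:
    """Stands in for Genie 3: maps (frame history, action) -> next frame."""
    def __init__(self):
        self.frames = ["frame_0"]  # prompt-conditioned initial frame

    def step(self, action: str) -> str:
        # Auto-regressive: conditions on past frames and the agent's
        # action, never on the agent's goal.
        next_frame = f"frame_{len(self.frames)}<-{action}"
        self.frames.append(next_frame)
        return next_frame


class Agent:
    """Stands in for SIMA: picks navigation actions toward a goal."""
    def __init__(self, goal: str):
        self.goal = goal

    def act(self, observation: str) -> str:
        return f"move_toward({self.goal})"


world = WorldModel()
agent = Agent(goal="find the jellyfish")
obs = world.frames[-1]
for _ in range(3):
    action = agent.act(obs)   # the goal lives only inside the agent
    obs = world.step(action)  # the world simulates the action's outcome
```

The key design point from the quote is the separation of concerns: the goal is private to the agent, and the world model simply simulates forward from whatever actions it receives.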
[12]
Google DeepMind Unveils Genie 3 as Step Toward AGI With Real-Time 3D World Generation | AIM
Access to Genie 3 is currently limited to a small group of academic researchers and creative professionals. Google DeepMind has announced Genie 3, its latest AI world model capable of generating interactive 3D environments in real time while preserving visual consistency. "Given a text prompt, Genie 3 can generate dynamic worlds that you can navigate in real time at 24 frames per second, retaining consistency for a few minutes at a resolution of 720p," the company said in its blog post. Previous models, such as Genie 2, allowed only around 10-20 seconds of interaction.
https://youtu.be/PDKhUknuQDg
Unlike earlier versions, Genie 3 retains objects in place even when users move the camera away and return later. This ability to remember elements such as wall markings or object positions is being presented as a step forward in the development of AI-driven virtual environments.
Google DeepMind unveils Genie 3, an advanced AI world model capable of generating real-time, interactive 3D environments, marking a significant step towards artificial general intelligence (AGI).
Google DeepMind has unveiled Genie 3, its latest foundation world model, marking a significant advancement in artificial intelligence (AI) and a crucial step towards artificial general intelligence (AGI). The system can generate interactive 3D environments in real time, enabling the simulation of diverse worlds for AI training, with potential applications in gaming and education [1][2].
Source: Interesting Engineering
Genie 3 represents a substantial leap forward from its predecessor, Genie 2. The new model can generate multiple minutes of interactive 3D environments at 720p resolution and 24 frames per second, a significant improvement over Genie 2's 10 to 20 seconds of interaction [2]. The environments created by Genie 3 are not limited to specific settings; they can range from photo-realistic to entirely imaginary worlds [3].
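To put the quoted figures in perspective, a little arithmetic on the numbers in the paragraph above shows what "720p at 24 frames per second for multiple minutes" amounts to:

```python
# Back-of-the-envelope arithmetic from the figures quoted in the article.
width, height = 1280, 720  # 720p resolution
fps = 24                   # frames per second

pixels_per_frame = width * height  # pixels rendered per generated frame
frames_per_minute = fps * 60       # frames generated per minute of interaction

print(pixels_per_frame)   # 921600
print(frames_per_minute)  # 1440
```

In other words, each minute of interaction requires the model to generate 1,440 consistent frames of roughly 0.9 megapixels each.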
One of the most notable features of Genie 3 is its ability to maintain physical consistency over time. The model can remember what it has previously generated, allowing for coherent and persistent environments [2]. This emergent capability was not explicitly programmed but developed as part of the model's learning process [3].
Genie 3 builds upon DeepMind's latest video generation model, Veo 3, which demonstrates a deep understanding of physics. Unlike traditional approaches, Genie 3 doesn't rely on hard-coded physics engines. Instead, it teaches itself how the world works by remembering what it has generated and reasoning over long time horizons [2][3].
The model's architecture is auto-regressive, generating one frame at a time and looking back at previously generated content to determine what happens next. This approach allows Genie 3 to develop an intuitive grasp of physics, similar to human understanding [3].
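The auto-regressive loop described above can be sketched as follows. This is a minimal illustration under stated assumptions: the generator is a placeholder function (Genie 3's actual architecture is not public), and the bounded "visual memory" is modeled as a sliding window of recent frames, following the article's description of memory extending back roughly one minute.

```python
from collections import deque

FPS = 24
MEMORY_SECONDS = 60  # article: visual memory extends back ~one minute


def generate_next_frame(context, action):
    """Placeholder for the learned model: conditions on recent frames
    and the user's action to produce the next frame."""
    return {"index": context[-1]["index"] + 1, "action": action}


# Sliding window of past frames; old frames fall out once the window
# exceeds FPS * MEMORY_SECONDS entries.
context = deque([{"index": 0, "action": None}],
                maxlen=FPS * MEMORY_SECONDS)

for step in range(5):
    frame = generate_next_frame(context, action="turn_left")
    context.append(frame)  # each new frame becomes input to the next
```

The `deque` with `maxlen` is the hypothetical stand-in for bounded visual memory: each frame is conditioned on the window of recent frames, which is why small inaccuracies can accumulate over long sessions.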
Source: Analytics India Magazine
While Genie 3 has clear implications for gaming, educational experiences, and creative prototyping, its most significant potential lies in training AI agents for general-purpose tasks. DeepMind researchers believe that world models like Genie 3 are key to achieving AGI, particularly for embodied agents where simulating real-world scenarios is challenging [2][3].
The model's ability to create diverse, interactive environments could revolutionize how AI agents are trained. It allows for the creation of endless, varied worlds where agents can adapt, struggle, and learn from their own experiences, mirroring human learning in the real world [3][5].
Despite its advancements, Genie 3 still faces limitations. The range of actions an agent can take within the generated worlds is currently restricted, and accurately modeling complex interactions between multiple independent agents remains a challenge. Additionally, Genie 3 can only support a few minutes of continuous interaction, falling short of the hours necessary for comprehensive AI training [3].
DeepMind is cautious about the model's release, making it available only as a limited research preview to a small group of academics and creators. This approach allows the team to better understand and mitigate potential risks associated with the technology [4].
Source: TechCrunch
The introduction of Genie 3 hints at a future where AI could play a significant role in video game development and other interactive media. While this prospect excites some, it has also sparked concerns within the gaming industry about potential job displacement and impacts on game quality [5].
As research continues, DeepMind aims to expand Genie 3's capabilities, potentially ushering in a new era of AI-driven interactive experiences and bringing us closer to the goal of artificial general intelligence [2][3].