24 Sources
[1]
Google's Gemini Omni turns images, audio, and text into video -- and that's just the start | TechCrunch
When Google launched Gemini three years ago, the goal was to build a multimodal large language model -- a single neural network that was trained on text, image, audio, and video and could generate content in any of those formats. Today, at its Google I/O developer conference, the company took a concrete step toward that goal with Gemini Omni, a new family of multimodal models that Google CEO Sundar Pichai says will be able to "create anything from any input." Omni will start with video. Users can now combine images, audio, video, and text, and rather than simply stitching those inputs together, Omni reasons across all of them to produce a consistent output. The result is high-quality videos that reflect an understanding of physics, culture, history, and science. Omni also lets users edit photos with plain text commands rather than complex editing software, similar to Google's Nano Banana. Google already has a dedicated video model, Veo, that lets users turn text and images into videos, and even direct and customize avatars. But Google DeepMind director of product management Nicole Brichtova says that today's release is more than a Veo update: "It's the next step towards the progression of combining the intelligence of Gemini with the rendering capabilities of our media models." One example that Koray Kavukcuoglu, DeepMind's chief technologist, gave reporters during a media briefing on Monday: When Omni was given a simple prompt like "a claymation explainer of protein folding," it quickly rendered a video of a stop-motion explainer with a voice-over that said, "Proteins start as chains of amino acids. They fold into patterns like the alpha helix and flat sections called beta sheets, forming a perfect three-dimensional shape." The long-term vision for Omni is broader, involving the model being used to do things like generate images from audio, or audio from video. "When we first announced Gemini, it was our first AI model to be natively multimodal," Pichai said during the briefing. "We knew that training it on a combination of text, code, audio, images, and video would give it a deeper understanding of the world. With world models, AI is moving from predicting text to simulating reality. Gemini Omni is the next step in that direction." As part of the release, users will also be able to create videos with their own digital avatars -- something OpenAI popularized on its now-defunct Sora app with Cameos. To prevent deepfakes, users will have to go through a dedicated product onboarding, which involves recording themselves and speaking out a series of numbers, per Brichtova. The avatar then gets stored for future use. Additionally, all videos created with Omni will include Google's SynthID digital watermark, which allows users to verify if videos were generated via the Gemini products. The first model in the family is Gemini Omni Flash, which will roll out today to the Gemini app, YouTube Shorts, and AI creative studio Flow. Flash will be capable of rendering 10 seconds of video, which Brichtova says isn't a model limitation, but rather a decision based both on a desire to get it into more hands and an anticipation that most users won't want to make much longer videos yet. Longer video durations are in the pipeline for the near future, though. Google seems to be pitching Omni Flash as more of a consumer tool. The examples Brichtova and Gabe Barth-Maron, a research engineer at DeepMind, gave on a call with TechCrunch of uses for digital avatars were all personal: Making a video of yourself winning an award or going to the moon, or removing a passerby from the background of a video you took on vacation. Barth-Maron put it more simply: "They're like personalized memes." "We definitely did focus on making this easy to use for consumers," Brichtova said. "Not many video models have breached that chasm with consumers, so this is our play to do that." The ease of use comes with a caveat: Brichtova and Barth-Maron noted that editing prompts will need to be highly specific, otherwise Omni risks over-editing or unintentionally altering elements the user wanted to keep -- a problem Nano Banana users would have run into. Despite the near-term consumer focus, Omni's enterprise and creative implications are obvious, and Google will make Omni available via API in the coming weeks. The avatar-generating tool -- a capability that is available today on Shorts -- is something Google expects content creators to pick up. But more broadly, an end-to-end multimodal workflow could be transformative for advertisers and filmmakers. Startup Luma AI is building something similar, an agentic tool that can generate an entire ad campaign based on a short brief and a product image, powered by its own "unified" model. "We're actually pretty proud of the model's text-rendering capabilities, which is really useful for things like advertising," Brichtova said. "If you want a product somewhere, or even just a slogan, it needs to be accurate ... We definitely anticipate filmmakers and other kinds of creators are going to be using this model as well." The more professional use cases might be better served by the Omni Pro model, which should perform better across all Omni tasks. Google hasn't said when it will release Pro yet, but Brichtova said that will happen when "we feel like we're at a point where we have a step change above Flash."
[2]
Gemini Omni Will Bring Only More AI Slop and Skepticism
Gemini Omni was announced at the Shoreline Amphitheatre in Mountain View, California, alongside a slew of agentic AI features, such as Gemini Spark and Universal Cart (check out more on our live blog). Gemini Omni is Google's new AI content-generation tool that creates graphics and videos based on your prompts. I hate to burst Google's bubble, but we already have enough AI-content generation tools, including Google's own Nano Banana 2, an AI image generator. OpenAI, Shutterstock and Canva also have AI video generation capabilities. Gemini Omni Flash is now available, so you can create and edit videos in the Gemini app, Google Flow and YouTube Shorts. Output modalities, like images and audio, will be available later. Google gave an example at Google I/O of creating a claymation-style video about how protein is created. Maybe Gemini Omni will be a cool way to teach my 5-year-old about science, but we all know this is just another tool for more AI slop. CNET found that many of us are already tired of the futuristic content on our social media timelines. Earlier this year, CNET found that 51% of US adults believe we need better AI labels online, and 21% believe there should be a total ban on AI-generated content on social media. Only 11% say AI content is useful, informative or entertaining. Despite over half of US adults wanting tighter AI content restrictions, AI is still getting past us. Most (94%) of US adults believe they see AI-generated or -altered content on social media. However, only 44% say they can confidently tell real content from AI-generated photos and videos. Google appeared to reveal a solution to that problem at Google I/O. Google is adding Content Credentials verification across its Gemini app to show whether the content was created with AI or a camera. It will also detect whether it's been edited with AI. The SynthID detector is still available on the app to verify AI-generated content. It sounds like Google is trying to please everyone while still pushing its agenda of shoving AI into every tool and software update possible. Between Nano Banana Pro and Gemini Omni, that feels like a paradox. The same tech giant that provides the tools to create AI-generated content is also developing tools to verify it. Gemini Omni is one of many examples of how Google's developer showcase is out of touch with everyday consumers. There's still hesitancy and mistrust toward AI, despite how helpful it can be in some use cases. We don't need another AI content generation tool. We need to understand how companies are protecting our data, not by hosting demo planning parties or by using smart glasses to turn crowds into cartoons.
[3]
Google's new Omni AI tool will let you video clone yourself - I'm intrigued (and concerned)
Today, Google announced a new AI video capability that will either help creatives produce higher-quality videos more easily, or vastly increase the amount of AI slop on YouTube. I'm betting it'll be a mix of both. Google announced Gemini Omni, a tool that raises the ability to create video via AI to an entirely new level. The company compared this announcement to the level of AI image generation improvement that came about when it released Nano Banana. Also: I tested ChatGPT Images 2.0 vs. Gemini Nano Banana to see which is better Nano Banana raised the bar considerably on what was possible with image generation. Omni purports to do the same with video. Omni will be rolling out starting today, but I didn't have a chance to play with it prior to the announcement. Google described Omni as "where Gemini's ability to reason meets the ability to create." Interestingly, according to the company, "With Omni, you can combine images, audio, video and text as input and generate high-quality videos grounded in Gemini's real-world knowledge." Although Omni is "starting with video," Google said the new model can "create anything from any input," so presumably we'll see other media types generated by the tool within due time. Omni will also be available in model tiers, starting now with Gemini Omni Flash. The capability is coming to the Gemini app, Google Flow, and YouTube Shorts. It's not clear whether the web version of Gemini will support Omni, or whether you'll need to use the Flow interface via your browser. There are some standout features that make this a very interesting offering. I honestly can't decide if this is going to be a standout feature, a very big concern for privacy, or an untethered slop generator. The company said you can create videos "with your own voice by using Avatars, which create a digital version of yourself so you can generate videos that look and sound like you." Also: I used Nano Banana 2 to make perfect sketchnotes: 5 lessons learned As a regular producer of YouTube videos for my channel, I'm intrigued. There have been times when I wanted to put out a video, but was having a bad hair day, a bad voice day, or a bad attitude day, and I just didn't want that to come across in video. Could I just feed a script into my digital twin avatar and have RoboDave do the talking? Would my audience notice? Would they care? Would they hate it? Would I? Clearly that's an area worthy of experimentation, but it's probably not something I'll use often. I do my YouTube channel, in part, to keep my speaking and presentation chops up. Foisting that work on a digital avatar might reduce my workload, but it would also reduce my training and practice. Google is very careful to say that it's incorporating its SynthID digital fingerprinting technology in these videos, so they can be verified as having been produced with Omni. Google also said, "Beyond the avatar feature, in terms of editing videos to change audio and speech, we are still working to test this and better understand how we can bring this capability to users responsibly." Some of you may remember the early days of video games, when characters behaved more like ragdolls than objects in the physical world. As games got better, they began to incorporate physics models, so if something got shot, knocked back, or dropped, it did so in a matter consistent with the physics of the object. Omni now incorporates physics into the videos it creates. Google said it has "an improved intuitive understanding of forces like gravity, kinetic energy, and fluid dynamics." It also uses Gemini's knowledge to "connect language, imagery, and meaning in ways that go far beyond pattern matching." The company said Omni can build detailed videos from short prompts and can generate videos for things like explainers that break down fairly complex ideas. I don't doubt this. The analysis capabilities of NotebookLM's audio overview and video overview to be able to create explainers are astonishing. If some of that technology found its way into Omni, things could get interesting quickly. Also: I run a very small business. Here are 21 simple ways AI saves me time every day I actually fed marketing documents and spec sheets into NotebookLM and it produced explainer videos for various features of my security product that were better than anything I could have done by hand, especially in the time it took. The visuals at the time weren't great, but having complex features explained in a clean video in under 30 minutes was a force-multiplier for my product release schedule. One of Nano Banana's early standout features was its ability to recontextualize an image. For example, I had it take a picture of me walking in a park and change it so I was wearing something close to an admiral's uniform on the bridge of an aircraft carrier. While it didn't get the uniform fruit salad and brass quite right, it did manage to accurately reproduce my body and face. Also: I turned casual selfies into professional headshots with Gemini Omni proposes to take that to video, turning image, text, video, or audio into a "cohesive output." Right now, the only audio it will accept is voice recordings, but the company said it'll "roll out other types of audio inputs soon." The company also said you can create scenes, match styles, describe what you want in natural language, and get character consistency throughout the video. One aspect of producing videos I do not enjoy is the editing process. It's often enormously tedious. But, with Omni, "Gemini Omni gives you an easier way to edit video - with natural language. Every instruction builds on the last. Your characters stay consistent, the physics hold up and the scene remembers what came before." Google also said you can change elements in the video. I can see a huge benefit if it's possible to import a video and have the editor remove obstructions or change objects and backgrounds. It's not clear how long a clip can be, or exactly how much editing you can do with Omni on a given plan, but those possibilities are exciting. Two other transformations the company said the new Omni can do are: Additionally, Google hasn't yet specified video format or resolution. Will this be a professional tool that can handle 16:9 videos in 4K or 8K resolution, or is it meant to be a tool for the YouTube Shorts generation? Also: Are Sora 2 and other AI video tools risky to use? Here's what a legal scholar says When OpenAI introduced Sora, it was a novelty. While users abused it (we gave Sam Altman blue hair and made him sing ZDNET's praise), it never managed to be a tool that helped a professional's workflow. While producing AI avatar clones and replacing objects might be fun, I'm hoping this capability is extended so that it's usable either inside Final Cut, Premiere Pro, and DaVinci Resolve, or at least integrated enough that those tools can use edits created by Omni. It's possible. Omni's features will be rolling out to enterprise customers and developers via a Google API. I'm also curious if Omni will embed the little diamond watermark in the corner of its videos, like it does with Nano Banana's generated images. While it's nice to know a clip was generated by AI, such watermarking gets in the way of using the AI as a professional tool. Will we see licensing tiers where the watermark can be removed? Or will we see third-party tools crop up that remove the watermark, whether Google wants you to or not? Time will tell. Would you use Google Omni to create a digital avatar of yourself for videos you didn't want to record in person? Let us know in the comments below.
[4]
Google's Gemini Omni Tries to Fill the Void Left by OpenAI's Sora
OpenAI's Sora is gone, but Google is filling the void with its own AI-powered video-generation function called Gemini Omni. At Google I/O, the company debuted Gemini Omni, a tool for creating AI-generated video clips from your existing photos, selfies, or videos. In a demo, Google's AI chief Demis Hassabis showed how you can use the new AI model to drastically change your surroundings while taking a video of yourself. This might include placing yourself on Mars, in a lush forest, or adding a disco ball in the background. Omni isn't just an AI filter, though. It actually represents a step to build a "world" model designed to accurately simulate real-world physics as part of Google's effort to create artificial general intelligence. The same capability enables it to create realistic-looking videos across a wide variety of styles and topics. In another short demo, Hassabis showed Omni can create an educational video using claymation, breaking down scientific concepts for kids. Google plans on making the Omni Flash model available today through the Gemini App, Google Flow, and on YouTube Shorts. It arrives after OpenAI officially discontinued both the Sora app and web experience last month to use the company's AI computing power for other projects. Sora faced plenty of controversy and legal action over the technology creating AI-generated videos featuring characters from popular franchises and dead celebrities. In contrast, Google is framing Omni as a tool to reimagine your own personal photos or videos by adding fictional AI elements, which might help sidestep potential legal battles. Still, we could also see the capability unleashing deepfakes that could fool the public. For now, Omni will focus on creating video outputs. But the company will eventually expand it to include image and text.
[5]
Google Introduces Gemini Omni, a Multimodal AI That Knows the World
Built on Gemini modeling architecture, Omni is a true multi-modal input and output system, allowing you to create videos from text, images and existing videos. At launch, you'll be able to create videos with the aforementioned inputs, but image; text generations will be available in a future update. With Gemini at its core, Omni can process and interpret multiple types of inputs to produce a consistent, sophisticated final product. Omni builds on Google's existing products by integrating Gemini Intelligence. The rise of AI-created videos comes at a paradoxical time as companies such as Google make incredible advances with the technology, while social media feeds become more filled with AI slop. Google considers Omni the "next big step" toward building AI that can model and simulate the real world. It's a world model with advanced reasoning, capable of generating videos grounded in the world we know today. Omni demonstrates advanced physics capabilities, enabling it to create realistic video outputs. Here's what's coming in Gemini Omni from Google I/O. As with its powerful video generation, Omni also has advanced video editing capabilities. If you create a video with Omni, you can feed it back into the tool, make impressive changes with just a prompt or incorporate additional media. You can even upload your own videos and change or swap out individual elements, allowing for a new way to edit videos that has essentially never been available before. That ability to fully replace elements in a person's video could lead to some dark outcomes, making Omni's advanced editing abilities as alarming as they are impressive. But Google has built-in guardrails. First, any output from Omni will automatically include Google's SynthID watermark, so you know that what you're viewing has been altered in some way by AI. This is a big deal, as Omni essentially lets you change how reality is perceived. People will be able to play with Gemini Omni in a variety of ways. It's a prominent feature within the newly redesigned Gemini app, where you can add built-in templates to your camera roll with a single click. Additionally, you'll be able to create a custom avatar that looks and sounds like you and add it to videos. For some paid subscribers, Omni will be available on Google Flow and YouTube Shorts, starting on Tuesday. Omni will roll out to developers and enterprise customers via APIs in the coming weeks, allowing for custom integrations. Like most Gemini models, Omni will be split into Flash and Pro versions, though the former will be available initially. Google is working on an even more powerful model, Omni Pro, which will become available in the future.
[6]
Google's Gemini Omni can generate "anything from any input," including video
Patrick O'Rourke is XDA's News Editor and Entertainment Segment Lead. Previously, he was Pocket-lint's Editor-in-Chief, the Editor-in-Chief of Canadian tech publication MobileSyrup, and earlier in his career, he worked as the technology editor at the Financial Post and Postmedia. He's based in Toronto. Over the past 15 years, he's written thousands of articles. Patrick has also interviewed dozens of tech industry executives and covered GDC, E3, Gamescom, WWDC, Apple keynotes, Samsung Unpacked events and more. Patrick has a BA in journalism from Toronto Metropolitan University. Summary Gemini Omni creates and edits high-quality videos from images, audio, video, and text. Omni better models gravity, fluid dynamics, and kinetic energy to boost realism in video. Gemini Omni Flash rolls out to the Gemini app, Google Flow, and YouTube Shorts; SynthID tags all AI content. Among a flurry of AI-related announcements at I/O 2026, including more Search AI integration, Gemini 3.5, a new Gemini Spark AI personal assistant, Google revealed Gemini Omni, an AI model that can "create anything from any input -- starting with video." The platform's first model, Gemini Omni Flash, is rolling out today to the Gemini app, Google Flow, and YouTube Shorts. The tech giant describes Gemini Omni as the "next step up" from Nano Banana (and likely Veo 3.1, even though it's not mentioned). It allows users to "combine images, audio, video and text" to generate "high quality" videos, says Google. These videos can then be edited through natural language. While this sounds similar to Veo 3.1, Gemini Omni seems to be capable of taking this concept to the next level. For example, the AI tool lets you shoot a video and ask Omni to change the content, including adding "new characters or objects" and even turning "a moment into something unexpected." Google thinks Gemini Omni is a notable step over Nano Banana and Veo 3.1 It's unclear if the tech giant has dug itself out of the uncanny valley of AI weirdness During its I/O 2026 keynote, Google also outlined how Omni can better understand gravity, fluid dynamics, and kinetic energy to make the AI content it generates look more realistic. All content created with Omni is tagged with Google's SynthID digital watermark to indicate that it features AI and was created with Gemini Omni. It's unclear whether Gemini Omni has reached the point where it's gotten rid of the uncanny valley look that realistic AI videos almost always feature, but Google seems to think so given its lofty claims about the platform. The very brief Gemini Omni demo video was definitely impressive, but it's unclear what actual videos will look like when users start creating AI-generated and augmented videos with Omni Flash. Gemini Omni Flash is available to all Google AI Plus Pro and Ultra-tier subscribers globally and is rolling out to YouTube Shorts and the YouTube Create App later this week. Related Google's Gemini Intelligence can fill your shopping cart straight from your notes app Agentic mobile AI is here. Posts By Simon Batt
[7]
Google's Gemini Omni can generate 'anything from any input,' starting with video - Engadget
Google didn't forget AI creators in its latest round of Gemini announcements. Google didn't forget AI creators in its latest round of Gemini announcements as part of Google I/O. The company just officially revealed Gemini Omni, a new model that can "create anything from any input -- starting with video," according to Google. The first model called Gemini Omni Flash is rolling out today to the Gemini app, Google Flow and YouTube Shorts. Google called Gemini Omni "the next step" up from Nano Banana and, presumably, its current video generator, Veo 3.1. It lets you "combine images, audio, video and text as input and generate high-quality videos grounded in Gemini's real-world knowledge," according to the tech giant. You can then edit those videos through natural conversation, with each instruction building on the last to keep characters and other elements consistent. Where Veo 3.1 was limited to video creations via prompts and images, Gemini Omni will accept a wider range of inputs and do a lot more. For instance, you can shoot a video, then just ask Omni to change what's happening. "Your video becomes a starting point for something you never could have filmed yourself," Google explained. "Edit the action, add in new characters or objects, or transform a moment into something unexpected. Change the environment, angle, style or even specific details." Omni also better understands physical forces like gravity, kinetic energy and fluid dynamics, so that scenes will be more realistic. It marries that with "Gemini's knowledge of history, science and cultural context, bridging the gap from photorealism to meaningful storytelling." The app can supposedly create compelling explainers from short prompts to generate visuals that break down more complex ideas. However, it will only support voice references for audio output to start. If you want to generate videos where you're the star, Omni lets you use your own voice to create a digital avatar that looks and sounds like you. If that sounds like a potential privacy nightmare, Google says it has "clear policies to protect users from harm and governing the use of our AI tools." As far as editing videos to change audio and speech, the company is still testing that function in order to bring it to users "responsibly." All videos will also use Google's imperceptible SynthID digital watermark to verify that videos were generated with Gemini Omni. All of that sounds great, but the main problem with Veo 3.1 and other video generator apps is that the video has an "uncanny valley" look, and is often hated by end users. To that end, it'll be interesting to see if the output quality matches Google's breathless claims. We'll find out soon, as Gemini Omni Flash is now available to all Google AI Plus, Pro and Ultra subscribers globally and rolling out to users of YouTube Shorts and the YouTube Create App starting this week.
[8]
Google launches Gemini Omni Flash, a conversational video-generation model with avatar mode held back
The first model in DeepMind's new Omni family will generate and edit video from any combination of image, audio, video, and text inputs. Speech-editing is being withheld; SynthID watermarking is on by default. Google introduced Gemini Omni on Tuesday at the I/O 2026 developer conference, a new multimodal model family from Google DeepMind designed to generate and edit video from any combination of image, audio, video, and text inputs. The first model in the family, Gemini Omni Flash, started rolling out the same day to the Gemini app and Google Flow for Google AI Plus, Pro, and Ultra subscribers, and to YouTube Shorts and the YouTube Create app at no cost. API access for developers and enterprise customers will follow in the coming weeks. The product framing, from Koray Kavukcuoglu, CTO of Google DeepMind and Chief AI Architect at Google, is that Omni 'combines images, audio, video, and text as input and generates high-quality videos grounded in Gemini's real-world knowledge.' Inputs can be mixed in a single prompt. Edits are made conversationally, with each instruction building on the previous one, so that characters, physics, and scene context persist across turns. Output modalities beyond video, including image and audio generation, are 'coming in time,' Kavukcuoglu wrote on the company's blog. Omni's positioning, on the published materials, rests on three claims. First, the model has an improved intuitive understanding of physical forces, including gravity, kinetic energy, and fluid dynamics, allowing it to generate scenes with more accurate physics. Second, it draws on Gemini's existing world knowledge to connect language, imagery, and meaning beyond pattern-matching, with the company demonstrating prompts that range from claymation protein-folding explainers to chain-reaction physics tracks. Third, the conversational-editing layer preserves consistency across multi-turn revisions, where prior video models have tended to drift on character identity or scene continuity. The release also extends the Omni family to digital-avatar generation. Avatars let users record their own voice and likeness to create videos that look and sound like them, with onboarding requiring recording yourself and speaking a series of numbers aloud. ]Beyond avatars, Google is explicitly withholding general-purpose audio and speech editing inside Omni for now. 'We are still working to test this and better understand how we can bring this capability to users responsibly,' Kavukcuoglu wrote, in a paragraph that third-party coverage has read as a deliberate step back from the deepfake-adjacent territory of consent-free voice editing. All videos generated with Omni will carry Google's SynthID imperceptible digital watermark by default. Users can verify whether a clip was generated by Omni through the Gemini app, Gemini in Chrome and Google Search, the company said. The SynthID layer is the same watermarking infrastructure OpenAI adopted earlier this year under the C2PA open standard, and is now positioned as the cross-industry default for AI-generated visual provenance. On the disclosed initial limits, Flash-tier clips are capped at 10 seconds at launch, a deployment decision rather than a model constraint. The cap is shorter than OpenAI's Sora maximum of 60 seconds, where Sora's tokenisation-of-spatiotemporal-patches architecture is the closest published frontier-model comparison. Google has not disclosed the per-clip cost structure, the compute footprint per generation, or the benchmark suite it used to evaluate Omni against Veo 3 or third-party models such as ByteDance's Seedance. Omni is the headline model in a wider I/O 2026 announcement that also included Gemini 3.5 and what Sundar Pichai called the 'agentic Gemini era' in his keynote post. The strategic question for the model, on the announcement and immediate analyst reads, is whether the multi-input conversational editing flow is genuinely a new product category or a tighter integration of capabilities the broader frontier-video field has already demonstrated. The next visible proof point will be the API rollout to developers and enterprise customers in the coming weeks, where the cost structure and the upper bound on clip length under paid tiers will become public. What Google has not yet disclosed: the underlying Omni model architecture relative to Veo 3, the per-generation compute footprint, pricing for clips beyond the Flash tier, benchmark scores against DeepMind's own prior video models and competing frontier offerings, and the timeline for general-purpose audio and speech editing inside the Omni family. The avatar onboarding process and SynthID enforcement are, on the announcement, the company's formal answer to the consent-and-provenance questions the launch invites.
[9]
Google’s Gemini Omni AI Model Promises to Create 'Anything' From Any Type of Input
The tech giant is promoting Omni as essentially the Nano Banana of video. Google just announced Gemini Omni, a new AI model that it claims can “create anything from any input,†at its annual I/O developer conference on Tuesday. The company said the model is starting off with just video generation and editing capabilities. On its website, Google says to think of it like “Nano Banana â€" but for video,†referencing the company’s image model that came out last year. Gemini Omni Flash, the first model in the Omni family, can edit existing videos and generate new ones using plain-language prompts. It's already available to try on the Gemini app, Google Flow AI studio, and YouTube Shorts. “With Omni, you can combine images, audio, video and text as input and generate high-quality videos grounded in Gemini's real-world knowledge. You can also easily edit your videos through conversation,†wrote Google DeepMind Chief Technology Officer Koray Kavukcuoglu in a blog post. As with Nano Banana, users can make edits that build off each other through natural conversation. The model is designed to keep characters and environments consistent across edits and use its knowledge of the real world including history, biology, physics, and narrative logic to make clips that actually make sense. The company has posted on its website several examples of what the model can do in practice. In one example, Google starts off with a video of a man touching a mirror. The model then creates several different versions of the clip based on text prompts like “make the mirror ripple beautifully like liquid†and “the entire environment turns into 3d voxel art†when the mirror is touched. Another example shows off the model’s audio capabilities. The video syncs the lights from an apartment building’s windows to a techno track. The model was even able to create a short claymation-style explainer on protein folding. But as with other video and image AI models, there are obvious concerns about abuse, including deepfakes and misinformation. Google says the model was developed with input from its internal safety, security, and responsibility teams. The model also underwent a range of evaluations including testing with specialists outside the model development team to help ensure it follows safety policies and produces desired outcomes. Ethics and safety reviews were also conducted ahead of its release. Additionally, Google says content created or edited with Omni will carry an invisible SynthID digital watermark which is meant to make it easier to verify whether content was generated using the model.
[10]
Google's newest Gemini Omni model can turn real videos into surreal fever dreams
You can also use it for free to create Remixes using YouTube Shorts. Video generation has been one of the most compelling creative uses of AI. Among the platforms that have helped fuel the phenomenon is Google's Veo, especially Gen 3, which has proven incredibly powerful at creating entire scenes with consistent elements and nearly perfect lip-syncs. While Veo 3 (and newer 3.1) has been limited to creating purely AI-generated videos with text and audio, Google is introducing a new model at Google I/O 2026 that goes a step further by letting you modify real-life footage into spectacular clips.
[11]
Gemini Omni, the 'create anything' model, starts today with lifelike video
Google has unveiled Gemini Omni, a new family of generative models designed to "create anything," and you can use it today to create surprisingly realistic videos. Something Google has been working on in recent years is a "world model" that can maintain a cohesive, grounded world. The company explored the idea through its Genie model, which generates interactive video-game-esque experiences based on user prompts. Google has also long offered the Veo and Nano Banana models that bring capable video and image creation/editing via text and image inputs. As part of I/O 2026, Google revealed Gemini Omni, a new model which leverages a similar level of multimodal understanding grounded in reality. While Omni is currently only designed to generate video content, it is presented as being designed to "create anything from any input." This means bringing together text, images, video, and audio (initially limited to speech samples) to create a unified output video. After generation, you can further refine your video in subsequent turns. Google's initial demos for Omni are quite impressive, showing how Gemini understands each of the elements in the final video. The rolling marble video is a great example, with believable physics for the ball and convincing sound effects for each bounce and the bell ring. Another demo presents a claymation-style video explainer of how protein folding works. Unlike the Genie model, which is still exclusively available to those paying for an AI Ultra subscription, Google is positioning the Gemini Omni series to be broadly accessible. The first model in the series, Gemini Omni Flash, is available now to all subscribers of AI Plus and higher. Or if you want to share your creations with the world, Gemini Omni will be available for free through YouTube Shorts and YouTube Create later this week. A higher-level model, "Omni Pro," was also teased, with details coming soon. Given the significant sense of realism presented, the company is taking several measures to ensure videos are generated responsibly. Taking a cue from OpenAI's recently discontinued Sora app, Gemini Omni will allow you to create a bespoke "Avatar" of yourself to be featured in the videos you create. Otherwise, Omni will not initially be able to edit audio and speech in videos until Google can "bring this capability to users responsibly." As another safety measure, all videos created by Gemini Omni will be watermarked with SynthID to be readily identified as AI generated.
[12]
Gemini Omni Flash can create and edit videos with your voice and it feels like the future of multimodal AI
Gemini Omni Flash sounds like it'll be an essential new AI content creation tool Google has already thrown its hat into the AI image generation and editing space with its Nano Banana AI image generator powered by Gemini. Nano Banana, which is now in its second iteration, has been used to help millions of users create professional-grade images through their text descriptions and image prompting, plus make adjustments to those visual creations via reference images that guide the process. Now Google has released another impressive piece of AI-assisted visual generation tech that promises to create videos from all manner of inputs, which include text, images, audio or even videos themselves. Alongside the ability to create new visuals based on those, you can also edit your newly generated videos through simple or complex conversations with Gemini Omni Flash. Here's a deeper look at what Google's latest AI video generator is capable of and how you can use it. Generating videos through a variety of inputs and editing them through chats Gemini Omni Flash is a multimodal AI video generation tool that promises to create videos from any voice references you mention about images, text or audio or just rely on input references. Your voice references can be used to describe the visual language needed to birth your visual creations or you can implement input references such as images of characters, scenes or drawings to apply the styles, motion or effects you want in your video clip. Google points to Omni Flash as being a highly intelligent AI video creation tool that can build incredibly lifelike scenes and work on its own to produce what happens next in each clip it generates. Its cohesive knowledge of gravity, kinetic energy and fluid dynamics helps it produce more realistic-looking scenes. Omni Flash is also intelligent enough to incorporate Gemini's understanding of language, imagery and meaning to create short or lengthy explainers from your vocal prompts. On the video editing front, Omni Flash can listen to your vocal instructions and build upon each one to craft your newly created video any way you see fit. You can speak to the AI video generator to change certain aspects of your video or change everything in it altogether. You can also tell Omni Flash to edit a video you recently shot by altering what's actually happening in it, adding in new objects or characters or completely transforming a specific moment into something else entirely. Google is already rolling out the first member of the Omni family to the Gemini app, Google Flow and YouTube Shorts. And as a starter project, you can generate videos with your voice by using Google's AI "Avatars," which is a digital version of yourself that uses your voice to sound just like you. Bottom line Gemini Omni Flash's multimodal AI capabilities sound impressive, especially since you're able to take full advantage of its video generation capabilities by simply voicing your commands. Building worlds, swapping in/out all forms of human and animal characters, switching up visual styles, altering certain details and more come together to offer one of the most powerful AI video generation tools we've ever seen. Follow Tom's Guide on Google News and add us as a preferred source to get our up-to-date news, analysis, and reviews in your feeds. Subscribe to Tom's Guide on YouTube and follow us on TikTok.
[13]
Google unveils Gemini Omni 'any-to-any' AI model: what enterprises should know
Although it was already discovered by intrepid AI power users weeks ahead of the official unveiling today at Google's annual I/O developer conference, the company's new Gemini Omni model marks a significantly new paradigm in the wider AI and tech marketplace. That's because as its "omni" (from the Latin omne -- meaning "all") prefix would suggest, this is Google's first truly native, multimodal model, that is "a model that can create anything from any input -- starting with video." The model marks Google's bid to collapse the multimodal generative stack -- text-to-image, image-to-video, video-to-video, audio generation -- into a single foundation model with a single editing surface. The big question for business leaders is: should you switch any of your own AI stack over to Gemini Omni now? Unfortunately, the truth is, you may not be able to just yet -- the model is only available to individual users through Google's AI subscription plans starting with the $20 per user per month "AI Plus" plan. While the company says it is ultimately going to be available via an application programming interface (API) -- which many enterprises rely on for their AI needs -- it's not ready yet. But, given the capabilities and faster editing enabled by the new Omni model, individual members of your team should probably give serious consideration to switching over to it, especially if they work creating visuals for technical diagrams, marketing and comms materials, training and corporate education courses, sales collateral, and basically anything that involves visuals. What Omni actually is Omni is the next chapter of the work that produced Nano Banana, the image-generation and editing model Google shipped roughly a year ago. The first model in the family, Gemini Omni Flash, accepts any combination of text, images, audio, and video as input and produces high-quality output across the same modalities -- all from a single model rather than a relay of specialized systems. Google says the model is "natively multimodal from the ground up," which matters less as marketing copy than as an architectural claim: a unified model can reason across modalities in the same forward pass, which generally translates into more coherent edits, fewer pipeline artifacts, and a far cleaner API surface for developers. OpenAI started this trend back in May 2024 with the release of GPT-4o, its first natively "omni" model, also trained from the ground-up to be able to analyze and generate multiple different types of content, from text to code, imagery, and audio. However, it did not support video generation, and the model was eventually deprecated following reports of sycophancy and even users demanding OpenAI retain it after developing parasocial relationships with it. Is Gemini Omni at risk of sparking a similarly devoted following? It remains to be seen. One big difference is that its headline interaction pattern is conversational video editing. Each instruction "builds on the last," and past directions persist across turns so the video evolves coherently as the user iterates. Practical examples Google highlighted include changing the world inside a clip, reimagining an action or camera angle, refining sequences over multiple turns, and generating explainer-style content from short prompts. Google also emphasizes improved physics -- gravity, kinetic energy, fluid dynamics -- which is the kind of detail that separates "looks like AI video" from "looks like footage." Rollout, pricing, and the API question The first thing enterprise leaders should read carefully is the rollout plan. Omni Flash is going live today inside the Gemini app for U.S. subscribers across AI Plus, AI Pro, and AI Ultra tiers -- including the new $100-per-month AI Ultra plan Google announced at the same event. Google says it will roll out to developers via Vertex AI APIs "in the coming weeks." That gap is significant. Until the Vertex API is generally available, Omni is effectively a consumer and prosumer tool. Enterprise pilots beyond individual seat-based experimentation should wait for the API, both because that's where Google's enterprise SLAs and data-handling commitments live, and because production-grade generative video without a programmatic interface is a non-starter. Its pricing through the API per million tokens (presumably) will also determine its viability as an enterprise product outside of film/TV/entertainment and the arts productions. For decision-makers weighing seat economics in the meantime, the new AI Ultra tier is positioned specifically at developers, technical leads, knowledge workers, and advanced creators, with priority access to Google Antigravity, higher usage limits, and bundled Omni Flash access. For small creative teams under tight deadlines, that may be the fastest way to evaluate the model before the API arrives. The enterprise use cases that really matter It is easy to default to "marketing video" as the use case, but Omni's value proposition for enterprises is broader if you think of it as a programmable video and media engine rather than a creative app: * Sales and marketing: rapid generation of variant ads, localized creative, and product demos without per-asset agency cycles. * Internal communications, learning and development (L&D): explainer videos, onboarding modules, and policy walkthroughs produced by non-specialists. * Customer support and documentation: dynamic, query-conditioned visual explainers attached to help articles. * Product and engineering: visualization of simulations, UI walkthroughs, and concept videos for spec reviews. * Field operations: short, situation-specific instructional clips generated on demand. What changes with Omni versus the previous generation of tools is the unification. Many enterprises stitched a workflow together from text-to-image, image-to-video, lip-sync, and voice models, each with its own contract, billing, and data path. A single Vertex AI-backed model collapses procurement and observability into one place -- assuming the eventual API delivers production-grade throughput and latency. The governance story is the most underrated part For CIOs and CISOs, the most important section of Google's announcement is not the model card; it is the provenance and content-safety work shipping alongside it. Every video generated by Omni carries Google's SynthID digital watermark. Google is expanding C2PA Content Credentials across its generative tools, and launching an AI Content Detection API on Agent Platform that lets businesses identify AI-generated content from both Google and other popular models. Partner integrations announced at the same event -- including Shutterstock, Avid (in Pro Tools), and at least one major newswire -- indicate where the standard is going. For enterprises, this matters in three concrete ways: There is also a "Personal Avatars" program that lets creators record short videos to authorize use of their voice and likeness across generated content, as Google leaders and employees showcased themselves today in posts centered around I/O featuring their AI generated likenesses. This puts it in direct competition with Synthesia, a UK-based AI unicorn focused primarily on enterprise-safe AI videos and avatars. For enterprises considering executive videos, training avatars, or branded spokesperson content, the consent model here is the right starting point -- but contracts and rights-management policies will need to extend to cover it. Risks worth flagging Omni's main risks are familiar but worth restating. The competitive landscape is crowded with the aforementioned Synthesia, TikTok parent company ByteDance's acclaimed Seedance model, Kuaishou Technology's Kling AI models, and the fast-improving open-source field all compete for the same workflows. Lock-in to any single video model is a real concern when output quality is still leapfrogging quarterly. Latency and cost for production-volume video generation remain unproven outside controlled demos. In addition, the legal status of training data for generative video is unsettled in multiple jurisdictions; enterprises should require clear indemnification language before deploying generated video into customer-facing channels. Furthermore, VentureBeat collaborator and AI YouTuber Sam Witteveen, CEO of enterprise machine learning vendor Red Dragon AI, received early access to Gemini Omni and reported the content restrictions (which some deem to be censorship) to be quite strict, potentially restricting and inhibiting all the potential use cases an enterprise would like to pursue. Thoughts for enterprises considering adoption Omni is worth piloting -- but the structure of the pilot matters. For most enterprises, the right move over the next 30 to 60 days is to fund a small, sanctioned experiment with one or two AI Ultra seats in marketing or L&D, while the platform and security teams use that runway to prepare for the Vertex AI API: define data-residency requirements, set up SynthID and C2PA verification in the content pipeline, and stand up the AI Content Detection API alongside existing media-governance tooling. Treat the consumer rollout as a UX preview, not a production plan. When the API arrives, the enterprises that have already done the governance work will be the ones moving Omni into real workflows while everyone else is still drafting policy. Omni is not, by itself, a reason to overhaul an enterprise AI strategy. But it is a strong signal that the multimodal generative stack is consolidating into single models with first-party provenance baked in -- and that is a shift technical decision-makers should be planning around now.
[14]
Google debuts new Omni world model at Google I/O with advanced AI video capabilities
Google just unveiled a brand new AI world model at Google I/O 2026 called Gemini Omni. While Google calls Gemini Omni a "new model that can create anything from any output," its showcase focused on the AI model's video-generation capabilities. The first release within the Omni AI model family is called Gemini Omni Flash. So, what's so special about Gemini Omni Flash compared to other AI video tools and Google's Genie world model? During a presentation at Google I/O, Google DeepMind CEO Demis Hassabis described Omni as a crucial step toward AGI. Hassabis said that in the future, Omni would be able to output "anything" the user wanted. Unlike text-to-video tools like Google Veo, Gemini Omni is multi-modal in both input and output. That means users can input text, audio, images, and video into the model, and Omni will generate a unique, interactive world utilizing Gemini's "real-world knowledge." Google says Omni will be able to generate videos with more accurate physics and, in turn, create more realistic-looking content. Gemini will also be able to understand the context of a prompt, such as a historical fact, to generate more accurate video content. In addition to video generation, Google says users can also edit videos through conversation with the Omni model. Google showcased some samples of what editing through Omni can do at I/O. For example, users can take a video they shot or an AI-generated video and edit specific aspects of the clip. If a user likes a shot but wants to change the background, they can do so with Omni. Omni can take a video and change the style, the angle, the scenery, or even a specific detail in the clip. With Gemini Omni Flash, users can create their own digital likeness through Avatars. However, Google says it is still testing this feature to ensure a responsible launch. All videos generated with Omni will be embedded with the SynthID watermark, allowing them to be verified as AI-generated. Google is rolling out Gemini Omni Flash today to paid Google AI Plus, Pro, and Ultra subscribers within the Gemini app and Google Flow. Later this week, Gemini Omni Flash will also launch in YouTube Shorts and the YouTube Create app at no cost to users.
[15]
Google's Gemini Omni is an all-purpose content generator that wants to replace your entire studio
Google's new Gemini Omni model launched at I/O 2026 promises to put a full AI video studio in your pocket, free for Shorts users, and surprisingly capable for everyone else. Google just walked into the video creation space, flipped the table, and handed everyone a powerful content creation tool, with no former camera or editing experience required. Announced at Google I/O 2026, Gemini Omni is the company's most ambitious AI model yet. It doesn't just generate video from text, but from anything like sketches, voice notes, shaky phone footage, a picture of your dog, and turns it into a polished, coherent video. Recommended Videos Google's own tagline? "Create anything from any input." Bold, and for once, not entirely hollow. So what actually makes Omni different from other AI video generators? Until now, AI video generators felt mostly fragmented. Some excelled at visuals but struggled with audio, while others can't keep characters or environments consistent between edits. That is the gap that Gemini Omni promises to bridge with continuity and conversation. Since the tool allows you to edit or create videos with voice-based inputs sent to Gemini, it always remembers the previous instructions, which, in practice, should keep the characters and story consistent across scenes. It's like having a conversation with your video editor and getting videos edited with much more creative liberty. Omni can also adjust physics-aware details like lighting, motion, and environment, without the entire footage falling apart. It even understands gravity and fluid dynamics. Who actually gets access, and what's the catch? Gemini Omni Flash is rolling out right now. YouTube Shorts users get it completely free, but how it actually works in practice is something that I'm yet to find out. For the Gemini app and Google Flow, you'll need an AI Plus, Pro, or Ultra subscription, starting at $7.99 per month. Enterprise API access arrives in the coming weeks. Every video created via Omni Flash gets SynthID watermarked invisibly. Whether that's enough to stop misuse is a separate, much longer conversation. For now, Google has handed creators a genuinely powerful tool, and I have a feeling that the content landscape is about to get very loud. Google has been playing catch-up in generative video for two years. Veo was capable but clunky, a text-to-video tool in a world that had moved on to full creative pipelines. Gemini Omni is the course correction: a unified model that handles the whole workflow.
[16]
Google Unveils Gemini Omni -- A Next-Gen AI Video Builder That Can 'Simulate the World' - Decrypt
Gemini Omni Flash is launching first through Flow and Flow Music for Google AI subscribers. Google on Tuesday introduced Gemini Omni, a new multimodal AI model that combines the company's Gemini AI models with its media-generation tools, including Veo, Nano Banana, and Genie. The announcement came during Google I/O 2026, where DeepMind CEO Demis Hassabis described Gemini Omni as "our new model that can create anything from any input." "It combines Gemini's intelligence with the best of our generative media models for a new level of world understanding, multimodality, and editing," Hassabis said. Google said the first release, Gemini Omni Flash, will launch through Flow, the company's AI filmmaking platform, and Flow Music, which focuses on AI-assisted music creation. Calling Omni a "step towards artificial general intelligence," Hassabis said Google has spent the past year extending Gemini into "a world model AI that can understand and simulate the world." Google's Omni rollout builds on the popularity of Nano Banana, the company's earlier AI image-editing model that helped push Gemini to the top of Apple's App Store last September. Nano Banana became widely used for meme generation and conversational image editing, briefly helping Gemini overtake ChatGPT in app downloads and Google search interest for the first time since OpenAI's chatbot launched in 2022. In Decrypt's comparison earlier this month, Nano Banana 2 outperformed OpenAI's GPT Image 2 in anime illustration and spatial composition tests, while OpenAI's model performed better with photorealism and text rendering. Google now appears to be extending many of those editing features into video through Gemini Omni. During the presentation, Google demonstrated Omni generating a claymation-style educational video explaining protein folding. The company also showed conversational editing tools that modified a selfie video by adding new visual elements and changing the surrounding environment. Google says Omni can keep the same characters, backgrounds, and movement consistent even after users make changes to a video -- something many AI video models struggle with. The company also says Omni uses Gemini's reasoning abilities to understand broader instructions, so users can describe the kind of scene they want without manually explaining every detail. The company also introduced Flow Agent, an AI assistant integrated into Google Flow that can brainstorm scenes, organize assets, recommend plot changes, and batch-edit projects. Additional updates include Flow Tools, which allows users to create custom editing workflows using natural-language prompts without coding experience. Hassabis said Google is starting with video generation, but plans to expand access to Omni, describing it as the long-term vision behind Gemini's multimodal design. "This was always our goal with Gemini, and why we built it to be multimodal from the very start," he said.
[17]
Gemini 'Omni' Will Generate Media From Any Input, Starting With Video
The model can "create anything from any input," according to the company. There are a flurry of AI-related announcements coming out of Google I/O 2026 today, but perhaps the most impressive is a new multimodal model called Gemini Omni. While it's launching as a video generator to begin with, it'll eventually be able to incorporate images and audio too, on both the input and output side. The idea is you can remix different audio, images, and video into a completely new clip, via a custom prompt. Right now, you can only generate videos from text prompts and images within Gemini, so you're getting the added ability to combine audio clips and existing videos too when generating something new -- multiple sources for input, and then an output that Google promises is better than ever in terms of realism and accuracy. While image and audio generation is on the way, the ability to create videos is coming first, with a model called Gemini Omni Flash. The example Google gives is picking a few styles from images in your phone's gallery, and then applying them to an existing video: So if you wanted to, you could make a video of you in the real world look like a Pixar animation. You can also edit your videos through "conversation," says Google. That conversation aspect will be familiar to anyone who already uses Gemini to make videos: You just explain what it is you want to see, and Omni takes care of it. You can use follow-up prompts to change something specific about the video, like an object or color, or to create your very own reshoots of the scene where the action changes. You can also change the angle or the environment of a video -- transporting yourself from a bedroom to a beach scene, perhaps. Google says you can take multiple turns to refine your videos, while still being able to get back to the original clip. Google says Gemini Omni uses "an intuitive understanding of physics" together with "Gemini's knowledge of history, science, and cultural context" to make videos as realistic and as consistent as possible -- though I'll have to try this out for myself to see if this all works as well as Google says it will. Omni now comes with a better understanding of forces like gravity, kinetic energy, and fluid dynamics, so there should be less AI weirdness on show. As well as building scenes, Google says, Gemini Omni reasons about what should happen next. AI videos can often collapse because they're trying to follow patterns from the vast number of videos in their training data, rather than follow the laws of physics. If a person disappears off-camera, they won't necessarily still be there when the camera pans back. Google claims Gemini Omni will show fewer issues like this. To protect against deepfakes, Google is putting some limits on video creation. For now, you'll only be able to use your own voice and a digital avatar based on you to generate outputs. In addition, all videos will carry Google's invisible SynthID watermark that indicates the content is AI-generated. Gemini Omni Flash is rolling out now in the Gemini app and Google Flow, for Google AI Plus, Pro, and Ultra subscribers. It's also going to be available for free in YouTube Shorts and the YouTube Create app later this week. At the time of writing, there's no word on usage limits. At the moment, those on a Google AI Plus plan ($7.99 a month) can generate two videos a day using the Veo 3.1 Lite model. It remains to be seen how generous Google is with Gemini Omni generations -- it looks like they take up a fair amount of AI processing power.
[18]
Google targets AI agents and video generation with Gemini 3.5 Flash and Omni - SiliconANGLE
Google targets AI agents and video generation with Gemini 3.5 Flash and Omni Google LLC today introduced two new generative artificial intelligence models that push its Gemini family further into AI agents and multimodal creation: Gemini 3.5 Flash, a fast reasoning model designed to power agentic workflows, and Gemini Omni, a creative model that can generate and edit video from nearly any input. Gemini 3.5 is the newest generation of Google's flagship model family, combining frontier intelligence with tool use. This version provides the scaffolding for building reasoning agents and begins with the release of Flash, the smallest and most nimble model in the series, which balances high speed with high performance at low cost. According to Google, Flash 3.5 is designed to outperform Gemini 3.1 Pro on challenging benchmarks such as Terminal-Bench 2.1, GDPval-AA and MCP Atlas. The company added that it also exceeds other frontier models on the market in speed, running four times faster than the fastest in the industry. Flash 3.5's speed and performance enable it to handle the long-horizon tasks required for AI agent work. When coupled with the new update to Antigravity, the company's agentic coding editor, the new large language model becomes a powerful AI engine capable of orchestrating multiple agents that collaborate at scale to solve complex problems. The company also released a new personal assistant named Spark. Google said it built 3.5 Flash to act as the "brain" that can help people navigate their lives and take actions on their behalf. It is rolling out to trusted testers today. The same model has also become the default for the Gemini app and AI Mode in Search globally. Today, Google introduced Gemini Omni, bringing the company's flagship large language model reasoning to the ability to create anything from any input, starting with video. The company said that with Omni, users can combine images, audio, video and text as input, and it will generate videos using Gemini's real-world knowledge to produce high-fidelity output. Users can then use conversation to iterate on and edit those videos. The first model in the new family, Omni Flash, will be available starting today in the Gemini app, Google Flow and YouTube Shorts. Google said that using Gemini Omni Flash, users can start with whatever formats they like to produce wild but lifelike videos. That means they can take an image or a video and insert themselves into it. They could also take a short video and change the style from realistic to cartoon or anime, or make it look as though they were walking through a Renaissance painting. Every conversation with the model layers changes and transformations according to the last request. This allows users to change specific details or broader visual elements. The model also takes into account the physics and consequences of requests, allowing users to change the environment, angle, style and action, as well as add new characters, objects, details and more. The company stressed that it's dedicated to developing AI responsibly and is designing policies to protect users from harm involving the use of its AI tools. In line with this, it's incorporating SynthID, an imperceptible watermark that identifies videos generated by Omni and other AI sources.
[19]
Google launches Gemini Omni for multimodal video creation
Google announced the launch of Gemini Omni, a new model designed to create content from a variety of inputs, with an initial focus on video. The first version, dubbed Gemini Omni Flash, is rolling out today to users of the Gemini app, Google Flow, and YouTube Shorts. According to Google, Gemini Omni is considered "the next step" beyond its previous models, including Nano Banana and the existing video generator, Veo 3.1. The model enables users to combine images, audio, video, and text as input to generate high-quality videos that are grounded in advanced real-world knowledge. Editing capabilities allow users to modify videos through natural conversation, building upon previous instructions for consistency in characters and elements. This contrasts with Veo 3.1, which was restricted to generating video content based solely on prompts and images. Gemini Omni Flash allows users to shoot a video and then request modifications, transforming their initial content into something new. Google stated, "Your video becomes a starting point for something you never could have filmed yourself," indicating that users can alter actions, add characters, and change settings seamlessly. Video: Google The model better understands physical principles such as gravity and kinetic energy to generate more realistic scenes. Gemini Omni integrates knowledge from various domains, including history, science, and cultural context, to enhance storytelling within the generated content. The application can produce visual explainers from simple prompts to simplify complex ideas. However, initial audio features will support only voice references. Gemini Omni also includes functionality to create a digital avatar based on the user's appearance and voice. Google emphasized that it has established "clear policies to protect users from harm" while utilizing its AI tools. Editing features for modifying audio and speech are currently still under testing. All content generated with Gemini Omni will incorporate Google's imperceptible SynthID digital watermark for verification purposes. Users have expressed concerns over the "uncanny valley" effect seen in output quality from Veo 3.1, and it remains to be seen if Gemini Omni's results will alleviate these issues. Gemini Omni Flash is now accessible to Google AI Plus, Pro, and Ultra subscribers globally, with rollouts to users of YouTube Shorts and the YouTube Create App anticipated to begin this week.
[20]
Google unveils Gemini Omni to push AI beyond chatbots into full-scale video creation
Google has launched Gemini Omni, a multimodal AI model focused on cinematic video generation and conversational editing. The platform can create videos using text, images, audio and video prompts with improved realism and context awareness. Google is positioning Omni as its next big push in AI-powered content creation. Google has introduced Gemini Omni, a new family of multimodal AI models aimed at transforming how users create and edit video content, marking the company's latest push to expand artificial intelligence beyond text-based assistants and into full-scale creative production workflows. The first model in the lineup, Gemini Omni Flash, is designed to generate cinematic videos using combinations of text, images, audio and video prompts. Unlike traditional AI video tools that largely rely on isolated prompts, Google says Omni can reason across multiple forms of input simultaneously to produce more coherent and context aware outputs. The launch comes as competition in generative AI rapidly intensifies, with companies racing to build platforms capable of handling increasingly complex creative and enterprise tasks. AI-generated video has emerged as one of the fastest-growing segments within the broader AI ecosystems, attracting interest from creators, marketers, studios and enterprises seeking faster and more scalable production pipelines. A key feature highlighted by Google is Omni's conversational editing capability. Users can modify videos through natural language instructions such as changing environments, adjusting camera movements, adding visual effects or transforming artistic styles while maintaining continuity across scenes. The system also supports interactive editing allowing users to refine outputs across multiple prompts without restarting the workflow. Google says the model demonstrates stronger "world understanding", enabling more realistic rendering of motion, lighting and environmental interactions. The company claims the system better interprets concepts such as gravity, movement and spatial consistency, areas that have traditionally remained challenging for generative video models. Gemini Omni also builds on the momentum created by Google's widely discussed AI image model "Nano Banana" officially known as Gemini Flash Image. The tool gained traction for its conversational image editing capabilities, allowing users to generate and modify visuals using natural language prompts while maintaining character consistency and realism. Industry observers view Gemini Omni as Google's attempt to extend the same intuitive creative workflow from static images into full scale video generation and editing. The platform additionally supports reference-based generation, allowing users to upload sketches, images, existing footage or audio clips that can then be transformed into stylised or photorealistic videos. To address concerns around synthetic media and deepfakes, all the AI-generated videos created through Gemini Omni will include Google's SynthID watermarking technology. Gemini Omni Flash will initially roll out across the Gemini app, Google Flow, YouTube Shorts and YouTube Create, with developer and enterprise API access expected later. As AI competition increasingly moves beyond chatbots and search into creative production, Gemini Omni signals Google's ambition to become a major player in AI-powered media creation. With conversational video editing, multimodal generation and tighter integration across YouTube and Gemini products, the company is positioning AI not just as an assistant, but as a full-scale creative engine for the next phase of digital content creation. Nominate for ET AI Awards Disclaimer Statement: This content is authored by a 3rd party. The views expressed here are that of the respective authors/ entities and do not represent the views of Economic Times (ET). ET does not guarantee, vouch for or endorse any of its contents nor is responsible for them in any manner whatsoever. Please take all steps necessary to ascertain that any information and content provided is correct, updated, and verified. ET hereby disclaims any and all warranties, express or implied, relating to the report and any content therein.
[21]
Forget Seedance: Why Google's Gemini Omni is the Future of AI Video
Google's latest AI video model, Gemini Omni, introduces a new level of creative flexibility for video production. As highlighted by AI Master, this model uses features like multimodal inputs and conversational editing to simplify complex tasks. For instance, users can animate static images or apply natural language commands to make precise edits, such as adjusting lighting or adding slow-motion effects. While its current limitations, like a 10-second video cap, may restrict certain projects, Gemini Omni's ability to simulate realistic physics and create lifelike avatars makes it a compelling option for creators across various fields. Dive into this overview to explore how Gemini Omni can enhance your workflow. Learn how its template library enables quick transformations, from retro-style videos to animated infographics and discover its potential for industries like education and marketing. You'll also gain insight into the ethical considerations addressed by its SynthID watermarking and the subscription tiers available for different user needs. This breakdown offers a clear look at what Gemini Omni can do today and where it might evolve in the future. Gemini Omni is equipped with innovative functionalities that cater to a wide range of creative needs. These features distinguish it from other tools in the market: These features collectively make Gemini Omni a versatile and user-friendly tool for video creation. Gemini Omni includes an extensive library of templates and presets, allowing users to create professional-quality videos quickly and efficiently. These tools are designed to accommodate a variety of styles and purposes: These resources save time while allowing users to maintain creative control, making it easier to produce polished and visually appealing content. Deep dive into the latest in Google Gemini by exploring our other resources and articles. Gemini Omni's versatility makes it a valuable tool for a wide range of industries and use cases. Here are some practical applications: These capabilities make Gemini Omni a powerful asset for content creators, educators, marketers and professionals seeking to elevate their projects. While Gemini Omni offers impressive features, it is important to consider its current limitations: These limitations highlight areas where the technology may evolve, offering potential for future enhancements. Gemini Omni is available through tiered subscription plans, providing flexibility to suit different user needs: For users who prefer to explore the tool before committing, limited functionality is available for free through platforms like YouTube Shorts Remix and YouTube Create. These options offer a glimpse into AI-powered video creation, albeit with fewer features compared to the paid plans. Gemini Omni is designed to cater to a diverse audience, making it an ideal tool for: Its multimodal input capabilities, creative presets and user-friendly interface make it particularly appealing to those who prioritize flexibility and speed in their creative processes. Google's Gemini Omni represents a significant advancement in AI-driven video technology. By combining multimodal inputs, realistic physics simulations and conversational editing, it offers a unique and powerful tool for content creation. While it has limitations, such as short video lengths and the lack of audio editing, its potential applications across industries are vast. Whether you are crafting a social media clip, developing an explainer video, or creating educational content, Gemini Omni provides the tools to bring your vision to life with efficiency and precision. Disclosure: Some of our articles include affiliate links. If you buy something through one of these links, Geeky Gadgets may earn an affiliate commission. Learn about our Disclosure Policy.
[22]
Testing Google's New Video Generation Model | TechPulse
At Google I/O 2026, Google introduced its new Omni AI model for video generation, aimed at making AI-powered video creation more accessible to users. Available for Plus and Pro subscribers, the feature supports image-to-video generation and video-to-video animation conversion directly inside the Gemini app.We tested the new capabilities using the Gemini 3.5 Flash model and tried both image-to-video and video-to-video workflows. The results were impressive overall, delivering smooth animations and strong visual output, with only a few minor hiccups during testing.Watch our hands-on demo to see how Google's latest AI video tools perform in real-world use cases.
[23]
Google Launches Gemini Omni to Transform Text, Images and Audio Into Cinematic AI Videos
Google has unveiled Gemini Omni, a next-generation multimodal artificial intelligence model family designed to revolutionize how users create, edit and interact with video content using natural language. Announced during Google I/O 2026, Gemini Omni represents one of Google's most ambitious steps yet toward building AI systems capable of understanding and generating content across text, audio, images and video simultaneously. The company describes Gemini Omni as a foundational "world model" capable of blending advanced reasoning with cinematic-quality multimedia generation, positioning it as a major leap in the evolution of generative AI and intelligent media production. Gemini Omni Brings Conversational AI to Video Creation Unlike traditional video editing platforms that rely heavily on complex timelines, manual effects and technical workflows, Gemini Omni introduces a conversational approach to media creation. Users can simply type or speak instructions in natural language to generate or modify videos in real time. The AI system is capable of understanding context across multiple prompts while maintaining: * Character consistency * Camera continuity * Environmental details * Scene transitions * Visual style coherence Google says the technology allows users to perform advanced editing tasks that would traditionally require professional production tools and technical expertise. With Gemini Omni, users can: * Replace backgrounds instantly * Insert or remove characters * Modify objects within scenes * Apply cinematic effects and zooms * Change visual styles dynamically * Generate entirely new video sequences from prompts The company demonstrated how videos can evolve through continuous conversational interaction rather than traditional editing interfaces. Multimodal AI Powers the Next Generation of Video Production At the core of Gemini Omni is a fully multimodal AI engine capable of processing text, audio, images and video inputs simultaneously. Unlike earlier AI systems that convert visual or audio content into text before analysis, Gemini Omni directly interprets multiple media formats together, enabling more context-aware generation and editing. Gemini Omni Expands Google's Push Into AI-Generated Media Google confirmed that Gemini Omni Flash, the first implementation of the model family, is being integrated directly into: * The Gemini app * Google Flow * YouTube Shorts The company also stated that while the current focus is on video generation and editing, future versions of Gemini Omni will expand into direct image and audio generation capabilities. This positions Google more aggressively within the rapidly intensifying AI media creation market, where major technology companies are competing to develop next-generation generative video platforms. AI Video Creation Moves Beyond Traditional Editing Software One of the most disruptive aspects of Gemini Omni is its potential to simplify professional-grade video production. Google says the platform is designed to help enterprises, creators and businesses generate high-quality visual content without requiring expensive production infrastructure or advanced editing expertise. Potential enterprise use cases include: * AI-powered e-commerce virtual try-ons * Personalized marketing videos * Automated content localization * Interactive customer experiences * Dynamic advertising campaigns * Streamlined post-production workflows The company believes conversational AI-driven creation could significantly reduce the complexity and time associated with traditional video production pipelines. Sundar Pichai Highlights AI-Powered Creative Transformation Ahead of Google I/O 2026, Sundar Pichai previewed Gemini Omni's capabilities through demonstrations shared online, showcasing how users can transform uploaded footage through simple conversational prompts. Google Accelerates the AI Media Race With Gemini Omni, Google is positioning itself at the forefront of this transition, signaling a future where cinematic content creation may increasingly be driven through natural language conversations rather than conventional software interfaces. The company also hinted that Gemini Omni represents an early step toward larger long-term ambitions involving highly intelligent AI systems capable of understanding and generating rich multimedia experiences at human-like levels of creativity and contextual awareness.
[24]
Google's Gemini Omni could change how we create and edit video entirely
Google has introduced Gemini Omni, a new multimodal AI model designed to transform video creation and editing using advanced generative capabilities. Building on its earlier Nano Banana image tools, Omni extends similar functionality to video, allowing users to generate and edit clips through simple conversational prompts instead of traditional editing software. Google has officially introduced Gemini Omni, a new multimodal AI model focused on video generation and editing. The announcement follows weeks of speculation around the project, with references to "Omni" reportedly appearing for some users inside the Gemini dashboard just two days before launch. Omni builds on the work Google started with Nano Banana last year, which brought Gemini-powered image generation and editing tools to users. While Nano Banana focused primarily on still images, Gemini Omni expands those capabilities into video and broader multimodal generation. The first release in the lineup is Gemini Omni Flash, which is rolling out starting today across the Gemini app, Google Flow and YouTube Shorts. A major part of Omni's pitch is conversational video editing. Instead of using traditional editing tools, users can modify clips using natural language prompts. Google says edits can build on top of each other while maintaining consistency between characters, environments and motion across scenes. The company demonstrated examples where users could alter the action inside existing videos, transform environments or add entirely new visual elements. In one showcased prompt, a person touches a mirror, causing it to ripple like liquid while their arm gradually transforms into reflective mirror material. Google is also positioning Omni as more than just a visual generation model. According to the company, the system has been trained to better understand physical interactions such as gravity, motion and fluid dynamics, allowing generated scenes to behave more realistically. Gemini's broader knowledge model is also being integrated to help generate explainers and context-aware visuals rather than purely aesthetic clips. Omni supports multiple input types, including text, images, video and voice references. Users can combine different references together to guide the style, motion or structure of the final generated clip. Audio support is currently limited to voice references, with broader audio input capabilities expected later. Another feature being introduced is AI-generated Avatars, allowing users to create digital versions of themselves for video generation using their own voice. Google says broader speech editing capabilities are still being tested before a wider rollout. As with its other generative AI products, Google says all videos created with Gemini Omni will include SynthID watermarking for content verification and AI transparency. Gemini Omni Flash is launching first for Google AI Plus, Pro and Ultra subscribers globally through the Gemini app and Google Flow. The company is also bringing the technology to YouTube Shorts and the YouTube Create app at no additional cost starting this week. Google says developer and enterprise API access will arrive in the coming weeks.
Share
Copy Link
Google introduced Gemini Omni at its I/O developer conference, marking a significant step toward multimodal AI that can create videos from text, images, and audio. The AI content generation tool includes digital avatar capabilities and SynthID watermarking to combat deepfakes, but raises concerns about AI slop flooding social media platforms.
Google unveiled Gemini Omni at its I/O developer conference, introducing a multimodal AI model that CEO Sundar Pichai says will be able to "create anything from any input."
1
The launch represents a concrete step toward Google's three-year-old vision of building a single neural network trained on text, image, audio, and video that can generate content in any format. Unlike simple stitching tools, Gemini Omni reasons across all inputs to produce consistent outputs that demonstrate understanding of physics, culture, history, and science.
Source: ET
The timing of Google's announcement is notable, as it arrives after OpenAI discontinued both the Sora app and web experience last month to redirect AI computing power elsewhere.
4
Google is positioning Gemini Omni as more than just an update to its existing Veo video model. Nicole Brichtova, Google DeepMind director of product management, emphasized that "it's the next step towards the progression of combining the intelligence of Gemini with the rendering capabilities of our media models." The tool can generate video from text and images while incorporating advanced physics capabilities that accurately simulate forces like gravity, kinetic energy, and fluid dynamics.3
One of the most intriguing and controversial features allows users to create AI-generated video clips with digital avatars that look and sound like themselves.
4
To prevent deepfakes, users must complete a dedicated onboarding process that involves recording themselves speaking a series of numbers. The avatar then gets stored for future use, enabling creators to generate videos without appearing on camera themselves. Google is framing this as a tool for reimagining personal photos or videos by adding fictional AI elements, which might help sidestep potential legal battles that plagued OpenAI Sora.4

Source: Lifehacker
All videos created with Gemini Omni will include Google's SynthID digital fingerprinting technology, allowing users to verify whether content was generated via Gemini products. Google is also adding Content Credentials verification across its Gemini app to show whether content was created with AI or a camera, and whether it's been edited with AI.
2
This comes as CNET research found that 51% of US adults believe we need better AI labels online, and 94% believe they see AI-generated or altered content on social media.2
Only 44% say they can confidently distinguish real content from AI-generated photos and videos.The first model in the family, Gemini Omni Flash, rolled out to the Gemini app, YouTube Shorts, and AI creative studio Google Flow. Flash can render 10 seconds of video, which Brichtova clarified isn't a model limitation but rather a decision based on getting it into more hands and anticipating that most users won't want much longer videos yet. Longer video durations are planned for the near future. During a media briefing, DeepMind chief technologist Koray Kavukcuoglu demonstrated how Omni could quickly render a claymation explainer video about protein folding from a simple prompt, complete with accurate scientific voice-over.

Source: CNET
Related Stories
While Google is pitching Gemini Omni Flash as primarily a consumer tool for creating personalized content, the enterprise implications are substantial. Google will make Gemini Omni available via API in the coming weeks, enabling developers and enterprise customers to build custom integrations.
5
The model's text-rendering capabilities could prove particularly valuable for advertising, allowing marketers to place products or slogans seamlessly into generated videos. An end-to-end multimodal workflow could transform how advertisers and filmmakers approach content creation.Despite Google's technical achievements, consumer sentiment reveals significant hesitancy toward AI-generated content. CNET found that only 11% of people say AI content is useful, informative, or entertaining, while 21% believe there should be a total ban on AI-generated content on social media.
2
Critics argue that between Nano Banana Pro and Gemini Omni, Google appears to be creating a paradox—the same tech giant providing tools to create AI-generated content is also developing tools to verify it.2
The concern is that Gemini Omni will simply add to the growing volume of AI slop flooding social media feeds.Google considers Gemini Omni a critical step toward building artificial general intelligence and world models that can accurately simulate reality.
4
Pichai explained that "with world models, AI is moving from predicting text to simulating reality. Gemini Omni is the next step in that direction." The long-term vision extends beyond video generation to include generating images from audio or audio from video. Google is also working on an even more powerful Omni Pro model for future release.5
As the technology advances, questions remain about how society will navigate the tension between creative possibilities and concerns about authenticity, privacy, and the proliferation of synthetic media.Summarized by
Navi
[1]
[3]
12 May 2026•Technology

18 May 2026•Technology

11 Jul 2025•Technology

1
Technology

2
Business and Economy

3
Health
