10 Sources
[1]
Google's Gemini Omni turns images, audio, and text into video -- and that's just the start | TechCrunch
When Google launched Gemini three years ago, the goal was to build a multimodal large language model -- a single neural network that was trained on text, image, audio, and video and could generate content in any of those formats. Today, at its Google I/O developer conference, the company took a concrete step toward that goal with Gemini Omni, a new family of multimodal models that Google CEO Sundar Pichai says will be able to "create anything from any input." Omni will start with video. Users can now combine images, audio, video, and text, and rather than simply stitching those inputs together, Omni reasons across all of them to produce a consistent output. The result is high-quality videos that reflect an understanding of physics, culture, history, and science. Omni also lets users edit photos with plain text commands rather than complex editing software, similar to Google's Nano Banana. Google already has a dedicated video model, Veo, that lets users turn text and images into videos, and even direct and customize avatars. But Google DeepMind director of product management Nicole Brichtova says that today's release is more than a Veo update: "It's the next step towards the progression of combining the intelligence of Gemini with the rendering capabilities of our media models." One example that Koray Kavukcuoglu, DeepMind's chief technologist, gave reporters during a media briefing on Monday: When Omni was given a simple prompt like "a claymation explainer of protein folding," it quickly rendered a video of a stop-motion explainer with a voice-over that said, "Proteins start as chains of amino acids. They fold into patterns like the alpha helix and flat sections called beta sheets, forming a perfect three-dimensional shape." The long-term vision for Omni is broader, involving the model being used to do things like generate images from audio, or audio from video. 
"When we first announced Gemini, it was our first AI model to be natively multimodal," Pichai said during the briefing. "We knew that training it on a combination of text, code, audio, images, and video would give it a deeper understanding of the world. With world models, AI is moving from predicting text to simulating reality. Gemini Omni is the next step in that direction." As part of the release, users will also be able to create videos with their own digital avatars -- something OpenAI popularized on its now-defunct Sora app with Cameos. To prevent deepfakes, users will have to go through a dedicated product onboarding, which involves recording themselves and speaking out a series of numbers, per Brichtova. The avatar then gets stored for future use. Additionally, all videos created with Omni will include Google's SynthID digital watermark, which allows users to verify if videos were generated via the Gemini products. The first model in the family is Gemini Omni Flash, which will roll out today to the Gemini app, YouTube Shorts, and AI creative studio Flow. Flash will be capable of rendering 10 seconds of video, which Brichtova says isn't a model limitation, but rather a decision based both on a desire to get it into more hands and an anticipation that most users won't want to make much longer videos yet. Longer video durations are in the pipeline for the near future, though. Google seems to be pitching Omni Flash as more of a consumer tool. The examples Brichtova and Gabe Barth-Maron, a research engineer at DeepMind, gave on a call with TechCrunch of uses for digital avatars were all personal: Making a video of yourself winning an award or going to the moon, or removing a passerby from the background of a video you took on vacation. Barth-Maron put it more simply: "They're like personalized memes." "We definitely did focus on making this easy to use for consumers," Brichtova said. 
"Not many video models have breached that chasm with consumers, so this is our play to do that." The ease of use comes with a caveat: Brichtova and Barth-Maron noted that editing prompts will need to be highly specific, otherwise Omni risks over-editing or unintentionally altering elements the user wanted to keep -- a problem Nano Banana users would have run into. Despite the near-term consumer focus, Omni's enterprise and creative implications are obvious, and Google will make Omni available via API in the coming weeks. The avatar-generating tool -- a capability that is available today on Shorts -- is something Google expects content creators to pick up. But more broadly, an end-to-end multimodal workflow could be transformative for advertisers and filmmakers. Startup Luma AI is building something similar, an agentic tool that can generate an entire ad campaign based on a short brief and a product image, powered by its own "unified" model. "We're actually pretty proud of the model's text-rendering capabilities, which is really useful for things like advertising," Brichtova said. "If you want a product somewhere, or even just a slogan, it needs to be accurate ... We definitely anticipate filmmakers and other kinds of creators are going to be using this model as well." The more professional use cases might be better served by the Omni Pro model, which should perform better across all Omni tasks. Google hasn't said when it will release Pro yet, but Brichtova said that will happen when "we feel like we're at a point where we have a step change above Flash."
[2]
Google Introduces Gemini Omni, a Multimodal AI That Knows the World
Built on the Gemini modeling architecture, Omni is a true multi-modal input and output system, allowing you to create videos from text, images and existing videos. At launch, you'll be able to create videos with the aforementioned inputs, but image and text generations will be available in a future update. With Gemini at its core, Omni can process and interpret multiple types of inputs to produce a consistent, sophisticated final product. Omni builds on Google's existing products by integrating Gemini Intelligence. The rise of AI-created videos comes at a paradoxical time as companies such as Google make incredible advances with the technology, while social media feeds become more filled with AI slop. Google considers Omni the "next big step" toward building AI that can model and simulate the real world. It's a world model with advanced reasoning, capable of generating videos grounded in the world we know today. Omni demonstrates advanced physics capabilities, enabling it to create realistic video outputs. Here's what's coming in Gemini Omni from Google I/O. Alongside its powerful video generation, Omni has advanced video editing capabilities. If you create a video with Omni, you can feed it back into the tool, make impressive changes with just a prompt or incorporate additional media. You can even upload your own videos and change or swap out individual elements, allowing for a new way to edit videos that has essentially never been available before. That ability to fully replace elements in a person's video could lead to some dark outcomes, making Omni's advanced editing abilities as alarming as they are impressive. But Google has built in guardrails. First, any output from Omni will automatically include Google's SynthID watermark, so you know that what you're viewing has been altered in some way by AI. This is a big deal, as Omni essentially lets you change how reality is perceived. People will be able to play with Gemini Omni in a variety of ways.
It's a prominent feature within the newly redesigned Gemini app, where you can add built-in templates to your camera roll with a single click. Additionally, you'll be able to create a custom avatar that looks and sounds like you and add it to videos. For some paid subscribers, Omni will be available on Google Flow and YouTube Shorts, starting on Tuesday. Omni will roll out to developers and enterprise customers via APIs in the coming weeks, allowing for custom integrations. Like most Gemini models, Omni will be split into Flash and Pro versions, though only the former will be available initially. Google is working on an even more powerful model, Omni Pro, which will become available in the future.
[3]
Google's new Omni AI tool will let you video clone yourself - I'm intrigued (and concerned)
Today, Google announced a new AI video capability that will either help creatives produce higher-quality videos more easily, or vastly increase the amount of AI slop on YouTube. I'm betting it'll be a mix of both. Google announced Gemini Omni, a tool that raises the ability to create video via AI to an entirely new level. The company compared this announcement to the level of AI image generation improvement that came about when it released Nano Banana. Nano Banana raised the bar considerably on what was possible with image generation. Omni purports to do the same with video. Omni will be rolling out starting today, but I didn't have a chance to play with it prior to the announcement. Google described Omni as "where Gemini's ability to reason meets the ability to create." Interestingly, according to the company, "With Omni, you can combine images, audio, video and text as input and generate high-quality videos grounded in Gemini's real-world knowledge." Although Omni is "starting with video," Google said the new model can "create anything from any input," so presumably we'll see other media types generated by the tool in due time. Omni will also be available in model tiers, starting now with Gemini Omni Flash. The capability is coming to the Gemini app, Google Flow, and YouTube Shorts. It's not clear whether the web version of Gemini will support Omni, or whether you'll need to use the Flow interface via your browser. There are some standout features that make this a very interesting offering. I honestly can't decide if this is going to be a standout feature, a very big concern for privacy, or an untethered slop generator. The company said you can create videos "with your own voice by using Avatars, which create a digital version of yourself so you can generate videos that look and sound like you."
As a regular producer of YouTube videos for my channel, I'm intrigued. There have been times when I wanted to put out a video, but was having a bad hair day, a bad voice day, or a bad attitude day, and I just didn't want that to come across in video. Could I just feed a script into my digital twin avatar and have RoboDave do the talking? Would my audience notice? Would they care? Would they hate it? Would I? Clearly that's an area worthy of experimentation, but it's probably not something I'll use often. I do my YouTube channel, in part, to keep my speaking and presentation chops up. Foisting that work on a digital avatar might reduce my workload, but it would also reduce my training and practice. Google is very careful to say that it's incorporating its SynthID digital fingerprinting technology in these videos, so they can be verified as having been produced with Omni. Google also said, "Beyond the avatar feature, in terms of editing videos to change audio and speech, we are still working to test this and better understand how we can bring this capability to users responsibly." Some of you may remember the early days of video games, when characters behaved more like ragdolls than objects in the physical world. As games got better, they began to incorporate physics models, so if something got shot, knocked back, or dropped, it did so in a manner consistent with the physics of the object. Omni now incorporates physics into the videos it creates. Google said it has "an improved intuitive understanding of forces like gravity, kinetic energy, and fluid dynamics." It also uses Gemini's knowledge to "connect language, imagery, and meaning in ways that go far beyond pattern matching." The company said Omni can build detailed videos from short prompts and can generate videos for things like explainers that break down fairly complex ideas. I don't doubt this.
The ability of NotebookLM's audio overview and video overview features to create explainers is astonishing. If some of that technology found its way into Omni, things could get interesting quickly. I actually fed marketing documents and spec sheets into NotebookLM and it produced explainer videos for various features of my security product that were better than anything I could have done by hand, especially in the time it took. The visuals at the time weren't great, but having complex features explained in a clean video in under 30 minutes was a force-multiplier for my product release schedule. One of Nano Banana's early standout features was its ability to recontextualize an image. For example, I had it take a picture of me walking in a park and change it so I was wearing something close to an admiral's uniform on the bridge of an aircraft carrier. While it didn't get the uniform fruit salad and brass quite right, it did manage to accurately reproduce my body and face. Omni proposes to take that to video, turning image, text, video, or audio into a "cohesive output." Right now, the only audio it will accept is voice recordings, but the company said it'll "roll out other types of audio inputs soon." The company also said you can create scenes, match styles, describe what you want in natural language, and get character consistency throughout the video. One aspect of producing videos I do not enjoy is the editing process. It's often enormously tedious. But, according to Google, "Gemini Omni gives you an easier way to edit video - with natural language. Every instruction builds on the last. Your characters stay consistent, the physics hold up and the scene remembers what came before." Google also said you can change elements in the video.
I can see a huge benefit if it's possible to import a video and have the editor remove obstructions or change objects and backgrounds. It's not clear how long a clip can be, or exactly how much editing you can do with Omni on a given plan, but those possibilities are exciting. Two other transformations the company said the new Omni can do are: Additionally, Google hasn't yet specified video format or resolution. Will this be a professional tool that can handle 16:9 videos in 4K or 8K resolution, or is it meant to be a tool for the YouTube Shorts generation? When OpenAI introduced Sora, it was a novelty. While users abused it (we gave Sam Altman blue hair and made him sing ZDNET's praise), it never managed to be a tool that helped a professional's workflow. While producing AI avatar clones and replacing objects might be fun, I'm hoping this capability is extended so that it's usable either inside Final Cut, Premiere Pro, and DaVinci Resolve, or at least integrated enough that those tools can use edits created by Omni. It's possible. Omni's features will be rolling out to enterprise customers and developers via a Google API. I'm also curious if Omni will embed the little diamond watermark in the corner of its videos, like it does with Nano Banana's generated images. While it's nice to know a clip was generated by AI, such watermarking gets in the way of using the AI as a professional tool. Will we see licensing tiers where the watermark can be removed? Or will we see third-party tools crop up that remove the watermark, whether Google wants you to or not? Time will tell. Would you use Google Omni to create a digital avatar of yourself for videos you didn't want to record in person? Let us know in the comments below.
[4]
Google's Gemini Omni can generate 'anything from any input,' starting with video - Engadget
Google didn't forget AI creators in its latest round of Gemini announcements as part of Google I/O. The company just officially revealed Gemini Omni, a new model that can "create anything from any input -- starting with video," according to Google. The first model, called Gemini Omni Flash, is rolling out today to the Gemini app, Google Flow and YouTube Shorts. Google called Gemini Omni "the next step" up from Nano Banana and, presumably, its current video generator, Veo 3.1. It lets you "combine images, audio, video and text as input and generate high-quality videos grounded in Gemini's real-world knowledge," according to the tech giant. You can then edit those videos through natural conversation, with each instruction building on the last to keep characters and other elements consistent. Where Veo 3.1 was limited to video creations via prompts and images, Gemini Omni will accept a wider range of inputs and do a lot more. For instance, you can shoot a video, then just ask Omni to change what's happening. "Your video becomes a starting point for something you never could have filmed yourself," Google explained. "Edit the action, add in new characters or objects, or transform a moment into something unexpected. Change the environment, angle, style or even specific details." Omni also better understands physical forces like gravity, kinetic energy and fluid dynamics, so that scenes will be more realistic. It marries that with "Gemini's knowledge of history, science and cultural context, bridging the gap from photorealism to meaningful storytelling." The app can supposedly create compelling explainers from short prompts to generate visuals that break down more complex ideas. However, it will only support voice references for audio output to start. If you want to generate videos where you're the star, Omni lets you use your own voice to create a digital avatar that looks and sounds like you.
If that sounds like a potential privacy nightmare, Google says it has "clear policies to protect users from harm and governing the use of our AI tools." As far as editing videos to change audio and speech, the company is still testing that function in order to bring it to users "responsibly." All videos will also use Google's imperceptible SynthID digital watermark to verify that videos were generated with Gemini Omni. All of that sounds great, but the main problem with Veo 3.1 and other video generator apps is that the video has an "uncanny valley" look, and is often hated by end users. To that end, it'll be interesting to see if the output quality matches Google's breathless claims. We'll find out soon, as Gemini Omni Flash is now available to all Google AI Plus, Pro and Ultra subscribers globally and rolling out to users of YouTube Shorts and the YouTube Create App starting this week.
[5]
Google's newest Gemini Omni model can turn real videos into surreal fever dreams
You can also use it for free to create Remixes using YouTube Shorts. Video generation has been one of the most compelling creative uses of AI. Among the platforms that have helped fuel the phenomenon is Google's Veo, especially Gen 3, which has proven incredibly powerful at creating entire scenes with consistent elements and nearly perfect lip-syncs. While Veo 3 (and newer 3.1) has been limited to creating purely AI-generated videos with text and audio, Google is introducing a new model at Google I/O 2026 that goes a step further by letting you modify real-life footage into spectacular clips.
[6]
Gemini Omni, the 'create anything' model, starts today with lifelike video
Google has unveiled Gemini Omni, a new family of generative models designed to "create anything," and you can use it today to create surprisingly realistic videos. Something Google has been working on in recent years is a "world model" that can maintain a cohesive, grounded world. The company explored the idea through its Genie model, which generates interactive video-game-esque experiences based on user prompts. Google has also long offered the Veo and Nano Banana models that bring capable video and image creation/editing via text and image inputs. As part of I/O 2026, Google revealed Gemini Omni, a new model which leverages a similar level of multimodal understanding grounded in reality. While Omni is currently only designed to generate video content, it is presented as being designed to "create anything from any input." This means bringing together text, images, video, and audio (initially limited to speech samples) to create a unified output video. After generation, you can further refine your video in subsequent turns. Google's initial demos for Omni are quite impressive, showing how Gemini understands each of the elements in the final video. The rolling marble video is a great example, with believable physics for the ball and convincing sound effects for each bounce and the bell ring. Another demo presents a claymation-style video explainer of how protein folding works. Unlike the Genie model, which is still exclusively available to those paying for an AI Ultra subscription, Google is positioning the Gemini Omni series to be broadly accessible. The first model in the series, Gemini Omni Flash, is available now to all subscribers of AI Plus and higher. Or if you want to share your creations with the world, Gemini Omni will be available for free through YouTube Shorts and YouTube Create later this week. A higher-level model, "Omni Pro," was also teased, with details coming soon. 
Given the significant sense of realism presented, the company is taking several measures to ensure videos are generated responsibly. Taking a cue from OpenAI's recently discontinued Sora app, Gemini Omni will allow you to create a bespoke "Avatar" of yourself to be featured in the videos you create. Otherwise, Omni will not initially be able to edit audio and speech in videos until Google can "bring this capability to users responsibly." As another safety measure, all videos created by Gemini Omni will be watermarked with SynthID to be readily identified as AI generated.
[7]
Google unveils Gemini Omni 'any-to-any' AI model: what enterprises should know
Although it was already discovered by intrepid AI power users weeks ahead of the official unveiling today at Google's annual I/O developer conference, the company's new Gemini Omni model marks a significantly new paradigm in the wider AI and tech marketplace. That's because as its "omni" (from the Latin omne -- meaning "all") prefix would suggest, this is Google's first truly native multimodal model: that is, "a model that can create anything from any input -- starting with video." The model marks Google's bid to collapse the multimodal generative stack -- text-to-image, image-to-video, video-to-video, audio generation -- into a single foundation model with a single editing surface. The big question for business leaders is: should you switch any of your own AI stack over to Gemini Omni now? Unfortunately, the truth is, you may not be able to just yet -- the model is only available to individual users through Google's AI subscription plans starting with the $20 per user per month "AI Plus" plan. While the company says it is ultimately going to be available via an application programming interface (API) -- which many enterprises rely on for their AI needs -- it's not ready yet. But, given the capabilities and faster editing enabled by the new Omni model, individual members of your team should probably give serious consideration to switching over to it, especially if they work creating visuals for technical diagrams, marketing and comms materials, training and corporate education courses, sales collateral, and basically anything that involves visuals.
What Omni actually is
Omni is the next chapter of the work that produced Nano Banana, the image-generation and editing model Google shipped roughly a year ago. The first model in the family, Gemini Omni Flash, accepts any combination of text, images, audio, and video as input and produces high-quality output across the same modalities -- all from a single model rather than a relay of specialized systems.
Google says the model is "natively multimodal from the ground up," which matters less as marketing copy than as an architectural claim: a unified model can reason across modalities in the same forward pass, which generally translates into more coherent edits, fewer pipeline artifacts, and a far cleaner API surface for developers. OpenAI started this trend back in May 2024 with the release of GPT-4o, its first natively "omni" model, also trained from the ground up to be able to analyze and generate multiple different types of content, from text to code, imagery, and audio. However, it did not support video generation, and the model was eventually deprecated following reports of sycophancy and even users demanding OpenAI retain it after developing parasocial relationships with it. Is Gemini Omni at risk of sparking a similarly devoted following? It remains to be seen. One big difference is that its headline interaction pattern is conversational video editing. Each instruction "builds on the last," and past directions persist across turns so the video evolves coherently as the user iterates. Practical examples Google highlighted include changing the world inside a clip, reimagining an action or camera angle, refining sequences over multiple turns, and generating explainer-style content from short prompts. Google also emphasizes improved physics -- gravity, kinetic energy, fluid dynamics -- which is the kind of detail that separates "looks like AI video" from "looks like footage."
Rollout, pricing, and the API question
The first thing enterprise leaders should read carefully is the rollout plan. Omni Flash is going live today inside the Gemini app for U.S. subscribers across AI Plus, AI Pro, and AI Ultra tiers -- including the new $100-per-month AI Ultra plan Google announced at the same event. Google says it will roll out to developers via Vertex AI APIs "in the coming weeks." That gap is significant.
Until the Vertex API is generally available, Omni is effectively a consumer and prosumer tool. Enterprise pilots beyond individual seat-based experimentation should wait for the API, both because that's where Google's enterprise SLAs and data-handling commitments live, and because production-grade generative video without a programmatic interface is a non-starter. Its pricing through the API per million tokens (presumably) will also determine its viability as an enterprise product outside of film/TV/entertainment and the arts productions. For decision-makers weighing seat economics in the meantime, the new AI Ultra tier is positioned specifically at developers, technical leads, knowledge workers, and advanced creators, with priority access to Google Antigravity, higher usage limits, and bundled Omni Flash access. For small creative teams under tight deadlines, that may be the fastest way to evaluate the model before the API arrives.
The enterprise use cases that really matter
It is easy to default to "marketing video" as the use case, but Omni's value proposition for enterprises is broader if you think of it as a programmable video and media engine rather than a creative app:
* Sales and marketing: rapid generation of variant ads, localized creative, and product demos without per-asset agency cycles.
* Internal communications, learning and development (L&D): explainer videos, onboarding modules, and policy walkthroughs produced by non-specialists.
* Customer support and documentation: dynamic, query-conditioned visual explainers attached to help articles.
* Product and engineering: visualization of simulations, UI walkthroughs, and concept videos for spec reviews.
* Field operations: short, situation-specific instructional clips generated on demand.
What changes with Omni versus the previous generation of tools is the unification.
Many enterprises stitched a workflow together from text-to-image, image-to-video, lip-sync, and voice models, each with its own contract, billing, and data path. A single Vertex AI-backed model collapses procurement and observability into one place -- assuming the eventual API delivers production-grade throughput and latency.
The governance story is the most underrated part
For CIOs and CISOs, the most important section of Google's announcement is not the model card; it is the provenance and content-safety work shipping alongside it. Every video generated by Omni carries Google's SynthID digital watermark. Google is expanding C2PA Content Credentials across its generative tools, and launching an AI Content Detection API on Agent Platform that lets businesses identify AI-generated content from both Google and other popular models. Partner integrations announced at the same event -- including Shutterstock, Avid (in Pro Tools), and at least one major newswire -- indicate where the standard is going. For enterprises, this matters in three concrete ways: There is also a "Personal Avatars" program that lets creators record short videos to authorize use of their voice and likeness across generated content, as Google leaders and employees showcased themselves today in posts centered around I/O featuring their AI generated likenesses. This puts it in direct competition with Synthesia, a UK-based AI unicorn focused primarily on enterprise-safe AI videos and avatars. For enterprises considering executive videos, training avatars, or branded spokesperson content, the consent model here is the right starting point -- but contracts and rights-management policies will need to extend to cover it.
Risks worth flagging
Omni's main risks are familiar but worth restating.
The competitive landscape is crowded: the aforementioned Synthesia, TikTok parent company ByteDance's acclaimed Seedance model, Kuaishou Technology's Kling AI models, and the fast-improving open-source field all compete for the same workflows. Lock-in to any single video model is a real concern when output quality is still leapfrogging quarterly. Latency and cost for production-volume video generation remain unproven outside controlled demos. In addition, the legal status of training data for generative video is unsettled in multiple jurisdictions; enterprises should require clear indemnification language before deploying generated video into customer-facing channels. Furthermore, VentureBeat collaborator and AI YouTuber Sam Witteveen, CEO of enterprise machine learning vendor Red Dragon AI, received early access to Gemini Omni and reported the content restrictions (which some deem to be censorship) to be quite strict, potentially inhibiting many of the use cases an enterprise would like to pursue.
Thoughts for enterprises considering adoption
Omni is worth piloting -- but the structure of the pilot matters. For most enterprises, the right move over the next 30 to 60 days is to fund a small, sanctioned experiment with one or two AI Ultra seats in marketing or L&D, while the platform and security teams use that runway to prepare for the Vertex AI API: define data-residency requirements, set up SynthID and C2PA verification in the content pipeline, and stand up the AI Content Detection API alongside existing media-governance tooling. Treat the consumer rollout as a UX preview, not a production plan. When the API arrives, the enterprises that have already done the governance work will be the ones moving Omni into real workflows while everyone else is still drafting policy. Omni is not, by itself, a reason to overhaul an enterprise AI strategy.
But it is a strong signal that the multimodal generative stack is consolidating into single models with first-party provenance baked in -- and that is a shift technical decision-makers should be planning around now.
[8]
Google debuts new Omni world model at Google I/O with advanced AI video capabilities
Google just unveiled a brand new AI world model at Google I/O 2026 called Gemini Omni. While Google calls Gemini Omni a "new model that can create anything from any input," its showcase focused on the AI model's video-generation capabilities. The first release within the Omni AI model family is called Gemini Omni Flash. So, what's so special about Gemini Omni Flash compared to other AI video tools and Google's Genie world model? During a presentation at Google I/O, Google DeepMind CEO Demis Hassabis described Omni as a crucial step toward AGI. Hassabis said that in the future, Omni would be able to output "anything" the user wanted. Unlike text-to-video tools like Google Veo, Gemini Omni is multi-modal in both input and output. That means users can input text, audio, images, and video into the model, and Omni will generate a unique, interactive world utilizing Gemini's "real-world knowledge." Google says Omni will be able to generate videos with more accurate physics and, in turn, create more realistic-looking content. Gemini will also be able to understand the context of a prompt, such as a historical fact, to generate more accurate video content. In addition to video generation, Google says users can also edit videos through conversation with the Omni model. Google showcased some samples of what editing through Omni can do at I/O. For example, users can take a video they shot or an AI-generated video and edit specific aspects of the clip. If a user likes a shot but wants to change the background, they can do so with Omni. Omni can take a video and change the style, the angle, the scenery, or even a specific detail in the clip. With Gemini Omni Flash, users can create their own digital likeness through Avatars. However, Google says it is still testing this feature to ensure a responsible launch. All videos generated with Omni will be embedded with the SynthID watermark, allowing them to be verified as AI-generated.
Google is rolling out Gemini Omni Flash today to paid Google AI Plus, Pro, and Ultra subscribers within the Gemini app and Google Flow. Later this week, Gemini Omni Flash will also launch in YouTube Shorts and the YouTube Create app at no cost to users.
[9]
Gemini 'Omni' Will Generate Media From Any Input, Starting With Video
The model can "create anything from any input," according to the company. There are a flurry of AI-related announcements coming out of Google I/O 2026 today, but perhaps the most impressive is a new multimodal model called Gemini Omni. While it's launching as a video generator to begin with, it'll eventually be able to incorporate images and audio too, on both the input and output side. The idea is you can remix different audio, images, and video into a completely new clip, via a custom prompt. Right now, you can only generate videos from text prompts and images within Gemini, so you're getting the added ability to combine audio clips and existing videos too when generating something new -- multiple sources for input, and then an output that Google promises is better than ever in terms of realism and accuracy. While image and audio generation is on the way, the ability to create videos is coming first, with a model called Gemini Omni Flash. The example Google gives is picking a few styles from images in your phone's gallery, and then applying them to an existing video: So if you wanted to, you could make a video of you in the real world look like a Pixar animation. You can also edit your videos through "conversation," says Google. That conversation aspect will be familiar to anyone who already uses Gemini to make videos: You just explain what it is you want to see, and Omni takes care of it. You can use follow-up prompts to change something specific about the video, like an object or color, or to create your very own reshoots of the scene where the action changes. You can also change the angle or the environment of a video -- transporting yourself from a bedroom to a beach scene, perhaps. Google says you can take multiple turns to refine your videos, while still being able to get back to the original clip. 
Google says Gemini Omni uses "an intuitive understanding of physics" together with "Gemini's knowledge of history, science, and cultural context" to make videos as realistic and as consistent as possible -- though I'll have to try this out for myself to see if this all works as well as Google says it will. Omni now comes with a better understanding of forces like gravity, kinetic energy, and fluid dynamics, so there should be less AI weirdness on show. As well as building scenes, Google says, Gemini Omni reasons about what should happen next. AI videos can often collapse because they're trying to follow patterns from the vast number of videos in their training data, rather than follow the laws of physics. If a person disappears off-camera, they won't necessarily still be there when the camera pans back. Google claims Gemini Omni will show fewer issues like this. To protect against deepfakes, Google is putting some limits on video creation. For now, you'll only be able to use your own voice and a digital avatar based on you to generate outputs. In addition, all videos will carry Google's invisible SynthID watermark that indicates the content is AI-generated. Gemini Omni Flash is rolling out now in the Gemini app and Google Flow, for Google AI Plus, Pro, and Ultra subscribers. It's also going to be available for free in YouTube Shorts and the YouTube Create app later this week. At the time of writing, there's no word on usage limits. At the moment, those on a Google AI Plus plan ($7.99 a month) can generate two videos a day using the Veo 3.1 Lite model. It remains to be seen how generous Google is with Gemini Omni generations -- it looks like they take up a fair amount of AI processing power.
[10]
Google targets AI agents and video generation with Gemini 3.5 Flash and Omni - SiliconANGLE
Google LLC today introduced two new generative artificial intelligence models that push its Gemini family further into AI agents and multimodal creation: Gemini 3.5 Flash, a fast reasoning model designed to power agentic workflows, and Gemini Omni, a creative model that can generate and edit video from nearly any input. Gemini 3.5 is the newest generation of Google's flagship model family, combining frontier intelligence with tool use. This version provides the scaffolding for building reasoning agents and begins with the release of Flash, the smallest and most nimble model in the series, which balances high speed with high performance at low cost. According to Google, Flash 3.5 is designed to outperform Gemini 3.1 Pro on challenging benchmarks such as Terminal-Bench 2.1, GDPval-AA and MCP Atlas. The company added that it also exceeds other frontier models on the market in speed, running four times faster than the fastest in the industry. Flash 3.5's speed and performance enable it to handle the long-horizon tasks required for AI agent work. When coupled with the new update to Antigravity, the company's agentic coding editor, the new large language model becomes a powerful AI engine capable of orchestrating multiple agents that collaborate at scale to solve complex problems. The company also released a new personal assistant named Spark. Google said it built 3.5 Flash to act as the "brain" that can help people navigate their lives and take actions on their behalf. It is rolling out to trusted testers today. The same model has also become the default for the Gemini app and AI Mode in Search globally. Today, Google introduced Gemini Omni, bringing the company's flagship large language model reasoning to the ability to create anything from any input, starting with video.
The company said that with Omni, users can combine images, audio, video and text as input, and it will generate videos using Gemini's real-world knowledge to produce high-fidelity output. Users can then use conversation to iterate on and edit those videos. The first model in the new family, Omni Flash, will be available starting today in the Gemini app, Google Flow and YouTube Shorts. Google said that using Gemini Omni Flash, users can start with whatever formats they like to produce wild but lifelike videos. That means they can take an image or a video and insert themselves into it. They could also take a short video and change the style from realistic to cartoon or anime, or make it look as though they were walking through a Renaissance painting. Every conversation with the model layers changes and transformations according to the last request. This allows users to change specific details or broader visual elements. The model also takes into account the physics and consequences of requests, allowing users to change the environment, angle, style and action, as well as add new characters, objects, details and more. The company stressed that it's dedicated to developing AI responsibly and is designing policies to protect users from harm involving the use of its AI tools. In line with this, it's incorporating SynthID, an imperceptible watermark that identifies videos generated by Omni and other AI sources.
Google unveiled Gemini Omni at I/O 2026, a multimodal AI model that can generate video from any combination of images, audio, video, and text. The AI video tool lets users create digital avatars that look and sound like them, edit videos through natural conversation, and modify real footage. All outputs include SynthID watermarks to combat deepfakes.
At Google I/O 2026, the tech giant took a concrete step toward its vision of true multimodal AI with the launch of Google Gemini Omni, a new family of models that CEO Sundar Pichai says can "create anything from any input." [1] The announcement marks three years since Google first introduced Gemini with the goal of building a single neural network trained on text, image, audio, and video that could generate content in any format. Starting with video generation, Omni represents what Google calls "the next step" toward building AI that can model and simulate the real world. [2]

Unlike Google's existing video model Veo, which lets users turn text and images into videos, Google Gemini Omni reasons across multiple input types simultaneously rather than simply stitching them together. Users can combine images, audio, video, and text to produce consistent, high-quality outputs. [1] Nicole Brichtova, DeepMind's director of product management, emphasized that this release is more than a Veo update: "It's the next step towards the progression of combining the intelligence of Gemini with the rendering capabilities of our media models." [1]
The multimodal AI models demonstrate an improved understanding of physics, incorporating forces like gravity, kinetic energy, and fluid dynamics to create realistic video outputs. [4] Gemini Omni also leverages knowledge of history, science, and cultural context to bridge the gap from photorealism to meaningful storytelling. When given a simple prompt like "a claymation explainer of protein folding," the AI video tool quickly rendered a stop-motion video with a voice-over explaining how proteins fold into alpha helices and beta sheets. [1]

Users can generate video from text, generate video from images, or upload their own footage and transform it entirely. "Your video becomes a starting point for something you never could have filmed yourself," Google explained. [4] The ability to modify real-life footage into AI-generated content represents a significant leap beyond previous capabilities. [5]
One of the standout features allows users to edit videos through natural conversation, with each instruction building on the last to keep characters and elements consistent. [4] Users can change environments, angles, styles, or specific details with plain text commands rather than complex editing software. The multimodal AI models also enable users to create a digital avatar that looks and sounds like them by recording themselves speaking a series of numbers during onboarding. [1]
The video cloning capability has sparked both intrigue and concern. Content creators could use digital avatars for personalized videos or when they're having "a bad hair day, a bad voice day, or a bad attitude day," as one observer noted. [3] However, Brichtova and research engineer Gabe Barth-Maron positioned the feature more as a consumer tool for creating "personalized memes" than as a means of producing professional content. [1]
To combat potential misuse and deepfakes, Google has implemented several safeguards. All videos created with Omni will include Google's SynthID digital watermark, an imperceptible fingerprint that allows users to verify whether videos were generated via Gemini products. [1][4] The watermarks are particularly important given Omni's ability to fully replace elements in videos, which could lead to concerning outcomes. [2]
Google stated it has "clear policies to protect users from harm and governing the use of our AI tools." [4] For editing videos to change audio and speech, the company is still testing the capability to bring it to users "responsibly." [3]
The first model in the family, Gemini Omni Flash, is rolling out today to the Gemini app, YouTube Shorts, and AI creative studio Google Flow. [1] Flash can render 10 seconds of video, a limitation based on user expectations rather than technical constraints, with longer durations planned for the near future. [1] The model is available to all Google AI Plus, Pro, and Ultra subscribers globally. [4]

While Google is positioning Omni Flash as a consumer tool, the enterprise and creative implications are clear. The company will make Omni available via API in the coming weeks for developers and enterprise customers. [2] Brichtova highlighted the model's text-rendering capabilities as particularly useful for advertising applications. [1] An end-to-end multimodal workflow could transform how advertisers and filmmakers work, though the technology's reception will depend on whether it can overcome the "uncanny valley" look that has plagued AI-driven video generation. [4]
The long-term vision extends beyond video to generating images from audio or audio from video, positioning Gemini Omni as a step toward AI that can simulate reality rather than just predict text. [1] A more powerful Gemini Omni Pro version is in development for future release. [2]
Summarized by Navi