3 Sources
3 Sources
[1]
Microsoft's MAI-Image-2 enters the top three AI image generators
The second version of Microsoft's in-house image model lands at #3 on Arena.ai's leaderboard, behind only Google and OpenAI, and begins rolling out across Copilot and Bing Image Creator today. A year ago, Microsoft was generating images for Bing and Copilot almost entirely with OpenAI's models. On Thursday, the company's in-house team announced MAI-Image-2, a second-generation image model that has debuted at number three on the Arena.ai text-to-image leaderboard, placing Microsoft's own technology directly behind Google's Gemini 3.1 Flash and OpenAI's GPT Image 1.5. The announcement comes from the Microsoft AI Superintelligence team, the internal research group that Mustafa Suleyman formed in November 2025 and now leads full-time following a leadership reorganisation at Microsoft announced just two days ago. Mustafa Suleyman stepped back from his broader CEO role at Microsoft AI on Monday to focus exclusively on that team and its frontier model ambitions. MAI-Image-2 is the first model to arrive publicly since that shift. MAI-Image-1, the predecessor, launched in October 2025 and debuted in the top ten on LMArena, the same crowd-sourced preference leaderboard, then known by a slightly different name. At the time, it was Microsoft's first image generation model developed entirely in-house, and the company integrated it into Bing Image Creator and Copilot alongside DALL-E 3 and GPT-4o. MAI-Image-2 extends that trajectory: built with input from photographers, designers, and visual storytellers, and focused on three areas where creatives said the gap was largest. The first is photorealism, natural light, accurate skin tones, environments with physical texture and wear. Microsoft says the model is designed to reduce the post-production work that currently sits between generation and usable output. The second is in-image text: MAI-Image-2 is built to handle readable lettering within scenes, from signage to infographics to typographic layouts, a category where many image models still struggle to produce consistent, accurate characters. The third is detailed scene generation: dense compositions, surreal concepts, cinematic framing, and the kind of imaginative work where precise prompting and high fidelity matter most. Access is rolling out through multiple channels. The MAI Playground, Microsoft's public model testing environment at playground.microsoft.ai, has the model available now. MAI-Image-2 is also beginning to roll out across Copilot and Bing Image Creator. Enterprise customers can access the model via API today, and Microsoft says API access will open to any developer through Microsoft Foundry "soon", though no specific date has been given for that broader availability. A commercial application form is available for organisations interested in large-scale image generation use. The announcement also notes that the team's next-generation GB200 compute cluster is now operational, a reference to NVIDIA's Blackwell-architecture hardware. No details were provided on cluster scale. The infrastructure claim appears to be positioning context for the models the superintelligence team plans to release next, rather than a technically verifiable specification. The pace is notable. Microsoft announced its first in-house voice model (MAI-Voice-1) and its first text model preview (MAI-1-preview) in August 2025. MAI-Image-1 followed in October. Now, five months later, the second image generation model is placing in the top three on the most widely cited crowd-sourced image leaderboard in the field. That cadence suggests the superintelligence team is moving at a different pace from Microsoft's historically slower consumer product cycles, and doing so with hardware and infrastructure it increasingly owns rather than rents from OpenAI.
[2]
Microsoft Launches MAI-Image-2 Text-to-Image Model -- And It's Better Than Expected - Decrypt
Strict filters, usage caps, and missing features currently limit real-world usefulness, however. Microsoft has been quietly building its own image generator. Announced Thursday by the company's AI Superintelligence team, MAI-Image-2 has already landed at #3 on the Arena.ai leaderboard -- behind only the models from Google and OpenAI -- making Microsoft a legitimate player in a space it had previously outsourced to its partners. That last part is worth sitting with. Microsoft has been paying OpenAI billions to power Copilot and Bing Image Creator. Building a competing image model in-house is an interesting business move. MAI-Image-2 is available now in the MAI Playground, with a gradual rollout to Copilot and Bing Image Creator underway. API access is currently limited to select enterprise customers, with broader availability on Microsoft Foundry coming soon. The team says it built the model by talking directly to photographers, designers, and visual storytellers. Three things came out of those conversations: improved photorealism, more reliable in-image text generation, and stronger capacity for detailed, imaginative scene construction. Whether or not that process translated into a genuinely useful tool is a different question. The first thing you notice when you open the MAI Playground is how understated it is. The interface is minimal and clean, visually somewhere between Claude and Hume, with none of the maximalist dashboard energy you get from Midjourney or the chatbot experience you get from Gemini. The images themselves are genuinely pretty strong. Photorealism is a real strength here -- the model has a solid grasp of natural light, surface texture, and spatial relationships. It doesn't quite hit the level of Google's Nano Banana Pro, which still rules the leaderboard for a reason, but in some realism tests it comes surprisingly close. Better prompting likely pushes it further; our initial results improved noticeably as we dialed in our descriptions. Even complex, unrealistic scenes with parameters that defied logic were properly handled by the model, beating other models in details like the body proportions, limb position, depth, and spatial positioning. For example, this image of a dog riding a bike in the middle of the ocean is arguably the most accurate one we've produced in zero-shot tests. Text generation is a legitimate highlight. MAI-Image-2 handled complex typography with far more consistency than we expected -- large blocks of text in images, posters, signage -- without the typical garbling you see from most models. We even pushed it toward multilingual text: It managed to generate some hanzi Chinese characters, though the accuracy wasn't perfect. Still, the fact that it tried and got partway there is notable. The model understands artistic style well, shifting between photographic realism, graphic design aesthetics, and illustrated styles without much friction. It reads prompts carefully, including stylistic instructions, and delivers something coherent on the other end. For a broad range of visual tasks, it's versatile. Now for the harder truths. MAI-Image-2 is aggressively filtered -- more so than Google Imagen, and more so than OpenAI's DALL-E. We ran our usual test of a cartoon drawing of a spider chasing a woman, and got a flat refusal. Again, that's a drawing -- of a spider. The content moderation here is tuned to a level that will frustrate anyone doing creative work in gray areas, horror illustration, or anything that reads as remotely tense. The usage limits are equally restrictive. Each generation triggers a 30-second cooldown. After 15 images, you're locked out for 24 hours. For casual experimentation, that's manageable. For any kind of production workflow, it's a dealbreaker in the native UI. There's also only one resolution: 1:1. No landscape, no portrait, no custom ratios. In 2026, that's a significant limitation -- particularly for social media content, which is precisely where Microsoft presumably wants this embedded in Copilot. And speaking of Copilot: MAI-Image-2 isn't there yet. The rollout is happening, but as of today, the product you'd actually want it in doesn't have it. One more missing piece: This is purely a text-to-image tool. No image-to-image, no inpainting, no outpainting, no reference image support. For users expecting anything close to Firefly or Midjourney's editing capabilities, this will feel half-finished. MAI-Image-2 performs better than its leaderboard ranking suggests. In our hands-on tests, it beat GPT-Image on image quality and text rendering, which is interesting given that GPT-Image sits above it on Arena.ai's leaderboard. Benchmark positions don't always tell the full story. The strategic logic behind building this is clear. Microsoft has been licensing OpenAI's image models for Copilot while simultaneously funding OpenAI's biggest competitor, Anthropic. Having a capable in-house model reduces dependency, cuts costs at scale, and gives Microsoft something to iterate on without asking for permission. From that angle, MAI-Image-2 doesn't need to beat Nano Banana. It just needs to be good enough -- and it is. The problem is the product constraints. The generation caps, the strict content policy, the 1:1-only output, the missing editing features, etc; these are the kinds of limitations that put a ceiling on real-world utility. A model this capable deserves infrastructure that matches it. MAI-Image-2 is a strong technical foundation hamstrung by conservative product decisions. Once Microsoft loosens the restrictions, this becomes a serious contender. Right now, it's a promising preview of what Microsoft's image stack could actually become.
[3]
Microsoft unveils MAI Image 2 with better photorealism and text generation: How to use it
The company is also rolling out MAI Image 2 in Copilot and Bing Image Creator. Microsoft has announced a new AI image generation model called MAI Image 2, which is said to create more realistic images and generate clearer text within visuals. According to the tech giant, the development of MAI Image 2 involved feedback from people working in creative fields. The company spoke with photographers, designers, and visual storytellers to understand where existing AI tools fall short in everyday creative work. One of the biggest upgrades in MAI Image 2 is enhanced photorealism. Microsoft says the model focuses on elements such as natural lighting, accurate skin tones, and realistic environments. The goal is to make images look as if they exist in the real world rather than appearing artificial. This could help creatives spend less time correcting details in post-production. Also read: OpenAI plans AI superapp to merge ChatGPT, Codex and Atlas browser: Here's why Another improvement comes in the way the model handles text within images. 'From poster type to the sign in the background of a scene, text can be a key part of imagery. MAI-Image-2 enables consistent creation of infographics, slides, diagrams, and more, with little lost between direction and creation,' the company said. The model is also designed to create complex and visually rich scenes. Microsoft says MAI Image 2 can handle surreal ideas, cinematic visuals, and detailed compositions more effectively. Also read: Perplexity launches Comet AI browser for iPhone users: Check features Microsoft has made a preview version of the model available through the MAI Playground. The company is also rolling out MAI Image 2 in Copilot and Bing Image Creator. API access is already available to select customers such as WPP that need large-scale image generation.
Share
Share
Copy Link
Microsoft launched MAI-Image-2, its second-generation AI image generation model, securing third place on the Arena.ai leaderboard behind only Google and OpenAI. The text-to-image model brings enhanced photorealism, accurate in-image text generation, and detailed scene creation to Copilot and Bing Image Creator, marking a strategic shift as Microsoft builds in-house capabilities.
Microsoft announced MAI-Image-2 on Thursday, a second-generation AI image generation model that debuted at number three on the Arena.ai text-to-image leaderboard, positioning the company directly behind Google's Gemini 3.1 Flash and OpenAI's GPT Image 1.5
1
. The announcement comes from the Microsoft AI Superintelligence team, the internal research group led by Mustafa Suleyman, who stepped back from his broader CEO role at Microsoft AI just two days earlier to focus exclusively on frontier model development1
.
Source: Decrypt
This marks a significant strategic shift for Microsoft, which has been paying OpenAI billions to power Copilot and Bing Image Creator with models like DALL-E 3 and GPT-4o
2
. Building a competing AI image generation model in-house reduces dependency on external partners and cuts costs at scale, while giving Microsoft greater control over its product roadmap2
.The development of MAI-Image-2 involved direct feedback from photographers, designers, and visual storytellers who identified where existing AI tools fall short in everyday creative work
3
. The model focuses on three core areas: enhanced photorealism with natural lighting, accurate skin tones, and realistic environments that reduce post-production work between generation and usable output1
.Hands-on testing revealed that the text-to-image model demonstrates a solid grasp of surface texture and spatial relationships, with photorealism approaching the level of top-tier competitors
2
. Complex scenes with parameters that defied logic were handled with accurate body proportions, limb positioning, depth, and spatial awareness2
. The goal is to make images look as if they exist in the real world rather than appearing artificial, helping creative professionals spend less time correcting details3
.MAI-Image-2 is built to handle readable lettering within scenes, from signage to infographics to typographic layouts, a category where many image models still struggle to produce consistent, accurate characters
1
. Microsoft emphasized that text can be a key part of imagery, enabling consistent creation of infographics, slides, and diagrams with little lost between direction and creation3
.Testing confirmed this capability as a legitimate highlight, with the model handling complex typography and large blocks of text in posters with far more consistency than expected
2
. The model even managed to generate some Chinese hanzi characters, though accuracy wasn't perfect, demonstrating potential for multilingual applications2
.The third focus area is detailed scene generation, with the model designed to handle dense compositions, surreal concepts, cinematic framing, and imaginative work where precise prompting and high fidelity matter most
1
. The model understands artistic style well, shifting between photographic realism, graphic design aesthetics, and illustrated styles without friction2
. It reads prompts carefully, including stylistic instructions, and delivers coherent results across a broad range of visual tasks2
.MAI-Image-2 is available now through the MAI Playground at playground.microsoft.ai, Microsoft's public model testing environment
1
. The model is beginning to roll out across Copilot and Bing Image Creator, though as of the announcement, the integration wasn't yet complete2
. Enterprise customers can access the model via API access today, with select customers such as WPP already using it for large-scale image generation3
. Microsoft says API access will open to any developer through Microsoft Foundry soon, though no specific date has been provided1
.Related Stories
Despite strong performance, MAI-Image-2 faces limitations that may frustrate users. The model employs aggressive content moderation, more restrictive than Google Imagen and OpenAI's offerings, with refusals for prompts that other models handle without issue
2
. Usage limits include a 30-second cooldown between generations and a 15-image cap before a 24-hour lockout, making production workflows challenging in the native interface2
.The model currently supports only 1:1 resolution with no landscape, portrait, or custom ratios, a significant limitation for social media content creation
2
. It functions purely as a text-to-image tool with no image-to-image, inpainting, outpainting, or reference image support, features that creative professionals expect from competing platforms2
.The pace of development is notable. Microsoft announced its first in-house voice model and text model preview in August 2025, followed by MAI-Image-1 in October 2025, which debuted in the top ten on the Arena.ai leaderboard
1
. Five months later, MAI-Image-2 is placing in the top three, suggesting the Microsoft AI Superintelligence team is moving at a different pace from Microsoft's historically slower consumer product cycles1
. The team's next-generation GB200 compute cluster is now operational, positioning infrastructure for future frontier model releases1
.Summarized by
Navi
[1]
[2]
14 Oct 2025β’Technology

05 Nov 2025β’Technology

29 Aug 2025β’Technology

1
Technology

2
Policy and Regulation

3
Business and Economy
