Curated by THEOUTPOST
On Thu, 27 Mar, 12:04 AM UTC
12 Sources
[1]
OpenAI's new AI image generator is potent and bound to provoke
The arrival of OpenAI's DALL-E 2 in the spring of 2022 marked a turning point in AI when text-to-image generation suddenly became accessible to a select group of users, creating a community of digital explorers who experienced wonder and controversy as the technology automated the act of visual creation. But like many early AI systems, DALL-E 2 struggled with consistent text rendering, often producing garbled words and phrases within images. It also had limitations in following complex prompts with multiple elements, sometimes missing key details or misinterpreting instructions. These shortcomings left room for improvement that OpenAI would address in subsequent iterations, such as DALL-E 3 in 2023.

On Tuesday, OpenAI announced new multimodal image generation capabilities that are directly integrated into its GPT-4o AI language model, making it the default image generator within the ChatGPT interface. The integration, called "4o Image Generation" (which we'll call "4o IG" for short), allows the model to follow prompts more accurately (with better text rendering than DALL-E 3) and respond to chat context for image modification instructions.

An AI-generated cat in a car drinking a can of beer created by OpenAI's 4o Image Generation model. OpenAI

An AI-generated photo of Abraham Lincoln holding an Ars Technica sign created by OpenAI's 4o Image Generation model. OpenAI

An AI-generated image of "a muscular barbarian with weapons beside a CRT television set, cinematic, 8K, studio lighting" created by OpenAI's 4o Image Generation model.
An AI-generated "Queen of the Universe" by OpenAI's 4o Image Generation model. OpenAI

An AI-generated plate of pickles created by OpenAI's 4o Image Generation model. OpenAI

Generating a gaming PC with 1000 RGB lights using OpenAI's 4o Image Generation model in ChatGPT. OpenAI / Benj Edwards

Generating a flaming cheeseburger using OpenAI's 4o Image Generation model in ChatGPT. OpenAI / Benj Edwards

The new image generation feature began rolling out Tuesday to ChatGPT Free, Plus, Pro, and Team users, with Enterprise and Education access coming later. The capability is also available within OpenAI's Sora video generation tool. OpenAI told Ars that selecting GPT-4.5 in the ChatGPT interface calls upon the same 4o-based image generation model as selecting GPT-4o.

Like DALL-E 2 before it, 4o IG is bound to provoke debate as it brings sophisticated media manipulation capabilities that were once the domain of sci-fi and skilled human creators into an accessible AI tool that people can use through simple text prompts. It will also likely ignite a new round of controversy over artistic styles and copyright -- but more on that below.

4o IG can change our perception of media reality. Given this actual photo of a dog... Benj Edwards

...the AI model can change what the dog is doing in a realistic way, such as playing with a fictional puppy inserted into the scene.
OpenAI / Benj Edwards

Some users on social media initially reported confusion since there's no UI indication of which image generator is active, but you'll know it's the new model if the generation is ultra slow and proceeds from top to bottom. The previous DALL-E model remains available through a dedicated "DALL-E GPT" interface, while API access to GPT-4o image generation is expected within weeks.

Truly multimodal output

4o IG represents a shift to "native multimodal image generation," where the large language model processes and outputs image data directly as tokens. That's a big deal, because it means image tokens and text tokens share the same neural network. It leads to new flexibility in image creation and modification.

Although OpenAI baked multimodal image generation capabilities into GPT-4o when it launched in May 2024 -- when the "o" in GPT-4o was touted as standing for "omni" to highlight its ability to both understand and generate text, images, and audio -- the company took over 10 months to deliver the functionality to users, even though OpenAI president Greg Brockman teased the feature on X last year.

OpenAI was likely goaded by last week's release of Google's multimodal LLM-based image generator, called "Gemini 2.0 Flash (Image Generation) Experimental." The tech giants continue their AI arms race, with each attempting to one-up the other.

And perhaps we know why OpenAI waited: At a reasonable resolution and level of detail, the new 4o IG process is extremely slow, taking anywhere from 30 seconds to one minute (or longer) for each image.

Generating a four-panel comic using OpenAI's 4o Image Generation model in ChatGPT.
OpenAI / Benj Edwards

Giving the man in the four-panel comic a beard using OpenAI's 4o Image Generation model in ChatGPT. OpenAI / Benj Edwards

Even if it's slow (for now), the ability to generate images using a purely autoregressive approach is arguably a major leap for OpenAI due to its flexibility. But it's also very compute-intensive, since the model generates the image token by token, building it sequentially. This contrasts with diffusion-based methods like DALL-E 3, which start with random noise and gradually refine an entire image over many iterative steps.

Conversational image editing

In a blog post, OpenAI positions 4o Image Generation as moving beyond generating "surreal, breathtaking scenes" seen with earlier AI image generators and toward creating "workhorse imagery" like logos and diagrams used for communication. The company particularly notes improved text rendering within images, a capability where previous text-to-image models frequently failed in spectacular fashion, often turning "Happy Birthday" into something resembling alien hieroglyphics.

OpenAI claims several key improvements: users can refine images through conversation while maintaining visual consistency; the system can analyze uploaded images and incorporate their details into new generations; and it offers stronger photorealism -- although what constitutes photorealism (for example, imitations of HDR camera features, detail level, and image contrast) can be subjective.
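As an aside on the autoregressive-versus-diffusion contrast described earlier in this piece, the difference in generation order can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not OpenAI's implementation: `toy_next_token` and `toy_denoise_step` are hypothetical stand-ins for trained models, and the "image" is just a small grid of numbers.

```python
import random


def autoregressive_generate(width, height, next_token):
    """Build an image one token at a time, in raster (top-to-bottom) order."""
    tokens = []
    for _ in range(width * height):
        # Each new token is predicted from everything generated so far.
        tokens.append(next_token(tokens))
    # Reshape the flat token sequence into rows of the image grid.
    return [tokens[r * width:(r + 1) * width] for r in range(height)]


def diffusion_generate(width, height, denoise_step, steps, rng):
    """Start from random noise and refine the whole canvas at every step."""
    image = [[rng.random() for _ in range(width)] for _ in range(height)]
    for t in range(steps):
        # Every pixel is updated together on each pass.
        image = denoise_step(image, t)
    return image


# Hypothetical stand-ins for trained models, for illustration only.
def toy_next_token(prefix):
    # Cycles through four fake "token" values based on position.
    return len(prefix) % 4


def toy_denoise_step(image, t):
    # Pull every pixel halfway toward a flat gray target of 0.5.
    return [[p + 0.5 * (0.5 - p) for p in row] for row in image]


rows = autoregressive_generate(4, 3, toy_next_token)
canvas = diffusion_generate(4, 3, toy_denoise_step, steps=10, rng=random.Random(0))
```

In this sketch the autoregressive loop makes one sequential call per token (12 for a 4x3 grid), and earlier rows are finished before later ones exist, which is consistent with the top-to-bottom rendering users reported. The diffusion loop makes a fixed number of passes, each touching the whole canvas at once.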
In its blog post, OpenAI provided examples of intended uses for the image generator, including creating diagrams, infographics, social media graphics using specific color codes, logos, instruction posters, business cards, custom stock photos with transparent backgrounds, editing user photos, or visualizing concepts discussed earlier in a chat conversation.

Notably absent: Any mention of the artists and graphic designers whose jobs might be affected by this technology. As we covered throughout 2022 and 2023, job impact is still a top concern among critics of AI-generated graphics.

Fluid media manipulation

Shortly after OpenAI launched 4o Image Generation, the AI community on X put the feature through its paces, finding that it is quite capable at inserting someone's face into an existing image, creating fake screenshots, and converting meme photos into the style of Studio Ghibli, South Park, felt, Muppets, Rick and Morty, Family Guy, and much more.

It seems like we're entering a completely fluid media "reality" courtesy of a tool that can effortlessly convert visual media between styles. The styles also potentially encroach upon protected intellectual property. Given what Studio Ghibli co-founder Hayao Miyazaki has previously said about AI-generated artwork ("I strongly feel that this is an insult to life itself."), it seems he'd be unlikely to appreciate the current AI-generated Ghibli fad on X.

An Internet meme converted into "Studio Ghibli" style art by people on X. OpenAI / Barsee

An Internet meme converted into "Studio Ghibli" style art by people on X.
OpenAI / Justine Moore

An Internet meme converted into "Studio Ghibli" style art by people on X. OpenAI / Manuel Calavera

To get a sense of what 4o IG can do ourselves, we ran some informal tests, including some of the usual CRT barbarians, queens of the universe, and beer-drinking cats, which you've already seen above (and, of course, the plate of pickles).

The ChatGPT interface with the new 4o image model is conversational (like before with DALL-E 3), but you can suggest changes over time. For example, we took the author's EGA pixel bio (as we did with Google's model last week) and attempted to give it a full body. Arguably, Google's more limited image model did a far better job than 4o IG.

While my pixel avatar was commissioned from the very human (and talented) Julia Minamata in 2020, I also tried to convert the inspiration image for my avatar (which features me and legendary video game engineer Ed Smith) into EGA pixel style to see what would happen. In my opinion, the result proves the continued superiority of human artistry and attention to detail.

We also tried to see how many objects 4o Image Generation could cram into an image, inspired by a 2023 tweet by Nathan Shipley when he was evaluating DALL-E 3 shortly after its release. We did not account for every object, but it looks like most of them are there.
On social media, other people have manipulated images using 4o IG (like Simon Willison's bear selfie), so we tried changing an AI-generated note featured in an article last year. It worked fairly well, though it did not really imitate the handwriting style as requested.

To take text generation a little further, we generated a poem about barbarians using ChatGPT, then fed it into an image prompt. The result feels roughly equivalent to diffusion-based Flux in capability -- maybe slightly better -- but there are still some obvious mistakes here and there, such as repeated letters.

We also tested the model's ability to create logos featuring our favorite fictional Moonshark brand. One of the logos not pictured here was delivered as a transparent PNG file with an alpha channel. This may be a useful capability for some people in a pinch, but to the extent that the model may produce "good enough" (not exceptional, but looks OK at a glance) logos for the price of $0 (not including an OpenAI subscription), it may end up competing with some human logo designers, and that will likely cause some consternation among professional artists.

Frankly, this model is so slow we didn't have time to test everything before we needed to get this article out the door. It can do much more than we have shown here -- such as adding items to scenes or removing them. We may explore more capabilities in a future article.

Limitations

By now, you've seen that, like previous AI image generators, 4o IG is not perfect in quality: It consistently renders the author's nose at an incorrect size.

Other than that, while this is one of the most capable AI image generators ever created, OpenAI openly acknowledges significant limitations of the model. For example, 4o IG sometimes crops images too tightly or includes inaccurate information (confabulations) with vague prompts or when rendering topics it hasn't encountered in its training data.
The model also tends to fail when rendering more than 10-20 objects or concepts simultaneously (making tasks like generating an accurate periodic table currently impossible) and struggles with non-Latin text fonts. Image editing is currently unreliable over multiple passes, with a specific bug affecting face editing consistency that OpenAI says it plans to fix soon. And it's not great with dense charts or accurately rendering graphs or technical diagrams. In our testing, 4o Image Generation produced mostly accurate but flawed electronic circuit schematics.

Safety and impact

Even with those limitations, multimodal image generators are an early step into a much larger world of completely plastic media reality where any pixel can be manipulated on demand with no particular photo editing skill required. That brings with it potential benefits, ethical pitfalls, and the potential for terrible abuse.

In a notable shift from DALL-E, OpenAI now allows 4o IG to generate adult public figures (not children) with certain safeguards, while letting public figures opt out if desired. Like DALL-E, the model still blocks policy-violating content requests (such as graphic violence, nudity, and sex).

The ability for 4o Image Generation to imitate celebrity likenesses, brand logos, and Studio Ghibli films reminds us that GPT-4o is partly (aside from some licensed content) a product of a massive scrape of the Internet without regard to copyright or consent from artists. That mass-scraping practice has resulted in lawsuits against OpenAI in the past, and we would not be surprised to see more lawsuits or at least public complaints from celebrities (or their estates) about their likenesses potentially being misused.

On X, OpenAI CEO Sam Altman wrote about the company's somewhat devil-may-care position on 4o IG: "This represents a new high-water mark for us in allowing creative freedom.
People are going to create some really amazing stuff and some stuff that may offend people; what we'd like to aim for is that the tool doesn't create offensive stuff unless you want it to, in which case within reason it does."

Zooming out, GPT-4o's image generation model (and the technology behind it, once open source) feels like it further erodes trust in remotely produced media. While we've always needed to verify important media through context and trusted sources, these new tools may further expand the "deep doubt" media skepticism that's become necessary in the age of AI. By opening up photorealistic image manipulation to the masses, more people than ever can create or alter visual media without specialized skills.

While OpenAI includes C2PA metadata in all generated images, that data can be stripped away and might not matter much in the context of a deceptive social media post. But 4o IG doesn't change what has always been true: We judge information primarily by the reputation of its messenger, not by the pixels themselves. Forgery existed long before AI. It reinforces that everyone needs media literacy skills -- understanding that context and source verification have always been the best arbiters of media authenticity.

For now, Altman is ready to take on the risks of releasing the technology into the world. "As we talk about in our model spec, we think putting this intellectual freedom and control in the hands of users is the right thing to do, but we will observe how it goes and listen to society," Altman wrote on X. "We think respecting the very wide bounds society will eventually choose to set for AI is the right thing to do, and increasingly important as we get closer to AGI. Thanks in advance for the understanding as we work through this."
[2]
GPT 4o's image update unlocked a huge opportunity most people are ignoring
GPT 4o's new image generator might be the sign you've been looking for. Here's everything you need to know, including six easy steps to cash in.

This is INSANEEEEEEEEEEEEEE! Hands down, this is one of the first real AI innovations we've seen this year that lives up to the AI hype. This ish right here is a game changer (I said that using my best Katt Williams impression). 🤣

You don't need professional design skills, expensive software, or even an ounce of artistic talent to produce jaw-dropping visuals. All you really need is your imagination and the right prompt. Our imagination will soon become AI's biggest limitation.

Every new AI announcement makes it clear that creators and marketers will own the future. AI keeps breaking down barriers to entry in nearly every area. No coding skills? There's an AI for that. No design skills? There's an AI for that, too. 🤯

In this article, I'll bring you up to speed on exactly what this update means, how image generation works, and even a money-making side hustle you can launch as soon as today.

If you're new to my work, my name is Lester, but feel free to call me Les. I'm an award-winning performance marketer and chairman of a group of DTC brands. These days, I'm focused on helping everyday people navigate the crowded AI space.

If you're looking for practical and helpful AI tips and tricks, sign up for my free newsletter, No Fluff Just Facts. I share AI news, marketing insights, and trends I'm seeing that move the needle. But enough about me. Let's discuss why all this matters to you. 🫶🥹

Before jumping into it, let me give you a crash course on image generation. Diffusion-based image generation, the approach most tools have used, takes existing images and gradually adds noise during training, making the images appear like static on a TV.
It then trains by reversing that process, learning how to remove noise step by step until it can reconstruct an image from pure randomness. While learning, the model receives captions that describe each image, like "a cat playing," helping it associate text with visual patterns.

Once trained, if you type new words like "a red car," the model starts with a noisy image and refines it to match the description. 🤓

Back in the day, and by back in the day, I mean a few months ago, image generation was helpful but still felt like a novelty. Even when the results were decent, you typically needed editing skills to correct imperfections. It could also be frustrating, as your output would completely change when you reentered a prompt, making it difficult to iterate and improve.

Adding text to an image was nearly impossible as fonts looked like hieroglyphics, words seemed invented, and the text often turned out blurry. It was a complete mess. 😖

Fast forward to today with the ChatGPT update; you can quickly iterate, and janky text is no longer a problem.

While working with the new image generator, two things really stood out to me. First, the consistency when creating and iterating on the same output without changing styles or completely altering the project is outstanding.

Second, the AI periodically asks for clarity while generating images, which I love. For once, I didn't feel like going all Mike Tyson on my keyboard while trying to get my work done. Dare I say, it was pleasant to work with. Take that with a grain of salt, tho. It's still early days, and it could easily piss me off tomorrow...but I digress. 😩

Sure, it's got some solid upgrades, like being faster than other tools and working right inside ChatGPT, plus all the usual good stuff.
But I care more about two things: my sanity while using AI tools and whether the thing works. I couldn't care less about it beating some made-up benchmark. "Does it work?" is my benchmark. And so far, I'm very, very, very impressed.

The power this update gives everyday people is unmatched. No more waiting on anyone else to bring your vision to life; you can just get on with it. 😑

Oh, and I didn't have to pay more to access these features. It's included in the Pro plan.

Here's why this is important. You can now create freely using natural language, and if I'm Adobe, I'm very concerned about where this goes. GAME CHANGER!!!

All these advancements are cool, but if you're anything like me, you've got bills to pay. So, while I'm thrilled for Sam Altman and his crew, let's jump straight to the part you care about: how to use all this AI power to launch a profitable side hustle or level up your current business.

I will show you exactly how to go from idea to launch using ChatGPT. If you ever get stuck or feel unsure about any step, guess what? Just ask ChatGPT, lol. 🤪

Reddit and ChatGPT can help you find and validate great ideas quickly. I wrote an in-depth article on exactly how to do that: "Million-dollar ideas pop up in your feed every day - here's exactly how to spot them."

Following those exact steps, I decided to start a skincare line. This idea checked several boxes. It has a high potential for repeat customers and solves a problem that allows me to build a loyal community. I'm calling it "seL," which is Les spelled backward. Why? Because I'm fancy, duh. 💁‍♂️

Now that we have a solid concept, let's work on the design. seL is fancy, so naturally, it has to look the part. Here's the exact prompt I used to create the visual branding for seL:

Create a realistic, studio-quality product mockup of a luxurious skincare brand called "seL."
Design the product packaging as a sleek rectangular bottle with a matte, burnt-orange finish, accompanied by a matching rectangular product box. Both the bottle and the box should feature minimalist, sophisticated branding with the text "seL" in an elegant, modern serif font, using black typography. Beneath the brand name, include the subtle phrase "SKIN TREATMENT BY LES" and a playful yet understated tagline at the bottom reading "MADE WITH 100% FANCY INGREDIENTS JUST SO HE COULD PROVE A POINT." The bottle should have a matte black pump dispenser, contrasting gently with the warm burnt-orange packaging. Present both items clearly against a seamless background matching the burnt-orange color of the packaging. Use subtle, even lighting to highlight the matte textures and luxurious branding details clearly and distinctly, with minimal shadows and reflections.

With the idea and design locked in, it's time for the rubber to meet the road and get seL manufactured. Using ChatGPT Operator, I generated a list of domestic skincare manufacturers. 🏭

Here's the prompt I used:

Identify an accurate, up-to-date list of 5 reputable domestic skincare manufacturers in the United States specializing exclusively in luxury skincare products. Focus specifically on manufacturers offering premium packaging options, customizable formulations emphasizing botanical and natural ingredients, and proven experience with producing serums, moisturizers, cleansers, and treatment lotions. Present the information clearly in a structured table format including columns for Company Name, Official Website, Email Address, Phone Number, and a Brief Overview of Key Capabilities.

Once I have my list, I can use Operator to contact these manufacturers for me. Full disclosure: You may have to guide Operator to keep things on track, but it's doable even if you're new to AI. 🤞

It's time to make the website. I wouldn't use ChatGPT to build a site from scratch when Shopify exists.
However, I do want a custom theme so I can stand out. For this next step, I'll ask ChatGPT to come up with a few mockup ideas so I can tailor the Shopify theme to my liking.

It's GAME TIME. Using ChatGPT, I can develop a marketing concept and ad creative:

Moisturizer strong enough for the driest places. seL by Les locks in hydration and leaves skin glowing, smooth, and nourished. Treat yourself today and enjoy free same day shipping.

But why stop at online sales only? seL should be in stores. Using ChatGPT Operator, I can have AI compile a list of big box retailers in my area. With this list, I can then use AI to help me craft powerful cold emails to get the buyers' attention.

I can have Operator assist me with outreach by giving it access to my email and the list of contacts. Just like that, AI can help us find an idea and bring it to market. 📈

This update is way bigger than just making Ghibli art of water bottles and bananas, IYKYK.

Jeff Bezos once said, "Our success at Amazon is a function of how many experiments we do per year, per month, per week, per day."

Before these innovations, you needed a huge budget and team to achieve the same results. Not only does AI bridge the gap on budget and team size, it bridges another gap people aren't even talking about yet... and that's time.

It's not just pushing the boundaries of what's possible; it's doing the impossible quickly. 🤑

As I said earlier, the real limitation to AI success might be our imagination, not the AI itself.

Now look, while I'm hyped about GPT-4o and all it can do, it's not perfect. Even though the images are amazing and a gigantic step forward, they're not always spot on. The AI can sometimes get confused, especially with detailed requests. While it's way better than before, we're not quite at instant masterpiece status yet.

Another thing to consider is originality.
GPT-4o creates images based on existing data, so similar visuals may pop up. You will still need the human touch to create truly unique content. 😔

Finally, there's the elephant in the room: who owns this work? There are genuine concerns about creators getting proper credit and compensation. As AI-generated images become more common, it will get more complicated to know who holds the creative rights or how to fairly recognize the original human creators whose work these AI models learned from.

We urgently need clear rules of engagement. It's essential to determine the proper legislation and safeguards because, at the current pace of AI evolution, what we're doing today will seem primitive just months from now. 🫠

At some point, we can't keep kicking this can down the road -- feel me?

I came at you with a lot today, so I'll leave you with this. There are two camps when it comes to AI. 🧐

Camp one focuses on AI's limitations, pointing out flaws as reasons not to take action or as excuses to stick with the old way of doing things. Let's call them the glass-half-empty crowd.

Then there's camp two, which actively implements AI and takes it for what it's worth today. They don't get too high or too low on the hype. Instead, they steadily chip away, experimenting with new tools and figuring things out as they go. We'll call them the glass-half-full crowd.

I'm not here to debate who's right or wrong or even predict what the future holds. AI has soooooo much hype surrounding it, but that doesn't mean it's not helpful today, like literally right now, as you're reading this. I've gone on record several times, sharing the AI tools I use, how I use them, and how they've improved my workflow. 🤓☝️

But let's be real for a second. While AI is incredibly useful, it's no magic pill.
Intangibles like storytelling, building authentic connections with your audience, and recognizing good creative content will become more valuable. You'll need to embrace AI alongside these best practices for your venture to thrive. As I step down from my soapbox, I can't help but feel excited and optimistic about what's coming even though, yes, I still have some concerns about AI. 😇 Hope this helps. P.S. If you want more easy and helpful AI tips and tricks, sign up for my free newsletter, No Fluff Just Facts.
[3]
I tried ChatGPT's new image generator, and it shattered my expectations
The newly released model can finally compete with Midjourney, Google's Imagen 3, and Adobe's Firefly.

OpenAI may have kicked off the text-to-image generation craze with its DALL-E model, but since those earlier glory days, the AI company's offering has been lapped by much more capable image models. As a result, when OpenAI released its latest and greatest GPT-4o image generation model, I was skeptical. After testing it, I have changed my mind entirely.

When DALL-E first launched, it lived on its standalone website; since then, it has moved to ChatGPT. The move came with many benefits, including being able to ask the AI chatbot for an image you want in the same interface where you're already chatting about something else, thereby eliminating the need for constant context switching.

With the release of GPT-4o image generation, OpenAI kept this convenient format, switching the default image generator from DALL-E to GPT-4o for paid subscribers. As a result, it was super easy to start creating new images from my ChatGPT Plus account. All I had to do was enter the prompt for what I wanted to see, and it would generate the image. Users can also access it from the Sora interface.

Beware: If you are a free user, you can still generate images in a similar way. However, if the results leave you unimpressed, that's because you aren't using the new model yet: although OpenAI said at launch that it was coming to all users, including free ones, CEO Sam Altman announced a day later that the rollout to the free tier would be "delayed for awhile."

The moment you have been waiting for -- the images. After you insert a prompt, the AI outputs the generation in under a minute. The process does take a bit longer than it used to, but the images are worth the wait, delivering lots of details, texture, realism, and even text accuracy. Instead of describing it, I will include examples below so you can see for yourself.
Prompt: Can you generate a realistic image of a chameleon, up close, shot as if it were in National Geographic in 16:9 ratio? Prompt: Can you generate an image of a laptop open on a desk that says, "This model is so good that it can even get text and hands right, which are usually major challenges for AI models," with hands typing on a keyboard in 16:9 ratio? Prompt: Can you generate a realistic photo of a close-up of a woman in a crowd in Times Square looking at the camera and smiling, with the quality of one taken on a DSLR? As seen above, the image generator does a great job of adhering to the prompt and delivering high-quality, realistic images. However, when testing an AI model, one of the true performance metrics is how it compares to competitors on the market. To give you a good indicator of this, I made it generate the same prompt I tested across all of the major AI image generators, including Midjourney, Google's Imagen 3, Adobe Firefly, and more. I am attaching GPT-4o's rendition below. You can see how it fares against all of the other AI image generators in this article, including DALL-E's rendition, which clearly is far behind what the new model can do. Prompt: Can you generate an image of a vibrant, realistic hummingbird perched on a tree? Even though the quality of the images is perhaps one of the model's biggest wins, there are other benefits as well. One of the biggest is that it lives in the chatbot's interface, which makes it easy to tweak the generations with simple natural language prompts. Also, because the chatbot has the context of what you just asked it, it can consider that in building the image. For example, if you are chatting with it about throwing a birthday party, you may be able to say, "Can you now create an invite that has the information above on it?" instead of having to retype. 
For example, I started chatting with ChatGPT about throwing a housewarming, and when asking to make it create an invite, I didn't have to repeat the information I previously said. You can also upload reference images and then ask ChatGPT to create a different version or use them as elements of a new one. For example, you can input it as a selfie and have it generated in anime style, as seen in Altman's new X post. All of these customization features make it a really strong offering for creatives, who can also request that it be rendered on a transparent background or incorporate brand style guides such as hex codes or logos. Speaking of Altman, I was able to generate an image of him wearing a party hat. I could do so because the new model has much looser safeguards, meant to allow users to lean into their creative freedom. The blog post announcing the model noted that it limits what can be created when real people are in the context, including "particularly robust safeguards around nudity and graphic violence." I can't tell if there is a practical use case for this feature, but it is a notable change I needed to try out for myself. When I tried to create an image of Mickey Mouse, it said it couldn't due to copyright implications, so it seems not all public figures are fair game. Overall, the GPT-4o image generator is a big win over the DALL-E models and perhaps among the best of the many I've tested. Is it worth the $20 per month? If you are just interested in high-quality image generation, there are still free versions you can explore that are really capable, such as Adobe Firefly or Google's Imagen 3. Also: The best AI image generators: Tested and reviewed Having said this, if you are a frequent ChatGPT user, the upgrade to ChatGPT Plus gets significantly more enticing. 
With this upgrade, you will have access to all of OpenAI's latest and greatest chatbot features, as well as high-quality image and video generation, all for $20 a month, which is not a bad deal, especially considering other offerings on the market. For example, Midjourney's subscription starts at $10 per month and only offers image generation.
[4]
ChatGPT's AI image generator just got a huge upgrade -- here's 7 incredible examples of what it can do
In its latest update, OpenAI loaded ChatGPT-4o with one key new feature: an upgraded image generator. The chatbot is now better, smarter, and more aware of context when it comes to making images. The model can now use full conversations to generate images, understanding the context to design the image you want. It can also create better text in images, focus on smaller details, and is more lifelike than ever. While it hasn't been made available to those on ChatGPT's free plan, users on the Pro and Plus accounts have been given full access to the new tool's image abilities. With access to the image generator being placed into the hands of the public, the internet has been set alight with new images. Some are strange, some highlight the flaws of the model and some are entirely terrifying. However, for the most part, the artwork coming out of this new model is really impressive, highlighting the major improvements that have been made. Here are some of my favorites: A quick search through X will show you exactly what everyone's first image generation has been: Studio Ghibli. Whether it's turning memes into the art style of the famous anime house or transitioning selfies into the style, it has become the go-to test for the abilities of ChatGPT's latest model. There are plenty of examples out there but we especially like these examples of famous memes in the style of Studio Ghibli. To get this art style, the user @Ranlarovich suggests the prompt "restyle image in Studio Ghibli style, keep all details". Everyone's favorite office-based sci-fi dystopian TV show (and probably the only one, really) has been getting the AI image treatment. While many have jumped to turning the characters into a variety of different art styles, this was our favorite version. Using the prompt "Create a book cover for the TV show Severance, make it like it's from 1973 and slightly worn", X user @fofrAI got two versions. 
Both clearly hit the criteria, showing worn books that highlight the themes of the show. The second version even includes the actor's names. From book covers and anime-style memes to hyper-realistic images, the diversity of what has come out of this latest update is impressive. This image, originally published by the OpenAI team themselves, highlights the model's abilities for realism. Given the prompt "A wide image taken with a phone of a glass whiteboard, in a room overlooking the Bay Bridge. The field of view shows a woman writing, sporting a t-shirt with a large OpenAI logo. The handwriting looks natural and a bit messy, and we see the photographer's reflection." What is most impressive about this image is the details. The t-shirt has the OpenAI logo and is printed within a crease of the t-shirt. All of the text on the whiteboard is readable (a problem AI images often have is illegible text) and in the reflection, you can see the bridge and photographer. A follow-up image, created with the prompt "selfie view of the photographer, as she turns around to high-five him" is equally impressive. It shows the whiteboard (with the same text, demonstrating the model's new memory abilities), along with the two people high-fiving. You can still see the bridge in the reflection, allowing a reflection of the two people high-fiving. The only small issue is that the hands don't connect properly for the high-five. Compared to the hyperrealism above, this one is pretty simple. Given a product (a metal wallet by Ridge) and the prompt "Create a fun cartoon ad using this image", user @jacob_posel used ChatGPT to turn the wallet into a walking character carrying some money. It has a slogan, and a clear art style, and even keeps the exact shape, style, and branding of the product. This is also another example of the improvements made to text generation in images in this latest model. 
The other images in this list demonstrate ChatGPT's ability to be creative or its dedication to realism. This one shows a different kind of skill. The user @adonis_singh gave the prompt: "Create an image: a screenshot of a Wikipedia page for cats with images and proper explanations for how cats work". In response, a screenshot is given, replicating a Wiki page. The impressive part here is the combination of text, in the style and font that Wikipedia uses, paired with images of cats, both in diagrams and life-like images. This is more a testament to the model's ability to process multiple concepts at the same time. Following the trend of photorealism, X user @minchoi utilized the AI tool to turn famous pieces of artwork into realistic images of people. The Girl with a Pearl Earring turned into a lifelike person while the Mona Lisa was redesigned as a woman from modern times. What's especially impressive here is the attention to detail, bringing in lines in the skin, details in clothing, and accurate representation of color. Some ChatGPT users are turning famous pieces of art into realistic images... others are recreating memes on 1980s computers. The prompt used was "Create an image of a 1980s beige computer with a green monochrome monitor. On the screen is ASCII art of the attached image". On the screen is the famous meme known as "Success Kid". For those unaware, it's a child celebrating with a very determined look on his face.
[5]
ChatGPT 4o image generation is so good we will never be able to trust iPhone renders (and photos) again
Thanks, Sam Altman, for giving us access to ChatGPT's new integrated image-generation skills. They're, as Steve Jobs might've described them, insanely good. So good, in fact, that I'm worried now about my little corner of the universe where we try to discern the accuracy of renders, models, and pre-production leaks that might tell us the tale of future Apple products, like the rumored iPhone 17 Air. For those who don't know, the iPhone 17 Air (or Slim) is the oft-talked-about but never-confirmed ultra-slim iPhone 16 Plus/iPhone 16e/SE hybrid that could be the most exciting iPhone update when Apple likely unveils a whole new iPhone 17 line in September. While it may not be the most powerful iPhone, it should be the biggest and thinnest of the bunch. Even the single rear camera might not be enough to keep potential buyers away. Imagining what it could look like, well, that's my job. Or it was until I started working with ChatGPT running the recently updated 4o model, which is capable of generating images out of thin air or based on photos and images you upload into it. It's a slightly methodical model, taking up to 45 seconds to generate an image that flows in slowly, almost one microscopic, horizontal line of pixels at a time. The results are something to behold. It's not just the quality but how ChatGPT can maintain the thread and cohesion of images from prompt to prompt. Usually, if you start with image generation in something like OpenAI's Dall-E or, say, X's Grok, it'll do a good job with the first image. However, when you request changes, elements of the original disappear or end up altered. It's even harder to create a series of images that appear to be part of the same story or theme. There are usually too many differences. ChatGPT 4o image generation appears different and, possibly, more capable. 
Having already experimented a bit with the model shortly after Altman and other OpenAI engineers announced it, I quickly found that ChatGPT 4o did its best work when you started with a solid source. I initially had fun turning images of myself and even photos I took this week of a peregrine hawk into anime. However, I was curious about ChatGPT's photo-realism capabilities, especially as they relate to my work. Apple announced this week that WWDC 2025's keynote would fall on June 9. It's an event where the tech giant outlines platform updates (iOS, iPadOS, macOS, etc.) that inform much of how we think about Apple's upcoming product lineup. With information like this, we can start to map out the future of the anticipated iPhone 17 line. Visualizing what that will look like can be tough, though. So, I decided to let ChatGPT's newest image model show me the way. Since the iPhone 17 Air would conceivably be the newest member of the iPhone family (shoving aside the less-than-exciting iPhone 16e), I decided to focus on that. Initially, I handed ChatGPT an older iPhone SE review image with this prompt: "Use this photo to imagine what an Apple iPhone 17 Air might look like. Please make it photo-realistic and a nice, bright color." ChatGPT did a good job of maintaining the settings from the original photo and most of my hand, though I think I lost a finger. It did well updating the finish and even added a second camera, making it part of a raised camera bump. I followed with this prompt: "This is good. Since the iPhone 17 Air is supposed to be super-thin, can you show it from the side?" ChatGPT lost the background and made the image look like an ad for the iPhone 17 Air. It was a nice touch, but the phone didn't look thin enough. I prompted ChatGPT to make it thinner, which it did. This was progress, but I quickly realized my error. I hadn't based the prompt on available iPhone 17 Air rumors, and maybe I wasn't being prescriptive enough in my prompts. 
Since the iPhone SE is now a fully retired design, I decided to start over with a review image of the iPhone 16 Pro and initially used the same prompt, which delivered an iPhone 16 Pro in a lovely shade of blue. This time, when I asked to see the thin side of the phone, I told ChatGPT, "Don't change the background." I was pleased to see that ChatGPT more or less kept my backyard bushes intact and seamlessly inserted the new phone in something that now sort of looked like a more attractive version of my hand. Some iPhone 17 Air rumors claim the phone might have just one camera, so I told ChatGPT to remove two cameras and rerender. In previous prompts, I'd told ChatGPT to "make it thinner," but what if I gave the chatbot an exact measurement? "Now show me the side of the iPhone 17 Air. It should be 5.4mm thick and the same color." This was almost perfect. I did notice, though, that there was no discernible camera bump, which seems unlikely in a 5.4mm-thick iPhone. Even the anticipated ultra-thin Samsung Galaxy S25 Edge features a camera bump. There is no way the iPhone 17 Air will get away without one. Finally, I asked for a render of the screen: "Now show me the iPhone 17 Air screen. Make sure it shows the Dynamic Island. The screen should be bright and look like an iPhone home screen with apps and widgets." Once again, ChatGPT did an excellent job, except for an "iOS IAir" label just above the dock. The rest of the App Icon labels are perfect, which is impressive when you consider the difficulty most image generation models have with text. ChatGPT doesn't produce images with AI watermarks; only the file names tell you these are ChatGPT images. That's concerning, as is the exceptional quality. I expect the internet will soon be flooded with ChatGPT iPhone and other consumer electronics hardware renders. We won't know what's a leak, what's a hand-made render, or what's direct from the mind of ChatGPT based on prompts from one enterprising tech editor.
[6]
I compared ChatGPT's new image generator to DALL-E 3, and it's an astonishing improvement, if you have the patience
The mania for AI tools often centers around image generators for the obvious reason that they are, by definition, more visually interesting to play with and demonstrate. OpenAI recently dropped a new image creator inside ChatGPT, showcasing that fact. The new model is not an upgrade to DALL-E 3, the standard AI image creator from OpenAI, but an entirely new technology. Not to give away too much early in this article, but yes, the new image creator makes some impressive art. It takes some time to produce, a couple of minutes sometimes, compared to the 30 seconds or less from DALL-E, but the results speak for themselves. It's good to the point of being problematic, in fact. It mimics the style of human artists to a degree that feels too close. Irrespective of that, I decided to match the two up in a few prompt comparisons. Here's how it went, with DALL-E 3's images on the left and ChatGPT's new generator making the one on the right. The first thing I wanted to test was whether either model could nail a classic AI Achilles' heel: readable text in images. So I asked for: a street sign in New York City that says, "Welcome to the Future." Both managed to get the text of the sign right, but DALL-E's New York didn't look nearly as real as ChatGPT's. Plus, the other signs in the ChatGPT image were spelled correctly, while the One Way sign from DALL-E wasn't quite right. Next up was a test of how each model handled the challenge of merging two very different animals: a lion and an eagle. The idea was to get something regal, something mythic. My prompt was: "Make a hybrid creature that combines features of a lion and an eagle, perched majestically on a mountain peak." DALL-E had a pretty good landscape, and the animal looked fairly realistic, but it was mainly a lion with wings. It also had some random feather strips and a weird tail. ChatGPT made a creature that looks like a painting of a griffin from an alternate world natural history museum. 
Even the coloring blended, and the musculature of the wings actually looked like they would fold onto the creature's back successfully. After the unpleasantness of the Ghibli mimicry, I wanted to emulate an artist who is long gone, Raphael, but with an event he would never have painted. I asked for "A depiction of scientists unveiling a groundbreaking invention, painted in the style of Raphael." ChatGPT responded with an image that looked like a sci-fi Renaissance depiction of the invention of the light bulb, with people not dissimilar from what you'd find in the homes of rich people five hundred years ago, minus the electricity. DALL-E 3 had a more spectacular representation of the same kind of concept. It's hard to tell if it's exactly like Raphael, but it is Renaissance-esque, at least. And, honestly, a more fun vision of the idea. After the artistic style mimicry, I decided to get very distinct and historical. Recreating something as specific as the Wright brothers' first flight is no small task. I wanted a scene that felt like a documentary photo. I asked the two to "Make a photo of the Wright brothers' first flight at Kitty Hawk, with the aircraft in mid-air and spectators watching." DALL-E gave me a very odd airplane not very similar to the real first flight, and frankly, the crowd and landscape veered into the surreal. ChatGPT made a very impressive imitation of a photo, with spectators who look like real people and the correct number of passengers in the first plane (one). It's worth noting that I was only looking at image generation here. You can also perform impressive image edits on photos you upload to ChatGPT, which you can't do with DALL-E, but that's a whole different subject. ChatGPT's new image generator is amazingly creative and good at following your intent in its images. That led to things like the Ghibli controversy and other questions about artistic ethics. Besides that, it's the clear winner in every matchup. 
On the other hand, it takes approximately five times as long to make an image, and it only does one at a time. DALL-E makes good images quickly and two at a time. It also doesn't have the limits I discovered with ChatGPT, where I had to wait for eight minutes to start making images again at one point, despite being a ChatGPT Plus subscriber. If I want to impress someone with AI image-making, though, it's ChatGPT all the way. The winner: ChatGPT
[7]
Review: OpenAI's New Image Generator Is Great Again - Decrypt
OpenAI has just retaken the lead in the AI image generation race. The tech giant's integration of native image generation directly into ChatGPT via its GPT-4o model is not an incremental change. Within hours of its release yesterday, the model quickly went viral, with anime-style creations flooding social platforms and showcasing technical capabilities that leave DALL-E 3 in the dust. The new model can easily compete against dedicated image-generation platforms while eliminating traditional workflow barriers. The $20 monthly ChatGPT Plus subscription now delivers a comprehensive creative ecosystem that would previously require multiple specialized tools and subscriptions. Prompt: A high-resolution photograph of a bustling city street at night, neon signs illuminating the scene, people walking along the sidewalks, cars driving by, a street vendor selling hot dogs, reflections of lights on wet pavement, the overall style is hyper-realistic with attention to detail and lighting, a neon sign says "Decrypt." Our urban nightscape challenge -- requiring sophisticated light physics, crowd rendering, and architectural precision -- revealed distinct performance profiles across competitors. ChatGPT delivered impressively vibrant environments with neon signage, creating rich reflections across meticulously rendered wet pavement. While it excelled in crowd dynamics and element inclusion, minor perspective inconsistencies occasionally betrayed its synthetic nature. The lighting was also good, but sometimes veered into theatrical rather than naturally urban. It also was not the best at reflections, but this is something that only the pickiest viewers would catch. It also generated legible neon signs besides the "Decrypt" one, which adds to the realism. Reve is, for us, the winner thanks to good light physics modeling, particularly the subtle interactions between neon sources and reflective surfaces. 
Its cinematic framing and atmospheric elements (steam wisps, motion blur) created superior dimensional authenticity. However, it reduced crowd density, which was a clever hack since it didn't have to generate a lot of faces, making it harder to spot unrealistic details. The system prioritized mood over literal prompt adherence. Freepik Mystik (Flux) interpreted our prompts through a different lens and was the model that deviated the most from the realistic style. It mixed Asian with Western lettering, generated different Decrypt signs instead of just one, and suffered from technical limitations in human rendering and dimensional depth. Its reflective surfaces lacked the physical accuracy displayed by ChatGPT. Winner: Reve narrowly secured the realism crown through superior rendering of complex lighting interactions. ChatGPT established itself as a remarkably close second, particularly impressive given its integration within a broader multimodal system rather than a specialized image generator.
Prompt adherence and spatial awareness
Prompt: A dog with a red hat standing on top of a TV showing the word 'Decrypt is the best Crypto+AI media site in the world' on the screen. On the left there is a blonde woman in a business suit holding a coin, on the right there is a robot standing on top of a first aid box, a green pyramid stands behind the box,. The overall scenery is surreal. A cat is standing upside down on top of a white soccer ball, next to the dog. An Astronaut from NASA holds a sign that reads "Emerge" and is placed next to the robot. Keep a widescreen format.
How intricate could instructions become before systems failed to render elements in their specified relationships? This is what we wanted to test here, so realism, beauty, or other aspects were not as critical. Current models are so good at prompt adherence that we need to tweak our testing prompts. 
We progressively increased complexity in our prompt until reaching a surrealist composition requiring precise placement of over 25 distinct elements. All the other models failed in previous stages. ChatGPT demonstrated extraordinary prompt fidelity, accurately rendering 23 of 25 specified elements in their correct spatial relationships. The achievement represents unprecedented prompt comprehension, like watching an experienced artist transform detailed verbal instructions into nearly perfect visual execution with only minor deviations. For those picky enough, the only two major bugs we found were the cat not being upside down and the green color spilling from the pyramid to the first aid kit. Freepik Mystik showed significant comprehension degradation, correctly rendering approximately half the requested elements while misinterpreting spatial relationships and modifying key components. It was the model that failed the test first. The colors spilled to different elements of the composition (the red hat generated a red TV and a red wall), and the concepts also spilled -- the dog on the TV spilled to generate an astronaut dog, for example. Reve demonstrated poorer prompt fidelity than ChatGPT but better than Flux. It fundamentally reimagined the composition with good enough adherence to instructions. Still, it introduced unauthorized elements that completely transformed the requested scene -- this is an AI that prioritizes its aesthetic vision over literal instruction following. It generated a black background, the cat was not correctly placed, there was some color spillage, and elements were not really surreal. Winner: ChatGPT is by far the undisputed leader in prompt comprehension, accurately rendering complex instructions that caused competing systems to fundamentally break down. This capability represents a crucial advancement for practical creative workflows where precise visualization of specific concepts is essential. 
Reve comes second, with Flux in a very distant third place.
Image Editing
ChatGPT's natural language editing capability represents perhaps its most transformative feature, allowing intuitive modification through conversational instructions while simultaneously providing granular control comparable to specialized tools. Where traditional image generators often require technical precision or specialized knowledge of plugins, inpainting techniques, etc., ChatGPT's implementation enables creative experimentation through natural dialogue. Our tests transforming personal photos into movie posters demonstrated exceptional versatility -- a workflow no competing model matched. For example, we simply fed the model a photo of Decrypt co-founder Josh Quittner and instructed it to generate a Netflix poster with a specific aesthetic, title, and lettering. It did everything almost flawlessly. Achieving similar results with other models would take a lot of time and likely require different tools and plugins. By the way, this is the feature everyone loved and that led to the viral spread of "Ghibli-style" transformations on social media today. It's basically a reimagination of a complete scene using simple natural language instructions to generate very complex images. While all systems eventually show quality degradation through multiple iterations (an expected limitation when regenerating rather than modifying existing pixels), ChatGPT maintained superior image coherence through extended editing sequences compared to both Reve and Gemini. For example, it still generated coherent, good-quality faces after several iterations, whereas Gemini stopped producing usable results after four or five tries. Bonus: GPT has a granular "inpainting" feature, which lets you modify specific areas of an image while seamlessly blending in with the background, for users in need of a more specific editing tool. Gemini and Reve lack this feature. 
Winner: ChatGPT is by far the best model for image editing because it offers natural language understanding and localized inpainting. Reve follows in second place, with Gemini in the third spot due to its quality degradation after several iterations.
Content moderation
Despite implementing comprehensive safety measures, our testing identified some vulnerabilities in ChatGPT's image generation guardrails. With minimal experimentation, we were able to generate potentially problematic content. For example, while the system initially refused to generate an image involving a child and substances, it proceeded when prompts were reworded using euphemistic language while maintaining fundamentally identical content. It would not generate a child inhaling cocaine with a rolled dollar bill, but a child with white powder and a rolled green paper the size of a dollar bill is totally fine. Try as we might, we were unable to generate overly sexualized photos, violence, and other questionable content simply by convincing the model of our good intentions.
GPT-4o's image capabilities establish a new benchmark in AI-assisted visual creation -- one that combines exceptional technical performance with unprecedented accessibility. For most users, this implementation now represents the optimal balance of quality, versatility, and value for $20 a month. Other specialized tools only let users handle text and code, or just images -- but you can't find an all-in-one offer with the same levels of quality, making OpenAI's service not only easy to use but a great value proposition.
[8]
See for Yourself: ChatGPT's New Image Generation Is Insanely Good
OpenAI just dropped a monster upgrade to ChatGPT's image generation, and it's one of those moments where you blink, look again, and start questioning reality. I won't waste your time with numbers, model sizes, or how many bazillion GPU hours the new model chews through. I'm just gonna show you what this thing can do -- and how it stacks up against the older DALL-E model. 7 Hands and Fingers A close-up of someone playing an E minor chord on a guitar, fingers pressing down on the strings with shallow depth of field. AI image generation blew our minds when it first went mainstream. And then... we looked closer. The hallmark sign of an AI image is the weird hand and finger anatomy. So, what better way to test the models than to ask them to depict a guitar chord? To save the best for the last, I asked the original DALL-E model first, and then the new image generator integrated into the ChatGPT 4o model. Above is what DALL-E came up with. Despite DALL-E's shortcomings, it actually handled the fingers and general anatomy decently here. But the chord itself ... not so much. The hand's positioned way too high on the fretboard to be playing E minor. If you zoom in a bit, you'll catch that the guitar has more than seven strings. The spacing between the strings is also all over the place. With that in mind, let's move on to ChatGPT 4o. I could've told you I'm joking and that this is actually an old photo from back when I played guitar. ChatGPT 4o is that good. Six strings, evenly spaced, and the chord is actually E minor. I'm impressed. 6 Historical Figures Albert Einstein eating an ice cream in Central Park, wearing a casual shirt and suspenders. Now that we've gotten our hands (and fingers) dirty, let's mess with some faces. 
I figured we'd try historical figures since they won't get offended, and it would be fun to see them in a modern setting.
A total letdown. To be fair, DALL-E did warn me it couldn't use Einstein himself and would go with someone "closely resembling" him instead. One of DALL-E's classic tells is its cartoonish-yet-realistic style, which shows up in full force here. The San Remo in the background does hint that this is Central Park, but that's about the only win here.
Moving on to ChatGPT 4o. Slap a black-and-white filter on it, and I could've convinced you it's a real vintage photo. The cream on the cone looks properly creamy, Albert's rocking his signature nonchalant vibe, and the San Remo is still back there, standing tall. Everything checks out. ChatGPT 4o nailed it.
5 Fictional Figures
A figure similar to a Sith Lord calling for a taxi in George Square, Glasgow, with light rain and traffic lights in the background.
By now we've seen that ChatGPT can paint historical figures pretty well. Since faces and people are still one of the best ways to stress-test an AI, let's try some more. I went with "similar" to get the bot to cooperate without hitting me with the copyright speech.
DALL-E's result is okay. The figure does remind you of a Sith, and the rest of the elements are more or less accurate. There's nothing explicitly cartoonish about it, but it just doesn't feel real.
Want real? Check out what ChatGPT 4o produced with the same prompt: I love the atmosphere -- the lighting, the drizzle, the brooding Sith lord presence. It's all there. The only problem is that our dark lord is standing in the street calling a taxi while facing... the sidewalk. Oh, and the taxi sign says "TAXL."
Let's pivot from future fiction to historic fiction. Something like:
A character similar to Geralt of Rivia shopping for groceries in a modern supermarket, pushing a cart and frowning at canned goods.
Not bad at all. The image still carries that synthetic cartoony vibe and the text on the cereal boxes is total gibberish, as expected.
ChatGPT 4o initially refused the prompt because of copyright -- but it worked once I swapped "similar to" with "resembling." Behold: I'm speechless. Like most people, ChatGPT's interpretation of Geralt is basically just Henry Cavill, not the video game version -- but it nailed it. The scowl is on point, and the setting feels natural. This could pass as a shot from the set of a weird crossover ad. And yes, I read The Witcher books before the show was a thing.
4 Cartoons
A cartoon-style pirate captain with a long red coat and a cybernetic arm, laughing on the deck of a flying ship. Transparent background.
OpenAI's image generation isn't limited to realism. While DALL-E always leans a bit airbrushed no matter what you throw at it, I decided to push both models into full cartoon mode.
DALL-E actually did a solid job here -- and it even understood the request for a transparent background. Sort of. What we got was the classic gray-and-white checkerboard pattern that usually means transparent... except here, it's baked into the image. So, not transparent at all. Also, ironically, our AI pirate's biological hand has four fingers while the cybernetic one has five. Maybe he chromed the wrong arm?
ChatGPT 4o's version feels sharper and more deliberate. The coloring style is different -- whether it's better or not is subjective -- but it clearly looks like an artist meant to draw it that way. The background is also actually transparent. You could slap this on a T-shirt, print it out, or even turn it into a WhatsApp sticker on the spot.
3 Mirrors and Reflections
A modern bathroom sink with a toothbrush and razor on the counter, both visible in the mirror and real-world view -- lighting is soft and even.
Mirrors reflect -- and reflections need spatial logic to look natural. I threw out a prompt I knew DALL-E would fumble.
As expected. Something is trying to be a reflection from the faucet in the mirror, but it's way too long. The toothbrush is levitating, inside the sink, and casting no reflection. DALL-E really strapped on its AI helmet for this one.
The newer model does a much better job of making the image feel real, like an actual photograph. The faucet's reflection is a little skewed but passable. Then there's the toothbrush, which has a reflection but doesn't exist in the physical world -- like a reverse vampire. No clear winner here.
AI results are inconsistent, so I gave both another shot with something a little more ambitious:
A woman standing in front of a full-length mirror in a sunlit bedroom, her outfit and pose mirrored exactly, with visible reflection of the window behind her.
... I don't even want to dignify this one with an analysis. Folks, if you want to make DALL-E look bad, just toss the word "mirror" into your prompt. Moving on.
As expected, ChatGPT 4o's version looks a lot more realistic -- but maybe a bit surreal this time? The woman's pose and outfit are mirrored, but only partially, like a Photoshop 3D pop-out effect. The reflection angles are also off. AI still can't handle spatial logic.
2 Cars and Streets
A 2006 Ford GT and a Peugeot 206 behind a red traffic light on Wall Street, New York, midday.
I'm a car enthusiast. When AI image generators first hit the scene, one of the first things I tried was making photos of cars. The results back then weren't good, but with the new model out, I had to give it another shot.
There goes DALL-E again with its increasingly annoying cartoon aesthetic. The Peugeot is on the sidewalk, the traffic lights I asked for are facing the buildings, and the plate numbers are all gibberish.
ChatGPT 4o's results are significantly better. The cars are properly depicted -- even the Peugeot's wheel cover is spot-on and era-correct. That kind of detail isn't accidental. But it gets even better: I could actually use this one as my phone wallpaper. The lighting, the composition, the reflections -- it all checks out. Other than the weird emptiness of the street, this could straight-up pass for a real photo.
1 Text and Letters
A handwritten letter on aged paper with cursive script, resting next to a fountain pen and an ink bottle.
Finally, we aim at the Achilles' heel of every image generator. Most image generator AIs struggle to get text right. By now, you've seen enough gibberish from DALL-E in the earlier examples to know what I mean. To make it more interesting -- and consistent -- I added that the letter should contain the text of King Terenas' speech to Arthas from Warcraft III.
DALL-E did what it does best with text: turned it into smudgy, unintelligible gibberish. It managed to get some words right, and the atmosphere works -- the pen and ink bottle look solid.
ChatGPT 4o nails it -- every single word, in clean cursive script. Letter-perfect. Compared to DALL-E, this is a massive leap forward. Hats off, OpenAI.
AI image generation has come a long way -- and it shows. ChatGPT 4o feels like the first model that genuinely gets it when it comes to lighting, texture, and context. At this point, the only real question left is: how strong are ChatGPT's safeguards? I easily got past its copyright restrictions. How long before someone jailbreaks ChatGPT and starts generating whatever content they want using this absurdly capable model?
[9]
Easily Create Stunning Visuals with ChatGPT's New 4o Image Generator
OpenAI has unveiled the ChatGPT Image Generator, an innovative tool designed to replace the previous integration with DALL-E. This innovation represents a significant step forward in AI-powered image creation, offering enhanced photorealism, improved alignment with user prompts, and a wide range of creative applications. However, it is currently limited to paid accounts and has areas for improvement, such as processing speed and facial replication accuracy. Despite these limitations, the tool is a promising resource for professionals and hobbyists alike. If you've been following the evolution of AI image generation, you might already be familiar with DALL-E, OpenAI's previous tool for creating visuals. While DALL-E had its strengths, it often fell short in areas like facial accuracy and stylistic precision. The ChatGPT 4o Image Generator, however, takes a significant leap forward, addressing many of these shortcomings while introducing new possibilities for creative professionals and hobbyists alike. Whether you're a marketer, designer, or just someone who loves experimenting with visuals, this tool promises to unlock a world of creative potential. The move from DALL-E to the ChatGPT Image Generator reflects OpenAI's commitment to advancing its technology for greater usability and precision. While DALL-E 3 struggled with challenges like realistic human faces and stylistic details, the new tool addresses these shortcomings. It delivers more lifelike visuals and adheres more closely to user instructions, making sure that creative ideas are realized with greater accuracy. This transition highlights OpenAI's focus on refining its tools to meet the evolving needs of users, making the ChatGPT Image Generator a more reliable and versatile solution for diverse creative projects. The ChatGPT Image Generator is a versatile tool designed to cater to a broad spectrum of creative needs. 
Whether you are a professional designer, marketer, or hobbyist, its capabilities can significantly enhance your projects. Some of its most notable applications include: These features make the ChatGPT Image Generator a valuable asset for tasks ranging from marketing and content creation to personal projects, offering users a powerful tool to bring their ideas to life. The ChatGPT Image Generator offers several improvements over its predecessor, DALL-E 3. Its ability to produce photorealistic images and adhere more closely to user prompts sets it apart. For example, DALL-E 3 often struggled with rendering realistic human faces and capturing stylistic nuances, whereas the new tool delivers more accurate and visually appealing results. These enhancements make it particularly useful for projects requiring high levels of detail and precision, such as product designs, marketing materials, and professional-grade visuals. By addressing some of the key limitations of DALL-E 3, the ChatGPT Image Generator positions itself as a more reliable and effective solution for creative professionals. While the ChatGPT Image Generator offers significant advancements, it is not without its limitations. Users should be aware of the following challenges: These limitations highlight areas where further refinement is needed. OpenAI's ongoing updates are expected to address these challenges, making the tool even more robust and user-friendly in the future. Currently, the ChatGPT Image Generator is available exclusively to paid subscribers, including those on Plus, Pro, and Teams plans. Free-tier users do not have access to this feature, which limits its reach among casual users. This exclusivity positions the tool as a premium offering, catering to individuals and businesses that require advanced image generation capabilities. 
For professionals and organizations willing to invest in innovative technology, the ChatGPT Image Generator offers a compelling solution for creative and design needs. OpenAI is actively working to enhance the ChatGPT Image Generator, with updates expected to address its current limitations. Anticipated improvements include faster processing speeds, better facial replication accuracy, and expanded functionality to support more complex creative tasks. These advancements will likely solidify the tool's position as an indispensable resource for creative professionals, businesses, and hobbyists. As the technology evolves, the ChatGPT Image Generator is poised to become a cornerstone of AI-driven design, allowing users to push the boundaries of their creativity with greater ease and precision.
[10]
Unlock Studio Ghibli Magic with ChatGPT's Image Creator
Have you ever found yourself staring at a blank canvas, struggling to bring your creative vision to life? Whether you're a designer trying to perfect a storyboard, a marketer crafting the perfect ad, or an educator creating engaging visuals, the process can often feel overwhelming and time-consuming. Enter ChatGPT-4's image creator -- a tool that promises to transform how we approach visual design. With its ability to transform rough ideas into polished visuals and its knack for handling everything from character consistency to intricate text rendering, this AI-powered tool is quickly becoming a fantastic option for creative professionals across industries. But, like any tool, it's not without its quirks. While its features open up exciting possibilities, understanding its strengths and limitations is key to unlocking its full potential. This guide by Cutting Edge School dives into the ins and outs of ChatGPT-4's image generation capabilities, offering a practical guide to help you navigate its features, avoid common pitfalls, and integrate it seamlessly into your workflow. Whether you're a seasoned pro or just dipping your toes into the world of AI-assisted design, this tutorial will equip you with the insights you need to stay ahead in an ever-evolving creative landscape. ChatGPT-4's image creator is equipped with a variety of features designed to cater to diverse creative needs. These features provide users with tools to streamline their design processes while maintaining high-quality outputs: These features make the tool adaptable for both simple and complex projects, empowering users to produce professional-grade visuals efficiently. ChatGPT-4's image creator distinguishes itself from competitors through its advanced capabilities and user-centric design. 
These unique strengths make it a valuable tool for professionals seeking both efficiency and creativity: These features position ChatGPT-4 as a powerful tool for professionals who value precision, adaptability, and creative freedom in their workflows. The versatility of ChatGPT-4's image generation capabilities makes it a valuable asset across a wide range of industries. Its ability to adapt to different creative needs allows professionals to enhance their productivity and output quality: These applications highlight the tool's potential to transform workflows and enable professionals across industries to achieve their creative goals more efficiently. While ChatGPT-4's image creator offers impressive capabilities, it is important to recognize its limitations to set realistic expectations and use the tool effectively: Understanding these constraints can help users plan their projects more effectively and identify scenarios where supplementary tools or manual adjustments may be necessary. The rise of AI-powered tools like ChatGPT-4 is reshaping the creative landscape, offering both opportunities and challenges for professionals. Adapting to this shift requires a proactive approach to learning and innovation: By embracing these changes, creative professionals can position themselves for success in an AI-driven world and unlock new opportunities for growth and innovation. The future of AI in design and multimedia is filled with potential, with several trends shaping the trajectory of the industry. These developments highlight the growing role of AI in creative workflows: These trends underscore the fantastic potential of AI in shaping the future of creative industries, offering new opportunities for innovation and collaboration. 
As AI continues to evolve, it is essential for creative professionals to embrace these tools and adapt to the changing industry landscape. By focusing on continuous learning, collaboration, and maintaining a growth mindset, you can unlock new opportunities and thrive in the age of AI-powered creativity.
[11]
ChatGPT vs Canva : Who Wins the Design Battle?
For many of us, tools like Canva have been a lifesaver, simplifying the design process and making creativity more accessible. But what if there was a way to take that ease a step further -- something that could not only generate stunning visuals but also understand your needs through a simple conversation? Enter ChatGPT's new image generation update, a feature that promises to redefine how we approach design. This latest upgrade, available on OpenAI's $20/month plan, brings a fresh twist to visual creation by combining AI-driven precision with conversational simplicity. Imagine describing your vision -- whether it's a polished Facebook ad, an engaging infographic, or a sleek e-commerce banner -- and watching it come to life with minimal effort. By integrating advanced text placement, real-time editing, and iterative refinement, this tool offers a streamlined alternative to traditional design platforms. But could it really rival industry giants like Canva? No Code MBA looks into the details and explores whether this innovation is a fantastic option or just another tool in the crowded design landscape. With its ability to produce professional-grade marketing materials, educational visuals, and social media content, this feature raises an important question: can it compete with established platforms like Canva? Below, we explore its features, practical applications, and potential impact on the design landscape. One of the most notable advancements in ChatGPT's image generation capabilities is its ability to integrate text into visuals with precision. Unlike traditional design tools that often require manual adjustments, this AI-driven system automates text placement, making sure it aligns harmoniously with the overall composition. For example, if you're designing a Facebook ad, you can provide the model with a tagline and product description. The result is a polished visual where the text is not only legible but also aesthetically balanced. 
This feature is particularly beneficial for professionals who need to create high-quality designs quickly and for beginners who may lack advanced design skills. By eliminating the need for manual adjustments, the tool streamlines the design process, allowing users to focus on their creative vision. ChatGPT's conversational editing feature stands out as a significant innovation. Instead of navigating complex menus or mastering intricate design software, users can simply describe the changes they want to see. Whether it's resizing an element, adjusting colors, or repositioning text, the AI responds to prompts and updates the image in real time. For instance, if your design feels cluttered, you can ask the AI to simplify the layout or emphasize specific elements. This iterative process allows you to refine your visuals step by step, focusing on creativity rather than technical execution. The result is a streamlined workflow that saves time and reduces frustration, making it an appealing option for users with varying levels of design expertise. ChatGPT's image generation capabilities are versatile, making them suitable for a wide range of applications. Its adaptability allows users to create visuals tailored to specific needs across various industries. Some practical use cases include: These examples highlight the tool's ability to cater to diverse needs, from marketing professionals and educators to small business owners. Its capacity to produce professional-quality visuals with minimal effort makes it an attractive option for users seeking efficiency and creativity. ChatGPT's image generation model delivers impressive results, particularly when handling both generic and specific prompts. The visuals it produces meet professional standards, making it a reliable tool for most design needs. 
However, there are some limitations to consider: Despite these challenges, the overall performance remains robust. The tool offers a practical solution for users seeking a balance between efficiency and precision, making it a valuable addition to the design toolkit. ChatGPT's new image generation capabilities position it as a noteworthy contender in the design tool market. By automating complex tasks and simplifying the creative process, it appeals to users who prioritize efficiency and ease of use. For small businesses, entrepreneurs, and individuals with limited design expertise, this tool offers a cost-effective alternative to traditional platforms. However, established tools like Canva continue to hold a competitive edge with their extensive template libraries, integrations, and collaborative features. While ChatGPT's update represents a significant step forward, further enhancements in speed, customization, and advanced features may be necessary to fully rival Canva's comprehensive ecosystem. Looking ahead, the potential applications of ChatGPT's image generation model extend far beyond its current capabilities. With ongoing advancements in AI technology, the tool could evolve to support more complex projects, such as magazine layouts, posters, and even interactive designs. As design automation becomes increasingly integrated into various industries, the possibilities for creativity and innovation will continue to expand.
[12]
15 Practical Uses of ChatGPT 4o for Image Generation and Design
Imagine being able to bring your creative ideas to life without needing a full design team, expensive software, or endless hours of trial and error. Whether you're a marketer looking to craft eye-catching ads, a designer aiming to experiment with new styles, or just someone who loves dabbling in visual storytelling, the process of creating high-quality visuals can often feel overwhelming. But what if there was a tool that could simplify all of that -- one that could merge art styles, generate photorealistic images, or even transform rough sketches into polished designs? Enter ChatGPT 4o's image generation capabilities, an innovative technology that's here to make visual creation not only easier but also more accessible and fun.
This guide by the AI Grid explores 15 practical ways you can use ChatGPT 4o's image generation features to elevate your projects and spark your imagination. From creating consistent character designs for storytelling to generating infographics that simplify complex information, this tool is packed with possibilities for professionals and hobbyists alike. Whether you're looking to save time, cut costs, or explore entirely new creative avenues, these use cases will show you how ChatGPT can transform the way you approach visual content. Let's explore how this innovative technology can empower your creativity and redefine what's possible in design.
ChatGPT 4o enables the seamless blending of multiple art styles into a single, unified design. This capability is particularly beneficial for branding, where creating a unique visual identity is essential. For example, you can combine minimalist and abstract styles to craft a logo that is both distinctive and visually engaging. This feature allows for endless customization, making sure your designs stand out in competitive markets.
The process of creating marketing assets becomes faster and more efficient with ChatGPT. By integrating product images with pre-designed templates, you can quickly prototype advertisements tailored to your brand. This functionality is ideal for businesses looking to scale their marketing efforts without requiring extensive resources. It allows marketers to focus on strategy while the tool handles the heavy lifting of design.
ChatGPT eliminates the need for costly photoshoots and location scouting by allowing photorealistic product placement in diverse settings. Whether you want to showcase a product on a tropical beach or in a bustling urban café, this feature allows you to create high-quality visuals with minimal effort. It's a fantastic option for advertising campaigns, offering flexibility and cost-effectiveness.
Transforming images into different artistic styles is effortless with ChatGPT. Whether you aim to replicate the charm of Studio Ghibli or the geometric simplicity of low-poly art, this tool ensures consistency while offering creative flexibility. This feature is particularly useful for projects requiring a cohesive aesthetic, such as animations, branding, or themed campaigns.
Simplify complex information by generating visually engaging infographics from text prompts. These infographics are perfect for social media, presentations, or educational materials, making your content more accessible and impactful. By combining clear visuals with concise information, ChatGPT helps you communicate ideas effectively to diverse audiences.
Maintaining a consistent visual identity across platforms like Instagram, Twitter, and YouTube is crucial for brand recognition. ChatGPT assists in designing on-brand visuals, making sure uniformity across all channels. This strengthens your digital presence and enhances audience engagement by presenting a cohesive and professional image.
Streamline your UI design process by adapting existing layouts to new themes or topics. ChatGPT can generate interface elements, cards, and layouts, saving time and effort while maintaining a polished, professional look. This feature is particularly valuable for app developers and web designers seeking to create user-friendly interfaces efficiently.
ChatGPT excels at embedding text seamlessly into images, making it ideal for creating infographics, posters, and educational visuals. This ensures that text and imagery work together harmoniously, enhancing the overall impact of your designs. Whether for marketing or instructional purposes, this feature simplifies the process of combining visuals with clear messaging.
For storytelling, animation, or game design, maintaining consistent character designs across multiple images is essential. ChatGPT ensures that characters retain their defining features, allowing cohesive narratives and visual continuity. This capability is particularly useful for projects requiring detailed world-building or serialized content.
Apply realistic textures, such as marble, wood, or fabric, to objects or characters with ease. This feature is invaluable for 3D modeling, product design, and other projects requiring lifelike materials. By automating the application of textures, ChatGPT saves time while delivering professional-quality results.
Turn rough sketches into polished visuals effortlessly. This capability allows designers to quickly iterate on ideas, transforming initial concepts into refined images suitable for presentations, prototypes, or final designs. It bridges the gap between conceptualization and execution, making the creative process more efficient.
Extract elements from images to create transparent assets, perfect for stickers, compositing designs, or multimedia projects. This feature simplifies the process of isolating and repurposing visual elements, allowing greater flexibility in design. It's particularly useful for creating layered visuals or interactive content.
ChatGPT can produce visuals so realistic they are indistinguishable from photographs. This opens up exciting possibilities for digital media, but it also raises important discussions about authenticity and ethical considerations. By understanding the implications of this technology, users can harness its potential responsibly.
Create eye-catching thumbnails for videos, presentations, or articles in seconds. This feature ensures your content grabs attention at first glance, enhancing engagement and click-through rates. By automating the design process, ChatGPT allows creators to focus on producing high-quality content.
ChatGPT encourages playful exploration, such as turning people into cartoon characters or experimenting with surreal concepts. These creative experiments can inspire innovation and uncover new ways to engage with visual media. By pushing the boundaries of traditional design, users can discover unique approaches to storytelling and branding.
ChatGPT 4o's image generation capabilities provide a versatile toolkit for professionals and creators across industries. From enhancing workflows to allowing entirely new creative possibilities, this technology is transforming how visual content is produced and used. Whether you're working on branding, marketing, education, or personal projects, ChatGPT enables you to bring your ideas to life with precision, efficiency, and creativity.
OpenAI's new GPT-4o image generation model, integrated into ChatGPT, offers significant improvements in image quality, text rendering, and contextual understanding, challenging competitors and raising concerns about media manipulation.
OpenAI has introduced a significant upgrade to its AI image generation capabilities with the release of GPT-4o Image Generation (4o IG), integrated directly into the ChatGPT interface [1]. This new model represents a major advancement in AI-powered visual creation, offering improved quality, accuracy, and contextual understanding compared to its predecessors.
The 4o IG model boasts several notable enhancements:
The new image generation feature is now available to ChatGPT Free, Plus, Pro, and Team users, with Enterprise and Education access coming later [1]. It's also accessible within OpenAI's Sora video generation tool [1]. API access to GPT-4o image generation is expected within weeks [1].
The release of 4o IG puts OpenAI back in competition with other leading image generation models like Midjourney, Google's Imagen 3, and Adobe's Firefly 3. Early user reports suggest that the quality and capabilities of 4o IG are on par with or surpassing these competitors in many aspects [3][4].
The improved capabilities of 4o IG open up new possibilities for creators and marketers:
While the advancements are impressive, the release of 4o IG has also raised some concerns:
The release of GPT-4o Image Generation marks a significant milestone in the evolution of AI-powered visual creation tools. As these technologies continue to improve, they are likely to reshape various industries, from graphic design and advertising to journalism and entertainment. However, they also underscore the growing need for discussions around AI ethics, media literacy, and the verification of digital content in an increasingly AI-influenced world [5].
Reference
[1]
[4]
The Outpost is a comprehensive collection of curated artificial intelligence software tools that cater to the needs of small business owners, bloggers, artists, musicians, entrepreneurs, marketers, writers, and researchers.
© 2025 TheOutpost.AI All rights reserved