5 Sources
[1]
Apple releases huge AI dataset for image editing research - 9to5Mac
Apple has released Pico-Banana-400K, a 400,000-image research dataset which, interestingly, was built using Google's Gemini-2.5 models. Here are the details. Apple's research team has published an interesting study called "Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing". In addition to the study, they also released the full 400,000-image dataset it produced, which has a non-commercial research license. This means that anyone can use it and explore it, provided it is for academic work or AI research purposes. In other words, it can't be used commercially. A few months ago, Google released the Gemini-2.5-Flash-Image model, also known as Nano-Banana, which is arguably the state of the art when it comes to image editing models. Other models have also shown significant improvements, but, as Apple's researchers put it: "Despite these advances, open research remains limited by the lack of large-scale, high-quality, and fully shareable editing datasets. Existing datasets often rely on synthetic generations from proprietary models or limited human-curated subsets. Furthermore, these datasets frequently exhibit domain shifts, unbalanced edit type distributions, and inconsistent quality control, hindering the development of robust editing models." So, Apple set out to do something about it. The first thing Apple did was pull an unspecified number of real photographs from the OpenImages dataset, "selected to ensure coverage of humans, objects, and textual scenes." Then, it came up with a list of 35 different types of changes a user could ask the model to make, grouped into eight categories, ranging from simple color changes and seasonal transformations to converting a person into a Pixar-style character. Next, the researchers would upload an image to Nano-Banana alongside one of these prompts. Once Nano-Banana was done generating the edited image, the researchers would have Gemini-2.5-Pro analyze the result, either approving or rejecting it based on instruction compliance and visual quality. The result became Pico-Banana-400K, which includes images produced through single-turn edits (a single prompt), multi-turn edit sequences (multiple iterative prompts), and preference pairs comparing successful and failed results (so models can also learn what undesirable outcomes look like). While acknowledging Nano-Banana's limitations in fine-grained spatial editing, layout extrapolation, and typography, the researchers say that they hope Pico-Banana-400K will serve as "a robust foundation for training and benchmarking the next generation of text-guided image editing models."
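For readers who want to picture that workflow, here is a minimal sketch of how such a generate-then-judge loop could look. The client functions and the prompt list are hypothetical stand-ins; Apple has not published the pipeline's actual code:

```python
import random

# Illustrative prompts only; the real taxonomy spans 35 edit types.
EDIT_PROMPTS = [
    "Change the color of the car to red",
    "Add film grain and a vintage filter",
    "Convert the person into a Pixar-style character",
]

def nano_banana_edit(image, prompt):
    """Hypothetical call to Gemini-2.5-Flash-Image (Nano-Banana)."""
    raise NotImplementedError

def gemini_judge(original, edited, prompt):
    """Hypothetical Gemini-2.5-Pro call that checks instruction
    compliance and visual quality, returning True to approve."""
    raise NotImplementedError

def build_example(image):
    prompt = random.choice(EDIT_PROMPTS)
    edited = nano_banana_edit(image, prompt)
    approved = gemini_judge(image, edited, prompt)
    # Rejected edits are kept too: they feed the dataset's preference
    # pairs as examples of undesirable outcomes.
    return {"prompt": prompt, "original": image,
            "edited": edited, "approved": approved}
```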
[2]
Apple's New AI Dataset Aims to Improve Photo Editing Models
Apple researchers have released Pico-Banana-400K, a comprehensive dataset of 400,000 curated images that's been specifically designed to improve how AI systems edit photos based on text prompts. The massive dataset aims to address what Apple describes as a gap in current AI image editing training. While systems like GPT-4o can make impressive edits, the researchers say progress has been limited by a lack of adequate training data built from real photographs. Apple's new dataset aims to improve the situation. Pico-Banana-400K features images organized into 35 different edit types across eight categories, from basic adjustments like color changes to complex transformations such as converting people into Pixar-style characters or LEGO figures. Each image went through Apple's AI-powered quality control system, with Google's Gemini-2.5-Pro evaluating the results based on instruction compliance and technical quality. The dataset also includes three specialized subsets: 258,000 single-edit examples for basic training, 56,000 preference pairs comparing successful and failed edits, and 72,000 multi-turn sequences showing how images evolve through multiple consecutive edits. Apple built the dataset using Google's Gemini-2.5-Flash-Image (aka Nano-Banana) editing model, which was released just a few months ago. However, Apple's research also revealed the model's limitations: while global style changes succeeded 93% of the time, precise tasks like relocating objects or editing text struggled, with success rates below 60%.
[3]
How Apple Plans to Improve AI Image Editors
The company used Google's Nano Banana model to develop this dataset, as well as Gemini-2.5. Apple might be dead last in the AI race -- at least when you consider competition from companies like OpenAI, Google, and Meta -- but that doesn't mean the company isn't working on the tech. In fact, it seems most of the work Apple does on AI is behind the scenes: While Apple Intelligence is, well, there, the company's researchers are working on other ways to improve AI models for everyone, not just Apple users. The latest project? Improving AI image editors based on text prompts. In a paper published last week, researchers introduced Pico-Banana-400K, a dataset of 400,000 "text-guided" images selected to improve AI-based image editing. Apple believes its image dataset improves upon existing sets by including higher-quality images with more diversity: The researchers found that existing datasets either use images produced by AI models, or are not varied enough, which can hinder efforts to improve the models. Funnily enough, Pico-Banana-400K is designed to work with Nano Banana, Google's image editing model. Researchers say that, using Nano Banana, their dataset can generate 35 different types of edits, and that it taps Gemini-2.5-Pro to assess the quality of the edits and decide whether those edits should remain part of the overall dataset. As part of these 400,000 images, there are 258,000 samples of single edits (where Apple pairs an original image with an edited one); 56,000 "preference pairs," which distinguish between failed and successful edit generations; and 72,000 "multi-turn sequences," which walk through two to five edits. Researchers note that different functions had different success rates in this dataset. Global edits and stylization are "easy," achieving the highest success rates; object semantics and scene context are "moderate"; while precise geometry, layout, and typography are "hard." The highest-performing function, "strong artistic style transfer," which could include changing an image's style to "Van Gogh" or anime, has a 93% success rate. The lowest-performing function, "change font style or color of visible text if there is text," only succeeded 58% of the time. Other tested functions include "add new text" (67% success rate), "zoom in" (74% success rate), and "add film grain or vintage filter" (91% success rate). Unlike many of Apple's products, which are typically closed to the company's own platforms, Pico-Banana-400K is open for all researchers and AI developers to use. It's cool to see Apple researchers contributing to open research like this, especially in an area where Apple is generally behind. Will we actually get an AI-powered Siri anytime soon? Unclear. But it is clear Apple is actively working on AI, perhaps just in its own way.
[4]
Apple's Pico-Banana-400K dataset could redefine how AI learns to edit images
The dataset was built using an automated pipeline powered by Google's Nano-Banana and Gemini-2.5-Pro, eliminating the need for human annotators. Apple has released Pico-Banana-400K, a massive, high-quality dataset of nearly 400,000 image editing examples. The new dataset, detailed in an academic paper posted on October 23, 2025, was built by Apple researchers including Yusu Qian, Jialing Tong, and Zhe Gan. This matters because the AI community has been held back by a lack of large-scale, open, and realistic datasets. Most previous datasets were either synthetic, low-quality, or built with proprietary models. Apple's new resource, which is built from real photographs, is designed to be a robust foundation for training the next generation of text-guided image editing models, from simple touch-ups to complex, multi-step creative projects. Instead of the old, expensive method of paying humans to manually edit hundreds of thousands of images, Apple's team created a sophisticated, automated pipeline using other powerful AI models. First, they sourced real photographs from the OpenImages collection. Then, they used Google's Nano-Banana model to generate a diverse range of edits based on a comprehensive taxonomy of 35 different edit types, from "change color" to "apply seasonal transformation." But here's the clever part: to ensure quality, they used another AI, Gemini-2.5-Pro, as an automated "judge." This AI judge scored every single edit on four criteria: Instruction Compliance (40%), Seamlessness (25%), Preservation Balance (20%), and Technical Quality (15%). Edits that scored above a 0.7 threshold were labeled "successful." Edits that failed were kept as "negative examples." This process creates a high-quality dataset without a single human annotator, at a total cost of about $100,000. The real power of Pico-Banana-400K isn't just its size; it's the specialized subsets designed to solve complex research problems. The full dataset includes 258,000 single-turn edit examples, 56,000 preference pairs, and 72,000 multi-turn sequences. By analyzing the "success rates" of its own pipeline, the Apple team also created a clear map of what AI image editors are good at and where they still fail. Global edits like "add a vintage filter" (90% success) are easy. Object-level edits like "remove this car" (83% success) are pretty good. But edits requiring precise spatial control or symbolic understanding remain "brittle" and are now open problems for researchers to solve. The hardest tasks? Relocating an object (59% success), changing a font (57% success), and generating caricatures (58% success). By open-sourcing this dataset, Apple is essentially giving the entire AI community a high-quality "gym" to train their models and a clear list of challenges to tackle next.
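The weighted rubric described above translates into a simple acceptance rule. Here is a sketch using the documented weights and threshold; the per-criterion scores themselves would come from the Gemini-2.5-Pro judge:

```python
# Criterion weights and acceptance threshold as reported in the paper.
WEIGHTS = {
    "instruction_compliance": 0.40,
    "seamlessness": 0.25,
    "preservation_balance": 0.20,
    "technical_quality": 0.15,
}
THRESHOLD = 0.7

def accept_edit(scores: dict) -> bool:
    """Combine per-criterion scores (each in [0, 1]) into a weighted
    total and apply the 0.7 cutoff; failures become negative examples."""
    total = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return total >= THRESHOLD

# Example: strong compliance can outweigh a visible seam.
scores = {"instruction_compliance": 0.9, "seamlessness": 0.5,
          "preservation_balance": 0.8, "technical_quality": 0.7}
print(accept_edit(scores))  # 0.36 + 0.125 + 0.16 + 0.105 = 0.75 -> True
```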
[5]
Apple Wants to Help the World Build Nano Banana-Like AI Models
Apple's dataset comes with a non-commercial research license. Apple researchers have released a large-scale dataset to help others develop image editing artificial intelligence (AI) models. Dubbed Pico-Banana-400K, the dataset contains 400,000 real images and their AI-edited counterparts that can be used to train large language models to handle text-based image editing requests. It is an open-source dataset available with a research-only license, meaning it cannot be used for commercial purposes. Interestingly, the Cupertino-based tech giant's new dataset release comes at a time when it is struggling with its own native AI models. Apple's Pico-Banana-400K Will Help Others Build Image Editing Models A research paper titled "Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing" was published on arXiv, an online preprint repository. The dataset contains roughly 400,000 real photo edit pairs, built from OpenImages, organised into a 35-type edit taxonomy and split into single-turn edits, multi-turn sequences and preference pairs. These design choices matter because they shift the training signal from synthetic, narrowly curated examples to instruction-rich, real-world scenarios that resemble what users actually ask for. Pico-Banana-400K was produced by chaining a powerful generative model (Nano Banana) to create edits and another large multimodal model to act as an automated judge, filtering and retrying failed attempts. The result is a dataset emphasising photographic diversity, human-centric scenes and text-heavy shots. The dataset also focuses on nuance, with long and short instruction pairs to support research work. It also includes negative examples and preference pairs, which are crucial for alignment research and for teaching models not just what to do but what "better" looks like. The paper explicitly documents which edit types are robust (style transfers, global photometric changes) and which remain brittle (precise spatial relocations, text replacement on signs), making it unusually candid about limitations. The dataset is currently available on GitHub, and can be used for any non-commercial use cases. Interestingly, Apple's in-house AI progress has seemingly stalled. While it has integrated Apple Intelligence into more apps and features with the iPhone 17 series launch, the company continues to delay the Siri overhaul it first announced in 2024.
Apple has released Pico-Banana-400K, a comprehensive dataset of 400,000 curated images designed to improve AI-powered text-guided image editing models. Built using Google's Nano-Banana and Gemini models, the open-source dataset addresses critical gaps in AI training data.
Apple has released Pico-Banana-400K, a comprehensive dataset containing 400,000 curated images specifically designed to advance AI-powered image editing research [1]. The release marks a significant departure from Apple's typically closed approach to AI development, as the company makes this resource freely available to researchers worldwide under a non-commercial research license [2].
Source: 9to5Mac
The dataset addresses what Apple researchers describe as a critical gap in current AI training resources. According to their published study, "Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing," existing datasets often rely on synthetic generations from proprietary models or limited human-curated subsets, frequently exhibiting domain shifts and inconsistent quality control [1].

In an interesting twist, Apple built this dataset using Google's AI technologies, specifically the Gemini-2.5-Flash-Image model (also known as Nano-Banana) and Gemini-2.5-Pro [3]. The researchers sourced real photographs from the OpenImages dataset, selecting images to ensure coverage of humans, objects, and textual scenes [1].

The construction process involved creating a sophisticated automated pipeline that eliminated the need for human annotators, representing a cost-effective approach estimated at approximately $100,000 [4]. Apple developed a comprehensive taxonomy of 35 different edit types grouped into eight categories, ranging from basic color changes to complex transformations such as converting people into Pixar-style characters or LEGO figures [2], as the sketch below illustrates.
Source: MacRumors
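To make the taxonomy concrete, the excerpt below shows how edit instructions could be organized and sampled by category. The category names and groupings here are assumptions pieced together from the edit types mentioned across the coverage, not the paper's official taxonomy:

```python
import random

# Assumed category names; the paper defines 8 categories and 35 edit types.
EDIT_TAXONOMY = {
    "photometric": ["change color", "add film grain or vintage filter"],
    "object-level": ["remove an object", "relocate an object"],
    "scene": ["zoom in", "apply seasonal transformation"],
    "stylization": ["strong artistic style transfer",
                    "convert person into Pixar-style character"],
    "typography": ["add new text", "change font style or color of visible text"],
}

def sample_instruction():
    """Pick a random (category, edit type) pair to prompt the editor with."""
    category = random.choice(list(EDIT_TAXONOMY))
    return category, random.choice(EDIT_TAXONOMY[category])

print(sample_instruction())
```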
The dataset's quality control system represents a significant innovation in AI training data curation. Gemini-2.5-Pro served as an automated judge, evaluating each edit on four weighted criteria: Instruction Compliance (40%), Seamlessness (25%), Preservation Balance (20%), and Technical Quality (15%) [4]. Edits scoring above a 0.7 threshold were labeled as successful, while failed attempts were retained as negative examples to help models learn from mistakes.

The research revealed clear performance patterns across different edit types. Global edits and stylization achieved the highest success rates, with strong artistic style transfers reaching 93% success [3]. However, precise tasks requiring spatial control or symbolic understanding proved more challenging, with font style changes achieving only 58% success and object relocation managing just 59% [4].
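Pulling together the per-task figures reported across the coverage makes this difficulty gradient easy to see (where sources differ by a percentage point, the higher figure is used):

```python
# Reported success rates, in percent.
SUCCESS_RATES = {
    "strong artistic style transfer": 93,
    "add film grain or vintage filter": 91,
    "remove object": 83,
    "zoom in": 74,
    "add new text": 67,
    "relocate object": 59,
    "generate caricature": 58,
    "change font style or color of visible text": 58,
}

# Tasks below 60% are the "brittle" open problems the researchers flag.
hard_tasks = [task for task, rate in SUCCESS_RATES.items() if rate < 60]
print(hard_tasks)  # the spatial and typographic edits
```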
Pico-Banana-400K is organized into three specialized subsets designed to address different research needs (a rough schema sketch follows below). The dataset includes 258,000 single-edit examples for basic training, 56,000 preference pairs comparing successful and failed edits, and 72,000 multi-turn sequences showing how images evolve through multiple consecutive edits [2]. This structure supports various research approaches, from basic model training to advanced preference learning and multi-step editing scenarios [5].

The dataset is currently available on GitHub and can be accessed by any researcher for non-commercial purposes [5]. This open approach contrasts sharply with Apple's typical product development strategy and comes at a time when the company faces challenges with its own AI initiatives, including delays to the promised Siri overhaul announced in 2024 [5].
Source: NDTV Gadgets 360
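As a rough mental model of the three subsets, loaded records might look something like the sketch below. The field names are hypothetical; the actual schema is defined in the GitHub release:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class SingleEdit:            # ~258,000 examples
    instruction: str
    source_image: str        # real photo drawn from OpenImages
    edited_image: str

@dataclass
class PreferencePair:        # ~56,000 examples
    instruction: str
    source_image: str
    preferred_image: str     # the judged-successful edit
    rejected_image: str      # the judged-failed edit, kept as a negative

@dataclass
class MultiTurnSequence:     # ~72,000 examples, two to five edits each
    instructions: List[str]
    images: List[str]        # the image after each consecutive edit
```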