2 Sources
[1]
When creating images, AI keeps remixing the same 12 stock photo clichés
In the game of visual telephone, one player draws a picture and describes it to another player, who must then attempt to draw the picture based only on the verbal description. After many turns, things often get woefully derailed -- and wildly creative. Now, researchers have made artificial intelligence (AI) models play the game. In a new study published today in Patterns, researchers paired two AI models and set them loose for 100 rounds of visual telephone. But no matter how diverse or specific the starting prompt, the AIs repeatedly converged on the same 12 generic, often Eurocentric motifs -- what the researchers call "visual elevator music." As more AI systems are built to autonomously generate and judge other AI creative work, the researchers warn that the resulting bland soup of cliches could flatten creative diversity. Jeba Rezwana, a human-AI co-creativity researcher at Towson University, says the study provides more evidence that unsupervised AI systems can amplify existing biases, such as favoring Western cultures over others -- underscoring the need to keep humans in the loop. Ahmed Elgammal, director of the Art and Artificial Intelligence Laboratory at Rutgers University, adds that because AI systems are designed to generalize, it's not surprising that they gravitate to familiar themes in their training data. However, he says the study's quantification of this drift is "very, very interesting." These days, AI models are increasingly deployed as independent "agents" that can autonomously generate, critique, and revise text and multimedia. Even a simple question to ChatGPT can set off a chain reaction, as one AI system hands off queries to others. "You have this avalanche of large language models in the background that you're not seeing," says study co-author Arend Hintze, an AI researcher at Dalarna University. Watching that process made Hintze wonder what happens when humans step out of the picture entirely. 
Can AI systems stay on track when they are left to generate and judge creative work on their own? To find out, he and his team algorithmically generated 100 text prompts to seed games of visual telephone. The prompts were deliberately unusual and distinct. One read, "As the morning sun rises over the nation, eight weary travelers prepare to embark on a plan that will seem impossible to achieve but promises to take them beyond." Another read, "As I sat particularly alone, surrounded by nature, I found an old book with exactly eight pages that told a story in a forgotten language waiting to be read and understood." "You cannot get [the prompts] further away from each other," Hintze says. "We tried to make them as wild as possible." Each prompt was fed into an image generator called Stable Diffusion XL (SDXL), which produced an image that was handed off to an image-describing model called the Large Language and Vision Assistant. The resulting description was passed back to SDXL, and the cycle repeated until the systems had gone through 100 rounds. Very quickly, the original ideas began to slip away. For example, after a few dozen handoffs, a prompt about a prime minister grappling with a fragile peace deal devolved into an image of a pompous sitting room with a dramatic chandelier. The outputs for other prompts regularly drifted toward Gothic cathedrals, pastoral landscapes, and rainy nighttime scenes in Paris. The trend persisted even when researchers adjusted the randomness in the image-describing model and swapped in other AI models to play the game. Across the hundreds of resulting trajectories, the AIs defaulted to 12 dominant motifs that Hintze likens to the "meaningless, happy nonsense" of filler photos in Ikea picture frames. The convergence may partly reflect the data sets used to train visual models, Elgammal says. Those data sets are typically curated to be visually appealing, broadly acceptable, and free of offensive material.
When the researchers extended the experiment to 1000 iterations, most image sequences remained stuck once they reached one of the 12 dominant motifs. In one case, however, a trajectory abruptly jumped after several hundred steps, moving from a snow-covered house to cows in a field and then to a quaint town. But how often such jumps occur, or whether some visual endpoints are more stable than others, remains unclear. "Does everybody end up in Paris or something? We don't know," Hintze says. The phenomenon also has parallels in human culture. Across cultures, stories such as Little Red Riding Hood and simple geometric patterns such as spirals or zigzags have emerged repeatedly, suggesting people tend to converge on familiar forms, too. The difference is that human societies tend to have corrective countercultures that push back against homogenization. In AI models, however, "convergence is driven by reinforcement without critique," says Caterina Moruzzi, a philosopher studying creativity and AI at the Edinburgh College of Art. "There is a reward for representations that are easiest to stabilize and to describe." Whether these systems can be built to resist the pull toward sameness is an open question. But Christian Guckelsberger, an AI and creativity researcher at Aalto University, hopes this current limitation isn't viewed as an "engineering challenge." Rather, it raises a broader question about the very purpose of creativity. "We should remember how important it is for people to exercise their creativity as a form of meaning-making and self-realization," he says. "Is there really a problem to be solved -- or is there actually something to preserve?"
[2]
AI Image Generators Default to the Same 12 Photo Styles, Study Finds
AI image generation models have massive sets of visual data to pull from in order to create unique outputs. And yet, researchers find that when models are pushed to produce images based on a series of slowly shifting prompts, they'll default to just a handful of visual motifs, resulting in an ultimately generic style. A study published in the journal Patterns took two AI models, the image generator Stable Diffusion XL and the vision-language model LLaVA, and put them to the test by playing a game of visual telephone. The game went like this: the Stable Diffusion XL model would be given a short prompt and required to produce an image -- for example, "As I sat particularly alone, surrounded by nature, I found an old book with exactly eight pages that told a story in a forgotten language waiting to be read and understood." That image was presented to the LLaVA model, which was asked to describe it. That description was then fed back to Stable Diffusion, which was asked to create a new image based on that prompt. This went on for 100 rounds. Much like a game of human telephone, the original image was quickly lost. No surprise there, especially if you've ever seen one of those time-lapse videos where people ask an AI model to reproduce an image without making any changes, only for the picture to quickly turn into something that doesn't remotely resemble the original. What did surprise the researchers, though, was the fact that the models default to just a handful of generic-looking styles. Across 1,000 different iterations of the telephone game, the researchers found that most of the image sequences would eventually fall into just one of 12 dominant motifs. In most cases, the shift is gradual. A few times, it happened suddenly. But it almost always happened. And researchers were not impressed. In the study, they referred to the common image styles as "visual elevator music," basically the type of pictures that you'd see hanging up in a hotel room.
The most common scenes included things like maritime lighthouses, formal interiors, urban night settings, and rustic architecture. Even when the researchers switched to different models for image generation and descriptions, the same types of trends emerged. Researchers said that when the game is extended to 1,000 turns, coalescing around a style still happens around turn 100, but variations spin out in those extra turns. Interestingly, though, those variations still typically pull from one of the popular visual motifs. So what does that all mean? Mostly that AI isn't particularly creative. In a human game of telephone, you'll end up with extreme variance because each message is delivered and heard differently, and each person has their own internal biases and preferences that may impact what message they receive. AI has the opposite problem. No matter how outlandish the original prompt, it'll always default to a narrow selection of styles. Of course, the AI model is pulling from human-created prompts, so there is something to be said about the data set and what humans are drawn to take pictures of. If there's a lesson here, perhaps it is that copying styles is much easier than teaching taste.
A new study published in the journal Patterns shows that AI image generators like Stable Diffusion XL consistently converge on just 12 generic visual motifs—researchers call it "visual elevator music." When two AI models played a game of visual telephone for 100 rounds, even wildly different prompts devolved into the same clichéd scenes: Gothic cathedrals, Parisian streets, and formal interiors. The findings raise urgent questions about AI creativity and the need for human oversight in AI creative processes.
When researchers at Dalarna University asked AI image generators to play a game of visual telephone, they uncovered a troubling pattern. No matter how diverse or unusual the starting prompts, the systems repeatedly collapsed into the same 12 generic visual styles—what study co-author Arend Hintze describes as "visual elevator music." [1]
The study, published in the journal Patterns, paired Stable Diffusion XL with the Large Language and Vision Assistant (LLaVA) for 100 rounds of iterative image generation and description. [1]
The visual telephone game worked like this: Stable Diffusion XL received a deliberately unusual prompt and generated an image. LLaVA then described that image, and the description was fed back to Stable Diffusion XL to create a new image. This cycle repeated for 100 rounds per game, and in a follow-up experiment the researchers extended games to 1,000 rounds. [2]
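The loop above can be sketched in a few lines. This is a toy illustration, not the study's code: `generate_image` and `describe_image` are hypothetical stand-ins for calls to Stable Diffusion XL and LLaVA, simplified here so that the structure of the loop, and its drift toward a small set of attractor motifs, is visible without any models.

```python
def generate_image(prompt: str) -> str:
    # Stand-in for Stable Diffusion XL: a real system returns pixels.
    # This toy version "loses" detail by keeping only a salient motif,
    # falling back to a generic default -- mimicking the study's attractors.
    for motif in ("lighthouse", "cathedral", "paris"):
        if motif in prompt.lower():
            return f"image of a {motif}"
    return "image of a formal interior"  # default attractor

def describe_image(image: str) -> str:
    # Stand-in for LLaVA's caption of the generated image.
    return image.replace("image of", "a picture showing")

def visual_telephone(seed_prompt: str, rounds: int = 100) -> list[str]:
    """Alternate generation and description for `rounds` iterations,
    returning the full trajectory of prompts."""
    history = [seed_prompt]
    prompt = seed_prompt
    for _ in range(rounds):
        image = generate_image(prompt)
        prompt = describe_image(image)
        history.append(prompt)
    return history

trajectory = visual_telephone("eight weary travelers at dawn", rounds=5)
```

Even this toy loop collapses to a fixed point after a single round; the real systems take dozens of rounds, but the researchers measured the same qualitative behavior.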
Researchers crafted prompts to be as distinct as possible, including scenarios like "eight weary travelers prepare to embark on a plan that will seem impossible to achieve" and stories about forgotten languages in ancient books. [1]
Within just a few dozen rounds, original ideas began slipping away. A prompt about a prime minister navigating a fragile peace deal devolved into an image of a pompous sitting room with a dramatic chandelier. Across hundreds of trajectories, the 12 generic visual styles that emerged included maritime lighthouses, Gothic cathedrals, pastoral landscapes, rainy nighttime scenes in Paris, formal interiors, urban night settings, and rustic architecture. [1][2]
These motifs represent the kind of bland, inoffensive imagery you'd find in hotel rooms or Ikea picture frames.
Source: Gizmodo
The trend persisted even when researchers adjusted randomness parameters and swapped in different AI models. When the experiment extended to 1,000 iterations, most image sequences remained stuck once they reached one of the dominant motifs. In rare cases, a trajectory would jump abruptly—shifting from a snow-covered house to cows in a field, then to a quaint town—but such breaks from the pattern were uncommon. [1]
Ahmed Elgammal, director of the Art and Artificial Intelligence Laboratory at Rutgers University, explains that because AI systems are designed to generalize, they naturally gravitate toward familiar themes in their training data. [1]
The biases in training data play a critical role. Visual datasets used to train these models are typically curated to be visually appealing, broadly acceptable, and free of offensive material—leading to Eurocentric biases and homogenization. [1]
Jeba Rezwana, a human-AI co-creativity researcher at Towson University, emphasizes that the study provides more evidence that unsupervised AI systems can amplify existing biases, underscoring the need for human oversight in AI creative processes. [1]
Unlike human culture, which tends to have corrective countercultures pushing back against homogenization, AI operates through "reinforcement without critique," as philosopher Caterina Moruzzi from the Edinburgh College of Art notes. [1]
As AI models are increasingly deployed as independent agents that autonomously generate, critique, and revise content, the implications grow more serious. Even a simple query to ChatGPT can trigger what Hintze calls "an avalanche of large language models in the background that you're not seeing." [1]
When humans step out of the loop entirely, AI systems struggle to maintain creative direction, potentially leading to what researchers warn could flatten creative diversity across digital media. [1]
The study reveals a fundamental limitation: while humans in a game of telephone produce extreme variance due to individual biases and preferences, AI has the opposite problem. No matter how outlandish the original prompts, AI image generators default to a narrow selection of styles. [2]
This suggests that copying styles proves far easier than teaching taste—a critical insight for anyone building or deploying generative AI systems. What remains unclear is whether certain visual endpoints prove more stable than others, or if specific motifs act as stronger attractors. "Does everybody end up in Paris or something? We don't know," Hintze admits. [1]
As AI ethics discussions intensify and more organizations rely on these tools, understanding how to maintain diversity in AI-generated content becomes essential for preventing a future of homogenized visual culture.