State media control shapes what AI chatbots say by flooding training data with biased content

Reviewed by Nidhi Govil

3 Sources


A new Nature study reveals governments indirectly influence large language models by controlling online media environments. Researchers found state-coordinated media appears 41 times more frequently than Wikipedia in Chinese training data, causing ChatGPT and Claude to produce more pro-government responses when prompted in native languages versus English.

State Media Control Leaves Detectable Traces in AI Systems

Governments can shape what large language models say without directly building them, according to research published in Nature. A team spanning the University of Oregon, Purdue University, UC San Diego, NYU, and Princeton University found that state media control influences large language models by flooding the information environment that produces training data [1]. The influence operates through a mechanism the researchers call "institutional influence on AI", in which powerful institutions shape discourse through media control, which in turn affects LLMs they do not directly control [3].

Source: Newswise


Evidence Spans 37 Countries and Multiple Studies

The researchers conducted six interconnected studies combining analysis of open training data, experiments with training small models, human evaluation, and real-world tests of commercial chatbots. Their cross-national analysis examined 37 states where at least 70% of speakers of an official language live within state borders [1]. The pattern was consistent: states with tighter media control showed stronger evidence of shaping LLM behaviour, particularly when queries were posed in the state's primary language rather than alternative languages.
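The inclusion rule for the cross-national analysis can be read as a simple threshold filter. The sketch below is only an illustration of that rule: the `LanguageState` record, the `meets_inclusion_rule` helper, and the speaker counts are all hypothetical, and the study's actual data sources and procedure are not described here.

```python
from dataclasses import dataclass


@dataclass
class LanguageState:
    """Hypothetical record pairing a state with one official language."""
    state: str
    language: str
    speakers_in_state: float   # speakers living inside the state's borders
    speakers_worldwide: float  # all speakers of the language


def meets_inclusion_rule(entry: LanguageState, threshold: float = 0.70) -> bool:
    """True if at least `threshold` of the language's speakers live in the state."""
    return entry.speakers_in_state / entry.speakers_worldwide >= threshold


# Made-up numbers, for illustration only.
examples = [
    LanguageState("Exampleland", "Examplish", 90.0, 100.0),
    LanguageState("Sampleland", "Samplish", 40.0, 100.0),
]

selected = [e.state for e in examples if meets_inclusion_rule(e)]
print(selected)
```

A rule of this kind keeps languages whose online text is plausibly dominated by a single state's information environment, which is what lets language serve as a proxy for that environment.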

Source: Nature


China Case Study Reveals 41-Times Higher Presence

To trace the mechanism, researchers focused on China, comparing state-coordinated media with CulturaX, a 6.3-trillion-token multilingual dataset derived from Common Crawl [2]. The findings were stark: state-scripted news content appeared in this dataset 41 times more often than content from Chinese-language Wikipedia [1]. Among documents mentioning Chinese political leaders or institutions, the share matching state-coordinated content rose as high as 23% [3]. Only about 12% of matched documents came from known government or news domains, suggesting state-coordinated media had spread widely across the Chinese internet before reaching AI training corpora [3].
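One way to picture how recirculated text might be matched against a reference article is character n-gram overlap. The sketch below is an assumption for illustration, not the matching procedure used in the study: the helper names `char_ngrams` and `overlap_share`, the 8-character window, and the sample strings are all hypothetical. Character n-grams are used because Chinese text is not whitespace-delimited.

```python
def char_ngrams(text: str, n: int = 8) -> set:
    """Return the set of character n-grams in `text`, ignoring whitespace."""
    cleaned = "".join(text.split())
    return {cleaned[i:i + n] for i in range(len(cleaned) - n + 1)}


def overlap_share(document: str, reference: str, n: int = 8) -> float:
    """Share of the document's n-grams that also occur in the reference."""
    doc_grams = char_ngrams(document, n)
    if not doc_grams:
        return 0.0  # document shorter than one n-gram
    return len(doc_grams & char_ngrams(reference, n)) / len(doc_grams)


# Hypothetical strings: a scripted sentence, a repost that embeds it,
# and an unrelated page.
reference = "the committee stressed steady progress on all fronts"
repost = ("Local news: the committee stressed steady progress "
          "on all fronts, officials said")
unrelated = "weather tomorrow will be sunny with light winds"

print(overlap_share(repost, reference))
print(overlap_share(unrelated, reference))
```

A repost hosted on an ordinary web page scores high under a measure like this even though its domain is not a known government or news outlet, which is the kind of spread the 12% figure points to.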

Commercial Models Reproduce Distinctive State Phrases

The research team found that commercial models memorized and reproduced distinctive phrases from state-scripted news content, indicating repeated exposure during training [3]. "State-coordinated content is not just about what appears in official media. It is also about recirculation; the same phrasing moving through newspapers, apps, reposts and ordinary web pages until it looks like part of the broader information environment," explained Brandon M. Stewart, the paper's corresponding author and Associate Professor of Sociology at Princeton University [3].

Training Experiments Confirm Causal Link

To establish causation, researchers conducted pretraining experiments with open-weight LLMs. They added state-scripted news to the training process and measured the resulting behavioral changes. The results showed a clear effect: models trained with the added scripted news produced the more pro-government response nearly 80% of the time when compared with an unmodified model [3]. The effect held even relative to adding other, non-scripted Chinese media, and was especially pronounced relative to adding general Chinese-language text from the internet.

ChatGPT and Claude Show Language-Based Bias

Testing commercial models revealed systematic patterns of biased output in LLMs. ChatGPT and Claude gave more favorable answers about institutions and leaders in China when queries were written in Chinese versus English [1]. Human raters judged Chinese-prompted answers to be more favorable to China 75.3% of the time for political questions about China [3]. For prompts not about China, the rate was no different from chance, demonstrating specificity in the influence.
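To see what "no different from chance" means for a pairwise rating task, a rate can be compared against the 50% expected if raters had no systematic preference. The sketch below uses an exact one-sided binomial tail. It is an illustration only: the article does not report the number of comparisons or the statistical test the authors used, so the count of 1,000 ratings and the helper `binomial_upper_tail` are hypothetical.

```python
from math import comb


def binomial_upper_tail(successes: int, trials: int) -> float:
    """Exact P(X >= successes) for X ~ Binomial(trials, 0.5)."""
    favorable = sum(comb(trials, k) for k in range(successes, trials + 1))
    return favorable / 2 ** trials


# Hypothetical sample size: the real number of rated pairs is not given.
trials = 1000
p_china_prompts = binomial_upper_tail(753, trials)  # 75.3% of 1,000
p_other_prompts = binomial_upper_tail(500, trials)  # 50.0% of 1,000

print(p_china_prompts)
print(p_other_prompts)
```

Under these assumed counts, a 75.3% rate would be extremely unlikely if raters were choosing at random, while a rate near 50% would be entirely consistent with chance.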

Authoritarian Advantages in Information Environment

The research highlights concerning asymmetries between authoritarian and democratic states. "Authoritarian governments have advantages over democracies in this respect, because state outlets flood the information environment with their material, whereas state-owned organizations in democratic states typically have to compete with opposition or commercial media," the researchers noted [1]. This structural advantage allows authoritarian states to exert disproportionate influence through sheer volume of coordinated content.

Transparency Gaps Limit Understanding

The study underscores the need for transparency from technology companies. "Researchers and users of LLMs could better understand the scale of institutional influence if we could directly examine the training data and model parameters of commercial LLMs," the authors emphasized [1]. Hannah Waight, co-first author and Assistant Professor of Sociology at the University of Oregon, explained: "People often talk about AI as if it learns from the internet in some neutral way. It doesn't. It learns from information environments that have already been shaped by institutions and power structures, and those environments can leave measurable traces in what models say" [3].

Future Implications for Generative AI

The institutional influence mechanism documented in this research carries significant implications for generative AI development. The study focused on text, but researchers expect similar patterns in multimodal models processing images and video [1]. Joshua Tucker, co-author and co-Director of the NYU Center for Social Media, AI, and Politics, noted: "The public debate has focused on what AI can generate, but this study points upstream. Before AI systems can influence politics, politics can influence AI" [3].

© 2026 TheOutpost.AI All rights reserved