2 Sources
[1]
'Are you joking, mate?' AI doesn't get sarcasm in non-American varieties of English
UNSW Sydney provides funding as a member of The Conversation AU.

In 2018, my Australian co-worker asked me, "Hey, how are you going?" My response - "I am taking a bus" - was met with a smirk. I had recently moved to Australia. Despite studying English for more than 20 years, it took me a while to familiarise myself with the Australian variety of the language.

It turns out large language models powered by artificial intelligence (AI) such as ChatGPT experience a similar problem. In new research, published in the Findings of the Association for Computational Linguistics 2025, my colleagues and I introduce a new tool for evaluating the ability of different large language models to detect sentiment and sarcasm in three varieties of English: Australian English, Indian English and British English. The results show there is still a long way to go until the promised benefits of AI are enjoyed by all, no matter which variety of language they speak.

Limited English

Large language models are often reported to achieve superlative performance on standardised sets of tasks known as benchmarks. The majority of benchmark tests are written in Standard American English. This means that, while large language models are being aggressively sold by commercial providers, they have predominantly been tested - and trained - on this one type of English.

This has major consequences. For example, in a recent survey my colleagues and I found large language models are more likely to classify a text as hateful if it is written in the African-American variety of English. They also often "default" to Standard American English - even if the input is in another variety, such as Irish English or Indian English. To build on this research, we built BESSTIE.

What is BESSTIE?

BESSTIE is a first-of-its-kind benchmark for sentiment and sarcasm classification in three varieties of English: Australian English, Indian English and British English.
For our purposes, "sentiment" is the polarity of the emotion: positive (the Aussie "not bad!") or negative ("I hate the movie"). Sarcasm is defined as a form of verbal irony intended to express contempt or ridicule ("I love being ignored").

To build BESSTIE, we collected two kinds of data: reviews of places on Google Maps and Reddit posts. We carefully curated the topics and employed language variety predictors - AI models specialised in detecting the language variety of a text. We selected texts predicted with greater than 95% probability to belong to a specific language variety. These two steps (location filtering and language variety prediction) ensured the data represents a national variety, such as Australian English.

We then used BESSTIE to evaluate nine powerful, freely usable large language models, including RoBERTa, mBERT, Mistral, Gemma and Qwen.

Inflated claims

Overall, we found the large language models we tested worked better for Australian English and British English (which are native varieties of English) than for the non-native variety of Indian English. We also found large language models are better at detecting sentiment than sarcasm. Sarcasm is particularly challenging, both as a linguistic phenomenon and for AI. For example, we found the models were able to detect sarcasm in Australian English only 62% of the time. This number was lower for Indian English and British English - about 57%.

These performances are lower than those claimed by the tech companies that develop large language models. For example, GLUE is a leaderboard that tracks how well AI models perform at sentiment classification on American English text. The highest scores there are 97.5% for the model Turing ULR v6 and 96.7% for RoBERTa (from our suite of models) - both far higher for American English than our observations for Australian, Indian and British English.
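The two-step filtering described above can be sketched in a few lines. This is a minimal illustration, not the authors' actual pipeline: the predictor interface (`predict_variety` returning a probability per variety code) and the toy predictor are assumptions made for the example; only the 95% threshold comes from the article.

```python
# Sketch of the variety-filtering step: keep only texts whose predicted
# probability of belonging to the target English variety exceeds 0.95.

def filter_by_variety(texts, predict_variety, target="en-AU", threshold=0.95):
    """predict_variety(text) -> dict mapping variety codes to probabilities."""
    kept = []
    for text in texts:
        probs = predict_variety(text)
        if probs.get(target, 0.0) > threshold:
            kept.append(text)
    return kept

# Toy predictor standing in for a real language-variety model.
def toy_predictor(text):
    if "arvo" in text:  # a distinctly Australian marker, for illustration
        return {"en-AU": 0.97}
    return {"en-AU": 0.40, "en-US": 0.60}

sample = ["See you this arvo at the servo", "I'll see you this afternoon"]
print(filter_by_variety(sample, toy_predictor))
# ['See you this arvo at the servo']
```

In the real benchmark this step is combined with location filtering (e.g. Google Maps reviews of places in a given country), so that both the place and the predicted variety point to the same national English.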
National context matters

As more and more people around the world use large language models, researchers and practitioners are waking up to the fact that these tools need to be evaluated for a specific national context. For example, earlier this year the University of Western Australia, along with Google, launched a project to improve the efficacy of large language models for Aboriginal English.

Our benchmark will help evaluate future large language model techniques for their ability to detect sentiment and sarcasm. We're also currently working on a project to use large language models in hospital emergency departments to help patients with varying proficiencies in English.
[2]
'Are you joking, mate?' AI doesn't get sarcasm in non-American varieties of English
This article is republished from The Conversation under a Creative Commons license.
New research reveals that large language models have difficulty detecting sarcasm and sentiment in Australian, Indian, and British English, highlighting the need for more diverse language training in AI.
Researchers have developed a new benchmark called BESSTIE to evaluate the performance of large language models (LLMs) in detecting sentiment and sarcasm across different English varieties. The study, published in the Findings of the Association for Computational Linguistics 2025, highlights significant challenges faced by AI in understanding non-American English [1][2].
The study's lead author shares a personal anecdote that illustrates the complexity of language varieties. Despite studying English for over two decades, he found himself confused by Australian English upon moving to Australia. This experience mirrors the challenges faced by AI models, which are predominantly trained and tested on Standard American English [1].
BESSTIE is the first benchmark of its kind, focusing on three English varieties: Australian, Indian, and British. The researchers collected data from Google Maps reviews and Reddit posts, using language variety predictors to ensure a high probability of specific language varieties. The benchmark evaluates nine powerful, freely usable large language models, including RoBERTa, mBERT, Mistral, Gemma, and Qwen [1][2].
The study revealed several important insights:
Performance disparity: LLMs performed better on Australian and British English (native varieties) compared to Indian English (non-native variety) [1][2].
Sentiment vs. sarcasm: AI models were more adept at detecting sentiment than sarcasm across all varieties [1][2].
Sarcasm detection challenges: The models struggled significantly with sarcasm, achieving only 62% accuracy for Australian English and about 57% for Indian and British English [1][2].
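The accuracy figures above boil down to a simple proportion of correct predictions. A minimal sketch of that calculation, using made-up labels rather than BESSTIE data:

```python
# Accuracy of the kind behind the reported 62% / 57% sarcasm figures:
# the fraction of examples where the model's prediction matches the
# gold (human-annotated) label.

def accuracy(gold, predicted):
    assert len(gold) == len(predicted), "label lists must align"
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)

gold = [1, 0, 1, 1, 0, 1, 0, 1]   # 1 = sarcastic, 0 = not (illustrative)
pred = [1, 0, 0, 1, 0, 1, 1, 1]   # a hypothetical model's guesses
print(f"{accuracy(gold, pred):.0%}")  # 75%
```

On a binary task like sarcasm detection, random guessing already scores about 50%, which puts the reported 57-62% figures in sobering perspective.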
The research underscores the importance of evaluating AI models in specific national contexts. As LLMs become increasingly prevalent worldwide, there's a growing recognition of the need to adapt these tools for diverse language varieties [1][2].
The research team is currently working on a project to deploy LLMs in hospital emergency departments to assist patients with varying English proficiencies. Additionally, initiatives like the University of Western Australia and Google's project to improve LLM efficacy for Aboriginal English demonstrate the increasing focus on language diversity in AI development [1][2].
The BESSTIE benchmark represents a significant step towards more inclusive and accurate AI language models. By highlighting the current limitations in processing non-American English varieties, this research paves the way for future improvements in AI's ability to understand and interpret diverse language patterns, ultimately leading to more effective and equitable AI applications across different cultures and regions.