26,000 AI researchers admit they can't explain how models work as interpretability crisis deepens

Reviewed by Nidhi Govil

At this year's NeurIPS conference, a record 26,000 AI researchers confronted an unsettling reality: no one fully understands how today's most advanced AI systems actually work. As companies like Google and OpenAI pursue diverging approaches to AI interpretability, the field grapples with the black box problem and an evaluation crisis that threatens trust in increasingly powerful systems.

Record NeurIPS Gathering Exposes Fundamental Knowledge Gap

The Neural Information Processing Systems (NeurIPS) conference drew a record-breaking 26,000 attendees to San Diego this year, twice the number from just six years ago [1]. Yet amid sessions on hyperspecific topics and the celebration of AI's exponential growth, one of the most discussed issues was surprisingly basic: AI researchers openly admit they don't understand how frontier AI systems actually work. Most leading AI researchers and CEOs readily acknowledge this knowledge gap, highlighting a profound paradox in a field advancing at breakneck speed [1]. The pursuit of understanding how AI models think through their internal structure is called AI interpretability, a field still in its infancy despite the urgency of the challenge.

Source: NBC

The Black Box Problem Magnified by Scale

The black box problem has been magnified by the sheer scale of modern deep neural networks [2]. A large language model (LLM) with trillions of learned connections operates in a computational space too vast for the human mind to trace. When these systems provide answers, the underlying reasoning process remains opaque, leaving AI researchers to marvel at outputs without understanding the mechanism. Shriyash Upadhyay, an AI researcher and co-founder of the interpretability-focused company Martian, compared the current state to fundamental physics questions: "We're asking, 'What does it mean to have an interpretable AI system?'" [1]. Martian used the conference to launch a $1 million prize to boost interpretability efforts, signaling both the importance and the difficulty of the challenge.
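
To make "interpretability" concrete, below is a minimal sketch of one widely used technique, a linear probe: a simple classifier trained on a model's hidden activations to test whether a concept can be read off them. The activations, labels, and variable names here are synthetic stand-ins, not outputs of any real model.

```python
# Minimal linear-probe sketch: can a simple classifier recover a concept
# from a model's hidden activations? Synthetic data stands in for activations
# that would normally be extracted from a specific layer of a real LLM.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical hidden-state vectors, one per example.
n_examples, hidden_dim = 2000, 256
hidden_states = rng.normal(size=(n_examples, hidden_dim))

# Hypothetical concept labels (e.g., "this text is about math"), planted along
# one direction so the probe has something to find in this toy setup.
concept_direction = rng.normal(size=hidden_dim)
labels = (hidden_states @ concept_direction
          + rng.normal(scale=2.0, size=n_examples) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    hidden_states, labels, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# High accuracy means the concept is linearly decodable at this "layer";
# it does not by itself show that the model relies on that direction.
print(f"probe accuracy: {probe.score(X_test, y_test):.2f}")
```

In real probing work, `hidden_states` would be extracted from a chosen transformer layer on labelled prompts; the rest of the recipe stays the same.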

Source: Digit

Google and OpenAI Diverge on Interpretability Approaches

Understanding how AI systems work has become a strategic priority, but leading companies are pursuing dramatically different paths. Google's interpretability team announced a significant pivot away from attempts to understand every part of a model and toward more practical methods focused on real-world impact [1]. Neel Nanda, one of Google's interpretability leaders, acknowledged that "grand goals like near-complete reverse-engineering still feel far out of reach," given the team's desire for its work to pay off within approximately 10 years. In contrast, OpenAI's head of interpretability, Leo Gao, announced he was doubling down on a deeper, more ambitious form of interpretability "to fully understand how neural networks work" [1]. This divergence reflects broader uncertainty about whether complete understanding is even achievable.

AI Evaluation Crisis Threatens Trust and Safety

Beyond interpretability, the field faces an evaluation crisis: researchers lack reliable ways to measure AI intelligence. Sanmi Koyejo, a professor of computer science and leader of the Trustworthy AI Research Lab at Stanford University, told NBC News: "We don't have the measurement tools to measure more complicated concepts and bigger questions about models' general behavior, things like intelligence and reasoning" [1]. Traditional AI benchmarks, built for an earlier era when researchers measured specific downstream tasks, increasingly fall victim to data contamination or to models mimicking reasoning rather than performing it [2]. High scores on public leaderboards don't guarantee real-world capability or alignment with human values.
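
To illustrate why data contamination undermines benchmarks, here is a rough sketch of the kind of check evaluation teams run in some form: flagging a benchmark item when long n-gram overlaps with the training corpus suggest the item leaked into training data. The function names and toy corpus are hypothetical; real contamination checks operate over tokenized corpora at scale with fuzzier matching.

```python
# Crude data-contamination check: flag a benchmark item if any long n-gram
# from it also appears verbatim in the training corpus. Toy strings only;
# real checks run over tokenized corpora at scale and use fuzzier matching.
def ngrams(text: str, n: int = 8) -> set:
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def is_contaminated(benchmark_item: str, training_docs: list, n: int = 8) -> bool:
    item_grams = ngrams(benchmark_item, n)
    return any(item_grams & ngrams(doc, n) for doc in training_docs)

corpus = [
    "the quick brown fox jumps over the lazy dog near the quiet river bank today",
]
question = ("Complete the sentence: the quick brown fox jumps over "
            "the lazy dog near the quiet river")
print(is_contaminated(question, corpus))  # True: the item likely leaked into training
```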

Non-Human Reasoning Patterns Challenge Explainable AI

New research in mechanistic interpretability reveals that AI models often employ radically non-human or even contradictory strategies [2]. An LLM might solve a math problem by simultaneously approximating the sum and precisely calculating the last digit, a method that defies conventional logic. The rationale it provides might be a post-hoc rationalization driven by the need to satisfy the user, not a reflection of the actual computation. This opacity makes debugging AI errors and correcting AI biases extremely difficult. When an autonomous vehicle fails or an AI-driven financial decision discriminates, the inability to peer into the black box prevents researchers from building the necessary safety guardrails [2].
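
The arithmetic example above can be made concrete. The sketch below mimics, in plain Python, the kind of split strategy interpretability researchers have described: one pathway produces a rough estimate of the sum, another computes the exact last digit, and the two are stitched together. It is an illustration of the described behavior under assumed noise, not code from any interpretability study.

```python
# Illustration of a "non-human" addition strategy of the kind interpretability
# work has reported in LLMs: one pathway roughly estimates the sum, another
# computes the last digit exactly, and the two are combined. (A person would
# simply add the numbers; this whole setup is a simulation, not model code.)
import random

def approximate_sum(a: int, b: int) -> int:
    """Pathway 1: a fuzzy estimate of the sum (true sum plus small noise)."""
    return a + b + random.randint(-4, 4)

def exact_last_digit(a: int, b: int) -> int:
    """Pathway 2: the final digit, computed precisely and independently."""
    return (a % 10 + b % 10) % 10

def llm_style_add(a: int, b: int) -> int:
    rough = approximate_sum(a, b)
    digit = exact_last_digit(a, b)
    # Force the exact last digit onto the rough estimate, picking whichever
    # nearby candidate stays closest to the estimate.
    base = rough - rough % 10 + digit
    return min((base - 10, base, base + 10), key=lambda x: abs(x - rough))

random.seed(0)
print(llm_style_add(374, 259), 374 + 259)  # both print 633 while the noise stays small
```

If the rough estimate drifts by more than a few units, the stitched answer snaps to the wrong multiple of ten, one way a strategy like this yields confident but incorrect arithmetic.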

The Path Forward Remains Uncertain

Adam Gleave, an AI researcher and co-founder of the research nonprofit FAR.AI, expressed skepticism about fully understanding model behavior: "I suspect deep-learning models don't have a simple explanation, so it's simply not possible to fully reverse engineer a large-scale neural network in a way that is comprehensible to a person" [1]. Despite this barrier, Gleave remains hopeful that researchers will make meaningful progress in understanding how models behave on many levels, which would help create more reliable systems. Explainable AI (XAI) is a rapidly growing sub-field, but its solutions remain stopgaps, providing local interpretations of individual predictions rather than a full, global understanding [2]. Machine learning capabilities continue to advance, and with them AI safety and alignment concerns, as the world's most powerful technology moves forward with a brilliant yet deeply unsettling mystery at its core.
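
"Local" here means explaining a single prediction rather than the model as a whole. The sketch below shows that style of method in miniature: each input word of a toy classifier is scored by how much the output changes when the word is removed. Both the toy classifier and the scoring loop are hypothetical stand-ins, but the structure mirrors perturbation-based XAI techniques.

```python
# Local, perturbation-based explanation: score each input word by how much
# removing it changes a (toy) classifier's output. This explains one
# prediction, not the model's global behavior.
def toy_sentiment_score(text: str) -> float:
    """Hypothetical stand-in for a black-box model: counts charged words."""
    positive, negative = {"great", "reliable", "fast"}, {"broken", "slow", "opaque"}
    words = text.lower().split()
    return sum(w in positive for w in words) - sum(w in negative for w in words)

def local_attribution(model, text: str) -> dict:
    words = text.split()
    baseline = model(text)
    scores = {}
    for i, word in enumerate(words):
        perturbed = " ".join(words[:i] + words[i + 1:])  # drop one word
        scores[word] = baseline - model(perturbed)       # its marginal contribution
    return scores

print(local_attribution(toy_sentiment_score, "great model but opaque and slow"))
# {'great': 1, 'model': 0, 'but': 0, 'opaque': -1, 'and': 0, 'slow': -1}
```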
