2 Sources
[1]
Inside AI's biggest gathering, a surprising admission: No one knows how it works
SAN DIEGO -- For the past week, academics, startup founders and researchers representing industrial titans from around the globe descended on sunny San Diego for the top gathering in the field of artificial intelligence.

The Neural Information Processing Systems, or NeurIPS, conference has been held for 39 years, but it drew a record-breaking 26,000 attendees this year, twice as many as just six years ago. Since its founding in 1987, NeurIPS has been devoted to researching neural networks and the interplay among computation, neurobiology and physics.

While neural networks, computational structures inspired by human and animal cognitive systems, were once an esoteric academic fixation, their role underpinning AI systems has transformed NeurIPS from a niche meeting in a Colorado hotel to an event filling the entire San Diego Convention Center -- also home to the world-famous Comic-Con.

But even as the gathering boomed along with the AI industry and sessions on hyperspecific topics like AI-created music proliferated, one of the buzziest points of discussion was basic and foundational to the field of AI: the mystery around how frontier systems actually work. Most -- if not all -- leading AI researchers and CEOs readily admit that they do not understand how today's leading AI systems function. The pursuit of understanding models' internal structure is called interpretability, given the desire to "interpret" how the models function.

Shriyash Upadhyay, an AI researcher and co-founder of an interpretability-focused company called Martian, said the interpretability field is still in its infancy: "People don't really understand fully what the field is about. There's a lot of ferment in ideas, and people have different agendas."

"In traditional incremental science, for example, where the ideas are mostly settled, scientists might attempt to add an additional decimal point of accuracy of measurement to particular properties of an electron," Upadhyay said. "With interpretability, we're in the phase of asking: 'What are electrons? Do electrons exist? Are they measurable?' It's the same question with interpretability: We're asking, 'What does it mean to have an interpretable AI system?'"

Upadhyay and Martian used the NeurIPS occasion to launch a $1 million prize to boost interpretability efforts.

As the conference unfolded, leading AI companies' interpretability teams signaled new and diverging approaches to understanding how their increasingly advanced systems work. Early last week, Google's team announced a significant pivot away from attempts to understand every part of a model toward more practical methods focused on real-world impact. Neel Nanda, one of Google's interpretability leaders, wrote in a statement that "grand goals like near-complete reverse-engineering still feel far out of reach" given that "we want our work to pay off within ~10 years." Nanda highlighted AI's rapid progress and lackluster advancements on the team's previous, more "ambitious reverse-engineering" approach as reasons for the switch.

On the other hand, OpenAI's head of interpretability, Leo Gao, announced Friday, and discussed at NeurIPS, that he was doubling down on a deeper, more ambitious form of interpretability "to fully understand how neural networks work."
Adam Gleave, an AI researcher and co-founder of the FAR.AI research and education nonprofit organization, said he was skeptical of the ability to fully understand models' behavior: "I suspect deep-learning models don't have a simple explanation -- so it's simply not possible to fully reverse engineer a large-scale neural network in a way that is comprehensible to a person."

Despite the barrier to complete understanding of these complex systems, Gleave said he was hopeful that researchers would still make meaningful progress in understanding how models behave on many levels, which would help researchers and companies create more reliable and trustworthy systems.

"I'm excited by the growing interest in issues of safety and alignment in the machine-learning research community," Gleave told NBC News, though he noted that NeurIPS meetings dedicated to increasing AI capabilities were so large that they "took place in rooms that could double as aircraft hangars."

In addition to uncertainties about how models behave, most researchers are unimpressed with current methods for evaluating and measuring AI systems' capabilities. "We don't have the measurement tools to measure more complicated concepts and bigger questions about models' general behavior, things like intelligence and reasoning," said Sanmi Koyejo, a professor of computer science and leader of the Trustworthy AI Research Lab at Stanford University.

"Lots of evaluations and benchmarks were built for a different time when researchers were measuring specific downstream tasks," Koyejo said, emphasizing the need for more resources and attention to create new, reliable and meaningful tests for AI systems.

The same questions about what aspects of AI systems should be measured and how to measure them apply to AI models being used for specific scientific domains. Ziv Bar-Joseph, a professor at Carnegie Mellon University, the founder of GenBio AI and an expert in advanced AI models for biology, said evaluations of biology-specific AI systems are also in their infancy. "It's extremely, extremely early stages for biology evaluations. Extremely early stages," Bar-Joseph told NBC News. "I think we are still working out what should be the way we evaluate things, let alone what we should even be studying for evaluation."

Despite incremental and halting progress in understanding how cutting-edge AI systems work and how to actually measure their progress, researchers nonetheless see rapid advancements in AI systems' ability to enhance scientific research itself. "People built bridges before Isaac Newton figured out physics," said Upadhyay, of Martian, pointing out that complete understanding of AI systems is not necessary to unleash significant real-world change.

For the fourth year in a row, researchers organized an offshoot of the main NeurIPS conference to focus on the latest AI methods to boost scientific discovery. One of the event's organizers, Ada Fang, a Ph.D. student studying the intersection of AI and chemistry at Harvard, said this year's NeurIPS edition was "a great success."

"Frontier research in AI for science is happening separately in areas of biology to materials, chemistry and physics, yet the underlying challenges and ideas are deeply shared," Fang told NBC News. "Our goal was to create a space where researchers could discuss not only the breakthroughs, but also the reach and limits of AI for science."
Jeff Clune, a pioneer in the use of AI for science and a computer science professor at the University of British Columbia in Vancouver, said in a panel discussion that the field is quickly accelerating.

"The amount of emails, contacts and people who are stopping me at NeurIPS and want to talk about creating AI that can go learn, discover and innovate for science, the interest level is through the roof," Clune said. "I have been absolutely blown away by the change.

"I'm looking out in the crowd here and seeing people who were there 10 years ago or 20 years ago, in the wilderness with me, when nobody cared about this issue. And now, it seems like the world cares," he added. "It's just heartwarming to see that AI works well enough, and now there's enough interest, to want to go tackle some of the most important and pressing problems for human well-being."
[2]
26,000 scientists still can't explain exactly how AI models think and how to measure them
Black box models challenge trust as benchmarks fail to reflect real intelligence

In a dizzying age of machine learning triumph, where systems can generate human-like prose, diagnose medical conditions, and synthesize novel proteins, the AI research community is facing an extraordinary paradox. Despite the exponential growth in capability, a core challenge remains stubbornly unsolved: no one, not even 26,000 scientists at the NeurIPS conference, can definitively explain how these powerful AI models think or agree on how to truly measure their intelligence.

This is the "black box" problem, magnified by the sheer scale of modern deep neural networks. A large language model (LLM), with its trillions of learned connections, operates in a computational space too vast for the human mind to trace. When it provides an answer, the underlying reasoning process is opaque, leaving researchers to marvel at the output without understanding the mechanism. The engineering is functional, but the science of why it works is missing.

New research into this mechanistic interpretability reveals a troubling disconnect. Studies show AI models often employ radically non-human, or even contradictory, strategies. For instance, an LLM might solve a math problem by simultaneously approximating the sum and then precisely calculating the last digit - a method that defies our school-taught logic. Crucially, the rationale it provides for its answer might, in fact, be a post-hoc rationalization - a convincing form of "bullshitting" driven by the need to satisfy the user, not a reflection of its actual computation.

This opaque reasoning makes measurement an almost impossible task. Traditional benchmarks, often based on simple question-and-answer formats, are increasingly unreliable, falling victim to data contamination or the models' ability to mimic reasoning rather than perform it. The field is now grappling with an "evaluation crisis," realizing that high scores on public leaderboards don't guarantee real-world capability or alignment with human values.

The ultimate risk is not simply confusion, but a profound barrier to trust. When an autonomous vehicle fails, or an AI-driven financial decision discriminates, our inability to peer into the black box prevents us from debugging the error, correcting the bias, or building the necessary safety guardrails. While Explainable AI (XAI) is a rapidly growing sub-field, its solutions remain stopgaps, providing local interpretations rather than a full, global understanding of the machine's mind. Until researchers can bridge this interpretability gap, the world's most powerful technology will continue to advance with a brilliant, yet deeply unsettling, mystery at its core.
At this year's NeurIPS conference, a record 26,000 AI researchers confronted an unsettling reality: no one fully understands how today's most advanced AI systems actually work. As companies like Google and OpenAI pursue diverging approaches to AI interpretability, the field grapples with the black box problem and an evaluation crisis that threatens trust in increasingly powerful systems.
The Neural Information Processing Systems, or NeurIPS, conference drew a record-breaking 26,000 attendees to San Diego this year, twice the number from just six years ago [1]. Yet amid sessions on hyperspecific topics and the celebration of AI's exponential growth, one of the most discussed issues was surprisingly basic: AI researchers openly admit they don't understand how frontier AI systems actually work. Most leading AI researchers and CEOs readily acknowledge this knowledge gap, highlighting a profound paradox in a field advancing at breakneck speed [1]. The pursuit of understanding how AI models think through their internal structure is called AI interpretability, a field still in its infancy despite the urgency of the challenge.
The black box problem has been magnified by the sheer scale of modern deep neural networks [2]. A large language model (LLM) with trillions of learned connections operates in a computational space too vast for the human mind to trace. When these systems provide answers, the underlying reasoning process remains opaque, leaving AI researchers to marvel at outputs without understanding the mechanism. Shriyash Upadhyay, an AI researcher and co-founder of the interpretability-focused company Martian, compared the current state to fundamental physics questions: "We're asking, 'What does it mean to have an interpretable AI system?'" [1]. Martian used the NeurIPS occasion to launch a $1 million prize to boost interpretability efforts, signaling both the importance and the difficulty of the challenge.
Understanding how AI systems work has become a strategic priority, but leading companies are pursuing dramatically different paths. Google's interpretability team announced a significant pivot away from attempts to understand every part of a model toward more practical methods focused on real-world impact [1]. Neel Nanda, one of Google's interpretability leaders, acknowledged that "grand goals like near-complete reverse-engineering still feel far out of reach" given the team's desire for work to pay off within approximately 10 years. In contrast, OpenAI's head of interpretability, Leo Gao, announced he was doubling down on a deeper, more ambitious form of interpretability "to fully understand how neural networks work" [1].
This divergence reflects broader uncertainty about whether complete understanding is even achievable.

Beyond interpretability challenges, the field faces an evaluation crisis in measuring AI intelligence reliably. Sanmi Koyejo, a professor of computer science and leader of the Trustworthy AI Research Lab at Stanford University, told NBC News: "We don't have the measurement tools to measure more complicated concepts and bigger questions about models' general behavior, things like intelligence and reasoning" [1]. Traditional AI benchmarks, built for a different era when researchers measured specific downstream tasks, increasingly fall victim to data contamination or the models' ability to mimic reasoning rather than perform it [2]. High scores on public leaderboards don't guarantee real-world capability or alignment with human values.
New research into mechanistic interpretability reveals that AI models often employ radically non-human or even contradictory strategies [2]. An LLM might solve a math problem by simultaneously approximating the sum and then precisely calculating the last digit, a method that defies conventional logic. The rationale it provides might be post-hoc rationalization driven by the need to satisfy the user, not a reflection of its actual computation. This opacity makes debugging AI errors and correcting AI biases extremely difficult. When an autonomous vehicle fails or an AI-driven financial decision discriminates, the inability to peer into the black box prevents researchers from building the necessary AI safety guardrails [2].
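As a rough intuition for the "approximate the magnitude, pin down the last digit" strategy described above, here is a toy Python sketch. It is purely illustrative, is not drawn from either source, and makes no claim about how any actual model computes; it only shows how two crude parallel pathways can be reconciled into one answer.

```python
# Toy illustration of two parallel "pathways": a rough magnitude estimate
# plus an exact last-digit computation, reconciled at the end.

def rough_magnitude(a: int, b: int) -> int:
    """Approximate a + b by rounding each operand to two significant figures."""
    def round_sig(x: int, figs: int = 2) -> int:
        if x == 0:
            return 0
        digits = len(str(abs(x)))
        factor = 10 ** max(digits - figs, 0)
        return round(x / factor) * factor
    return round_sig(a) + round_sig(b)

def exact_last_digit(a: int, b: int) -> int:
    """Compute only the final digit of a + b."""
    return (a + b) % 10

def combined_guess(a: int, b: int) -> int:
    """Keep the rough magnitude but overwrite its final digit with the exact one."""
    approx = rough_magnitude(a, b)
    return (approx // 10) * 10 + exact_last_digit(a, b)

if __name__ == "__main__":
    a, b = 36_487, 5_926
    print(a + b, combined_guess(a, b))  # exact sum: 42413; combined guess: 41903
    # The toy guess has the right final digit and roughly the right size,
    # illustrating the decomposition rather than reproducing a model's accuracy.
```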
Adam Gleave, an AI researcher and co-founder of the FAR.AI research nonprofit, expressed skepticism about fully understanding models' behavior: "I suspect deep-learning models don't have a simple explanation -- so it's simply not possible to fully reverse engineer a large-scale neural network in a way that is comprehensible to a person" [1]. Despite this barrier, Gleave remains hopeful that researchers will make meaningful progress in understanding how models behave on many levels, which would help create more reliable systems. While explainable AI (XAI) is a rapidly growing sub-field, its solutions remain stopgaps, providing local interpretations rather than full, global understanding [2]. Machine learning capabilities continue to advance, and AI safety and alignment concerns grow, as the world's most powerful technology moves forward with a brilliant yet deeply unsettling mystery at its core.

Summarized by Navi