Curated by THEOUTPOST
On Tue, 13 May, 4:02 PM UTC
3 Sources
[1]
Inner workings of AI an enigma - even to its creators
Even the greatest human minds building generative artificial intelligence that is poised to change the world admit they do not comprehend how digital minds think.

"People outside the field are often surprised and alarmed to learn that we do not understand how our own AI creations work," Anthropic co-founder Dario Amodei wrote in an essay posted online in April. "This lack of understanding is essentially unprecedented in the history of technology."

Unlike traditional software programs that follow pre-ordained paths of logic dictated by programmers, generative AI (gen AI) models are trained to find their own way to success once prompted. In a recent podcast Chris Olah, who was part of ChatGPT-maker OpenAI before joining Anthropic, described gen AI as "scaffolding" on which circuits grow.

Olah is considered an authority in so-called mechanistic interpretability, a method of reverse engineering AI models to figure out how they work. This science, born about a decade ago, seeks to determine exactly how AI gets from a query to an answer.

"Grasping the entirety of a large language model is an incredibly ambitious task," said Neel Nanda, a senior research scientist at the Google DeepMind AI lab. It was "somewhat analogous to trying to fully understand the human brain," Nanda told AFP, noting neuroscientists have yet to succeed on that front.

Delving into digital minds to understand their inner workings has gone from a little-known field just a few years ago to a hot area of academic study. "Students are very much attracted to it because they perceive the impact that it can have," said Boston University computer science professor Mark Crovella. The area of study is also gaining traction due to its potential to make gen AI even more powerful, and because peering into digital brains can be intellectually exciting, the professor added.

Keeping AI honest

Mechanistic interpretability involves studying not just the results served up by gen AI but scrutinizing the calculations performed while the technology mulls queries, according to Crovella. "You could look into the model ... observe the computations that are being performed and try to understand those," the professor explained.

Startup Goodfire uses AI software capable of representing data in the form of reasoning steps to better understand gen AI processing and correct errors. The tool is also intended to prevent gen AI models from being used maliciously or from deciding on their own to deceive humans about what they are up to. "It does feel like a race against time to get there before we implement extremely intelligent AI models into the world with no understanding of how they work," said Goodfire chief executive Eric Ho.

In his essay, Amodei said recent progress has made him optimistic that the key to fully deciphering AI will be found within two years. "I agree that by 2027, we could have interpretability that reliably detects model biases and harmful intentions," said Auburn University associate professor Anh Nguyen.

According to Boston University's Crovella, researchers can already access representations of every digital neuron in AI brains. "Unlike the human brain, we actually have the equivalent of every neuron instrumented inside these models," the academic said. "Everything that happens inside the model is fully known to us. It's a question of discovering the right way to interrogate that."

Harnessing the inner workings of gen AI minds could clear the way for its adoption in areas where tiny errors can have dramatic consequences, like national security, Amodei said. For Nanda, better understanding of what gen AI is doing could also catapult human discoveries, much as DeepMind's chess-playing AI, AlphaZero, revealed entirely new chess moves that none of the grandmasters had ever thought about.

Properly understood, a gen AI model with a stamp of reliability would grab competitive advantage in the market. Such a breakthrough by a US company would also be a win for the nation in its technology rivalry with China. "Powerful AI will shape humanity's destiny," Amodei wrote. "We deserve to understand our own creations before they radically transform our economy, our lives, and our future."
[2]
Inner workings of AI an enigma - even to its creators
[3]
Inner workings of AI an enigma - even to its creators
Leading AI experts admit they don't fully understand how generative AI works, despite rapid progress. Mechanistic interpretability aims to reverse-engineer AI models, improving their reliability and preventing misuse. Researchers hope to uncover AI's inner workings within two years, ensuring safer, more impactful AI for industries like national security.
Leading AI experts admit they don't fully understand how generative AI works, sparking a race to decipher these digital minds through mechanistic interpretability.
In a surprising revelation, even the most brilliant minds behind generative artificial intelligence (gen AI) admit they don't fully comprehend how their creations work. Dario Amodei, co-founder of Anthropic, stated, "People outside the field are often surprised and alarmed to learn that we do not understand how our own AI creations work" [1]. This lack of understanding is unprecedented in technological history, marking a significant shift in how we develop and interact with advanced AI systems.
Unlike traditional software that follows predetermined logical paths, gen AI models are trained to find their own solutions when prompted. Chris Olah, formerly of OpenAI and now with Anthropic, described gen AI as "scaffolding" on which circuits grow [1]. This unique characteristic sets gen AI apart from conventional programming paradigms and contributes to its enigmatic nature.
To address this knowledge gap, researchers are turning to a field known as mechanistic interpretability. This approach, which has gained traction in the past decade, aims to reverse-engineer AI models to understand their inner workings [2]. Mark Crovella, a computer science professor at Boston University, explains that this involves not just studying the results produced by gen AI but also scrutinizing the calculations performed during the process [3].
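To make that idea concrete, here is a minimal sketch of the kind of instrumentation Crovella describes: recording the intermediate computations a language model performs while it processes a prompt. The choice of GPT-2, the Hugging Face transformers library, and PyTorch forward hooks are illustrative assumptions; this is not Goodfire's or Anthropic's actual tooling, only a small example of capturing the internal activations that interpretability researchers would then try to explain.

```python
# Illustrative sketch only: instrument a small open model (GPT-2 here, as an
# assumed stand-in) and record what each transformer block computes while the
# model processes a prompt.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

activations = {}  # block name -> hidden states captured during the forward pass


def make_hook(name):
    def hook(module, inputs, output):
        # GPT-2 blocks return a tuple; the first element is the hidden state.
        hidden = output[0] if isinstance(output, tuple) else output
        activations[name] = hidden.detach()
    return hook


# The rough analogue of having "every neuron instrumented": attach a hook to
# every transformer block so its intermediate output is recorded.
for i, block in enumerate(model.transformer.h):
    block.register_forward_hook(make_hook(f"block_{i}"))

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    model(**inputs)

for name, hidden in activations.items():
    print(name, tuple(hidden.shape))  # e.g. block_0 (1, 5, 768)
```

Capturing these activations is only the starting point; the harder task the article describes is attributing the model's behavior to specific circuits within them.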
The urgency to understand gen AI is palpable within the AI community. Eric Ho, CEO of startup Goodfire, emphasizes the time-sensitive nature of this endeavor: "It does feel like a race against time to get there before we implement extremely intelligent AI models into the world with no understanding of how they work" [1]. This sentiment is echoed by many in the field who recognize the potential risks of deploying powerful AI systems without fully grasping their decision-making processes.
Despite the challenges, there's optimism in the AI community. Dario Amodei believes that the key to fully deciphering AI could be found within two years [1]. Anh Nguyen, an associate professor at Auburn University, agrees, stating, "By 2027, we could have interpretability that reliably detects model biases and harmful intentions" [3].
Understanding the inner workings of gen AI could pave the way for its adoption in critical areas such as national security, where even small errors can have significant consequences [1]. Moreover, as Neel Nanda from Google DeepMind points out, better comprehension of AI's processes could lead to groundbreaking human discoveries, similar to how DeepMind's AlphaZero revealed novel chess moves [2].
The quest to understand gen AI has implications beyond scientific curiosity. A breakthrough in this field by a US company could provide a competitive edge in the global AI market and strengthen the nation's position in its technological rivalry with China [3]. As Amodei concludes, "Powerful AI will shape humanity's destiny. We deserve to understand our own creations before they radically transform our economy, our lives, and our future" [1].