17 Sources
[1]
Turing Award honors AI's reinforcement learning duo
Why it matters: Reinforcement learning, as the technique is known, posits that computers can learn from their own experiences, using a system of rewards similar to how researchers have trained animals. In a joint interview, Barto and Sutton said the award is extremely rewarding, especially given that for much of their career, the technology they pursued was out of vogue. Catch up quick: Sutton, now a computer science professor at Canada's University of Alberta, was Barto's student at the University of Massachusetts in the late 1970s. What they're saying: Google's Jeff Dean said reinforcement learning has been central to the advancement of modern AI. What's next: Both Sutton and Barto believe that current fears about AI are overblown, though they acknowledge that highly intelligent systems could cause significant upheaval as society adjusts.
[2]
Pioneers behind reinforcement learning win Turing Award
OpenAI's ChatGPT employs a technique called reinforcement learning from human feedback, a practical application of the awardees' work. Andrew Barto and Richard Sutton have received one of the highest honours in computing for developing the foundations of reinforcement learning (RL) - one of the key pieces of research behind the artificial intelligence (AI) we see today. The recipients of the 2024 Association for Computing Machinery (ACM) A M Turing Award are credited with introducing the main ideas, constructing the mathematical foundations and developing important algorithms that led to the creation of "one of the most important approaches for creating intelligent systems". Barto is professor emeritus at the Department of Information and Computer Sciences at the University of Massachusetts, Amherst, while Sutton is a professor of computer science at the University of Alberta, the chief scientific advisor at the Alberta Machine Intelligence Institute and a research scientist at Keen Technologies, an AI company. The two began collaborating in 1978 at the University of Massachusetts at Amherst where Barto was Sutton's PhD and postdoctoral advisor. In the early 1980s, Barto and Sutton drew on mathematical foundations provided by Markov decision processes (MDPs), whereby an agent - a computational entity that can perceive and act - makes decisions in a random environment, receiving a reward signal after each transition with the aim of maximising its long-term rewards. Whereas standard MDP theory assumes that everything about the MDP is known to the agent, the RL framework allows for the environment and the rewards to be unknown. The minimal information requirements of RL, combined with the generality of the MDP framework, allow RL algorithms to be applied to a vast range of problems.
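The agent-environment loop described above can be sketched in a few lines. The example below is a minimal illustration, not anything taken from the article: it uses tabular Q-learning (a standard RL algorithm chosen here for illustration; the article does not name a specific one) on a hypothetical five-state chain. The environment function `step`, the slip probability, the reward of 1 for reaching the last state, and all hyperparameters are assumptions. The agent never reads the transition rules or the reward function directly; it only observes the outcome of each action.

```python
import random

N_STATES = 5        # hypothetical chain: states 0..4, state 4 is terminal
ACTIONS = (0, 1)    # 0 = move left, 1 = move right
SLIP = 0.1          # assumed chance that the chosen action is reversed


def step(state, action, rng):
    """Environment dynamics. The agent never inspects this function."""
    if rng.random() < SLIP:
        action = 1 - action
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def greedy(q_row, rng):
    """Pick an action with the highest estimate, breaking ties at random."""
    best = max(q_row)
    return rng.choice([a for a in ACTIONS if q_row[a] == best])


def q_learning(episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # one row of estimates per state
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: mostly exploit current estimates, sometimes explore.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy(q[state], rng)
            next_state, reward, done = step(state, action, rng)
            # Move the estimate toward reward plus discounted future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = q_learning()
# Greedy action in each non-terminal state after learning.
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(policy)
```

Because the only reward lies at the right end of the chain, the learned greedy policy should favour moving right in every non-terminal state, even though the agent was never told where the reward is.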
Later, the two, along with others, developed many of the basic algorithmic approaches for RL, leading to their textbook Reinforcement Learning: An Introduction in 1998, which is still a standard reference in the field, having been cited more than 75,000 times. However, successful practical applications for RL came decades later, and include the development of OpenAI's ChatGPT, which employs a technique called reinforcement learning from human feedback to capture human expectations in its responses. Moreover, RL is also widely applied in various sectors, including chip design, internet advertising and global supply chain optimisation. "Barto and Sutton's work demonstrates the immense potential of applying a multidisciplinary approach to longstanding challenges in our field," said Yannis Ioannidis, the president of ACM. "Research areas ranging from cognitive science and psychology to neuroscience inspired the development of reinforcement learning, which has laid the foundations for some of the most important advances in AI and has given us greater insight into how the brain works. "Barto and Sutton's work is not a stepping stone that we have now moved on from. Reinforcement learning continues to grow and offers great potential for further advances in computing and many other disciplines." Meanwhile, Jeff Dean, senior VP at Google, said that the awardees' work has been a "lynchpin of progress in AI over the last several decades". The company financially supported the $1m cash prize that the awardees received today (5 March). "In a 1947 lecture, Alan Turing stated 'What we want is a machine that can learn from experience'. Reinforcement learning, as pioneered by Barto and Sutton, directly answers Turing's challenge," Dean said. "The tools they developed remain a central pillar of the AI boom and have rendered major advances, attracted legions of young researchers and driven billions of dollars in investments. 
RL's impact will continue well into the future." The Turing Award, often referred to as the 'Nobel Prize in Computing,' is named after Alan M Turing, the British mathematician who articulated the mathematical foundations of computing. Last year, theoretical computer scientist Avi Wigderson won the prestigious award for reshaping our understanding of the role of randomness in computation. Previous winners include AI leader Geoffrey Hinton, who also won last year's Nobel Prize in Physics, Lisp programming inventor John McCarthy and software design pioneer Niklaus Wirth.
[3]
Latest Turing Award winners again warn of AI dangers
University of Massachusetts researcher Andrew Barto and former DeepMind research scientist Richard Sutton warned that AI companies are not thoroughly testing products before releasing them, likening the development to "building a bridge and testing it by having people use it," according to The Financial Times. The Turing Award, often referred to as the "Nobel Prize of Computing," carries a $1 million prize and was jointly awarded to Barto and Sutton for developing "reinforcement learning" -- a machine learning method that trains AI systems to make optimized decisions through trial and error. Google's senior vice president Jeff Dean describes the technique as "a lynchpin of progress in AI" and says it has remained "a central pillar of the AI boom" that led to breakthrough models like OpenAI's ChatGPT and Google's AlphaGo before that.
[4]
AI pioneers who channeled 'hedonistic' machines win computer science's top prize
Teaching machines in the way that animal trainers mold the behavior of dogs or horses has been an important method for developing artificial intelligence and one that was recognized Wednesday with the top computer science award. Two pioneers in the field of reinforcement learning, Andrew Barto and Richard Sutton, are the winners of this year's A.M. Turing Award, the tech world's equivalent of the Nobel Prize. Research that Barto, 76, and Sutton, 67, began in the late 1970s paved the way for some of the past decade's AI breakthroughs. At the heart of their work was channeling so-called "hedonistic" machines that could continuously adapt their behavior in response to positive signals. Reinforcement learning is what led a Google computer program to beat the world's best human players of the ancient Chinese board game Go in 2016 and 2017. It's also been a key technique in improving popular AI tools like ChatGPT, optimizing financial trading and helping a robotic hand solve a Rubik's Cube. But Barto said the field was "not fashionable" when he and his doctoral student, Sutton, began crafting their theories and algorithms at the University of Massachusetts, Amherst. "We were kind of in the wilderness," Barto said in an interview with The Associated Press. "Which is why it's so gratifying to receive this award, to see this becoming more recognized as something relevant and interesting. In the early days, it was not." Google sponsors the annual $1 million prize, which was announced Wednesday by the Association for Computing Machinery. 
Barto, now retired from the University of Massachusetts, and Sutton, a longtime professor at Canada's University of Alberta, aren't the first AI pioneers to win the award named after British mathematician, codebreaker and early AI thinker Alan Turing. But their research has directly sought to answer Turing's 1947 call for a machine that "can learn from experience" -- which Sutton describes as "arguably the essential idea of reinforcement learning." In particular, they borrowed from ideas in psychology and neuroscience about the way that pleasure-seeking neurons respond to rewards or punishment. In one landmark paper published in the early 1980s, Barto and Sutton set their new approach on a specific task in a simulated world: balance a pole on a moving cart to keep it from falling. The two computer scientists later co-authored a widely used textbook on reinforcement learning. "The tools they developed remain a central pillar of the AI boom and have rendered major advances, attracted legions of young researchers, and driven billions of dollars in investments," said Google's chief scientist Jeff Dean in a written statement. In a joint interview with the AP, Barto and Sutton didn't always agree on how to evaluate the risks of AI agents that are constantly seeking to improve themselves. They also distinguished their work from the branch of generative AI technology that is currently in fashion -- the large language models behind chatbots made by OpenAI, Google and other tech giants that mimic human writing and other media. "The big choice is, do you try to learn from people's data, or do you try to learn from an (AI) agent's own life and its own experience?" Sutton said. Sutton has dismissed what he describes as overblown concerns about AI's threat to humanity, while Barto disagreed and said "You have to be cognizant of potential unexpected consequences." 
Barto, retired for 14 years, describes himself as a Luddite, while Sutton is embracing a future he expects to have beings of greater intelligence than current humans -- an idea sometimes known as posthumanism. "People are machines. They're amazing, wonderful machines," but they are also not the "end product" and could work better, Sutton said. "It's intrinsically a part of the AI enterprise," Sutton said. "We're trying to understand ourselves and, of course, to make things that can work even better. Maybe to become such things."
[5]
AI pioneers who channeled 'hedonistic' machines win computer science's top prize
[6]
AI pioneers win the Turing Award, tech's top prize
[7]
Technique Behind ChatGPT's AI Wins Computing's Top Prize -- But Its Creators Are Worried - Decrypt
Andrew Barto and Richard Sutton, who received computing's highest honor this week for their foundational work on reinforcement learning, didn't waste any time using their new platform to sound alarms about unsafe AI development practices in the industry. The pair were announced on Wednesday as recipients of the 2024 ACM A.M. Turing Award, which is often dubbed the "Nobel Prize of Computing" and is accompanied by a $1 million prize funded by Google. Rather than simply celebrating their achievement, they immediately criticized what they see as dangerously rushed deployment of AI technologies. "Releasing software to millions of people without safeguards is not good engineering practice," Barto told The Financial Times. "Engineering practice has evolved to try to mitigate the negative consequences of technology, and I don't see that being practiced by the companies that are developing." Their assessment likened current AI development practices to "building a bridge and testing it by having people use it" without proper safety checks in place, as AI companies seek to prioritize business incentives over responsible innovation. The duo's journey began in the late 1970s when Sutton was Barto's student at the University of Massachusetts. Throughout the 1980s, they developed reinforcement learning -- a technique where AI systems learn through trial and error by receiving rewards or penalties -- when few believed in the approach. Their work culminated in their seminal 1998 textbook "Reinforcement Learning: An Introduction," which has been cited almost 80,000 times and became the bible for a generation of AI researchers. "Barto and Sutton's work demonstrates the immense potential of applying a multidisciplinary approach to longstanding challenges in our field," ACM President Yannis Ioannidis said in an announcement. "Reinforcement learning continues to grow and offers great potential for further advances in computing and many other disciplines." 
The $1 million Turing Award comes as reinforcement learning continues to drive innovation across robotics, chip design, and large language models, with reinforcement learning from human feedback (RLHF) becoming a critical training method for systems like ChatGPT. Still, the pair's warnings echo growing concerns from other big names in the field of computer science. Yoshua Bengio, himself a Turing Award recipient, publicly supported their stance on Bluesky. "Congratulations to Rich Sutton and Andrew Barto on receiving the Turing Award in recognition of their significant contributions to ML," he said. "I also stand with them: Releasing models to the public without the right technical and societal safeguards is irresponsible." Their position aligns with criticisms from Geoffrey Hinton, another Turing Award winner -- known as the godfather of AI -- as well as a 2023 statement from top AI researchers and executives -- including OpenAI CEO Sam Altman -- that called for mitigating extinction risks from AI as a global priority. Former OpenAI researchers have raised similar concerns. Jan Leike, who recently resigned as head of OpenAI's alignment initiatives and joined rival AI company Anthropic, pointed to an inadequate safety focus, writing that "building smarter-than-human machines is an inherently dangerous endeavor." "Over the past years, safety culture and processes have taken a backseat to shiny products," Leike said. Leopold Aschenbrenner, another former OpenAI safety researcher, called security practices at the company "egregiously insufficient." At the same time, Paul Christiano, who also previously led OpenAI's language model alignment team, suggested there might be a "10-20% chance of AI takeover, [with] many [or] most humans dead." Despite their warnings, Barto and Sutton maintain a cautiously optimistic outlook on AI's potential. 
In an interview with Axios, both suggested that current fears about AI might be overblown, though they acknowledge significant social upheaval is possible. "I think there's a lot of opportunity for these systems to improve many aspects of our life and society, assuming sufficient caution is taken," Barto told Axios. Sutton sees artificial general intelligence as a watershed moment, framing it as an opportunity to introduce new "minds" into the world without them developing through biological evolution -- essentially opening the gates for humanity to interact with sentient machines in the future.
[8]
Andrew Barto and Richard Sutton win Turing award for AI training trick
The Turing award, often considered the Nobel prize of computing, has gone to two computer scientists for their work on reinforcement learning, a key technique in training artificial intelligence models. Andrew Barto and Richard Sutton have won the 2024 Turing award, which is often called the Nobel prize of computing, for their fundamental work on ideas in machine learning that later proved crucial to the success of artificial intelligence models such as Google DeepMind's AlphaGo. Barto, who is now retired and lives in Cape Cod, Massachusetts, didn't even realise he was nominated for the award. "I joined a Zoom with some people and was told and I was...
[9]
Pioneers of Reinforcement Learning Win the Turing Award
In the 1980s, Andrew Barto and Rich Sutton were considered eccentric devotees to an elegant but ultimately doomed idea -- having machines learn, as humans and animals do, from experience. Decades on, with the technique they pioneered now increasingly critical to modern artificial intelligence and programs like ChatGPT, Barto and Sutton have been awarded the Turing Award, the highest honor in the field of computer science. Barto, a professor emeritus at the University of Massachusetts Amherst, and Sutton, a professor at the University of Alberta, trailblazed a technique known as reinforcement learning, which involves coaxing a computer to perform tasks through experimentation combined with either positive or negative feedback. "When this work started for me, it was extremely unfashionable," Barto recalls with a smile, speaking over Zoom from his home in Massachusetts. "It's been remarkable that [it has] achieved some influence and some attention," Barto adds. Reinforcement learning was perhaps most famously used by Google DeepMind in 2016 to build AlphaGo, a program that learned for itself how to play the incredibly complex and subtle board game of Go to an expert level. This demonstration sparked new interest in the technique, which has gone on to be used in advertising, optimizing data-center energy use, finance, and chip design. The approach also has a long history in robotics, where it can help machines learn to perform physical tasks through trial and error. More recently, reinforcement learning has been crucial to guiding the output of large language models (LLMs) and producing extraordinarily capable chatbot programs. The same method is also being used to train AI models to mimic human reasoning, and to build more capable AI agents. Sutton notes, however, that the methods used to guide LLMs involve humans providing goals rather than an algorithm learning purely through its own exploration. 
He says having machines learn entirely on their own may ultimately be more fruitful. "The big division is whether [AI is] learning from people or whether it's learning from its own experience," he says. Barto and Sutton's "work has been a lynchpin of progress in AI over the last several decades," Jeff Dean, a senior vice president at Google, said in a statement released by the Association for Computing Machinery (ACM), which hands out the Turing Award. "The tools they developed remain a central pillar of the AI boom and have rendered major advances." Reinforcement learning has a long and checkered history within AI. It was there at the dawn of the field, when Alan Turing suggested that machines could learn through experience and feedback in his famous 1950 paper "Computing Machinery and Intelligence," which examines the notion that a machine might someday think like a human. Arthur Samuel, an AI pioneer, used reinforcement learning to build one of the first machine learning programs, a system capable of playing checkers, in 1955.
[10]
Turing Award winners warn over unsafe deployment of AI models
By Cristina Criddle in San Francisco and Melissa Heikkilä in London. Two pioneers of reinforcement learning, a scientific technique that has been fundamental to the artificial intelligence boom, have warned against the unsafe deployment of AI models after winning this year's Turing Award. Andrew Barto, a professor emeritus at the University of Massachusetts, and Richard Sutton, a professor at the University of Alberta and former research scientist at DeepMind, have won the $1mn prize from the Association for Computing Machinery for developing the groundbreaking method. Barto and Sutton developed reinforcement learning in the 1980s after they were inspired by psychology and how people learn. The machine learning technique, which rewards AI systems for behaving in a desired way, has helped power the success of some of the world's top AI groups, such as OpenAI and Google. The winners of the award, which is often dubbed the Nobel Prize of computing, said they were concerned about AI companies rushing to launch products before thoroughly testing them. "Releasing software to millions of people without safeguards is not good engineering practice," said Barto, likening it to building a bridge and testing it by having people use it. "Engineering practice has evolved to try to mitigate the negative consequences of technology, and I don't see that being practised by the companies that are developing," he added. The award, which is named after British mathematician Alan Turing, comes after AI breakthroughs were also recognised in both the chemistry and physics Nobel Prizes in October. This highlighted the importance of computing tools and data science in cracking complex scientific problems at far shorter timescales. "The tools [Barto and Sutton] developed remain a central pillar of the AI boom and have rendered major advances, attracted legions of young researchers, and driven billions of dollars in investments. 
[Reinforcement learning's] impact will continue well into the future," said Jeff Dean, senior vice-president at Google, which sponsored the prize. Google DeepMind used the technique to develop AlphaGo, an AI system that beat human players in the game Go, a major milestone in AI research. OpenAI also used a type of reinforcement learning that relies on human feedback to control ChatGPT's output. But both Barto and Sutton warned against the current pace of AI development, where firms are racing to launch models that are powerful but prone to making errors, raising unprecedented amounts of funding and investing billions in infrastructure like data centres to train and run AI. Big Tech groups have said AI spending could exceed $320bn this year, while OpenAI, which launched ChatGPT in 2022, is currently raising $40bn in new funding at a $260bn valuation. Barto criticised the AI sector for being motivated by business incentives, instead of furthering AI research. "The idea of having huge data centres and then charging a certain amount to use the software is motivating things, and that is not the motive that I would subscribe to," he added. OpenAI has argued it needs to unlock further investment through a more traditional corporate structure in order to achieve the company's founding 'mission' of ensuring that artificial general intelligence (AGI) -- a scenario where computer systems achieve similar or superior levels of intelligence to humans -- benefits humanity. But Sutton dismissed tech companies' narrative around AGI as "hype". "AGI is a weird term because there's always been AI and people trying to understand intelligence." He added that "systems that are more intelligent than people" will happen eventually through a better understanding of the human mind. Barto and Sutton also criticised US President Donald Trump's attempt to slash federal spending on scientific research and lay off staff at US science agencies. 
This could have devastating consequences for US dominance in science, said Barto, who called it "wrong and a tragedy not only to this country but to the world". He added that the opportunities to do the kind of research that enabled their work in reinforcement learning would "disappear" without the freedom to explore abstract, unproven concepts. Despite their concerns, both scientists are optimistic about the potential for reinforcement learning, combined with AI, to bring positive outcomes to the world. "We have the potential to become less greedy and selfish and more aware of what's going on in others . . . there are many things wrong in the world, but too much intelligence is not one of them," said Sutton.
[11]
AI pioneers who channeled 'hedonistic' machines win computer science's top prize
[12]
AI pioneers scoop Turing Award for reinforcement learning work | TechCrunch
Two trailblazing computer scientists have won the 2024 Turing Award for their work in reinforcement learning, a discipline in which machines learn through a reward-based trial-and-error approach that lets them adapt within constrained or dynamic environments. Andrew G. Barto, a professor emeritus at the University of Massachusetts Amherst; and Richard S. Sutton, a professor at the University of Alberta, developed key algorithms and theories through a seminal series of papers starting in the 1980s. This includes work on a reinforcement technique called temporal difference learning; the duo later published an academic textbook called Reinforcement Learning: An Introduction. Esteemed mathematician Alan Turing, after whom the Turing Award is named, also produced a paper in the 1950s called Computing Machinery and Intelligence that questioned whether computers can think and touched on similar concepts around learning from experience. In more recent years, reinforcement learning has received more attention after Google DeepMind used the technique to build AlphaGo, an AI that defeated the world's best Go players. And in the past few months, Chinese AI upstart DeepSeek hit the headlines for its game-changing R1 reasoning model, which leaned heavily on reinforcement learning to create more cost-effective foundation models. The Turing Award, administered by the Association for Computing Machinery (ACM), has often been dubbed the "Nobel Prize for computing." However, the Nobel Prize itself has been encroaching into the computing realm, particularly around AI; Geoff Hinton and John Hopfield won the Nobel Prize in Physics for their work in foundational AI last year. This was followed shortly after by DeepMind's Demis Hassabis and John Jumper who were awarded the Nobel Prize in Chemistry for their work on AlphaFold. 
"Research areas ranging from cognitive science and psychology to neuroscience inspired the development of reinforcement learning, which has laid the foundations for some of the most important advances in AI and has given us greater insight into how the brain works," ACM president Yannis Ioannidis said in a press release. "Barto and Sutton's work is not a stepping stone that we have now moved on from. Reinforcement learning continues to grow and offers great potential for further advances in computing and many other disciplines. It is fitting that we are honoring them with the most prestigious award in our field." Other notable AI pioneers to win the Turing Award include Meta's chief AI scientist Yann LeCun, who was awarded the prize in 2018 alongside Geoff Hinton and Yoshua Bengio for their work on deep neural networks. Barto and Sutton will share the $1 million cash prize, which was provided with support from Google.
[13]
Andrew Barto and Richard Sutton Win the Turing Award for Reinforcement Learning
Their 1998 textbook Reinforcement Learning: An Introduction remains a standard reference, cited over 75,000 times. ACM, the Association for Computing Machinery, has awarded the 2024 ACM A.M. Turing Award to Andrew G. Barto and Richard S. Sutton for their contributions to reinforcement learning. Their work laid the conceptual and algorithmic foundations of the field, influencing modern artificial intelligence. Barto, Professor Emeritus at the University of Massachusetts Amherst, and Sutton, Professor at the University of Alberta and Research Scientist at Keen Technologies, have been recognised for research spanning decades. The Turing Award, often called the "Nobel Prize in Computing," includes a $1 million prize funded by Google. "Barto and Sutton's work demonstrates the immense potential of applying a multidisciplinary approach to longstanding challenges in our field," said ACM President Yannis Ioannidis. "Their contributions continue to shape AI and provide insight into how the brain works." Reinforcement learning (RL) focuses on training intelligent systems through reward-based mechanisms. The approach, inspired by psychology and neuroscience, builds on Markov decision processes, where agents learn optimal strategies through trial and error. In the 1980s, Barto and Sutton formalised RL as a general problem framework and introduced key algorithms, including temporal difference learning and policy-gradient methods. Their 1998 textbook Reinforcement Learning: An Introduction remains a standard reference, cited over 75,000 times. Their ideas influenced the integration of RL with deep learning, leading to advancements such as AlphaGo's victories over human Go players and reinforcement learning from human feedback (RLHF) used in ChatGPT. "In a 1947 lecture, Alan Turing stated, 'What we want is a machine that can learn from experience,'" said Jeff Dean, senior vice president at Google. 
"Reinforcement learning, as pioneered by Barto and Sutton, directly answers Turing's challenge." RL has been applied in areas such as robotics, network congestion control, chip design, and global supply chain optimisation. Research also suggests RL models align with findings on dopamine system functions in neuroscience. Barto and Sutton's contributions continue to impact AI, with applications expanding across industries. Their recognition with the Turing Award highlights the lasting significance of reinforcement learning in computing and beyond. This recognition follows a growing acknowledgment of AI's role in advancing scientific discovery. Last year, Demis Hassabis, CEO and co-founder of Google DeepMind, and John M. Jumper were awarded the Nobel Prize in Chemistry for their contributions to protein structure prediction through the AI system AlphaFold, alongside David Baker, a professor at the University of Washington. Geoffrey Hinton, known as the 'Godfather of AI,' was also awarded the Nobel Prize in Physics alongside John Hopfield for developing the Boltzmann machine, a neural network model inspired by statistical physics.
[14]
AI scholars win Turing Prize for technique that made possible AlphaGo's chess triumph
Some of the flashiest achievements in artificial intelligence in the past decade have come from a technique by which the computer acts randomly from a set of choices and is rewarded or punished for each correct or wrong move. It's the technique most famously employed in AlphaZero, the Google DeepMind program that achieved mastery at the games of chess, shogi, and Go. The same approach helped the AlphaStar program achieve "grandmaster" play in the video game Starcraft II. On Wednesday, two AI scholars were rewarded for advancing so-called reinforcement learning, a very broad approach to how a computer proceeds in an unknown environment. Andrew G. Barto, professor emeritus in the Department of Information and Computer Sciences at the University of Massachusetts, Amherst, and Richard S. Sutton, professor of computer science at the University of Alberta, Canada, were jointly awarded the 2024 Turing Award by the Association for Computing Machinery. The ACM award states that "Barto and Sutton introduced the main ideas, constructed the mathematical foundations, and developed important algorithms for reinforcement learning -- one of the most important approaches for creating intelligent systems." The ACM honor comes with a $1 million prize and is widely viewed as the computer industry's equivalent of a Nobel Prize. Reinforcement learning can be thought of by analogy with a mouse in a maze: the mouse must find its way through an unknown environment to an ultimate reward, the cheese. To do so, the mouse must learn which moves seem to lead to progress and which lead to dead ends. 
Neuroscientists and others have hypothesized that intelligent entities such as mice have an "internal model of the world," which lets them retain lessons from exploring the mazes and other challenges, and formulate plans. Sutton and Barto hypothesized that a computer could be similarly made to formulate an internal model of the state of its world. Reinforcement learning programs absorb information about the environment, be it a maze or a chess board, as their input. The program acts somewhat randomly at first, trying out different moves in that environment. The moves meet with either rewards or a lack of rewards. That feedback, positive and negative, starts to form a calculation by the program, an estimation of what rewards can be obtained by making different moves. Based on that estimation, the program formulates a "policy" to guide future actions to success. At a high level, such programs must balance the tactics of exploring new choices of action, on the one hand, and exploiting known good choices on the other, for neither alone will lead to success. Those wanting to dig deeper can get a copy of the textbook that Sutton and Barto wrote on the topic, whose second edition appeared in 2018. Reinforcement learning in the sense that Sutton and Barto use it is not the same as reinforcement learning referenced by OpenAI and other purveyors of large language model AI. OpenAI and others use "reinforcement learning from human feedback," RLHF, to shape the output of GPT and other large language models to be inoffensive and helpful. But that is a different AI technique; only the name has been borrowed. Sutton, who was also a Distinguished Research Scientist at DeepMind from 2017 to 2023, has emphasized in recent years that reinforcement learning is a theory of thought. During a 2020 symposium on AI, Sutton bemoaned that "there is very little computational theory" in AI today. 
"Reinforcement learning is the first computational theory of intelligence," declared Sutton. "AI needs an agreed-upon computational theory of intelligence," he added, and "RL is the stand-out candidate for that." Reinforcement learning may also have implications for how creativity and free play can happen as an expression of intelligence, including in artificial intelligence. Barto and Sutton have emphasized the importance of play in learning. During the 2020 symposium, Sutton remarked that in reinforcement learning, curiosity has a "low-level role," to drive exploration. "In recent years, people have begun to look at a larger role for what we are referring to, which I like to refer to as 'play'," said Sutton. "We set goals that are not necessarily useful, but may be useful later. I set a task and say, Hey, what am I able to do. What affordances." Sutton said play might be among the "big things" people do. "Play is a big thing," he said.
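The loop described in source [14] (act somewhat randomly at first, observe rewards, keep a running estimate of what each move is worth, and balance exploring against exploiting) can be illustrated with a toy example. The sketch below is not Barto and Sutton's code; it is a minimal epsilon-greedy agent on a made-up three-option "bandit" problem, and the function name `run_bandit`, its parameters, and the reward values are all assumptions chosen for illustration.

```python
import random


def run_bandit(true_means, steps=5000, epsilon=0.1, noise=1.0, seed=0):
    """Epsilon-greedy agent on a hypothetical multi-armed bandit.

    true_means: average reward of each action (hidden from the agent).
    epsilon:    probability of exploring a random action on each step.
    noise:      standard deviation of the random part of each reward.
    """
    rng = random.Random(seed)
    n_actions = len(true_means)
    estimates = [0.0] * n_actions  # the agent's learned value of each action
    counts = [0] * n_actions       # how often each action has been tried
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try an action at random.
            action = rng.randrange(n_actions)
        else:
            # Exploit: pick the action with the highest current estimate.
            action = max(range(n_actions), key=lambda a: estimates[a])

        # The environment responds with a noisy reward.
        reward = rng.gauss(true_means[action], noise)

        # Incremental average: nudge the estimate toward the observed reward.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]
        total_reward += reward

    return estimates, counts, total_reward


if __name__ == "__main__":
    estimates, counts, total = run_bandit([0.2, 0.5, 0.9])
    print("estimated values:", [round(e, 2) for e in estimates])
    print("times each action was chosen:", counts)
    print("average reward per step:", round(total / sum(counts), 3))
```

Setting `epsilon` to 0 makes the agent purely greedy, so it can lock onto the first action that pays anything; setting it to 1 makes it purely random. That is the trade-off the excerpt refers to when it says neither exploring nor exploiting alone will lead to success.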
[15]
A.I. Pioneers Andrew Barto and Richard Sutton Win the Turing Award
The annual award is often referred to as the "Nobel Prize of Computing." This year's Turing Award, an honor often dubbed the "Nobel Prize of Computing," goes to two A.I. researchers who laid the foundations for tech breakthroughs like OpenAI's GPT. Andrew Barto, a researcher at the University of Massachusetts Amherst, and Richard Sutton, a professor at Canada's University of Alberta, will share the $1 million prize, as announced today (March 5). Named for the British mathematician Alan Turing, the award bestows a cash prize with financial support from Google and is given annually by the Association for Computing Machinery (ACM). Past winners included A.I. researchers like Geoffrey Hinton, Yoshua Bengio and Yann LeCun, who received the 2018 Turing Award for their work in artificial neural networks. Barto and Sutton won this year's award for their contributions to the field of reinforcement learning, a process used to improve the behavior of machines. "Their work has been a lynchpin of progress in A.I. over the last several decades," said Jeff Dean, Google's chief scientist, in a statement. "The tools they developed remain a central pillar of the A.I. boom and have rendered major advances, attracted legions of young researchers, and driven billions of dollars in investments." The duo began working together in 1978 at UMass Amherst, where Barto served as Sutton's Ph.D. and postdoctoral advisor. 
In the following years, they collaborated on numerous papers that shaped key algorithms and techniques of reinforcement learning and published the 1998 textbook Reinforcement Learning: An Introduction, which has been cited more than 75,000 times and remains the field's standard reference. While their work took place decades ago, it remains more relevant than ever, noted Yannis Ioannidis, president of ACM. "Barto and Sutton's work is not a stepping stone that we have now moved on from," he said in a statement. Their research paved the way for reinforcement learning's application across major A.I. milestones like AlphaGo, the DeepMind system that triumphed over human Go players in 2016 and 2017; and ChatGPT, the groundbreaking technology released by OpenAI in 2022. Is A.I. moving too fast? Despite their longstanding involvement in A.I.'s emergence, this year's Turing recipients are cautious about the emerging technology's rapid growth. Companies should prioritize safety and testing over commercial pressures, according to Barto, who told the Financial Times that "releasing software to millions of people without safeguards is not good engineering practice." Barto is currently a professor emeritus of information and computer sciences at UMass Amherst. Sutton teaches computer science at the University of Alberta, serves as chief scientific advisor of the Alberta Machine Intelligence Institute and is a research scientist at Keen Technologies, a Dallas-based A.I. company. Despite warning against A.I.'s fast pace, Sutton has taken a different safety approach to fellow researchers Hinton and Bengio, who have been vocal about the technology's existential threats. While the researcher is troubled by A.I.'s potential military applications and ability to spread misinformation through errors or hallucinations, he's also concerned about a backlash in highlighting its risks. 
"Doomers are out of line and the concerns are overblown," said Sutton in an interview with BetaKit, adding that he is more worried that A.I. will be unfairly blamed for global issues and cause the field to become "demonized inappropriately."
[16]
Reinforcement learning pioneers harshly criticize the "unsafe" state of AI development
Who are they? Richard Sutton and Andrew Barto are pioneers of reinforcement learning, a machine learning technique modern AI models utilize. Sutton is often referred to as the "father of reinforcement learning" and serves as a professor at the University of Alberta. Barto is a professor emeritus at the University of Massachusetts. Neither scientist is particularly pleased with how AI companies are applying their life's work. Richard Sutton and Andrew Barto won this year's Turing Award, considered the Nobel Prize for computing, for their significant contributions to machine learning development. The two researchers are now speaking out against OpenAI, Google, and other AI companies releasing potentially dangerous software to end customers. They criticized ChatGPT as just a money-making machine that will never produce a working artificial general intelligence (AGI). Sutton and Barto developed reinforcement learning (RL) during the 1980s, inspired by behaviorist psychology. Reinforcement learning is one of the three basic machine learning paradigms, along with supervised and unsupervised learning. Reinforcement learning teaches AI agents, through trial and error, to make decisions that achieve the best results, similar to how humans learn. OpenAI, Google, and other corporations build their AI platforms with RL. Financial Times notes that Barto believes that bringing this kind of AI software to millions of people without safeguards is inherently wrong. Using a metaphor, Sutton and Barto pointed out that most or all AI companies are building a bridge and testing its structural integrity by opening it to the public. Barto says that sound engineering practices suggest that developers try to mitigate the negative consequences of technology. Neither OpenAI nor any other AI-focused company is doing that. 
Current AI models make errors, hallucinating non-existing "facts" with binary confidence, but the companies behind them are collecting billions of dollars in unprecedented funding campaigns. "The idea of having huge data centers and then charging a certain amount to use the software is motivating things, and that is not the motive that I would subscribe to," Barto said. For-profit companies only seek money-making opportunities. One of them eventually bringing the first AGI into the world would just be bragging rights; even those are leveraged to boost sales. Proponents of AGI think that this kind of superhuman, all-digital intelligence is almost here and will radically revolutionize technology and everything else. Sutton suggested that AGI is just a buzzword for marketing campaigns. Barto remarked that companies developing AI need to gain a better understanding of how the human mind works before they can responsibly build systems with human-level intelligence.
[17]
Two AI Pioneers Win Turing Award for Key Technique Used in ChatGPT | PYMNTS.com
Two professors who pioneered a key technique that became central to the development of artificial intelligence (AI) systems like ChatGPT have won the Turing Award, which is considered the Nobel Prize of computing. Andrew Barto, professor emeritus of information and computer sciences at the University of Massachusetts at Amherst, and Richard Sutton, a professor of computing science at the University of Alberta in Canada, developed the conceptual and algorithmic foundations of reinforcement learning. The two will split the $1 million award, which was named after Alan M. Turing, the British mathematician widely known as one of the fathers of computer science and AI. Reinforcement learning enables AI systems to learn through trial and error. This approach has been used in breakthrough applications such as AlphaGo's victory over Lee Sedol, the world champion in the game Go, and OpenAI's ChatGPT, which uses reinforcement learning from human feedback (RLHF). Barto and Sutton's 1998 textbook, "Reinforcement Learning: An Introduction," is the field's standard reference with over 75,000 citations. Their collaboration began in 1978 at UMass Amherst when Barto was Sutton's Ph.D. and postdoctoral adviser. They developed many of the basic algorithmic approaches for reinforcement learning. Their key contributions include temporal difference learning, policy-gradient methods, and use of neural networks to represent learned functions. The duo's multidisciplinary approach helped lead to their breakthroughs. "Research areas ranging from cognitive science and psychology to neuroscience inspired the development of reinforcement learning," said President Yannis Ioannidis of the Association for Computing Machinery, which gives out the Turing Award, in a Wednesday (March 5) blog post. In a 1947 lecture, Turing had said, "What we want is a machine that can learn from experience," according to Jeff Dean, Google's chief scientist. 
"Reinforcement learning, as pioneered by Barto and Sutton, directly answers that challenge," he said. "Their work has been a lynchpin of progress in AI over the last several decades." The Turing Award is financially supported by Google. While the Turing Award is considered computing's equivalent to the Nobel Prize, only two people have won both. Last year, Turing Award winner Geoffrey Hinton won the Nobel Prize in Physics for his pioneering work in the development of neural networks. The only other person to win both awards was Herbert Simon, a Carnegie Mellon University professor, who won the 1978 Nobel Prize in economic sciences.
Andrew Barto and Richard Sutton, pioneers of reinforcement learning in AI, have been awarded the prestigious Turing Award. Their groundbreaking work has significantly influenced modern AI development, including technologies like ChatGPT and AlphaGo.
Andrew Barto and Richard Sutton, pioneers in the field of reinforcement learning, have been awarded the 2024 A.M. Turing Award, often referred to as the "Nobel Prize of Computing" [1][2]. The award, which carries a $1 million prize sponsored by Google, recognizes their foundational work in developing reinforcement learning, a key technique in modern artificial intelligence [3].
Barto and Sutton began their collaboration in the late 1970s at the University of Massachusetts, Amherst, where Barto was Sutton's PhD advisor [2]. Their work focused on creating "hedonistic" machines that could learn from experience through a system of rewards, similar to how animals are trained [4]. This approach, known as reinforcement learning, allows AI systems to make optimized decisions through trial and error [3].
In the early 1980s, they published a landmark paper demonstrating their new approach by balancing a pole on a moving cart in a simulated environment [5]. Their 1998 textbook, "Reinforcement Learning: An Introduction," remains a standard reference in the field with over 75,000 citations [2].
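Reinforcement learning has been crucial to many recent AI breakthroughs, including AlphaGo's victories over top human Go players and the reinforcement learning from human feedback (RLHF) used in ChatGPT.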
Google's senior VP Jeff Dean described reinforcement learning as "a lynchpin of progress in AI over the last several decades" and "a central pillar of the AI boom" [2][3].
Both Barto and Sutton acknowledged that for much of their careers, their work was not in vogue. "We were kind of in the wilderness," Barto said in an interview [4]. The award represents a significant recognition of their contributions to the field of AI.
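While both scientists have made significant contributions to AI, they differ in their views on potential risks: Barto has said that "you have to be cognizant of potential unexpected consequences," while Sutton has dismissed concerns about AI's threat to humanity as overblown.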
Sutton embraces a future with potentially superintelligent AI, stating, "We're trying to understand ourselves and, of course, to make things that can work even better. Maybe to become such things" [4][5].
The reinforcement learning techniques developed by Barto and Sutton continue to be relevant in AI research and development. Their work has attracted numerous young researchers and driven billions of dollars in investments [2][5]. As AI technology continues to advance, the principles of reinforcement learning are likely to play a crucial role in shaping the future of intelligent systems.