3 Sources
[1]
What is reinforcement learning? An AI researcher explains a key method of teaching machines - and how it relates to training your dog
Understanding intelligence and creating intelligent machines are grand scientific challenges of our times. The ability to learn from experience is a cornerstone of intelligence for machines and living beings alike. In a remarkably prescient 1948 report, Alan Turing - the father of modern computer science - proposed the construction of machines that display intelligent behavior. He also discussed the "education" of such machines "by means of rewards and punishments." Turing's ideas ultimately led to the development of reinforcement learning, a branch of artificial intelligence. Reinforcement learning designs intelligent agents by training them to maximize rewards as they interact with their environment. As a machine learning researcher, I find it fitting that reinforcement learning pioneers Andrew Barto and Richard Sutton were awarded the 2024 ACM Turing Award.
What is reinforcement learning?
Animal trainers know that animal behavior can be influenced by rewarding desirable behaviors. A dog trainer gives the dog a treat when it does a trick correctly. This reinforces the behavior, and the dog is more likely to do the trick correctly the next time. Reinforcement learning borrowed this insight from animal psychology. But reinforcement learning is about training computational agents, not animals. The agent can be a software agent like a chess-playing program. But the agent can also be an embodied entity like a robot learning to do household chores. Similarly, the environment of an agent can be virtual, like the chessboard or the designed world in a video game. But it can also be a house where a robot is working.
Just like animals, an agent can perceive aspects of its environment and take actions. A chess-playing agent can access the chessboard configuration and make moves. A robot can sense its surroundings with cameras and microphones. It can use its motors to move about in the physical world. Agents also have goals that their human designers program into them. A chess-playing agent's goal is to win the game. A robot's goal might be to assist its human owner with household chores. The reinforcement learning problem in AI is how to design agents that achieve their goals by perceiving and acting in their environments.
Reinforcement learning makes a bold claim: All goals can be achieved by designing a numerical signal, called the reward, and having the agent maximize the total sum of rewards it receives. Researchers do not know if this claim is actually true, because of the wide variety of possible goals. Therefore, it is often referred to as the reward hypothesis. Sometimes it is easy to pick a reward signal corresponding to a goal. For a chess-playing agent, the reward can be +1 for a win, 0 for a draw, and -1 for a loss. It is less clear how to design a reward signal for a helpful household robotic assistant. Nevertheless, the list of applications where reinforcement learning researchers have been able to design good reward signals is growing.
A big success of reinforcement learning was in the board game Go. Researchers thought that Go was much harder than chess for machines to master. The company DeepMind, now Google DeepMind, used reinforcement learning to create AlphaGo. AlphaGo defeated top Go player Lee Sedol in a five-game match in 2016. A more recent example is the use of reinforcement learning to make chatbots such as ChatGPT more helpful. Reinforcement learning is also being used to improve the reasoning capabilities of chatbots.
Reinforcement learning's origins
However, none of these successes could have been foreseen in the 1980s. That is when Barto and his then-Ph.D. student Sutton proposed reinforcement learning as a general problem-solving framework. They drew inspiration not only from animal psychology but also from the field of control theory, the use of feedback to influence a system's behavior, and optimization, a branch of mathematics that studies how to select the best choice among a range of available options. They provided the research community with mathematical foundations that have stood the test of time. They also created algorithms that have now become standard tools in the field.
It is a rare advantage for a field when pioneers take the time to write a textbook. Shining examples like "The Nature of the Chemical Bond" by Linus Pauling and "The Art of Computer Programming" by Donald E. Knuth are memorable because they are few and far between. Sutton and Barto's "Reinforcement Learning: An Introduction" was first published in 1998. A second edition came out in 2018. Their book has influenced a generation of researchers and has been cited more than 75,000 times.
Reinforcement learning has also had an unexpected impact on neuroscience. The neurotransmitter dopamine plays a key role in reward-driven behaviors in humans and animals. Researchers have used specific algorithms developed in reinforcement learning to explain experimental findings in the dopamine systems of people and animals.
Barto and Sutton's foundational work, vision and advocacy have helped reinforcement learning grow. Their work has inspired a large body of research, made an impact on real-world applications, and attracted huge investments by tech companies. Reinforcement learning researchers, I'm sure, will continue to see further ahead by standing on their shoulders.
[2]
Training an AI system and training a dog have a basic principle in common
[3]
What is reinforcement learning? An AI researcher explains a key method of teaching machines
An exploration of reinforcement learning in AI, its origins, applications, and recent recognition of its pioneers, drawing parallels between machine learning and animal training.
Reinforcement learning, a key branch of artificial intelligence, has its roots in a visionary concept proposed by Alan Turing in 1948. Turing, often referred to as the father of modern computer science, suggested the creation of machines capable of intelligent behavior that could be "educated" through rewards and punishments [1][2][3]. This idea laid the foundation for what would become a revolutionary approach to machine learning.
At its core, reinforcement learning is about training computational agents to achieve goals by maximizing rewards as they interact with their environment. This concept draws inspiration from animal psychology, particularly the way trainers influence animal behavior through positive reinforcement [1][2][3].
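In reinforcement learning, the agent can be software, such as a chess-playing program, or an embodied entity, such as a robot doing household chores. It perceives aspects of its environment, takes actions, and pursues goals that its human designers program into it [1][2][3].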
This approach is applied to various scenarios, from virtual environments like chess games to physical settings where robots learn to perform tasks [1][2][3].
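The interaction between agent and environment can be illustrated with a toy version of the dog-training analogy. The sketch below is only an illustration, not a method taken from the sources: the `TreatEnvironment` and `Agent` classes, the epsilon-greedy exploration rule, and every parameter value are assumptions chosen for simplicity.

```python
import random


class TreatEnvironment:
    """Hypothetical environment: action 1 ("do the trick") earns a treat."""

    def step(self, action: int) -> float:
        # Reward is 1.0 for the desired behavior and 0.0 otherwise.
        return 1.0 if action == 1 else 0.0


class Agent:
    """Hypothetical agent that keeps a running average reward per action."""

    def __init__(self, n_actions: int, epsilon: float = 0.1, seed: int = 0):
        self.values = [0.0] * n_actions  # estimated reward of each action
        self.counts = [0] * n_actions    # how often each action was tried
        self.epsilon = epsilon           # probability of exploring at random
        self.rng = random.Random(seed)   # seeded so runs are repeatable

    def act(self) -> int:
        # Explore occasionally; otherwise pick the best-looking action.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))
        return max(range(len(self.values)), key=lambda a: self.values[a])

    def learn(self, action: int, reward: float) -> None:
        # Incremental average of the rewards observed for this action.
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]


def run(steps: int = 1000, seed: int = 0):
    """Let the agent interact with the environment and return it with its total reward."""
    env = TreatEnvironment()
    agent = Agent(n_actions=2, seed=seed)
    total = 0.0
    for _ in range(steps):
        action = agent.act()
        reward = env.step(action)
        agent.learn(action, reward)
        total += reward
    return agent, total


if __name__ == "__main__":
    agent, total = run()
    print("estimated values:", agent.values)
    print("total reward:", total)
```

Once the agent stumbles on the rewarded action, its estimate for that action rises and it chooses the action far more often, much as a dog repeats a trick that earned a treat.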
Reinforcement learning operates on a bold claim known as the reward hypothesis: all goals can be achieved by designing a numerical reward signal for the agent to maximize. While this hypothesis remains unproven due to the vast array of possible goals, it has shown remarkable effectiveness in many applications [1][2][3].
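For a concrete reward signal, the sources give the example of chess: +1 for a win, 0 for a draw, and -1 for a loss. A minimal sketch of that signal, and of the total the agent is meant to maximize, might look like the following. The function names and the sample list of game outcomes are hypothetical.

```python
def chess_reward(outcome: str) -> int:
    """Map a game outcome to the reward described in the sources."""
    rewards = {"win": 1, "draw": 0, "loss": -1}
    return rewards[outcome]


def total_reward(outcomes: list[str]) -> int:
    """Sum of rewards over a series of games: the quantity to be maximized."""
    return sum(chess_reward(outcome) for outcome in outcomes)


if __name__ == "__main__":
    # Hypothetical series of four games: 1 + (-1) + 0 + 1 = 1
    print(total_reward(["win", "loss", "draw", "win"]))
```

Under this signal, an agent that wins more often than it loses accumulates a positive total, which is what makes the goal of winning expressible as reward maximization.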
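Reinforcement learning has achieved significant milestones. DeepMind's AlphaGo, built with reinforcement learning, defeated top Go player Lee Sedol in a five-game match in 2016. More recently, the technique has been used to make chatbots such as ChatGPT more helpful and to improve their reasoning capabilities [1][2][3].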
The field of reinforcement learning owes much to the work of Andrew Barto and Richard Sutton. In the 1980s, they proposed reinforcement learning as a general problem-solving framework, drawing from animal psychology, control theory, and optimization [1][2][3].
Their seminal textbook, "Reinforcement Learning: An Introduction," first published in 1998 and updated in 2018, has been instrumental in shaping the field. With over 75,000 citations, it has influenced a generation of researchers [1][2][3].
Interestingly, reinforcement learning has made unexpected contributions to neuroscience. Researchers have used reinforcement learning algorithms to explain findings related to the dopamine system in humans and animals, shedding light on reward-driven behaviors [1][2][3].
In a fitting tribute to their groundbreaking work, Andrew Barto and Richard Sutton were awarded the 2024 ACM Turing Award, often referred to as the "Nobel Prize of Computing" [1][2]. This recognition underscores the profound impact of their contributions to the field of artificial intelligence.
The foundational work, vision, and advocacy of Barto and Sutton have propelled reinforcement learning into a thriving field of research and application. Their efforts have not only inspired a large body of research but also attracted significant investments from tech companies, promising continued advancements in the years to come [1][2][3].