Andrew G. Barto
BS Math 1970, Ph.D. Computer Science 1975, University of Michigan, Ann Arbor, Michigan (a short biography)
Since I retired in 2012, I am no longer taking on new students or interns. I encourage potential applicants to look for other opportunities in the College of Information and Computer Sciences here.
My research centers on learning in machines and animals. I worked on developing learning algorithms that are useful for engineering applications but that also make contact with learning as studied by psychologists and neuroscientists. Although I make no claims to being either a psychologist or a neuroscientist, I have spent a lot of time interacting with scientists in those fields and reading their books and papers. It is important to understand how new developments relate to what others have done in the past.
In the case of reinforcement learning (RL)—whose main ideas go back a very long way—it has been immensely gratifying to participate in establishing new links between RL and methods from the theory of stochastic optimal control. Especially exciting are the connections between temporal difference (TD) algorithms and the brain's dopamine system. These connections are partly responsible for rekindling my interest in RL as a way to improve our understanding of animal learning and behavior, in addition to being a collection of methods for finding good solutions to engineering problems. The second edition of the RL book with Rich Sutton contains new chapters on RL from the perspectives of psychology and neuroscience.
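As a rough illustration of what a TD algorithm computes, the sketch below applies a tabular TD(0) update on a three-state chain. The chain, the state names, the function names, and the step-size and discount values are all assumptions chosen for this example; they are not taken from the book or from any particular paper.

```python
def td0_update(values, state, reward, next_state,
               alpha=0.1, gamma=0.9, terminal=False):
    """Apply one tabular TD(0) update in place and return the TD error.

    values: dict mapping state -> estimated value (missing states count as 0).
    alpha (step size) and gamma (discount) are illustrative defaults.
    """
    next_value = 0.0 if terminal else values.get(next_state, 0.0)
    current = values.get(state, 0.0)
    # TD error: how much better or worse things went than predicted.
    td_error = reward + gamma * next_value - current
    values[state] = current + alpha * td_error
    return td_error


def run_chain(episodes=200):
    """Hypothetical chain A -> B -> C with reward 1.0 on reaching terminal C."""
    values = {}
    errors_at_reward = []
    for _ in range(episodes):
        td0_update(values, "A", 0.0, "B")
        err = td0_update(values, "B", 1.0, "C", terminal=True)
        errors_at_reward.append(err)
    return values, errors_at_reward


if __name__ == "__main__":
    values, errors = run_chain()
    print("learned values:", values)
    print("first and last TD error at reward:", errors[0], errors[-1])
```

In this toy setting the TD error at the rewarded step starts large and shrinks toward zero as the reward becomes predicted, which is loosely the pattern that motivates comparing TD errors with dopamine signals.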
An area of recent interest is what psychologists call intrinsically motivated behavior, meaning behavior that is done for its own sake rather than as a step toward solving a specific problem of clear practical value. What we learn during intrinsically motivated behavior is essential for our development as competent autonomous agents able to efficiently solve a wide range of practical problems as they arise.
The idea here is that the reward signals that an RL agent learns from do not really come from the agent's external environment, but are in fact generated within the agent itself to reward behavior that not only helps the agent deal with immediate challenges from its environment, but also helps the agent acquire knowledge and skills that will be useful throughout its life. This leads to fundamental questions about reward signals, both extrinsic and intrinsic. What makes a good reward signal? What kinds of intrinsic reward signals do our brains, and the brains of other animals, generate? How are these signals related to evolutionary fitness, and how have they evolved? Some of my recent papers with colleagues deal with these questions.
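One way to picture a reward signal that is generated inside the agent is the minimal sketch below. It combines a signal from the environment with a count-based novelty bonus; that bonus is a simple stand-in chosen for illustration, not the formulation used in the papers mentioned above. The class name, method name, and `bonus_scale` parameter are hypothetical.

```python
import math
from collections import Counter


class InternalRewardAgent:
    """Toy agent whose reward is computed internally (illustrative only)."""

    def __init__(self, bonus_scale=1.0):
        self.visit_counts = Counter()   # how often each state has been seen
        self.bonus_scale = bonus_scale  # assumed weight on the novelty term

    def internal_reward(self, state, extrinsic_signal):
        """Return the reward the agent actually learns from.

        The extrinsic signal reflects the immediate task; the novelty
        bonus decays with repeated visits, so rarely seen states are
        rewarding in themselves.
        """
        self.visit_counts[state] += 1
        novelty_bonus = self.bonus_scale / math.sqrt(self.visit_counts[state])
        return extrinsic_signal + novelty_bonus


if __name__ == "__main__":
    agent = InternalRewardAgent()
    for visit in range(1, 5):
        print(visit, agent.internal_reward("room", extrinsic_signal=0.0))
```

Here the agent receives reward for visiting an unfamiliar state even when the environment supplies none, and that reward fades as the state becomes familiar.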
The rapid pace of advances in AI has led to warnings that AI poses serious threats to our societies, even to humanity itself. This is certainly true for RL, which can benefit society in many ways but can also produce undesirable outcomes if carelessly deployed. RL is basically an optimization technology, so it inherits the plusses and minuses of traditional optimization methods. An RL agent can discover unexpected ways to obtain a lot of reward: sometimes by solving a problem in an efficient new way, but in other cases by learning to behave in unsafe ways that the system's designers never even thought of.
Despite this possibility of unintended negative consequences, optimization has been used for hundreds of years by engineers, architects, and others whose designs have positively impacted the world. In these days of rapidly changing climate and rapidly advancing AI, optimization can be especially beneficial. But it is necessary to adapt and extend to RL the best engineering practices that have evolved to mitigate the risks of real-world applications. This means that the safety of AI applications involving RL demands careful attention as RL moves out into the real world.
Visit the Autonomous Learning Laboratory page for more details.