Andrew G. Barto

Professor Emeritus
Retired Co-Director Autonomous Learning Laboratory

College of Information and Computer Sciences
University of Massachusetts Amherst
barto [at] cs [dot] umass [dot] edu

BS Math 1970, Ph.D. Computer Science 1975, University of Michigan, Ann Arbor, Michigan (a short biography)

Note to Potential Grad-Student, Internship, and Post-Doc Applicants

Since I retired in 2012, I am no longer taking on new students or interns. I encourage potential applicants to look for other opportunities in the College of Information and Computer Sciences here.

Publications

Publications up through 2020 (latest first). I am working on fixing the links for the older papers. The links should work for papers from 2000 onwards.
Books: click on a book for more information

More information on the 2nd edition

Ph.D. students

Chris Vigorito, 2016
Philip Thomas, 2015
Bruno Castro da Silva, 2015
Will Dabney, 2014
Scott Niekum, 2013
Yariv Levy, 2012 (co-chair with J. Meyer)
Scott Kuindersma, 2012 (co-chair with Rod Grupen)
George Konidaris, 2010
Alicia "Pippin" Peregrin Wolfe, 2010
Özgür Şimşek, 2008
Ashvin Shah, 2008
Anders Jonsson, 2005
Thomas Kalt, 2005
Balaraman Ravindran, 2004
Mike Rosenstein, 2003
Ted Perkins, 2002
Amy McGovern, 2002
Mike Duff, 2001
Bob Crites, 1996
Steve Bradtke, 1994
Satinder Singh, 1993
Vijay Gullapalli, 1992
Jonathan Bachrach, 1992
Robbie Jacobs, 1990
Stephen Judd, 1988
Chuck Anderson, 1986
Rich Sutton, 1984

Research Interests

My research centers on learning in machines and animals. I have worked on developing learning algorithms that are useful for engineering applications but that also make contact with learning as studied by psychologists and neuroscientists. Although I make no claims to being either a psychologist or a neuroscientist, I have spent a lot of time interacting with scientists in those fields and reading their books and papers. It is important to understand how new developments relate to what others have done in the past.

In the case of reinforcement learning (RL)—whose main ideas go back a very long way—it has been immensely gratifying to participate in establishing new links between RL and methods from the theory of stochastic optimal control. Especially exciting are the connections between temporal difference (TD) algorithms and the brain's dopamine system. These connections are partly responsible for rekindling my interest in RL as a way to improve our understanding of animal learning and behavior, in addition to being a collection of methods for finding good solutions to engineering problems. The second edition of the RL book with Rich Sutton contains new chapters on RL from the perspectives of psychology and neuroscience.

An area of recent interest is what psychologists call intrinsically motivated behavior, meaning behavior that is done for its own sake rather than as a step toward solving a specific problem of clear practical value. What we learn during intrinsically motivated behavior is essential for our development as competent autonomous agents able to efficiently solve a wide range of practical problems as they arise.

The idea here is that the reward signals that an RL agent learns from do not really come from the agent's external environment, but are in fact generated within the agent itself to reward behavior that not only helps the agent deal with immediate challenges from its environment, but also helps the agent acquire knowledge and skills that will be useful throughout its life. This leads to fundamental questions about reward signals, both extrinsic and intrinsic. What makes a good reward signal? What kinds of intrinsic reward signals do our brains, and the brains of other animals, generate? How are these signals related to evolutionary fitness, and how have they evolved? Some of my recent papers with colleagues deal with these questions.

The rapid pace of advances in AI has led to warnings that AI poses serious threats to our societies, even to humanity itself. This is certainly true for RL, which can benefit society in many ways but can also produce undesirable outcomes if carelessly deployed. RL is basically an optimization technology, so it inherits the plusses and minuses of traditional optimization methods. An RL agent can discover unexpected ways to obtain a lot of reward. Sometimes it does so by solving a problem in an efficient new way, but in other cases the agent can learn to behave in unsafe ways that the system's designers never even thought of.

Despite this possibility of unintended negative consequences, optimization has been used for hundreds of years by engineers, architects, and others whose designs have positively impacted the world. In these days of rapidly changing climate and rapidly advancing AI, optimization can be especially beneficial. But it is necessary to adapt and extend to RL the best engineering practices that have evolved to mitigate the risks of real-world applications. This means that the safety of AI applications involving RL demands careful attention as RL moves out into the real world.

Visit the Autonomous Learning Laboratory page for more details.