Philip S. Thomas

Assistant Professor and co-director of the Autonomous Learning Lab

College of Information and Computer Sciences, University of Massachusetts Amherst

pthomas [at] cs [dot] umass [dot] edu

I am not taking additional students (PhD, MS, visiting) prior to Fall 2020. Unfortunately, I receive too many e-mails to answer them all, so I am unlikely to respond to e-mails about joining the lab or internships.

I study ways to ensure the safety of artificial intelligence (AI) systems, with emphases on ensuring the fairness of machine learning algorithms and on creating safe and practical reinforcement learning (RL) algorithms. My most recent work on these topics is summarized here. I am currently co-directing the Autonomous Learning Lab (ALL) at UMass Amherst with Sridhar Mahadevan. Before that I worked as a postdoc for Emma Brunskill at CMU. I completed my Ph.D. in computer science at UMass Amherst in 2015, where Andrew Barto was my adviser. I completed my B.S. and M.S. in computer science at CWRU in 2008 and 2009, where Michael Branicky was my adviser. Before that, in high school, I was introduced to computer science and mentored by David Kosbie.

Publications

(Bolded titles indicate papers that I find most interesting.)

2018

  • P. S. Thomas, C. Dann, and E. Brunskill. Decoupling Gradient-Like Learning Rules from Representations. In Proceedings of the Thirty-Fifth International Conference on Machine Learning, 2018. pdf

2017

  • P. S. Thomas, B. Castro da Silva, A. G. Barto, and E. Brunskill. On Ensuring that Intelligent Machines are Well-Behaved. arXiv:1708.05448, 2017. pdf, arXiv
  • P. S. Thomas and E. Brunskill. Importance Sampling with Unequal Support. In Proceedings of the Thirty-First Conference on Artificial Intelligence, 2017. pdf, body only, supplemental only, ArXiv preprint (pdf)
  • P. S. Thomas, G. Theocharous, M. Ghavamzadeh, I. Durugkar, and E. Brunskill. Predictive Off-Policy Policy Evaluation for Nonstationary Decision Problems, with Applications to Digital Marketing. In Conference on Innovative Applications of Artificial Intelligence, 2017. pdf
    • Related paper with same authors presented at the Workshop on Computational Frameworks for Personalization at ICML 2016.
  • S. Doroudi, P. S. Thomas, and E. Brunskill. Importance Sampling for Fair Policy Selection. In 33rd Conference on Uncertainty in Artificial Intelligence, 2017. pdf
  • P. S. Thomas, C. Dann, and E. Brunskill. Decoupling Learning Rules from Representations. arXiv:1706.03100v1, 2017. pdf, arXiv
  • J. P. Hanna, P. S. Thomas, P. Stone, and S. Niekum. Data-Efficient Policy Evaluation Through Behavior Policy Search. In Proceedings of the Thirty-Fourth International Conference on Machine Learning, 2017. pdf, supplemental
  • Z. Guo, P. S. Thomas, and E. Brunskill. Using Options and Covariance Testing for Long Horizon Off-Policy Policy Evaluation. In Advances in Neural Information Processing Systems, 2017.
    • Related paper with the same authors, titled "Using Options for Long-Horizon Off-Policy Evaluation," was presented as an extended abstract at The Third Multidisciplinary Conference on Reinforcement Learning and Decision Making, 2017.
  • K. M. Jagodnik, P. S. Thomas, A. J. van den Bogert, M. S. Branicky, and R. F. Kirsch. Training an Actor-Critic Reinforcement Learning Controller for Arm Movement Using Human-Generated Rewards. In IEEE Transactions on Neural Systems and Rehabilitation Engineering 25(10), pages 1892–1905, October 2017. pdf
  • A. G. Barto, P. S. Thomas, and R. S. Sutton. Some Recent Applications of Reinforcement Learning. In Proceedings of the Eighteenth Yale Workshop on Adaptive and Learning Systems, 2017. pdf
  • P. S. Thomas and E. Brunskill. Policy Gradient Methods for Reinforcement Learning with Function Approximation and Action-Dependent Baselines. arXiv:1706.06643v1, 2017. pdf, arXiv
  • S. Doroudi, P. S. Thomas, and E. Brunskill. Importance Sampling for Fair Policy Selection. In The Third Multidisciplinary Conference on Reinforcement Learning and Decision Making, 2017. Extended abstract. pdf
  • Y. Liu, P. S. Thomas, and E. Brunskill. Model Selection for Off-Policy Policy Evaluation. In The Third Multidisciplinary Conference on Reinforcement Learning and Decision Making, 2017. Extended abstract. pdf

2016

  • P. S. Thomas and E. Brunskill. Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning. In Proceedings of the Thirty-Third International Conference on Machine Learning, 2016. complete, body only, appendix, ArXiv preprint (pdf)
    • Related extended abstract for Data-Efficient Machine Learning Workshop at ICML 2016. pdf
  • P. S. Thomas, B. C. da Silva, C. Dann, and E. Brunskill. Energetic Natural Gradient Descent. In Proceedings of the Thirty-Third International Conference on Machine Learning, 2016. complete, body only, appendix
  • M. G. Bellemare, G. Ostrovski, A. Guez, P. S. Thomas, and R. Munos. Increasing the Action Gap: New Operators for Reinforcement Learning. In Proceedings of the Thirtieth AAAI Conference, 2016. pdf, supplemental, video, code
  • K. M. Jagodnik, P. S. Thomas, A. J. van den Bogert, M. S. Branicky, and R. F. Kirsch. Human-Like Rewards to Train a Reinforcement Learning Controller for Planar Arm Movement. In IEEE Transactions on Human-Machine Systems 46(5), pages 723–733, October 2016. pdf
  • P. S. Thomas and E. Brunskill. Magical Policy Search: Data Efficient Reinforcement Learning with Guarantees of Global Optimality. In European Workshop On Reinforcement Learning, 2016. pdf

2015

  • P. S. Thomas. Safe Reinforcement Learning. PhD Thesis, School of Computer Science, University of Massachusetts Amherst, September 2015. pdf
  • P. S. Thomas, S. Niekum, G. Theocharous, and G. Konidaris. Policy Evaluation using the Ω-Return. In Advances in Neural Information Processing Systems 28, 2015. pdf
  • P. S. Thomas, G. Theocharous, and M. Ghavamzadeh. High Confidence Off-Policy Evaluation. In Proceedings of the Twenty-Ninth Conference on Artificial Intelligence, 2015. pdf
  • P. S. Thomas, G. Theocharous, and M. Ghavamzadeh. High Confidence Policy Improvement. In Proceedings of the Thirty-Second International Conference on Machine Learning, 2015. pdf, errata
  • P. S. Thomas. A Notation for Markov Decision Processes. arXiv:1512.09075v1, 2015. pdf, arXiv
  • G. Theocharous, P. S. Thomas, and M. Ghavamzadeh. Personalized ad recommendation systems for life-time value optimization with guarantees. In Proceedings of the International Joint Conference on Artificial Intelligence, 2015. pdf
  • G. Theocharous, P. S. Thomas, and M. Ghavamzadeh. Ad recommendation systems for life-time value optimization. In TargetAd 2015: Ad Targeting at Scale, at the World Wide Web Conference, 2015. pdf

2014

  • P. S. Thomas. GeNGA: A generalization of natural gradient ascent with positive and negative convergence results. In Proceedings of the Thirty-First International Conference on Machine Learning, 2014. pdf
  • P. S. Thomas. Bias in natural actor-critic algorithms. In Proceedings of the Thirty-First International Conference on Machine Learning, 2014. pdf
  • W. Dabney and P. S. Thomas. Natural temporal difference learning. In Proceedings of the Twenty-Eighth Conference on Artificial Intelligence, 2014. pdf
  • S. Mahadevan, B. Liu, P. S. Thomas, W. Dabney, S. Giguere, N. Jacek, I. Gemp, and J. Liu. Proximal Reinforcement Learning: A New Theory of Sequential Decision Making in Primal-Dual Spaces. arXiv:1405.6757v1, 2014. pdf, arXiv

2013

  • P. S. Thomas, W. Dabney, S. Mahadevan, and S. Giguere. Projected natural actor-critic. In Advances in Neural Information Processing Systems 26, 2013. pdf
  • W. Dabney, P. S. Thomas, and A. G. Barto. Performance Metrics for Reinforcement Learning Algorithms. In The First Multidisciplinary Conference on Reinforcement Learning and Decision Making, 2013. Extended abstract.

2012

  • P. S. Thomas. Bias in natural actor-critic algorithms. Technical Report UM-CS-2012-018, Department of Computer Science, University of Massachusetts Amherst, 2012. pdf
  • P. S. Thomas and A. G. Barto. Motor primitive discovery. In Proceedings of the IEEE Conference on Development and Learning and Epigenetic Robotics, 2012. pdf

2011

  • P. S. Thomas. Policy gradient coagent networks. In Advances in Neural Information Processing Systems 24, pages 1944–1952, 2011. pdf
  • G. D. Konidaris, S. Niekum, and P. S. Thomas. TDγ: Re-evaluating complex backups in temporal difference learning. In Advances in Neural Information Processing Systems 24, pages 2402–2410, 2011. pdf
      ↑Author names listed alphabetically. Footnote reads: "All three authors are primary authors on this occasion."
  • G. D. Konidaris, S. Osentoski, and P. S. Thomas. Value function approximation in reinforcement learning using the Fourier basis. In Proceedings of the Twenty-Fifth Conference on Artificial Intelligence, pages 380–385, 2011. pdf
  • P. S. Thomas and A. G. Barto. Conjugate Markov decision processes. In Proceedings of the Twenty-Eighth International Conference on Machine Learning, pages 137–144, 2011. pdf

2009

  • P. S. Thomas. A reinforcement learning controller for functional electrical stimulation of a human arm. Master's thesis, Department of Electrical Engineering and Computer Science, Case Western Reserve University, August 2009. pdf
  • P. S. Thomas, M. S. Branicky, A. J. van den Bogert, and K. M. Jagodnik. Application of the actor-critic architecture to functional electrical stimulation control of a human arm. In Proceedings of the Twenty-First Innovative Applications of Artificial Intelligence Conference, pages 165–172, 2009. pdf

2008

  • P. S. Thomas, M. S. Branicky, A. J. van den Bogert, and K. M. Jagodnik. Creating a reinforcement learning controller for functional electrical stimulation of a human arm. In Proceedings of the Fourteenth Yale Workshop on Adaptive and Learning Systems, pages 15–20, 2008. pdf

1998

  • A. Kandabarow, M. Rafalko, and P. S. Thomas. Penguins with hats, penguins with pants. In 7th Grade English with Mrs. Haiges, Sewickley Academy, PA, c. 1998. pdf