Philip S. Thomas

Assistant Professor and co-director of the Autonomous Learning Lab

College of Information and Computer Sciences, University of Massachusetts Amherst

pthomas [at] cs [dot] umass [dot] edu

I might take one doctoral student in Fall 2020. Unfortunately, I receive too many e-mails to respond to each one individually; if you are interested, please apply through the official channels here. I am unlikely to respond to e-mails about joining the lab or internships.

I study ways to ensure the safety of artificial intelligence (AI) systems, with emphases on ensuring the fairness of machine learning algorithms and on creating safe and practical reinforcement learning (RL) algorithms. My most recent work on these topics is summarized here. I currently co-direct the Autonomous Learning Lab (ALL) at UMass Amherst with Sridhar Mahadevan. Before that, I worked as a postdoc with Emma Brunskill at CMU. I completed my Ph.D. in computer science at UMass Amherst in 2015, where Andrew Barto was my adviser, and my B.S. and M.S. in computer science at CWRU in 2008 and 2009, where Michael Branicky was my adviser. Before that, in high school, I was introduced to computer science and mentored by David Kosbie.

Publications

(Bolded titles indicate papers that I find most interesting.)

2019

  • B. Metevier, S. Giguere, S. Brockman, A. Kobren, Y. Brun, E. Brunskill, and P. S. Thomas. Offline Contextual Bandits with High Probability Fairness Guarantees. In Advances in Neural Information Processing Systems, 2019.
  • F. M. Garcia and P. S. Thomas. A Meta-MDP Approach to Exploration for Lifelong Reinforcement Learning. In Advances in Neural Information Processing Systems, 2019.
  • Y. Chandak, G. Theocharous, J. Kostas, S. Jordan, and P. S. Thomas. Learning Action Representations for Reinforcement Learning. In Proceedings of the Thirty-Sixth International Conference on Machine Learning, 2019. pdf
  • P. S. Thomas and E. Learned-Miller. Concentration Inequalities for Conditional Value at Risk. In Proceedings of the Thirty-Sixth International Conference on Machine Learning, 2019. pdf
  • S. Tiwari and P. S. Thomas. Natural Option-Critic. In Proceedings of the Thirty-Third Conference on Artificial Intelligence, 2019. pdf
  • Y. Chandak, G. Theocharous, J. Kostas, S. Jordan, and P. S. Thomas. Improving Generalization over Large Action Sets. In The Multidisciplinary Conference on Reinforcement Learning and Decision Making, 2019.
  • C. Nota and P. S. Thomas. Is the Policy Gradient a Gradient? arXiv:1906.07073, 2019. pdf, arXiv
  • P. S. Thomas, S. Jordan, Y. Chandak, C. Nota, and J. Kostas. Classical Policy Gradient: Preserving Bellman's Principle of Optimality. arXiv:1906.03063, 2019. pdf, arXiv
  • Y. Chandak, G. Theocharous, C. Nota, and P. S. Thomas. Lifelong Learning with a Changing Action Set. arXiv:1906.01770, 2019. pdf, arXiv
  • Y. Chandak, G. Theocharous, B. Metevier, and P. S. Thomas. Reinforcement Learning When All Actions are Not Always Available. arXiv:1906.01772, 2019. pdf, arXiv
  • E. Learned-Miller and P. S. Thomas. A New Confidence Interval for the Mean of a Bounded Random Variable. arXiv:1905.06208, 2019. pdf, arXiv
  • J. Kostas, C. Nota, and P. S. Thomas. Asynchronous Coagent Networks: Stochastic Networks for Reinforcement Learning without Backpropagation or a Clock. arXiv:1902.05650, 2019. pdf, arXiv

2018

  • P. S. Thomas, C. Dann, and E. Brunskill. Decoupling Gradient-Like Learning Rules from Representations. In Proceedings of the Thirty-Fifth International Conference on Machine Learning, 2018. pdf
  • Y. Chandak, G. Theocharous, J. Kostas, and P. S. Thomas. Reinforcement Learning with a Dynamic Action Set. In Continual Learning Workshop, NIPS 2018.
  • S. M. Jordan, D. Cohen, and P. S. Thomas. Using Cumulative Distribution Based Performance Analysis to Benchmark Models. In Critiquing and Correcting Trends in ML workshop, NIPS 2018. pdf
  • S. Giguere and P. S. Thomas. Classification with Probabilistic Fairness Guarantees. In International Workshop on Software Fairness, ICSE 2018.
  • A. Jagannatha, P. S. Thomas, and H. Yu. Towards High Confidence Off-Policy Reinforcement Learning for Clinical Applications. In Workshop on Machine Learning for Causal Inference, Counterfactual Prediction, and Autonomous Action (CausalML), ICML 2018.

2017

  • P. S. Thomas, B. Castro da Silva, A. G. Barto, and E. Brunskill. On Ensuring that Intelligent Machines are Well-Behaved. arXiv:1708.05448, 2017. pdf, arXiv
  • P. S. Thomas and E. Brunskill. Importance Sampling with Unequal Support. In Proceedings of the Thirty-First Conference on Artificial Intelligence, 2017. pdf, body only, supplemental only, arXiv preprint (pdf)
  • P. S. Thomas, G. Theocharous, M. Ghavamzadeh, I. Durugkar, and E. Brunskill. Predictive Off-Policy Policy Evaluation for Nonstationary Decision Problems, with Applications to Digital Marketing. In Conference on Innovative Applications of Artificial Intelligence, 2017. pdf
    • Related paper with the same authors presented at the Workshop on Computational Frameworks for Personalization at ICML 2016.
  • S. Doroudi, P. S. Thomas, and E. Brunskill. Importance Sampling for Fair Policy Selection. In 33rd Conference on Uncertainty in Artificial Intelligence, 2017. pdf
  • P. S. Thomas, C. Dann, and E. Brunskill. Decoupling Learning Rules from Representations. arXiv:1706.03100v1, 2017. pdf, arXiv
  • J. P. Hanna, P. S. Thomas, P. Stone, and S. Niekum. Data-Efficient Policy Evaluation Through Behavior Policy Search. In Proceedings of the Thirty-Fourth International Conference on Machine Learning, 2017. pdf, supplemental
  • Z. Guo, P. S. Thomas, and E. Brunskill. Using Options and Covariance Testing for Long Horizon Off-Policy Policy Evaluation. In Advances in Neural Information Processing Systems, 2017.
    • Related paper with the same authors, titled "Using Options for Long-Horizon Off-Policy Evaluation," was presented at The Third Multidisciplinary Conference on Reinforcement Learning and Decision Making, 2017, as an extended abstract.
  • K. M. Jagodnik, P. S. Thomas, A. J. van den Bogert, M. S. Branicky, and R. F. Kirsch. Training an Actor-Critic Reinforcement Learning Controller for Arm Movement Using Human-Generated Rewards. In IEEE Transactions on Neural Systems and Rehabilitation Engineering, 25(10), pages 1892–1905, October 2017. pdf
  • A. G. Barto, P. S. Thomas, and R. S. Sutton. Some Recent Applications of Reinforcement Learning. In Proceedings of the Eighteenth Yale Workshop on Adaptive and Learning Systems, 2017. pdf
  • P. S. Thomas and E. Brunskill. Policy Gradient Methods for Reinforcement Learning with Function Approximation and Action-Dependent Baselines. arXiv:1706.06643v1, 2017. pdf, arXiv
  • S. Doroudi, P. S. Thomas, and E. Brunskill. Importance Sampling for Fair Policy Selection. In The Third Multidisciplinary Conference on Reinforcement Learning and Decision Making, 2017. Extended abstract. pdf
  • Y. Liu, P. S. Thomas, and E. Brunskill. Model Selection for Off-Policy Policy Evaluation. In The Third Multidisciplinary Conference on Reinforcement Learning and Decision Making, 2017. Extended abstract. pdf

2016

  • P. S. Thomas and E. Brunskill. Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning. In Proceedings of the Thirty-Third International Conference on Machine Learning, 2016. complete, body only, appendix, arXiv preprint (pdf)
    • Related extended abstract for the Data-Efficient Machine Learning Workshop at ICML 2016. pdf
  • P. S. Thomas, B. C. da Silva, C. Dann, and E. Brunskill. Energetic Natural Gradient Descent. In Proceedings of the Thirty-Third International Conference on Machine Learning, 2016. complete, body only, appendix
  • M. G. Bellemare, G. Ostrovski, A. Guez, P. S. Thomas, and R. Munos. Increasing the Action Gap: New Operators for Reinforcement Learning. In Proceedings of the Thirtieth AAAI Conference, 2016. pdf, supplemental, video, code
  • K. M. Jagodnik, P. S. Thomas, A. J. van den Bogert, M. S. Branicky, and R. F. Kirsch. Human-Like Rewards to Train a Reinforcement Learning Controller for Planar Arm Movement. In IEEE Transactions on Human-Machine Systems, 46(5), pages 723–733, October 2016. pdf
  • P. S. Thomas and E. Brunskill. Magical Policy Search: Data Efficient Reinforcement Learning with Guarantees of Global Optimality. In European Workshop On Reinforcement Learning, 2016. pdf

2015

  • P. S. Thomas. Safe Reinforcement Learning. PhD Thesis, School of Computer Science, University of Massachusetts Amherst, September 2015. pdf
  • P. S. Thomas, S. Niekum, G. Theocharous, and G. Konidaris. Policy Evaluation using the Ω-Return. In Advances in Neural Information Processing Systems 28, 2015. pdf
  • P. S. Thomas, G. Theocharous, and M. Ghavamzadeh. High Confidence Off-Policy Evaluation. In Proceedings of the Twenty-Ninth Conference on Artificial Intelligence, 2015. pdf
  • P. S. Thomas, G. Theocharous, and M. Ghavamzadeh. High Confidence Policy Improvement. In Proceedings of the Thirty-Second International Conference on Machine Learning, 2015. pdf, errata
  • P. S. Thomas. A Notation for Markov Decision Processes. arXiv:1512.09075v1, 2015. pdf, arXiv
  • G. Theocharous, P. S. Thomas, and M. Ghavamzadeh. Personalized ad recommendation systems for life-time value optimization with guarantees. In Proceedings of the International Joint Conference on Artificial Intelligence, 2015. pdf
  • G. Theocharous, P. S. Thomas, and M. Ghavamzadeh. Ad recommendation systems for life-time value optimization. In TargetAd 2015: Ad Targeting at Scale, at the World Wide Web Conference, 2015. pdf

2014

  • P. S. Thomas. GeNGA: A generalization of natural gradient ascent with positive and negative convergence results. In Proceedings of the Thirty-First International Conference on Machine Learning, 2014. pdf
  • P. S. Thomas. Bias in natural actor-critic algorithms. In Proceedings of the Thirty-First International Conference on Machine Learning, 2014. pdf
  • W. Dabney and P. S. Thomas. Natural temporal difference learning. In Proceedings of the Twenty-Eighth Conference on Artificial Intelligence, 2014. pdf
  • S. Mahadevan, B. Liu, P. S. Thomas, W. Dabney, S. Giguere, N. Jacek, I. Gemp, and J. Liu. Proximal Reinforcement Learning: A New Theory of Sequential Decision Making in Primal-Dual Spaces. arXiv:1405.6757v1, 2014. pdf, arXiv

2013

  • P. S. Thomas, W. Dabney, S. Mahadevan, and S. Giguere. Projected natural actor-critic. In Advances in Neural Information Processing Systems 26, 2013. pdf
  • W. Dabney, P. S. Thomas, and A. G. Barto. Performance Metrics for Reinforcement Learning Algorithms. In The First Multidisciplinary Conference on Reinforcement Learning and Decision Making, 2013. Extended abstract.

2012

  • P. S. Thomas. Bias in natural actor-critic algorithms. Technical Report UM-CS-2012-018, Department of Computer Science, University of Massachusetts Amherst, 2012. pdf
  • P. S. Thomas and A. G. Barto. Motor primitive discovery. In Proceedings of the IEEE Conference on Development and Learning and Epigenetic Robotics, 2012. pdf

2011

  • P. S. Thomas. Policy gradient coagent networks. In Advances in Neural Information Processing Systems 24, pages 1944–1952. 2011. pdf
  • G. D. Konidaris, S. Niekum, and P. S. Thomas. TDγ: Re-evaluating complex backups in temporal difference learning. In Advances in Neural Information Processing Systems 24, pages 2402–2410. 2011. pdf
    • Author names listed alphabetically; a footnote in the paper reads: "All three authors are primary authors on this occasion."
  • G. D. Konidaris, S. Osentoski, and P. S. Thomas. Value function approximation in reinforcement learning using the Fourier basis. In Proceedings of the Twenty-Fifth Conference on Artificial Intelligence, pages 380–395, 2011. pdf
  • P. S. Thomas and A. G. Barto. Conjugate Markov decision processes. In Proceedings of the Twenty-Eighth International Conference on Machine Learning, pages 137–144, 2011. pdf

2009

  • P. S. Thomas. A reinforcement learning controller for functional electrical stimulation of a human arm. Master's thesis, Department of Electrical Engineering and Computer Science, Case Western Reserve University, August 2009. pdf
  • P. S. Thomas, M. S. Branicky, A. J. van den Bogert, and K. M. Jagodnik. Application of the actor-critic architecture to functional electrical stimulation control of a human arm. In Proceedings of the Twenty-First Innovative Applications of Artificial Intelligence Conference, pages 165–172, 2009. pdf
  • P. S. Thomas, M. S. Branicky, A. J. van den Bogert, and K. M. Jagodnik. Creating a reinforcement learning controller for functional electrical stimulation of a human arm. In Proceedings of the Fourteenth Yale Workshop on Adaptive and Learning Systems, pages 15–20, 2008. pdf

1998

  • A. Kandabarow, M. Rafalko, and P. S. Thomas. Penguins with hats, penguins with pants. In 7th Grade English with Mrs. Haiges, Sewickley Academy, PA, c. 1998. pdf