My research interests are in reinforcement learning (RL), decision making, robotics, and AI safety.
The ultimate goal of my work is to design the tools needed for RL algorithms to be widely deployed on challenging real-world tasks in homes and in the workplace, safely and with as little human intervention as possible.
I am interested, in particular, in two key problems:
how to design general-purpose RL algorithms capable of autonomously decomposing complex tasks into simpler sub-problems, for which specialized, reusable, and composable skills can be learned; and
how to ensure that these skills are learned in a way that meets user-specified safety requirements with high probability.
These are fundamental questions that underlie the gap between what artificial intelligence agents can—in principle—do and what we can effectively get them to do given our current algorithms.
More broadly, my research interests lie at the intersection of machine learning, reinforcement learning, optimal control theory, and robotics, and include the construction of hierarchical policies, active learning, open-ended learning, biologically plausible intrinsic motivation mechanisms, Bayesian optimization applied to control, and machine learning algorithms with high-probability safety and fairness guarantees.
I completed my Master's degree in Computer Science in 2007 under the supervision of Prof. Ana Bazzan at the Federal University of Rio Grande do Sul, in Brazil. I completed my B.S. in Computer Science cum laude at the same university in 2004.
On several occasions between 2011 and 2018, I worked as a visiting researcher at the Laboratory of Computational Embodied Neuroscience, at the Istituto di Scienze e Tecnologie della Cognizione, in Rome, developing novel control algorithms for a humanoid robot.
In the Summer of 2014 I worked at Adobe Research, where I developed large-scale optimization techniques for the construction of high-performance features for digital marketing optimization.
From 2011 to 2015 I collaborated with Prof. Victor Lesser on the problem of designing organizationally adept agents and on coordinating learning through emergent distributed supervisory control.
Gupta, D.; Chandak, Y.; Jordan, S.; Thomas, P.S.; da Silva, B.C. Behavior Alignment via Reward Function Optimization. (To appear) Proceedings of the 37th Conference on Neural Information Processing Systems (NeurIPS 2023).
[Spotlight — Top 3% among submissions].
Alegre, L.N.; Bazzan, A.L.C.; Nowé, A.; da Silva, B.C. Multi-Step Generalized Policy Improvement by Leveraging Approximate Models. (To appear) Proceedings of the 37th Conference on Neural Information Processing Systems (NeurIPS 2023).
Felten, F.; Alegre, L.N.; Nowé, A.; Bazzan, A.L.C.; Talbi, E.; Danoy, G.; da Silva, B.C. A Toolkit for Reliable Benchmarking and Research in Multi-Objective Reinforcement Learning. (To appear) Proceedings of Neural Information Processing Systems Track on Datasets and Benchmarks (Datasets and Benchmarks@NeurIPS 2023).
Polosky, N.; da Silva, B.C.; Fiterau, M.; Jagannat, J. Constrained Offline Policy Optimization. Proceedings of the 39th International Conference on Machine Learning (ICML 2022).
Giguere, S.; Metevier, B.; da Silva, B.C.; Brun, Y.; Thomas, P.S.; Niekum, S. Fairness Guarantees under Demographic Shift. Proceedings of the 10th International Conference on Learning Representations (ICLR 2022).
Arora, R.; Moss, E.; da Silva, B.C. Model-Based Reinforcement Learning with SINDy. Proceedings of the Workshop on Decision Awareness in Reinforcement Learning, co-located with the 39th International Conference on Machine Learning (DARL@ICML 2022).
Chandak, Y.; Niekum, S.; da Silva, B.C.; Learned-Miller, E.; Brunskill, E.; Thomas, P.S. Universal Off-Policy Evaluation. Proceedings of the 35th Conference on Neural Information Processing Systems (NeurIPS 2021).
[also published at RLDM 2022 - Best Paper Award].
Strand, Ø.; Reilstad, D.; Wu, Z.; da Silva, B.C.; Torresen, J.; Ellefsen, K. Reactive and Deliberative Adaptive Reasoning - Learning When to Think Fast and When to Think Slow. Proceedings of the 11th Joint IEEE International Conference on Development and Learning (ICDL 2022).
Weber, A.; Metevier, B.; Brun, Y.; Thomas, P.S.; da Silva, B.C. Enforcing Delayed-Impact Fairness. Proceedings of the 5th Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM 2022).
Garcia, F.M.; da Silva, B.C.; Thomas, P.S. A Compression-Inspired Framework for Macro Discovery. (Extended Abstract) Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS 2019).
Weber, A.; Martin, C.P.; Torresen, J.; da Silva, B.C. Identifying Reusable Early-Life Options. Proceedings of the 9th Joint IEEE International Conference on Development and Learning (ICDL 2019).
Santucci, V.G.; Cartoni, E.; da Silva, B.C.; Baldassarre, G. Autonomous Reinforcement Learning of Multiple Interrelated Tasks. Proceedings of the 9th Joint IEEE International Conference on Development and Learning (ICDL 2019).
del Verme, M.; da Silva, B.C.; Baldassarre, G. Optimal Options for Multi-Task Reinforcement Learning Under Time Constraints. Proceedings of the 4th Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM 2019).
Santucci, V.G.; Cartoni, E.; da Silva, B.C.; Baldassarre, G. Autonomous Open-Ended Learning of Interdependent Tasks. Proceedings of the 4th Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM 2019).
Ramos, G.O.; da Silva, B.C.; Radulescu, R.; Bazzan, A.L.C. Learning System-Efficient Equilibria in Route Choice Using Tolls. Proceedings of the Adaptive Learning Agents Workshop 2018, co-located with the 35th International Conference on Machine Learning (ALA@ICML 2018).
Oliveira, T.B.F.; Bazzan, A.L.C.; da Silva, B.C.; Grunitzki, R. Comparing Multi-Armed Bandit Algorithms and Q-Learning for Multiagent Action Selection: a Case Study in Route Choice. Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN 2018).
Thomas, P.S.; da Silva, B.C.; Dann, C.; Brunskill, E. Energetic Natural Gradient Descent. Proceedings of the 33rd International Conference on Machine Learning (ICML 2016).
Garant, D.; da Silva, B.C.; Lesser, V.; Zhang, C. Accelerating Multi-Agent Reinforcement Learning with Dynamic Co-Learning. Technical Report UM-CS-2015-004. Department of Computer Science, University of Massachusetts Amherst.
Baldassarre, G.; Mannella, F.; Santucci, V.G.; Sperati, V.; Caligiore, D.; Cartoni, E.; da Silva, B.C.; Mirolli, M. Open-Ended Learning of Skills in Robots: Insights from Looking at the Brain. Proceedings of the 2nd Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM 2015).
da Silva, B.C.; Konidaris, G.; Barto, A.G. Active Learning of Parameterized Skills. Proceedings of the 31st International Conference on Machine Learning (ICML 2014).
Corkill, D.; Zhang, C.; da Silva, B.C.; Kim, Y.; Zhang, X.; Lesser, V. Biasing the Behavior of Organizationally Adept Agents. (Extended Abstract) Proceedings of the 12th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS 2013).
da Silva, B.C.; Konidaris, G.; Barto, A.G. Learning Parameterized Skills. Proceedings of the 29th International Conference on Machine Learning (ICML 2012).
Corkill, D.; Zhang, C.; da Silva, B.C.; Kim, Y.; Zhang, X.; Lesser, V. Using Annotated Guidelines to Influence the Behavior of Organizationally Adept Agents. 14th International Workshop on Coordination, Organisations, Institutions and Norms (COIN@AAMAS 2012).
Brandalero, M.; Meneguzzi, G.; Oliveira, G.; Goncalves, L.; da Silveira, L.; da Silva, B.C.; Carro, L.; Beck, A.C. Efficient Local Memory Support for Approximate Computing. VIII Brazilian Symposium on Computing Systems Engineering (SBESC 2018).
Bazzan, A.L.C.; Oliveira, D.; da Silva, B.C. Learning in Groups of Traffic Lights. Engineering Applications of Artificial Intelligence. 2010.
Bazzan, A.L.C.; da Silva, B.C. Distributed Constraint Propagation for Diagnosis of Faults in Physical Processes. (Extended Abstract) Proceedings of the 6th International Joint Conference On Autonomous Agents And Multiagent Systems (AAMAS 2007).
da Silva, B.C.; Basso, E.W.; Bazzan, A.L.C.; Engel, P.M. Improving Reinforcement Learning with Context Detection. Proceedings of the 5th International Joint Conference On Autonomous Agents And Multiagent Systems (AAMAS 2006).
da Silva, B.C.; Basso, E.W.; Bazzan, A.L.C.; Engel, P.M. RL-CD: Dealing with Non-Stationarity in Reinforcement Learning. (Student Abstract) Proceedings of the 21st Conference on Artificial Intelligence (AAAI 2006).
da Silva, B.C.; Junges, R.; Oliveira, D.; Bazzan, A.L.C. ITSUMO: an Intelligent Transportation System for Urban Mobility. Demonstration Track. Proceedings of the 5th International Joint Conference On Autonomous Agents And Multiagent Systems (AAMAS 2006).
da Silva, B.C.; Oliveira, D.; Basso, E.W.; Bazzan, A.L.C. Adaptive Traffic Control with Reinforcement Learning. Proceedings of the 4th Workshop on Agents in Traffic and Transportation (ATT@AAMAS 2006).
Oliveira, D.; Bazzan, A.L.C.; da Silva, B.C.; Basso, E.W.; Nunes, L.; Rossetti, R.; Oliveira, E.; da Silva, R.; Lamb, L. Reinforcement Learning based Control of Traffic Lights in Non-stationary Environments: A Case Study in a Microscopic Simulator. Proceedings of the 4th European Workshop on Multi-Agent Systems (EUMAS 2006).
da Silva, B.C.; Bazzan, A.L.C.; Oliveira, D.; Lopes, F.; Andriotti, G.K. ITSUMO: an Intelligent Transportation System for Urban Mobility. Lecture Notes in Computer Science. Springer-Verlag, 2004.
da Silva, B.C.; Weber, R.F. TuxGuardian: um firewall de host voltado para o usuário final (TuxGuardian: A Host Firewall Aimed at the End User). Proceedings of the 2nd Brazilian Symposium on Computer Networks.
Almeida, L.; da Silva, B.C.; Bazzan, A.L.C. Towards a physiological model of emotions: first steps. AAAI Spring Symposium (AAAI 2004).