My research interests are in reinforcement learning (RL), decision making, robotics, and AI safety.
The ultimate goal of my work is to design the tools needed for RL algorithms to be widely deployed on challenging real-world tasks in homes and in the workplace, safely and with as little human intervention as possible.
I am interested, in particular, in two key problems:
how to design general-purpose RL algorithms capable of autonomously decomposing complex tasks into simpler sub-problems, for which specialized, reusable, and composable skills can be learned; and
how to ensure that these skills are learned in a way that meets user-specified safety requirements with high probability.
These are fundamental questions that underlie the gap between what artificial intelligence agents can—in principle—do and what we can effectively get them to do given our current algorithms.
More broadly, my research interests lie at the intersection of machine learning, reinforcement learning, optimal control theory, and robotics, and include the construction of hierarchical policies, active learning, open-ended learning, biologically plausible intrinsic motivation mechanisms, Bayesian optimization applied to control, and machine learning algorithms with high-probability safety and fairness guarantees.
I completed my Master's degree in Computer Science in 2007 under the supervision of Prof. Ana Bazzan at the Federal University of Rio Grande do Sul, in Brazil. I completed my B.S. in Computer Science cum laude at the same university in 2004.
On several occasions between 2011 and 2018, I worked as a visiting researcher at the Laboratory of Computational Embodied Neuroscience, at the Istituto di Scienze e Tecnologie della Cognizione, in Rome, developing novel control algorithms for a humanoid robot.
In the Summer of 2014 I worked at Adobe Research, where I developed large-scale optimization techniques for the construction of high-performance features for digital marketing optimization.
From 2011 to 2015 I collaborated with Prof. Victor Lesser on the problem of designing organizationally adept agents and on coordinating learning through emergent distributed supervisory control.
Gupta, D.; Chandak, Y.; Jordan, S.; Thomas, P.S.; da Silva, B.C. Behavior Alignment via Reward Function Optimization. (To appear) Proceedings of the 37th Conference on Neural Information Processing Systems (NeurIPS 2023).
[Spotlight — Top 3% among submissions].
Alegre, L.N.; Bazzan, A.L.C.; Nowé, A.; da Silva, B.C. Multi-Step Generalized Policy Improvement by Leveraging Approximate Models. (To appear) Proceedings of the 37th Conference on Neural Information Processing Systems (NeurIPS 2023).
Felten, F.; Alegre, L.N.; Nowé, A.; Bazzan, A.L.C.; Talbi, E.; Danoy, G.; da Silva, B.C. A Toolkit for Reliable Benchmarking and Research in Multi-Objective Reinforcement Learning. (To appear) Proceedings of Neural Information Processing Systems Track on Datasets and Benchmarks (Datasets and Benchmarks@NeurIPS 2023).
Polosky, N.; da Silva, B.C.; Fiterau, M.; Jagannat, J. Constrained Offline Policy Optimization. Proceedings of the 39th International Conference on Machine Learning (ICML 2022).
Giguere, S.; Metevier, B.; da Silva, B.C.; Brun, Y.; Thomas, P.S.; Niekum, S. Fairness Guarantees under Demographic Shift. Proceedings of the 10th International Conference on Learning Representations (ICLR 2022).
Arora, R.; Moss, E.; da Silva, B.C. Model-Based Reinforcement Learning with SINDy. Proceedings of the Workshop on Decision Awareness in Reinforcement Learning, co-located with the 39th International Conference on Machine Learning (DARL@ICML 2022).
Chandak, Y.; Niekum, S.; da Silva, B.C.; Learned-Miller, E.; Brunskill, E.; Thomas, P.S. Universal Off-Policy Evaluation. Proceedings of the 35th Conference on Neural Information Processing Systems (NeurIPS 2021).
[also published at RLDM 2022 - Best Paper Award].
Strand, Ø.; Reilstad, D.; Wu, Z.; da Silva, B.C.; Torresen, J.; Ellefsen, K. Reactive and Deliberative Adaptive Reasoning - Learning When to Think Fast and When to Think Slow. Proceedings of the 11th Joint IEEE International Conference on Development and Learning (ICDL 2022).
Weber, A.; Metevier, B.; Brun, Y.; Thomas, P.S.; da Silva, B.C. Enforcing Delayed-Impact Fairness. Proceedings of the 5th Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM 2022).
Garcia, F.M.; da Silva, B.C.; Thomas, P.S. A Compression-Inspired Framework for Macro Discovery. (Extended Abstract) Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS 2019).
Weber, A.; Martin, C.P.; Torresen, J.; da Silva, B.C. Identifying Reusable Early-Life Options. Proceedings of the 9th Joint IEEE International Conference on Development and Learning (ICDL 2019).
Santucci, V.G.; Cartoni, E.; da Silva, B.C.; Baldassarre, G. Autonomous Reinforcement Learning of Multiple Interrelated Tasks. Proceedings of the 9th Joint IEEE International Conference on Development and Learning (ICDL 2019).
del Verme, M.; da Silva, B.C.; Baldassarre, G. Optimal Options for Multi-Task Reinforcement Learning Under Time Constraints. Proceedings of the 4th Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM 2019).
Santucci, V.G.; Cartoni, E.; da Silva, B.C.; Baldassarre, G. Autonomous Open-Ended Learning of Interdependent Tasks. Proceedings of the 4th Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM 2019).
Ramos, G.O.; da Silva, B.C.; Radulescu, R.; Bazzan, A.L.C. Learning System-Efficient Equilibria in Route Choice Using Tolls. Proceedings of the Adaptive Learning Agents Workshop 2018, co-located with the 35th International Conference on Machine Learning (ALA@ICML 2018).
Oliveira, T.B.F.; Bazzan, A.L.C.; da Silva, B.C.; Grunitzki, R. Comparing Multi-Armed Bandit Algorithms and Q-Learning for Multiagent Action Selection: a Case Study in Route Choice. Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN 2018).
Thomas, P.S.; da Silva, B.C.; Dann, C.; Brunskill, E. Energetic Natural Gradient Descent. Proceedings of the 33rd International Conference on Machine Learning (ICML 2016).
Garant, D.; da Silva, B.C.; Lesser, V.; Zhang, C. Accelerating Multi-Agent Reinforcement Learning with Dynamic Co-Learning. Technical Report UM-CS-2015-004. Department of Computer Science, University of Massachusetts Amherst.
Baldassarre, G.; Mannella, F.; Santucci, V.G.; Sperati, V.; Caligiore, D.; Cartoni, E.; da Silva, B.C.; Mirolli, M. Open-Ended Learning of Skills in Robots: Insights from Looking at the Brain. Proceedings of the 2nd Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM 2015).
da Silva, B.C.; Konidaris, G.; Barto, A.G. Active Learning of Parameterized Skills. Proceedings of the 31st International Conference on Machine Learning (ICML 2014).
Corkill, D.; Zhang, C.; da Silva, B.C.; Kim, Y.; Zhang, X.; Lesser, V. Biasing the Behavior of Organizationally Adept Agents. (Extended Abstract) Proceedings of the 12th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS 2013).
da Silva, B.C.; Konidaris, G.; Barto, A.G. Learning Parameterized Skills. Proceedings of the 29th International Conference on Machine Learning (ICML 2012).
Corkill, D.; Zhang, C.; da Silva, B.C.; Kim, Y.; Zhang, X.; Lesser, V. Using Annotated Guidelines to Influence the Behavior of Organizationally Adept Agents. 14th International Workshop on Coordination, Organisations, Institutions and Norms (COIN@AAMAS 2012).
Brandalero, M.; Meneguzzi, G.; Oliveira, G.; Goncalves, L.; da Silveira, L.; da Silva, B.C.; Carro, L.; Beck, A.C. Efficient Local Memory Support for Approximate Computing. VIII Brazilian Symposium on Computing Systems Engineering (SBESC 2018).
Bazzan, A.L.C.; Oliveira, D.; da Silva, B.C. Learning in Groups of Traffic Lights. Engineering Applications of Artificial Intelligence. 2010.
Bazzan, A.L.C.; da Silva, B.C. Distributed Constraint Propagation for Diagnosis of Faults in Physical Processes. (Extended Abstract) Proceedings of the 6th International Joint Conference On Autonomous Agents And Multiagent Systems (AAMAS 2007).
da Silva, B.C.; Basso, E.W.; Bazzan, A.L.C.; Engel, P.M. Improving Reinforcement Learning with Context Detection. Proceedings of the 5th International Joint Conference On Autonomous Agents And Multiagent Systems (AAMAS 2006).
da Silva, B.C.; Basso, E.W.; Bazzan, A.L.C.; Engel, P.M. RL-CD: Dealing with Non-Stationarity in Reinforcement Learning. (Student Abstract) Proceedings of the 21st Conference on Artificial Intelligence (AAAI 2006).
da Silva, B.C.; Junges, R.; Oliveira, D.; Bazzan, A.L.C. ITSUMO: an Intelligent Transportation System for Urban Mobility. Demonstration Track. Proceedings of the 5th International Joint Conference On Autonomous Agents And Multiagent Systems (AAMAS 2006).
da Silva, B.C.; Oliveira, D.; Basso, E.W.; Bazzan, A.L.C. Adaptive Traffic Control with Reinforcement Learning. Proceedings of the 4th Workshop on Agents in Traffic and Transportation (ATT@AAMAS 2006).
Oliveira, D.; Bazzan, A.L.C.; da Silva, B.C.; Basso, E.W.; Nunes, L.; Rossetti, R.; Oliveira, E.; da Silva, R.; Lamb, L. Reinforcement Learning based Control of Traffic Lights in Non-stationary Environments: A Case Study in a Microscopic Simulator. Proceedings of the 4th European Workshop on Multi-Agent Systems (EUMAS 2006).
da Silva, B.C.; Bazzan, A.L.C.; Oliveira, D.; Lopes, F.; Andriotti, G.K. ITSUMO: an Intelligent Transportation System for Urban Mobility. Lecture Notes in Computer Science. Springer-Verlag, 2004.
da Silva, B.C.; Weber, R.F. TuxGuardian: um firewall de host voltado para o usuário final (TuxGuardian: A Host Firewall Aimed at the End User). Proceedings of the 2nd Brazilian Symposium on Computer Networks.
Almeida, L.; da Silva, B.C.; Bazzan, A.L.C. Towards a physiological model of emotions: first steps. AAAI Spring Symposium (AAAI 2004).