Kaleigh Clary
PhD Candidate, Causality & ML
Knowledge Discovery Laboratory
Manning College of Information and Computer Sciences
University of Massachusetts Amherst


[ Projects ] [ CV ]

I am on the job market for Fall 2023!


  • Evaluation of Learned Representations in Open-World Environments for Sequential Decision-Making
    Kaleigh Clary

    My dissertation research focuses on sequential decision-making (SDM) in complex environments, and how agents can perform well even when novelty is introduced to those environments. In the open-world setting, a wide range of changes may be introduced in the environment without notifying the agent. The problem of how agents can respond intelligently to novelty has been a long-standing challenge in AI, and poses unique problems across approaches to SDM. I study the conditions under which causal-relational queries can be estimated from non-novel observations, and empirically examine the effects of open-world novelty on agent behavior.

    I was selected as a 2023 AAAI Doctoral Consortium Fellow and presented this work as part of the 2023 AAAI DC.

    [ paper ] [ bibtex ]



  • Correcting for Non-Cooperative Behavior of Subjects in Experiments on Social Networks
    Kaleigh Clary, Emma Tosch, Jeremiah Onaolapo, David Jensen

    A large body of research in network and social sciences studies the effects of interventions in network systems and assumes that network participants will respond to interventions in similar ways. In real-world systems, however, a subset of participants may purposefully respond in ways that differ from their true outcomes. We characterize the influence of these non-cooperative nodes and the bias they introduce in estimates of average treatment effect (ATE). We provide theoretical bounds on estimation bias introduced through non-cooperative behavior and conduct empirical demonstrations through experiments on synthetically generated graphs and a real-world network.

    This work appeared at USENIX Security 2022.

    [ paper ] [ bibtex ] [ code ]
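
    The bias described above is easy to see in a toy simulation. The sketch below is purely illustrative (not the paper's estimator or bounds): a fraction of subjects invert their reported outcome, and the naive difference-in-means ATE estimate shrinks accordingly. All numbers (`TRUE_ATE`, `NONCOOP_RATE`) are made up for the example.

    ```python
    import random

    random.seed(0)

    N = 10_000
    TRUE_ATE = 2.0          # assumed true treatment effect (illustrative)
    NONCOOP_RATE = 0.2      # assumed fraction of non-cooperative subjects

    def run_experiment(noncoop_rate):
        """Simulate a randomized experiment and return the naive
        difference-in-means estimate of the average treatment effect."""
        treated_outcomes, control_outcomes = [], []
        for _ in range(N):
            treated = random.random() < 0.5
            baseline = random.gauss(0.0, 1.0)
            outcome = baseline + (TRUE_ATE if treated else 0.0)
            # Non-cooperative subjects purposefully invert their response.
            if random.random() < noncoop_rate:
                outcome = -outcome
            (treated_outcomes if treated else control_outcomes).append(outcome)
        return (sum(treated_outcomes) / len(treated_outcomes)
                - sum(control_outcomes) / len(control_outcomes))

    honest = run_experiment(0.0)           # recovers roughly TRUE_ATE
    biased = run_experiment(NONCOOP_RATE)  # attenuated by inverted responses
    print(f"ATE estimate, all cooperative:     {honest:.2f}")
    print(f"ATE estimate, 20% non-cooperative: {biased:.2f}")
    ```

    Even this simple independent-subjects setting shows attenuation; the paper's setting is harder because peer effects propagate non-cooperative behavior through the network.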



  • Detection, Resilience, and Adaptation under Open-World Novelty in Sequential Decision-Making
    UMass CICS Knowledge Discovery Laboratory, DARPA SAIL-ON
    Kaleigh Clary, Andy Zane, Justin Clarke, Sam Witty, David Westbrook, Przemyslaw Grabowicz, David Jensen

    In open-world learning, environments for sequential decision-making (SDM) may diverge from training contexts under a diverse set of potential changes (open-world novelty). We developed agents and machine learning models for temporal anomaly detection and few-shot transfer in four domains for SDM under open-world novelty.

    Our simulation-based SAIL-ON agent system CIMARRON placed 1st overall and 1st in the Novelty Track at the 2021 Angry Birds AI Competition, run by the Australian National University as part of the AIBIRDS Competition Session at IJCAI 2021. Our team later achieved a 95% true positive detection rate and a 6% false positive rate in external program evaluation of anomaly detection under unrevealed forms of novelty as part of the DARPA SAIL-ON program.



  • Causal Reasoning for Explainability of Deep Networks: Analysis of Saliency Maps in Deep Reinforcement Learning
    Akanksha Atrey, Kaleigh Clary, David Jensen

    Saliency maps have been used to support explanations of deep reinforcement learning (RL) agent behavior over temporally extended sequences. However, their use in the community indicates that the explanations derived from saliency maps are often unfalsifiable and can be highly subjective. We introduce an empirical approach grounded in counterfactual reasoning to test the hypotheses generated from saliency maps and assess the degree to which saliency maps represent semantics of RL environments.

    This work appeared at ICLR 2020. An earlier version was accepted as a poster at the 2019 WiML Workshop co-located with NeurIPS 2019.

    [ paper ] [ bibtex ] [ code ] [ reviews ]



  • Post-Training Variability of Deep Reinforcement Learning Models
    Kaleigh Clary, Emma Tosch, John Foley, David Jensen

    Reproducibility in deep reinforcement learning has proven challenging due to the large number of factors influencing agent performance. We find that post-training performance distributions can be fat-tailed and multi-modal, characteristics that make common summary statistics unsound metrics for reporting agent performance.

    This work was accepted at the 2018 NeurIPS Critiquing and Correcting Trends workshop and featured in a spotlight talk.

    [ paper ] [ bibtex ] [ code ] [ slides ]
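
    A minimal sketch of why a single summary statistic misleads here, using made-up numbers rather than any real agent's scores: when post-training rewards are bimodal (say, an agent that either solves a level or fails early), the mean lands between the modes at a value the agent almost never achieves.

    ```python
    import random
    import statistics

    random.seed(0)

    # Illustrative bimodal "post-training score" distribution:
    # a failure mode near 10 and a success mode near 90.
    low  = [random.gauss(10.0, 2.0) for _ in range(500)]   # failure mode
    high = [random.gauss(90.0, 5.0) for _ in range(500)]   # success mode
    scores = low + high

    mean = statistics.mean(scores)
    # The mean sits near 50, between the two modes, even though
    # essentially no individual run scores anywhere near 50.
    print(f"mean score: {mean:.1f}")
    ```

    Reporting the full distribution (or at least both modes and their frequencies) conveys what a reader actually needs to know about the agent.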



  • Toybox: Atari Reimplementations for Interventional Experimentation in Deep Reinforcement Learning
    Emma Tosch, John Foley, Kaleigh Clary, David Jensen

    Atari games have been used as the de facto benchmark suite for deep reinforcement learning, as enabled by the Arcade Learning Environment. Unfortunately, these gaming environments are black boxes that do not permit systematic intervention on program state. We developed Toybox, a suite of Atari games and an associated testing framework for validating behavioral requirements of agents trained on Atari games. Toybox increased testing efficiency by as much as 4.4x and enabled new evaluation designs via a reconfigurable software mock for a set of common deep reinforcement learning benchmark environments.

    Toybox was presented at the 2018 IBM AI Systems Day and as a poster at the 2018 NeurIPS Systems for ML Workshop.

    [ paper ] [ bibtex ] [ code ] [ preprint ]



  • Data Science for Social Good: Predicting Risk of Type II Diabetes
    DSSG Fellows: Benjamin Ackerman, Kaleigh Clary, Jorge Saldivar, William Wang
    Technical Leads: Adolfo De Unánue, Elena Eneva, and Rayid Ghani

    Data Science for Social Good is a summer fellowship program hosted at the University of Chicago (now at CMU). It brings together graduate students and young professionals with a diverse set of skills and backgrounds to work closely with governments, nonprofits, and relevant stakeholders on solutions to policy and social problems across health, criminal justice, education, public safety, social services, and economic development.

    Our team developed a model to identify patients at risk of developing type II diabetes. Existing diabetes screening guidelines miss opportunities for prevention, diagnosis, and treatment among minority populations. We partnered with AllianceChicago to identify patients at risk of developing type II diabetes so that its network of community health centers can provide better medical treatment. AllianceChicago plans to integrate the risk model into its electronic health records system (EHR) to help clinicians personalize their recommendations to patients and reduce their risk of developing diabetes.

    [ DataFest 2018 slides ] [ code (awaiting public release) ]



  • A/B Testing in Networks with Adversarial Nodes
    Kaleigh Clary, Andrew McGregor, David Jensen

    Causal estimation over large-scale relational systems requires careful experimental design, as the treatment compliance and response of an individual may be influenced by the outcomes of other individuals. In some cases, members of the relational network may target their output to influence their neighbors. These adversarial nodes can bias effect estimates by leveraging peer effects to alter the outcomes of their neighbors, yet they may not be known or detectable. Our work demonstrates that estimates of average treatment effect in networks can be biased by the actions of adversaries, and we identify network structures that are particularly vulnerable to adversarial responses.

    This work was accepted for a short talk at the 2017 KDD Mining and Learning with Graphs Workshop, and was awarded a CICS Outstanding Synthesis Project award in Spring 2018.

    [ paper ] [ bibtex ] [ code ] [ video ] [ slides ]