Yuriy Brun


Manning College of Information & Computer Sciences

140 Governors Drive

University of Massachusetts

Amherst, MA 01003-9264 USA

Office: 302
Phone: +1-413-577-0233
Fax: +1-413-545-1249

curriculum vitae

Research interests

My research aims to improve how we build software systems, focusing on proving system correctness, mitigating bias, and enabling self-adaptation and self-repair. I co-direct the LASER and PLASMA laboratories.


My lab's research focuses on high-risk, high-impact problems, with the aim of fundamentally improving how engineers build systems. Sometimes we fail. Sometimes we succeed and make a difference in industry, academia, or both.
Privacy-preserving distributed computing. On the cloud, our private data live on computers we don't own, under security policies we don't control. To tackle this problem, we developed bio-inspired, self-organizing distributed systems that preserve privacy, even on untrusted machines. This work won a test-of-time award, and I received an IEEE TCSC Young Achiever in Scalable Computing Award for my contributions to privacy, reliability, and self-adaptation in distributed computing.
Fairness and safety in machine learning. Studies have shown that modern software systems can harm humans and exhibit sexist and racist behavior. To build safer and fairer systems, the entire system-building process needs to account for potential harms. We pioneered the treatment of bias as a software engineering concern, founding the field of software fairness, authoring the seminal paper on fairness testing, and developing the first machine learning techniques that probabilistically guarantee fairness and safety even when trained on biased data. For this work, I was awarded the IEEE Computer Society TCSE New Directions Award.
Automatically repairing programs. Software developers spend more than half their time debugging. Automated repair technology has the potential to save time and significantly reduce costs. Unfortunately, my lab has shown that this technology can invisibly break functionality as often as repair it, overfitting to the written specification. We established overfitting as a fundamental challenge in automated program repair and created the first objective methodology for measuring overfitting. Our methodology has become ubiquitous, leading to a rich body of research that focuses on reducing overfitting, improving the field.
Speculative analysis. Developers are often overwhelmed by decisions. We invented speculative analysis to concretely predict the outcomes of developer decisions, reducing the developers' load. Speculative analysis has been used internally by Microsoft and Infosys, and found to be "the most industrially relevant software engineering research published in the prior five years, out of a total of 571 research papers" by an independent study.

Our work has received a Microsoft Research Software Engineering Innovation Foundation Award, a Google Inclusion Research Award, an Amazon Research Award, a Google Faculty Research Award, and an NSF CAREER Award, as well as five ACM SIGSOFT and SIGPLAN Distinguished Paper Awards and a Best Paper Award, and has been funded by the NSF, DARPA, IARPA, Amazon, Dolby, Google, Kosa.ai, Microsoft, Meta, and Oracle.

Current projects

Automating formal verification

Formal verification—proving the correctness of software—is one of the most effective ways of improving software quality. But, despite significant help from interactive theorem provers and automation tools such as hammers, writing the required proofs is an incredibly difficult, time-consuming, and error-prone manual process. My work has pioneered ways to use natural language processing to synthesize formal verification proofs. Automating proof synthesis can drastically reduce the costs of formal verification and improve software quality. The unique nature of formal verification allows us to combine multiple models to fully automatically prove 33.8% of formal verification theorems in Coq (ICSE'22 Distinguished Paper Award), and to use large language models to synthesize guaranteed-correct proofs for 65.7% of mathematical theorems in Isabelle (ESEC/FSE'23 Distinguished Paper Award).
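Combining models works here because synthesized proofs are machine-checkable: a candidate proof from any model can be validated before it is accepted, so an ensemble can only prove more theorems, never fewer. A minimal sketch of that idea (the names `prove_with_ensemble`, `check_proof`, and the toy models are illustrative, not our actual tooling):

```python
# Illustrative ensemble for proof synthesis: try each model in turn and
# accept a theorem as proved only if the checker validates the candidate.
# Soundness comes entirely from the proof checker.
def prove_with_ensemble(theorems, models, check_proof):
    proved = {}
    for thm in theorems:
        for model in models:
            candidate = model(thm)
            if candidate is not None and check_proof(thm, candidate):
                proved[thm] = candidate
                break  # first validated proof wins
    return proved

# Toy stand-ins: one "model" only proves even-numbered theorems, another
# only odd-numbered ones; together they prove all ten.
even_model = lambda t: f"proof-{t}" if t % 2 == 0 else None
odd_model = lambda t: f"proof-{t}" if t % 2 == 1 else None
checker = lambda t, p: p == f"proof-{t}"
print(len(prove_with_ensemble(range(10), [even_model, odd_model], checker)))  # 10
```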

Provably fair and safe software

Today, more and more software uses artificial intelligence. Unfortunately, recent investigations have shown that such systems can discriminate and be unsafe for humans. My work identified that numerous challenges in this space are fundamentally software engineering challenges, founded the field of software fairness (ESEC/FSE'17 Distinguished Paper Award), established a roadmap for this field, and developed the first machine learning algorithms that provide provable guarantees that learned models adhere to safety and fairness requirements. Next, we showed that such guarantees can hold for contextual bandits and when the training data come from a different distribution than the data the system encounters at runtime. Importantly, recent work has shown that, sometimes, optimizing short-term fairness constraints can lead to adverse outcomes in the long term. Our most recent work showed that machine learning can guarantee constraints on both short-term and long-term (delayed) impact by formulating the learning goal simultaneously as a fairness-constrained classification problem and a reinforcement learning problem that optimizes the long-term impact. The central goal of our work is to make safe and fair machine learning accessible to software engineers, who have neither extensive machine learning expertise nor expertise in the application domains that require fairness and safety, such as healthcare or law (video).
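The flavor of guarantee these algorithms provide can be illustrated with a safety test: a trained model is returned only if a high-confidence upper bound on its constraint violation, computed on held-out data, is at or below zero. The sketch below uses a Hoeffding bound and assumes constraint samples bounded in [-1, 1]; it is a simplified illustration of the general recipe, not our published algorithms:

```python
import math

def hoeffding_upper_bound(samples, delta, value_range=2.0):
    """High-confidence (1 - delta) upper bound on the true mean of
    i.i.d. samples bounded within a range of width value_range."""
    n = len(samples)
    return sum(samples) / n + value_range * math.sqrt(math.log(1 / delta) / (2 * n))

def safety_test(g_samples, delta=0.05):
    """Pass only if, with confidence 1 - delta, E[g] <= 0, where g <= 0
    means the fairness (or safety) constraint is satisfied."""
    return hoeffding_upper_bound(g_samples, delta) <= 0.0

# Toy held-out constraint samples: clearly safe behavior passes;
# borderline behavior cannot be certified and is rejected.
print(safety_test([-0.5] * 100))  # True
print(safety_test([0.1] * 100))   # False
```

Rejecting models that cannot be certified is what lets the approach bound the probability of deploying an unfair or unsafe model, even though it sometimes returns no solution.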

Trust in machine learning

Increasingly, software systems use machine learning. Whether users understand how this technology functions can fundamentally affect their trust in these systems. We study how aspects of machine learning, such as precision and bias, affect people's perception and trust, and how explainability techniques can help users, engineers, and policymakers make better, evidence-based decisions about machine learning. Our findings include that bias affects men's and women's trust-based decisions differently, and that visualization design choices have a significant impact on how bias is perceived, and on the resulting trust.

High-quality automated program repair

Software bugs are incredibly costly, but so is software debugging, as more new bugs are reported every day than developers can handle. Automated program repair has the potential to significantly reduce the time and cost of debugging. But our work has shown that such techniques often break as much functionality as they repair. Our work has focused on improving the quality of automated program repair. We developed an objective methodology for measuring the overfitting that leads to low-quality repairs; this methodology has become ubiquitous in the field and has led to significant improvements in repair quality. Our controlled user study found that automated repair positively impacts the human-driven debugging process, which is how it is used in industry today.

Past projects

Proactive detection of collaboration conflicts

One of my most impactful projects, in terms of influence on industry and others' research, has been work on collaborative development. Using our speculative analysis technology, we built Crystal, a tool to proactively detect conflicts between collaborating developers. Crystal automatically detects textual and behavioral conflicts that arise when multiple developers work in parallel, and unobtrusively informs the developers. This helps developers resolve, and sometimes altogether avoid, conflicts. The work won an ACM SIGSOFT Distinguished Paper Award at ESEC/FSE 2011, and an IEEE TSE Spotlight Paper recognition for the TSE extension (the two papers combined have been cited more than 300 times). This work directly led to new collaborative development tools and practices within Microsoft, and was awarded a Microsoft Research Software Engineering Innovation Foundation Award. An independent study found this work "the most industrially relevant software engineering research published in the prior five years, out of a total of 571 research papers." This work influenced our 2017 development of FLAME, which proactively detects architects' conflicts during collaborative design. Awarded the Best Paper Award at IEEE ICSA, FLAME has led to industrial tools used internally by Infosys every day. Watch a video about Crystal.

Privacy preservation in distributed computation

My email, my taxes, my research, my medical records are all on the cloud. How do I distribute computation onto untrusted machines while making sure those machines cannot read my private data? sTile makes this possible by breaking the computation into tiny pieces and distributing those pieces on a network in a way that makes it nearly impossible for an adversary controlling less than half of the network to reconstruct the data. This work was awarded the SEAMS 2020 Most Influential Paper Award. Watch a video about sTile's impact, or read: Keeping Data Private while Computing in the Cloud.

Automatically inferring precise models of system behavior

Debugging and improving systems requires understanding their behavior. Using source code and execution logs to understand behavior is a daunting task. We develop tools to aid understanding and development tasks, many of which involve automatically inferring a human-readable FSM-based behavioral model. This project is or has been funded by the NSF, Google, and Microsoft Research. We have shown that the inferred models can be used to detect software vulnerabilities (read) and are currently building model-based tools to improve test suite quality at Google.

Perfume: Performance-aware model inference. Perfume observes system executions and infers a compact model of the system that accounts for performance phenomena. The model also displays performance information. These models capture behavior that other inference techniques miss, such as cache use, pipelining, lazy evaluation, and loop perforation. A front end that makes the models interactive and queryable helps developers understand and debug systems. Controlled experiments show that these models help developers answer questions about software system performance more effectively and quickly. Watch a Perfume video demo, read and read, or try it out.

CSight: Inferring precise models of concurrent systems. Concurrent, networked, and other distributed systems are especially difficult to debug and understand. CSight automatically infers communicating FSM-based models that accurately describe such system behavior and host interaction from executions. CSight models improve developer understanding and ability to answer questions about distributed systems. Read.

InvariMint: Specifying model inference algorithms declaratively. Model inference algorithms can be hard to understand and compare. To help, InvariMint specifies such algorithms in a single, common, declarative language. InvariMint makes it easy to understand, compare, and extend such algorithms, and to implement them efficiently. Read or read.

Synoptic: Summarizing system logs with refinement. Minimal model inference is NP-complete, and inference from incomplete observations is underspecified, so at the heart of all (efficient) model inference lie heuristics and generalization assumptions. Synoptic capitalizes on the observation that a small number of temporal (LTL) property types specify the majority of system requirements, mines likely LTL properties of those types from observed executions, and uses them to infer a compact, precise FSM-based model of system behavior. Looking at these models has led developers to identify bugs and anomalies quickly, empirically verify the correctness of their systems, and understand behavior. Read.
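The property-mining step can be sketched concretely for one of the temporal property types, "a is always followed by b": an invariant holds if it is never violated in any observed trace. This is an illustrative simplification of Synoptic's mining phase (the function name and traces are hypothetical; the real tool mines three property types and uses them to drive counterexample-guided FSM refinement):

```python
# Mine likely "a AlwaysFollowedBy b" properties from observed traces.
from itertools import product

def mine_always_followed_by(traces):
    """Return pairs (a, b) such that, in every trace, each occurrence
    of event a is eventually followed by an occurrence of event b."""
    events = {e for trace in traces for e in trace}
    invariants = set()
    for a, b in product(events, repeat=2):
        if a == b:
            continue
        holds = all(
            b in trace[i + 1:]            # some later b exists...
            for trace in traces
            for i, e in enumerate(trace)
            if e == a                      # ...after every a
        )
        if holds:
            invariants.add((a, b))
    return invariants

traces = [
    ["login", "query", "logout"],
    ["login", "logout"],
]
print(("login", "logout") in mine_always_followed_by(traces))  # True
```

Like all likely-invariant mining, this generalizes from incomplete observations: a property that holds in the log may still be violated by unobserved executions.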

Reliability through smart redundancy

One of the most common ways of achieving system reliability is through redundancy. But how can we ensure we are using the resources in a smart way? Can we guarantee that we are squeezing the most reliability possible out of our available resources? A new technique called smart redundancy says we can! Read: Smart redundancy for distributed computation.
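One way to read the "smart" in smart redundancy: rather than always running a fixed k copies of a task and taking a majority vote, request results incrementally and stop as soon as one answer leads by a large enough margin for the desired confidence. The sketch below is my illustrative rendering of that incremental idea, not the paper's algorithm verbatim (the parameters `required_lead` and `max_copies` are hypothetical):

```python
# Incremental redundancy: run task copies one at a time and stop early
# once one answer leads all others by the required number of votes.
from collections import Counter

def incremental_redundancy(run_copy, required_lead=2, max_copies=15):
    """Return the consensus result, spending only as many copies as the
    observed agreement requires."""
    votes = Counter()
    for _ in range(max_copies):
        votes[run_copy()] += 1
        ranked = votes.most_common(2)
        lead = ranked[0][1] - (ranked[1][1] if len(ranked) > 1 else 0)
        if lead >= required_lead:
            return ranked[0][0]
    return votes.most_common(1)[0][0]  # budget exhausted: best effort

# A stand-in worker that always answers correctly: consensus is reached
# after just two copies instead of a fixed k.
print(incremental_redundancy(lambda: 42))  # 42
```

When workers agree, few copies are spent; when they disagree, more copies are automatically requested, which is how the same resource budget buys more reliability.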

Self-adaptive systems

Self-adaptive systems are capable of handling runtime changes in requirements and the environment. When something goes wrong, self-adaptive systems can recover on the fly. Building such systems poses many challenges, as developers often have to design the system without knowing all the requirement changes and potential attacks that could take place. Read about the challenges of and a research roadmap for self-adaptive systems here and here. Most of our work has some aspect of self-adaptation, from biologically inspired security work to self-adaptive reliability to automated program repair.

Bug cause analysis

When regression tests break, can the information about the latest changes help find the cause of the failure? The cause might be a bug in recently added code, or it could instead be in old code that wasn't exercised in a fault-revealing way until now. Considering the minimal test-failing sets of recent changes, and the maximal test-passing sets of those changes, can reveal surprises about the failure's cause. Read: Understanding Regression Failures through Test-Passing and Test-Failing Code Changes.
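The minimal test-failing sets can be computed by brute force for small change sets: enumerate subsets of recent changes from smallest to largest, run the failing test against each, and keep the failing subsets none of whose proper subsets fail. A hedged sketch (the change IDs and the `test_fails` oracle are illustrative stand-ins for actually applying changes and running the test):

```python
# Enumerate subsets of recent changes to find the minimal sets whose
# application makes the regression test fail.
from itertools import combinations

def minimal_failing_sets(changes, test_fails):
    """Return the smallest subsets of changes that trigger the failure."""
    minimal = []
    for size in range(1, len(changes) + 1):
        for subset in combinations(changes, size):
            s = set(subset)
            if any(m <= s for m in minimal):
                continue  # a smaller failing set already explains this one
            if test_fails(s):
                minimal.append(s)
    return minimal

# Toy oracle: the test fails only when changes c2 and c3 are both applied,
# so neither change alone is the culprit -- their interaction is.
print(minimal_failing_sets(["c1", "c2", "c3"], lambda s: {"c2", "c3"} <= s))
```

A minimal failing set with more than one change, as above, is exactly the kind of surprise the analysis surfaces: no single recent change is to blame.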

DNA Self-Assembly

How do simple objects self-assemble to form complex structures? Answering that question can lead to understanding interactions of objects as simple as atoms and as complex as software systems. By studying mathematical models of molecular interactions and gaining control over nanoscale DNA structures, we begin to understand this space. Read: (theory) Solving NP-Complete Problems in the Tile Assembly Model or (experimental) Self-Assembly of DNA Double-Double Crossover Complexes into High-Density, Doubly Connected, Planar Structures.

The fault-invariant classifier

Can the wealth of knowledge and examples of bugs make it easier to discover unknown, latent bugs in new software? For example, if I have access to bugs and bug fixes in Windows 7, can I find bugs in Windows 8 before it ships? Machine learning makes this possible for certain classes of often-repeated bugs. Read: Finding Latent Code Errors via Machine Learning Over Program Executions.
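The core idea can be sketched as learning over execution features: represent each execution by the likely invariants it violates, learn which violation patterns correlate with known-buggy runs, and flag new executions whose patterns look buggy. Everything below (the feature names, the count-based scorer) is an illustrative toy, far simpler than the paper's classifier:

```python
# Toy fault classifier over execution features: count how often each
# invariant violation appears in buggy vs. correct labeled executions,
# then score a new execution by how "buggy-looking" its violations are.
from collections import Counter

def train(executions):
    """executions: list of (violated_invariants: set, is_buggy: bool)."""
    buggy, clean = Counter(), Counter()
    for violations, is_buggy in executions:
        (buggy if is_buggy else clean).update(violations)
    return buggy, clean

def score(model, violations):
    buggy, clean = model
    # Positive score: these violations were seen more often in buggy runs.
    return sum(buggy[v] - clean[v] for v in violations)

training = [
    ({"x>0", "list_sorted"}, True),
    ({"x>0"}, True),
    ({"list_sorted"}, False),
    (set(), False),
]
model = train(training)
print(score(model, {"x>0"}) > 0)  # True: "x>0" violations suggest a latent bug
```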