Yuriy Brun

    Assistant Professor

College of Information and Computer Sciences

140 Governors Drive

University of Massachusetts

Amherst, MA 01003-9264 USA

Office: 346
Phone: +1-413-577-0233
Fax: +1-413-545-1249

curriculum vitae

Research interests

My research is in software engineering. I am interested in improving our ability to build systems that are smart and that self-adapt to their environment.
I co-direct the LASER and PLASMA laboratories.

Current projects

Automatically inferring precise models of system behavior

Debugging and improving systems requires understanding their behavior. Using source code and execution logs to understand behavior is a daunting task. We develop tools that aid understanding and development tasks, many of them built around automatically inferring human-readable, FSM-based behavioral models. This project is or has been funded by the NSF, Google, and Microsoft Research. We have shown that the inferred models can be used to detect software vulnerabilities (read), and we are currently building model-based tools to improve test suite quality at Google.

Perfume: Performance-aware model inference. Perfume observes system executions and infers a compact model of the system that accounts for performance phenomena. The model also displays performance information. These models capture behavior that other inference techniques miss, such as cache use, pipelining, lazy evaluation, and loop perforation. A front end that makes the models interactive and queryable helps developers understand and debug systems. Controlled experiments show that these models help developers answer questions about software system performance more quickly and effectively. Watch a Perfume video demo, read and read, or try it out.
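
To give a flavor of what such a model looks like, here is a toy sketch in Python (not Perfume's actual algorithm): it builds a transition graph from timestamped logs and annotates each transition with the range of observed latencies. The log format and event names are invented for illustration.

    # Toy sketch (not Perfume's algorithm): build a transition graph from
    # timestamped executions and record the observed latency range per edge.
    # Assumes each execution is a list of (event, timestamp_ms) pairs.
    from collections import defaultdict

    def infer_performance_model(executions):
        timings = defaultdict(list)   # (event, next_event) -> elapsed times
        for trace in executions:
            for (e1, t1), (e2, t2) in zip(trace, trace[1:]):
                timings[(e1, e2)].append(t2 - t1)
        # Summarize each transition with its min/max observed latency.
        return {edge: (min(ts), max(ts)) for edge, ts in timings.items()}

    executions = [
        [("request", 0), ("cache-hit", 2), ("respond", 3)],
        [("request", 0), ("cache-miss", 5), ("db-query", 9), ("respond", 48)],
    ]
    for (src, dst), (lo, hi) in infer_performance_model(executions).items():
        print(f"{src} -> {dst}: {lo}-{hi} ms")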

CSight: Inferring precise models of concurrent systems. Concurrent, networked, and other distributed systems are especially difficult to debug and understand. CSight automatically infers, from executions, communicating FSM-based models that accurately describe such systems' behavior and host interactions. CSight models improve developers' understanding of, and ability to answer questions about, distributed systems. Read.

InvariMint: Specifying model inference algorithms declaratively. Model inference algorithms can be hard to understand and compare. To help, InvariMint specifies such algorithms using a single, common, declarative language. InvariMint makes it easy to understand, compare, and extend such algorithms, and to implement them efficiently. Read or read.

Synoptic: Summarizing system logs with refinement. Minimal model inference is NP-complete, and inference from incomplete observations is underspecified, so at the heart of all (efficient) model inference lie heuristics and generalization assumptions. Synoptic capitalizes on the observation that a small number of temporal (LTL) property types capture the majority of system requirements, mines likely properties of those types from observed executions, and uses them to infer a compact, precise FSM-based model of system behavior. Looking at these models has led developers to identify bugs and anomalies quickly, empirically verify the correctness of their systems, and understand behavior. Read.
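
As a rough illustration of the property-mining step alone (a simplification, not Synoptic itself), the Python sketch below finds which "x is always followed by y" relationships hold across all observed executions; the trace format is assumed.

    # Toy property miner: report every pair (x, y) such that, in every trace,
    # each occurrence of x is eventually followed by an occurrence of y.
    from itertools import product

    def mine_always_followed_by(traces):
        events = {e for trace in traces for e in trace}
        invariants = set()
        for x, y in product(events, events):
            if x == y:
                continue
            holds = all(
                any(later == y for later in trace[i + 1:])
                for trace in traces
                for i, e in enumerate(trace)
                if e == x
            )
            if holds:
                invariants.add((x, y))
        return invariants

    traces = [["login", "query", "logout"], ["login", "logout"]]
    print(mine_always_followed_by(traces))   # {('login', 'logout'), ('query', 'logout')}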

All tools are open-source. Collaborators include or have included Tony Ohmann, Ryan Stanley, Jenny Abrahamson, Michael Herzberg, Sebastian Fiss, Armand Halbert, Marc Palyart, William Borkofsky, Kevin Thai, Tim Vega, Ivan Beschastnikh, Michael D. Ernst, Thomas E. Anderson, and Arvind Krishnamurthy.

High-quality automated program repair

Bugs in software are incredibly costly, but so is software debugging. More bugs are reported daily than developers can handle. Automated program repair is an exciting approach, but can we trust automatically generated patches? Automated program repair research has produced dozens of repair techniques that use a buggy program and a partial specification of that program, e.g., a set of tests describing the desired behavior. Because the specification is partial, patches that satisfy the written specification may not satisfy the intended, unwritten, full specification. The goal of this project is to develop benchmarks and tools to evaluate repair quality, and to improve it. We have developed an objective measure of repair quality that uses an independent specification (tests that are not part of the repair process) and, using our new benchmark for repair quality (read), found that existing techniques often produce low-quality repairs that break existing (undertested) functionality (read). Next, we developed SearchRepair, a technique that replaces likely buggy code with publicly available human-written code to repair defects. SearchRepair works at a higher level of granularity than prior repair techniques, and it produces repairs of much higher quality (though making SearchRepair scale to the size of programs prior techniques can repair is an ongoing effort). For example, on small C programs, SearchRepair generated repairs that pass 97.2% of independent tests, whereas prior techniques' repairs passed no more than 72.1% (read).
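
As a concrete (and deliberately simplified) sketch of that quality measure, the Python snippet below scores a patched program by the fraction of independent, held-out tests it passes; the program, tests, and test-running hook are stand-ins, not part of our tools.

    # Hedged sketch: evaluate a patch against tests withheld from the repair
    # process. run_test is a hypothetical hook that runs one test on the patch.
    def repair_quality(patched_program, independent_tests, run_test):
        passed = sum(1 for t in independent_tests if run_test(patched_program, t))
        return passed / len(independent_tests)

    # Illustration with a stand-in "program" (a function) and trivial tests.
    patched = lambda x: abs(x)
    tests = [(-3, 3), (0, 0), (5, 5), (-1, 1)]
    score = repair_quality(patched, tests, lambda p, t: p(t[0]) == t[1])
    print(f"independent-test pass rate: {score:.1%}")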

This project is funded by the NSF. The tools and benchmarks are all open-source and publicly available. Collaborators include or have included Ted Smith, Manish Motwani, Afsoon Afzal, Mauricio Soto, Yalin Ke, Neal Holts, Sasha Medvidovic, Claire Le Goues, Kathryn T. Stolee, Prem Devanbu, Stephanie Forrest, and Westley Weimer.

Self-adaptive systems

Self-adaptive systems are capable of handling runtime changes in requirements and the environment. When something goes wrong, self-adaptive systems can recover on-the-fly. Building such systems poses many challenges, as developers often have to design the system without knowing all the requirements changes and potential attacks that could take place. Read about the challenges of and a research roadmap for self-adaptive systems here and here. Most of our work has some aspect of self-adaptation, from biologically inspired security work to self-adaptive reliability to automated program repair.

Fairness testing

Software has become ubiquitous in our society and, if unchecked, can enforce stereotypes and discrimination. For example, software is used to compute risk-assessment scores for suspected criminals; these scores, estimates of how likely the person arrested for a crime is to commit another crime, inform decisions about who can be set free at every stage of the criminal justice process, from assigning bond amounts, to deciding guilt, to sentencing. Recent examples of software discrimination include Amazon's software denying same-day delivery to minority neighborhoods and Apple users being recommended higher-priced hotels. This project develops methodologies and tools for testing the fairness of software, identifying both correlation and causation in discriminatory software outcomes.
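
A minimal sketch of a causal check in this spirit (a simplification, not the project's actual tool): hold every other input fixed, flip only the protected attribute, and measure how often the software's decision changes. The decision procedure and input generator below are hypothetical.

    # Estimate how often changing only the protected attribute flips the output.
    import random

    def causal_discrimination_rate(decide, attr, values, sample_inputs, trials=1000):
        flips = 0
        for _ in range(trials):
            inputs = sample_inputs()
            a, b = random.sample(values, 2)
            if decide({**inputs, attr: a}) != decide({**inputs, attr: b}):
                flips += 1
        return flips / trials

    # Hypothetical, deliberately biased decision procedure for illustration.
    def decide(applicant):
        return applicant["income"] > 50000 and applicant["race"] != "green"

    sample = lambda: {"income": random.randint(20000, 90000)}
    print(causal_discrimination_rate(decide, "race", ["green", "purple"], sample))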

Collaborators include Sainyam Galhotra and Alexandra Meliou.

Speculative analysis

When developers make decisions, they often don't know the consequences of their potential actions. Wouldn't it be great if they did? Speculative analysis makes that possible by speculating about what actions developers might perform, executing those actions in the background, and computing their consequences. For example, Crystal proactively detects textual, build, and test conflicts between collaborators, and Quick Fix Scout helps developers resolve compilation errors. Watch a video about Crystal or read: Proactive Detection of Collaboration Conflicts. This work won an ACM Distinguished Paper award and an IEEE TSE spotlight paper award, and it has influenced tools and collaborative development practices at Microsoft. The latest effort targets collaborative architectural design.
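
As a rough sketch of the speculative-merge idea behind Crystal (simplified, and not Crystal's implementation), the Python snippet below tries a collaborator's branch in a scratch clone and reports whether the merge would conflict; the paths and branch name are placeholders.

    # Speculatively merge a collaborator's branch in a scratch clone, so the
    # developer learns about a conflict before it actually bites.
    import subprocess

    def speculative_merge_conflicts(scratch_clone, collaborator_branch):
        def git(*args):
            return subprocess.run(["git", "-C", scratch_clone, *args],
                                  capture_output=True, text=True)
        git("fetch", "--all")
        result = git("merge", "--no-commit", "--no-ff", collaborator_branch)
        conflict = result.returncode != 0
        git("merge", "--abort")   # leave the scratch clone clean either way
        return conflict

    if speculative_merge_conflicts("/tmp/myproj-scratch", "origin/alice/feature"):
        print("Heads up: merging alice/feature will conflict with your changes.")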

This project is funded by the NSF. The tools are all open-source. Collaborators include or have included Kıvanç Muşlu, Jae young Bang, Reid Holmes, Nenad Medvidović, Michael D. Ernst, and David Notkin.

Reliability through smart redundancy

One of the most common ways of achieving system reliability is through redundancy. But how can we ensure we are using the resources in a smart way? Can we guarantee that we are squeezing the most reliability possible out of our available resources? A new technique called smart redundancy says we can! Read: Smart redundancy for distributed computation.
Collaborators: George Edwards, Jae young Bang, and Nenad Medvidović.

Past projects

Privacy preservation in distributed computation via sTile

My email, my taxes, my research, and my medical records are all in the cloud. How do I distribute computation onto untrusted machines while making sure those machines cannot read my private data? sTile makes this possible by breaking the computation into tiny pieces and distributing those pieces on a network in a way that makes it nearly impossible for an adversary controlling less than half of the network to reconstruct the data. Read: Keeping Data Private while Computing in the Cloud.
Collaborator: Nenad Medvidović.

Bug cause analysis

When regression tests break, can information about the latest changes help find the cause of the failure? The cause might be a bug in recently added code, or it could instead be in old code that wasn't exercised in a fault-revealing way until now. Considering the minimal test-failing sets of recent changes and the maximal test-passing sets of those changes can reveal surprises about the failure's cause. Read: Understanding Regression Failures through Test-Passing and Test-Failing Code Changes.
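
A toy sketch of that idea in Python (not the paper's algorithm): brute-force over subsets of the recent changes, run the regression test on each, and report the minimal failing and maximal passing sets. The test hook and change names are made up.

    # Find minimal change sets that trigger the failure and maximal sets that
    # do not. test_passes(subset) is a hypothetical hook that applies `subset`
    # of changes to the pre-change version and runs the regression test.
    from itertools import combinations

    def analyze_changes(changes, test_passes):
        failing, passing = [], []
        for k in range(len(changes) + 1):
            for subset in combinations(changes, k):
                (passing if test_passes(set(subset)) else failing).append(set(subset))
        minimal_failing = [s for s in failing if not any(f < s for f in failing)]
        maximal_passing = [s for s in passing if not any(s < p for p in passing)]
        return minimal_failing, maximal_passing

    # Illustration: the test fails only when changes A and C are both applied.
    test = lambda s: not {"A", "C"} <= s
    print(analyze_changes(["A", "B", "C"], test))
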
Collaborators: Roykrong Sukkerd, Ivan Beschastnikh, Jochen Wuttke, and Sai Zhang.

DNA Self-Assembly

How do simple objects self-assemble to form complex structures? Answering that question can lead to understanding interactions of objects as simple as atoms and as complex as software systems. By studying mathematical models of molecular interactions and gaining control over nanoscale DNA structures, we begin to understand this space. Read: (theory) Solving NP-Complete Problems in the Tile Assembly Model or (experimental) Self-Assembly of DNA Double-Double Crossover Complexes into High-Density, Doubly Connected, Planar Structures.
Collaborators: Dustin Reishus, Manoj Gopalkrishnan, Bilal Shaw, Nickolas Chelyapov, and Leonard Adleman.

The fault-invariant classifier

Can the wealth of knowledge and examples of bugs make it easier to discover unknown, latent bugs in new software? For example, if I have access to bugs and bug fixes in Windows 7, can I find bugs in Windows 8 before it ships? Machine learning makes this possible for certain classes of often-repeated bugs. Read: Finding Latent Code Errors via Machine Learning over Program Executions.
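
Loosely sketched in Python (with an invented feature encoding, not the paper's actual one), the idea is to learn from properties observed in programs whose bugs were later found and fixed, and then score properties observed in new code:

    # Train on properties of past buggy/fixed code, then rank properties of a
    # new program by how suspicious they look. Features here are made up.
    from sklearn.linear_model import LogisticRegression

    # Each row encodes one observed program property, e.g.
    # [is_nullable, involves_array_index, holds_only_in_failing_runs]
    X_train = [[1, 0, 1], [0, 1, 1], [0, 0, 0], [1, 1, 0], [0, 0, 1], [1, 0, 0]]
    y_train = [1, 1, 0, 0, 1, 0]   # 1 = property was implicated in a real bug

    model = LogisticRegression().fit(X_train, y_train)

    # Score properties observed in the new program.
    X_new = [[1, 0, 1], [0, 0, 0]]
    for props, score in zip(X_new, model.predict_proba(X_new)[:, 1]):
        print(props, f"suspiciousness: {score:.2f}")
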
Collaborator: Michael D. Ernst.

Other projects