Alexandra Meliou

Professor

Associate Chair for
Faculty Development

Robert and Donna Manning
College of Information and Computer Sciences

140 Governors Drive

University of Massachusetts

Amherst, MA 01003-9264 USA

Email:
Office:	330
Phone:	+1-413-545-3788
Fax:	+1-413-545-1249

Research

DREAM lab: Data-systems Research for Exploration, Analytics, and Mining

I co-direct the DREAM lab, with an awesome group of people! UMass is a vibrant place, and one of the top universities in the US for data management research!

My interests

Data is critical in almost every aspect of society, including education, technology, healthcare, economy, and science. Poor understanding and handling of data, data biases, poor data quality, and errors in data-driven processes are detrimental in all domains that rely on data. My research augments data management with user-facing functionality that helps people make sense of their data and use it effectively, at a time when data becomes increasingly unpredictable, unwieldy, and unmanageable. I focus on issues of provenance, causality, explanations, data quality, usability, and data and algorithmic bias.

Prospective students

I am actively recruiting strong graduate and junior / senior undergraduate students to work on research projects. If you are a UMass student interested in doing research with me, you should email me to set up an appointment, giving me a brief summary of your background and interests. My interests are not restricted to my existing projects, so do not hesitate to come to me with ideas of your own.

If you are not yet a UMass student: You should apply (undergraduate | graduate) to become one before contacting me. Emailing me directly will not affect your chances of admission, and I will not be able to respond to these requests. However, if you have done research, you can have your mentor email me with a recommendation.

Research highlights

Usability and analysis

As data is now a staple in so many aspects of human activity, the audience for data technologies has expanded to include a varied range of users: from non-experts wishing to peruse datasets, to domain experts with specialized data processing needs. Data systems have not adapted to address these demands effectively: databases' specialized query languages and structure create barriers for non-experts, while the lack of native support for important computing needs leaves experts to develop application-specific solutions themselves. Our work removes data-use barriers by simplifying access for non-experts to data and by augmenting database functionality with advanced problem-solving capabilities, thus simplifying analytics workflows by moving them closer to the data.

PackageBuilder: supporting queries for packages [Project page]
Traditional database queries follow a simple model: they define constraints that each tuple in the result must satisfy. This model is computationally efficient, as the database system can evaluate the query conditions on each tuple individually. However, many practical, real-world problems require a collection of result tuples—which we call a package—to satisfy constraints collectively, rather than individually. We developed an end-to-end system that supports package queries, allowing the declarative specification and efficient evaluation of a significant class of constrained optimization problems within a database.
Publications: [PVLDB 2016] [SIGMOD Record 2017] [VLDBJ 2017] [CACM 2018] [SIGMOD 2020] [PVLDB 2024]
award

Awards: ACM SIGMOD Research Highlight, CACM Research Highlight, Best demo in SIGMOD 2020, Best Papers of VLDB 2016

SQuID: Semantic-similarity-aware Query Intent Discovery [Project page]
Non-experts cannot easily peruse relational data, as traditional query interfaces allow data retrieval through well-structured queries. To write such queries, one needs expertise in the query language (typically SQL) and knowledge of the potentially complex database schema. Unfortunately, non-expert users typically lack both. SQuID infers query intent effectively by leveraging the data in the database to understand the context of the provided examples. SQuID's abduction-aware probabilistic model captures esoteric and complex semantic contexts, outperforming the state of the art.
Publications: [PVLDB 2019][SIGMOD 2018 demo]

Fairness and diversity

Data-driven software has the ability to shape human behavior: it affects the products we view and purchase, the news articles we read, the social interactions we engage in, and, ultimately, the opinions we form. Yet, data is an imperfect medium, tainted by errors, omissions, and biases. As a result, discrimination shows up in many data-driven applications, such as advertisements, hotel bookings, image search, and vendor services. Biases in data and software risk forming, propagating, and perpetuating biases in society. Data management research should develop tools to detect, inform, and mitigate the effects of bias, skew, and misuse in data-driven processes.

Fairness testing and mitigation [Project page]
Our work studied software fairness and discrimination and produced a testing-based method for measuring if and how much software discriminates, focusing on causality in discriminatory behavior. Our approach, Themis, is the first framework of its kind that automatically generates efficient test suites to measure discrimination. Our techniques rely on reasoning about causal relationships between inputs and outputs of a system. Understanding how inputs affect software behavior can empower developers to control for bias in data and ensure more fair use of software systems. Our work further contributes non-invasive strategies for mitigating bias in learned classification.
Publications: [ESEC/FSE 2017] [ESEC/FSE 2018 demo] [ESEC/FSE 2018 vision] [SIGMOD 2022] [ICDE 2023]
award

Awards: ACM SIGSOFT Distinguished Paper Award

Fast diverse data retrieval
Data skew is often a cause of algorithmic bias, and the ability to retrieve balanced, diverse datasets can mitigate the underlying problem. Diversification is one common way to present representative results to users, and it is employed by many real-world systems. However, providing diverse results for general range queries (i.e., queries that return a subset of the data based on filtering conditions) efficiently and scalably remains challenging. Our work introduced a general, index-based algorithm for diversifying the results of multi-dimensional range queries over a single relation. At a high level, our algorithm transforms each range query into a set of subordinate searches, performs these searches using a novel index structure, the RC-Index. We further developed algorithms with approximation guarantees for the problem of MaxMin diversification, under fairness constraints.
Publications: [ICDT 2022] [ICDT 2021] [PVLDB 2018]

Data quality

Data quality has long been a focus of data management research, but our data quality challenges have only grown. Data is produced at unprecedented rates, from sources that are broad, varied, and unreliable, and through large-scale processes that introduce their own inaccuracies (e.g., structured data extraction from unstructured text). Traditional data cleaning techniques identify discrepancies and purge datasets of errors, but they treat the manifestation of a problem, not its root cause. They disregard the fact that errors are often systemic, inherent to the process that produces the data, and thus will keep occurring unless the problems are corrected at their source. Our work offers crucial insights into data quality issues: instead of repairing the errors themselves, our research focuses on diagnosing the reasons for the errors and identifying repairs in the processes that produce the data.

Data X-Ray: Diagnosing errors in data systems
Data X-Ray is a diagnostic framework for profiling errors in data and determining systemic reasons for them in internet-scale knowledge extraction pipelines. This setting is challenging due to the large scale of the data, the prevalence of errors, and the complexity of the system.
Publications: [SIGMOD 2015] [PVLDB 2015 demo]

QFix: Diagnosing errors in relational logs
Relational databases are often dynamic, and even when data is cleaned, new errors can be introduced by applications and users who interact with the data. Subsequent valid updates can obscure these errors and propagate them through the dataset causing more discrepancies. Any discovered errors tend to be corrected superficially, on a case-by-case basis, further obscuring the true underlying cause, and making detection of the remaining errors harder. QFix derives explanations and repairs for discrepancies in relational data by analyzing the effects of queries that operated on the data and identifying potential mistakes in those queries.
Publications: [SIGMOD 2017] [SIGMOD 2016 demo]

Causality and explanations

Today's data is vast and often unreliable and the systems that process data are increasingly complex. Even simple transformations through database queries obscure the origins of data and the derivation of results. The goal of my research is to promote users' trust in data and systems through support for understanding and explanations. Explanations provide opportunities for systems to interact with humans and obtain feedback, improving their operation. Explanations also allow domain experts and system developers to understand system decisions and improve system function.

Causal analysis and explanations in data management [Project page (causality)]
Our research investigates techniques that help users understand the results of their queries by analyzing the history of data transformations (provenance). Unfortunately, using the provenance to explain query results is often impractical, as provenance information can grow very large even for simple transformations and modest-size datasets. Our work refines provenance information by analyzing the causal contributions of data to a result, and develops explanation frameworks for a variety of data-driven settings.
Publications (sample): [PVLDB 2019] [DE Bulletin 2018] [EDBT 2017] [PVLDB 2015] [PVLDB 2014 tutorial]

Funding sponsors

Publications

I am working to revamp my publications page. In the meantime, you can view my publications on DBLP, also loaded below.

Teaching

Spring 2023 COMPSCI 345: Practice and Applications of Data Management
Fall 2022 COMPSCI 345: Practice and Applications of Data Management
Spring 2020 COMPSCI 345: Practice and Applications of Data Management
Fall 2019 COMPSCI 345: Practice and Applications of Data Management
Fall 2019 COMPSCI H345: Practice and Applications of Data Management - Honors colloquium
Spring 2019 COMPSCI 345: Practice and Applications of Data Management
Fall 2018 COMPSCI 345: Practice and Applications of Data Management
Spring 2018 CMPSCI 645: Database Design and Implementation
Fall 2017 COMPSCI 345: Practice and Applications of Data Management
Fall 2017 COMPSCI H345: Practice and Applications of Data Management - Honors colloquium
Spring 2017 CMPSCI 645: Database Design and Implementation
Fall 2015 CMPSCI 345: Practice and Applications of Data Management
Spring 2015 CMPSCI 645: Database Design and Implementation
Fall 2014 CMPSCI 345: Practice and Applications of Data Management
Fall 2014 CMPSCI H345: Practice and Applications of Data Management - Honors section
Fall 2013 CMPSCI 390DB: Practice and Applications of Data Management
Fall 2013 CMPSCI H390DB: Practice and Applications of Data Management - Honors section
Spring 2013 CMPSCI 390DB: Practice and Applications of Data Management
Fall 2012 CMPSCI 645: Database Design and Implementation
Winter 2011 CSE444: Introduction to Database Systems (University of Washington)
Winter 2010 CSE590q: Positive and Negative Provenance in Database Systems (seminar, University of Washington)

Students

Current PhD students and postdocs:

Graduated PhD students and postdocs:

Matteo Brucato (PhD, 2021; co-advised with Peter Haas) Senior Researcher, Microsoft Research
Anna Fariha (PhD, 2021) Assistant Professor, University of Utah (previously, Researcher, Microsoft PROSE team)
Xiaolan Wang (PhD, 2018) Software Engineer, Meta (previously, Senior Research Scientist, Megagon Labs)
Yue Wang (PhD, 2017; co-advised with Gerome Miklau) Senior Research SDE, Microsoft Research.
Ke Yang (Postdoc) Assistant Professor, University of Texas San Antonio

Other alumni:

Maliha Islam (MS, Software Engineer, Microsoft)
Ravali Pochampally (MS, now at Google)
Rahul Ramakrishna (MS, Fall 2013-Spring 2014)
Kevin Fernandes (undergraduate, Spring 2014, now M.D Candidate at Albert Einstein College of Medicine)
Jeffrey Pezzone (undergraduate, Spring 2013, now at CISCO Systems)
Joseph Scherr (undergraduate, REU summer 2013)
Hridya Turlapati (undergraduate, Spring 2013)
Linda Yeboah (undergraduate, Summer 2014)

Bio

Short Bio

Alexandra Meliou is a Professor in the College of Information and Computer Sciences, at the University of Massachusetts Amherst. Prior to joining UMass, she was a Postdoctoral Research Associate at the University of Washington. Alexandra received her PhD degree from the Electrical Engineering and Computer Sciences Department at the University of California, Berkeley. She has received recognitions for research, teaching, and service, including a CACM Research Highlight, an ACM SIGMOD Research Highlight Award, an ACM SIGSOFT Distinguished Paper Award, an NSF CAREER Award, a Google Faculty Research Award, multiple Distinguished Reviewer Awards, and a Lilly Fellowship for Teaching Excellence. Her research focuses on data provenance, causality, explanations, data quality, and algorithmic fairness.

Education

University of California, Berkeley

PhD in Computer Science (December 2009)
MS in Computer Science (December 2005)
Advisors: Joseph Hellerstein and Carlos Guestrin

National Technical University of Athens

BS in Electrical Engineering and Computer Science (June 2003)
Advisor: Timos Sellis

Service

Professional Service

Significant recent professional service roles

SIGMOD Executive Committee (2024-2025)
PVLDB Endowment Board of Trustees (2022-2027)
PVLDB Advisory Board
DBCares (chair)
Joint DB Task Force on Reviewing Processes (co-chair)
SIGMOD 2024 PC co-chair

Conference Organization and Editorial Service

Diversity and Inclusion Chair for ICDE 2023
Workshop Chair for VLDB 2021
Tutorials Chair for ICDE 2021
Associate editor for VLDBJ (2019-2025).
Associate editor for PVLDB 2020.
Associate editor for PVLDB 2019.
Associate editor for the IEEE Data Engineering Bulletin (2018-2019).
PC co-Chair for FairWare 2018.
PC co-Chair for ICDE PhD Symposium 2018.
PC co-Chair for WebDB 2017.
PC Track Chair for SIGMOD 2016.
New Researcher Symposium co-Chair for SIGMOD 2015.
Student Travel Award selection committee co-Chair for SIGMOD 2015.
New Researcher Symposium co-Chair for SIGMOD 2014.
Student Travel Award selection committee co-Chair for SIGMOD 2014.
Program Chair for TaPP 2013
Demonstration Chair for SSDBM 2013
Undergraduate Research Program co-Chair for SIGMOD 2013.

Program Committees

Proceedings of the VLDB Endowment (PVLDB) 2023.
Conference on Management of Data (SIGMOD) 2023.
Conference on Management of Data (SIGMOD) 2021 (Associate Editor).
Conference on Management of Data (SIGMOD) 2020.
Hellenic Database Management Symposium 2019.
Alberto Mendelzon International Workshop on Foundations of Data Management (AMW) 2019.
Conference on Management of Data (SIGMOD) 2019. (Core PC)
Proceedings of the VLDB Endowment (PVLDB) 2018.
Conference on Management of Data (SIGMOD) 2018.
Conference on Management of Data (SIGMOD) 2017.
Alberto Mendelzon International Workshop on Foundations of Data Management (AMW) 2017.
International Conference on Data Engineering (ICDE) 2016.
Conference on Management of Data (SIGMOD) 2015.
Proceedings of the VLDB Endowment (PVLDB) 2015.
VLDB PhD Workshop 2015.
International Conference on Scientific and Statistical Database Management (SSDBM) 2014, demo track.
Proceedings of the VLDB Endowment (PVLDB) 2014.
Conference on Management of Data (SIGMOD) 2014, demo track.
Conference on Very Large Databases (VLDB) 2013, demo track.
Conference on Management of Data (SIGMOD) 2013.
International Conference on Data Engineering (ICDE) 2013.
International Conference on Information and Knowledge Management (CIKM) 2012.
Conference on Management of Data (SIGMOD) 2012, demo track.
International Conference on Data Engineering (ICDE) 2012.
Workshop on the Theory and Practice of Provenance (TaPP) 2012.
Workshop on the Web and Databases (WebDB) 2012.
Proceedings of the VLDB Endowment (PVLDB) 2011.
Conference on Management of Data (SIGMOD) 2011.
Workshop on Management of Uncertain Data (MUD) 2011.