
Alexandra Meliou
Professor
Associate Chair for Faculty Development
Robert and Donna Manning College of Information and Computer Sciences
140 Governors Drive
University of Massachusetts
Amherst, MA 01003-9264 USA
Email:
Office: 330
Phone: +1-413-545-3788
Fax: +1-413-545-1249
Research
DREAM lab: Data-systems Research for Exploration, Analytics, and Mining
I co-direct the DREAM lab, with an awesome group of people! UMass is a vibrant place, and one of the top universities in the US for data management research!
My interests
Data is critical in almost every aspect of society, including education, technology, healthcare, economy, and science. Poor understanding and handling of data, data biases, poor data quality, and errors in data-driven processes are detrimental in all domains that rely on data. My research augments data management with user-facing functionality that helps people make sense of their data and use it effectively, at a time when data becomes increasingly unpredictable, unwieldy, and unmanageable. I focus on issues of provenance, causality, explanations, data quality, usability, and data and algorithmic bias.
Usability and analysis
As data is now a staple in so many aspects of human activity, the audience for data technologies has expanded to include a varied range of users: from non-experts wishing to peruse datasets, to domain experts with specialized data processing needs. Data systems have not adapted to address these demands effectively: databases' specialized query languages and structure create barriers for non-experts, while the lack of native support for important computing needs leaves experts to develop application-specific solutions themselves. Our work removes data-use barriers by simplifying data access for non-experts and by augmenting database functionality with advanced problem-solving capabilities, thus simplifying analytics workflows by moving them closer to the data.
PackageBuilder: supporting queries for packages [Project page]
Traditional database queries follow a simple model: they define constraints that each tuple in the result must satisfy. This model is computationally efficient, as the database system can evaluate the query conditions on each tuple individually. However, many practical, real-world problems require a collection of result tuples, which we call a package, to satisfy constraints collectively, rather than individually. We developed an end-to-end system that supports package queries, allowing the declarative specification and efficient evaluation of a significant class of constrained optimization problems within a database.
Publications: [PVLDB 2016] [SIGMOD Record 2017] [VLDBJ 2017] [CACM 2018] [SIGMOD 2020] [PVLDB 2024]
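As a rough illustration of the difference between per-tuple and package-level constraints, the sketch below picks a set of meals by brute-force enumeration. It is not PackageBuilder's query language or evaluation strategy; the recipe table, column layout, and the function `best_package` are all hypothetical and chosen only for this example.

```python
from itertools import combinations

# Hypothetical recipe table: (name, gluten_free, kcal, saturated_fat_g).
RECIPES = [
    ("oatmeal", False, 350, 1.5),
    ("lentil soup", True, 600, 1.0),
    ("rice bowl", True, 800, 3.0),
    ("grilled fish", True, 700, 2.0),
    ("fruit salad", True, 300, 0.2),
    ("steak", True, 900, 9.0),
]


def best_package(rows, size, min_kcal, max_kcal):
    """Return the names of the `size` gluten-free recipes whose total
    calories fall in [min_kcal, max_kcal] with the least total saturated
    fat, or None if no such package exists. Brute force, for illustration."""
    # Per-tuple constraint: each recipe is checked on its own.
    candidates = [r for r in rows if r[1]]
    best = None
    for package in combinations(candidates, size):
        # Package-level constraint: holds for the collection as a whole.
        total_kcal = sum(r[2] for r in package)
        if not (min_kcal <= total_kcal <= max_kcal):
            continue
        # Package-level objective: minimize total saturated fat.
        total_fat = sum(r[3] for r in package)
        if best is None or total_fat < best[0]:
            best = (total_fat, package)
    return None if best is None else [r[0] for r in best[1]]


menu = best_package(RECIPES, size=3, min_kcal=2000, max_kcal=2500)
print(menu)
```

Enumerating every combination grows exponentially with the package size, which is why evaluating such queries efficiently inside a database is a research problem in its own right.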
SQuID: Semantic-similarity-aware Query Intent Discovery [Project page]
Non-experts cannot easily peruse relational data, as traditional query interfaces only allow data retrieval through well-structured queries. To write such queries, one needs expertise in the query language (typically SQL) and knowledge of the potentially complex database schema. Unfortunately, non-expert users typically lack both. Instead of requiring a query, SQuID lets users provide a few examples of the results they want, and infers the query intent by leveraging the data in the database to understand the context of the provided examples. SQuID's abduction-aware probabilistic model captures esoteric and complex semantic contexts, outperforming the state of the art.
Publications: [PVLDB 2019] [SIGMOD 2018 demo]
Fairness and diversity
Data-driven software has the ability to shape human behavior: it affects the products we view and purchase, the news articles we read, the social interactions we engage in, and, ultimately, the opinions we form. Yet, data is an imperfect medium, tainted by errors, omissions, and biases. As a result, discrimination shows up in many data-driven applications, such as advertisements, hotel bookings, image search, and vendor services. Biases in data and software risk forming, propagating, and perpetuating biases in society. Data management research should develop tools to detect, inform, and mitigate the effects of bias, skew, and misuse in data-driven processes.
Fairness testing and mitigation [Project page]
Our work studied software fairness and discrimination and produced a testing-based method for measuring whether and how much software discriminates, focusing on causality in discriminatory behavior. Our approach, Themis, is the first framework of its kind that automatically generates efficient test suites to measure discrimination. Our techniques rely on reasoning about causal relationships between inputs and outputs of a system. Understanding how inputs affect software behavior can empower developers to control for bias in data and ensure fairer use of software systems. Our work further contributes non-invasive strategies for mitigating bias in learned classification.
Publications: [ESEC/FSE 2017] [ESEC/FSE 2018 demo] [ESEC/FSE 2018 vision] [SIGMOD 2022] [ICDE 2023]
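The idea of testing for a causal link between a sensitive input and the output can be sketched as follows: hold every other input fixed, vary only the sensitive attribute, and count how often the decision changes. This is a simplified, exhaustive illustration rather than the Themis implementation, which generates test suites instead of enumerating all inputs; `loan_decision`, `causal_discrimination`, and the input domains are hypothetical.

```python
from itertools import product


def loan_decision(applicant):
    """Hypothetical software under test. It deliberately applies a
    different income threshold per group so the test has something to find."""
    threshold = 50 if applicant["group"] == "A" else 70
    return applicant["income"] >= threshold


def causal_discrimination(decide, domains, sensitive):
    """Fraction of assignments to the non-sensitive inputs for which
    changing only the sensitive attribute changes the output."""
    others = [name for name in domains if name != sensitive]
    changed = 0
    total = 0
    for values in product(*(domains[name] for name in others)):
        base = dict(zip(others, values))
        # Run the software once per sensitive value, all else held equal.
        outputs = {decide({**base, sensitive: s}) for s in domains[sensitive]}
        total += 1
        if len(outputs) > 1:
            changed += 1
    return changed / total


DOMAINS = {"income": [30, 40, 50, 60, 70, 80], "group": ["A", "B"]}
score = causal_discrimination(loan_decision, DOMAINS, "group")
print(f"decision depends on group for {score:.0%} of inputs")
```

Because each comparison changes only the sensitive attribute, a nonzero score reflects a causal dependence of the output on that attribute and not merely a correlation in the data.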
Fast diverse data retrieval
Data skew is often a cause of algorithmic bias, and the ability to retrieve balanced, diverse datasets can mitigate the underlying problem. Diversification is one common way to present representative results to users, and it is employed by many real-world systems. However, providing diverse results for general range queries (i.e., queries that return a subset of the data based on filtering conditions) efficiently and scalably remains challenging. Our work introduced a general, index-based algorithm for diversifying the results of multi-dimensional range queries over a single relation. At a high level, our algorithm transforms each range query into a set of subordinate searches, and performs these searches using a novel index structure, the RC-Index. We further developed algorithms with approximation guarantees for the problem of MaxMin diversification under fairness constraints.
Publications: [ICDT 2022] [ICDT 2021] [PVLDB 2018]
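To make the MaxMin objective concrete, the sketch below filters points with a range query and then applies a standard greedy farthest-point heuristic, which repeatedly adds the point whose distance to the already selected points is largest. It does not use the RC-Index and ignores fairness constraints; the point set and the helper names `in_range` and `greedy_maxmin` are hypothetical.

```python
from math import dist  # Python 3.8+


def in_range(point, lower, upper):
    """Range predicate: every coordinate lies within [lower, upper]."""
    return all(lo <= x <= hi for x, lo, hi in zip(point, lower, upper))


def greedy_maxmin(points, k):
    """Pick up to k points, greedily maximizing the minimum pairwise
    distance (farthest-point heuristic, starting from the first point)."""
    if k <= 0 or not points:
        return []
    chosen = [0]
    # nearest[i]: distance from point i to its closest selected point.
    nearest = [dist(p, points[0]) for p in points]
    while len(chosen) < min(k, len(points)):
        picked = set(chosen)
        i = max((j for j in range(len(points)) if j not in picked),
                key=lambda j: nearest[j])
        chosen.append(i)
        nearest = [min(d, dist(p, points[i])) for d, p in zip(nearest, points)]
    return [points[i] for i in chosen]


POINTS = [(0, 0), (1, 0), (0, 1), (10, 10), (10, 0), (5, 5), (20, 20)]
matching = [p for p in POINTS if in_range(p, (0, 0), (10, 10))]
diverse = greedy_maxmin(matching, k=3)
print(diverse)
```

This version scans all matching points on every step, so its cost grows with the size of the query result; avoiding that scan is the kind of inefficiency an index-based approach aims to address.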
Data quality
Data quality has long been a focus of data management research, but our data quality challenges have only grown. Data is produced at unprecedented rates, from sources that are broad, varied, and unreliable, and through large-scale processes that introduce their own inaccuracies (e.g., structured data extraction from unstructured text). Traditional data cleaning techniques identify discrepancies and purge datasets of errors, but they treat the manifestation of a problem, not its root cause. They disregard the fact that errors are often systemic, inherent to the process that produces the data, and thus will keep occurring unless the problems are corrected at their source. Our work offers crucial insights into data quality issues: instead of repairing the errors themselves, our research focuses on diagnosing the reasons for the errors and identifying repairs in the processes that produce the data.
Data X-Ray: Diagnosing errors in data systems
Data X-Ray is a diagnostic framework for profiling errors in data and determining systemic reasons for them in internet-scale knowledge extraction pipelines. This setting is challenging due to the large scale of the data, the prevalence of errors, and the complexity of the system.
Publications: [SIGMOD 2015] [PVLDB 2015 demo]
QFix: Diagnosing errors in relational logs
Relational databases are often dynamic, and even when data is cleaned, new errors can be introduced by applications and users who interact with the data. Subsequent valid updates can obscure these errors and propagate them through the dataset, causing more discrepancies. Any discovered errors tend to be corrected superficially, on a case-by-case basis, further obscuring the true underlying cause and making the remaining errors harder to detect. QFix derives explanations and repairs for discrepancies in relational data by analyzing the effects of queries that operated on the data and identifying potential mistakes in those queries.
Publications: [SIGMOD 2017] [SIGMOD 2016 demo]
Causality and explanations
Today's data is vast and often unreliable, and the systems that process data are increasingly complex. Even simple transformations through database queries obscure the origins of data and the derivation of results. The goal of my research is to promote users' trust in data and systems through support for understanding and explanations. Explanations provide opportunities for systems to interact with humans and obtain feedback, improving their operation. Explanations also allow domain experts and system developers to understand system decisions and improve system function.
Causal analysis and explanations in data management [Project page (causality)]
Our research investigates techniques that help users understand the results of their queries by analyzing the history of data transformations (provenance). Unfortunately, using the provenance to explain query results is often impractical, as provenance information can grow very large even for simple transformations and modest-size datasets. Our work refines provenance information by analyzing the causal contributions of data to a result, and develops explanation frameworks for a variety of data-driven settings.
Publications (sample): [PVLDB 2019] [DE Bulletin 2018] [EDBT 2017] [PVLDB 2015] [PVLDB 2014 tutorial]
Publications
Teaching
- Spring 2023 COMPSCI 345: Practice and Applications of Data Management
- Fall 2022 COMPSCI 345: Practice and Applications of Data Management
- Spring 2020 COMPSCI 345: Practice and Applications of Data Management
- Fall 2019 COMPSCI 345: Practice and Applications of Data Management
- Fall 2019 COMPSCI H345: Practice and Applications of Data Management - Honors colloquium
- Spring 2019 COMPSCI 345: Practice and Applications of Data Management
- Fall 2018 COMPSCI 345: Practice and Applications of Data Management
- Spring 2018 CMPSCI 645: Database Design and Implementation
- Fall 2017 COMPSCI 345: Practice and Applications of Data Management
- Fall 2017 COMPSCI H345: Practice and Applications of Data Management - Honors colloquium
- Spring 2017 CMPSCI 645: Database Design and Implementation
- Fall 2015 CMPSCI 345: Practice and Applications of Data Management
- Spring 2015 CMPSCI 645: Database Design and Implementation
- Fall 2014 CMPSCI 345: Practice and Applications of Data Management
- Fall 2014 CMPSCI H345: Practice and Applications of Data Management - Honors section
- Fall 2013 CMPSCI 390DB: Practice and Applications of Data Management
- Fall 2013 CMPSCI H390DB: Practice and Applications of Data Management - Honors section
- Spring 2013 CMPSCI 390DB: Practice and Applications of Data Management
- Fall 2012 CMPSCI 645: Database Design and Implementation
- Winter 2011 CSE444: Introduction to Database Systems (University of Washington)
- Winter 2010 CSE590q: Positive and Negative Provenance in Database Systems (seminar, University of Washington)
Current PhD students and postdocs:
- Yanqi Chen (PhD)
- Riddho Ridwanul Haque (PhD)
- Iro Moumoulidou (PhD)
- Vasilis Vittis (PhD)
Graduated PhD students and postdocs:
- Matteo Brucato (PhD, 2021; co-advised with Peter Haas) Senior Researcher, Microsoft Research
- Anna Fariha (PhD, 2021) Assistant Professor, University of Utah (previously, Researcher, Microsoft PROSE team)
- Xiaolan Wang (PhD, 2018) Software Engineer, Meta (previously, Senior Research Scientist, Megagon Labs)
- Yue Wang (PhD, 2017; co-advised with Gerome Miklau) Senior Research SDE, Microsoft Research
- Ke Yang (Postdoc) Assistant Professor, University of Texas San Antonio
Other alumni:
- Maliha Islam (MS, Software Engineer, Microsoft)
- Ravali Pochampally (MS, now at Google)
- Rahul Ramakrishna (MS, Fall 2013-Spring 2014)
- Kevin Fernandes (undergraduate, Spring 2014, now M.D. Candidate at Albert Einstein College of Medicine)
- Jeffrey Pezzone (undergraduate, Spring 2013, now at Cisco Systems)
- Joseph Scherr (undergraduate, REU summer 2013)
- Hridya Turlapati (undergraduate, Spring 2013)
- Linda Yeboah (undergraduate, Summer 2014)
Bio
Alexandra Meliou is a Professor in the College of Information and Computer Sciences, at the University of Massachusetts Amherst. Prior to joining UMass, she was a Postdoctoral Research Associate at the University of Washington. Alexandra received her PhD degree from the Electrical Engineering and Computer Sciences Department at the University of California, Berkeley. She has received recognitions for research, teaching, and service, including a CACM Research Highlight, an ACM SIGMOD Research Highlight Award, an ACM SIGSOFT Distinguished Paper Award, an NSF CAREER Award, a Google Faculty Research Award, multiple Distinguished Reviewer Awards, and a Lilly Fellowship for Teaching Excellence. Her research focuses on data provenance, causality, explanations, data quality, and algorithmic fairness.
University of California, Berkeley
PhD in Computer Science (December 2009)
MS in Computer Science (December 2005)
Advisors: Joseph Hellerstein and Carlos Guestrin
National Technical University of Athens
BS in Electrical Engineering and Computer Science (June 2003)
Advisor: Timos Sellis
Service
Significant recent professional service roles
- SIGMOD Executive Committee (2024-2025)
- PVLDB Endowment Board of Trustees (2022-2027)
- PVLDB Advisory Board
- DBCares (chair)
- Joint DB Task Force on Reviewing Processes (co-chair)
- SIGMOD 2024 PC co-chair
Conference Organization and Editorial Service
- Diversity and Inclusion Chair for ICDE 2023
- Workshop Chair for VLDB 2021
- Tutorials Chair for ICDE 2021
- Associate editor for VLDBJ (2019-2025).
- Associate editor for PVLDB 2020.
- Associate editor for PVLDB 2019.
- Associate editor for the IEEE Data Engineering Bulletin (2018-2019).
- PC co-Chair for FairWare 2018.
- PC co-Chair for ICDE PhD Symposium 2018.
- PC co-Chair for WebDB 2017.
- PC Track Chair for SIGMOD 2016.
- New Researcher Symposium co-Chair for SIGMOD 2015.
- Student Travel Award selection committee co-Chair for SIGMOD 2015.
- New Researcher Symposium co-Chair for SIGMOD 2014.
- Student Travel Award selection committee co-Chair for SIGMOD 2014.
- Program Chair for TaPP 2013
- Demonstration Chair for SSDBM 2013
- Undergraduate Research Program co-Chair for SIGMOD 2013.
Program Committees
- Proceedings of the VLDB Endowment (PVLDB) 2023.
- Conference on Management of Data (SIGMOD) 2023.
- Conference on Management of Data (SIGMOD) 2021 (Associate Editor).
- Conference on Management of Data (SIGMOD) 2020.
- Hellenic Database Management Symposium 2019.
- Alberto Mendelzon International Workshop on Foundations of Data Management (AMW) 2019.
- Conference on Management of Data (SIGMOD) 2019. (Core PC)
- Proceedings of the VLDB Endowment (PVLDB) 2018.
- Conference on Management of Data (SIGMOD) 2018.
- Conference on Management of Data (SIGMOD) 2017.
- Alberto Mendelzon International Workshop on Foundations of Data Management (AMW) 2017.
- International Conference on Data Engineering (ICDE) 2016.
- Conference on Management of Data (SIGMOD) 2015.
- Proceedings of the VLDB Endowment (PVLDB) 2015.
- VLDB PhD Workshop 2015.
- International Conference on Scientific and Statistical Database Management (SSDBM) 2014, demo track.
- Proceedings of the VLDB Endowment (PVLDB) 2014.
- Conference on Management of Data (SIGMOD) 2014, demo track.
- Conference on Very Large Databases (VLDB) 2013, demo track.
- Conference on Management of Data (SIGMOD) 2013.
- International Conference on Data Engineering (ICDE) 2013.
- International Conference on Information and Knowledge Management (CIKM) 2012.
- Conference on Management of Data (SIGMOD) 2012, demo track.
- International Conference on Data Engineering (ICDE) 2012.
- Workshop on the Theory and Practice of Provenance (TaPP) 2012.
- Workshop on the Web and Databases (WebDB) 2012.
- Proceedings of the VLDB Endowment (PVLDB) 2011.
- Conference on Management of Data (SIGMOD) 2011.
- Workshop on Management of Uncertain Data (MUD) 2011.
Other Service
- Core member of D&I in DB
- ACM Publications Board Task Force on Improving Peer-Reviewer Incentives.
- ACM SIGMOD Jim Gray Dissertation Award (2019 and 2020).
Personal
My sister is a doctor of internal medicine, specializing in infectious diseases.