I am an Assistant Professor in the
College of Information & Computer Science at the University of Massachusetts Amherst. I am also a Permanent Member of
Center for Discrete Mathematics and Theoretical Computer Science (DIMACS) at Rutgers.
Before joining UMass in 2014, I was a Research Scientist at
AT&T Shannon Laboratories, New Jersey. I spent four wonderful years (2007-2011) at the University of Maryland College Park from where I received my Ph.D. in Computer Science.
Currently, I am spending a semester at the University of California Berkeley as a Visiting Scholar and as a fellow of the Simons Institute.
My research interests span
I particularly enjoy working on problems that are tied to core applications but have potentials to lead to beautiful theory.
I primarily focus on designing fast algorithms for a variety of applications, or understanding fundamental limitations of why that is not possible. This includes problems in pattern matching, network and graph problems, data quality, crowd sourcing, resource allocation etc. You will find the recurring theme of
randomization and approximation in all my works--though the reasons they are used could be different.
- Algorithm Design and Analysis,
- Large Scale Data Analytics,
- Randomization in Computation
Contact: Email. [my first name]@cs.umass.edu, email@example.com
Program Committee Member.
- Guest Editor: Transactions on Algorithms SODA’16 Special Issue
- SODA 2016
- WSDM 2016
- VLDB 2015
- ECML/PKDE 2015 (Industrial Track)
- CIKM 2015
- APPROX 2014
- CIKM 2014
- Grace Hopper Celebration Technical Committee 2012.
Recent Invited Talks/Visits
- Tutorial (not so recent): Data Quality: The other Face of Big Data, with Divesh Srivastava. ICDE, 2014.
- Mentor and member of the NSF funded organization Computing Behind Double Bind to increase support for women students in STEM.
- Panel Chair of the upcoming New England Celebration of Women in Computing organized by the ACM Council on Women in Computing.
See the complete list here. Also check here
to see papers (partial list) classified by Topic
Author names are in alphabetical order in most of my publications as per tradition in TCS.
Link to My CV(Updated December, 2013)
Language Edit Distance & Maximum Likelihood Parsing of Stochastic Grammars: Faster Algorithms & Connection to Fundamental Graph Problems,
IEEE Symposium on Foundations of Computer Science (FOCS) 2015.
Formal languages such as context-free grammars are used to model data formats/semantics in wide-array of applications. Any deviation
from this model can be captured by language edit distance problem, which estimates the minimum number of changes in data to validate it
according to the model. This works gives the first ever sub-cubic algorithm with near-optimal approximation guarantee for this problem, an improvement
after forty years. More
over, it also provides the first sub-cubic algorithm for stochastic context free grammar parsing. Stochastic context free grammars
lie at the foundation of statistical natural language processing, they generalize hidden Markov models, and have found widespread applications.
We also provide interesting connections between these problems and several fundamental graph problems.
The Dyck Language Edit Distance Problem in Near-linear Time,
IEEE Symposium on Foundations of Computer Science (FOCS) 2014. Journal version under review in JACM.
Given a string of parentheses how many edits are required to make it well-balanced? This problem known as Dyck Language Edit
Distance generalizes the well-studied string edit distance problem, and has found many applications in RNA secondary structure
prediction (RNA Folding), compiler optimization and recently in data quality (see our VLDB'13 paper). This paper provides a near-linear time
algorithm for Dyck language Edit distance (and for many other languages) improving over cubic run time of prior algorithms with
nontrivial approximation guarantees.
New Constructive Aspects of the Lovasz Local Lemma,
with Bernhard Haeupler and Aravind Srinivasan.
IEEE Symposium on Foundations of Computer Science (FOCS) 2010. Journal version in JACM 2011.
This paper settles an outstanding open question related to fair allocation of resources, known as the Santa Claus problem. More
generally, it extends the scope of algorithmic Local Lemma to handle much broader set of application that were not within the reach
of prior methods.
A Unified Approach to Ranking in Probabilistic Databases,
with Jian Li and Amol Deshpande.
35th International Conference on Very Large Data Bases (VLDB), 2009. BEST PAPER AWARD. Journal version in VLDB Journal 2011.
This paper provides a general framework for ranking based on probability-generating functions that captures most prior definitions
of ranking. Prior to our work, there were many proposals for ranking when data is uncertain with widely different behaviors.
See the complete list here.
- TreeScope: Finding Structural Anomalies In Semi-Structured Data, with Shanshan Ying, Flip Korn, and Divesh Srivastava. 41st International Conference on Very Large Data Bases (VLDB), 2015, (Demo).
- Language Edit Distance & Maximum Likelihood Parsing of Stochastic Grammars: Faster Algorithms & Connection to Fundamental Graph Problems, Barna Saha, 56th IEEE Symposium on Foundations of Computer Science (FOCS) 2015.
- Size-constrained Weighted Set Cover, with Lukasz Golab, Flip Korn, Feng Li and Divesh Srivastava. 31st IEEE International Conference on Data Engineering (ICDE), 2015.
- The Dyck Language Edit Distance Problem in Near-linear Time,
Barna Saha. 55th IEEE Symposium on Foundations of Computer Science (FOCS) 2014.
- A Constant Factor Approximation Algorithm for Fault Tolerant k-Median Problem,
with Mohammadtaghi Hajiaghayi, Wei Hu, Jian Li and Shi Li. ACM-SIAM Symposium on Discrete Algorithms (SODA), 2014.
- Hierarchical Graph Partitioning,
with Mohammadtaghi Hajiaghayi, Theodore Johnson and Reza Khani. 26th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA) 2014.
- Data Quality: The other Face of Big Data,
with Divesh Srivastava. Tutorial, 30th IEEE International Conference on Data Engineering (ICDE), 2014.
- Distributed Data Placement to Minimize Communication Costs via Graph Partitioning,
with Lukasz Golab, Marios Hadjieleftheriou, and Howard Karloff. 26th International Conference on Scientific and Statistical Database Management (SSDBM), 2014.
Algorithms for Data Science, Spring 2016, UMass Amherst. (This is a new course for the recently started Masters program on Data Science.)
Reasoning with Uncertainty, Spring 2016, UMass Amherst (Jointly teaching with Andrew McGregor)
Big Data Algorithms & Applications, Spring 2015, UMass Amherst
Approximation Algorithms & Combinatorial Optimization, Spring 2015, UMass Amherst
Algorithmic Techniques for Big Data Analysis, Fall 2013, University of Minnesota
My Phan (Ph.D., UMass Amherst)
Brendan Teich (Undergraduate, UMass Amherst)
Vivek Krishnamurthy (Undergraduate, UMass Amherst)
Manish Purohit (Summer 2013, Ph.D. student, University of Maryland College Park),
Shanshan Ying (Summer 2012, Ph.D. student, National University of Singapore),
Kook Jin Ahn (Summer 2012, Ph.D., University of Pennsylvania, Currently at Google),
Donatella Firmani ,(Summer 2012, Ph.D., Sapienza University of Rome, Currently Postdoc at University of Rome Tor Vergata).
Harmeet Jandu (Fall 2012, Masters Student, Rutgers University)