Barna Saha

I am an Assistant Professor in the College of Information & Computer Science at the University of Massachusetts Amherst. I am also a Permanent Member of Center for Discrete Mathematics and Theoretical Computer Science (DIMACS) at Rutgers. Before joining UMass in 2014, I was a Research Scientist at AT&T Shannon Laboratories, New Jersey. I spent four wonderful years (2007-2011) at the University of Maryland College Park from where I received my Ph.D. in Computer Science.

Currently, I am spending a semester at the University of California Berkeley as a Visiting Scholar and as a fellow of the Simons Institute.

My research interests span

  • Algorithm Design and Analysis,
  • Large Scale Data Analytics,
  • Randomization in Computation

I particularly enjoy working on problems that are tied to core applications but have potentials to lead to beautiful theory. I primarily focus on designing fast algorithms for a variety of applications, or understanding fundamental limitations of why that is not possible. This includes problems in pattern matching, network and graph problems, data quality, crowd sourcing, resource allocation etc. You will find the recurring theme of randomization and approximation in all my works--though the reasons they are used could be different.

Contact: Email. [my first name]@cs.umass.edu, firstname.cs@gmail.com


  • Recent Awards.

  • Program Committee Member.

    • Guest Editor: Transactions on Algorithms SODA’16 Special Issue

    • SODA 2016
    • WSDM 2016
    • VLDB 2015
    • ECML/PKDE 2015 (Industrial Track)
    • CIKM 2015
    • APPROX 2014
    • CIKM 2014
    • Grace Hopper Celebration Technical Committee 2012.
  • Recent Invited Talks/Visits

    • Tutorial (not so recent): Data Quality: The other Face of Big Data, with Divesh Srivastava. ICDE, 2014.
  • Outreach Activities

    • Mentor and member of the NSF funded organization Computing Behind Double Bind to increase support for women students in STEM.
    • Panel Chair of the upcoming New England Celebration of Women in Computing organized by the ACM Council on Women in Computing.

    Selected Publications. See the complete list here. Also check here to see papers (partial list) classified by Topic

    Author names are in alphabetical order in most of my publications as per tradition in TCS. Link to My CV(Updated December, 2013)

  • Language Edit Distance & Maximum Likelihood Parsing of Stochastic Grammars: Faster Algorithms & Connection to Fundamental Graph Problems,
        Barna Saha.
        IEEE Symposium on Foundations of Computer Science (FOCS) 2015.
    Formal languages such as context-free grammars are used to model data formats/semantics in wide-array of applications. Any deviation from this model can be captured by language edit distance problem, which estimates the minimum number of changes in data to validate it according to the model. This works gives the first ever sub-cubic algorithm with near-optimal approximation guarantee for this problem, an improvement after forty years. More over, it also provides the first sub-cubic algorithm for stochastic context free grammar parsing. Stochastic context free grammars lie at the foundation of statistical natural language processing, they generalize hidden Markov models, and have found widespread applications. We also provide interesting connections between these problems and several fundamental graph problems.

  • The Dyck Language Edit Distance Problem in Near-linear Time,
        Barna Saha.
        IEEE Symposium on Foundations of Computer Science (FOCS) 2014. Journal version under review in JACM.
    Given a string of parentheses how many edits are required to make it well-balanced? This problem known as Dyck Language Edit Distance generalizes the well-studied string edit distance problem, and has found many applications in RNA secondary structure prediction (RNA Folding), compiler optimization and recently in data quality (see our VLDB'13 paper). This paper provides a near-linear time algorithm for Dyck language Edit distance (and for many other languages) improving over cubic run time of prior algorithms with nontrivial approximation guarantees.

  • New Constructive Aspects of the Lovasz Local Lemma,
        with Bernhard Haeupler and Aravind Srinivasan.
        IEEE Symposium on Foundations of Computer Science (FOCS) 2010. Journal version in JACM 2011.
    This paper settles an outstanding open question related to fair allocation of resources, known as the Santa Claus problem. More generally, it extends the scope of algorithmic Local Lemma to handle much broader set of application that were not within the reach of prior methods.

  • A Unified Approach to Ranking in Probabilistic Databases,
        with Jian Li and Amol Deshpande.
        35th International Conference on Very Large Data Bases (VLDB), 2009. BEST PAPER AWARD. Journal version in VLDB Journal 2011.
    This paper provides a general framework for ranking based on probability-generating functions that captures most prior definitions of ranking. Prior to our work, there were many proposals for ranking when data is uncertain with widely different behaviors.

    Recent Publications. See the complete list here.


    Teaching

    Students