ÿþ<html>
<head>
<title>Barna Saha: Assistant Professor, University of Massachusetts, Amherst</title>
<script language="javascript">
location.replace("http://barnasaha.net");
</script>
<meta http-equiv="refresh" content= 0; URL=http://barnasaha.net">
</head>
<h1 align=left style='text-align:left'><span style='mso-fareast-font-family: "Times New Roman"'><span class=SpellE>  Barna</span> <span class=SpellE>Saha</span><o:p></o:p></span></h1>
<table width="100%" border="0" bgcolor="#FFFFFF" cellpadding="15" cellspacing="0">
<tr>
<td width="70%" height="4" valign="top" colspan="2" border="0">
I am an Assistant Professor in the <a href="https://www.cs.umass.edu/">
College of Information & Computer Science</a> at the University of Massachusetts Amherst. I am also a Permanent Member of <a href="http://dimacs.rutgers.edu/">
Center for Discrete Mathematics and Theoretical Computer Science (DIMACS)</a> at Rutgers.
Before joining UMass in 2014, I was a Research Scientist at
<a href="http://www.research.att.com/editions/201106_home.html">
AT&T Shannon Laboratories, New Jersey</a>. I spent four wonderful years (2007-2011) at the University of Maryland College Park from where I received my Ph.D. in Computer Science.
<p> <font color="#990000">Currently, I am spending a semester at the University of California Berkeley as a Visiting Scholar and as a fellow of the Simons Institute.</font>
<p> My research interests span
<ul>
<li> Algorithm Design and Analysis,
<li> Large Scale Data Analytics,
<li> Randomization in Computation
</ul>
<p><ul style="list-style-image:url('https://people.cs.umass.edu/~barna/sqpurple.gif')">
<li><b>Check out a list of my <a href="https://people.cs.umass.edu/~barna/projects.html"> ACTIVE PROJECTS</a> to know more.</b> </li></ul>
I particularly enjoy working on problems that are tied to core applications but have potentials to lead to beautiful theory.
I primarily focus on designing fast algorithms for a variety of applications, or understanding fundamental limitations of why that is not possible. This includes problems in pattern matching, network and graph problems, data quality, crowd sourcing, resource allocation etc. You will find the recurring theme of
randomization and approximation in all my works--though the reasons they are used could be different.
<p> Contact: Email. [my first name]@cs.umass.edu, firstname.cs@gmail.com </p>
</td>
<td align="right" valign="top">
<font color="#990000" face="Arial" size="3">
<img border="1" src="pic1.jpg" width="280" height="280" hspace="12" vspace="12">
</td>
</tr>
</table>
<hr>
<table width="100%" border="0" bgcolor="#FFFFFF" cellpadding="15" cellspacing="0">
<tr>
<td width= 40%" height="4" valign="top" colspan="2" border="0">
<li style="list-style-image:url('https://people.cs.umass.edu/~barna/sqpurple.gif')">
<b> <font color="#990000" face="Arial" size= 1 ><h3>Recent Awards.</h3></font></b></li>
<ul><li><a href="http://www.nsf.gov/awardsearch/showAward?AWD_ID=1464310"><font color=#ff0000>NSF CRII Award</font></a>, 2015.
<li><a href="http://simons.berkeley.edu/"><font color=#ff0000>Simons Research Fellowship</font></a>, 2015.
<li> <font color=#ff0000>Yahoo! Academic Career Enhancement (ACE) Award</font>, 2015.
</ul>
<li style="list-style-image:url('https://people.cs.umass.edu/~barna/sqpurple.gif')">
<b><font color="#990000" face="Arial" size= 1 ><h3>Program Committee Member.</h3></font></b></li><ul>
<li> Guest Editor: <b>Transactions on Algorithms</b> SODA 16 Special Issue
<p>
<li> <b> SODA 2016</b>
<li> <b> WSDM 2016</b>
<li> <b>VLDB 2015</b>
<li> <b> ECML/PKDE 2015 (Industrial Track)</b>
<li> <b> CIKM 2015</b>
<li> <b> APPROX 2014</b>
<li> <b> CIKM 2014</b>
<li> <b>Grace Hopper Celebration Technical Committee 2012</b>.
</ul>
<td align= left valign="top">
<li style="list-style-image:url('https://people.cs.umass.edu/~barna/sqpurple.gif')"> <b> <font color="#990000" face="Arial" size="3"><h3>Recent Invited Talks/Visits </h3></font></b></li>
<ul>
<li><a href="https://simons.berkeley.edu/workshops/complexity2015-3">Computational Complexity of Low-Polynomial Time Problems</a> Simons Berkley Workshop (Dec 2015)</li>
<li> Computer Science Colloquium, USC (Nov 2015)
<li> Computer Science Colloquium, UCSB (Nov 2015)
<li> Algorithms and Complexity Seminar, MIT (Oct 2015)
<li><a href="https://www.math.mcgill.ca/bshepherd/Bellairs2015.html">Bellairs Workshop on Combinatorial Optimization</a> (April 2015)
<li> <a href="https://sites.google.com/site/awmmath/home/awm-research-symposium-2015">Association for Women in Mathematics</a> (April 2015)
<li>Google Mountain View (Feb 2015)
</ul>
<ul>
<li> <b><font color= blue >Tutorial</font></b> (not so recent): <b>Data Quality: The other Face of Big Data</b>, with Divesh Srivastava. ICDE, 2014.
</ul>
<li style="list-style-image:url('https://people.cs.umass.edu/~barna/sqpurple.gif')"> <b> <font color="#990000" face="Arial" size="3"><h3>Outreach Activities</h3></font></b></li>
<ul>
<li> Mentor and member of the NSF funded organization<b> Computing Behind Double Bind</b> to increase support for women students in STEM.
<li> Panel Chair of the upcoming <b> New England Celebration of Women in Computing </b> organized by the ACM Council on Women in Computing.
</ul>
</td>
</tr>
</table>
<hr>
<h3> Selected Publications.
See the complete list <a href="https://people.cs.umass.edu/~barna/publication.html">here.</a> Also check <a href="https://people.cs.umass.edu/~barna/projects.html"> here</a>
to see papers (partial list) classified by Topic</h3>
Author names are in alphabetical order in most of my publications as per tradition in TCS. <a href="CV.pdf"><font color="#000000" size="3"><del>Link to My CV</del></font></a>(Updated December, 2013)
<br></br>
<li><a href="http://arxiv.org/abs/1411.7315"><b>Language Edit Distance & Maximum Likelihood Parsing of Stochastic Grammars: Faster Algorithms & Connection to Fundamental Graph Problems,</b></a>
<br>     Barna Saha.
<br>     IEEE Symposium on Foundations of Computer Science (<b>FOCS</b>) 2015.
<br> Formal languages such as context-free grammars are used to model data formats/semantics in wide-array of applications. Any deviation
from this model can be captured by language edit distance problem, which estimates the minimum number of changes in data to validate it
according to the model. This works gives the first ever sub-cubic algorithm with near-optimal approximation guarantee for this problem, an improvement
after forty years. More
over, it also provides the first sub-cubic algorithm for stochastic context free grammar parsing. Stochastic context free grammars
lie at the foundation of statistical natural language processing, they generalize hidden Markov models, and have found widespread applications.
We also provide interesting connections between these problems and several fundamental graph problems.</br>
<br>
<li><a href="https://people.cs.umass.edu/~barna/paper/focs-full.pdf"><b>The Dyck Language Edit Distance Problem in Near-linear Time</b></a>,
<br>     Barna Saha.
<br>     IEEE Symposium on Foundations of Computer Science (<b>FOCS</b>) 2014. Journal version under review in <b>JACM</b>.
<br> Given a string of parentheses how many edits are required to make it well-balanced? This problem known as Dyck Language Edit
Distance generalizes the well-studied string edit distance problem, and has found many applications in RNA secondary structure
prediction (RNA Folding), compiler optimization and recently in data quality (see our <b>VLDB</b>'13 paper). This paper provides a near-linear time
algorithm for Dyck language Edit distance (and for many other languages) improving over cubic run time of prior algorithms with
nontrivial approximation guarantees.</br>
<br>
<li> <p style="margin-top: 0pt;"><a href="https://people.cs.umass.edu/~barna/paper/focs10.pdf"><b> New Constructive Aspects of the Lovasz Local Lemma</b></a>,
<br>     with Bernhard Haeupler and Aravind Srinivasan.
<br>     IEEE Symposium on Foundations of Computer Science (<b>FOCS</b>) 2010. Journal version in <b>JACM</b> 2011.
<br> This paper settles an outstanding open question related to fair allocation of resources, known as the Santa Claus problem. More
generally, it extends the scope of algorithmic Local Lemma to handle much broader set of application that were not within the reach
of prior methods. </br>
<li><p style="margin-top: 0pt;"><a href="https://people.cs.umass.edu/~barna/paper/ranking-vldb.pdf"><b>A Unified Approach to Ranking in Probabilistic Databases</b></a>,
<br>     with Jian Li and Amol Deshpande.
<br>    35th International Conference on Very Large Data Bases (<b>VLDB</b>), 2009. <font color="#ff0000"><b>BEST PAPER AWARD</b></font>. Journal version in <b>VLDB Journal</b> 2011.
<br> This paper provides a general framework for ranking based on probability-generating functions that captures most prior definitions
of ranking. Prior to our work, there were many proposals for ranking when data is uncertain with widely different behaviors.
<h3>Recent Publications.
See the complete list <a href="https://people.cs.umass.edu/~barna/publication.html">here.</a></font></h3>
<ul>
<li><a href=""><b>TreeScope: Finding Structural Anomalies In Semi-Structured Data,</b></a> with Shanshan Ying, Flip Korn, and Divesh Srivastava. 41st International Conference on Very Large Data Bases (<b>VLDB</b>), 2015, (Demo).
<li><a href="http://arxiv.org/abs/1411.7315"><b>Language Edit Distance & Maximum Likelihood Parsing of Stochastic Grammars: Faster Algorithms & Connection to Fundamental Graph Problems,</b></a> Barna Saha, 56th IEEE Symposium on Foundations of Computer Science (<b>FOCS</b>) 2015.
<li><a href="https://people.cs.umass.edu/~barna/paper/ICDE15_research_131.pdf"><b>Size-constrained Weighted Set Cover,</b></a> with Lukasz Golab, Flip Korn, Feng Li and Divesh Srivastava. 31st IEEE International Conference on Data Engineering (<b>ICDE</b>), 2015.
<li><a href="https://people.cs.umass.edu/~barna/paper/focs-full.pdf"><b>The Dyck Language Edit Distance Problem in Near-linear Time</b></a>,
Barna Saha. 55th IEEE Symposium on Foundations of Computer Science (<b>FOCS</b>) 2014.
<li><a href="http://arxiv.org/abs/1307.2808"><b>A Constant Factor Approximation Algorithm for Fault Tolerant k-Median Problem</b></a>,
with Mohammadtaghi Hajiaghayi, Wei Hu, Jian Li and Shi Li. ACM-SIAM Symposium on Discrete Algorithms (<b>SODA</b>), 2014.
<li><a href="https://people.cs.umass.edu/~barna/paper/gp.pdf"><b>Hierarchical Graph Partitioning</b></a>,
with Mohammadtaghi Hajiaghayi, Theodore Johnson and Reza Khani. 26th ACM Symposium on Parallelism in Algorithms and Architectures (<b>SPAA</b>) 2014.
<li><a href="https://people.cs.umass.edu/~barna/paper/ICDE-Tutorial-DQ.pdf"><b>Data Quality: The other Face of Big Data</b></a>,
with Divesh Srivastava. Tutorial, 30th IEEE International Conference on Data Engineering (<b>ICDE</b>), 2014.
<li><a href="http://arxiv.org/abs/1312.0285"><b>Distributed Data Placement to Minimize Communication Costs via Graph Partitioning</b></a>,
with Lukasz Golab, Marios Hadjieleftheriou, and Howard Karloff. 26th International Conference on Scientific and Statistical Database Management (<b>SSDBM</b>), 2014.
</ul>
<hr>
<h3>Teaching </h3>
<ul>
Algorithms for Data Science, Spring 2016, UMass Amherst. (This is a new course for the recently started Masters program on Data Science.)
<br> Reasoning with Uncertainty, Spring 2016, UMass Amherst (Jointly teaching with Andrew McGregor)
<br><a href="https://people.cs.umass.edu/~barna/CMPSCI691DA.html"><b>Big Data Algorithms & Applications</b></a>, Spring 2015, UMass Amherst
<br><a href="https://people.cs.umass.edu/~barna/CMPSCI690AA.html"><b>Approximation Algorithms & Combinatorial Optimization</b></a>, Spring 2015, UMass Amherst
<br><a href="http://csci8980bigdataalgo.wordpress.com/"><b>Algorithmic Techniques for Big Data Analysis</b></a>, Fall 2013, University of Minnesota</p>
</ul>
<h3>Students</h3>
<ul>
<br>My Phan (Ph.D., UMass Amherst)
<br>Brendan Teich (Undergraduate, UMass Amherst)
<br>Vivek Krishnamurthy (Undergraduate, UMass Amherst)
<br><a href="http://www.cs.umd.edu/~manishp/">Manish Purohit</a> (Summer 2013, Ph.D. student, University of Maryland College Park),
<br><a href="http://www.comp.nus.edu.sg/~shanshan/">Shanshan Ying</a> (Summer 2012, Ph.D. student, National University of Singapore),
<br><a href="http://www.seas.upenn.edu/~kookjin/">Kook Jin Ahn <a/> (Summer 2012, Ph.D., University of Pennsylvania, Currently at Google),
<br><a href="http://www.dis.uniroma1.it/~firmani/component/content/">Donatella Firmani </a>,(Summer 2012, Ph.D., Sapienza University of Rome, Currently Postdoc at University of Rome Tor Vergata).
<br>Harmeet Jandu (Fall 2012, Masters Student, Rutgers University)
</ul>
</body>
</html>