As the Big Data era unfolds, there is a pressing demand for unified toolkits for scalable data analytics. My current work presents one such generic framework, based on formal language specifications, that brings many disparate problems under one umbrella and promises substantial speedups for big data analytics.
Formal languages such as context-free grammars are used to model data formats and semantics in a wide array of applications, e.g., human behavior analysis, network data modeling and event detection, structural phenomena in biological sequences, speech processing, program syntax analysis, and topic identification in unstructured and semi-structured text. While the expressive power of formal language modeling is well understood, grammar-based computations may have very high time and space requirements. This severely limits the applicability of the approach to large-scale data. Our goal is to overcome this barrier and make grammar-based computations accessible to their many applications.
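To make the cost concern concrete, here is a minimal sketch of the textbook CYK recognizer for a context-free grammar in Chomsky normal form. It is a standard illustration, not the framework described here; the function name `cyk_recognize`, the rule encoding, and the balanced-parentheses grammar are all assumptions chosen for the example. The three nested loops over span length, start position, and split point give running time cubic in the input length (times the number of rules), and the table takes quadratic space, which is the kind of cost that becomes prohibitive on large inputs.

```python
def cyk_recognize(tokens, unary_rules, binary_rules, start="S"):
    """Return True if `tokens` is derivable from `start`.

    Assumes a grammar in Chomsky normal form (illustrative encoding):
      unary_rules:  list of (lhs, terminal)
      binary_rules: list of (lhs, (left_nonterminal, right_nonterminal))
    """
    n = len(tokens)
    if n == 0:
        return False  # this sketch does not handle the empty string

    # table[i][j] = set of nonterminals deriving tokens[i..j] (inclusive)
    table = [[set() for _ in range(n)] for _ in range(n)]

    # Spans of length 1: match terminals directly.
    for i, tok in enumerate(tokens):
        table[i][i] = {lhs for lhs, rhs in unary_rules if rhs == tok}

    # Longer spans: try every split point k for every (i, j).
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):
                for lhs, (left, right) in binary_rules:
                    if left in table[i][k] and right in table[k + 1][j]:
                        table[i][j].add(lhs)

    return start in table[0][n - 1]


# Hypothetical example grammar: non-empty balanced parentheses.
#   S -> L R | L X | S S,   X -> S R,   L -> "(",   R -> ")"
UNARY = [("L", "("), ("R", ")")]
BINARY = [
    ("S", ("L", "R")),
    ("S", ("L", "X")),
    ("S", ("S", "S")),
    ("X", ("S", "R")),
]

if __name__ == "__main__":
    for text in ["(())", "()()", "(()", ")("]:
        print(text, cyk_recognize(list(text), UNARY, BINARY))
```

Doubling the input length multiplies the work of this recognizer by roughly eight, which is manageable for short strings but not for sequences with millions of symbols.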
Grammar-based computations are closely related to fundamental graph problems and statistical inference algorithms. See the following publications for more details.