STOC Workshop: Algorithms for Distributed and Streaming Data

We traditionally think of algorithms as running on data available in a single location, typically in main memory or at least on disk. However, in many modern applications the data is too large to reside in a single location (terabyte- and petabyte-sized datasets are increasingly common), arrives incrementally over time, is noisy and uncertain, or all of the above. Processing such data requires new algorithms and new models of computation. In recent years, practitioners have turned to MapReduce-based systems such as Hadoop for large-scale data analysis, to data stream analysis systems such as AT&T's Gigascope for making sense of fast-arriving data, and to systems such as Storm and S4 for real-time distributed computation on streaming data.

These practical developments represent huge opportunities for the theory community: what are the appropriate computational abstractions for these systems, and how should we go about designing algorithms that are efficient in these models? What are the opportunities for industrial impact? What should we teach our undergraduate and graduate students? Our goals in this workshop are to a) survey the basic models that have been proposed, b) present representative algorithmic results, and c) highlight open problems and new directions of research.

Where and When:

Date: 19 May, 2012

Location: Room 101 in Warren Weaver Hall, 251 Mercer St, New York University

More: See the STOC conference website for further details and information about other STOC tutorials and workshops.

Schedule:

Time	Speaker	Title	Slides
1:30-2:30	Sergei Vassilvitskii, Google	Distributed and Parallel Models (Survey)	Slides
2:30-3:30	Andrew McGregor, UMass Amherst	Data Streams and Linear Sketches (Survey)	Slides
3:30-4:00	Coffee Break
4:00-4:40	John Langford, Microsoft Research	Special Topics: Fun Machine Learning Problems on Big Data	Slides
4:40-5:20	Piotr Indyk, MIT	Special Topics: CS on CS: Computer Science Insights into Compressive Sensing (and vice versa)	Slides
5:20-6:00	Ashish Goel, Stanford and Twitter	Special Topics: Challenges in Industry and Education	Slides