CS691: Big Data Systems

 

Fall 2012

     


 

 
Note: This schedule is tentative and may change based on the composition and preferences of the class.
Date | Paper reading | Presenters | Lighter reading
Sep 10

Overview
MapReduce: Simplified Data Processing on Large Clusters, Jeffrey Dean, Sanjay Ghemawat, OSDI 2004.

Arun

Challenges and opportunities with big data
Sep 17

Distributed key-value stores
Bigtable: A Distributed Storage System for Structured Data, Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber, OSDI 2006.
Dynamo: Amazon's Highly Available Key-value Store, Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall and Werner Vogels, SOSP 2007.

Arun MapReduce.pptx

Pengyu BigTable.pdf

 

The end of theory (Wired)

The weatherman is not a moron (NYTimes)

Sep 24

Tutorial on datasets, tools, and project topics.

Aditya

Arun

Big data, big impact: New possibilities for international development

Oct 4 (Thu), rescheduled from Oct 1

Distributed key-value stores
Comet: An Active Distributed Key-Value Store, Roxana Geambasu, Amit A. Levy, Tadayoshi Kohno, Arvind Krishnamurthy, and Henry M. Levy, University of Washington, OSDI 2010.
HyperDex: A Distributed, Searchable Key-Value Store, Robert Escriva, Bernard Wong, and Emin Gun Sirer, SIGCOMM 2012.

Tongping

Brian

Big data roadmap for government

TechAmerica Report

Oct 9 (Tue)

Enterprise data analytics
MAD Skills: New Analysis Practices for Big Data, Jeffrey Cohen, Brian Dolan, Mark Dunlap, Joseph M. Hellerstein, Caleb Welton, VLDB 2009.
SQL/MapReduce: A practical approach to self-describing, polymorphic, and parallelizable user-defined functions, Eric Friedman, Peter Pawlowski, John Cieslewicz, VLDB 2009.

Hardeep

Moaj

 
Oct 15

Solving Big Data Challenges for Enterprise Application Performance Management, Tilmann Rabl, Mohammad Sadoghi, Hans-Arno Jacobsen, Victor Muntes Mulero, Serge Mankovskii, VLDB 2012.

A Comparison of Approaches to Large-Scale Data Analysis, Andrew Pavlo, Erik Paulson, Alexander Rasin, Daniel J. Abadi, David J. DeWitt, Samuel Madden, Michael Stonebraker, SIGMOD 2009.

Hardeep

Abhigyan

 
Oct 22

Graph computation
PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs, Joseph E. Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson, and Carlos Guestrin, Carnegie Mellon University, OSDI 2012.
GraphChi: Large-Scale Graph Computation on Just a PC, Aapo Kyrola, Guy Blelloch, and Carlos Guestrin, Carnegie Mellon University, OSDI 2012.

Daniel

Brian

 

Nov 5

Datacenter storage and transport
Flat Datacenter Storage, Ed Nightingale and Jeremy Elson, Microsoft Research; Owen Hofmann, University of Texas at Austin; Yutaka Suzue, Jinliang Fan, and Jon Howell, Microsoft Research, OSDI 2012.
Managing data transfers in computer clusters with Orchestra, M. Chowdhury, M. Zaharia, J. Ma, M. I. Jordan, and I. Stoica, SIGCOMM 2011.

Sean

Aditya

 

 
Nov 14

The SCADS Director: Scaling a distributed storage system under stringent performance requirements, B. Trushkowsky, P. Bodik, A. Fox, M. Franklin, M. I. Jordan, and D. Patterson, FAST 2011.

CORFU: A Shared Log Design for Flash Clusters, Mahesh Balakrishnan, Dahlia Malkhi, Vijayan Prabhakaran, and Ted Wobber, Microsoft Research Silicon Valley; Michael Wei, University of California, San Diego; John D. Davis, Microsoft Research Silicon Valley, NSDI 2012.

Sean

Sippakorn

 

Nov 19

Managing computing and data
Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing, Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, and Ion Stoica, University of California, Berkeley, NSDI 2012.

Camdoop: Exploiting In-network Aggregation for Big Data Applications, Paolo Costa, Microsoft Research Cambridge and Imperial College London; Austin Donnelly, Antony Rowstron, and Greg O'Shea, Microsoft Research Cambridge, NSDI 2012.

Aditya

 

Hardeep

 
Nov 26

PACMan: Coordinated Memory Caching for Parallel Jobs, Ganesh Ananthanarayanan, Ali Ghodsi, and Andrew Wang, University of California, Berkeley; Dhruba Borthakur, Facebook; Srikanth Kandula, Microsoft Research; Scott Shenker and Ion Stoica, University of California, Berkeley, NSDI 2012.

Reoptimizing Data Parallel Computing, Sameer Agarwal, University of California, Berkeley; Srikanth Kandula, Microsoft Research; Nico Bruno and Ming-Chuan Wu, Microsoft Bing; Ion Stoica, University of California, Berkeley; Jingren Zhou, Microsoft Bing, NSDI 2012.

Optimizing Data Shuffling in Data-Parallel Computation by Understanding User-Defined Functions, Jiaxing Zhang and Hucheng Zhou, Microsoft Research Asia; Rishan Chen, Microsoft Research Asia and Peking University; Xuepeng Fan, Microsoft Research Asia and Huazhong University of Science and Technology; Zhenyu Guo and Haoxiang Lin, Microsoft Research Asia; Jack Y. Li, Microsoft Research Asia and Georgia Institute of Technology; Wei Lin and Jingren Zhou, Microsoft Bing; Lidong Zhou, Microsoft Research Asia, NSDI 2012.

Moaj

Brian

Tongping

 
Dec 3

Miscellaneous
Large-scale system problems detection by mining console logs, W. Xu, L. Huang, A. Fox, D. Patterson, and M. I. Jordan, SOSP 2009.

Spanner: Google's Globally-Distributed Database, James C. Corbett, Jeffrey Dean, Michael Epstein, Andrew Fikes, Christopher Frost, JJ Furman, Sanjay Ghemawat, Andrey Gubarev, Christopher Heiser, Peter Hochschild, Wilson Hsieh, Sebastian Kanthak, Eugene Kogan, Hongyi Li, Alexander Lloyd, Sergey Melnik, David Mwaura, David Nagle, Sean Quinlan, Rajesh Rao, Lindsay Rolig, Yasushi Saito, Michal Szymaniak, Christopher Taylor, Ruth Wang, and Dale Woodford, OSDI 2012.

Brian

Abhigyan

 
Dec 10

Project presentations

 

 

Dec 16

Project reports due