Schedule
Note: This schedule is tentative and may change based on the composition and preferences of the class. |
Date |
Paper reading |
Presenters |
Lighter reading |
Sep 10 |
Overview
MapReduce: Simplified Data Processing on Large Clusters, Jeffrey Dean, Sanjay Ghemawat, OSDI 2004. |
Arun |
Challenges and opportunities with big data |
Sep 17 |
Distributed key-value stores
Bigtable: A Distributed Storage System for Structured Data, Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber, OSDI 2006.
Dynamo: Amazon's Highly Available Key-value Store, Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall and Werner Vogels, SOSP 2007. |
Arun MapReduce.pptx
Pengyu BigTable.pdf
|
The end of theory (Wired)
The weatherman is not a moron (NYTimes) |
|
Tutorial on datasets, tools, and project topics.
|
Aditya
Arun |
Big data, big impact: New possibilities for international development |
Oct 4 (Thu; moved from Oct 1) |
Distributed key-value stores
Comet: An Active Distributed Key-Value Store, Roxana Geambasu, Amit A. Levy, Tadayoshi Kohno, Arvind Krishnamurthy, and Henry M. Levy, University of Washington, OSDI 2010.
HyperDex: A Distributed, Searchable Key-Value Store, Robert Escriva, Bernard Wong, and Emin Gun Sirer, SIGCOMM 2012. |
Tongping
Brian |
Big data roadmap for government
TechAmerica Report |
Oct 9 (Tue) |
Enterprise data analytics
MAD Skills: New Analysis Practices for Big Data, Jeffrey Cohen, Brian Dolan, Mark Dunlap, Joseph M. Hellerstein, Caleb Welton, VLDB 2009.
SQL-MapReduce: A practical approach to self-describing, polymorphic, and parallelizable user-defined functions, Eric Friedman, Peter Pawlowski, John Cieslewicz, VLDB 2009.
|
Hardeep
Moaj |
|
Oct 15 |
Solving Big Data Challenges for Enterprise Application Performance Management, Tilmann Rabl, Mohammad Sadoghi, Hans-Arno Jacobsen, Victor Muntes Mulero, Serge Mankovskii, VLDB 2012.
A Comparison of Approaches to Large-Scale Data Analysis, Andrew Pavlo, Erik Paulson, Alexander Rasin, Daniel J. Abadi, David J. DeWitt, Samuel Madden, Michael Stonebraker, SIGMOD 2009. |
Hardeep
Abhigyan
|
|
Oct 22 |
Graph computation
PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs, Joseph E. Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson, and Carlos Guestrin, Carnegie Mellon University, OSDI 2012.
GraphChi: Large-Scale Graph Computation on Just a PC, Aapo Kyrola, Guy Blelloch, and Carlos Guestrin, Carnegie Mellon University, OSDI 2012. |
Daniel
Brian |
|
Nov 5 |
Datacenter storage and transport
Flat Datacenter Storage, Ed Nightingale and Jeremy Elson, Microsoft Research; Owen Hofmann, University of Texas at Austin; Yutaka Suzue, Jinliang Fan, and Jon Howell, Microsoft Research, OSDI 2012.
Managing data transfers in computer clusters with Orchestra, M. Chowdhury, M. Zaharia, J. Ma, M. I. Jordan, and I. Stoica, SIGCOMM 2011.
|
Sean
Aditya
|
|
Nov 14 |
The SCADS Director: Scaling a distributed storage system under stringent performance requirements, B. Trushkowsky, P. Bodik, A. Fox, M. Franklin, M. I. Jordan, and D. Patterson, FAST 2011.
CORFU: A Shared Log Design for Flash Clusters, Mahesh Balakrishnan, Dahlia Malkhi, Vijayan Prabhakaran, and Ted Wobber, Microsoft Research Silicon Valley; Michael Wei, University of California, San Diego; John D. Davis, Microsoft Research Silicon Valley, NSDI 2012. |
Sean
Sippakorn |
|
Nov 19 |
Managing computing and data
Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing, Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, and Ion Stoica, University of California, Berkeley, NSDI 2012.
Camdoop: Exploiting In-network Aggregation for Big Data Applications, Paolo Costa, Microsoft Research Cambridge and Imperial College London; Austin Donnelly, Antony Rowstron, and Greg O'Shea, Microsoft Research Cambridge, NSDI 2012. |
Aditya
Hardeep |
|
Nov 26 |
PACMan: Coordinated Memory Caching for Parallel Jobs, Ganesh Ananthanarayanan, Ali Ghodsi, and Andrew Wang, University of California, Berkeley; Dhruba Borthakur, Facebook; Srikanth Kandula, Microsoft Research; Scott Shenker and Ion Stoica, University of California, Berkeley, NSDI 2012.
Reoptimizing Data Parallel Computing, Sameer Agarwal, University of California, Berkeley; Srikanth Kandula, Microsoft Research; Nico Bruno and Ming-Chuan Wu, Microsoft Bing; Ion Stoica, University of California, Berkeley; Jingren Zhou, Microsoft Bing, NSDI 2012.
Optimizing Data Shuffling in Data-Parallel Computation by Understanding User-Defined Functions, Jiaxing Zhang and Hucheng Zhou, Microsoft Research Asia; Rishan Chen, Microsoft Research Asia and Peking University; Xuepeng Fan, Microsoft Research Asia and Huazhong University of Science and Technology; Zhenyu Guo and Haoxiang Lin, Microsoft Research Asia; Jack Y. Li, Microsoft Research Asia and Georgia Institute of Technology; Wei Lin and Jingren Zhou, Microsoft Bing; Lidong Zhou, Microsoft Research Asia, NSDI 2012. |
Moaj
Brian
Tongping |
|
Dec 3 |
Miscellaneous
Detecting Large-Scale System Problems by Mining Console Logs, W. Xu, L. Huang, A. Fox, D. Patterson, and M. I. Jordan, SOSP 2009.
Spanner: Google's Globally-Distributed Database, James C. Corbett, Jeffrey Dean, Michael Epstein, Andrew Fikes, Christopher Frost, JJ Furman, Sanjay Ghemawat, Andrey Gubarev, Christopher Heiser, Peter Hochschild, Wilson Hsieh, Sebastian Kanthak, Eugene Kogan, Hongyi Li, Alexander Lloyd, Sergey Melnik, David Mwaura, David Nagle, Sean Quinlan, Rajesh Rao, Lindsay Rolig, Yasushi Saito, Michal Szymaniak, Christopher Taylor, Ruth Wang, and Dale Woodford, OSDI 2012.
|
Brian
Abhigyan |
|
Dec 10 |
Project presentations |
|
|
Dec 16 |
Project reports due |
|
|
|