
Making Cloud Intermediate Data Fault-Tolerant


Presentation Transcript


  1. Making Cloud Intermediate Data Fault-Tolerant Steven Y. Ko*, Imranul Hoque, Brian Cho, Indranil Gupta Presentation by John Shu Dept. of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA • *Dept. of Computer Science, Princeton University, Princeton, NJ, USA

  2. Agenda • Terminology • Trends • Motivation • Key Issues • Solutions • Implementation • Results • Conclusion

  3. Terminology • MapReduce: MapReduce is a software framework from Google that supports distributed computing over large data sets on clusters of computers (see the sketch below). • Map Step: The master node partitions the input and distributes the pieces to worker nodes, which send their answers back to the master. • Reduce Step: The master node combines the workers' answers to produce the final answer for the original input.
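
A minimal, single-process sketch of these two steps (the function names and toy data are illustrative, not Hadoop's API):

```python
# Minimal single-process sketch of the map/reduce pattern described above.
from collections import defaultdict

def map_word_count(chunk):
    """Map step: emit (word, 1) pairs for one input chunk."""
    return [(word, 1) for word in chunk.split()]

def reduce_word_count(word, counts):
    """Reduce step: combine all counts emitted for one word."""
    return word, sum(counts)

# The "master" splits the input into chunks and hands them to map tasks.
input_chunks = ["the quick brown fox", "the lazy dog", "the fox"]
intermediate = [pair for chunk in input_chunks for pair in map_word_count(chunk)]

# Group intermediate pairs by key, then run the reduce step per key.
groups = defaultdict(list)
for word, count in intermediate:
    groups[word].append(count)

result = dict(reduce_word_count(w, c) for w, c in groups.items())
print(result)  # {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```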

  4. Terminology • MapReduce: The paper outlines three major steps in the standard implementation. • Map Step: The Map phase executes user-provided functions in parallel. The input is divided into chunks and stored in a distributed file system (DFS); Map tasks read the chunks, generate Intermediate Data (ID), and store it for the next stage. • Shuffle Step: The Shuffle phase moves the generated ID among the machines in the cluster. The communication pattern is all-to-all, from every map task to every reduce task (see the sketch below). • Reduce Step: The Reduce phase executes user-provided functions in parallel over the MapReduce cluster and stores its output in the DFS. For a single-stage job this output is the final result; for a multi-stage job it becomes ID for the next stage.
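
A small sketch of the shuffle's all-to-all partitioning (the toy hash function, task names, and data are assumptions for illustration, not Hadoop internals):

```python
# Sketch of the shuffle step: intermediate (key, value) pairs produced by every
# map task are partitioned by key so that each reduce task receives all values
# for the keys assigned to it.
from collections import defaultdict

NUM_REDUCERS = 2

def partition(key, num_reducers=NUM_REDUCERS):
    """Assign a key to a reduce task (simple deterministic hash for the sketch)."""
    return sum(ord(c) for c in key) % num_reducers

# Intermediate data as emitted by two map tasks.
map_outputs = [
    [("apple", 1), ("banana", 1), ("apple", 1)],  # map task 0
    [("banana", 1), ("cherry", 1)],               # map task 1
]

# All-to-all movement: each map task's output is split across all reduce tasks.
reducer_inputs = defaultdict(list)
for task_output in map_outputs:
    for key, value in task_output:
        reducer_inputs[partition(key)].append((key, value))

for r in range(NUM_REDUCERS):
    print(f"reduce task {r} receives: {sorted(reducer_inputs[r])}")
```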

  5. Trends • With the advent of cloud computing, the already existing need for copious amounts of data processing is hitting record highs; Yahoo's web graph generation alone receives 280 TB of input a day. • Parallel dataflow programming offers a feasible solution through frameworks such as MapReduce, Dryad, Pig, and Hive. • Organizations like A9.com, AOL, Facebook, Yahoo, and the New York Times use Hadoop, an open-source implementation of MapReduce. • Parallel programs written with these frameworks run in data centers, both private clouds (e.g., Microsoft) and public ones (e.g., UIUC).

  6. Motivation • In these huge data facilities, the focus is on efficient performance and productivity without sacrificing processing time. • Parallel dataflow programs generate enormous amounts of distributed data that is short-lived yet crucial for completion of the job and for run-time performance. • This distributed data is called Intermediate Data (ID), and this paper focuses on minimizing the effects of run-time server failures on the availability of ID and on performance metrics such as job completion time.

  7. ID Background • Intermediate Data is data generated, directly or indirectly from the input, during the execution of a parallel dataflow program; it excludes both the input data and the final output. • The framework operates as a sequence of MapReduce jobs: during their execution, ID is produced as the output of one stage and serves as the input to the next stage (see the sketch below). • The ID therefore has to be distributed among the nodes of the cluster, and its continuous propagation adds up to an enormous volume of data.
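
A toy multi-stage pipeline that shows where ID appears (the stage functions are hypothetical placeholders, not taken from the paper):

```python
# Each stage's output is intermediate data consumed by the next stage;
# only the last stage's output is the final result.
def stage_tokenize(lines):
    return [word for line in lines for word in line.split()]

def stage_count(words):
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts

def stage_top(counts):
    return max(counts.items(), key=lambda kv: kv[1])

pipeline = [stage_tokenize, stage_count, stage_top]
data = ["the fox", "the dog", "the fox"]

for i, stage in enumerate(pipeline):
    data = stage(data)
    kind = "final output" if i == len(pipeline) - 1 else "intermediate data"
    print(f"after stage {i}: {kind} -> {data}")
```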

  8. ID Background (figure from [Making Cloud Intermediate Data Fault-Tolerant, Steven Y. Ko et al.])

  9. Key Issues • Intermediate Data is short-lived, used immediately, written once, and read once. It is stored in blocks that are distributed at large scale across the cluster. • The blocks from the previous stage have to be ready before execution of the next stage starts; in essence, performance and job completion depend on the ID of one stage being generated before the next stage can run. • ID can be lost when a server fails, and for some small-scale Hadoop applications this has prolonged completion time by roughly 50%.

  10. Existing Solutions • First approach: data is stored locally in the machine's native file system and fetched remotely by the tasks of the next stage. The data is not replicated, and a failure results in re-execution of tasks. • Second approach: data is written back to a distributed file system (DFS), where it is automatically replicated. This adequately supports fault tolerance but incurs significant overhead. (See the toy comparison below.)
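
A toy comparison of the two approaches, under simplified assumptions (one block, one machine failure, each replica on a distinct machine):

```python
def write_local_store(block):
    """Local-store approach: a single unreplicated copy on the producing machine."""
    return [block]

def write_dfs(block, replication_factor=3):
    """DFS approach: the writer blocks until all replicas are written,
    which is the source of the overhead mentioned above."""
    return [block] * replication_factor

def survives(copies, failed_machines=1):
    """The block survives if more copies exist than machines that failed."""
    return len(copies) > failed_machines

local_copies = write_local_store("map-output-0")
dfs_copies = write_dfs("map-output-0")
print("local store survives one failure:", survives(local_copies))     # False -> re-execute the task
print("DFS (3 replicas) survives one failure:", survives(dfs_copies))  # True, at a higher write cost
```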

  11. Proposed Solution • The goal is to achieve performance as good as the local store while also providing the fault tolerance of the DFS approach. • With the local store, a single failure results in cascaded re-execution (illustrated in the sketch below). • If this can be avoided by a robust fault-tolerance mechanism that keeps overhead low, we can produce a better system.
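
A back-of-the-envelope sketch of cascaded re-execution (the stage costs are made-up units, not measurements from the paper):

```python
# With the local store, losing the machine that holds stage k's intermediate
# data can force every earlier stage to run again, because their outputs were
# also stored only locally; with replicated ID, only stage k is redone.
def reexecution_cost(stage_costs, failed_after_stage, replicated):
    if replicated:
        return stage_costs[failed_after_stage]
    return sum(stage_costs[: failed_after_stage + 1])

stage_costs = [10, 20, 30, 40]  # per-stage cost in arbitrary units
print("no replication:  ", reexecution_cost(stage_costs, 2, replicated=False))  # 60: stages 0..2 redone
print("with replication:", reexecution_cost(stage_costs, 2, replicated=True))   # 30: only stage 2 redone
```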

  12. Proposed Solution: The How? • Asynchronous replication, which allows writers to proceed without waiting for replication to complete. • Rack-level replication, where replicas of intermediate data blocks are always placed among the machines of the same rack. • Selective replication, which replicates only the data that will be consumed locally, reducing the total amount of replication to be done. (See the sketch after this list.)
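
A minimal sketch of how the three decisions could fit together (the topology, node names, and data structures are assumptions for illustration, not the actual ISS or Hadoop code):

```python
# Asynchronous replication via a background thread, a replica target chosen
# within the writer's rack, and selective replication that copies only blocks
# consumed on the machine that produced them (blocks shuffled to another
# machine already end up with a second copy on the consumer).
import threading

RACK_OF = {"node1": "rackA", "node2": "rackA", "node3": "rackB"}  # assumed topology

def pick_rack_local_replica(writer, racks=RACK_OF):
    """Rack-level replication: choose another machine in the writer's rack."""
    return next(n for n, r in racks.items() if r == racks[writer] and n != writer)

def replicate_async(block, target, done):
    """Asynchronous replication: the writer does not wait for this to finish."""
    t = threading.Thread(target=lambda: done.append((block, target)))
    t.start()
    return t

def write_block(block, writer, consumer, done):
    # Selective replication: skip blocks that a remote consumer will fetch anyway.
    if consumer != writer:
        return None
    return replicate_async(block, pick_rack_local_replica(writer), done)

done = []
threads = [t for t in (write_block("b0", "node1", "node1", done),   # consumed locally -> replicate
                       write_block("b1", "node1", "node3", done))   # shuffled remotely -> skip
           if t]
for t in threads:
    t.join()
print(done)  # [('b0', 'node2')] -- b0 replicated within rackA; b1 not replicated
```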

  13. Implementation • The scheme described above was dubbed ISS (Intermediate Storage System). • It replicates the outputs of Map and Reduce tasks with significantly less overhead than the DFS approach, while preventing cascaded re-execution for multi-stage MapReduce programs. • ISS is not a stand-alone framework; it is implemented as an extension to Hadoop, and it performs well enough to eliminate the need for a separate shuffle phase altogether.

  14. Results

  15. Results

  16. Conclusion • We have shown the need for, presented requirements towards, and designed a new intermediate storage system (ISS) that treats intermediate data in dataflow programs as a first-class citizen in order to tolerate failures. • We have also shown that our asynchronous, rack-level, selective replication mechanism is effective and masks interference very well. • Under a failure, ISS incurs only up to 18% overhead compared to Hadoop with no failures.

  17. References • B. F. Cooper, R. Ramakrishnan, U. Srivastava, A. Silberstein, P. Bohannon, H.-A. Jacobsen, N. Puz, D. Weaver, and R. Yerneni. PNUTS: Yahoo!'s Hosted Data Serving Platform. In Proceedings of the International Conference on Very Large Data Bases (VLDB), 2008. • M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly. Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks. In Proceedings of the 2007 EuroSys Conference (EuroSys), 2007. • M. K. Aguilera, A. Merchant, M. A. Shah, A. Veitch, and C. Karamanolis. Sinfonia: A New Paradigm for Building Scalable Distributed Systems. In Proceedings of the ACM Symposium on Operating Systems Principles (SOSP), 2007.

  18. Questions?
