CARDIO: Cost-Aware Replication for Data-Intensive workflOws

Presented by Chen He

Motivation

- Are large-scale clusters reliable?
- On average, 5 worker deaths per MapReduce job
- At least 1 disk failure in every run of a 6-hour MapReduce job on a 4000-node cluster

Motivation

- How can node failures be kept from affecting performance?
- Replication: limited by storage capacity, replication time, etc.
- Regeneration through re-execution: delays program progress and can trigger cascaded re-execution

Outline

- Problem Exploration
- CARDIO Model
- Hadoop CARDIO System
- Evaluation
- Discussion

Problem Exploration

- Performance Costs
- Replication cost (R)
- Regeneration cost (G)
- Reliability cost (Z)
- Execution cost (A)
- Total cost (T)
- Disk cost (Y)

T = A + Z

Z = R + G
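As a small worked example of the decomposition above, assuming every cost is expressed in the same time unit and is summed over stages (the per-stage breakdown and the numbers are hypothetical):

```python
from dataclasses import dataclass


@dataclass
class StageCost:
    execution: float     # A_i: execution cost of stage i (hypothetical per-stage split)
    replication: float   # R_i: cost of replicating stage i's output
    regeneration: float  # G_i: expected cost of regenerating lost output


def total_cost(stages):
    """Return (T, Z) with Z = R + G and T = A + Z, each summed over all stages."""
    a = sum(s.execution for s in stages)
    z = sum(s.replication + s.regeneration for s in stages)
    return a + z, z


stages = [StageCost(10.0, 2.0, 1.5), StageCost(20.0, 4.0, 0.5)]
t, z = total_cost(stages)
print(t, z)  # 38.0 8.0
```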

Problem Exploration

- Experiment Environment
- Hadoop 0.20.2
- 25 VMs
- Workloads: Tagger->Join->Grep->RecordCounter

Problem Exploration Summary

- Replication Factor for MR Stages

Problem Exploration Summary

- Detailed Execution Time of 3 Cases

CARDIO Model

- Block Failure Model
- Output of stage i is
- Replication factor is
- Total block number is
- Single block failure probability is
- Failure probability in stage i:
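The slide's formulas were not preserved. A minimal sketch of one common block failure model, assuming each replica fails independently with probability p, a block is lost only when all x of its replicas fail, and stage i fails if any of its blocks is lost (this is an assumption, not necessarily CARDIO's exact formula; the helper names are hypothetical):

```python
import math


def num_blocks(output_size, block_size):
    """Hypothetical helper: number of blocks needed for a stage's output."""
    return math.ceil(output_size / block_size)


def stage_failure_prob(p, x, blocks):
    """Probability that a stage's output is lost, assuming independent failures.

    A block is lost only if all x of its replicas fail (probability p**x);
    the stage output is lost if at least one of its blocks is lost.
    """
    block_loss = p ** x
    return 1.0 - (1.0 - block_loss) ** blocks


b = num_blocks(output_size=1024, block_size=64)  # 16 blocks
for x in (1, 2, 3):
    print(x, stage_failure_prob(0.2, x, b))
```

Under this model, more replicas shrink the per-block loss probability geometrically, while a larger output (more blocks) raises the stage failure probability.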

CARDIO Model

- Cost Computation Model
- Total time of stage i:
- Replication cost of stage i:
- Expected regeneration time of stage i:
- Reliability cost for all stages:
- Storage Constraint C of all stages:
- Choose the replication factors to minimize Z
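A minimal sketch of choosing replication factors to minimize Z under a storage constraint, assuming a simplified model that is not taken from the slides: stage outputs are measured in blocks, R_i = (x_i - 1) * D_i * t_rep (time to write the extra replicas), G_i = f_i(x_i) * C_i (re-running stage i once, ignoring cascaded re-execution), and storage use is sum_i x_i * D_i <= C. The exhaustive search is only practical for a handful of stages and is not presented as CARDIO's solver:

```python
import itertools


def failure_prob(p, x, blocks):
    # Output lost if any block loses all x replicas (independent failures assumed).
    return 1.0 - (1.0 - p ** x) ** blocks


def reliability_cost(xs, outputs, compute, p, t_rep):
    """Z = sum_i (R_i + G_i) under the simplified model described above."""
    z = 0.0
    for x, d, c in zip(xs, outputs, compute):
        r = (x - 1) * d * t_rep          # time to write the extra replicas
        g = failure_prob(p, x, d) * c    # expected re-execution time (no cascades)
        z += r + g
    return z


def choose_replication(outputs, compute, p, t_rep, capacity, x_max=3):
    """Exhaustively try replication factors 1..x_max for every stage.

    Returns (Z, factors) for the cheapest plan whose storage use
    sum_i x_i * D_i fits within capacity, or None if no plan fits.
    """
    best = None
    for xs in itertools.product(range(1, x_max + 1), repeat=len(outputs)):
        if sum(x * d for x, d in zip(xs, outputs)) > capacity:
            continue
        z = reliability_cost(xs, outputs, compute, p, t_rep)
        if best is None or z < best[0]:
            best = (z, xs)
    return best


print(choose_replication(outputs=[4, 40, 2], compute=[50, 30, 80],
                         p=0.2, t_rep=0.2, capacity=80))
```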

CARDIO Model

- Dynamic Replication
- The replication factor x may vary as the program progresses
- When the job is at step k, the replication factor at this step is:
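To illustrate how the factors can be revisited at every step, here is a sketch under the same simplified assumptions as a per-stage cost model (outputs in blocks, independent replica failures, no cascades). At step k it re-decides the factors of stages 1..k with a plainly greedy heuristic that is swapped in for illustration and is not claimed to be CARDIO's optimizer; all names are hypothetical:

```python
def failure_prob(p, x, blocks):
    # Output lost if any block loses all x replicas (independent failures assumed).
    return 1.0 - (1.0 - p ** x) ** blocks


def greedy_factors(outputs, compute, p, t_rep, capacity, x_max=3):
    """Greedy heuristic for the stages seen so far.

    Every stage starts with one replica (assumes capacity >= sum(outputs)).
    Extra replicas go one at a time to the stage with the largest net saving
    (drop in expected re-execution time minus replication time) per unit of
    storage, while the addition is beneficial and fits within capacity.
    """
    xs = [1] * len(outputs)
    used = sum(outputs)
    while True:
        best, best_gain = None, 0.0
        for i, (d, c) in enumerate(zip(outputs, compute)):
            if xs[i] >= x_max or used + d > capacity:
                continue
            saving = (failure_prob(p, xs[i], d) - failure_prob(p, xs[i] + 1, d)) * c
            gain = (saving - d * t_rep) / d
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:
            return xs
        xs[best] += 1
        used += outputs[best]


def dynamic_schedule(outputs, compute, p, t_rep, capacity):
    """Return the replication vector chosen at each step k = 1..n."""
    return [greedy_factors(outputs[:k], compute[:k], p, t_rep, capacity)
            for k in range(1, len(outputs) + 1)]


for k, xs in enumerate(dynamic_schedule([4, 40, 2], [50, 30, 80], 0.2, 0.2, 80), 1):
    print(f"step {k}: {xs}")
```

Because each step re-solves over all completed stages, a factor chosen early can drop later when a newer stage needs the storage more.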

CARDIO Model

- Model for Reliability
- Minimize
- Based on
- Subject to

CARDIO Model

- Resource Utilization Model
- Model cost as the resources utilized
- Resource types Q
- CPU, network, disk, and storage resources, etc.
- Utilization of resource q in stage i:
- Normalize usage by
- Relative cost weights:
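The slide's formulas were lost. A minimal sketch of a weighted, normalized resource cost, assuming the execution cost of a stage is the sum over resource types of (weight x usage / normalizer); the resource keys, normalizers, and weights below are hypothetical:

```python
RESOURCES = ("cpu", "net", "disk_io", "storage")


def stage_execution_cost(usage, norm, weights):
    """Hypothetical A_i: weighted sum of stage i's normalized resource usage.

    usage[q]   -- amount of resource q consumed by the stage
    norm[q]    -- normalizing constant for q (e.g. the cluster's capacity for q)
    weights[q] -- relative cost weight of q; larger means more expensive
    """
    return sum(weights[q] * usage[q] / norm[q] for q in RESOURCES)


def execution_cost(stages, norm, weights):
    """Hypothetical A: per-stage costs summed over all stages."""
    return sum(stage_execution_cost(s, norm, weights) for s in stages)


norm = {"cpu": 100.0, "net": 50.0, "disk_io": 50.0, "storage": 200.0}
weights = {"cpu": 1.0, "net": 2.0, "disk_io": 1.0, "storage": 1.0}
stage = {"cpu": 80.0, "net": 10.0, "disk_io": 5.0, "storage": 20.0}
print(stage_execution_cost(stage, norm, weights))  # about 1.4
```

Raising the weight of a scarce (over-utilized) resource makes plans that consume it look more expensive.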

CARDIO Model

- Resource Utilization Model
- The execution cost A is:
- Total cost:
- Optimization target:
- Choose the replication factors to minimize T

CARDIO Model

- Optimization Problem
- Job optimality (JO)
- Stage optimality (SO)

Hadoop CARDIO System

- CardioSense
- Periodically obtains job progress from the JobTracker (JT)
- Is triggered by a pre-configured threshold value
- Collects resource usage statistics for running stages
- Relies on HMon running on each worker node
- HMon, built on atop, has low overhead
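A minimal sketch of the threshold trigger, assuming the thresholds are levels of job progress (a fraction from 0 to 1) and that the periodic polls of the JobTracker are modeled as a plain list of samples; the function and parameter names are hypothetical:

```python
def sense(progress_samples, thresholds):
    """Yield (sample_index, threshold) when job progress first reaches a threshold.

    progress_samples -- job progress fractions polled periodically
                        (a stand-in for querying the JobTracker)
    thresholds       -- pre-configured progress levels that trigger a re-solve
    """
    pending = sorted(thresholds)
    for i, progress in enumerate(progress_samples):
        # A single poll may cross several thresholds; report each one once.
        while pending and progress >= pending[0]:
            yield i, pending.pop(0)


samples = [0.05, 0.20, 0.30, 0.55, 0.80, 1.00]
print(list(sense(samples, [0.25, 0.5, 0.75])))  # [(2, 0.25), (3, 0.5), (4, 0.75)]
```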

Hadoop CARDIO System

- CardioSolve
- Receives data from CardioSense
- Solves the SO problem
- Decides the replication factors for the current and previous stages

Hadoop CARDIO System

- CardioAct
- Executes the commands from CardioSolve
- Uses the HDFS API setReplication(file, replicaNumber)
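A minimal sketch of the actuation step, assuming CardioAct only issues calls for files whose replication factor actually changes. The setter is passed in as a callable so the logic can run standalone; in a real deployment it would wrap the HDFS Java API `FileSystem.setReplication(Path, short)`. The paths and function names are hypothetical:

```python
def act(current, target, set_replication):
    """Apply replication decisions, touching only files whose factor changes.

    current, target -- dicts mapping an HDFS path to a replication factor
                       (old is None for a path with no recorded factor)
    set_replication -- callable(path, factor), e.g. a wrapper around HDFS
    Returns the list of (path, old, new) changes that were issued.
    """
    changes = []
    for path, new in sorted(target.items()):
        old = current.get(path)
        if old != new:
            set_replication(path, new)
            changes.append((path, old, new))
    return changes


issued = []
cur = {"/wf/s1/out": 3, "/wf/s2/out": 3}
tgt = {"/wf/s1/out": 1, "/wf/s2/out": 3, "/wf/s3/out": 2}
print(act(cur, tgt, lambda p, n: issued.append((p, n))))
# [('/wf/s1/out', 3, 1), ('/wf/s3/out', None, 2)]
```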

Evaluation

- Several Important Parameters
- p is the failure rate; 0.2 unless otherwise specified
- is the time to replicate one data unit; also 0.2 by default
- is the computation resource of stage i; it follows the uniform distribution U(1, Cmax), with Cmax = 100 in general
- is the output of stage i, drawn from the uniform distribution U(1, Dmax), where Dmax varies within [1, Cmax]
- C is the storage constraint for the whole process. The default value is
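A minimal sketch of drawing the per-stage simulation inputs from the distributions listed above. The symbols for per-stage computation and output were lost from the slides, so `compute` and `outputs` are hypothetical names; Cmax = 100 and the 0.2 defaults come from the slides, while the Dmax default of 50 is an arbitrary choice within [1, Cmax]. The storage constraint C is left out because its default is not given:

```python
import random

P_FAIL = 0.2  # failure rate p (slide default)
T_REP = 0.2   # time to replicate one data unit (slide default)


def sample_workflow(n_stages, c_max=100.0, d_max=50.0, seed=None):
    """Draw per-stage computation ~ U(1, Cmax) and output ~ U(1, Dmax)."""
    if not 1.0 <= d_max <= c_max:
        raise ValueError("Dmax must lie within [1, Cmax]")
    rng = random.Random(seed)  # seeded for reproducible experiments
    compute = [rng.uniform(1.0, c_max) for _ in range(n_stages)]
    outputs = [rng.uniform(1.0, d_max) for _ in range(n_stages)]
    return compute, outputs


compute, outputs = sample_workflow(4, seed=42)
```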

Evaluation

- Effect of Dmax

Evaluation

- Effect of failure rate p

Evaluation

- Effect of block size

Evaluation

- Effect of different resource constraints

++ means over-utilized; such a resource is regarded as expensive

p = 0.08, C = 204 GB, δ = 0.6

S3 is CPU-intensive

DSK shows a performance pattern similar to NET

CPU 0010, NET 0011, DSKIO 0011, STG 0011

Evaluation

S2 is re-executed more frequently under failure injection because it has a large data output.

p = 0.02, 0.08, and 0.1

1, 3, 21

API reason

