CARDIO: Cost-Aware Replication for Data-Intensive workflOws
Presented by Chen He

Motivation
Are large-scale clusters reliable?
On average, 5 worker deaths per MapReduce job.
At least 1 disk failure in every run of a 6-hour MapReduce job on a 4000-node cluster.
All pictures are adapted from the Internet.
++ means over-utilized; a resource in this state is regarded as expensive.
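As a toy illustration of this encoding, one could map each utilization marker to a relative cost weight. Only ++ (over-utilized, hence expensive) comes from the talk; the other markers and all numeric weights below are assumptions:

```python
# Hypothetical marker-to-weight mapping. Only "++" (over-utilized =>
# expensive) is stated in the slides; the other markers and all numeric
# weights are illustrative assumptions.
UTILIZATION_COST = {
    "++": 4.0,  # over-utilized: the resource is scarce, so consuming it is expensive
    "+":  2.0,  # assumed: heavily utilized
    "0":  1.0,  # assumed: baseline utilization
    "--": 0.5,  # assumed: under-utilized, cheap to consume
}

def resource_cost(marker: str, units: float) -> float:
    """Charge `units` of a resource at the weight implied by its marker."""
    return UTILIZATION_COST[marker] * units

print(resource_cost("++", 10.0))  # 40.0: 10 units of an over-utilized resource
```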
P = 0.08, C = 204 GB, delta = 0.6
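A minimal sketch of how these parameters might enter a replicate-versus-re-execute decision, assuming P is the per-run failure probability, C the size of the intermediate output, and delta a blending weight between storage cost and expected re-execution cost. CARDIO's actual cost model is richer (per-resource costs over time); this only shows the shape of the trade-off:

```python
# Simplified trade-off: replicating costs storage proportional to the
# output size C; skipping replication risks a re-execution whose expected
# cost is P times the cost of regenerating the data. The role of `delta`
# as a blending weight is an assumption, not taken from the talk.
P = 0.08        # failure probability, from the slide
C = 204.0       # intermediate output size in GB, from the slide
delta = 0.6     # assumed blending weight between the two cost terms

def replication_cost(size_gb: float) -> float:
    return size_gb                   # units spent keeping a replica around

def expected_reexecution_cost(size_gb: float, p_fail: float) -> float:
    return p_fail * size_gb          # expected units spent regenerating data

blended = (delta * replication_cost(C)
           + (1 - delta) * expected_reexecution_cost(C, P))
print(f"replicate:              {replication_cost(C):.1f}")          # 204.0
print(f"expected re-execution:  {expected_reexecution_cost(C, P):.1f}")  # 16.3
print(f"blended objective:      {blended:.1f}")
```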
S3 is CPU-intensive.
DSK shows a performance pattern similar to NET's.
CPU 0010, NET 0011, DSKIO 0011, STG 0011
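One plausible reading of these bit strings, consistent with the two notes above (CPU flagged only in S3, DSKIO matching NET), is one flag per stage S1..S4, with 1 marking the resource as over-utilized in that stage. The sketch below decodes them under that assumption:

```python
# Assumed reading of the slide's bit strings: one bit per stage (S1..S4),
# where 1 marks the resource as over-utilized (expensive) in that stage.
FLAGS = {
    "CPU":   "0010",
    "NET":   "0011",
    "DSKIO": "0011",
    "STG":   "0011",
}

for resource, bits in FLAGS.items():
    stages = [f"S{i + 1}" for i, b in enumerate(bits) if b == "1"]
    print(f"{resource}: over-utilized in {', '.join(stages) or 'no stage'}")
# CPU: over-utilized in S3
# NET: over-utilized in S3, S4
# DSKIO: over-utilized in S3, S4
# STG: over-utilized in S3, S4
```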
S2 re-executes more frequently under failure injection because it has a large data output.
P = 0.02, 0.08, and 0.1
1, 3, 21