Comp6611 Course Lecture: Big Data Applications. Yang PENG, Network and System Lab, CSE, HKUST. Monday, March 11, 2013. ypengab@cse.ust.hk


Presentation Transcript


  1. Comp6611 Course Lecture: Big Data Applications
Yang PENG, Network and System Lab, CSE, HKUST
Monday, March 11, 2013 — ypengab@cse.ust.hk
Material adapted from slides by Christophe Bisciglia, Aaron Kimball, & Sierra Michels-Slettvet, Google Distributed Computing Seminar, 2007 (licensed under the Creative Commons Attribution 3.0 License)

  2. Today's Topics
• MapReduce
  • Background information/overview
  • Map and Reduce: a programmer's perspective
  • Architecture and workflow: a global overview
  • Virtues and defects
  • Improvement
• Spark

  3. MapReduce Background
• Before MapReduce, large-scale data processing was difficult:
  • Managing parallelization and distribution
  • Application development is tedious and hard to debug
  • Resource scheduling and load balancing
  • Data storage and distribution
    • Distributed file system: “Moving computation is cheaper than moving data.”
  • Fault/crash tolerance
  • Scalability

  4. Does the “divide and conquer” paradigm still work for big data?
[Diagram: the work is partitioned across several workers, and their partial results are combined into the final result.]

  5. Programming Model
• Opportunity: design a software abstraction that undertakes the divide and conquer and reduces the programmer's workload for:
  • resource management
  • task scheduling
  • distributed synchronization and communication
• Functional programming, which has a long history, provides higher-order functions that support divide and conquer:
  • Map: do something to every element in a list
  • Fold: combine the results of a list in some way
[Diagram: applications run on top of the abstraction, which spans many computers.]

  6. Map
• Map is a higher-order function
• How map works:
  • The function is applied to every element in a list
  • The result is a new list
[Diagram: f applied to each list element independently.]

  7. Fold
• Fold is also a higher-order function
• How fold works:
  • The accumulator is set to an initial value
  • The function is applied to a list element and the accumulator
  • The result is stored in the accumulator
  • This is repeated for every item in the list
  • The result is the final value in the accumulator
[Diagram: f folds the list, carrying the accumulator from the initial value to the final value.]

  8. Map/Fold in Action
• Simple map example:
  (map (lambda (x) (* x x)) '(1 2 3 4 5))  →  '(1 4 9 16 25)
• Fold examples:
  (fold + 0 '(1 2 3 4 5))  →  15
  (fold * 1 '(1 2 3 4 5))  →  120
• Sum of squares:
  (define (sum-of-squares v)
    (fold + 0 (map (lambda (x) (* x x)) v)))
  (sum-of-squares '(1 2 3 4 5))  →  55
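Since the Spark examples later in this deck are written in Scala, here is the same map/fold idea in Scala; an illustrative sketch added for this transcript, not part of the original slides:

  // Squares of 1..5 via map
  val squares = (1 to 5).map(x => x * x)                          // Vector(1, 4, 9, 16, 25)
  // Sum and product via fold (foldLeft in Scala)
  val sum     = (1 to 5).foldLeft(0)(_ + _)                       // 15
  val product = (1 to 5).foldLeft(1)(_ * _)                       // 120
  // Sum of squares: compose map and fold
  val sumOfSquares = (1 to 5).map(x => x * x).foldLeft(0)(_ + _)  // 55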

  9. MapReduce
• Programmers specify two functions:
  map(k1, v1) → list(k2, v2)
  reduce(k2, list(v2)) → list(v2)
• An implementation of WordCount:
  function map(String name, String document):
    // K1 name: document name
    // V1 document: document contents
    for each word w in document:
      emit (w, 1)
  function reduce(String word, Iterator partialCounts):
    // K2 word: a word
    // list(V2) partialCounts: a list of aggregated partial counts
    sum = 0
    for each pc in partialCounts:
      sum += ParseInt(pc)
    emit (word, sum)
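To make the WordCount pseudocode concrete, below is a small self-contained Scala sketch that simulates the map, shuffle, and reduce phases on a single machine. It illustrates the semantics only and does not use the Hadoop API; the sample documents are made up:

  object WordCountSketch {
    // Map phase: emit (word, 1) for every word in a document
    def map(name: String, document: String): Seq[(String, Int)] =
      document.split("\\s+").filter(_.nonEmpty).map(w => (w, 1)).toSeq

    // Shuffle phase: group the emitted pairs by key
    def shuffle(pairs: Seq[(String, Int)]): Map[String, Seq[Int]] =
      pairs.groupBy(_._1).map { case (w, ps) => (w, ps.map(_._2)) }

    // Reduce phase: sum the partial counts for each word
    def reduce(word: String, partialCounts: Seq[Int]): (String, Int) =
      (word, partialCounts.sum)

    def main(args: Array[String]): Unit = {
      val docs   = Seq(("doc1", "the quick brown fox"), ("doc2", "the lazy dog"))
      val mapped = docs.flatMap { case (name, text) => map(name, text) }
      val counts = shuffle(mapped).map { case (w, ps) => reduce(w, ps) }
      counts.foreach(println)   // (the,2), (quick,1), (brown,1), ...
    }
  }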

  10. It's just divide and conquer!
[Diagram: initial key/value pairs read from the data store go to map tasks; a barrier aggregates the intermediate values by key; reduce tasks then compute the final values for k1, k2, and k3.]

  11. Behind the scenes…

  12. Programming interface
• Input reader
• Map function
• Partition function
  • Given a key and the number of reducers, returns the index of the desired reducer.
  • Should balance load, e.g. a hash function.
• Compare function
  • Defines the sort order of the intermediate keys, giving the ordering guarantee within each reducer's input.
• Reduce function
• Output writer
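A minimal sketch of what the partition and compare functions amount to (plain Scala for illustration; in real Hadoop these are supplied as Partitioner and comparator classes, and the names below are made up):

  // Partition function: given a key and the number of reducers,
  // return the index of the reducer that should receive the key.
  // Hashing spreads keys roughly evenly, which helps load balance.
  def partition(key: String, numReducers: Int): Int =
    (key.hashCode & Int.MaxValue) % numReducers

  // Compare function: defines the sort order of keys in each reducer's
  // input, which is where the ordering guarantee comes from.
  def compare(a: String, b: String): Int = a.compareTo(b)

  // With 4 reducers, different keys may land on different reducers.
  println(partition("hadoop", 4))
  println(partition("spark", 4))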

  13. Output of a Hadoop job (progress)
ypeng@vm115:~/hadoop-0.20.2$ bin/hadoop jar hadoop-0.20.2-examples.jar wordcount /user/hduser/wordcount/15G-enwiki-input /user/hduser/wordcount/15G-enwiki-output
13/01/16 07:00:48 INFO input.FileInputFormat: Total input paths to process : 1
13/01/16 07:00:49 INFO mapred.JobClient: Running job: job_201301160607_0003
13/01/16 07:00:50 INFO mapred.JobClient: map 0% reduce 0%
.........................
13/01/16 07:01:50 INFO mapred.JobClient: map 18% reduce 0%
13/01/16 07:01:52 INFO mapred.JobClient: map 19% reduce 0%
13/01/16 07:02:06 INFO mapred.JobClient: map 20% reduce 0%
13/01/16 07:02:08 INFO mapred.JobClient: map 20% reduce 1%
13/01/16 07:02:10 INFO mapred.JobClient: map 20% reduce 2%
.........................
13/01/16 07:06:41 INFO mapred.JobClient: map 99% reduce 32%
13/01/16 07:06:47 INFO mapred.JobClient: map 100% reduce 33%
13/01/16 07:06:55 INFO mapred.JobClient: map 100% reduce 39%
.........................
13/01/16 07:07:21 INFO mapred.JobClient: map 100% reduce 99%
13/01/16 07:07:31 INFO mapred.JobClient: map 100% reduce 100%
13/01/16 07:07:43 INFO mapred.JobClient: Job complete: job_201301160607_0003
(Continued on the next slide.)

  14. Counters in a Hadoop job (summary of the job's counters)
13/01/16 07:07:43 INFO mapred.JobClient: Counters: 18
13/01/16 07:07:43 INFO mapred.JobClient:   Job Counters
13/01/16 07:07:43 INFO mapred.JobClient:     Launched reduce tasks=24
13/01/16 07:07:43 INFO mapred.JobClient:     Rack-local map tasks=17
13/01/16 07:07:43 INFO mapred.JobClient:     Launched map tasks=249
13/01/16 07:07:43 INFO mapred.JobClient:     Data-local map tasks=203
13/01/16 07:07:43 INFO mapred.JobClient:   FileSystemCounters
13/01/16 07:07:43 INFO mapred.JobClient:     FILE_BYTES_READ=12023025990
13/01/16 07:07:43 INFO mapred.JobClient:     HDFS_BYTES_READ=15492905740
13/01/16 07:07:43 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=14330761040
13/01/16 07:07:43 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=752814339
13/01/16 07:07:43 INFO mapred.JobClient:   Map-Reduce Framework
13/01/16 07:07:43 INFO mapred.JobClient:     Reduce input groups=39698527
13/01/16 07:07:43 INFO mapred.JobClient:     Combine output records=508662829
13/01/16 07:07:43 INFO mapred.JobClient:     Map input records=279422018
13/01/16 07:07:43 INFO mapred.JobClient:     Reduce shuffle bytes=2647359503
13/01/16 07:07:43 INFO mapred.JobClient:     Reduce output records=39698527
13/01/16 07:07:43 INFO mapred.JobClient:     Spilled Records=828280813
13/01/16 07:07:43 INFO mapred.JobClient:     Map output bytes=24932976267
13/01/16 07:07:43 INFO mapred.JobClient:     Combine input records=2813475352
13/01/16 07:07:43 INFO mapred.JobClient:     Map output records=2376465967
13/01/16 07:07:43 INFO mapred.JobClient:     Reduce input records=71653444

  15. Master in MapReduce
• Resource management
  • Maintains the current resource usage of each Worker (CPU, RAM, used and free disk space, etc.)
  • Periodically checks for worker failures.
• Task scheduling
  • “Moving computation is cheaper than moving data.”
  • Map and reduce tasks are assigned to idle Workers.
  • Tasks on failed workers are re-scheduled.
  • When the job is close to the end, it launches backup tasks.
• Counters
  • provide interactive job progress.
  • store the occurrences of various events.
  • are helpful for performance tuning.
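For illustration, here is roughly how a user-defined counter is bumped inside a mapper with Hadoop's org.apache.hadoop.mapreduce API; this is a hedged sketch rather than the code used for the job above, and the counter group and name are made up:

  import org.apache.hadoop.io.{IntWritable, LongWritable, Text}
  import org.apache.hadoop.mapreduce.Mapper

  class CountingWordMapper extends Mapper[LongWritable, Text, Text, IntWritable] {
    private val one  = new IntWritable(1)
    private val word = new Text()

    override def map(key: LongWritable, value: Text,
                     context: Mapper[LongWritable, Text, Text, IntWritable]#Context): Unit = {
      for (w <- value.toString.split("\\s+") if w.nonEmpty) {
        word.set(w)
        context.write(word, one)
        // Custom counter: aggregated by the Master and printed in the job summary.
        context.getCounter("WordCount", "WORDS_EMITTED").increment(1)
      }
    }
  }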

  16. Data-oriented Map scheduling
[Diagram: the input is divided into splits 1–5, whose replicas are spread over Workers 1–6 across two racks; the Master launches map 1 on Worker 3, map 2 on Worker 4, map 3 on Worker 1, map 4 on Worker 2, and map 5 on Worker 5, so that each map runs close to its data.]

  17. Data flow in MapReduce jobs
[Diagram: a Mapper reads a local, rack-local, or non-local split from GFS into an in-memory circular buffer, spills to disk (optionally through a Combiner), and merges the spills into intermediate files on disk; each Reducer fetches its partitions from all Mappers and writes its final output back to GFS.]

  18. Map internal
• The map phase reads the task's input split from GFS, parses it into records (key/value pairs), and applies the map function to each record.
• After the map function has been applied to every record, the commit phase registers the final output with the Master, which will tell the reduce tasks where to find the map output.

  19. Reduce internal
• The shuffle phase fetches the reduce task's input data.
• The sort phase groups records with the same key together.
• The reduce phase applies the user-defined reduce function to each key and its corresponding list of values.

  20. Backup Tasks
• There are barriers in a MapReduce job:
  • No reduce function executes until all maps finish.
  • The job cannot complete until all reduces finish.
• The execution time of a job is severely lengthened if a task is blocked.
• The Master schedules backup/speculative tasks for unfinished tasks when the job is close to the end.
• A job takes 44% longer if backup tasks are disabled.

  21. Virtues and defects of MR
• Virtues:
  • Towards large-scale data
  • Programming friendly (implicit parallelism)
  • Data locality
  • Fault/crash tolerance
  • Scalability
  • Open source with a good ecosystem [1]
• Defects:
  • Bad for iterative ML algorithms
  • Not sure
[1] http://docs.hortonworks.com/CURRENT/index.htm#About_Hortonworks_Data_Platform/Understanding_Hadoop_Ecosystem.htm

  22. Network traffic in MapReduce
1. A map may read its split from a remote ChunkServer.
2. Reduces copy the output of the maps.
3. Reduce output is written to GFS.

  23. Disk R/W in MapReduce
1. A ChunkServer reads a local block when a remote split is fetched.
2. Intermediate results are spilled to disk.
3. The copied partitions are written to local disk.
4. The result output is written to the local ChunkServer.
5. The result output is written to remote ChunkServers.

  24. Iterative MapReduce
• Performing a graph algorithm using MapReduce requires running one job per iteration.
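The reason this hurts is that each iteration is a separate MapReduce job whose input and output live in the distributed file system, so the state is re-read from and re-written to disk on every pass. A rough sketch of that driver pattern in plain Scala, with local files standing in for GFS/HDFS and a placeholder in place of the real job:

  import java.io.PrintWriter
  import scala.io.Source

  // Placeholder for one full MapReduce job over the graph/model state.
  def runOneJob(lines: Seq[String]): Seq[String] = lines

  var input = "state-0.txt"                 // initial state, assumed to exist
  for (i <- 1 to 10) {                      // 10 iterations => 10 complete jobs
    val src    = Source.fromFile(input)
    val lines  = try src.getLines().toList finally src.close()   // re-read state from "HDFS"
    val output = runOneJob(lines)
    val outPath = s"state-$i.txt"
    val out = new PrintWriter(outPath)
    try output.foreach(out.println) finally out.close()          // write state back to "HDFS"
    input = outPath                         // the next iteration starts from disk again
  }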

  25. Motivation of Spark
• Iterative algorithms (machine learning, graphs)
• Interactive data mining tools (R, Excel, Python)

  26. Programming Model
• Fine-grained:
  • The computing output of every iteration is distributed and stored to stable storage.
• Coarse-grained:
  • Only the transformations used to build a dataset are logged (i.e. lineage).
• Resilient distributed datasets (RDDs)
  • Immutable, partitioned collections of objects
  • Created through parallel transformations (map, filter, groupBy, join, …) on data in stable storage
  • Can be cached for efficient reuse
• Actions on RDDs
  • count, reduce, collect, save, …

  27. Spark Operations

  28. Example: Log Mining
Load error messages from a log into memory, then interactively search for various patterns:
  lines = spark.textFile("hdfs://...")              // base RDD
  errors = lines.filter(_.startsWith("ERROR"))      // transformed RDD
  messages = errors.map(_.split('\t')(2))
  cachedMsgs = messages.cache()
  cachedMsgs.filter(_.contains("foo")).count        // action
  cachedMsgs.filter(_.contains("bar")).count
[Diagram: the driver ships tasks to workers; each worker reads its block and caches its partition of the messages, returning results to the driver.]
• Result: full-text search of Wikipedia in <1 sec (vs 20 sec for on-disk data)
• Result: scaled to 1 TB of data in 5–7 sec (vs 170 sec for on-disk data)

  29. RDD Fault Tolerance
• RDDs maintain lineage information that can be used to reconstruct lost partitions.
• Example:
  messages = textFile(...).filter(_.startsWith("ERROR"))
                          .map(_.split('\t')(2))
[Lineage diagram: HDFS file → filtered RDD (filter) → mapped RDD (map).]

  30. Example: Logistic Regression
• Goal: find the best line separating two sets of points.
[Diagram: a random initial line is iteratively moved toward the target line separating the two point sets.]

  31. Example: Logistic Regression
  val data = spark.textFile(...).map(readPoint).cache()   // keep “data” in memory
  var w = Vector.random(D)
  for (i <- 1 to ITERATIONS) {
    val gradient = data.map(p =>
      (1 / (1 + exp(-p.y * (w dot p.x))) - 1) * p.y * p.x
    ).reduce(_ + _)
    w -= gradient
  }
  println("Final w: " + w)

  32. Logistic Regression Performance
• Hadoop: 127 s / iteration
• Spark: first iteration 174 s, further iterations 6 s

  33. Spark Programming Interface (e.g. PageRank)

  34. Representing RDDs

  35. Spark Scheduler
• Dryad-like DAGs
• Pipelines functions within a stage
• Cache-aware work reuse & locality
• Partitioning-aware to avoid shuffles
[Diagram: a DAG of stages; shaded boxes mark cached data partitions.]
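To see those stages in practice, one option is to print an RDD's lineage (a sketch assuming an existing SparkContext named sc and an illustrative input path): the narrow flatMap/map steps are pipelined within one stage, while reduceByKey introduces a shuffle dependency and hence a stage boundary.

  val counts = sc.textFile("hdfs://.../input.txt")    // illustrative path
    .flatMap(_.split(" "))                            // narrow: pipelined with map
    .map(w => (w, 1))
    .reduceByKey(_ + _)                               // wide: shuffle => new stage

  // toDebugString prints the lineage; the indentation marks shuffle boundaries,
  // so the shuffled RDD appears above the pipelined map/flatMap/textFile chain.
  println(counts.toDebugString)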

  36. Behavior with Not Enough RAM

  37. Fault Recovery Results

  38. Conclusion
• Both MapReduce and Spark are excellent big data frameworks: scalable, fault-tolerant, and programming friendly.
• Spark in particular provides a more effective approach for iterative computing jobs.

  39. Thanks! Questions?
