
Map-Reduce and Its Children


Presentation Transcript


  1. Map-Reduce and Its Children Distributed File Systems Map-Reduce and Hadoop Dataflow Systems Extensions for Recursion

  2. Distributed File Systems Chunking Replication Distribution on Racks

  3. Distributed File Systems • Files are very large, read/append. • They are divided into chunks. • Typically 64MB to a chunk. • Chunks are replicated at several compute-nodes. • A master (possibly replicated) keeps track of all locations of all chunks.

  4. Compute Nodes • Organized into racks. • Intra-rack connection typically gigabit speed. • Inter-rack connection faster by a small factor.

  5. [Diagram: chunks of a file distributed across racks of compute nodes]

  6. 3-way replication of files, with copies on different racks.
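To make the placement policy concrete, here is a minimal sketch of rack-aware 3-way replication; the function place_replicas, the rack layout, and all names are hypothetical illustrations, not the GFS or HDFS API.

```python
import random

def place_replicas(racks, copies=3):
    # Choose `copies` compute nodes for one chunk, each on a different
    # rack, so losing an entire rack costs at most one copy.
    # `racks` maps a rack name to its list of compute nodes.
    chosen = random.sample(sorted(racks), k=min(copies, len(racks)))
    return [(rack, random.choice(racks[rack])) for rack in chosen]

racks = {"rack1": ["n1", "n2"], "rack2": ["n3", "n4"], "rack3": ["n5", "n6"]}
print(place_replicas(racks))  # e.g., [('rack2', 'n4'), ('rack1', 'n1'), ('rack3', 'n6')]
```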

  7. Implementations • GFS (Google File System – proprietary). • HDFS (Hadoop Distributed File System – open source). • CloudStore (formerly the Kosmos File System, from Kosmix – open source).

  8. Above the DFS Map-Reduce Key-Value Stores SQL Implementations

  9. The New Stack (layers, top to bottom) • SQL implementations, e.g., PIG (relational algebra), Hive. • Object store (key-value store), e.g., BigTable, Hbase, Cassandra. • Map-reduce, e.g., Hadoop. • Distributed file system.

  10. Map-Reduce Systems • Map-reduce (Google) and its open-source (Apache) equivalent, Hadoop. • Important specialized parallel computing tool. • Copes with compute-node failures. • Avoids restart of the entire job.

  11. Key-Value Stores • BigTable (Google), Hbase, Cassandra (Apache), Dynamo (Amazon). • Each row is a key plus values over a flexible set of columns. • Each column component can be a set of values.

  12. SQL-Like Systems • PIG – Yahoo! implementation of relational algebra. • Translates to a sequence of map-reduce operations, using Hadoop. • Hive – open-source (Apache) implementation of a restricted SQL, called QL, over Hadoop.

  13. SQL-Like Systems – (2) • Sawzall – Google implementation of parallel select + aggregation. • Scope – Microsoft implementation of restricted SQL.

  14. Map-Reduce Formal Definition Fault-Tolerance Example: Join

  15. Map-Reduce • You write two functions, Map and Reduce. • They each have a special form to be explained. • System (e.g., Hadoop) creates a large number of tasks for each function. • Work is divided among tasks in a precise way.

  16. Map-Reduce Pattern [Diagram: input from DFS → Map tasks → "key"-value pairs → Reduce tasks → output to DFS]

  17. Map-Reduce Algorithms • Map tasks convert inputs to key-value pairs. • “keys” are not necessarily unique. • Outputs of Map tasks are sorted by key, and each key is assigned to one Reduce task. • Reduce tasks combine values associated with a key.
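A minimal single-machine sketch of the pattern on slides 15–17, with word count as the usual example; the names map_fn, reduce_fn, and map_reduce are illustrative, not the Hadoop API. Map emits key-value pairs, the system sorts and groups them by key, and Reduce combines the values for each key.

```python
from itertools import groupby
from operator import itemgetter

def map_fn(line):
    # Map: convert one input record into key-value pairs.
    for word in line.split():
        yield (word, 1)

def reduce_fn(key, values):
    # Reduce: combine all values associated with one key.
    yield (key, sum(values))

def map_reduce(inputs, map_fn, reduce_fn):
    pairs = [kv for rec in inputs for kv in map_fn(rec)]
    pairs.sort(key=itemgetter(0))                  # sort, then group by key
    for key, group in groupby(pairs, key=itemgetter(0)):
        yield from reduce_fn(key, (v for _, v in group))

print(dict(map_reduce(["a b a", "b c"], map_fn, reduce_fn)))
# {'a': 2, 'b': 2, 'c': 1}
```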

  18. Coping With Failures • Map-reduce is designed to deal with compute-nodes failing to execute a task. • Re-executes failed tasks, not whole jobs. • Failure modes: • Compute node failure (e.g., disk crash). • Rack communication failure. • Software failures, e.g., a task requires Java n; node has Java n-1.

  19. Things Map-Reduce is Good At • Matrix-Matrix and Matrix-vector multiplication. • One step of the PageRank iteration was the original application. • Relational algebra operations. • We’ll do an example of the join. • Many other “embarrassingly parallel” operations.

  20. Joining by Map-Reduce • Suppose we want to compute R(A,B) JOIN S(B,C), using k Reduce tasks. • I.e., find tuples with matching B-values. • R and S are each stored in a chunked file.

  21. Joining by Map-Reduce – (2) • Use a hash function h from B-values to k buckets. • Bucket = Reduce task. • The Map tasks take chunks from R and S, and send: • Tuple R(a,b) to Reduce task h(b). • Key = b; value = R(a,b). • Tuple S(b,c) to Reduce task h(b). • Key = b; value = S(b,c).

  22. Joining by Map-Reduce – (3) [Diagram: Map tasks send R(a,b) to Reduce task i if h(b) = i, and S(b,c) to Reduce task i if h(b) = i; Reduce task i produces all (a,b,c) such that h(b) = i, (a,b) is in R, and (b,c) is in S.]

  23. Joining by Map-Reduce – (4) • Key point: If R(a,b) joins with S(b,c), then both tuples are sent to Reduce task h(b). • Thus, their join (a,b,c) will be produced there and shipped to the output file.
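A sketch of the join of slides 20–23, with the k Reduce tasks simulated as in-memory buckets; all names are illustrative. Note how both R(a,b) and S(b,c) land in bucket h(b), so tuples with matching B-values always meet at the same Reduce task.

```python
def join_map_reduce(R, S, k):
    # R is a list of (a, b) tuples, S a list of (b, c) tuples.
    # Map: route each tuple to Reduce task h(b). Reduce: join within a task.
    buckets = [{"R": [], "S": []} for _ in range(k)]
    for a, b in R:
        buckets[hash(b) % k]["R"].append((a, b))   # key = b; value = R(a,b)
    for b, c in S:
        buckets[hash(b) % k]["S"].append((b, c))   # key = b; value = S(b,c)
    out = []
    for task in buckets:                           # each Reduce task i
        for a, b in task["R"]:
            for b2, c in task["S"]:
                if b == b2:                        # matching B-values meet at h(b)
                    out.append((a, b, c))
    return out

print(join_map_reduce([(1, "x"), (2, "y")], [("x", 10), ("x", 11)], k=4))
# [(1, 'x', 10), (1, 'x', 11)]
```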

  24. Dataflow Systems Arbitrary Acyclic Flow Among Tasks Preserving Fault Tolerance The Blocking Property

  25. Generalization of Map-Reduce • Map-reduce uses only two functions (Map and Reduce). • Each is implemented by a rank of tasks. • Data flows from Map tasks to Reduce tasks only.

  26. Generalization – (2) • Natural generalization is to allow any number of functions, connected in an acyclic network. • Each function is implemented by tasks that feed tasks of successor function(s). • Key fault-tolerance (blocking) property: tasks produce all their output at the end.

  27. Many Implementations • Clustera – University of Wisconsin. • Hyracks – Univ. of California/Irvine. • Dryad/DryadLINQ – Microsoft. • Nephele/PACT – T. U. Berlin. • BOOM – Berkeley. • epiC – N. U. Singapore.

  28. Example: Join + Aggregation • Relations D(emp, dept) and S(emp,sal). • Compute the sum of the salaries for each department. • D JOIN S computed by map-reduce. • But each Reduce task can also group its emp-dept-sal tuples by dept and sum the salaries.

  29. Example: Continued • A third function is needed to take the dept-SUM(sal) pairs from each Reduce task, organize them by dept, and compute the final sum for each department.

  30. 3-Layer Dataflow [Diagram: relations D and S → Map tasks → hash by emp → Join + Group tasks → hash by dept → Final Group + Aggregate tasks]
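A minimal sketch of this three-layer flow, with hypothetical names throughout: each second-layer task joins its share of D and S and pre-aggregates by dept, and a third layer combines the partial sums.

```python
from collections import defaultdict

def layer2_join_group(d_tuples, s_tuples):
    # Join D(emp, dept) with S(emp, sal) on emp, then pre-sum by dept.
    dept_of = dict(d_tuples)                  # emp -> dept
    partial = defaultdict(int)
    for emp, sal in s_tuples:
        if emp in dept_of:
            partial[dept_of[emp]] += sal      # group by dept, sum salaries
    return partial.items()                    # (dept, partial sum) pairs

def layer3_final_aggregate(partials_from_all_tasks):
    # Third layer: combine the per-task partial sums for each dept.
    total = defaultdict(int)
    for dept, s in partials_from_all_tasks:
        total[dept] += s
    return dict(total)

# Two second-layer tasks' partial outputs flow to the final layer:
t1 = layer2_join_group([("ann", "hw")], [("ann", 100)])
t2 = layer2_join_group([("bob", "hw")], [("bob", 50)])
print(layer3_final_aggregate(list(t1) + list(t2)))   # {'hw': 150}
```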

  31. Recursion Transitive-Closure Example Fault-Tolerance Problem Endgame Problem Some Systems and Approaches Recent research ideas contributed by F. Afrati, V. Borkar, M. Carey, N. Polyzotis

  32. Applications Requiring Recursion • PageRank, the original map-reduce application, is really a recursion implemented by many rounds of map-reduce. • Analysis of Web structure. • Analysis of social networks. • PDEs.

  33. Transitive Closure • Many recursive applications involving large data are similar to transitive closure: • Nonlinear: Path(X,Y) :- Arc(X,Y) and Path(X,Y) :- Path(X,Z) & Path(Z,Y). • (Right) Linear: Path(X,Y) :- Arc(X,Y) and Path(X,Y) :- Arc(X,Z) & Path(Z,Y).

  34. Implementing TC on a Cluster • Use k tasks. • Hash function h sends each node of the graph to one of the k tasks. • Task i receives and stores Path(a,b) if either h(a) = i or h(b) = i, or both. • Task i must join Path(a,c) with Path(c,b) if h(c) = i.

  35. TC on a Cluster – Basis • Data is stored as relation Arc(a,b). • Map tasks read chunks of the Arc relation and send each tuple Arc(a,b) to recursive tasks h(a) and h(b). • Treated as if it were tuple Path(a,b). • If h(a) = h(b), only one task receives.

  36. TC on a Cluster – Recursive Tasks [Diagram: when Task i receives Path(a,b), it stores Path(a,b) if new and otherwise ignores it; it looks up Path(b,c) and/or Path(d,a) for any c and d, then sends Path(a,c) to tasks h(a) and h(c), and Path(d,b) to tasks h(d) and h(b).]
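A single-machine simulation of slides 34–36 may help; the k recursive tasks are simulated as a list of sets, and the routing rule sends Path(a,b) to tasks h(a) and h(b). All names are illustrative, not an actual cluster API.

```python
from collections import deque

def distributed_tc(arcs, k):
    stored = [set() for _ in range(k)]        # facts held at each task
    h = lambda node: hash(node) % k
    # Basis: each Arc(a,b) is sent, as Path(a,b), to tasks h(a) and h(b).
    queue = deque((i, (a, b)) for a, b in arcs for i in {h(a), h(b)})
    while queue:
        i, (a, b) = queue.popleft()
        if (a, b) in stored[i]:
            continue                          # duplicate: ignore (idempotent)
        stored[i].add((a, b))
        if h(b) == i:                         # task i owns node b: join on b
            for (b2, c) in list(stored[i]):
                if b2 == b:
                    queue.extend((j, (a, c)) for j in {h(a), h(c)})
        if h(a) == i:                         # task i owns node a: join on a
            for (d, a2) in list(stored[i]):
                if a2 == a:
                    queue.extend((j, (d, b)) for j in {h(d), h(b)})
    return set().union(*stored)

print(sorted(distributed_tc([(1, 2), (2, 3), (3, 4)], k=2)))
# [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```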

  37. Big Problem: Managing Failure • Map-reduce depends on the blocking property. • Only then can you restart a failed task without restarting the whole job. • But any recursive task has to deliver some output and later get more input.

  38. HaLoop (U. Washington) • Iterates Hadoop, once for each round of the recursion. • Like iterative PageRank. • Similar idea: Twister (U. Indiana). • The clever piece is the way HaLoop tries to run each task in round i at a compute node where it can find the output it needs from round i – 1.

  39. Pregel (Google) • Views all computation as a recursion on some graph. • Nodes send messages to one another. • Messages bunched into supersteps. • Checkpoint all compute nodes after some fixed number of supersteps. • On failure, rolls all tasks back to previous checkpoint.

  40. Example: Shortest Paths Via Pregel [Diagram: node N keeps a table of shortest paths to N. On receiving "I found a path from node M to you of length L," N asks: is this the shortest path from M I know about? If so, it updates its table and sends messages of lengths L+3, L+5, and L+6 to its neighbors across arcs of lengths 3, 5, and 6.]
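A minimal sketch of the computation in this diagram, with supersteps simulated by a loop; pregel_shortest_paths and the message format are illustrative, not the actual Pregel API.

```python
import math

def pregel_shortest_paths(graph, source):
    # graph: node -> list of (neighbor, arc_length). Each message says
    # "I found a path from `source` to you of length L".
    dist = {v: math.inf for v in graph}
    messages = {source: 0}                     # initial message to the source
    while messages:                            # one superstep per iteration
        next_messages = {}
        for node, length in messages.items():
            if length < dist[node]:            # shortest path from source so far?
                dist[node] = length
                for nbr, w in graph[node]:     # forward the improved paths
                    cand = length + w
                    if cand < next_messages.get(nbr, math.inf):
                        next_messages[nbr] = cand
        messages = next_messages
    return dist

g = {"M": [("A", 5), ("B", 6)], "A": [("B", 3)], "B": []}
print(pregel_shortest_paths(g, "M"))  # {'M': 0, 'A': 5, 'B': 6}
```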

  41. Using Idempotence • Some recursive applications allow restart of tasks even if they have produced some output. • Example: TC is idempotent; you can send a task a duplicate Path fact without altering the result. • But if you were counting paths, the answer would be wrong.
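A tiny illustration of the distinction (hypothetical handler names): re-delivering a Path fact to a set-based task changes nothing, while the same duplicate corrupts a path counter.

```python
paths = set()
def on_path_fact(p):
    paths.add(p)                    # idempotent: duplicates change nothing

path_counts = {}
def on_path_count(p):
    # NOT idempotent: a re-delivered fact double-counts the path
    path_counts[p] = path_counts.get(p, 0) + 1

for _ in range(2):                  # simulate a duplicate delivery after restart
    on_path_fact((1, 2))
    on_path_count((1, 2))
print(len(paths), path_counts[(1, 2)])  # 1 2  -- the set is right, the count is wrong
```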

  42. Big Problem: The Endgame • Some recursions, like TC, take a large number of rounds, but the number of new discoveries in later rounds drops. • T. Vassilakis (Google): searches forward on the Web graph can take hundreds of rounds. • Problem: in a cluster, transmitting small files carries much overhead.

  43. Approach: Merge Tasks • Decide when to migrate tasks to fewer compute nodes. • Data for several tasks at the same node are combined into a single file and distributed at the receiving end. • Downside: old tasks have a lot of state to move. • Example: “paths seen so far.”

  44. Approach: Modify Algorithms • Nonlinear recursions can terminate in many fewer steps than equivalent linear recursions. • Example: TC. • O(n) rounds on n-node graph for linear. • O(log n) rounds for nonlinear.
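A sketch of why the round counts differ: in the nonlinear rule, joining Path with Path roughly doubles the longest path covered each round, so a chain of length 8 closes in 3 rounds where the linear rule (Arc joined with Path, one extra arc per round) would need 7. nonlinear_tc is an illustrative single-machine fixpoint, not a cluster implementation.

```python
def nonlinear_tc(arcs):
    path = set(arcs)
    rounds = 0
    while True:
        # Path(X,Y) :- Path(X,Z) & Path(Z,Y): covered lengths double per round.
        new = {(x, y) for (x, z) in path for (z2, y) in path if z == z2}
        if new <= path:
            return path, rounds
        path |= new
        rounds += 1

chain = [(i, i + 1) for i in range(8)]   # a 9-node chain
_, r = nonlinear_tc(chain)
print(r)   # 3 rounds (about log2 of the longest path), vs 7 for the linear rule
```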

  45. Advantage of Linear TC • The data-volume cost (= sum of input sizes of all tasks) for executing linear TC is generally lower than that for nonlinear TC. • Why? Each path is discovered only once. • Note: distinct paths between the same endpoints may each be discovered.

  46. Example: Linear TC [Diagram: Arc + Path = Path]

  47. Nonlinear TC [Diagram: Path + Path = Path, constructed in many ways]

  48. Smart TC • (Valduriez-Boral; Ioannidis) Construct a path from two paths: • The first has a length that is a power of 2. • The second is no longer than the first.
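A sketch of this rule with illustrative names, indexing discovered paths by exact length: round k combines paths of length exactly 2^k (the "first" path) with paths no longer than that, so every path is assembled in exactly one way.

```python
from collections import defaultdict

def smart_tc(arcs, n):
    # by_len[l]: endpoint pairs (x, y) joined by a discovered path of l arcs.
    by_len = defaultdict(set)
    by_len[1] = set(arcs)
    k = 0
    while 2 ** k <= n:                         # any pair is linked by < n arcs
        first = by_len[2 ** k]
        for l2 in range(1, 2 ** k + 1):        # second path no longer than first
            for (x, z) in first:
                for (z2, y) in by_len[l2]:
                    if z == z2:
                        by_len[2 ** k + l2].add((x, y))
        k += 1
    return set.union(*by_len.values())

chain = [(i, i + 1) for i in range(5)]
print(sorted(smart_tc(chain, n=6)))   # all 15 (i, j) pairs with i < j on the chain
```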

  49. Example: Smart TC

  50. Other Nonlinear TC Algorithms • You can have the unique-decomposition property with many variants of nonlinear TC. • Example: Balance constructs paths from two equal-length paths. • Favor first path when length is odd.
