Lecture #6 MapReduce (II)

aelan

Presentation Transcript


  1. Lecture #6: MapReduce (II). CS492 Special Topics in Computer Science: Distributed Algorithms and Systems

  2. MapReduce Assumptions • Hardware: components are reliable; components are homogeneous • Software: it’s correct • Network: latency is zero; bandwidth is infinite; it’s secure • Overall system: configuration is stable; there is one administrator

  3. MapReduce Execution Overview

  4. Question of the Day What goes on underneath?

  5. Step #1 The input files are split into M pieces (64 MB each), and many copies of the program are started.
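
A minimal sketch of the splitting in Step #1, assuming the input is a single file addressed by byte offsets. The name `split_offsets` and the fixed split size are illustrative assumptions, not part of the lecture; a real implementation would also have to align splits to record boundaries.

```python
MB = 1024 * 1024


def split_offsets(file_size: int, split_size: int = 64 * MB) -> list[tuple[int, int]]:
    """Return (start, length) byte ranges that cover a file of file_size bytes."""
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    return [
        (start, min(split_size, file_size - start))
        for start in range(0, file_size, split_size)
    ]


# A 150 MB file gives M = 3 pieces: two full 64 MB splits and a 22 MB remainder.
splits = split_offsets(150 * MB)
```

Each range would become the input of one map task, so M is simply the number of ranges.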

  6. Step #2 One special copy of the program (the master) assigns work to the remaining copies (the workers). There are M map tasks and R reduce tasks to assign.

  7. Step #3 A worker assigned a map task runs the Map function on its input split. The output is buffered in memory.

  8. Step #4 Periodically, the buffered output is written to local disk, partitioned into R regions by the partitioning function. The locations of these regions are passed on to the master.
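
The partitioning in Step #4 can be sketched as follows. A common choice of partitioning function is hash(key) mod R; here `zlib.crc32` stands in for the hash so that the result is stable across runs (Python's built-in `hash` on strings is randomized per process). The helper names `partition` and `partition_buffer` are hypothetical.

```python
import zlib
from collections import defaultdict


def partition(key: str, R: int) -> int:
    """Map an intermediate key to one of R regions (hash(key) mod R)."""
    return zlib.crc32(key.encode("utf-8")) % R


def partition_buffer(pairs, R):
    """Group buffered (key, value) pairs by the region they belong to."""
    regions = defaultdict(list)
    for key, value in pairs:
        regions[partition(key, R)].append((key, value))
    return dict(regions)
```

Because the region depends only on the key, every pair with the same key ends up at the same reduce task, no matter which map worker produced it.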

  9. Step #5 When the master notifies a reduce worker of these locations, the worker uses RPC to read the buffered data from the local disks of the map workers. The reduce worker then sorts the data by intermediate key.

  10. Step #6 The reduce worker iterates over the sorted data and, for each unique key, calls Reduce on that key and its values.
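
Steps #5 and #6 amount to sort, group by key, then reduce. The sketch below does this in memory for a single reduce worker; `run_reduce` and `count` are illustrative names, and a real worker would use an external sort when the data does not fit in memory.

```python
from itertools import groupby
from operator import itemgetter


def run_reduce(pairs, reduce_fn):
    """Sort pairs by key, then call reduce_fn once per unique key."""
    ordered = sorted(pairs, key=itemgetter(0))
    return [
        (key, reduce_fn(key, [value for _, value in group]))
        for key, group in groupby(ordered, key=itemgetter(0))
    ]


def count(key, values):
    """Example Reduce function: sum the values seen for a key."""
    return sum(values)


result = run_reduce([("b", 1), ("a", 1), ("b", 1)], count)
```

Sorting first is what lets `groupby` see all occurrences of a key as one consecutive run.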

  11. Step #7 When all Map and Reduce tasks are complete, the master wakes up the user program

  12. Fault Tolerance • Master detects worker failures • How? • What if a Map worker dies? • What if a Reduce worker dies after completing its task?
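
One way to answer these questions, following the original MapReduce design: the master pings workers periodically and treats a worker that stays silent too long as failed. Map tasks on a failed worker are re-run even if they had completed, because their output sits on that worker's local disk; completed reduce tasks are not, because their output is already in the global file system. The sketch below is an assumption-laden illustration: `Task`, `dead_workers`, `handle_worker_failure`, and the timeout parameter are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    kind: str                     # "map" or "reduce"
    state: str                    # "idle", "in_progress", or "completed"
    worker: Optional[str] = None  # worker the task was assigned to


def dead_workers(last_ping: dict[str, float], now: float, timeout: float) -> list[str]:
    """Workers whose last ping response is older than timeout seconds."""
    return [w for w, t in last_ping.items() if now - t > timeout]


def handle_worker_failure(tasks: list[Task], failed_worker: str) -> list[Task]:
    """Reset the tasks that must be re-executed and return them."""
    rescheduled = []
    for task in tasks:
        if task.worker != failed_worker:
            continue
        # Completed map output lived on the failed machine's local disk.
        lost_output = task.kind == "map" and task.state == "completed"
        if task.state == "in_progress" or lost_output:
            task.state = "idle"
            task.worker = None
            rescheduled.append(task)
    return rescheduled
```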

  13. Locality • Task assignment by the Master • Mapping between the input file and workers?
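
A possible locality-aware assignment, sketched under the assumption that the master knows which hosts store a replica of each input split. The function `pick_map_task` and its arguments are hypothetical; the sketch leaves out the intermediate case of preferring a host that is merely near a replica.

```python
from typing import Optional


def pick_map_task(replica_hosts: dict[str, set[str]], worker_host: str) -> Optional[str]:
    """Choose an idle split for a worker, preferring one stored on its host.

    replica_hosts maps each idle split to the hosts that hold a replica of it.
    """
    for split, hosts in replica_hosts.items():
        if worker_host in hosts:
            return split  # local read, no network transfer needed
    # No local split is available: fall back to any idle split (remote read).
    return next(iter(replica_hosts), None)
```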

  14. Backup How to deal with stragglers?
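
The usual answer is backup tasks: when the job is close to completion, the master schedules duplicate executions of the tasks still in progress, and a task counts as done when either copy finishes. A minimal sketch of the selection step follows; the name `tasks_needing_backup` and the 95% threshold are assumptions chosen for illustration.

```python
def tasks_needing_backup(states: dict[str, str], threshold: float = 0.95) -> list[str]:
    """Return in-progress task ids once the job is nearly finished.

    states maps a task id to "idle", "in_progress", or "completed".
    """
    if not states:
        return []
    done = sum(1 for s in states.values() if s == "completed")
    if done / len(states) < threshold:
        return []  # too early: duplicating work now would waste resources
    return [task for task, s in states.items() if s == "in_progress"]
```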

  15. Refinements • Partitioning functions • Ordering guarantees • Combiner function • Input and output types • Side-effects • Skipping bad records • Local execution • Status information • Counters
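
As an illustration of one refinement, the combiner function, the sketch below pre-aggregates word counts on the map side before anything is sent over the network. This is valid only when the Reduce function is commutative and associative, as summation is. The names `map_words` and `combine` are hypothetical.

```python
from collections import Counter


def map_words(text: str):
    """Example Map function: emit (word, 1) for every word."""
    return [(word, 1) for word in text.split()]


def combine(pairs):
    """Partially sum values per key on the map worker."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())


raw = map_words("the cat the hat the")
combined = combine(raw)
```

Here five intermediate pairs shrink to three, and the reduce worker still computes the same totals.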

  16. Reading for next class • “Lessons from Giant-Scale Services” by Eric Brewer, IEEE Internet Computing, July-August 2001 • “The Google File System” by Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, SOSP 2003, NY • Short quiz on “Lessons ...”
