Map Reduce

MapReduce
Dustin Beaupre, Thuy Nguyen

Relation to our course: Chapter 8: Physical Data Model
- 8.3.2: Hash Tables & Files
- 8.6.3: Parallel Processing

Sources:
1. Wikipedia entry (en.wikipedia.org/wiki/MapReduce)
2. Apache MapReduce Tutorial (hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html)

Why is MapReduce useful?
• Processes large amounts of data using parallel processing.
• Divides the workload across a large number of machines.
• If the input data is updated, the Map step has to be re-run.
• Useful for data mining.
• Fault tolerant: if one machine stops working, its task is reassigned to another machine.
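The fault-tolerance bullet can be illustrated with a small single-process simulation. This is a minimal sketch under loudly stated assumptions: the "machines" are plain Python functions, a failure is modeled by a hypothetical `MachineFailure` exception, and reassignment is simple round-robin. Real frameworks detect failures through heartbeats and re-execute tasks on other nodes, which this sketch does not attempt to model.

```python
from typing import Callable, Dict, List


class MachineFailure(Exception):
    """Hypothetical: raised when a simulated machine stops working mid-task."""


def run_with_reassignment(tasks: List[int],
                          machines: List[Callable[[int], int]]) -> Dict[int, int]:
    """Run each task on a machine; if that machine fails, reassign the task
    to the next machine (round-robin) until one succeeds."""
    results: Dict[int, int] = {}
    for i, task in enumerate(tasks):
        for attempt in range(len(machines)):
            machine = machines[(i + attempt) % len(machines)]
            try:
                results[task] = machine(task)
                break  # task finished, move on to the next one
            except MachineFailure:
                continue  # machine is down: reassign the task to another one
        else:
            raise RuntimeError(f"task {task} failed on every machine")
    return results


def healthy(task: int) -> int:
    # Hypothetical work: square the task id.
    return task * task


def broken(task: int) -> int:
    # Hypothetical machine that has stopped working.
    raise MachineFailure("machine down")


print(run_with_reassignment([1, 2, 3, 4], [healthy, broken, healthy]))
# {1: 1, 2: 4, 3: 9, 4: 16}
```

Task 2 is first routed to the broken machine and then completes on the next healthy one, so the caller still gets a complete result.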

What does map do?
Distributes the workload to multiple machines. map() performs filtering and sorting.

What does reduce do?
Combines the output from the mapping into a single output. reduce() performs a summary operation, such as counting the items in each group.

Logical View
• Data is represented as (key, value) pairs.
• Map() takes one pair of data in one domain and returns a list of pairs in a different domain:
  Map(k1, v1) -> list(k2, v2)
• Reduce() is applied in parallel to each group of values that share a key, producing a collection of values in the same domain:
  Reduce(k2, list(v2)) -> list(v3)
• Overall, MapReduce transforms a list of (key, value) pairs into a list of values.
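The two signatures above can be written down directly as a sequential reference version. This is a minimal sketch, not how a real framework runs: everything happens in one process, and the result is returned as a dict keyed by k2 (an assumption made for readability) instead of a flat list of v3 values. The example mapper and reducer (`length_mapper`, `distinct_reducer`) are hypothetical and only show Map moving data from one key domain (document ids) into another (word lengths).

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Tuple, TypeVar

K1 = TypeVar("K1")
V1 = TypeVar("V1")
K2 = TypeVar("K2", bound=Hashable)
V2 = TypeVar("V2")
V3 = TypeVar("V3")


def map_reduce(
    records: Iterable[Tuple[K1, V1]],
    mapper: Callable[[K1, V1], List[Tuple[K2, V2]]],
    reducer: Callable[[K2, List[V2]], List[V3]],
) -> Dict[K2, List[V3]]:
    # Map: every input pair (k1, v1) becomes a list of pairs (k2, v2).
    groups: Dict[K2, List[V2]] = defaultdict(list)
    for k1, v1 in records:
        for k2, v2 in mapper(k1, v1):
            groups[k2].append(v2)  # group intermediate values by key k2
    # Reduce: each group (k2, list(v2)) becomes a list of v3 values.
    # Groups are independent, so a real system can reduce them in parallel.
    return {k2: reducer(k2, v2s) for k2, v2s in groups.items()}


def length_mapper(doc_id: str, text: str) -> List[Tuple[int, str]]:
    # Hypothetical mapper: key each word by its length.
    return [(len(word), word) for word in text.split()]


def distinct_reducer(length: int, words: List[str]) -> List[str]:
    # Hypothetical reducer: the distinct words of this length.
    return sorted(set(words))


print(map_reduce([("doc1", "a bb a"), ("doc2", "bb ccc")], length_mapper, distinct_reducer))
# {1: ['a'], 2: ['bb'], 3: ['ccc']}
```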

Execution Trace for WordCount [figure: Mapper A, Mapper B, Reducer]
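A WordCount trace like the one in the figure can be reproduced in a few lines. This sketch assumes a hypothetical input split across two mappers (A and B) and a single reducer; the words are split on whitespace only, so punctuation handling is deliberately left out.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def word_count_map(line: str) -> List[Tuple[str, int]]:
    # Emit (word, 1) for every whitespace-separated word in the line.
    return [(word.lower(), 1) for word in line.split()]


def word_count_reduce(word: str, counts: List[int]) -> int:
    # Sum all the 1s emitted for this word.
    return sum(counts)


# Hypothetical input, split between Mapper A and Mapper B.
split_a = ["the cat sat", "the dog"]
split_b = ["the cat ran"]

mapper_a_out = [pair for line in split_a for pair in word_count_map(line)]
mapper_b_out = [pair for line in split_b for pair in word_count_map(line)]

# Group every (word, 1) pair from both mappers by word for the single reducer.
grouped: Dict[str, List[int]] = defaultdict(list)
for word, one in mapper_a_out + mapper_b_out:
    grouped[word].append(one)

result = {word: word_count_reduce(word, counts) for word, counts in sorted(grouped.items())}
print(result)
# {'cat': 2, 'dog': 1, 'ran': 1, 'sat': 1, 'the': 3}
```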

SQL
SELECT eyeColor, COUNT(*) FROM worldPopulation GROUP BY eyeColor
• Suppose everyone in the world (~7,222,157,690 people) were in this database.
• The sequential response time would be too large!
• MapReduce may help!

Execution Trace for EyeColorCount [figure: Mapper A, Mapper B, Reducer]
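The `SELECT eyeColor, COUNT(*) ... GROUP BY eyeColor` query maps naturally onto MapReduce: each mapper emits (eyeColor, 1) for the rows in its partition, and the reducer sums the 1s per color. This is a sketch under stated assumptions: the rows are hypothetical dicts standing in for the worldPopulation table, and threads from `concurrent.futures.ThreadPoolExecutor` stand in for separate machines (in CPython, threads do not give true CPU parallelism, so this only illustrates the structure).

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Person = Dict[str, str]  # hypothetical row of the worldPopulation table


def eye_color_map(partition: List[Person]) -> List[Tuple[str, int]]:
    # One mapper per partition: emit (eyeColor, 1) for each row.
    return [(person["eyeColor"], 1) for person in partition]


def eye_color_reduce(color: str, ones: List[int]) -> int:
    # Equivalent of COUNT(*) for one GROUP BY group.
    return sum(ones)


def eye_color_count(partitions: List[List[Person]]) -> Dict[str, int]:
    # Run the mappers concurrently; pool.map keeps results in input order.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(eye_color_map, partitions))
    # Group the mapper output by eye color (the GROUP BY step).
    groups: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for color, one in pairs:
            groups[color].append(one)
    return {color: eye_color_reduce(color, ones) for color, ones in groups.items()}


# Tiny hypothetical sample, split across Mapper A and Mapper B.
mapper_a_rows = [{"eyeColor": "brown"}, {"eyeColor": "blue"}, {"eyeColor": "brown"}]
mapper_b_rows = [{"eyeColor": "green"}, {"eyeColor": "brown"}]
print(eye_color_count([mapper_a_rows, mapper_b_rows]))
# {'brown': 3, 'blue': 1, 'green': 1}
```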

MapReduce steps
• Prepare the Map() input
• Run the user-provided Map() code
• “Shuffle” the Map output to the Reduce processors
• Run the user-provided Reduce() code
• Produce the final output
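The five steps can be traced end to end in one function. This is a minimal single-process sketch: `run_job`, `num_mappers`, `num_reducers` and the letter-counting mapper are hypothetical names chosen for illustration. The shuffle routes each key to a reducer with `hash(key) % num_reducers`, which connects to the hash tables of section 8.3.2; note that Python salts `str` hashes per process, so a real multi-machine system would need a hash that is stable across machines.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


def run_job(
    lines: List[str],
    mapper: Callable[[str], Iterable[Tuple[str, int]]],
    reducer: Callable[[str, List[int]], int],
    num_mappers: int = 2,
    num_reducers: int = 2,
) -> Dict[str, int]:
    # 1. Prepare the Map() input: split the lines into num_mappers chunks.
    splits = [lines[i::num_mappers] for i in range(num_mappers)]

    # 2. Run the user-provided Map() code on each split.
    map_outputs = [[pair for line in split for pair in mapper(line)] for split in splits]

    # 3. Shuffle: route each key to a reducer by hashing it, grouping values by key.
    #    (hash() of str is only stable within one process; fine for this sketch.)
    partitions: List[Dict[str, List[int]]] = [defaultdict(list) for _ in range(num_reducers)]
    for output in map_outputs:
        for key, value in output:
            partitions[hash(key) % num_reducers][key].append(value)

    # 4. Run the user-provided Reduce() code, once per key in each partition.
    # 5. Produce the final output by merging every reducer's results.
    final: Dict[str, int] = {}
    for partition in partitions:
        for key, values in partition.items():
            final[key] = reducer(key, values)
    return dict(sorted(final.items()))


def char_map(line: str) -> List[Tuple[str, int]]:
    # Hypothetical mapper: emit (letter, 1) for each letter in the line.
    return [(ch, 1) for ch in line if ch.isalpha()]


print(run_job(["abba", "cab"], char_map, lambda key, values: sum(values)))
# {'a': 3, 'b': 3, 'c': 1}
```

Because every occurrence of a key hashes to the same reducer, each reducer sees the complete list of values for its keys, and the final result does not depend on how many mappers or reducers are used.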

Overall, the goal of MapReduce is to produce correct output for large data sets in the smallest amount of time.

Any questions?
