MapReduce Programming
Oct 25, 2011

15-440

  • Topics
    • Large-scale computing
      • Traditional high-performance computing (HPC)
      • Cluster computing
    • MapReduce
      • Definition
      • Examples
    • Implementation
    • Properties
Typical HPC Machine
  • Compute Nodes
    • High end processor(s)
    • Lots of RAM
  • Network
    • Specialized
    • Very high performance
  • Storage Server
    • RAID-based disk array

[Figure: many compute nodes, each with high-end CPUs and lots of memory, connected through a specialized network to a RAID-based storage server]

HPC Machine Example
  • Jaguar Supercomputer
    • 3rd fastest in world
  • Compute Nodes
    • 18,688 nodes in largest partition
    • 2 × 2.6 GHz 6-core AMD Opteron
    • 16GB memory
    • Total: 2.3 petaflop / 300 TB memory
  • Network
    • 3D torus
      • Each node connected to 6 neighbors via 6.0 GB/s links
  • Storage Server
    • 10PB RAID-based disk array
HPC Programming Model
  • Programs described at very low level
    • Specify detailed control of processing & communications
  • Rely on small number of software packages
    • Written by specialists
    • Limits classes of problems & solution methods

[Figure: software stack; application programs sit on software packages, which sit on a machine-dependent programming model, which sits on the hardware]

Bulk Synchronous Programming
  • Solving Problem Over Grid
    • E.g., finite-element computation
  • Partition into Regions
    • p regions for p processors
  • Map Region per Processor
    • Local computation sequential
    • Periodically communicate boundary values with neighbors
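A minimal sketch of this bulk-synchronous pattern in plain Java, offered only as an illustration (the region contents, neighbor layout, and update rule are placeholder stubs, not from the slides):

import java.util.concurrent.CyclicBarrier;

/* Illustrative only: p worker threads each update a local region, then
   synchronize at a barrier before exchanging boundary values. */
public class BspSketch {
    static final int P = 4;              // number of workers ("processors")
    static final int STEPS = 10;         // number of supersteps

    public static void main(String[] args) throws InterruptedException {
        double[][] regions = new double[P][1024];      // one region per worker
        CyclicBarrier barrier = new CyclicBarrier(P);  // bulk-synchronous step
        Thread[] workers = new Thread[P];
        for (int p = 0; p < P; p++) {
            final int id = p;
            workers[p] = new Thread(() -> {
                try {
                    for (int step = 0; step < STEPS; step++) {
                        computeLocally(regions[id]);      // sequential local work
                        barrier.await();                  // wait for every worker
                        exchangeBoundaries(regions, id);  // read neighbors' edge values
                        barrier.await();                  // start next superstep together
                    }
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            workers[p].start();
        }
        for (Thread t : workers) t.join();
    }

    static void computeLocally(double[] region) { /* e.g., relax interior cells */ }
    static void exchangeBoundaries(double[][] regions, int id) { /* copy edge cells */ }
}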
[Figure: long-lived processes P1 through P5 communicating by message passing]

Typical HPC Operation
  • Characteristics
    • Long-lived processes
    • Make use of spatial locality
    • Hold all program data in memory (no disk access)
    • High bandwidth communication
  • Strengths
    • High utilization of resources
    • Effective for many scientific applications
  • Weaknesses
    • Requires careful tuning of application to resources
    • Intolerant of any variability
HPC Fault Tolerance

[Figure: processes P1 through P5 periodically write checkpoints; after a failure the system restores the last checkpoint, and the intervening computation is wasted]

  • Checkpoint
    • Periodically store state of all processes
    • Significant I/O traffic
  • Restore
    • When failure occurs
    • Reset state to that of last checkpoint
    • All intervening computation wasted
  • Performance Scaling
    • Very sensitive to number of failing components
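For illustration only (the slides describe the mechanism but show no code), a checkpoint/restore cycle can be as simple as periodically serializing the program state and reloading the most recent complete copy after a failure; the file name and use of Java serialization below are assumptions:

import java.io.*;

/* Illustrative checkpoint/restore helpers for a serializable state object. */
public class CheckpointSketch {
    static final File CKPT = new File("state.ckpt");   // assumed checkpoint location

    static void checkpoint(Serializable state) throws IOException {
        // Write to a temporary file first so a crash mid-write keeps the old checkpoint intact
        File tmp = new File(CKPT.getPath() + ".tmp");
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(tmp))) {
            out.writeObject(state);
        }
        if (!tmp.renameTo(CKPT)) {
            throw new IOException("could not install new checkpoint");
        }
    }

    static Object restore() throws IOException, ClassNotFoundException {
        // Everything computed since this checkpoint was written is lost and must be redone
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(CKPT))) {
            return in.readObject();
        }
    }
}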


Google Data Centers
  • Dalles, Oregon
    • Hydroelectric power @ 2¢ / kWh
    • 50 Megawatts
      • Enough to power 60,000 homes
  • Engineered for maximum modularity & power efficiency
  • Container: 1160 servers, 250KW
  • Server: 2 disks, 2 processors
Typical Cluster Machine
  • Compute + Storage Nodes
    • Medium-performance processors
    • Modest memory
    • 1-2 disks
  • Network
    • Conventional Ethernet switches
      • 10 Gb/s within rack
      • 100 Gb/s across racks

[Figure: many compute + storage nodes, each with a CPU, memory, and local disks, connected by a conventional Ethernet network]

Machines with Disks
  • Lots of storage for cheap
    • Seagate Barracuda: 2 TB @ $99
      • 5¢ / GB (vs. 40¢ / GB in 2007)

  • Drawbacks
    • Long and highly variable delays
    • Not very reliable
  • Not included in HPC Nodes
Oceans of Data, Skinny Pipes
  • 1 Terabyte
    • Easy to store
    • Hard to move
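As a rough illustration of the imbalance: even at a sustained 1 Gb/s (about 125 MB/s), copying a full terabyte takes on the order of 8,000 seconds, i.e., more than two hours, and far longer over typical wide-area links.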
Ideal Cluster Programming Model
  • Application programs written in terms of high-level operations on data
  • Runtime system controls scheduling, load balancing, …

[Figure: software stack; application programs sit on a machine-independent programming model, which sits on a runtime system, which sits on the hardware]

[Figure: input objects x1, x2, x3, ..., xn are fed to map tasks M, which emit key-value pairs; the pairs are grouped by key k1, ..., kr and passed to reduce]

Map/Reduce Programming Model
  • Map computation across many objects
    • E.g., 10^10 Internet web pages
  • Aggregate results in many different ways
  • System deals with issues of resource allocation & reliability

Dean & Ghemawat: “MapReduce: Simplified Data Processing on Large Clusters”, OSDI 2004

[Figure: five short documents are mapped (Extract) to word-count pairs, which are summed per word to give and = 3, come = 6, see = 3, dick = 1, spot = 1]

Map/Reduce Example
  • Create a word index of a set of documents
  • Map: generate word, count pairs for all words in document
  • Reduce: sum word counts across documents

Input documents: "Come and see Spot."  "Come, Dick"  "Come, come."  "Come and see."  "Come and see."

Getting Started
  • Goal
    • Provide access to MapReduce framework
  • Software
    • Hadoop Project
      • Open source project providing file system and Map/Reduce
      • Supported and used by Yahoo
      • Rapidly expanding user/developer base
      • Prototype on single machine, map onto cluster
Hadoop API
  • Requirements
    • Programmer must supply Mapper & Reducer classes
  • Mapper
    • Steps through file one line at a time
    • Code generates sequence of <key, value>
      • Call output.collect(key, value)
    • Default types for keys & values are strings
      • Lots of low-level machinery to convert to & from other data types
      • But can use anything “writable” (see the sketch after this list)
  • Reducer
    • Given key + iterator that generates sequence of values
    • Generate one or more <key, value> pairs
      • Call output.collect(key, value)
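As a minimal sketch of that last point, any class implementing Hadoop's Writable interface (a write method and a readFields method) can serve as a value type; the class and field names below are illustrative, not from the slides:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

/* Illustrative custom value type; keys would additionally implement WritableComparable. */
public class WordStats implements Writable {
    private int count;
    private long totalLength;

    public void write(DataOutput out) throws IOException {
        out.writeInt(count);          // serialize fields in a fixed order
        out.writeLong(totalLength);
    }

    public void readFields(DataInput in) throws IOException {
        count = in.readInt();         // deserialize in the same order
        totalLength = in.readLong();
    }
}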
Hadoop Word Count Mapper

public class WordCountMapper extends MapReduceBase implements Mapper {
    private final static Text word = new Text();
    private final static IntWritable count = new IntWritable(1);

    public void map(WritableComparable key, Writable values,
                    OutputCollector output, Reporter reporter)
            throws IOException {
        /* Get line from file */
        String line = values.toString();
        /* Split into tokens */
        StringTokenizer itr = new StringTokenizer(line.toLowerCase(),
                " \t.!?:()[],'&-;|0123456789");
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            /* Emit <token, 1> as key + value */
            output.collect(word, count);
        }
    }
}

Hadoop Word Count Reducer

public class WordCountReducer extends MapReduceBase implements Reducer {
    public void reduce(WritableComparable key, Iterator values,
                       OutputCollector output, Reporter reporter)
            throws IOException {
        int cnt = 0;
        while (values.hasNext()) {
            IntWritable ival = (IntWritable) values.next();
            cnt += ival.get();
        }
        output.collect(key, new IntWritable(cnt));
    }
}
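The slides show only the Mapper and Reducer; a driver along the following lines (a sketch using the same old org.apache.hadoop.mapred API, with class and path names assumed) would wire them into a job:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class WordCount {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");
        conf.setOutputKeyClass(Text.class);           // key type emitted by map & reduce
        conf.setOutputValueClass(IntWritable.class);  // value type emitted by map & reduce
        conf.setMapperClass(WordCountMapper.class);
        conf.setReducerClass(WordCountReducer.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);                       // blocks until the job completes
    }
}

Packaged into a jar, this would typically be launched with something like: hadoop jar wordcount.jar WordCount <input dir> <output dir>.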

[Figure: a Map/Reduce job as a pool of independent map tasks feeding a pool of reduce tasks]

Map/Reduce Operation
  • Characteristics
    • Computation broken into many, short-lived tasks
      • Mapping, reducing
    • Use disk storage to hold intermediate results
  • Strengths
    • Great flexibility in placement, scheduling, and load balancing
    • Can access large data sets
  • Weaknesses
    • Higher overhead
    • Lower raw performance
Map/Reduce Fault Tolerance


  • Data Integrity
    • Store multiple copies of each file
    • Including intermediate results of each Map / Reduce
      • Continuous checkpointing
  • Recovering from Failure
    • Simply recompute lost result
      • Localized effect
    • Dynamic scheduler keeps all processors busy


Cluster Scalability Advantages
  • Distributed system design principles lead to scalable design
  • Dynamically scheduled tasks with state held in replicated files
  • Provisioning Advantages
    • Can use consumer-grade components
      • Maximizes cost-performance
    • Can have heterogeneous nodes
      • More efficient technology refresh
  • Operational Advantages
    • Minimal staffing
    • No downtime
Exploring Parallel Computation Models


  • Map/Reduce Provides Coarse-Grained Parallelism
    • Computation done by independent processes
    • File-based communication
  • Observations
    • Relatively “natural” programming model
    • Research issue to explore full potential and limits

[Figure: spectrum of parallel computation models, from low-communication, coarse-grained models (e.g., SETI@home, Map/Reduce) to high-communication, fine-grained models (e.g., MPI, threads, PRAM)]

Example: Sparse Matrices with Map/Reduce

[Figure: sparse matrix product A × B = C; A has nonzero entries 10, 20, 30, 40, 50, 60, 70, B has nonzero entries -1, -2, -3, -4, and the product C has entries -10, -80, -60, -250, -170, -460]

  • Task: Compute product C = A·B
    • Assume most matrix entries are 0
  • Motivation
    • Core problem in scientific computing
    • Challenging for parallel execution
    • Demonstrate expressiveness of Map/Reduce

[Figure: the nonzero entries of A and B written as (row, col, value, matrixID) tuples]

Computing Sparse Matrix Product
  • Represent matrix as list of nonzero entries

row, col, value, matrixID

  • Strategy
    • Phase 1: Compute all products a(i,k) · b(k,j)
    • Phase 2: Sum products for each entry (i,j)
    • Each phase involves a Map/Reduce
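The code on the following slides relies on a GraphEdge helper class (and a BadGraphException) that the deck does not show. A hypothetical version, reverse-engineered from how that code uses it (a constructor parsing a "row col value tag" string, fields fromNode/toNode/weight/tag, and contractProd to multiply an A entry by a B entry), might look like this; every detail below is an assumption:

/* Hypothetical: one nonzero matrix entry, tagged "A", "B", or "C". */
public class GraphEdge {
    public int fromNode, toNode;   // row index, column index
    public double weight;          // the entry's value
    public String tag;             // which matrix the entry belongs to

    public GraphEdge(String s) throws BadGraphException {
        String[] f = s.trim().split("\\s+");
        if (f.length != 4) throw new BadGraphException();
        try {
            fromNode = Integer.parseInt(f[0]);
            toNode = Integer.parseInt(f[1]);
            weight = Double.parseDouble(f[2]);
            tag = f[3];
        } catch (NumberFormatException e) {
            throw new BadGraphException();
        }
    }

    private GraphEdge(int i, int j, double w, String t) {
        fromNode = i; toNode = j; weight = w; tag = t;
    }

    /* Multiply an A entry a(i,k) by a B entry b(k,j); null if the entries don't line up. */
    public GraphEdge contractProd(GraphEdge b) {
        if (!tag.equals("A") || !b.tag.equals("B") || toNode != b.fromNode) return null;
        return new GraphEdge(fromNode, b.toNode, weight * b.weight, "C");
    }

    public String toString() {
        return fromNode + " " + toNode + " " + weight + " " + tag;
    }
}

class BadGraphException extends Exception {}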
[Figure: the (row, col, value, matrixID) tuples of A and B redistributed into buckets Key = 1, Key = 2, Key = 3]

Phase 1 Map of Matrix Multiply
  • Group values a(i,k) and b(k,j) according to key k
    • Entries of A are keyed by column (Key = col); entries of B are keyed by row (Key = row)

[Figure: within each bucket (Key = 1, 2, 3), every A entry is paired with every B entry, producing product entries tagged C]

Phase 1 “Reduce” of Matrix Multiply
  • Generate all products a(i,k) · b(k,j)


[Figure: the product entries regrouped into buckets Key = 1,1 through Key = 3,2, one bucket per output position]

Phase 2 Map of Matrix Multiply
  • Group products a(i,k) · b(k,j) with matching values of i and j
    • Key = (row, col) of the output entry

[Figure: products summed within each (row, col) bucket, giving the final C entries -10, -80, -60, -250, -170, -460]

Phase 2 Reduce of Matrix Multiply
  • Sum products to get final entries
Matrix Multiply Phase 1 Mapper

public class P1Mapper extends MapReduceBase implements Mapper {
    public void map(WritableComparable key, Writable values,
                    OutputCollector output, Reporter reporter) throws IOException {
        try {
            GraphEdge e = new GraphEdge(values.toString());
            IntWritable k;
            if (e.tag.equals("A"))
                k = new IntWritable(e.toNode);
            else
                k = new IntWritable(e.fromNode);
            output.collect(k, new Text(e.toString()));
        } catch (BadGraphException e) {}
    }
}

Matrix Multiply Phase 1 Reducer

public class P1Reducer extends MapReduceBase implements Reducer {
    public void reduce(WritableComparable key, Iterator values,
                       OutputCollector output, Reporter reporter)
            throws IOException {
        Text outv = new Text("");  // Don't really need output values
        /* First split edges into A and B categories */
        LinkedList<GraphEdge> alist = new LinkedList<GraphEdge>();
        LinkedList<GraphEdge> blist = new LinkedList<GraphEdge>();
        while (values.hasNext()) {
            try {
                GraphEdge e = new GraphEdge(values.next().toString());
                if (e.tag.equals("A")) {
                    alist.add(e);
                } else {
                    blist.add(e);
                }
            } catch (BadGraphException e) {}
        }
        // Continued on the next slide

MM Phase 1 Reducer (cont.)

        // Continuation
        Iterator<GraphEdge> aset = alist.iterator();
        // For each incoming edge
        while (aset.hasNext()) {
            GraphEdge aedge = aset.next();
            // For each outgoing edge
            Iterator<GraphEdge> bset = blist.iterator();
            while (bset.hasNext()) {
                GraphEdge bedge = bset.next();
                GraphEdge newe = aedge.contractProd(bedge);
                // Null would indicate invalid contraction
                if (newe != null) {
                    Text outk = new Text(newe.toString());
                    output.collect(outk, outv);
                }
            }
        }
    }
}

Matrix Multiply Phase 2 Mapper

public class P2Mapper extends MapReduceBase implements Mapper {
    public void map(WritableComparable key, Writable values,
                    OutputCollector output, Reporter reporter)
            throws IOException {
        String es = values.toString();
        try {
            GraphEdge e = new GraphEdge(es);
            // Key based on head & tail nodes
            String ks = e.fromNode + " " + e.toNode;
            output.collect(new Text(ks), new Text(e.toString()));
        } catch (BadGraphException e) {}
    }
}

Matrix Multiply Phase 2 Reducer

public class P2Reducer extends MapReduceBase implements Reducer {
    public void reduce(WritableComparable key, Iterator values,
                       OutputCollector output, Reporter reporter)
            throws IOException {
        GraphEdge efinal = null;
        while (efinal == null && values.hasNext()) {
            try {
                efinal = new GraphEdge(values.next().toString());
            } catch (BadGraphException e) {}
        }
        if (efinal != null) {
            while (values.hasNext()) {
                try {
                    GraphEdge eother =
                        new GraphEdge(values.next().toString());
                    efinal.weight += eother.weight;
                } catch (BadGraphException e) {}
            }
            if (efinal.weight != 0)
                output.collect(new Text(efinal.toString()), new Text(""));
        }
    }
}

Lessons from Sparse Matrix Example
  • Associative Matching is Powerful Communication Primitive
    • Intermediate step in Map/Reduce
  • Similar Strategy Applies to Other Problems
    • Shortest path in graph
    • Database join
  • Many Performance Considerations
    • Kiefer, Volk, Lehner, TU Dresden
    • Should do systematic comparison to other sparse matrix implementations
MapReduce Implementation
  • Built on Top of Parallel File System
    • Google: GFS, Hadoop: HDFS
    • Provides global naming
    • Reliability via replication (typically 3 copies)
  • Breaks work into tasks
    • Master schedules tasks on workers dynamically
    • Typically #tasks >> #processors
  • Net Effect
    • Input: Set of files in reliable file system
    • Output: Set of files in reliable file system
    • Can write program as series of MapReduce steps
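A sketch of that last point, chaining the two matrix-multiply phases above so the files written by phase 1 become the input of phase 2 (old mapred API; intermediate path handling and the type settings are assumptions):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class TwoPhaseDriver {
    public static void main(String[] args) throws Exception {
        Path input = new Path(args[0]);
        Path temp = new Path(args[1]);    // intermediate files live in the reliable file system
        Path output = new Path(args[2]);

        JobConf phase1 = new JobConf(TwoPhaseDriver.class);
        phase1.setJobName("mm-phase1");
        phase1.setMapOutputKeyClass(IntWritable.class);  // P1Mapper emits <IntWritable, Text>
        phase1.setMapOutputValueClass(Text.class);
        phase1.setOutputKeyClass(Text.class);            // P1Reducer emits <Text, Text>
        phase1.setOutputValueClass(Text.class);
        phase1.setMapperClass(P1Mapper.class);
        phase1.setReducerClass(P1Reducer.class);
        FileInputFormat.setInputPaths(phase1, input);
        FileOutputFormat.setOutputPath(phase1, temp);
        JobClient.runJob(phase1);                        // wait for phase 1 to finish

        JobConf phase2 = new JobConf(TwoPhaseDriver.class);
        phase2.setJobName("mm-phase2");
        phase2.setOutputKeyClass(Text.class);            // P2Mapper and P2Reducer both emit <Text, Text>
        phase2.setOutputValueClass(Text.class);
        phase2.setMapperClass(P2Mapper.class);
        phase2.setReducerClass(P2Reducer.class);
        FileInputFormat.setInputPaths(phase2, temp);
        FileOutputFormat.setOutputPath(phase2, output);
        JobClient.runJob(phase2);
    }
}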
Mapping
  • Parameters
    • M: Number of mappers
      • Each gets ~1/M of the input data
    • R: Number of reducers
      • Each reducer i gets keys k such that hash(k) mod R = i (see the partitioner sketch after this list)
  • Tasks
    • Split input files into M pieces, 16—64 MB each
    • Scheduler dynamically assigns worker for each “split”
  • Task operation
    • Parse “split”
    • Generate key, value pairs & write to R different local disk files
      • Based on hash of keys
    • Notify master of output file locations
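To make the hash(k) mod R rule concrete, here is a partitioner sketch equivalent to Hadoop's default hash partitioning under the old mapred API (shown for illustration only; jobs get this behavior without writing it):

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Partitioner;

/* Reducer i receives every key whose hash maps to i. */
public class ModuloPartitioner implements Partitioner<Text, Text> {
    public int getPartition(Text key, Text value, int numReduceTasks) {
        // Mask the sign bit so the result is a valid partition index in 0..R-1
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public void configure(JobConf job) { /* no configuration needed */ }
}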
Reducing
  • Shuffle
    • Each reducer fetches its share of key, value pairs from each mapper using RPC
    • Sort data according to keys
      • Use disk-based (“external”) sort if too much data for memory
  • Reduce Operation
    • Step through key-value pairs in sorted order
    • For each unique key, call reduce function for all values
    • Append result to output file
  • Result
    • R output files
    • Typically supply to next round of MapReduce
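A standalone sketch of the grouping step just described: with pairs already sorted by key, a single pass calls reduce once per distinct key (the in-memory lists are only for illustration; the real framework streams values from sorted spill files):

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class GroupBySortedKey {
    public static void main(String[] args) {
        // Key-value pairs as they look after the shuffle and sort (already key-ordered)
        String[][] sorted = {{"and","1"},{"and","1"},{"come","1"},{"come","2"},{"see","1"}};
        int i = 0;
        while (i < sorted.length) {
            String key = sorted[i][0];
            List<String> values = new ArrayList<String>();
            while (i < sorted.length && sorted[i][0].equals(key)) {
                values.add(sorted[i][1]);    // gather the run of records with this key
                i++;
            }
            reduce(key, values.iterator());  // one reduce call per unique key
        }
    }

    static void reduce(String key, Iterator<String> values) {
        int sum = 0;
        while (values.hasNext()) sum += Integer.parseInt(values.next());
        System.out.println(key + "\t" + sum);   // append result to the output
    }
}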
Example Parameters
  • Sort Benchmark
    • 10^10 100-byte records
    • Partition into M = 15,000 64MB pieces
      • Key = value
      • Partition according to most significant bytes
    • Sort locally with R = 4,000 reducers
  • Machine
    • 1,800 2 GHz Xeons
    • Each with 2 160 GB IDE disks
    • Gigabit Ethernet
    • 891 seconds total
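For scale: 10^10 records × 100 bytes is about 1 TB of input, consistent with the 15,000 pieces of 64 MB each (roughly 0.96 TB), and finishing in 891 seconds corresponds to an aggregate throughput of roughly 1 GB/s across the cluster.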
Interesting Features
  • Fault Tolerance
    • Assume reliable file system
    • Detect failed worker
      • Heartbeat mechanism
    • Reschedule failed tasks
  • Stragglers
    • Tasks that take a long time to execute
    • Might be bug, flaky hardware, or poor partitioning
    • When done with most tasks, reschedule any remaining executing tasks
      • Keep track of redundant executions
      • Significantly reduces overall run time
Generalizing Map/Reduce
  • Microsoft Dryad Project
  • Computational Model
    • Acyclic graph of operators
      • But expressed as textual program
    • Each takes collection of objects and produces objects
      • Purely functional model
  • Implementation Concepts
    • Objects stored in files or memory
    • Any object may be lost; any operator may fail
    • Replicate & recompute for fault tolerance
    • Dynamic scheduling
      • # Operators >> # Processors

[Figure: inputs x1, x2, x3, ..., xn flow through successive layers of operators Op1, Op2, ..., Opk]

Conclusions
  • Distributed Systems Concepts Lead to Scalable Machines
    • Loosely coupled execution model
    • Lowers cost of procurement & operation
  • Map/Reduce Gaining Widespread Use
    • Hadoop makes it widely available
    • Great for some applications, good enough for many others
  • Lots of Work to be Done
    • Richer set of programming models and implementations
    • Expanding range of applicability
      • Problems that are data and compute intensive
      • The future of supercomputing?