MapReduce in Action

MapReduce in Action. Data Mining Research Group (Data Mining Group @ Xiamen University), College of Information Science and Technology. Team 306, led by Chen Lin. Contents: 1. Basic MapReduce Programs. 2. Advanced MapReduce. 3. Beyond the horizon. 4. Discussion.

Presentation Transcript

MapReduce in Action

Data Mining Research Group

Data Mining Group @ Xiamen University

College of Information Science and Technology

Team 306, led by Chen Lin

slide2

Contents

1. Basic MapReduce Programs

2. Advanced MapReduce

3. Beyond the horizon

4. Discussion

Basic MapReduce Programs

[Diagram labels: Job, Configuration, Master, JobTracker]
slide4

Basic MapReduce Programs

Job Configuration?

[Diagram labels: Implement Interface, Java Class, Environment Configuration]

slide5

Interface

  • InputFormat
  • OutputFormat
  • Mapper
  • Reducer
  • Combiner
  • Partitioner

slide6

Configure

  • JVM: mapred.child.java.opts
  • {mapred.local.dir}
  • InputPath
  • OutputPath
  • How many Map/Reduce tasks?

Basic MapReduce Program

[Diagram: InputFormat → Map → Reduce → OutputFormat. Data labels: InputSplit, <K1,V2>, List<K1,V1>, K1,List<V1>, Text]
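The data flow named on this slide (read records, map, group by key, reduce) can be sketched in a few lines of plain Python. This is only a single-process illustration, not Hadoop code: the function names (`map_fn`, `shuffle_and_sort`, `reduce_fn`, `run_word_count`) and the word-count task are assumptions chosen for the example.

```python
from collections import defaultdict

def map_fn(key, line):
    """Map: one input record (key, line) becomes a list of (word, 1) pairs."""
    for word in line.split():
        yield word.lower(), 1

def shuffle_and_sort(pairs):
    """Stand-in for the framework: group values by key, then sort by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def reduce_fn(word, counts):
    """Reduce: one key with its list of values becomes output pairs."""
    yield word, sum(counts)

def run_word_count(records):
    """records: iterable of (key, line) pairs, as an input format would supply."""
    mapped = [pair for key, line in records for pair in map_fn(key, line)]
    return [out
            for word, counts in shuffle_and_sort(mapped)
            for out in reduce_fn(word, counts)]
```

For example, `run_word_count([(0, "a b a"), (6, "b c")])` groups the three distinct words and sums their counts. In a real job the grouping step is done by the execution framework across machines rather than by a local dictionary.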

slide9

PARTITIONERS AND COMBINERS

  • Combiners

an optimization in MapReduce that allows for local aggregation before the shuffle and sort phase

  • Partitioner

determines which reducer will be responsible for processing a particular key; the execution framework uses this information to copy the data to the right location during the shuffle and sort phase
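A minimal sketch of what a partitioner does, written in plain Python rather than against the Hadoop API. The names `partition` and `route` are hypothetical, and the use of CRC32 as the hash is an assumption made so the result is stable between runs; the idea is simply hash-the-key modulo the number of reducers.

```python
import zlib

def partition(key, num_reducers):
    """Return the index of the reducer responsible for this key."""
    # CRC32 is used because Python's built-in hash() of str varies between runs.
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def route(pairs, num_reducers):
    """Place each (key, value) pair in the bucket of its reducer."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)].append((key, value))
    return buckets
```

The property that matters is that every pair with the same key lands in the same bucket, so a single reducer sees all values for that key.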

slide10

Basic MapReduce Program

InputFormat

  • TextInputFormat
  • KeyValueTextInputFormat
  • SequenceFileInputFormat
  • NLineInputFormat
  • Creating a custom InputFormat

slide11

InputFormat

  • TextInputFormat

- Each line in the text files is a record. Key is the byte offset of the line, and value is the content of the line.

  • KeyValueTextInputFormat

- Each line in the text files is a record. The first separator character divides each line. Everything before the separator is the key, and everything after is the value. The separator is set by the key.value.separator.in.input.line property, and the default is the tab (\t) character.

  • NLineInputFormat

- Same as TextInputFormat, but each split is guaranteed to have exactly N lines. The mapred.line.input.format.linespermap property, which defaults to one, sets N.
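The record rules described for these three input formats can be imitated in plain Python. This is a sketch of the behaviour only, not the Hadoop classes: the function names are hypothetical, and treating a line without a separator as "whole line is the key, empty value" is an assumption of this sketch.

```python
def text_records(data: bytes):
    """TextInputFormat-style records: (byte offset of the line, line content)."""
    offset = 0
    for raw in data.splitlines(keepends=True):
        yield offset, raw.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw)

def key_value_records(data: bytes, separator="\t"):
    """KeyValueTextInputFormat-style records: split each line at the FIRST separator."""
    for _, line in text_records(data):
        key, _, value = line.partition(separator)
        yield key, value

def n_line_splits(data: bytes, n=1):
    """NLineInputFormat-style splits: groups of n records (the last may be shorter)."""
    records = list(text_records(data))
    return [records[i:i + n] for i in range(0, len(records), n)]
```

With the input `b"k1\tv1\tx\nk2\tv2\n"`, the second record starts at byte offset 8, and the first key/value record keeps the second tab inside its value because only the first separator divides the line.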

slide12

Basic MapReduce Program

Types for the key/value pairs

slide13

Summary for Basic Program

What's a complete MapReduce job?

  • Code for the mapper, reducer, combiner, and partitioner, along with job configuration parameters
  • The execution framework handles everything else

slide14

Advanced MapReduce

  • Chaining MapReduce jobs
  • Local aggregation
  • Secondary sorting
  • Work on Hadoop files

slide15

Chaining MapReduce jobs

  • So far, you've been doing data processing tasks that a single MapReduce job can accomplish.
  • But as you get more comfortable writing MapReduce programs and take on more ambitious data processing tasks, you'll find that many complex tasks need to be broken down into simpler subtasks, each accomplished by an individual MapReduce job.
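
As an illustration of chaining, the sketch below feeds the output of one job into a second job. It is a plain-Python simulation under stated assumptions: `run_job` is a hypothetical single-process stand-in for the framework, and the two-step task (count words, then group words by their count) is invented for the example.

```python
from collections import defaultdict

def run_job(records, mapper, reducer):
    """Hypothetical stand-in for one MapReduce job: map, group by key, reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for k, v in mapper(key, value):
            groups[k].append(v)
    output = []
    for k in sorted(groups):
        output.extend(reducer(k, groups[k]))
    return output

# Job 1: word count.
def count_mapper(_, line):
    for word in line.split():
        yield word, 1

def sum_reducer(word, counts):
    yield word, sum(counts)

# Job 2: invert (word, count) to (count, word) and collect words per count.
def invert_mapper(word, count):
    yield count, word

def collect_reducer(count, words):
    yield count, sorted(words)

def word_frequency_histogram(lines):
    """Chain the jobs: the output records of job 1 are the input records of job 2."""
    counts = run_job(enumerate(lines), count_mapper, sum_reducer)
    return run_job(counts, invert_mapper, collect_reducer)
```

Neither job alone produces the final answer; the second job only makes sense once the first has finished, which is the sequential dependency that chaining expresses.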
slide16

LOCAL AGGREGATION

  • In Hadoop, intermediate results are written to local disk before being sent over the network.
  • Reductions in the amount of intermediate data translate into increases in algorithmic efficiency.
  • With the use of a combiner, it is possible to substantially reduce both the number and size of key-value pairs that need to be shuffled from the mappers to the reducers.
slide19

LOCAL AGGREGATION

  • 1. Combiners must have the same input and output key-value types, and these must match the mapper's output types
  • 2. Combiners are optimizations and must not change the correctness of the algorithm

Hadoop makes no guarantees on how many times combiners are called; it could be zero, one, or multiple times
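
Because the combiner may run zero, one, or several times, the final result has to be the same in every case. The sketch below checks that for a summing combiner in plain Python; `combine`, `reduce_counts`, and `run` are hypothetical names, and the per-mapper output lists are made-up sample data.

```python
from collections import defaultdict

def combine(pairs):
    """Local aggregation: (word, count) pairs in, (word, count) pairs out."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return sorted(totals.items())

def reduce_counts(pairs):
    """Final reduction; for a sum it is the same operation as the combiner."""
    return dict(combine(pairs))

def run(map_outputs, combiner_passes):
    """Apply the combiner `combiner_passes` times per mapper, then reduce.

    Returns (final counts, number of pairs that had to be shuffled).
    """
    shuffled = []
    for pairs in map_outputs:
        for _ in range(combiner_passes):
            pairs = combine(pairs)
        shuffled.extend(pairs)
    return reduce_counts(shuffled), len(shuffled)
```

Summation works because it is associative and commutative, and because the combiner's output has the same type as its input. An operation such as a mean would not be safe to use directly as a combiner, since a mean of partial means is generally not the overall mean.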

slide21

SECONDARY SORTING

  • Sometimes we also need to sort by value
  • (k1; m1; v8)
  • (k1; m2; v1)
  • (k1; m3; v7)
  • …
  • (k2; m1; v2)
  • (k2; m2; v6)
  • (k2; m3; v9)
  • Basic form: k1 → (m1; v8)
  • Value-to-key conversion: (k1; m1) → (v8)
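
The value-to-key conversion can be sketched as follows for (k; m; v) tuples like the ones listed on this slide. This is a plain-Python illustration, not Hadoop code; `secondary_sort` is a hypothetical name, and the in-memory sort stands in for the framework's sort phase.

```python
from itertools import groupby

def secondary_sort(records):
    """records: iterable of (k, m, v). Returns {k: [(m, v), ...]} ordered by m."""
    # Value-to-key conversion: move m out of the value into a composite key (k, m).
    composite = [((k, m), v) for k, m, v in records]
    # The framework sorts by the whole composite key, so m is ordered within each k.
    composite.sort(key=lambda pair: pair[0])
    # Grouping (and, in a real job, partitioning) looks only at the natural key k.
    result = {}
    for k, group in groupby(composite, key=lambda pair: pair[0][0]):
        result[k] = [(key[1], v) for key, v in group]
    return result
```

The reducer then receives the values for each k already ordered by m, without having to buffer and sort them in memory itself.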
slide22

Beyond the horizon

  • It's a shame:
  • the remaining topics play an important role in MapReduce, but they are beyond my horizon.
  • So I need all your help to master them together.
slide23

Beyond the horizon

  • Create a custom InputFormat
  • Create a custom Partitioner
  • Manipulate local files
  • Streaming for other languages
  • Pipes for C++

slide24

Beyond the horizon

  • Joining data from different sources
  • Hive
  • Pig
  • Multiple file output
  • HBase

slide25

Joining data from different sources

Customers file: CSV format

record fields: (Customer ID, Name, and Phone Number)

Orders file: CSV format

record fields: (Customer ID, Order ID, Price, and Purchase Date)

slide26

Joining data from different sources

Customers:

Joey Leung,555-555-55
Edward,123-456-7890
Jose Madriz,281-330-8004
David Stork,408-555-0000
…

Orders:

A,12.95,02-Jun-2008
B,88.25,20-May-2008
C,32.00,30-Nov-2007
D,25.02,22-Jan-2009

Joined output:

Joey Leung,555-555-5555,B,88.25,20-May-2008
Edward,123-456-7890,C,32.00,30-Nov-2007
Jose Madriz,281-330-8004,A,12.95,02-Jun-2008
Jose Madriz,281-330-8004,D,25.02,22-Jan-2009
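
One common way to do such a join is a reduce-side join: the mapper tags each record with its source and emits the join key, and the reducer pairs up customer and order records that share a key. The sketch below simulates that in plain Python. The records on the slide do not show the Customer ID column, so the IDs 1 to 4 used here are hypothetical, as are the function names; nothing here is the Hadoop API.

```python
from collections import defaultdict

# Sample records with HYPOTHETICAL customer IDs prepended (not shown on the slide).
CUSTOMERS = [
    "1,Joey Leung,555-555-5555",
    "2,Edward,123-456-7890",
    "3,Jose Madriz,281-330-8004",
    "4,David Stork,408-555-0000",
]
ORDERS = [
    "3,A,12.95,02-Jun-2008",
    "1,B,88.25,20-May-2008",
    "2,C,32.00,30-Nov-2007",
    "3,D,25.02,22-Jan-2009",
]

def tag_mapper(source, line):
    """Map: emit (join key, (source tag, rest of the record))."""
    join_key, rest = line.split(",", 1)
    return join_key, (source, rest)

def join_reducer(join_key, tagged):
    """Reduce: cross every customer record with every order record for this key."""
    customers = [rest for source, rest in tagged if source == "customers"]
    orders = [rest for source, rest in tagged if source == "orders"]
    return [f"{join_key},{c},{o}" for c in customers for o in orders]

def reduce_side_join(customers, orders):
    groups = defaultdict(list)
    for source, lines in (("customers", customers), ("orders", orders)):
        for line in lines:
            key, tagged = tag_mapper(source, line)
            groups[key].append(tagged)
    joined = []
    for key in sorted(groups):
        joined.extend(join_reducer(key, groups[key]))
    return joined
```

Calling `reduce_side_join(CUSTOMERS, ORDERS)` yields one line per matching order. This sketch behaves as an inner join: a customer with no orders, such as the fourth one here, produces no output.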

slide27

Data Mining Research Group

Data Mining Group @ Xiamen University

Thank you!