CPS216: Advanced Database Systems (Data-intensive Computing Systems)
Download
1 / 15

Shivnath Babu - PowerPoint PPT Presentation


  • 154 Views
  • Uploaded on

CPS216: Advanced Database Systems (Data-intensive Computing Systems) Introduction to MapReduce and Hadoop. Shivnath Babu. Word Count over a Given Set of Web Pages. see 1 bob 1 throw 1 see 1 spot 1 run 1. bob 1 run 1 see 2 spot 1 throw 1. see bob throw.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about ' Shivnath Babu' - ofira


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

CPS216: Advanced Database Systems (Data-intensive Computing Systems)Introduction to MapReduce and Hadoop

Shivnath Babu


Word count over a given set of web pages
Word Count over a Given Set of Web Pages Systems)

  • see 1

  • bob 1

  • throw 1

  • see 1

  • spot 1

  • run 1

  • bob 1

  • run 1

  • see 2

  • spot 1

  • throw 1

  • see bob throw

  • see spot run

Can we do word count in parallel?



Automatic parallel execution in mapreduce google
Automatic Parallel Execution in MapReduce (Google) Systems)

Handles failures automatically, e.g., restarts tasks if a node fails; runs multiples copies of the same task to avoid a slow task slowing down the whole job





Data flow in a mapreduce program in hadoop
Data Flow in a MapReduce Program in Hadoop Systems)

 1:many

  • InputFormat

  • Map function

  • Partitioner

  • Sorting & Merging

  • Combiner

  • Shuffling

  • Merging

  • Reduce function

  • OutputFormat


Lifecycle of a mapreduce job

Map function Systems)

Reduce function

Run this program as a

MapReduce job

Lifecycle of a MapReduce Job


Lifecycle of a mapreduce job1

Map function Systems)

Reduce function

Run this program as a

MapReduce job

Lifecycle of a MapReduce Job


Lifecycle of a MapReduce Job Systems)

Time

Reduce

Wave 1

Input

Splits

Reduce

Wave 2

Map

Wave 1

Map

Wave 2

How are the number of splits, number of map and reduce

tasks, memory allocation to tasks, etc., determined?


Job configuration parameters

190+ parameters in Hadoop Systems)

Set manually or defaults are used

Job Configuration Parameters




ad