
MapReduce Tutorial | What is MapReduce | Hadoop MapReduce Tutorial | Edureka

This Edureka MapReduce Tutorial (MapReduce Tutorial blog: https://goo.gl/W0Rmtd) will help you understand the basic concepts of Hadoop's processing component, MapReduce. Below are the topics covered in this MapReduce Tutorial:

1) What is Hadoop MapReduce?
2) MapReduce In Nutshell
3) Advantages of MapReduce
4) Hadoop MapReduce Approach with an Example
5) Hadoop MapReduce/YARN Components
6) YARN With MapReduce
7) Yarn Application Workflow
8) Running a MapReduce Program

Check our complete Hadoop playlist here: https://goo.gl/ExJdZs

EdurekaIN

Presentation Transcript


  1. EDUREKA HADOOP CERTIFICATION TRAINING www.edureka.co/big-data-and-hadoop

  2. Agenda for today’s Session
      1. What is Hadoop MapReduce?
      2. MapReduce In Nutshell
      3. Advantages of MapReduce
      4. Hadoop MapReduce Approach with an Example
      5. Hadoop MapReduce/YARN Components
      6. YARN With MapReduce
      7. Yarn Application Workflow
      8. MapReduce Program with Hands On

  3. Hadoop Components
      2 main Hadoop Components:
      - Storage
      - Processing

  4. MapReduce: Data Processing Using Programming
      - Hadoop MapReduce is the processing component of Apache Hadoop
      - It processes data in parallel in a distributed environment
      (Diagram labels: Big Data, Result)

  5. MapReduce In Nutshell
      (Mind map centred on MapReduce)
      - Function: Map, Reduce
      - Used in: Index and Search, Classification, Recommendation, Analytics
      - Implemented: Google, Apache Hadoop
      - For: HDFS, Pig, Hive, HBase
      - Features: A Program Model, Large Scale Distributed Model, Parallel Programming
      - Design Pattern: Summarization (Eg: Inverted Index), Classification (Eg: Top N records), Recommendation (Eg: Sort), Analytics (Eg: Join, Selection)

  6. 2 Biggest Advantages of MapReduce

  7. Advantage 1: Parallel Processing
      - Data is processed in parallel
      - Processing becomes fast
      (Diagram labels: Data, Master, Slave A, Slave B, Slave C, Slave D, Slave E)

  8. Advantage 2: Data Locality - Processing to Storage
      - Moving data to the processing is very costly
      - In MapReduce, we move the processing to the data
      (Diagram labels: Data, Master, Slave A, Slave B, Slave C, Slave D, Slave E)

  9. Traditional vs MapReduce Way

  10. Election Votes Counting
      Election Votes Casting
      - Votes are stored at different booths
      - Result Centre has the details of all the booths
      (Diagram labels: Data, Result Centre, Booth A, Booth B, Booth C, Booth D, Booth E)

  11. Election Votes Counting – Traditional Way
      Counting – Traditional Approach
      - Votes are moved to Result Centre for counting
      - Moving all the votes to Centre is costly
      - Result Centre is over-burdened
      - Counting takes time
      (Diagram labels: Data, Result Centre, Booth A, Booth B, Booth C, Booth D, Booth E)

  12. Hadoop MapReduce To the Rescue!
      - Hadoop MapReduce Doesn’t Follow This Approach
      (Diagram labels: Data, Result Centre, Booth A, Booth B, Booth C, Booth D, Booth E)

  13. Election Votes Counting – MapReduce Way
      Counting – MapReduce Approach
      - Votes are counted at individual booths
      - Booth-wise results are sent back to the Result Centre
      - The final result is declared easily and quickly this way
      (Diagram labels: Votes, Result Centre, Booth A, Booth B, Booth C, Booth D, Booth E)
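
The booth analogy above can be sketched in a few lines of Python. This is only an illustration, not Hadoop code: the ballot data, the candidate names X, Y and Z, and the helper names `count_at_booth` and `declare_result` are all made up for the example. Each booth tallies its own ballots locally, and the result centre only adds up the small per-booth tallies.

```python
from collections import Counter

# Hypothetical ballots held at each booth (illustrative data only).
booths = {
    "Booth A": ["X", "Y", "X"],
    "Booth B": ["Y", "Y"],
    "Booth C": ["X", "X"],
    "Booth D": ["Z", "X"],
    "Booth E": ["Y", "Z", "Z"],
}


def count_at_booth(ballots):
    """Local step: a booth counts its own ballots where they are stored."""
    return Counter(ballots)


def declare_result(booth_tallies):
    """Central step: the result centre only merges the booth-wise tallies."""
    total = Counter()
    for tally in booth_tallies:
        total.update(tally)
    return total


booth_tallies = [count_at_booth(ballots) for ballots in booths.values()]
final_result = declare_result(booth_tallies)

# Traditional way: every ballot travels to the centre.
ballots_moved_traditional = sum(len(ballots) for ballots in booths.values())
# MapReduce way: only one (candidate, count) entry per candidate per booth travels.
tally_entries_moved = sum(len(tally) for tally in booth_tallies)

print(final_result)
print(ballots_moved_traditional, "ballots vs", tally_entries_moved, "tally entries")
```

With a handful of ballots the difference in what has to be moved is small; the gap grows with the number of ballots per booth, because a booth's tally never has more entries than there are candidates.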

  14. MapReduce In Detail

  15. MapReduce Way

  16. Anatomy of a MapReduce Program (Key, Value)
      Map: (K1, V1) -> List (K2, V2)
      Reduce: (K2, list (V2)) -> List (K3, V3)
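
To make the key/value signatures above concrete, here is a minimal in-memory sketch in Python. It is not the Hadoop API: `run_mapreduce`, `index_mapper` and `index_reducer` are hypothetical names, the two documents are invented, and the grouping step stands in for what the framework's shuffle does between map and reduce. The example builds a small inverted index, where K1 is a document id, V1 is its text, K2 and K3 are words, V2 is a document id, and V3 is a list of document ids.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Apply mapper to (K1, V1) pairs, group by K2, then apply reducer."""
    # Map: each (K1, V1) record yields a list of (K2, V2) pairs.
    intermediate = []
    for k1, v1 in records:
        intermediate.extend(mapper(k1, v1))

    # Shuffle: collect all V2 values that share the same K2.
    groups = defaultdict(list)
    for k2, v2 in intermediate:
        groups[k2].append(v2)

    # Reduce: each (K2, list(V2)) yields a list of (K3, V3) pairs.
    output = []
    for k2 in sorted(groups):
        output.extend(reducer(k2, groups[k2]))
    return output


def index_mapper(doc_id, text):
    """(K1=doc id, V1=text) -> list of (K2=word, V2=doc id)."""
    return [(word, doc_id) for word in text.lower().split()]


def index_reducer(word, doc_ids):
    """(K2=word, list of doc ids) -> list of (K3=word, V3=sorted unique doc ids)."""
    return [(word, sorted(set(doc_ids)))]


# Hypothetical input documents.
docs = [("d1", "hadoop stores data"), ("d2", "mapreduce processes data")]
inverted_index = run_mapreduce(docs, index_mapper, index_reducer)
print(inverted_index)
```

Only the mapper and reducer are specific to the problem; the driver function is the same for any job that fits the (K1, V1) to (K3, V3) shape.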

  17. Let us take an example to understand MapReduce Way

  18. MapReduce Way – Word Count Process
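
The slide itself survives only as a title, so the following Python sketch is an assumption about what a word count process typically looks like, with made-up input text and hypothetical stage names (splitting, mapping, shuffling, reducing). It runs in a single process and only mimics the stages; a real Hadoop job would run the map and reduce steps on different nodes.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical input; each line plays the role of one input split.
text = "Deer Bear River\nCar Car River\nDeer Car Bear"

# Splitting: divide the input into independent pieces.
splits = text.splitlines()

# Mapping: every split emits a (word, 1) pair for each word it contains.
mapped = [[(word, 1) for word in split.split()] for split in splits]

# Shuffling: bring all pairs with the same word together.
all_pairs = sorted((pair for part in mapped for pair in part), key=itemgetter(0))
shuffled = {
    word: [count for _, count in pairs]
    for word, pairs in groupby(all_pairs, key=itemgetter(0))
}

# Reducing: sum the list of ones for each word.
word_counts = {word: sum(counts) for word, counts in shuffled.items()}

print(shuffled)
print(word_counts)
```

The intermediate `shuffled` dictionary is worth printing once: it shows that the reducer never sees individual lines, only a word and the list of values emitted for it.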

  19. Executing a MapReduce Program

  20. MapReduce Using Yarn

  21. YARN – Moving beyond MapReduce
      - OTHER (Search) (Weave..)
      - BATCH (MapReduce)
      - INTERACTIVE (Tez)
      - ONLINE (HBase)
      - STREAMING (Storm, S4, …)
      - IN-MEMORY (Spark)
      - HPC MPI (OpenMPI)
      - GRAPH (Giraph)

  22. Hadoop 2.x Daemons

  23. Hadoop 2.x MapReduce Yarn Components
      - Job History Server
          » Maintains information about submitted MapReduce jobs after their ApplicationMaster terminates
      - Client
          » Submits a MapReduce Job
      - ApplicationMaster
          » One per application
          » Short life
          » Coordinates and Manages MapReduce Jobs
          » Negotiates with Resource Manager to schedule tasks
          » The tasks are started by NodeManager(s)
      - Resource Manager
          » Cluster Level resource manager
          » Long Life, High Quality Hardware
      - Node Manager
          » One per Data Node
          » Monitors resources on Data Node
      - Container
          » Created by NM when requested
          » Allocates certain amount of resources (memory, CPU etc.) on a slave node

  24. YARN Application Workflow in MapReduce

  25. YARN Workflow
      (Diagram: Resource Manager containing the Scheduler and the Applications Manager (AsM); Node Managers hosting App Master 1, App Master 2, and Containers 1.1, 1.2, 2.1, 2.2, 2.3)

  26-33. Application Workflow
      Execution Sequence (one step is added per slide; participants: Client, RM, NM, AM)
      1. Client submits an application
      2. RM allocates a container to start AM
      3. AM registers with RM
      4. AM requests containers from RM
      5. AM notifies NM to launch containers
      6. Application code is executed in container
      7. Client contacts RM/AM to monitor application’s status
      8. AM unregisters with RM
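
The eight-step execution sequence can be traced with a toy simulation. Everything in the sketch below is hypothetical: the `ResourceManager` and `NodeManager` classes, the `run_application` function and the container ids are simplified stand-ins written for this illustration and do not correspond to the real YARN client or ApplicationMaster APIs. The point is only to show which component acts at each step.

```python
class ResourceManager:
    """Toy stand-in for the cluster-level RM: hands out container ids."""

    def __init__(self):
        self.registered_ams = set()
        self._next_id = 0

    def allocate_container(self):
        self._next_id += 1
        return f"container_{self._next_id}"


class NodeManager:
    """Toy stand-in for a per-node NM: runs a task inside a container."""

    def launch(self, container_id, task):
        return task(container_id)


def run_application(app_name, tasks):
    """Walk through the eight steps and record them as log lines."""
    events = []
    rm, nm = ResourceManager(), NodeManager()

    events.append("1. Client submits an application")
    am_container = rm.allocate_container()
    events.append(f"2. RM allocates {am_container} to start AM")
    rm.registered_ams.add(app_name)
    events.append("3. AM registers with RM")
    containers = [rm.allocate_container() for _ in tasks]
    events.append(f"4. AM requests {len(containers)} containers from RM")
    events.append("5. AM notifies NM to launch containers")
    results = [nm.launch(cid, task) for cid, task in zip(containers, tasks)]
    events.append("6. Application code is executed in containers")
    events.append("7. Client contacts RM/AM to monitor application status")
    rm.registered_ams.discard(app_name)
    events.append("8. AM unregisters with RM")
    return events, results, rm


events, results, rm = run_application(
    "wordcount",
    [
        lambda cid: f"map task ran in {cid}",
        lambda cid: f"reduce task ran in {cid}",
    ],
)
print("\n".join(events))
print(results)
```

Note that the first container goes to the ApplicationMaster itself, which is why the task containers in this sketch start at `container_2`.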

  34. Learning Resources
      - Hadoop Tutorial: www.edureka.co/blog/hadoop-tutorial
      - MapReduce Tutorial: www.edureka.co/blog/mapreduce-tutorial
      - MapReduce Interview Questions: www.edureka.co/blog/interview-questions/hadoop-interview-questions-mapreduce

  35. Thank You … Questions/Queries/Feedback
