What Is Hadoop | Hadoop Tutorial For Beginners | Edureka

( Hadoop Training: https://www.edureka.co/hadoop )
This Edureka "What is Hadoop" tutorial ( Hadoop Blog series: https://goo.gl/LFesy8 ) helps you understand how Big Data emerged as a problem and how Hadoop solved that problem. The tutorial discusses the Hadoop architecture, HDFS and its architecture, YARN, and MapReduce in detail. The topics covered in this tutorial are:

1) 5 V's of Big Data
2) Problems with Big Data
3) Hadoop as a solution
4) What is Hadoop?
5) HDFS
6) YARN
7) MapReduce
8) Hadoop Ecosystem

EdurekaIN

Presentation Transcript


  1. Agenda
     1) 5 V's of Big Data
     2) Problems with Big Data
     3) Hadoop as a solution
     4) What is Hadoop?
     5) HDFS
     6) YARN
     7) MapReduce
     8) Hadoop Ecosystem
     EDUREKA HADOOP CERTIFICATION TRAINING www.edureka.co/big-data-and-hadoop

  2. 5 V's of Big Data

  3. 5 V's of Big Data
     • Volume: the amount of data being generated keeps growing at an accelerating pace
     • Variety: different kinds of data are being generated from various sources
     • Velocity: data is being generated at an alarming rate
     • Veracity: uncertainty and inconsistencies in the data
     • Value: a mechanism to bring the correct meaning out of the data

  4. Problems with Big Data Processing

  5. Problems with Big Data
     • Storing huge and exponentially growing datasets, which calls for highly scalable storage
     • Processing data that has a complex structure (structured, unstructured, semi-structured)
     • Bringing a huge amount of data to the computation unit becomes a bottleneck

  6. Hadoop emerged as a solution to these Big Data problems. What is Hadoop?

  7. Hadoop
     Hadoop is a framework that allows us to store and process large data sets in a parallel and distributed fashion.
     • HDFS (Storage): allows dumping any kind of data across the cluster
     • MapReduce (Processing): allows parallel processing of the data stored in HDFS

  8. Hadoop
     Problem 1: Storing exponentially growing huge datasets
     • HDFS, the storage unit of Hadoop, is a distributed file system
     Problem 2: Storing unstructured data
     • HDFS allows storing any kind of data, be it structured, semi-structured or unstructured
     Problem 3: Processing data faster
     • Provides parallel processing of the data present in HDFS
     • Allows data to be processed locally, i.e. each node works on the part of the data that is stored on it

  9. Hadoop Distributed File System (HDFS)

  10. HDFS
      • Storage unit of Hadoop
      • Distributed file system
      • Divides files (input data) into smaller chunks and stores them across the cluster
      • Horizontal scaling as per requirement
      • Stores any kind of data
      • No schema validation is done while dumping data
      [Figure: one master node (NameNode) and three slave nodes (DataNodes)]

  11. HDFS Block
      • HDFS stores data in the form of blocks
      • The block size can be configured based on requirements
      Note: The default block size is 128 MB
      [Figure: file.xml is split into four 128 MB HDFS blocks as it moves into the HDFS cluster]
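
To make the block arithmetic on this slide concrete, here is a minimal sketch of how a file's size maps onto blocks. The function name `split_into_blocks` and the 380 MB example file are illustrative assumptions, not part of the presentation or of any Hadoop API; only the 128 MB default comes from the slide.

```python
BLOCK_SIZE_MB = 128  # default block size stated on the slide

def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the sizes (in MB) of the blocks a file of this size would occupy."""
    if file_size_mb <= 0:
        return []
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    sizes = [block_size_mb] * full_blocks
    if remainder:
        # The last block only holds the leftover data, not a full 128 MB.
        sizes.append(remainder)
    return sizes

print(split_into_blocks(512))  # [128, 128, 128, 128]
print(split_into_blocks(380))  # [128, 128, 124]
```

The 512 MB case matches the four 128 MB blocks shown for file.xml; the 380 MB case shows that the final block is simply smaller when the file size is not a multiple of the block size.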

  12. NameNode
      • Master daemon
      • Maintains and manages the DataNodes
      • Records metadata, e.g. location of the stored blocks, size of the files, permissions, hierarchy, etc.
      • Receives heartbeats and block reports from all the DataNodes
      [Figure: NameNode with a Secondary NameNode and three DataNodes]
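
The idea that the NameNode learns block locations from DataNode block reports can be sketched with a toy in-memory model. Everything here (the `ToyNameNode` class, its method names, the `dn1`/`blk_1` identifiers) is hypothetical and only illustrates the bookkeeping; it is not how Hadoop exposes or implements this.

```python
from collections import defaultdict

class ToyNameNode:
    """Toy stand-in for the NameNode's block map; not the real Hadoop API."""

    def __init__(self):
        # block id -> set of DataNode names currently holding that block
        self.block_locations = defaultdict(set)

    def receive_block_report(self, datanode, block_ids):
        # Forget what this DataNode reported earlier, then record its current blocks.
        for holders in self.block_locations.values():
            holders.discard(datanode)
        for block_id in block_ids:
            self.block_locations[block_id].add(datanode)

    def locate(self, block_id):
        # Sorted list so the answer is deterministic.
        return sorted(self.block_locations.get(block_id, set()))

nn = ToyNameNode()
nn.receive_block_report("dn1", ["blk_1", "blk_2"])
nn.receive_block_report("dn2", ["blk_2"])
print(nn.locate("blk_2"))  # ['dn1', 'dn2']
```

The point of the sketch is that the DataNodes hold the actual data while the master only keeps a map of where each block lives.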

  13. DataNode
      • Slave daemons
      • Store the actual data
      • Serve read and write requests
      [Figure: NameNode with a Secondary NameNode and three DataNodes]

  14. Secondary NameNode
      • Checkpointing is the process of combining the edit logs with the FsImage
      • Allows faster failover because a backup of the metadata is available
      • Checkpointing happens periodically (default: 1 hour)
      [Figure: NameNode with a Secondary NameNode and three DataNodes]

  15. Hadoop Distributed File System
      [Figure: checkpointing. The Secondary NameNode takes a first-time copy of the editLog and fsImage from the NameNode and combines them into FsImage (final); during the checkpoint, changes go to a temporary editLog (new).]
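
Checkpointing, as described on the Secondary NameNode slide, amounts to replaying the edit log onto the last FsImage snapshot. The following is a simplified sketch under stated assumptions: the namespace is modeled as a plain dictionary of path to size, and the `create`/`delete` operations and the `apply_edits` name are invented for illustration. The real FsImage and edit log formats are far richer.

```python
def apply_edits(fsimage, edit_log):
    """Replay edit-log entries onto a copy of the namespace snapshot."""
    merged = dict(fsimage)  # leave the old snapshot untouched
    for op, path, size in edit_log:
        if op == "create":
            merged[path] = size
        elif op == "delete":
            merged.pop(path, None)
    return merged

# Hypothetical state before the checkpoint.
fsimage = {"/data/a.txt": 100}
edit_log = [("create", "/data/b.txt", 250), ("delete", "/data/a.txt", None)]

# Checkpoint: merge the edits into a new FsImage and start a fresh edit log.
fsimage = apply_edits(fsimage, edit_log)
edit_log = []
print(fsimage)  # {'/data/b.txt': 250}
```

Because the merged snapshot already contains every logged change, a restart only has to load the snapshot plus the short new edit log, which is why periodic checkpoints speed up recovery.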

  16. YARN (Yet Another Resource Negotiator)

  17. YARN
      ResourceManager
      • Receives the processing requests
      • Passes the parts of the requests to the corresponding NodeManagers
      NodeManagers
      • Installed on every DataNode
      • Responsible for the execution of tasks on every single DataNode
      [Figure: one ResourceManager and three NodeManagers]

  18. YARN Architecture
      • ResourceManager has two components: Scheduler and ApplicationsManager
      • NodeManager has two components: ApplicationMaster and Container
      [Figure: Client, ResourceManager (App Manager), and three NodeManagers, each with an App Master and a container; arrows labeled Node Status, Resource Request, and MapReduce Status]

  19. YARN Architecture
      ApplicationsManager
      • Accepts the job submission
      • Negotiates a container for executing the application-specific ApplicationMaster and monitors the progress
      ApplicationMaster
      • ApplicationMasters are the daemons that reside on the DataNodes
      • Communicates with containers for the execution of tasks on each DataNode
      [Figure: ResourceManager (App Manager) and a NodeManager with an App Master and a container; arrows labeled Node Status and Resource Request]

  20. Hadoop Architecture Bigger Picture

  21. Hadoop Architecture
      [Figure: HDFS side with the NameNode, Secondary NameNode, and DataNodes; YARN side with the ResourceManager, JobHistory Server, and NodeManagers. Each worker machine pairs a DataNode with a NodeManager that hosts App Masters and containers.]

  22. MapReduce

  23. MapReduce
      MapReduce is a software framework that helps in writing applications that process large data sets using distributed and parallel algorithms inside the Hadoop environment.

  24. MapReduce Job Workflow
      • Input: "IND, ENG, AUS, NZ", "NZ, ENG, AUS, IND", "AUS, IND, SL, NZ"
      • Splitting: [IND ENG AUS NZ], [NZ ENG AUS IND], [AUS IND SL NZ]
      • Mapping: split 1 gives (IND, 1), (ENG, 1), (AUS, 1), (NZ, 1); split 2 gives (NZ, 1), (ENG, 1), (AUS, 1), (IND, 1); split 3 gives (AUS, 1), (IND, 1), (SL, 1), (NZ, 1)
      • Shuffling: IND, (1,1,1); ENG, (1,1); AUS, (1,1,1); NZ, (1,1,1); SL, (1)
      • Reducing: IND, 3; ENG, 2; AUS, 3; NZ, 3; SL, 1
      • Final result: IND, 3; ENG, 2; AUS, 3; NZ, 3; SL, 1
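
The workflow on this slide can be imitated in a few lines of plain Python. This is a single-process sketch of the map, shuffle, and reduce stages, not Hadoop's MapReduce API; the function names are illustrative assumptions, and the three input splits are the ones shown on the slide.

```python
from collections import defaultdict

def map_phase(split):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word, 1) for word in split.split()]

def shuffle_phase(mapped_pairs):
    """Group all emitted values by key, as the shuffle stage does."""
    grouped = defaultdict(list)
    for key, value in mapped_pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Sum the grouped values for each key."""
    return {key: sum(values) for key, values in grouped.items()}

splits = ["IND ENG AUS NZ", "NZ ENG AUS IND", "AUS IND SL NZ"]

# In Hadoop each split would go to its own mapper; here they run one after another.
mapped = [pair for split in splits for pair in map_phase(split)]
result = reduce_phase(shuffle_phase(mapped))
print(result)  # {'IND': 3, 'ENG': 2, 'AUS': 3, 'NZ': 3, 'SL': 1}
```

Each call to `map_phase` depends only on its own split, which is what lets the real framework run mappers in parallel on the nodes that already hold the data.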

  25. Hadoop Ecosystem

  26. Hadoop Ecosystem
