This week we explore Hadoop's architecture, focusing on the Hadoop Distributed File System (HDFS). HDFS follows a master-slave model, with a single namenode managing metadata and many datanodes storing data. Files are split into 64MB blocks, and each block is replicated across nodes for fault tolerance. We also examine the MapReduce framework: how jobs are managed by a JobTracker and executed by TaskTrackers. This session covers essential concepts such as data flow, mapper and reducer roles, and the shuffle and sort phases that are vital for mastering Big Data analytics.
HDFS: Hadoop Distributed File System • HDFS is a master-slave architecture • Master: namenode • Slave: datanode (100s or 1000s of nodes) • Single namenode and many datanodes • Namenode maintains the file system metadata • Files are split into fixed-size blocks and stored on datanodes (default 64MB) • Data blocks are replicated for fault tolerance and fast access (default replication is 3)
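To make the block-size and replication settings concrete, here is a minimal Java sketch (not from the original slides) that writes a file through the HDFS FileSystem API with an explicit replication factor and block size. The path, buffer size, and file contents are hypothetical; it assumes a running cluster whose configuration files are on the classpath.

    // Minimal sketch: write an HDFS file with explicit replication and block size.
    // The path /user/demo/sample.txt and buffer size 4096 are hypothetical.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsWriteExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();      // reads core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);          // handle to HDFS, via the namenode

        Path file = new Path("/user/demo/sample.txt"); // hypothetical path
        short replication = 3;                         // default replication factor
        long blockSize = 64L * 1024 * 1024;            // default 64MB block size

        // create(path, overwrite, bufferSize, replication, blockSize)
        FSDataOutputStream out = fs.create(file, true, 4096, replication, blockSize);
        out.writeUTF("hello hdfs");                    // data is split into blocks as it grows
        out.close();
        fs.close();
      }
    }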
HDFS Architecture • Default placement policy: where to put a given block? • First copy is written to the node creating the file (write affinity) • Second copy is written to a datanode within the same rack • Third copy is written to a datanode in a different rack • Objectives: load balancing, fast access, fault tolerance
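The placement decisions are made by the namenode, but a client can ask where each block's replicas actually landed. A minimal sketch, assuming the hypothetical file written above exists:

    // Minimal sketch: list the datanodes holding each block of a file.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockLocationExample {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FileStatus status = fs.getFileStatus(new Path("/user/demo/sample.txt"));

        // One BlockLocation per block; each lists the hosts holding a replica,
        // so with the default policy you should see three hosts per block.
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
          System.out.println("offset " + block.getOffset()
              + " hosts: " + String.join(", ", block.getHosts()));
        }
        fs.close();
      }
    }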
MapReduce: Hadoop Execution Layer • JobTracker knows everything about submitted jobs • Divides jobs into tasks and decides where to run each task • Continuously communicates with TaskTrackers • TaskTrackers execute tasks (multiple per node) • Monitor the execution of each task • Continuously send feedback to the JobTracker • MapReduce is a master-slave architecture • Master: JobTracker • Slaves: TaskTrackers (100s or 1000s of TaskTrackers) • Every datanode runs a TaskTracker
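As an illustration of this division of labor, here is a minimal job-submission sketch using the classic MRv1 API, in which the client hands the job to the JobTracker for scheduling on TaskTrackers. WordCountMapper and WordCountReducer are assumed helper classes (a sketch of them follows the next slide), and the input/output paths come from the command line.

    // Minimal sketch: submit a WordCount job through the classic MRv1 API.
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class SubmitJob {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(SubmitJob.class);
        conf.setJobName("wordcount");

        conf.setOutputKeyClass(Text.class);           // key type the job emits
        conf.setOutputValueClass(IntWritable.class);  // value type the job emits

        conf.setMapperClass(WordCountMapper.class);   // assumed class, sketched below
        conf.setReducerClass(WordCountReducer.class); // assumed class, sketched below

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        // JobClient hands the job to the JobTracker, which splits it into map
        // and reduce tasks and schedules them on TaskTrackers.
        JobClient.runJob(conf);
      }
    }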
Hadoop Computing Model • Mappers and Reducers consume and produce (key, value) pairs • Users define the data types of the key and value • Shuffling and sorting phase • Map output is shuffled so that all records with the same key go to the same reducer • Each reducer may receive records for multiple keys • Each reducer sorts its records to group identical keys, then processes each group
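The canonical example of this model is WordCount, sketched below with the classic MRv1 API: the mapper emits a (word, 1) pair per token, the shuffle routes all pairs with the same key to one reducer, and each reducer receives the values for a key already grouped and sums them.

    // Minimal WordCount sketch illustrating the (key, value) model.
    import java.io.IOException;
    import java.util.Iterator;
    import java.util.StringTokenizer;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    class WordCountMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {
      private static final IntWritable ONE = new IntWritable(1);
      private final Text word = new Text();

      public void map(LongWritable key, Text value,
          OutputCollector<Text, IntWritable> output, Reporter reporter)
          throws IOException {
        StringTokenizer tokens = new StringTokenizer(value.toString());
        while (tokens.hasMoreTokens()) {
          word.set(tokens.nextToken());
          output.collect(word, ONE);        // emit (word, 1)
        }
      }
    }

    class WordCountReducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {
      public void reduce(Text key, Iterator<IntWritable> values,
          OutputCollector<Text, IntWritable> output, Reporter reporter)
          throws IOException {
        int sum = 0;
        while (values.hasNext()) {          // all values for this key, grouped by shuffle/sort
          sum += values.next().get();
        }
        output.collect(key, new IntWritable(sum));
      }
    }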
Online Documentation • Windows: http://ebiquity.umbc.edu/Tutorials/Hadoop/00%20-%20Intro.html • Mac: http://hadoop.apache.org/docs/stable/single_node_setup.html