60 likes | 76 Views
Apache Hadoop is an open-source system developed in Java, supporting large datasets and clusters of servers on low-cost hardware. It allows work and storage fragmentation over clusters and provides resilience through automatic failure handling. The architecture includes Hadoop Common utilities, MapReduce for parallel processing, Yarn for scheduling, and Hadoop Distributed File System for data storage across clusters.
E N D
Apache Hadoop • What is it ? • Architecture • Related Projects • Large users
Hadoop – What is it ? • An open source system developed using Java • Supports very large data sets • Supports large clusters of servers • Designed to run on pre existing low cost hardware • Allows for fragmentation of work over cluster • Allows for fragmentation of storage over cluster • Provides resiliance via automatic failure handling
Hadoop - Architecture Hadoop consists of • Hadoop Common Common utilities for Hadoop module support • Hadoop MapReduce Parallel processing of Hadoop data • Hadoop Yarn Scheduler and resource manager • Hadoop Distributed File System (HDFS) A Master/Slave file system which spreads the Hadoop data over a very large cluster of slave data nodes controlled by a single name node.
Interesting, right? This is just a sneak preview of the full presentation. We hope you like it! To see the rest of it, just click here to view it in full on PowerShow.com. Then, if you’d like, you can also log in to PowerShow.com to download the entire presentation for free.