
Hadoop

Hadoop. Carson Gallimore, Chris Zingraf, Jonathan Light. Contents: Hadoop Overview, MapReduce, HDFS, History, Architecture, Applications. What is Hadoop? An open-source software project used to distribute the processing of large data sets over clusters of servers.


Presentation Transcript


  1. Hadoop Carson Gallimore, Chris Zingraf, Jonathan Light

  2. Contents • Hadoop Overview • MapReduce • HDFS • History • Architecture • Applications

  3. What is Hadoop? • Open-source software project • Used to distribute the processing of large data sets across clusters of servers. • The software is resilient because it detects and handles failures at the application layer instead of relying on the hardware for high availability. http://tinyurl.com/m33wgcw

  4. Overview • The Hadoop ecosystem includes many related Apache projects (e.g. Pig, Hive, ZooKeeper) • Hadoop itself relies mainly on MapReduce and HDFS (Hadoop Distributed File System) • MapReduce is a framework that assigns work to the nodes in a cluster • HDFS is a file system that spans all of the nodes in the cluster and stores the data. http://www.ibmbigdatahub.com/sites/default/files/public_images/hadoop.jpg

  5. MapReduce • “MapReduce is the heart of Hadoop. It is this programming paradigm that allows for massive scalability across hundreds or thousands of servers in a Hadoop cluster”. http://www-01.ibm.com/software/data/infosphere/hadoop/mapreduce/ http://people.apache.org/~rdonkin/hadoop-talk/diagrams/map-reduce.png
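A minimal, single-process Python sketch of the classic word-count job makes the paradigm concrete. It only imitates the three stages (map, shuffle/group-by-key, reduce) inside one process; the function names are hypothetical, and this is not Hadoop's own Java API, where a real job would extend the `Mapper` and `Reducer` classes and run its tasks across many nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group all intermediate values by key, in key order."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # In Hadoop, map tasks would run in parallel on different nodes;
    # here they simply run one after another in a single loop.
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The fox"]
    print(word_count(docs))
    # {'brown': 1, 'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 3}
```

Because each `map_phase` call only looks at its own line, and each `reduce_phase` call only at one key's values, both stages can be spread across many servers, which is where the scalability in the quote comes from.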

  6. Example: http://www-01.ibm.com/software/ebusiness/jstart/graphics/hadoopDiagram.png

  7. HDFS • HDFS breaks the data in the cluster into blocks and distributes them throughout the cluster. • This helps with scalability: because the data is split up, the map and reduce functions can work in parallel on smaller subsets of a large data set. • Hadoop is designed to run on large clusters of commodity servers with inexpensive internal disk drives.
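The block-splitting idea can be sketched in a few lines of Python. The function names are hypothetical, and the 8-byte block size is only for illustration (HDFS's default `dfs.blocksize` is 128 MB in Hadoop 2 and later). The per-block function stands in for a map task: each block is processed independently, and the partial results are combined afterwards.

```python
from typing import List


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Cut a file's bytes into fixed-size blocks; the last block may be shorter."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def count_newlines_per_block(blocks: List[bytes]) -> List[int]:
    """Stand-in for map tasks: each block is handled on its own.

    Counting a single byte value is safe even when a line is cut
    across a block boundary, so the partial counts can simply be summed.
    """
    return [block.count(b"\n") for block in blocks]


if __name__ == "__main__":
    file_bytes = b"line one\nline two\nline three\n"  # 29 bytes
    blocks = split_into_blocks(file_bytes, block_size=8)  # toy size, not HDFS's
    print(len(blocks))  # 4
    print(sum(count_newlines_per_block(blocks)))  # 3
```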

  8. HDFS, Cont. • More machines mean a potentially higher failure rate • Hadoop was developed with high failure rates in mind • Hadoop, including HDFS, has built-in fault-tolerance and compensation capabilities.

  9. HDFS, Cont. • The data is divided into blocks, and copies of these blocks are made. • The copies are then stored on other servers throughout the cluster. • This way, if a server in the cluster fails, the file can still be rebuilt from the copies of its blocks held on the remaining servers.
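The replicate-then-recover idea can be sketched as follows, assuming a toy cluster modelled as a Python dictionary. Each block is copied onto three distinct nodes (HDFS's default `dfs.replication` is 3), and the file is rebuilt from whichever copies survive. The round-robin placement and all names here are simplifications for illustration; real HDFS uses a rack-aware placement policy.

```python
from typing import Dict, List, Set

# Hypothetical model of the cluster: node name -> {block id -> block bytes}
Storage = Dict[str, Dict[int, bytes]]


def store_with_replication(blocks: List[bytes], nodes: List[str], replication: int = 3) -> Storage:
    """Copy each block onto `replication` distinct nodes, chosen round-robin.

    Simplification: real HDFS uses rack-aware placement, not round-robin.
    """
    if not 1 <= replication <= len(nodes):
        raise ValueError("replication must be between 1 and the number of nodes")
    storage: Storage = {node: {} for node in nodes}
    for block_id, data in enumerate(blocks):
        for r in range(replication):
            node = nodes[(block_id + r) % len(nodes)]
            storage[node][block_id] = data
    return storage


def read_file(num_blocks: int, storage: Storage, failed: Set[str]) -> bytes:
    """Reassemble the file from any surviving copy of each block."""
    parts = []
    for block_id in range(num_blocks):
        copy = next(
            (held[block_id] for node, held in storage.items()
             if node not in failed and block_id in held),
            None,
        )
        if copy is None:
            raise RuntimeError(f"block {block_id} lost: every copy was on a failed node")
        parts.append(copy)
    return b"".join(parts)


if __name__ == "__main__":
    blocks = [b"Hello, ", b"HDFS ", b"world!"]
    cluster = ["node1", "node2", "node3", "node4"]
    storage = store_with_replication(blocks, cluster, replication=3)
    # Two of the four servers fail, yet every block keeps at least one copy.
    print(read_file(len(blocks), storage, failed={"node1", "node2"}))
```

With three copies spread over four nodes, any two failures still leave one copy of every block; losing three nodes can make a block unrecoverable, which is why the replication factor is a trade-off between disk usage and resilience.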

  10. History • The underlying technology was invented by Google to index the rich textual and structural information it was collecting. • Designed to solve large-data problems involving a mixture of structured and complex data.

  11. History, Cont. • Uses a MapReduce engine and HDFS • Written in Java • Continuously developed and used by a global community of contributors.

  12. Architecture • Designed to run on many machines that do not share memory or disks. • The software breaks data into pieces and spreads them across all the machines. • To process those pieces in parallel, Hadoop implements MapReduce.

  13. Architecture, Cont. • Hadoop keeps track of where all the data resides and keeps copies in case of a server failure. • There are many different ways to customize Hadoop to fit specific needs.
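A hypothetical, much-simplified sketch of that bookkeeping follows. In HDFS the NameNode records which DataNodes hold each block and schedules new copies when a node is lost. The `BlockMap` class below is an illustration only, not an HDFS API; it assumes the default replication factor of 3 and copies under-replicated blocks to any live node that does not already hold them.

```python
from typing import Dict, List, Set


class BlockMap:
    """Hypothetical stand-in for NameNode metadata: block id -> nodes holding a copy."""

    def __init__(self, nodes: List[str], replication: int = 3) -> None:
        self.live_nodes: List[str] = list(nodes)
        self.replication = replication
        self.locations: Dict[str, Set[str]] = {}

    def add_block(self, block_id: str, holders: Set[str]) -> None:
        """Record which nodes currently store a copy of the block."""
        self.locations[block_id] = set(holders)

    def where_is(self, block_id: str) -> Set[str]:
        """Answer 'where does this block reside?' (empty set if unknown)."""
        return set(self.locations.get(block_id, set()))

    def node_failed(self, node: str) -> None:
        """Forget the failed node, then re-copy under-replicated blocks."""
        if node in self.live_nodes:
            self.live_nodes.remove(node)
        for holders in self.locations.values():
            holders.discard(node)
            # Live nodes that do not already hold this block.
            candidates = [n for n in self.live_nodes if n not in holders]
            # A surviving copy is needed as the source for any new copy.
            while holders and len(holders) < self.replication and candidates:
                holders.add(candidates.pop(0))


if __name__ == "__main__":
    block_map = BlockMap(["n1", "n2", "n3", "n4"], replication=3)
    block_map.add_block("blk_1", {"n1", "n2", "n3"})  # illustrative block id
    block_map.node_failed("n2")
    print(sorted(block_map.where_is("blk_1")))  # ['n1', 'n3', 'n4']
```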

  14. Applications • Hadoop can be applied in many markets, including: - risk analysis for financial corporations - product suggestions for online retail

  15. References • Turner, James. January 12, 2011. Hadoop: what it is, how it works, and what it can do. <http://strata.oreilly.com/2011/01/what-is-hadoop.html> • Wikipedia. September 18, 2013. Apache Hadoop. <http://en.wikipedia.org/wiki/Hadoop>

  16. References cont. • IBM. What is Hadoop? <http://www-01.ibm.com/software/data/infosphere/hadoop/> • IBM. What is MapReduce? <http://www-01.ibm.com/software/data/infosphere/hadoop/mapreduce/> • IBM. What is HDFS? <http://www-01.ibm.com/software/data/infosphere/hadoop/hdfs/>

  17. Questions?
