Introduction to Apache Hadoop

A short presentation introducing Apache Hadoop: what it is, what it can do, and which other products are associated with it.

semtechs

Presentation Transcript


  1. Apache Hadoop
     • What is it?
     • Architecture
     • Related Projects
     • Large users

  2. Hadoop – What is it?
     • An open source system developed using Java
     • Supports very large data sets
     • Supports large clusters of servers
     • Designed to run on pre-existing, low-cost hardware
     • Allows work to be split up and spread over the cluster
     • Allows storage to be split up and spread over the cluster
     • Provides resilience via automatic failure handling
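The storage and resilience points above can be illustrated with a toy sketch. It is not Hadoop code and does not reproduce Hadoop's actual placement policy: the function names, the round-robin placement, and the tiny block size and replication values are all assumptions chosen for illustration. It shows how splitting data into blocks and keeping each block on more than one node lets the data survive a node failure.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut the data into fixed-size blocks (the last block may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks: int, nodes: list[str], replication: int) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes.

    Hypothetical round-robin placement, used here only for illustration.
    """
    if replication > len(nodes):
        raise ValueError("replication cannot exceed the number of nodes")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


def readable_blocks(placement: dict[int, list[str]], failed: set[str]) -> set[int]:
    """Return the blocks that still have at least one replica on a live node."""
    return {
        block
        for block, replicas in placement.items()
        if any(node not in failed for node in replicas)
    }


blocks = split_into_blocks(b"x" * 10, block_size=4)
placement = place_blocks(len(blocks), ["n1", "n2", "n3", "n4"], replication=2)
```

With two replicas per block, losing any single node leaves every block readable; losing both nodes that hold the same block does not.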

  3. Hadoop – Architecture
     Hadoop consists of:
     • Hadoop Common – common utilities that support the other Hadoop modules
     • Hadoop MapReduce – parallel processing of Hadoop data
     • Hadoop YARN – scheduler and resource manager
     • Hadoop Distributed File System (HDFS) – a master/slave file system that spreads the Hadoop data over a very large cluster of slave data nodes controlled by a single name node
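As a rough illustration of the MapReduce idea, here is a minimal single-process word count in plain Python. It does not use any Hadoop API, and the function names are hypothetical; in a real cluster the map and reduce steps would run in parallel on many nodes. The sketch only shows the three logical phases: map each input line to key/value pairs, group the pairs by key, then reduce each group to a result.

```python
from collections import defaultdict


def map_phase(line: str) -> list[tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs: list[tuple[str, int]]) -> dict[str, list[int]]:
    """Group all emitted values by their key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: list[str]) -> dict[str, int]:
    """Run map, shuffle and reduce in sequence over the input lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["Hadoop stores data", "Hadoop processes data"])
```

Because each line is mapped independently and each key is reduced independently, both phases can be spread over many machines, which is what makes the model suit large clusters.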

  4. Hadoop – Related Projects

  5. Hadoop – Related Projects
     • Pig – for analysing large data sets
     • Hive – data warehouse system for Hadoop
     • Mahout – machine learning and data mining
     • Avro – a data serialization system
     • ZooKeeper – helps build distributed applications
     • Chukwa – data collection and analysis

  6. Hadoop – Related Projects
     • Hue – Hadoop user interface
     • Oozie – workflow scheduler
     • Hama – bulk synchronous parallel framework for massive scientific computations
     • Nutch – web crawler
     • HBase – non-relational database

  7. Hadoop – Large Users
     • Yahoo
       • 10,000-core Linux cluster
     • Facebook
       • 100 petabytes, growing at 0.5 petabytes a day
     • Amazon
       • It's possible to run Hadoop on Amazon's EC2 and S3

  8. Contact Us
     • Feel free to contact us at
       • www.semtech-solutions.co.nz
       • info@semtech-solutions.co.nz
     • We offer IT project consultancy
     • We are happy to hear about your problems
     • You pay only for the hours you need to solve your problems
