Hadoop Introduction

DataFlair's Big Data Hadoop Tutorial PPT for Beginners takes you through various concepts of Hadoop. This Hadoop tutorial PPT covers:
  1. Introduction to Hadoop
  2. What is Hadoop
  3. Hadoop History
  4. Why Hadoop
  5. Hadoop Nodes
  6. Hadoop Architecture
  7. Hadoop data flow
  8. Hadoop components – HDFS, MapReduce, YARN
  9. Hadoop Daemons
  10. Hadoop characteristics & features
Related Blogs: Hadoop Introduction – A Comprehensive Guide: https://goo.gl/QadBS4
Wish to learn Hadoop and carve out your career in Big Data? Contact us: info@data-flair.training, +91-7718877477, +91-9111133369, or visit our website https://data-flair.training/

PowerPoint Slideshow about 'Hadoop Introduction' - PritamPal


Presentation Transcript
Agenda
  • Introduction to Hadoop
  • Hadoop nodes & daemons
  • Hadoop Architecture
  • Characteristics
  • Hadoop Features
What is Hadoop?

The technology that powers Yahoo, Facebook, Twitter, Walmart and others

An open source framework that allows distributed processing of large data sets across clusters of commodity hardware

Open Source

  • Source code is freely available
  • It may be redistributed and modified
Distributed Processing

  • Data is processed in a distributed manner on multiple nodes / servers
  • Multiple machines process the data independently
Cluster

  • Multiple machines connected together
  • Nodes are connected via LAN
Commodity Hardware

  • Economic / affordable machines
  • Typically low-performance hardware
What is Hadoop?
  • Open source framework written in Java
  • Inspired by Google's MapReduce programming model and its file system (GFS)
Hadoop History

[Timeline figure, 2002–2009. Names that appeared only as logos on the slide are missing from the transcript, and the year of each event is not recoverable from it.]
  • Doug Cutting started working on [...]
  • [...] published GFS & MapReduce papers
  • Doug Cutting added DFS & MapReduce in [...]
  • Development of [...] started as Lucene sub-project
  • [...] converted 4TB of image archives over 100 EC2 instances
  • Hadoop became top-level project
  • Hadoop defeated supercomputer
  • [...] launched Hive, SQL support for Hadoop
  • Doug Cutting joined Cloudera
Hadoop Nodes
  • Master Node
  • Slave Node

Hadoop Daemons
  • Master Node: Resource Manager, NameNode
  • Slave Node: Node Manager, DataNode
Basic Hadoop Architecture

[Figure: the user submits work to the master(s), which divide it into sub-work units and distribute them across 100 slaves.]
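The master/slave split on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not Hadoop code: the function names (`split_work`, `do_sub_work`, `run_job`) are made up for the example, threads on one machine stand in for slave nodes, and the "work" is simply summing a list of numbers.

```python
from concurrent.futures import ThreadPoolExecutor


def split_work(work, num_slaves):
    """Master side: cut the work into roughly equal sub-work units."""
    size = max(1, -(-len(work) // num_slaves))  # ceiling division, at least 1
    return [work[i:i + size] for i in range(0, len(work), size)]


def do_sub_work(sub_work):
    """Slave side: process one unit independently (here, just sum it)."""
    return sum(sub_work)


def run_job(work, num_slaves=4):
    """Split the work, run the units in parallel, combine the partial results."""
    sub_works = split_work(work, num_slaves)
    with ThreadPoolExecutor(max_workers=num_slaves) as pool:
        partials = list(pool.map(do_sub_work, sub_works))
    return sum(partials)


if __name__ == "__main__":
    print(run_job(list(range(1, 101))))
```

Each sub-work unit is processed without reference to the others, which is what lets the real framework spread the units over many machines.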

Hadoop Characteristics
  • Distributed Processing
  • Open Source
  • Fault Tolerance
  • Easy to use
  • Reliability
  • Economic
  • High Availability
  • Scalability

Open Source
  • Source code is freely available
  • Can be redistributed
  • Can be modified
  • Benefits shown on the slide: free, transparent, affordable, inter-operable, no vendor lock-in, community

Distributed Processing
  • Data is processed in a distributed manner on the cluster
  • Multiple nodes in the cluster process data independently

[Figure: centralized processing vs. distributed processing]

Fault Tolerance
  • Node failures are recovered automatically
  • The framework takes care of hardware failures as well as task failures
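One way to picture automatic recovery is a scheduler that reruns a failed task on another node. The sketch below is a simplified illustration under that assumption, not Hadoop's actual scheduler; `NodeFailure`, `make_task` and `run_task` are hypothetical names invented for the example.

```python
class NodeFailure(Exception):
    """Raised when the node chosen for a task turns out to be down."""


def make_task(dead_nodes):
    """Build a toy task that fails on any node listed in dead_nodes."""
    def task(node):
        if node in dead_nodes:
            raise NodeFailure(node)
        return f"done on {node}"
    return task


def run_task(task, nodes):
    """Try the task on each node in turn until one attempt succeeds.

    Returns the task result and the list of nodes that failed on the way.
    """
    failed = []
    for node in nodes:
        try:
            return task(node), failed
        except NodeFailure:
            failed.append(node)  # remember the failure and move on
    raise RuntimeError("task failed on every node")


if __name__ == "__main__":
    result, failed = run_task(make_task({"node1"}), ["node1", "node2", "node3"])
    print(result, "| failed nodes:", failed)
```

The caller never handles the failure itself; the retry loop hides it, which is the point the slide makes about the framework.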
Reliability
  • Data is reliably stored on the cluster of machines despite machine failures
  • Failure of nodes doesn’t cause data loss
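HDFS achieves this by keeping several copies (replicas) of each data block on different nodes, three by default. The sketch below illustrates the effect with a simple round-robin placement, which is an assumption made for brevity; real HDFS placement is rack-aware. The names `place_replicas` and `readable_blocks` are hypothetical.

```python
def place_replicas(blocks, nodes, replication=3):
    """Place each block on `replication` distinct nodes, round-robin.

    Assumes replication <= len(nodes). Returns {node: set of blocks}.
    """
    placement = {node: set() for node in nodes}
    for i, block in enumerate(blocks):
        for r in range(replication):
            placement[nodes[(i + r) % len(nodes)]].add(block)
    return placement


def readable_blocks(placement, failed_nodes):
    """Blocks that still have at least one replica on a live node."""
    alive = [blocks for node, blocks in placement.items()
             if node not in failed_nodes]
    return set().union(*alive)


if __name__ == "__main__":
    nodes = ["n0", "n1", "n2", "n3"]
    blocks = ["b0", "b1", "b2", "b3", "b4"]
    placement = place_replicas(blocks, nodes)
    # With three replicas per block, losing any two nodes loses no data.
    print(readable_blocks(placement, {"n0", "n1"}) == set(blocks))
```

With a replication factor of three, data is lost only if all three nodes holding a block fail before a new copy is made.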
High Availability
  • Data is highly available and accessible despite hardware failure
  • End-user applications see no downtime caused by unavailable data

Scalability
  • Vertical Scalability – New hardware can be added to the nodes
  • Horizontal Scalability – New nodes can be added on the fly
Economic
  • No need to purchase costly licenses
  • No need to purchase costly hardware

Economic = Open Source + Commodity Hardware

Easy to Use
  • Distributed computing challenges are handled by the framework
  • The client just needs to concentrate on business logic
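The division of labour can be illustrated with a toy word count in the MapReduce style. This is a single-process sketch, not the Hadoop API: `run_mapreduce` is a hypothetical stand-in for the framework, while `word_mapper` and `count_reducer` are the only parts a client would write.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Stand-in for the framework: map every record, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


# --- business logic supplied by the client ---

def word_mapper(line):
    """Emit (word, 1) for every word in a line of text."""
    for word in line.lower().split():
        yield word, 1


def count_reducer(word, counts):
    """Add up all the 1s emitted for a word."""
    return sum(counts)


if __name__ == "__main__":
    lines = ["Hadoop stores data", "Hadoop processes data"]
    print(run_mapreduce(lines, word_mapper, count_reducer))
```

In a real cluster the grouping step also involves moving data between machines and recovering from failures, none of which appears in the two client functions.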
Data Locality
  • Move computation to data instead of data to computation
  • Data is processed on the nodes where it is stored

[Figure: traditional model, where data moves from storage servers to app servers, vs. Hadoop, where the algorithm moves to the servers that hold the data]
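A scheduler that honours data locality can be sketched as follows: for each block, prefer a free node that already stores it, and fall back to any free node only when no local one is available. The function name `schedule` and the data layout are assumptions made for this example; Hadoop's real scheduler is considerably more involved.

```python
def schedule(block_locations, free_nodes):
    """Assign each block's task to a node, preferring nodes that hold the block.

    block_locations: {block: [nodes storing a copy of the block]}
    free_nodes: set of nodes able to run a task
    Returns {block: (chosen node, True if the data is local to that node)}.
    """
    if not free_nodes:
        raise ValueError("no free nodes to schedule on")
    assignment = {}
    for block, holders in block_locations.items():
        local = [node for node in holders if node in free_nodes]
        if local:
            # Computation moves to the data: no block transfer needed.
            assignment[block] = (local[0], True)
        else:
            # Fallback: the data has to be read over the network.
            assignment[block] = (sorted(free_nodes)[0], False)
    return assignment


if __name__ == "__main__":
    locations = {"b0": ["n1", "n2"], "b1": ["n3"]}
    print(schedule(locations, {"n2", "n4"}))
```

In this toy model a node can take any number of tasks; the point is only the preference order between local and remote nodes.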

Summary
  • Every day we generate 2.3 trillion GBs of data
  • Hadoop handles huge volumes of data efficiently
  • Hadoop uses the power of distributed computing
  • HDFS & YARN are two main components of Hadoop
  • It is highly fault tolerant, reliable & available