Dan Bassett, Jonathan Canfield December 13, 2011

Presentation Transcript
What is Hadoop?
  • Allows for the distributed processing of large data sets across clusters of computers
  • Open-source project written in Java
  • Actively supported
  • Inspired by Google's published MapReduce and Google File System (GFS) work
What’s the big deal?
  • Changes the economics and dynamics of large scale computing
  • Scalable
  • Cost effective
  • Flexible
  • Fault Tolerant
Commercially supported
  • IBM InfoSphere BigInsights
  • Silicon Graphics CloudRack
  • EMC Greenplum
  • Google App Engine
  • Oracle Big Data Appliance
  • Cloudera CDH, Professional Services
  • Microsoft Windows Server, SQL Server
Prominent Users
  • Facebook - claims to have the largest Hadoop cluster in the world at 30PB.
  • Yahoo! - claims to have the world’s largest Hadoop production application.
  • eBay - 5.3 PB across a 532-node cluster
  • New York Times - processed 4 TB of image data into 11 million PDFs at a cost of about $240
Architecture
  • Hadoop Common
  • Hadoop Distributed File System (HDFS)
  • MapReduce Engine
File System (HDFS)
  • One big file system from many nodes
  • Fault-tolerant
  • Runs on low-cost commodity hardware
MapReduce Engine
  • Splits input data
  • Assigns work to nodes
  • Processes the pieces in parallel
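
The three steps above can be sketched as a word count, the usual introductory MapReduce example. This is a single-machine simulation in Python, not the Hadoop MapReduce API: local threads stand in for cluster nodes, and all function names here are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_input(lines: list, num_splits: int) -> list:
    """Step 1: divide the input into one chunk per worker."""
    return [lines[i::num_splits] for i in range(num_splits)]


def map_phase(chunk: list) -> list:
    """Emit a (word, 1) pair for every word in the chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(mapped_chunks: list) -> dict:
    """Group the emitted values by key across all map outputs."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups: dict) -> dict:
    """Combine the values for each key into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: list, num_workers: int = 3) -> dict:
    chunks = split_input(lines, num_workers)
    # Steps 2 and 3: hand each chunk to a worker and map in parallel.
    # Threads are a stand-in for the nodes of a real cluster.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    print(word_count(["a b a", "b c", "a"]))
```

Each map call touches only its own chunk, so the chunks can be processed independently; that independence is what lets a real engine spread the work over many nodes.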
Resources
  • Project Home: http://hadoop.apache.org/
  • Wikipedia: http://en.wikipedia.org/wiki/Apache_Hadoop
  • IBM: http://www-01.ibm.com/software/data/infosphere/hadoop/