Introduction to Apache Hadoop HDFS - PowerPoint PPT Presentation

semtechs
Presentation Transcript

  1. Apache Hadoop HDFS • What is it? • What is it for? • Architecture • Resilience • Administration • Data access • Future changes?

  2. HDFS – What is it? • HDFS = Hadoop Distributed File System • It is a distributed file system • Runs on low-cost hardware • It is open source • Written in Java • Fault tolerant • Designed for very large data sets • Tuned for high throughput

  3. HDFS – What is it for? • Designed for batch processing • Streaming access to data • Large data sizes, e.g. terabytes • Highly reliable using data replication • Supports very large node clusters • Supports large files • Supports millions of files

  4. HDFS – Architecture

  5. HDFS – Architecture • Has a master / slave architecture • A master NameNode • Controls file system operations • Maps data blocks to DataNodes • Logs all changes • Slave DataNodes • Store file blocks • Store replicated data
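The split of duties on this slide can be illustrated with a toy model. This is a minimal sketch, not HDFS code: `ToyNameNode`, `BLOCK_SIZE` and the round-robin placement are assumptions made for illustration (real HDFS uses much larger blocks and a more elaborate placement policy). It shows the NameNode holding only metadata, mapping each block to several DataNodes and logging every change.

```python
BLOCK_SIZE = 4  # bytes; deliberately tiny for the demo, real HDFS blocks are far larger


class ToyNameNode:
    """Hypothetical master: keeps metadata only, never the file data itself."""

    def __init__(self, datanodes, replication=3):
        self.datanodes = list(datanodes)
        # Cannot place more distinct replicas than there are DataNodes.
        self.replication = min(replication, len(self.datanodes))
        self.block_map = {}  # (path, block index) -> list of DataNode names
        self.edit_log = []   # every change is appended here
        self._next = 0       # round-robin cursor (assumed placement policy)

    def add_file(self, path, data):
        """Split data into blocks and record where each replica should live."""
        blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
        for index in range(len(blocks)):
            targets = [
                self.datanodes[(self._next + r) % len(self.datanodes)]
                for r in range(self.replication)
            ]
            self._next += 1
            self.block_map[(path, index)] = targets
            self.edit_log.append(("ADD_BLOCK", path, index, tuple(targets)))
        return blocks


namenode = ToyNameNode(["dn1", "dn2", "dn3", "dn4"], replication=3)
blocks = namenode.add_file("/logs/a.txt", b"0123456789")
for key, nodes in namenode.block_map.items():
    print(key, "->", nodes)
```

In this sketch the DataNodes are just names; in a real cluster they would hold the block contents while the NameNode keeps only the map and the log.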

  6. HDFS – Resilience • Data is replicated across DataNodes • Nodes may fail but data is still available • DataNodes report their state via heartbeats • Single point of failure in master NameNode • Data integrity via checksums
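Two of the resilience mechanisms above, heartbeats and checksums, can be sketched in a few lines. This is an illustrative model under stated assumptions, not the HDFS implementation: `HeartbeatMonitor`, `store_block`, `read_block` and the 30-second timeout are hypothetical, and `zlib.crc32` stands in for whatever checksum a real cluster is configured to use.

```python
import zlib

HEARTBEAT_TIMEOUT = 30.0  # seconds; hypothetical value chosen for the sketch


class HeartbeatMonitor:
    """Hypothetical master-side view of which DataNodes are still reporting."""

    def __init__(self, timeout=HEARTBEAT_TIMEOUT):
        self.timeout = timeout
        self.last_seen = {}  # node name -> time of last heartbeat

    def heartbeat(self, node, now):
        self.last_seen[node] = now

    def dead_nodes(self, now):
        """Nodes whose last heartbeat is older than the timeout."""
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout)


def store_block(data):
    """Return the block together with a checksum computed at write time."""
    return data, zlib.crc32(data)


def read_block(data, checksum):
    """Verify the block on read; a mismatch means this replica is corrupt."""
    if zlib.crc32(data) != checksum:
        raise IOError("checksum mismatch, try another replica")
    return data


monitor = HeartbeatMonitor()
monitor.heartbeat("dn1", now=0.0)
monitor.heartbeat("dn2", now=25.0)
print(monitor.dead_nodes(now=40.0))  # dn1 has been silent for longer than the timeout

block, checksum = store_block(b"hello")
print(read_block(block, checksum))
```

Because every block has replicas on other DataNodes, a dead node or a failed checksum need not lose data: the reader can fall back to another copy.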

  7. HDFS – Administration • Access via Java API • FS shell command language • HTTP browser • C wrapper for Java API • Space reclamation • Via control of replication factor • Deleted files sent to trash folder • Trash folder cleaned after configurable time
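The two space-reclamation ideas on this slide come down to simple bookkeeping. The sketch below is a toy model with hypothetical names (`raw_bytes`, `ToyTrash`, a retention period in seconds); it does not call any Hadoop API. It shows how lowering the replication factor frees raw disk space, and how a trash folder holds deleted files until a configurable retention time has passed.

```python
def raw_bytes(logical_bytes, replication):
    """Raw disk space consumed when every block is stored `replication` times."""
    return logical_bytes * replication


class ToyTrash:
    """Hypothetical trash folder: deleted paths are kept until they expire."""

    def __init__(self, retention_seconds):
        self.retention = retention_seconds
        self.entries = {}  # path -> time at which it was deleted

    def delete(self, path, now):
        self.entries[path] = now

    def restore(self, path):
        """Return True if the path was still in the trash and has been restored."""
        return self.entries.pop(path, None) is not None

    def purge(self, now):
        """Permanently drop entries older than the retention period."""
        expired = sorted(p for p, t in self.entries.items() if now - t >= self.retention)
        for p in expired:
            del self.entries[p]
        return expired


# Dropping replication from 3 to 2 frees one full copy of the data.
print(raw_bytes(100, 3) - raw_bytes(100, 2))

trash = ToyTrash(retention_seconds=60)
trash.delete("/tmp/a", now=0)
trash.delete("/tmp/b", now=50)
purged = trash.purge(now=60)
print(purged)
```

Until `purge` runs past the retention time, a deleted file can still be restored, which is the safety net the trash folder provides.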

  8. HDFS – Future changes • Things they might consider for HDFS: • File append • User quotas • File links • Standby nodes

  9. Other Areas • Want to know about ? • Big Data • Nutch • Solr • see my other presentations

  10. Contact Us • Feel free to contact us at • www.semtech-solutions.co.nz • info@semtech-solutions.co.nz • We offer IT project consultancy • We are happy to hear about your problems • You can just pay for those hours that you need • To solve your problems