excelonlineclasses.co.nr/ excel.onlineclasses@gmail

http://www.excelonlineclasses.co.nr/ excel.onlineclasses@gmail.com. Excel Online Classes offers the following services: Online Training, Development, Testing, Job Support, Technical Guidance, Job Consultancy, and any needs of the IT sector. Nagarjuna K. HDFS.

Presentation Transcript


  1. http://www.excelonlineclasses.co.nr/ excel.onlineclasses@gmail.com

  2. Excel Online Classes offers the following services: • Online Training • Development • Testing • Job Support • Technical Guidance • Job Consultancy • Any needs of the IT sector

  3. Nagarjuna K HDFS

  4. HDFS • Distributed file system designed to run on commodity hardware • Provides high-throughput access to application data; suitable for applications with large datasets

  5. Assumptions & Goals • Hardware failure • Streaming data access • Large datasets • Simple coherency model • Moving computation is cheaper than moving data

  6. Assumptions & Goals: Hardware Failure • An HDFS instance spans many machines, each storing part of the data • The chance that some machine goes down cannot be avoided • Detecting faults and recovering from them automatically is a core architectural goal of HDFS
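The claim that failures cannot be avoided follows from simple arithmetic. The sketch below assumes that machines fail independently and with the same probability; the 0.1% daily failure rate and the function name `prob_any_failure` are illustrative assumptions, not figures from the slides.

```python
def prob_any_failure(machines: int, p_fail: float) -> float:
    """Probability that at least one of `machines` fails,
    assuming independent failures with probability p_fail each."""
    return 1.0 - (1.0 - p_fail) ** machines


# Hypothetical per-machine daily failure probability of 0.1%.
for n in (1, 100, 1000, 5000):
    print(f"{n:>5} machines: {prob_any_failure(n, 0.001):.1%}")
```

Even with a small per-machine probability, the chance of at least one failure grows quickly with cluster size, which is why fault detection and recovery are treated as normal operation rather than as exceptions.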

  7. Assumptions & Goals: Streaming Data Access • HDFS is designed for batch processing rather than interactive use • The emphasis is on high data throughput, not on low-latency data access

  8. Assumptions & Goals: Streaming Data Access • HDFS is built on the idea of a “write once, read many times” pattern • Over time, datasets are generated and placed in HDFS • Analysis is done on a large part of the data rather than on the first few records • The time to read the whole dataset matters more than the latency of retrieving the first or last record

  9. Assumptions & Goals: Large Datasets • A typical file ranges from gigabytes to terabytes in size

  10. Assumptions & Goals: Simple Coherency Model • HDFS is built on the idea of a “write once, read many times” pattern • This assumption enables high-throughput data access

  11. Assumptions & Goals: Moving Computation or Data? • Computation-intensive programming • Data-intensive programming

  12. Where HDFS doesn’t fit • Low-latency data access • Lots of small files • Multiple writers, arbitrary file modifications

  13. Where HDFS doesn’t fit • Low-latency data access: HDFS has high latency • Lots of small files: each file (say, 10 KB in size) takes up its own block in HDFS • Compress • All file system metadata is held in memory on the NameNode
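To see why many small files are a problem, it helps to estimate the metadata they generate. The sketch below assumes a commonly quoted rule of thumb of roughly 150 bytes of NameNode memory per file or block entry; that constant, the 64 MB block size default, and the function name `namenode_bytes` are assumptions for illustration, and real figures vary by Hadoop version.

```python
BYTES_PER_OBJECT = 150  # assumed rule-of-thumb memory cost per file or block entry


def namenode_bytes(total_bytes: int, file_size: int,
                   block_size: int = 64 * 1024**2) -> int:
    """Rough metadata memory needed to store total_bytes as equal-sized files."""
    files = -(-total_bytes // file_size)                 # ceiling division
    blocks_per_file = max(1, -(-file_size // block_size))
    # One entry for the file itself plus one entry per block.
    return files * (1 + blocks_per_file) * BYTES_PER_OBJECT


one_tb = 1024**4
small = namenode_bytes(one_tb, 10 * 1024)   # 1 TB stored as 10 KB files
large = namenode_bytes(one_tb, 1024**3)     # 1 TB stored as 1 GB files
print(f"10 KB files: {small / 1024**3:.1f} GiB of metadata")
print(f"1 GB files:  {large / 1024**2:.2f} MiB of metadata")
```

Under these assumptions the same amount of data costs several orders of magnitude more metadata memory when it is stored as 10 KB files.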

  14. Where HDFS doesn’t fit • Multiple writers, arbitrary file modifications • A single writer writes a file in HDFS, and data can only be appended at the end • Multiple writers to the same file, or writes at arbitrary offsets, are not supported (currently)
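The single-writer, append-only rule can be illustrated with a toy model. The class below is a hypothetical in-memory sketch written for this purpose; it is not an HDFS API and does not reflect how HDFS implements leases or appends internally.

```python
class AppendOnlyFile:
    """Toy model of a single-writer, append-only file (not an HDFS API)."""

    def __init__(self):
        self._data = bytearray()
        self._writer = None  # name of the client currently allowed to write

    def open_for_write(self, writer: str) -> None:
        # Only one writer may hold the file at a time.
        if self._writer is not None:
            raise PermissionError(f"file is already being written by {self._writer}")
        self._writer = writer

    def write(self, writer: str, data: bytes, offset=None) -> None:
        if writer != self._writer:
            raise PermissionError("only the current writer may write")
        # Writes are accepted only at the current end of the file.
        if offset is not None and offset != len(self._data):
            raise ValueError("writes are only allowed at the end of the file")
        self._data += data

    def close(self, writer: str) -> None:
        if writer == self._writer:
            self._writer = None

    def read(self) -> bytes:
        return bytes(self._data)


f = AppendOnlyFile()
f.open_for_write("client-a")
f.write("client-a", b"first record\n")
f.write("client-a", b"second record\n")
f.close("client-a")
print(f.read().decode(), end="")
```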

  15. Blocks • A disk has a block size: the minimum amount of data that can be read or written, typically 512 bytes • File system blocks are a small multiple of the disk block size, typically a few KB

  16. Blocks • In a classical FS, a single block may contain data from only a single file • This leads to internal fragmentation • Newer file systems solve this problem by • block suballocation • tail merging
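Internal fragmentation is easy to quantify: when a block cannot be shared between files, the unused tail of a file's last block is wasted. A minimal sketch, assuming a 4 KB file system block size (the slides only say "a few KB") and a hypothetical helper name `wasted_bytes`:

```python
def wasted_bytes(file_size: int, block_size: int = 4096) -> int:
    """Unused bytes in a file's last block when blocks are not shared between files."""
    remainder = file_size % block_size
    return 0 if remainder == 0 else block_size - remainder


for size in (100, 4096, 5000, 10 * 1024):
    print(f"file of {size:>6} bytes wastes {wasted_bytes(size):>5} bytes")
```

Block suballocation and tail merging reduce this waste by letting the tails of several files share a block.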

  17. Blocks • HDFS also has a block size: 64 MB • Unlike a normal FS, a file smaller than 64 MB does not occupy 64 MB of underlying storage
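The way a file maps onto HDFS blocks can be sketched as follows. This is an illustration of the arithmetic only, using the 64 MB block size from the slide; the function name `block_sizes` is hypothetical and the code does not talk to HDFS.

```python
MB = 1024**2


def block_sizes(file_size: int, block_size: int = 64 * MB) -> list:
    """Sizes in bytes of the blocks that a file of file_size bytes is split into."""
    full, tail = divmod(file_size, block_size)
    # Full blocks first, then one shorter block holding whatever is left.
    return [block_size] * full + ([tail] if tail else [])


print([s // MB for s in block_sizes(150 * MB)])  # block sizes in MB
print(block_sizes(10 * 1024))                     # a 10 KB file
```

The last block is only as large as the data it holds, so a 10 KB file consumes about 10 KB of underlying storage rather than 64 MB.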

  18. Why a BIG BLOCK size? • Throughput vs latency • Time to seek to the start of a block • Time to read the whole block

  19. Why a BIG BLOCK size? • Seek time = 10 ms • Transfer rate (throughput) = 100 MB/s • To make the seek time 1% of the transfer time, the block size must be 100 MB • The default is 64 MB • As transfer rates increase, the block size can be increased
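The 100 MB figure follows directly from the numbers on the slide: if a 10 ms seek should be 1% of the transfer time, the transfer must take 1 s, and 1 s at 100 MB/s is 100 MB. A small sketch of that calculation (the function and parameter names are illustrative):

```python
def block_size_mb(seek_time_s: float, transfer_rate_mb_s: float,
                  seek_fraction: float) -> float:
    """Block size (MB) at which seek time is seek_fraction of the transfer time."""
    transfer_time_s = seek_time_s / seek_fraction
    return transfer_rate_mb_s * transfer_time_s


# Values from the slide: 10 ms seek, 100 MB/s transfer, seek = 1% of transfer time.
print(block_size_mb(0.010, 100, 0.01))
```

Doubling the transfer rate in this formula doubles the resulting block size, which matches the slide's remark that block sizes can grow as disks get faster.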

  20. hadoop fsck / -files -blocks • Gives information about all the files and blocks in the file system • Replication status: under-replicated, over-replicated, etc. • Corrupt blocks, etc.

  21. File Permissions on HDFS • The client’s identity is determined by the user name and groups under which it operates • A shared FS shouldn’t be used in a hostile environment • Going forward: Kerberos authentication

  22. Hadoop File Systems • HDFS is just one implementation of the Hadoop FileSystem abstraction • org.apache.hadoop.fs.FileSystem represents a file system in Hadoop

  23. Hadoop File Systems

  24. Hadoop File Systems