
Overview of HDF5

HDF Summit, Boeing, Seattle. The HDF Group (THG), September 19, 2006.


Presentation Transcript


  1. Overview of HDF5. HDF Summit, Boeing, Seattle. The HDF Group (THG), September 19, 2006

  2. Topics • What is HDF? • Sample uses of HDF • THG the Company

  3. What is HDF?

  4. [Images: Matter & the universe; Life and nature; Weather and climate. Figure: total column ozone (Dobson units), August 24, 2001 and August 24, 2002] Answering big questions …

  5. involves big data …

  6. varied data… [Image: raw DNA sequence text, beginning caacaagccaaaactcgtacaaatatgaccgcacttcgctataaagaacacggcttgtgg …]

  7. and complex relationships… [Diagram: contig summaries, discrepancies, contig qualities, coverage depth, SNP score, trace reads, aligned bases, read quality, contig percent match]

  8. on big computers…

  9. and on little computers.

  10. HDF: How do we… • Describe the data? • Read it? Store it? Find it? Share it? Mine it? • Move it into, out of, and between computers and repositories?

  11. HDF is • A file format for managing any kind of data • Software to store and access data in the format • Suited especially to large or complex data collections • Suited for every size of system • Platform independent – runs almost anywhere • Open – both file formats and software

  12. The HDF solution: • Efficient storage, I/O • Scientific data file format • Common data models • I/O software & tools • Standard APIs

  13. An HDF file is a container… …into which you can put your data objects, such as a palette or a table:

      lat | lon | temp
      ----|-----|-----
      12  | 23  | 3.1
      15  | 24  | 4.2
      17  | 21  | 3.6

  14. HDF structures for organizing objects in files [Diagram: groups “/” (root) and “/foo” containing a 3-D array, a table (lat, lon, temp), a palette, raster images, and a 2-D array]
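To make the container-and-groups idea concrete, here is a minimal sketch using h5py, a widely used Python binding to the HDF5 library. The slides show no code, so h5py, NumPy, and every file, group, dataset, and attribute name below are assumptions for illustration. It mirrors the diagram: a group “/foo” holding a 3-D array, a 2-D array, and the lat/lon/temp table stored as a compound dataset with a small piece of metadata attached.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file location; any writable path works.
path = os.path.join(tempfile.mkdtemp(), "container_demo.h5")

# The lat/lon/temp table from the slide, as a NumPy structured array.
# h5py stores structured arrays as HDF5 compound datasets.
table = np.array(
    [(12, 23, 3.1), (15, 24, 4.2), (17, 21, 3.6)],
    dtype=[("lat", "i4"), ("lon", "i4"), ("temp", "f4")],
)

with h5py.File(path, "w") as f:  # the File object is also the root group "/"
    foo = f.create_group("foo")  # the group "/foo"
    foo.create_dataset("array3d", data=np.zeros((4, 5, 6)))  # a 3-D array
    f.create_dataset("array2d", data=np.arange(12).reshape(3, 4))  # a 2-D array
    obs = f.create_dataset("table", data=table)  # the table as a compound dataset
    obs.attrs["temp_units"] = "degC"  # hypothetical metadata stored as an attribute

# Reopen read-only and walk the hierarchy from the root group.
with h5py.File(path, "r") as f:
    names = []
    f.visit(names.append)  # records each object's path relative to "/"
    print(sorted(names))
    print(f["table"]["temp"])  # read just one field of the compound table
```

Because every object is addressed by a path such as `foo/array3d`, the file behaves much like a small file system, which is what lets unrelated kinds of data share one container.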

  15. Mesh Example, in HDFView

  16. HDF5 software [Diagram: tools & applications on top of the HDF I/O library, which reads and writes the HDF file]

  17. Goals of HDF5 Library • Flexible API to support a wide range of operations on data • High performance access in serial and parallel computing environments • Compatibility with common data models and programming languages

  18. Features • Ability to create complex data structures • Complex subsetting • Efficient storage • Flexible I/O (parallel, remote, etc.) • Ability to transform data during I/O • Support for key language models: OO compatible; C & Fortran primarily; also Java, C++
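Two of the listed features, complex subsetting and efficient storage, can be sketched with the same h5py binding (an assumption; the slide names no API). The dataset below uses a chunked layout with the gzip filter, so data is compressed as it is written and decompressed as it is read, and only a strided hyperslab is read back instead of the whole array. The array size, chunk shape, and compression level are arbitrary illustrative choices.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file location and test data.
path = os.path.join(tempfile.mkdtemp(), "features_demo.h5")
data = np.arange(1000 * 1000, dtype="f8").reshape(1000, 1000)

with h5py.File(path, "w") as f:
    # Efficient storage: chunked layout plus the gzip (deflate) filter.
    # Each 100x100 chunk is compressed independently when it is written.
    f.create_dataset(
        "grid", data=data, chunks=(100, 100), compression="gzip", compression_opts=4
    )

with h5py.File(path, "r") as f:
    dset = f["grid"]
    print(dset.chunks, dset.compression)
    # Complex subsetting: a strided hyperslab (rows 10-19, every 100th column).
    # Only the chunks that overlap this selection are read and decompressed.
    sub = dset[10:20, ::100]
    print(sub.shape)
```

Chunk shape is a trade-off: selections that line up with chunk boundaries touch fewer chunks, so the layout is usually chosen with the expected access pattern in mind.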

  19. Sample uses of HDF

  20. 1. NASA Earth Observing System (EOS) [Diagram, labeled HDF-EOS: Terra (CERES, MISR, MODIS, MOPITT); Aqua (6/01) (CERES, MODIS, AMSR); Aura (TES, HIRDLS, MLS, OMI)]

  21. 2. Advanced Simulation & Computing (ASC) Question: How do we maintain a nuclear stockpile in the absence of testing?

  22. Answer: Very large simulations on very large computers

  23. ASC data requirements • Large datasets (> a terabyte) • Good I/O performance on massively parallel systems • Complex data and extensive metadata

  24. 3. Bioinformatics: managing genomic data [Image: DNA sequence fragments caacaagccaaaactcgtacaa, cgagatatctcttggaaaaact, gctcacaatattgacgtacaag, gttgttcatgaaactttcggta, acaatcgttgacattgcgacct, aatacagcccagcaagcagaat]

  25. DNA sequencing workflows • Diverse formats • Highly redundant data • Repeated file processing • Disconnected programs • Non-scalable storage • Lack of persistence

  26. Multiple levels and relationships [Diagram: contig summaries, discrepancies, contig qualities, coverage depth, SNP score, trace reads, aligned bases, read quality, contig percent match]

  27. BioHDF: HDF5 as a binary format for bioinformatics

  28. 4. Flight test data

  29. 4. Boeing flight test [Diagram labels: HDF-Time-history, HDF-PACKET]

  30. Flight test data requirements • Fast data acquisition from 1000s of sources • Wide variety of data types • Active archive • Standardization for data/software exchange • Special features
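Fast acquisition into an active archive suggests datasets that grow as samples arrive. The slides do not describe the actual HDF-Time-history or HDF-PACKET layouts, so this is only a minimal h5py sketch under an assumed design: one extendible 1-D dataset per hypothetical sensor channel, created with an unlimited maximum shape and resized before each burst of samples is appended. The channel name, units, and burst sizes are made up for illustration.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file location.
path = os.path.join(tempfile.mkdtemp(), "flight_demo.h5")


def append_samples(dset, samples):
    """Grow an extendible 1-D dataset and write the new samples at its end."""
    start = dset.shape[0]
    dset.resize((start + len(samples),))
    dset[start:] = samples


with h5py.File(path, "w") as f:
    channels = f.create_group("channels")
    # maxshape=(None,) makes the dataset unlimited along axis 0;
    # extendible datasets must be chunked, so a chunk size is given.
    chan = channels.create_dataset(
        "airspeed", shape=(0,), maxshape=(None,), chunks=(1024,), dtype="f4"
    )
    chan.attrs["units"] = "knots"  # hypothetical channel metadata
    # Simulate three acquisition bursts of 500 samples arriving in turn.
    for burst in range(3):
        append_samples(chan, np.full(500, burst, dtype="f4"))
    print(chan.shape)
```

Keeping each channel in its own dataset lets a reader pull one channel's history without touching the others; a real acquisition system would likely batch writes more aggressively than this sketch does.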

  31. THG the Company

  32. What is the HDF Group? • 18 years at the National Center for Supercomputing Applications (NCSA) at the University of Illinois • Recent spin-off from the U of I • Non-profit 501(c)(3) • 17 scientific, technology, and professional staff • 5 students • 2+ million product users worldwide • Across industry sectors and disciplines

  33. THG mission: To support the vast community of HDF users and to ensure the sustainable development of HDF technologies and the ongoing accessibility of HDF-stored data.

  34. Business model • Non-profit: mission driven • Intellectual property: • U of I plans to assign ownership to THG • The HDF formats will remain free, and HDF software will remain open source. • Continue close ties to U of I and NCSA.

  35. Income-generating activities • Major client support • Targeted HDF development • Grant-supported R&D • Consulting

  36. Thank you

  37. HDF Information • HDF Information Center • http://hdfgroup.org/ • HDF Help email address • hdfhelp@hdfgroup.org • HDF users mailing list • hdfnews@hdfgroup.org
