
HDF5 Tutorial

37th SPEEDUP Workshop on HPC. Albert Cheng, Elena Pourmal, The HDF Group.


Presentation Transcript


  1. HDF5 Tutorial
     37th SPEEDUP Workshop on HPC
     Albert Cheng, Elena Pourmal
     The HDF Group
     SPEEDUP Workshop - HDF5 Tutorial

  2. Outline
     • 8:00 – 9:00 Introduction to HDF5 data, programming models and tools
     • 9:00 – 9:30 Advanced features
     • 10:00 – 12:00 Introduction to Parallel HDF5
     • 13:15 – 14:15 Caching and buffering in HDF5
     • 14:45 – 16:45 New features in HDF5 1.8.0

  3. Introduction to HDF5 Data, Programming Models and Tools

  4. What is HDF?

  5. HDF is…
     • A file format for managing any kind of data
     • A software system to manage data in the format
     • Designed for high-volume or complex data
     • Designed for every size and type of system
     • An open format, software library, and tools
     • There are two HDFs: HDF4 and HDF5
     • Today we focus on HDF5

  6. HDF5: The Format

  7. An HDF5 “file” is a container… …into which you can put your data objects
     [Figure: example objects in a file, including a palette and this table:]
     lat | lon | temp
     ----|-----|-----
      12 |  23 |  3.1
      15 |  24 |  4.2
      17 |  21 |  3.6

  8. [Figure: a file organized by “Groups”, structures to organize objects (here “/” (root) and “/foo”), holding “Datasets”: a 3-D array, a 2-D array, a table, raster images, and a palette]

  9. HDF5 model
     • Groups – provide structure among objects
     • Datasets – where the primary data goes
        • Data arrays
        • Rich set of datatype options
        • Flexible, efficient storage and I/O
     • Attributes, for metadata
     Everything else is built essentially from these parts.

  10. HDF5: The Software

  11. HDF5 Software
     [Figure: software stack, top to bottom: Tools, Applications, Libraries; HDF5 I/O Library; HDF5 File]

  12. Users of HDF5 [Figure: layers, top to bottom]
     • Software Tools & Applications: scientific/engineering applications; domain-specific libraries/API, tools. Most data consumers are here.
     • HDF5 Application Programming Interface: applications and tools use this API to create, read, write, query, etc. Power users (consumers) work here.
     • “Virtual file layer” (VFL): modules to adapt I/O to specific features of the system, or to do I/O in some special way.
     • File system, MPI-IO, SAN, other layers
     • “HDF5 File”: the “file” could be on a parallel system, in memory, on a network, a collection of files, etc.

  13. HDF5 Philosophy
     A single platform with multiple uses
     • One general format
     • One library, with
        • Options to adapt I/O and storage to data needs
        • Layers on top and below
     • Ability to interact well with other technologies
     • Attention to past, present, and future compatibility

  14. Who uses HDF5?

  15. Who uses HDF5?
     • Applications that deal with big or complex data
     • Over 200 different types of apps
     • 2+ million product users worldwide
     • Academia, government agencies, industry

  16. Applications with large amounts of data

  17. NASA EOS remote sensing data
     • HDF is the standard file format for storing data from NASA's Earth Observing System (EOS) mission.
     • Petabytes of data are stored in HDF and HDF5 to support the Global Climate Change Research Program.

  18. Large simulations
     • A simulation can have billions of elements
     • Each element can have dozens of associated values

  19. Large images: electron tomography
     • 25–80 Å resolution
     • 4k x 4k x 500 images now
     • 8k x 8k x 1k images soon (256 GB)
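     As a check on the 256 GB figure, assuming 8k = 8192, 1k = 1024, and 4 bytes (32 bits) per voxel (the slide does not state the voxel type), the sizes work out in binary units (GiB) as:

```latex
\begin{aligned}
8192 \times 8192 \times 1024 &= 2^{13} \cdot 2^{13} \cdot 2^{10} = 2^{36}\ \text{voxels} \\
2^{36} \times 4\ \text{bytes} &= 2^{38}\ \text{bytes} = 256\ \text{GiB} \\
4096 \times 4096 \times 500 \times 4\ \text{bytes} &= 33{,}554{,}432{,}000\ \text{bytes} = 31.25\ \text{GiB}
\end{aligned}
```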

  20. It is not just about size

  21. Data complexity (thanks to Mark Miller, LLNL)

  22. Complex relationships within data
     [Figure: interrelated data elements: SNP Score, Contig Summaries, Discrepancies, Contig Qualities, Coverage Depth, Trace Reads, Aligned bases, Read quality, Contig, Percent match]

  23. Flight test
     • High-speed, multi-stream, multi-modal data collection
     • Analyze and query specific parameters by time and space
     • Different views of data

  24. HDF5 Data Model

  25. HDF5 model (recap)
     • Groups – provide structure among objects
     • Datasets – where the primary data goes
        • Data arrays
        • Rich set of datatype options
        • Flexible, efficient storage and I/O
     • Attributes, for metadata
     • Other objects
        • Links (point to data in a file or in another HDF5 file)
        • Datatypes (can be stored for complex structures and reused by multiple datasets)

  26. HDF5 Dataset [Figure: a dataset consists of metadata and data]
     Metadata:
        • Dataspace: Rank = 3; Dimensions: Dim_1 = 4, Dim_2 = 5, Dim_3 = 7
        • Datatype: IEEE 32-bit float
        • Attributes: Time = 32.4, Pressure = 987, Temp = 56
        • Storage info: Chunked, Compressed
     Data
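     The dataspace and datatype on this slide together fix how much raw data the dataset holds. A minimal Python sketch of that arithmetic, using only the standard library (it does not call HDF5, and it ignores chunking, compression, and metadata overhead):

```python
import math
import struct

# Dataspace from the slide: rank 3 with dimensions 4 x 5 x 7.
dims = (4, 5, 7)
# Datatype from the slide: IEEE 32-bit float ("<f" is a 4-byte little-endian float).
element_size = struct.calcsize("<f")

rank = len(dims)
n_elements = math.prod(dims)            # 4 * 5 * 7 = 140 elements
raw_bytes = n_elements * element_size   # 140 * 4 = 560 bytes of raw data

print(f"rank={rank} elements={n_elements} raw bytes={raw_bytes}")
```

     The 560 bytes is the uncompressed size of the data alone; the storage options (chunked, compressed) change what actually lands on disk, and the attributes are metadata, not part of this total.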

  27. HDF5 Dataspace
     • Two roles:
        • Dataspace contains spatial info about a dataset stored in a file
           • Rank and dimensions
           • Permanent part of dataset definition
        • Dataspace describes the application’s data buffer and the data elements participating in I/O
     [Figure: a file dataspace with Rank = 2, Dimensions = 4x6, and a memory dataspace with Rank = 1, Dimensions = 12]
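     The figure pairs a 4x6 file dataspace (24 elements) with a 12-element memory dataspace, so only 12 file elements can take part in a single transfer. A stdlib-only Python model of the two roles follows; the 3 x 4 block is a hypothetical selection (the figure's exact selection is not recoverable), and the row-major pairing is this sketch's convention:

```python
import math

def npoints(dims):
    """Number of elements in a simple (rectangular) dataspace."""
    return math.prod(dims)

def block_selection(dims, start, count):
    """Row-major (row, col) coordinates of a rectangular block inside a 2-D dataspace."""
    (r0, c0), (nr, nc) = start, count
    if r0 + nr > dims[0] or c0 + nc > dims[1]:
        raise ValueError("selection extends past the dataspace")
    return [(r, c) for r in range(r0, r0 + nr) for c in range(c0, c0 + nc)]

file_dims = (4, 6)   # role 1: the dataset's dataspace in the file (24 elements)
mem_dims = (12,)     # role 2: the application's 1-D buffer (12 elements)

# Hypothetical selection: a 3 x 4 block starting at row 1, column 1.
selected = block_selection(file_dims, start=(1, 1), count=(3, 4))

# A transfer is only valid when both sides describe the same number of elements.
if len(selected) != npoints(mem_dims):
    raise ValueError("file and memory selections differ in size")

# In this sketch the i-th selected file element goes to buffer slot i.
for i, (r, c) in enumerate(selected[:3]):
    print(f"file[{r}][{c}] -> buffer[{i}]")
```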

  28. HDF5 Datatype
     • Datatype – how to interpret a data element
     • Permanent part of the dataset definition
     • Two classes: atomic and compound
     • Can be stored in a file as an HDF5 object (HDF5 committed datatype)
     • Can be shared among different datasets

  29. HDF5 Datatype
     • HDF5 atomic types include
        • normal integer & float
        • user-definable (e.g., 13-bit integer)
        • variable-length types (e.g., strings)
        • references to objects/dataset regions
        • enumeration – names mapped to integers
        • array
     • HDF5 compound types
        • Comparable to C structs (“records”)
        • Members can be atomic or compound types
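     Because compound types are comparable to C structs, a compound datatype boils down to a list of members, each with a name, a byte offset, and a member type. The stdlib sketch below uses Python's ctypes to lay out a record for the lat | lon | temp table from slide 7; the member types (int, int, 32-bit float) are an assumption. In the C API the same three facts are passed to H5Tinsert for each member, with the offset usually taken from the HOFFSET macro.

```python
import ctypes

class Record(ctypes.Structure):
    """A C-struct-like record for the lat | lon | temp table on slide 7.

    Member types are an assumption: int for lat and lon, 32-bit float for temp.
    """
    _fields_ = [
        ("lat", ctypes.c_int),
        ("lon", ctypes.c_int),
        ("temp", ctypes.c_float),
    ]

# The table's three rows as a contiguous array of records (an array of structs).
rows = (Record * 3)(Record(12, 23, 3.1), Record(15, 24, 4.2), Record(17, 21, 3.6))

# Each member is described by a name, a byte offset, and a member type.
for name, _ in Record._fields_:
    field = getattr(Record, name)
    print(f"{name:<5} offset={field.offset} size={field.size}")
print("record size:", ctypes.sizeof(Record), "bytes")
```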

  30. HDF5 dataset: array of records
     [Figure: a dataset with Dimensionality: 5 x 3, whose Datatype is a Record with members int8, int4, int16, and a 2x3x2 array of float32]

  31. Special storage options for dataset
     • chunked: better subsetting access time; extendable
     • compressed: improves storage efficiency, transmission speed
     • extendable: arrays can be extended in any direction
     • external: metadata in HDF5 file, raw data in a binary file
       [Figure: File A holds the metadata for dataset “Fred”; File B holds the data for “Fred”]
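     Chunked storage splits a dataset into fixed-size blocks, so reading a subset only needs the chunks that overlap it. The stdlib Python sketch below computes which chunks a rectangular selection touches; the dataset and chunk shapes are hypothetical:

```python
from itertools import product

def chunks_touched(start, count, chunk_dims):
    """Chunk-grid indices overlapped by a rectangular selection.

    start and count give the selection's corner and extent per dimension;
    chunk_dims is the fixed chunk shape of the dataset.
    """
    ranges = []
    for s, n, c in zip(start, count, chunk_dims):
        first = s // c              # chunk holding the first selected index
        last = (s + n - 1) // c     # chunk holding the last selected index
        ranges.append(range(first, last + 1))
    return list(product(*ranges))

# Hypothetical layout: a 1000 x 1000 dataset stored in 100 x 100 chunks.
chunk_dims = (100, 100)

# A 50 x 50 subset that sits inside one chunk needs only that chunk.
print(chunks_touched((210, 430), (50, 50), chunk_dims))    # [(2, 4)]

# One full row crosses every chunk in its row of chunks.
print(len(chunks_touched((5, 0), (1, 1000), chunk_dims)))  # 10
```

     The same grid is why chunking goes together with extendability: in HDF5, datasets with unlimited dimensions must use chunked layout, and growing them only adds chunks to the grid.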

  32. HDF5 Attribute
     • Attribute – data of the form “name = value”, attached to an object by the application
     • Operations similar to dataset operations, but …
        • Not extendable
        • No compression or partial I/O
     • Can be overwritten, deleted, or added during the “life” of a dataset

  33. HDF5 Group
     • A mechanism for organizing collections of related objects
     • Every file starts with a root group, “/”
     • Similar to UNIX directories
     • Can have attributes

  34. Path to HDF5 object in a file
     • / (root)
     • /X
     • /Y
     • /Y/temp
     • /Y/bar/temp
     [Figure: tree with root “/” containing X and Y; Y contains temp and bar; bar contains temp]

  35. Shared HDF5 objects
     [Figure: root “/” with groups A, B, and C; links named P, R, and P lead to shared objects]
     • /A/P
     • /B/R
     • /C/P
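     A path names a chain of links starting at the root group, so the same name (temp) can appear in different groups, and a single object can be reachable through several links. The stdlib Python model below uses hypothetical Group, Dataset, and resolve() names (this is not the HDF5 API) and assumes the reading of slide 35 in which all three paths lead to one shared object:

```python
class Group:
    """Hypothetical stand-in for an HDF5 group: a table of named links."""

    def __init__(self):
        self.links = {}

class Dataset:
    """Hypothetical stand-in for a non-group object such as a dataset."""

    def __init__(self, label):
        self.label = label

def resolve(root, path):
    """Follow an absolute path such as "/Y/bar/temp" one link at a time."""
    obj = root
    for name in path.strip("/").split("/"):
        if name:                 # "/" alone names the root group itself
            obj = obj.links[name]
    return obj

# Slide 34: the same name "temp" used in two different groups.
root = Group()
root.links["X"] = Group()
root.links["Y"] = Group()
root.links["Y"].links["temp"] = Dataset("temp in /Y")
root.links["Y"].links["bar"] = Group()
root.links["Y"].links["bar"].links["temp"] = Dataset("temp in /Y/bar")

# Slide 35 (assumed reading): three links, one shared object.
shared_root = Group()
for name in ("A", "B", "C"):
    shared_root.links[name] = Group()
shared = Dataset("shared object")
shared_root.links["A"].links["P"] = shared
shared_root.links["B"].links["R"] = shared
shared_root.links["C"].links["P"] = shared

print(resolve(root, "/Y/bar/temp").label)                            # temp in /Y/bar
print(resolve(shared_root, "/A/P") is resolve(shared_root, "/C/P"))  # True
```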

  36. HDF5 Data Model Example
     ENSIGHT automotive crash simulation

  37. Automotive crash simulation

  38. Automotive crash simulation

  39. Automotive crash simulation

  40. Solid modeling

  41. Solid modeling

  42. HDF5 mesh

  43. Mesh Example, in HDFView
     April 28, 2008 LCI Tutorial

  44. HDF5 Software

  45. HDF5 software stack
     [Figure: top to bottom: Tools & Applications; HDF I/O Library; HDF File]

  46. Structure of HDF5 Library
     • Object API (C, Fortran 90, Java, C++)
        • Specify objects and transformation properties
        • Invoke data movement operations and data transformations
     • Library internals
        • Perform data transformations and other prep for I/O
        • Configurable transformations (compression, etc.)
     • Virtual file I/O (C only)
        • Perform byte-stream I/O operations (open/close, read/write, seek)
        • User-implementable I/O (stdio, network, memory, etc.)
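     The virtual file layer separates the library from the storage beneath it: upper layers ask for bytes at addresses, and a pluggable driver decides where those bytes live. The Python sketch below illustrates that split with hypothetical class names; HDF5's real drivers (for example the in-memory "core" driver and the POSIX "sec2" driver) are written in C against the library's own driver interface, which this does not reproduce:

```python
import os
import tempfile
from abc import ABC, abstractmethod

class FileDriver(ABC):
    """Hypothetical byte-stream interface that upper library layers call."""

    @abstractmethod
    def read(self, addr: int, size: int) -> bytes: ...

    @abstractmethod
    def write(self, addr: int, data: bytes) -> None: ...

    @abstractmethod
    def close(self) -> None: ...

class MemoryDriver(FileDriver):
    """Keeps the whole "file" in a growable in-memory buffer."""

    def __init__(self):
        self.buf = bytearray()

    def read(self, addr, size):
        return bytes(self.buf[addr:addr + size])

    def write(self, addr, data):
        end = addr + len(data)
        if end > len(self.buf):
            self.buf.extend(b"\0" * (end - len(self.buf)))  # grow, zero-filling any gap
        self.buf[addr:end] = data

    def close(self):
        pass

class PosixDriver(FileDriver):
    """Stores the bytes in an ordinary file using seek plus read/write."""

    def __init__(self, path):
        self.f = open(path, "w+b")

    def read(self, addr, size):
        self.f.seek(addr)
        return self.f.read(size)

    def write(self, addr, data):
        self.f.seek(addr)
        self.f.write(data)

    def close(self):
        self.f.close()

def store_record(driver: FileDriver, addr: int, payload: bytes) -> bytes:
    """Upper-layer code: identical no matter which driver is plugged in."""
    driver.write(addr, payload)
    return driver.read(addr, len(payload))

print(store_record(MemoryDriver(), 16, b"HDF5"))        # b'HDF5'

with tempfile.TemporaryDirectory() as d:
    disk = PosixDriver(os.path.join(d, "demo.bin"))
    print(store_record(disk, 16, b"HDF5"))              # b'HDF5'
    disk.close()
```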

  47. Write – from memory to disk
     [Figure: data moving from memory to disk]

  48. Partial I/O
     Move just part of a dataset
     • (a) Hyperslab from a 2D array to the corner of a smaller 2D array
     • (b) Regular series of blocks from a 2D array to a contiguous sequence at a certain offset in a 1D array
     [Figure: each selection shown moving between disk and memory]
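     HDF5 describes a hyperslab with four per-dimension parameters, start, stride, count, and block (the parameters of H5Sselect_hyperslab in the C API). The stdlib Python model below only enumerates the coordinates those parameters select; it does not call HDF5, the shapes and the memory offset are hypothetical, and it lists elements in row-major order:

```python
from itertools import product

def hyperslab_1d(start, stride, count, block):
    """Indices picked along one dimension: count blocks of block elements, stride apart."""
    if count > 1 and block > stride:
        raise ValueError("blocks would overlap (block > stride)")
    return [start + i * stride + j for i in range(count) for j in range(block)]

def hyperslab(start, stride, count, block):
    """Row-major coordinates of an N-D hyperslab (product of the per-dimension indices)."""
    per_dim = [hyperslab_1d(*p) for p in zip(start, stride, count, block)]
    return list(product(*per_dim))

# (a) A contiguous 3 x 4 hyperslab whose corner is at row 1, column 2.
a = hyperslab(start=(1, 2), stride=(1, 1), count=(3, 4), block=(1, 1))
print(len(a), a[0], a[-1])   # 12 (1, 2) (3, 5)

# (b) A regular series of 2 x 2 blocks, placed contiguously in a 1-D buffer from offset 5.
b = hyperslab(start=(0, 1), stride=(4, 3), count=(2, 2), block=(2, 2))
mem_offset = 5
mapping = {coord: mem_offset + i for i, coord in enumerate(b)}
print(len(b), b[:4])         # 16 [(0, 1), (0, 2), (0, 4), (0, 5)]
```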

  49. Partial I/O
     Move just part of a dataset
     • (c) A sequence of points from a 2D array to a sequence of points in a 3D array
     • (d) Union of hyperslabs in file to union of hyperslabs in memory
     [Figure: each selection shown moving between memory and disk]
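     In both cases the file selection and the memory selection may have different shapes, even different ranks, as long as they contain the same number of elements; the i-th file element is paired with the i-th memory element. The stdlib Python sketch below models that pairing with hypothetical coordinates: point lists keep the order given, while unions of blocks drop duplicates and are put in row-major order:

```python
from itertools import product

def block(start, shape):
    """Row-major coordinates of one contiguous rectangular block (a simple hyperslab)."""
    return list(product(*(range(s, s + n) for s, n in zip(start, shape))))

def union(*selections):
    """Union of selections: duplicates removed, result in row-major order."""
    return sorted(set().union(*selections))

def pair_elements(file_sel, mem_sel):
    """Match the i-th file element with the i-th memory element of a transfer."""
    if len(file_sel) != len(mem_sel):
        raise ValueError("file and memory selections must have the same size")
    return list(zip(file_sel, mem_sel))

# (c) Three points in a 2-D file array sent to three points in a 3-D memory array.
file_points = [(0, 3), (2, 1), (4, 4)]
mem_points = [(1, 0, 2), (0, 1, 1), (1, 1, 0)]
for f, m in pair_elements(file_points, mem_points):
    print(f"file{f} -> memory{m}")

# (d) Two overlapping blocks in the file (9 distinct elements) ...
file_union = union(block((0, 0), (2, 3)), block((1, 2), (2, 2)))
# ... paired with two non-overlapping blocks in memory (also 9 elements).
mem_union = union(block((0, 0), (1, 4)), block((2, 0), (1, 5)))
pairs = pair_elements(file_union, mem_union)
print(len(pairs))            # 9
```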

  50. Layers – parallel example
     Application I/O flows through many layers from application to disk.
     [Figure, top to bottom: Parallel computing system (Linux cluster) with compute nodes; I/O library (HDF5); Parallel I/O library (MPI-I/O); Parallel file system (GPFS); Switch network/I/O servers; Disk architecture & layout of data on disk]
