Cross Disciplinary Applications of Multiplex Observational and Computational Datasets using HDF5 for Archiving and High Performance Processing. Marcel Ritter, Werner Benger, Joseph Stoeckl, Donna Delparte, Mike Folk, Quincey Koziol, Frank Steinbacher and Markus Aufleger.


Presentation Transcript


  1. Cross Disciplinary Applications of Multiplex Observational and Computational Datasets using HDF5 for Archiving and High Performance Processing. Marcel Ritter, Werner Benger, Joseph Stoeckl, Donna Delparte, Mike Folk, Quincey Koziol, Frank Steinbacher and Markus Aufleger. ASTRO@UIBK, Center for Computation & Technology

  2. Outlook • Motivation • Requirements on a Data Format • Introduction HDF5 • F5 • Introduction • Examples of Data Sets • Application Example: • The Hawaiian Geospatial Data Repository • Conclusion

  3. Motivation Scientific Collaboration Workgroup A Workgroup B Workgroup C Workgroup D Software 3 Software 4

  4. Motivation Scientific Collaboration Workgroup A Workgroup B Software Tool 1 Software Tool 2 File Format 2 File Format 1 Workgroup C Workgroup D Software 3 Software 4

  5. Motivation Workgroup A Workgroup B Software Tool 1 Software Tool 2 File Format 2 File Format 1 Data Exchange Workgroup C Workgroup D Software 3 Software 4

  6. Motivation File Format 1 … File Format 2 File Format N File Format 3 File Format 5 File Format 4 Workgroup C Workgroup D Software 3 Software 4

  7. Motivation File Format 1 … File Format 2 File Format N O(N²) File Format 3 File Format 5 File Format 4 Huge Implementation Effort Workgroup C Workgroup D Software 3 Software 4

  8. Motivation File Format 1 … File Format 2 File Format N Common Data Format File Format 3 File Format 5 File Format 4 O(N) Less Implementation Effort Workgroup C Workgroup D Software 3 Software 4
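The O(N²) versus O(N) argument on the Motivation slides can be made concrete with a small count. The sketch below assumes one directed converter per ordered pair of formats in the pairwise case, and one reader plus one writer per format when everything passes through a common format. The function names are illustrative and do not come from the presentation.

```python
def pairwise_converters(n: int) -> int:
    """Converters needed when each of n formats converts directly to every other."""
    # Assumption: conversion is directional, so A->B and B->A are separate tools.
    return n * (n - 1)


def hub_converters(n: int) -> int:
    """Converters needed when each of n formats only talks to one common format."""
    # Assumption: each format needs one importer and one exporter.
    return 2 * n


if __name__ == "__main__":
    for n in (2, 5, 10, 20):
        print(f"N={n:2d}  pairwise={pairwise_converters(n):3d}  common format={hub_converters(n):2d}")
```

Under these assumptions the common format only pays off once more than three formats are involved; beyond that the pairwise count grows quadratically while the hub count grows linearly.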

  9. Motivation Easier collaboration More time for science Workgroup B Workgroup A Software Tool 1 Software Tool 2 Software 4 Software 3 Common Data Format Workgroup C Workgroup D

  10. Requirements on a Data Format • Easy to read and write • Fast and efficient • Hold huge data sets (terabytes) • Multiple operating systems • Hold huge variety of data • Store meta information of the data • Self-descriptive • Well-documented, active support and community • Sustainable (still easily accessible in >10 years)!

  11. HDF5 Hierarchical Data Format 5 http://www.hdfgroup.org/HDF5

  12. HDF5 - A Few Analogies • File system (in a file) • Binary XML file • PDF for numerical data • Database (container for array variables)

  13. HDF5 - Relationships / SimOut Parameters 10;100;1000 Relation Attribute Timestep 36,000 City A Group Dataset
      lat | lon | temp
      ----|-----|-----
      12  | 23  | 3.1
      15  | 24  | 4.2
      17  | 21  | 3.6
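A minimal sketch of how the objects named on this slide might be written with the h5py Python bindings. The slide's diagram did not survive transcription, so the layout is an assumption: `SimOut` is treated as a group under the root, `Parameters` and `Timestep` as attributes on that group, and the lat/lon/temp table as a compound dataset named `CityA`. The file name is also hypothetical.

```python
import h5py
import numpy as np

# Compound row type mirroring the lat | lon | temp table on the slide.
row = np.dtype([("lat", "i4"), ("lon", "i4"), ("temp", "f4")])
table = np.array([(12, 23, 3.1), (15, 24, 4.2), (17, 21, 3.6)], dtype=row)

# driver="core" with backing_store=False keeps the file in memory only,
# so running this sketch leaves nothing on disk.
with h5py.File("simout.h5", "w", driver="core", backing_store=False) as f:
    sim = f.create_group("SimOut")             # group (assumed placement)
    sim.attrs["Parameters"] = [10, 100, 1000]  # attribute on the group
    sim.attrs["Timestep"] = 36000              # attribute on the group
    sim.create_dataset("CityA", data=table)    # dataset holding the table

    # Objects are addressed by path, much like files in a file system.
    lats = f["/SimOut/CityA"]["lat"].tolist()
    params = f["/SimOut"].attrs["Parameters"].tolist()
    n_rows = f["/SimOut/CityA"].shape[0]
```

The path-style access in the last lines is what the "file system in a file" analogy refers to.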

  14. HDF5 - What Users Get… • A multi-platform library and tools built on over 10 years of experience in large data handling from the high performance computing (HPC) community. • A capability that: • Lets them organize large and/or complex collections of data • Gives them efficient and scalable data storage and access • Lets them integrate a wide variety of types of data and data sources • Guarantees long-term data integrity and preservation

  15. HDF5 • Shapefiles: HDF5 as container format Browser application

  16. HDF5 Pixel data Vector data • Shapefiles: HDF5 as container format Attribute data Browser application

  17. HDF5 - More Applications • Earth Science (Earth Observing System): Terra (CERES, MISR, MODIS, MOPITT); Aqua (6/01) (CERES, MODIS, AMSR); Aura (TES, HIRDLS, MLS, OMI) • Big simulations: billions of elements, dozens of associated values • Flight Testing • Movie Making

  18. HDF5 • More than a ZIP or TAR • It also allows describing the structure of the contents of a file • How to store different kinds of data sets consistently in HDF5?

  19. F5 Fiber Bundle Data Model http://www.fiberbundle.net

  20. F5 • Based on HDF5 • Inspired by concepts of: • Topology • Differential Geometry • Geometric Algebra • Separation of Geometry (Grids) and Datafield (Fields) Grid Field

  21. F5 • Hierarchical Structure: Fiber Bundle / Time Slice / Grid / Topology / Coordinates / Field

  22. F5 • Hierarchical Structure: Fiber Bundle / Time Slice / Grid / Topology / Coordinates / Field • Visible to the end user

  23. Fiber: 0D 1D 3D 6D Base: 3D 2D 1D 0D

  24. F5 • Multi Channel – Multi Resolution Images:

  25. F5 • Multi Channel – Multi Resolution Images:
      Time / Grid / Topology / Representation / Field [Datatype]
      /1.4/Satellite/VertexRefinement1x1/Cartesian/Positions [uniform-grid]
      /RGB [byte,byte,byte]
      /N-IR [float64]
      /T-IR [float64]
      /VertexRefinement2x2/Cartesian/Positions
      /RGB
      /N-IR
      /T-IR
      /1.6/ …
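The five path levels listed on this slide (Time, Grid, Topology, Representation, Field) can be read off a full path mechanically. The following is a sketch only: `F5Path` and `parse_f5_path` are hypothetical helper names, not part of the F5 library, and the example path is assembled from the fragments shown on the slide.

```python
from typing import NamedTuple


class F5Path(NamedTuple):
    """One full F5-style path split into the five levels named on the slide."""
    time: str
    grid: str
    topology: str
    representation: str
    field: str


def parse_f5_path(path: str) -> F5Path:
    """Split '/Time/Grid/Topology/Representation/Field' into its components."""
    parts = path.strip("/").split("/")
    if len(parts) != 5:
        raise ValueError(f"expected 5 path levels, got {len(parts)}: {path!r}")
    return F5Path(*parts)


# Example path built from the slide: the RGB field of the 1x1 refinement level.
p = parse_f5_path("/1.4/Satellite/VertexRefinement1x1/Cartesian/RGB")
print(p.time, p.grid, p.topology, p.representation, p.field)
```

Because every dataset sits at the same depth, a tool can locate, for example, all fields of one grid at one time step without knowing what kind of data the file holds.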

  26. F5 • Full Waveform LIDAR: t_emission t3 t1 t2

  27. F5 • Full Waveform LIDAR: - Laser Data
      Time / Grid / Topology / Representation / Field [Datatype]
      /CorseTime/LASER/POINTS/CartesianCoords/Positions [point3D]
      /TimeStamp [float64]
      /Waveform [uint16,uint16]
      /Reflectance [float32]
      /SHOTS
      /SHOTSAsPOINTS/Positions vlen[uint32]
      /Origin [point3D]
      /Direction [vector3D]
      /EmissionTime [float64]
      Figure labels: t_emission, t1, t2, t3

  28. F5 • Full Waveform LIDAR: - Airplane Data
      /CorseTime/PLANE/POINTS/CartesianCoords/Positions [point3D]
      /Rotation [rotor3D]
      /TimeStamps [float64]

  29. F5 • Bringing together in F5: • Satellite data • LIDAR • Shapefiles • Features of HDF5: • Sustainable storage • Meta data • Compression • Parallel IO • Hyperslab access • Benefits: • Consistent data organization of simple and complex spatio-temporal data • Handle time series of data easily • Make tools of other disciplines applicable to the geoscience community, such as astrophysics image mosaic tools for satellite data: Montage, http://montage.ipac.caltech.edu

  30. Application Example HawaiIan Data repoSitory http://www.epscor.hawaii.edu

  31. HawaiIan Data repoSitory • Goal: Centralized integrative capability to store and manage access to massive (terabytes) research datasets • Users: Broad statewide research community; University of Hawaii research teams • Mission: Objectives: • Collect, store and manage access to data • Discovery, manipulation, fusion and visualization • Utilize user portals • Utilize and link to the Maui High Performance Computing Center (MHPCC)

  32. Geospatial Information and Mass Storage

  33. Geospatial Information and Mass Storage How to manage and store large complex datasets?!!

  34. Geospatial Information and Mass Storage HDF5

  35. Geospatial Information and Mass Storage F5 HDF5

  36. Conclusion • A common data format eases collaborations and reduces wasted time spent on data conversions • Data formats for sustainable, transparent storage of huge and complex data exist, one just has to use them: HDF5 • F5 captures observational and simulation data consistently. • Geoscience repositories, such as the Hawaiian Data repository, can be built upon this format.

  37. Thank you • References: http://www.hdfgroup.org/HDF5 http://www.fiberbundle.net http://www.epscor.hawaii.edu http://montage.ipac.caltech.edu http://sciviz.cct.lsu.edu http://www.marcel-ritter.com

  38. HDF5 - HDFView screenshot of shapefiles

  39. Geospatial Information and Mass Storage • Weather station data • Marine buoy sensor data • GPS data collection • Database datasets, Excel files • Spatial data - imagery, LiDAR, GIS • Geoweb application services – WMS, WFS, WPC • Database management • Data streaming • Data storage of statewide datasets • Access to HPC services • Real-time modeling and analysis • Upload and download capability • Metadata search capacity • Visualization of spatial and non-spatial datasets

  40. F5 • Grid • Manifold describing the base space • Topology • Refinement level • Coordinate representation • Vertex positions in representation
