PreDatA -- Preparatory Data Analytics on Peta-Scale Machines

Fang Zheng, Hasan Abbasi, Jianting Cao, Jai Dayal, Jay Lofstead, Karsten Schwan, Matthew Wolf (CERCS Center, Georgia Tech); Qing Liu, Scott Klasky, Norbert Podhorszki (Oak Ridge National Laboratory); Ciprian Docan, Manish Parashar (Rutgers University)


Presentation Transcript


  1. PreDatA -- Preparatory Data Analytics on Peta-Scale Machines
     Fang Zheng, Hasan Abbasi, Jianting Cao, Jai Dayal, Jay Lofstead, Karsten Schwan, Matthew Wolf (CERCS Center, Georgia Tech)
     Qing Liu, Scott Klasky, Norbert Podhorszki (Oak Ridge National Laboratory)
     Ciprian Docan, Manish Parashar (Rutgers University)

  2. Background
     • “Big Data” problem for peta-scale scientific applications
     • Scientists desire:
       • Faster I/O
       • Faster data analysis

  3. Preparatory Data Analytics
     • Simulation output data needs to be prepared/pre-analyzed:
       • Indexing, annotation, reduction, sorting, layout re-organization, etc. to speed up future analysis and visualization
       • Latent data characterization for validation and monitoring
     • Preparatory data analytics can be critical for the end-to-end performance of computational science discoveries
     • The needle hasn’t grown as fast as the haystack!
     [Figure label: Big Data]

  4. Problem
     • How to do preparatory data analytics?
       • Scalable
       • Efficient
     • Conventional approaches: in-compute-node vs. offline
     [Figure: the two conventional placements, each drawn with compute nodes and storage. Legend: CN = compute node, S = simulation, F = pre-analytics]

  5. PreDatA Middleware
     [Figure: simulation (S) on compute nodes (CN); a staging area running pre-analytics (F); storage]

  6. PreDatA Architecture
     • Asynchronous data movement with Datatap/EVPath
     • Pluggable pre-data analytics
     • User-defined operations
     • Higher-level Data Services
     • Integrated operations, separated from application codes with ADIOS
     [Figure: compute node and staging node. Labels: Application, Data Operation, High Level Data Service, ADIOS High-level Abstraction, Data Operation, Buffer Management, Task Execution, Data Extraction, Data Movement, Data Shuffling]
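The hand-off pattern on this slide (the application writes output, data moves asynchronously, pluggable operations run elsewhere) can be illustrated with a minimal single-machine sketch. It does not use Datatap, EVPath, or ADIOS: a thread stands in for the staging node and a bounded queue for its buffers. The function `run_staging` and its parameters are hypothetical names chosen for illustration.

```python
import queue
import threading

def run_staging(chunks, operations, buffer_slots=4):
    """Hand output chunks to a 'staging' thread that applies pluggable operations."""
    buffer = queue.Queue(maxsize=buffer_slots)  # stands in for staging-node buffers
    results = {name: [] for name in operations}
    done = object()                             # sentinel marking the end of output

    def staging_worker():
        while True:
            chunk = buffer.get()
            if chunk is done:
                break
            # Every registered operation sees each chunk; the producer never calls them.
            for name, op in operations.items():
                results[name].append(op(chunk))

    worker = threading.Thread(target=staging_worker)
    worker.start()
    for chunk in chunks:      # "simulation" side: hand off output and keep computing
        buffer.put(chunk)     # blocks only when the staging buffer is full
    buffer.put(done)
    worker.join()
    return results

results = run_staging(
    chunks=[[3.0, 1.0, 2.0], [9.0, 7.0, 8.0]],
    operations={"sorted": sorted, "maximum": max},
)
print(results)
# {'sorted': [[1.0, 2.0, 3.0], [7.0, 8.0, 9.0]], 'maximum': [3.0, 9.0]}
```

The producer loop only enqueues data, so adding or removing an operation changes the `operations` dictionary and nothing on the "application" side, which is the separation the slide attributes to ADIOS.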

  7. Driver Applications
     • GTC (Gyrokinetic Toroidal Code)
     • Output: 16384 cores output 260 GB / 120 seconds
     • Pre-analytics:
     [Figure: pre-analytics pipeline. Labels: Particle array, Sort, sorted array, BP writer, BP file, Bitmap Indexing, Index file, Histogram, Plotter, 2D Histogram, Plotter]
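Three of the operations named on this slide (sort, histogram, bitmap indexing) can be sketched in a few lines. This is an illustrative single-process version and not the GTC pipeline: the particle tuples, bin range, and function names are assumptions, and values are assumed to lie within the bin range.

```python
def bin_of(value, lo, hi, nbins):
    """Equal-width bin number for value in [lo, hi]; hi itself goes in the last bin."""
    width = (hi - lo) / nbins
    return min(int((value - lo) / width), nbins - 1)

def histogram(values, lo, hi, nbins):
    """Count how many values fall in each bin."""
    counts = [0] * nbins
    for v in values:
        counts[bin_of(v, lo, hi, nbins)] += 1
    return counts

def bitmap_index(values, lo, hi, nbins):
    """One bitmap per bin (a Python int used as a bit set): bit i set means row i is in the bin."""
    bitmaps = [0] * nbins
    for row, v in enumerate(values):
        bitmaps[bin_of(v, lo, hi, nbins)] |= 1 << row
    return bitmaps

# Hypothetical particle records: (particle id, energy)
particles = [(2, 0.9), (0, 0.1), (1, 0.5), (3, 0.45)]

by_id = sorted(particles)              # sort step: order particles by id
energies = [e for _, e in by_id]
hist = histogram(energies, 0.0, 1.0, 2)
index = bitmap_index(energies, 0.0, 1.0, 2)

print(hist)                            # [2, 2]
print([bin(b) for b in index])         # ['0b1001', '0b110']
```

With the index in place, a later query such as "which rows have energy below 0.5" reads one bitmap instead of scanning every value, which is the kind of speedup for future analysis that the preparatory step is meant to buy.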

  8. Driver Applications (Cont.)
     • GTC@JaguarPF
     • Performance & Cost
       • 98 CPU hours saved in a 30-minute run
       • 1,716,960 CPU hours saved in a year!
       • CPU Seconds = Total Simulation Time x Total Number of Cores Used
       • 1.2% to 3% improvement in cost (CPU seconds)
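The yearly figure follows from the per-run figure if 30-minute runs are assumed to execute back to back all year (365 days, 24 hours a day); the slide does not state this, but it is the assumption that reproduces the number exactly. The sketch below also checks the low end of the percentage range under a second assumption, namely that the run used the 16384-core configuration quoted on the previous slide.

```python
CPU_HOURS_SAVED_PER_RUN = 98   # from the slide
RUN_MINUTES = 30               # from the slide
ASSUMED_CORES = 16384          # assumption: core count taken from the previous slide

def cpu_seconds(simulation_seconds, cores):
    """Cost metric from the slide: total simulation time x total number of cores used."""
    return simulation_seconds * cores

# Assumption: runs execute back to back, all year.
runs_per_year = (365 * 24 * 60) // RUN_MINUTES
saved_per_year = CPU_HOURS_SAVED_PER_RUN * runs_per_year

# Cost of one 30-minute run on the assumed core count, in CPU hours.
baseline_cpu_hours = cpu_seconds(RUN_MINUTES * 60, ASSUMED_CORES) / 3600
saved_fraction = CPU_HOURS_SAVED_PER_RUN / baseline_cpu_hours

print(runs_per_year)             # 17520
print(saved_per_year)            # 1716960
print(baseline_cpu_hours)        # 8192.0
print(f"{saved_fraction:.1%}")   # 1.2%
```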

  9. Driver Applications (Cont.)
     • Pixie3D (3D MHD code)
     • Output: 16384 cores, 32 GB / 100 seconds
     • 3D domain decomposition
     • Pre-analytics: diagnostics + layout re-organization
     • 10x read performance improvement through layout re-organization
     [Figure: Pixie3D pre-analytics. Labels: Output Data, Diagnostics, Particle Diag., Toroidal flux Diag., Momentum Diag., Velocity divergence Diag., Energy Diag., Growth rate Diag., …, Current Diag., Maximum velocity Diag., Visualization, Layout Re-organization, BP writer, BP file]
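Layout re-organization for a 3D domain decomposition can be pictured as copying each process's sub-block into its place in one contiguous global array, so that a later reader does not have to gather many small chunks. The sketch below is a plain-Python illustration of that idea under assumed inputs; it does not reproduce the BP file format or the method used for Pixie3D, and `merge_blocks` is a hypothetical helper.

```python
def merge_blocks(global_shape, blocks):
    """Copy per-process sub-blocks into one row-major global array (flat list).

    blocks: iterable of (offset, shape, data), where data is the block's
    own row-major flat list.
    """
    nx, ny, nz = global_shape
    out = [None] * (nx * ny * nz)
    for (ox, oy, oz), (bx, by, bz), data in blocks:
        for i in range(bx):
            for j in range(by):
                for k in range(bz):
                    src = (i * by + j) * bz + k                       # index inside the block
                    dst = ((ox + i) * ny + (oy + j)) * nz + (oz + k)  # index in the global array
                    out[dst] = data[src]
    return out

# A 2x2x2 domain split along z between two hypothetical processes.
blocks = [
    ((0, 0, 0), (2, 2, 1), [0, 2, 4, 6]),   # process 0 holds the k = 0 plane
    ((0, 0, 1), (2, 2, 1), [1, 3, 5, 7]),   # process 1 holds the k = 1 plane
]
merged = merge_blocks((2, 2, 2), blocks)
print(merged)   # [0, 1, 2, 3, 4, 5, 6, 7]
```

In the per-process layout, reading the row at (i, j) touches both blocks; after the merge the same values sit next to each other in one array.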

  10. Current Work
     • Programming interface/runtime system to enable in-situ workflow in the staging area
       • A collection of analysis operations organized as a workflow
       • Use ADIOS as the coupling interface
       • Treat analysis operations as black boxes
     • Runtime system:
       • Workflow scheduling
       • Data movement
       • Layout re-distribution
       • Fault tolerance
     • Integration with deep analysis tools (Hadoop, Paraview/Visit)
     • Work with real-world applications
       • Pixie3D, GTC, GTS, LAMMPS, S3D
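A workflow of black-box analysis operations can be sketched as a dependency graph run in topological order. The example below is only an illustration of that scheduling idea using Python's standard `graphlib` module (Python 3.9 or later); it is not the runtime system described on the slide, and `run_workflow`, the operation names, and the data are hypothetical.

```python
from graphlib import TopologicalSorter

def run_workflow(operations, depends_on, source):
    """Run black-box operations in dependency order.

    operations: name -> callable; the runtime never inspects what it does.
    depends_on: name -> list of upstream operation names.
    Operations with no upstream dependency receive the source data.
    """
    order = list(TopologicalSorter(depends_on).static_order())
    outputs = {}
    for name in order:
        upstream = depends_on.get(name, [])
        inputs = [outputs[u] for u in upstream] or [source]
        outputs[name] = operations[name](*inputs)
    return order, outputs

operations = {
    "sort": sorted,
    "top2": lambda xs: xs[-2:],   # keep the two largest values
    "total": sum,
}
depends_on = {"sort": [], "top2": ["sort"], "total": ["top2"]}

order, outputs = run_workflow(operations, depends_on, source=[5, 1, 4, 2])
print(order)             # ['sort', 'top2', 'total']
print(outputs["total"])  # 9
```

Because the scheduler only looks at names and dependencies, an operation can be replaced without touching the others, which matches the slide's goal of treating analysis operations as black boxes.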

  11. Thank you!
