Fast Adaptive Storage and Retrieval

Scott B. Baden, Department of Computer Science and Engineering, University of California, San Diego

Motivation • Some applications are able to distinguish interesting features from background data using on-line analysis

Features

Animation

Fast Adaptive Storage and Retrieval • If the volume fraction of interesting data is small, then we can reduce storage, memory, and network bandwidth requirements significantly by storing only what is “needed” • We call a scheme that realizes this capability Fast Adaptive Storage and Retrieval (FASTR) • This is a new paradigm for scientific users, since they are reluctant to part with their data • We use resources only to the extent that we require them: remote knowledge discovery and data browsing

The KeLP Project • C++ run-time libraries for parallel application and library development • Hide low-level details without sacrificing performance • Irregular block-structured data • Express communication at a high level using intuitive geometric set operations • Also applies to data-intensive applications • KeLP I/O: out-of-core (Bradley Broom, Rice)
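To illustrate what "geometric set operations" can mean for block-structured data, here is a minimal Python sketch that computes the region one block must receive from a neighbor by growing and intersecting index boxes. It is not the KeLP API (KeLP is a C++ library); the `Box` class and its `grow` and `intersect` methods are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned integer index box (hypothetical type, not a KeLP class).

    lo and hi are inclusive corner indices.
    """
    lo: Tuple[int, ...]
    hi: Tuple[int, ...]

    def is_empty(self) -> bool:
        # A box is empty if any lower bound exceeds its upper bound.
        return any(l > h for l, h in zip(self.lo, self.hi))

    def size(self) -> int:
        if self.is_empty():
            return 0
        n = 1
        for l, h in zip(self.lo, self.hi):
            n *= h - l + 1
        return n

    def intersect(self, other: "Box") -> "Box":
        # Set intersection of two boxes is again a box (possibly empty).
        lo = tuple(max(a, b) for a, b in zip(self.lo, other.lo))
        hi = tuple(min(a, b) for a, b in zip(self.hi, other.hi))
        return Box(lo, hi)

    def grow(self, width: int) -> "Box":
        # Extend the box by `width` cells on every side (a ghost-cell halo).
        return Box(tuple(l - width for l in self.lo),
                   tuple(h + width for h in self.hi))


# Two adjacent 2D blocks; the left block needs a one-cell halo.
left = Box((0, 0), (3, 7))
right = Box((4, 0), (7, 7))

# Cells that `right` owns and `left` needs: halo of left intersected with right.
ghost_region = left.grow(1).intersect(right)
```

Here `ghost_region` is the single column of `right` that borders `left`. Expressing communication this way avoids hand-written index arithmetic for each neighbor.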

Data-intensive application of KeLP • KDistuf • Turbulent flow with Direct Numerical Simulation • Collaboration involving K. Nomura (UCSD MAE), W. Kerney and D. Shalit (UCSD CSE), G. Balls (UCSDSC), P. Diamessis (USC) • Content-based data compression • Borrow structured adaptive mesh refinement grid techniques to… • Capture features at full resolution • Discard remaining background data

More about the application • Turbulent mixing in stably stratified flow under the influence of background shear • Solve the incompressible Navier-Stokes equations • Follow the time evolution of regions of overturned dense fluid, which are the main agents of stirring and mixing • Reference: Smyth, Moum, and Caldwell, “The efficiency of mixing in turbulent patches: inferences from direct simulations and microstructure observations,” J. Phys. Ocean., in press, 2001.
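For reference, a common way to write the incompressible Navier-Stokes equations for a stably stratified fluid is shown below. The slides do not state the exact formulation used, so the Boussinesq approximation and all symbols here are assumptions: velocity $\mathbf{u}$, pressure deviation $p$, density deviation $\rho'$ from a constant reference density $\rho_0$, kinematic viscosity $\nu$, density diffusivity $\kappa$, gravitational acceleration $g$, and vertical unit vector $\hat{\mathbf{z}}$.

```latex
\begin{align}
  \nabla \cdot \mathbf{u} &= 0, \\
  \frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u} \cdot \nabla)\,\mathbf{u}
    &= -\frac{1}{\rho_0}\,\nabla p
       - \frac{\rho'}{\rho_0}\, g\, \hat{\mathbf{z}}
       + \nu\, \nabla^2 \mathbf{u}, \\
  \frac{\partial \rho'}{\partial t} + \mathbf{u} \cdot \nabla \rho'
    &= \kappa\, \nabla^2 \rho'.
\end{align}
```

In this form an overturn is a region where denser fluid lies above lighter fluid, that is, where the vertical gradient of the total density is locally positive.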

Information discovery • Oceanographic observations are incomplete: they are restricted to one dimension • Discovery: time evolution, energy dissipation, and lifetime of overturn regions, which have irregular shapes • Bill Smyth, Dept. Oceanic & Atmospheric Sciences, Oregon State University

Fast Adaptive Storage and Retrieval • Compression depends on the data, currently on 128³ grids • Best case ~ 20:1 compression (10 GB → 500 MB), worst case ~ 2.8:1 • Lempel-Ziv (gzip) gives us only 10%
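The idea of content-based compression can be sketched in a few lines of Python: partition the grid into blocks, keep at full resolution only the blocks that contain a feature, and discard the rest. This is not the FASTR implementation. The fixed block size, the simple threshold test, and the function names `keep_feature_blocks` and `compression_ratio` are all assumptions made for illustration.

```python
from itertools import product


def keep_feature_blocks(field, n, block, threshold):
    """Keep only the blocks that contain at least one value above threshold.

    field: flat list of n**3 values, indexed as (i * n + j) * n + k.
    block: block edge length; n is assumed to be a multiple of block.
    Returns {block_origin: [cell values at full resolution]}.
    """
    kept = {}
    for bi, bj, bk in product(range(0, n, block), repeat=3):
        cells = [field[(i * n + j) * n + k]
                 for i in range(bi, bi + block)
                 for j in range(bj, bj + block)
                 for k in range(bk, bk + block)]
        if max(cells) > threshold:
            kept[(bi, bj, bk)] = cells
    return kept


def compression_ratio(n, block, kept):
    """Ratio of original cell count to stored cell count (block metadata ignored)."""
    if not kept:
        return float("inf")
    return n ** 3 / (len(kept) * block ** 3)


# Synthetic 16^3 field: zero background plus one 3x3x3 feature.
n, block = 16, 4
field = [0.0] * n ** 3
for i, j, k in product(range(5, 8), repeat=3):
    field[(i * n + j) * n + k] = 1.0

kept = keep_feature_blocks(field, n, block, threshold=0.5)
ratio = compression_ratio(n, block, kept)
```

In this synthetic case the feature fits in a single block, so the stored volume is one block out of 64. Real ratios depend on how the features are distributed, which is consistent with the spread between the best and worst cases quoted above.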

Further savings: another application • Use volume tracking [Silver, Rutgers] to follow individual features • FASTR permits us to extract only the data we need out of the many features present • Computational volume: 2M points • Average feature size: 1K points • Maximum feature size: 20K points • Saves an additional two orders of magnitude in communication bandwidth requirements • Perform local analysis on a workstation
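A simplified picture of following individual features is sketched below in Python: group the flagged cells of each time step into connected features, then match features across steps by spatial overlap. This is an illustrative stand-in and not the volume tracking code referred to above. The 6-connectivity rule, the maximum-overlap matching, and the function names `label_features` and `track` are assumptions.

```python
from collections import deque

# Face neighbors of a cell in 3D (6-connectivity, an assumption of this sketch).
NEIGHBORS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def label_features(cells):
    """Group a set of (i, j, k) cells into connected features.

    Returns a list of sets, sorted by smallest cell for a stable order.
    """
    remaining = set(cells)
    features = []
    while remaining:
        seed = remaining.pop()
        feature, queue = {seed}, deque([seed])
        while queue:  # breadth-first flood fill from the seed cell
            i, j, k = queue.popleft()
            for di, dj, dk in NEIGHBORS:
                nb = (i + di, j + dj, k + dk)
                if nb in remaining:
                    remaining.remove(nb)
                    feature.add(nb)
                    queue.append(nb)
        features.append(feature)
    features.sort(key=min)
    return features


def track(prev_features, next_features):
    """Map each earlier feature index to the later feature sharing the most cells.

    A feature with no overlapping successor maps to None.
    """
    matches = {}
    for a, fa in enumerate(prev_features):
        best, best_overlap = None, 0
        for b, fb in enumerate(next_features):
            overlap = len(fa & fb)
            if overlap > best_overlap:
                best, best_overlap = b, overlap
        matches[a] = best
    return matches


# Two time steps, each with two small features.
step0 = {(0, 0, 0), (1, 0, 0), (5, 5, 5)}
step1 = {(1, 0, 0), (2, 0, 0), (5, 5, 5), (5, 5, 6)}

features0 = label_features(step0)
features1 = label_features(step1)
matches = track(features0, features1)

# Retrieve only the feature of interest at the later step.
selected = features1[matches[0]]
```

Only the cells in `selected` need to be transferred for local analysis. With the figures quoted above, sending one maximum-size feature (20K points) instead of the full 2M-point volume is a factor of 100, and an average feature (1K points) is a factor of 2000.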

Future plans • Develop remote analysis capability • Integrate with DTF data handling infrastructure • Larger-scale simulations on Blue Horizon and on clusters: 256³ • Study vortex pairs in a stratified turbulent environment • Improved understanding of aircraft wake vortices • Practical importance for air traffic control

Remote analysis capability • Perform analysis on data sets stored remotely, e.g. with DataCutter • We can perform some data analysis on a local workstation • For highly intensive data analysis, we can use higher-end resources, but again we access only the data we need
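The principle of accessing only the data we need can be illustrated with a subvolume read that seeks to the required bytes instead of loading the whole data set. The Python sketch below uses a local file as a stand-in for remote storage and does not use any particular remote data service. The flat row-major float64 layout and the function name `read_subvolume` are assumptions.

```python
import os
import struct
import tempfile


def read_subvolume(path, n, lo, hi):
    """Read cells lo..hi (inclusive corners) from an n^3 raw float64 file.

    The file is assumed to be little-endian and row-major, with cell (i, j, k)
    stored at offset ((i * n + j) * n + k) * 8. Only the needed runs are read.
    """
    (i0, j0, k0), (i1, j1, k1) = lo, hi
    run = k1 - k0 + 1  # contiguous cells along the fastest-varying axis
    values = []
    with open(path, "rb") as f:
        for i in range(i0, i1 + 1):
            for j in range(j0, j1 + 1):
                f.seek(((i * n + j) * n + k0) * 8)
                values.extend(struct.unpack(f"<{run}d", f.read(run * 8)))
    return values


# Write a small 4^3 volume whose cell values equal their flat index.
n = 4
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "volume.raw")
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{n ** 3}d", *(float(v) for v in range(n ** 3))))

    # Fetch only the central 2x2x2 region: 8 of 64 cells.
    sub = read_subvolume(path, n, lo=(1, 1, 1), hi=(2, 2, 2))
```

The same access pattern applies whether the analysis runs on a workstation or on a higher-end resource: the request is described by a box, and only that box is moved.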

Publications and people • FASTR is based on a research prototype called MOLD, the M.S. thesis research of UCSD CSE student William Kerney: “MOLD: A System for Breaking Down Large Visualization and Post-Processing Problems,” expected March 2002 • Peter Diamessis, then a PhD student with Keiko Nomura (UCSD MAE Dept.), used MOLD to carry out an exploration of overturns • P. Diamessis, “An Investigation of Vortical Structures and Density Overturns in Stably Stratified Homogeneous Turbulence by Means of Direct Numerical Simulation,” PhD thesis, 2001 • P. Diamessis et al., “Automated Tracking of Turbulent Structures in Direct Numerical Simulation,” PARA 2002, Helsinki, Finland. To appear.

Software availability • FASTR: contact us • KeLP • Hardened version of KeLP, a.k.a. KeLP1.4 • http://www.cse.ucsd.edu/groups/hpcl/scg/kelp • NPACI Blue Horizon, Sun HPC, Cray T3E, Linux clusters • Workstations: Solaris, Linux, etc. • Dual-tier variant, KeLP2.1: hierarchical KeLP for SMP clusters and SMP-based machines (e.g. Blue Horizon)
