10 likes | 137 Views
This study presents an innovative technical solution for managing massive scientific datasets in climate and cosmology. By integrating bitmap indexing with stratified random sampling, researchers can effectively scale down extensive queries while maintaining robust error estimates. This approach allows scientists to interactively and automatically reduce data queries into manageable samples, significantly accelerating the analysis workflow in these fields. The findings, authored by Su, Agrawal, Woodring, Myers, Wendelberger, and Ahrens, are set to appear in HPDC 2013.
E N D
Combining Statistics, Sampling, and Indexing to Quantitatively Scale Massive Scientific Data Science Problem: Climate and cosmological analysis and exploration via queries on their massive data sets may still result in a massive amount of data for the scientist to comb through and/or transfer. Technical Solution: Combine bitmap indexing with stratified random sampling (stratified by bins, random within bins) with precalculated errors to quickly and quantitatively scale data. Science Impact: Climate and cosmological scientists are able to interactively and automatically scale data queries down to smaller samples with automatic error estimates, accelerating their scientific analysis workflow. Su, Agrawal, Woodring, Myers, Wendelberger, and Ahrens. “Taming Massive Distributed Datasets: Data Sampling Using Bitmap Indices.” To appear in HPDC 2013.