
Big Data Vs. (Traditional) HPC




Presentation Transcript


  1. Big Data Vs. (Traditional) HPC
     Gagan Agrawal, Ohio State
     ICPP Big Data Panel (09/12/2012)

  2. Big Data Vs. (Traditional) HPC
     • They will clearly co-exist:
       • Fine-grained simulations will prompt more "big-data" problems
       • The ability to analyze data will prompt finer-grained simulations
       • Even instrument data can prompt more simulations
     • Third and Fourth Pillars of Scientific Research
     • Critical need: the HPC community must get deeply engaged in "big-data"

  3. Other Thoughts
     • The onus is on the HPC community:
       • The database, cloud, and visualization communities have been active for a while now
       • Abstractions like MapReduce are neat, and so are parallel and streaming visualization solutions
     • Many existing solutions deliver very low performance:
       • Do people realize how slow Hadoop really is?
       • And yet it is one of the most successful open-source systems
     • We are needed: the programming-model design and implementation community hasn't even looked at "big-data" applications
     • We must engage application scientists, who are often stuck in "I don't want to deal with the mess"
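The MapReduce abstraction the slide calls "neat" fits in a few lines once the shuffle is made explicit. Below is a minimal single-node sketch; the `map_reduce` helper and the bucket-counting example are illustrations invented here, not anything from the slides or from Hadoop's actual API:

```python
from collections import defaultdict

def map_reduce(records, map_fn, reduce_fn):
    """Minimal single-node illustration of the MapReduce abstraction:
    map each record to (key, value) pairs, group by key, then reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}

# Hypothetical example: histogram of simulation values by integer bucket.
data = [0.1, 0.7, 0.4, 0.9, 0.2]
counts = map_reduce(
    data,
    map_fn=lambda x: [(round(x), 1)],   # key = nearest-integer bucket
    reduce_fn=lambda k, vs: sum(vs),    # count of values per bucket
)
```

The appeal of the abstraction is exactly this separation: the analyst writes only `map_fn` and `reduce_fn`, while the grouping (and, in a real system, the distribution and fault tolerance) lives in the framework.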

  4. Impact on Leadership-Class Systems
     • Unlike HPC, the commercial sector has a lot of experience with "big-data" (Facebook, Google)
       • They seem to do fine with large, fault-tolerant commodity clusters
     • "Big-data" might create a push back against memory- and I/O-bound architecture trends
       • It might make the journey to exascale harder, though
     • "Big-data" problems should certainly be considered while addressing fault-tolerance and power challenges

  5. Open Questions
     • How do we develop parallel data analysis solutions?
       • Hadoop?
       • MPI + file I/O calls?
       • SciDB (array analytics)?
       • Parallel R?
     • Desiderata:
       • No reloading of data (rules out SciDB and Hadoop)
       • Performance while implementing new algorithms (rules out parallel R)
       • Transparency with respect to data layouts and parallel architectures
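The "no reloading of data" desideratum means analyzing files where they sit, instead of first ingesting them into a system such as Hadoop's HDFS or SciDB's array store. A minimal sketch of the idea, using a raw binary file of doubles as a hypothetical stand-in for simulation output (a real pipeline would read NetCDF or HDF5):

```python
import array
import mmap
import os
import tempfile

# Hypothetical "scientific dataset": raw doubles on disk, standing in
# for NetCDF/HDF5 output we do not want to reload into another system.
path = os.path.join(tempfile.mkdtemp(), "field.bin")
values = array.array("d", [0.5, 1.5, 2.5, 3.5])
with open(path, "wb") as f:
    values.tofile(f)

# Analyze in place: memory-map the file and reduce over it directly;
# there is no ingestion/loading step before the analysis can start.
with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        view = array.array("d")
        view.frombytes(m[:])
        total = sum(view)
```

The point of the sketch is the shape of the workflow, not the file format: the analysis begins the moment the file is mapped, which is what the loading step in Hadoop or SciDB prevents.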

  6. Our Ongoing Work: MATE++
     • A very efficient MapReduce-like system for scientific data analytics
     • Supports MapReduce and another, reduction-based API
     • Can plug and play with different data formats: no reloading of data
     • Flexibly uses different forms of parallelism: GPUs, fusion architectures, ...
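The slide mentions "another, reduction-based API" alongside MapReduce. The sketch below illustrates what a reduction-object style of API looks like in general — a per-element local reduction plus a merge of partial results — using hypothetical names (`ReductionObject`, `accumulate`, `merge`); it is not the actual MATE++ interface:

```python
from functools import reduce as fold

class ReductionObject:
    """Hypothetical reduction state: updated in place one element at a
    time, then merged across threads/nodes instead of being shuffled."""
    def __init__(self):
        self.count = 0
        self.total = 0.0

    def accumulate(self, x):
        """Local reduction: fold one data element into this object."""
        self.count += 1
        self.total += x

    def merge(self, other):
        """Global combine: merge another partial result into this one."""
        self.count += other.count
        self.total += other.total
        return self

def parallel_mean(chunks):
    """Run the local reduction per chunk (a stand-in for per-thread or
    per-node work), then combine the partial reduction objects."""
    partials = []
    for chunk in chunks:
        r = ReductionObject()
        for x in chunk:
            r.accumulate(x)
        partials.append(r)
    combined = fold(lambda a, b: a.merge(b), partials)
    return combined.total / combined.count

mean = parallel_mean([[1.0, 2.0], [3.0], [4.0, 5.0]])
```

Compared with MapReduce, this style avoids materializing intermediate (key, value) pairs: each worker folds its data into a small reduction object, and only those objects are communicated.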

  7. Data Management/Reduction Solutions
     • Must provide server-side data sub-setting, aggregation, and sampling, without reloading data into a "system"
     • Our approach: lightweight data management solutions
       • Automatic data virtualization:
         • Support a virtual (e.g., relational) view over NetCDF, HDF5, etc.
         • Support sub-setting and aggregation using a high-level language
       • A new sampling approach based on bit-vectors:
         • Create lower-resolution representative datasets
         • Measure loss of information with respect to key statistical measures
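The bit-vector sampling idea can be sketched directly: one bit per element selects the lower-resolution sample, and "loss of information" is measured as the drift in key statistics between the full and sampled datasets. All names and the choice of statistics below are hypothetical illustrations of the approach, not the actual implementation:

```python
import statistics

def bitvector_sample(data, bitvector):
    """Keep data[i] iff bit i of the selection bit-vector is set,
    producing a lower-resolution representative dataset."""
    return [x for i, x in enumerate(data) if bitvector[i]]

def information_loss(full, sample):
    """Measure loss as the absolute drift in key statistical measures
    (here: mean and population standard deviation)."""
    return {
        "mean": abs(statistics.mean(full) - statistics.mean(sample)),
        "stdev": abs(statistics.pstdev(full) - statistics.pstdev(sample)),
    }

data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
mask = [1, 0, 1, 0, 1, 0, 1, 0]   # keep every other point: 2x reduction
sample = bitvector_sample(data, mask)
loss = information_loss(data, sample)
```

In this style, the bit-vector itself is the compact artifact: many candidate masks can be scored with `information_loss`, and the one with acceptable drift at the desired reduction factor is kept.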
