
Update on Damasc

Joe Buck, October 19th, 2010




  1. Update on Damasc • Joe Buck October 19th, 2010

  2. A year later • Last year: we outlined our vision • Next year: Carlos and Alkis covered that • Today: Where we’re at

  3. What’s in a name? • Last year I presented on DICE (Data Intensive Computation Environment) • We’ve changed the name to Damasc, which incorporates parts of DICE but is more focused on data management

  4. Goal of Damasc • To allow applications to express their internal data structure to the storage system • Enable more intelligent storage layout which leads to increased functionality in the storage system

  5. Application data-element alignment in parallel FS • Created traces for common access patterns over scientific data • Mapped those traces onto a theoretical parallel file system configuration • Analyzed the traces to quantify I/O savings from aligning data to application data-element boundaries
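The analysis on this slide can be sketched in miniature. The code below is illustrative, not the Damasc trace analyzer: it counts how many fixed-size parallel-FS stripes each element-sized read touches when elements are packed back-to-back versus padded to start on stripe boundaries. The function names and sizes are assumptions for the demo.

```python
# Hypothetical sketch of the trace analysis: count how many parallel-FS
# stripes a read touches when application data elements are (mis)aligned
# with stripe boundaries. Sizes and names are illustrative.

def stripes_touched(offset, length, stripe_size):
    """Number of fixed-size stripes the byte range [offset, offset+length) spans."""
    first = offset // stripe_size
    last = (offset + length - 1) // stripe_size
    return last - first + 1

def alignment_comparison(element_size, n_elements, stripe_size):
    """Total stripe accesses for element-sized reads: elements packed
    back-to-back (unaligned) vs. padded to start on stripe boundaries."""
    unaligned = sum(
        stripes_touched(i * element_size, element_size, stripe_size)
        for i in range(n_elements))
    padded_stride = stripe_size * (-(-element_size // stripe_size))  # round up
    aligned = sum(
        stripes_touched(i * padded_stride, element_size, stripe_size)
        for i in range(n_elements))
    return unaligned, aligned

# Ten 100-byte elements over 64-byte stripes: 25 stripe accesses unaligned
# vs. 20 aligned, i.e. a 20% reduction in stripes read.
u, a = alignment_comparison(100, 10, 64)
```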

  6. Application data-element alignment in parallel FS - cont. [Figures: data layout before and after aligning elements to stripe boundaries]

  7. MapReduce over scientific data • Goal was to implement NetCDF Operators (NCO) as MapReduce programs • Base NetCDF file decomposed via C++ application. Constituent parts stored in HDFS • Currently being worked on

  8. MapReduce over scientific data - continued We want to go from this To this

  9. MapReduce over scientific data - continued We want to go from this Or better yet

  10. Tracing of scientific application data access • Created a tracing layer for ParaView that logged data access from the application’s perspective • Noah will talk more about tracing
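The tracing layer's core idea, logging data access from the application's perspective, can be shown with a small wrapper. ParaView's layer is C++ inside the application; this Python sketch only illustrates the approach, and the class and record format are invented for the demo.

```python
# A minimal sketch of the tracing idea (not the ParaView layer itself):
# wrap a file-like reader and log each access as (offset, length),
# capturing the access pattern from the application's point of view.

import io

class TracingReader:
    def __init__(self, fileobj, trace):
        self._f = fileobj
        self._trace = trace          # list collecting (offset, length) records

    def read(self, size):
        offset = self._f.tell()
        data = self._f.read(size)
        self._trace.append((offset, len(data)))
        return data

trace = []
reader = TracingReader(io.BytesIO(b"0123456789"), trace)
reader.read(4)
reader.read(4)
# trace now holds the application-level access pattern: [(0, 4), (4, 4)]
```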

  11. Scientific data in a key-value store • Project to enable NetCDF ingestion into HBase
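One way to picture NetCDF ingestion into a key-value store is through row-key design. The schema below is a hypothetical sketch, not the project's actual HBase layout: each fixed-size chunk of a variable is stored under a key of the form "variable/zero-padded-offset", so that a scan over a key range returns a contiguous region in order, mirroring how HBase keeps rows sorted by key.

```python
# Hypothetical key design for NetCDF-in-HBase (the real Damasc schema is
# not shown on the slide): each fixed-size chunk of a variable is stored
# under "var/zero-padded-offset"; zero-padding makes lexicographic key
# order match numeric offset order, so range scans return data in order.

def row_key(var, offset, width=10):
    return f"{var}/{offset:0{width}d}"

def ingest(table, var, values, chunk=4):
    for off in range(0, len(values), chunk):
        table[row_key(var, off)] = values[off:off + chunk]

table = {}                          # dict as a stand-in for an HBase table
ingest(table, "temperature", list(range(10)))
keys = sorted(table)                # HBase keeps rows sorted by key
# A scan over "temperature/" yields chunks at offsets 0, 4, 8 in order.
```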

  12. Scientific data in a key-value store - continued

  13. Declarative queries over NetCDF • Integration of the NetCDF format into the Zorba query engine • Enabling XQuery queries over NetCDF data • Incremental parsing to avoid loading the entire file • Future work: NetCDF methods in XQuery
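The incremental-parsing point above can be illustrated with a lazy-loading sketch. This is not Zorba's API; the class, loader, and variables are assumptions for the demo. The idea is that only the file's header (variable names and shapes) is parsed eagerly, and a variable's data is read on first access, so a query touching one variable never loads the whole file.

```python
# Sketch of incremental parsing (illustrative, not Zorba's API): parse
# only the header eagerly; load a variable's data on first access so a
# query over one variable never reads the rest of the file.

class LazyDataset:
    def __init__(self, header, loader):
        self.header = header         # variable names, parsed up front
        self._loader = loader        # fetches one variable's data on demand
        self._cache = {}
        self.loads = 0               # count of on-demand reads, for the demo

    def get(self, var):
        if var not in self._cache:
            self._cache[var] = self._loader(var)
            self.loads += 1
        return self._cache[var]

# Stand-in "file": two variables, only one of which the query touches.
raw = {"temp": [280, 281, 282], "pressure": [1000, 990, 1013]}
ds = LazyDataset(header=list(raw), loader=raw.__getitem__)
hot = [v for v in ds.get("temp") if v > 280]   # a simple declarative filter
# Only "temp" was loaded from the file; "pressure" was never touched.
```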

  14. Declarative queries over NetCDF - continued

  15. Conclusion • Last year was about exploring the problem space • Applying lessons learned, moving forward

  16. Questions • Thank you for your time • buck@soe.ucsc.edu • srl.ucsc.edu/projects/damasc
