
A Domain-Specific Modeling Language for Scientific Data Composition and Interoperability


flavio


Presentation Transcript


  1. A Domain-Specific Modeling Language for Scientific Data Composition and Interoperability

  2. File Formats: Image Files • Organize and store digital images composed of either pixel or vector (geometric) data • Bitmap-based • Created by scanners and digital cameras • TIF, JPG, BMP • Vector-based • Geometric description + Bitmap • Resolution-independent and infinitely scalable • Font, DRW, CGM

  3. File Formats: Music and Audio Files • Store audio data produced by analog-to-digital converters • Key Parameters • Sample rate, resolution, number of channels • Uncompressed formats • WAV, AIFF and AU • Lossless compression formats • FLAC, Lossless Windows Media Audio (WMA) • Lossy compression formats • MP3, Lossy Windows Media Audio (WMA)
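
The three key parameters above fully determine the size of uncompressed audio. The following is a minimal sketch using Python's standard `wave` module; the helper name `write_silence` and the chosen values (44100 Hz, 16-bit, stereo) are illustrative assumptions, not taken from the slides.

```python
import io
import wave


def write_silence(buf, sample_rate=44100, sample_width=2, channels=2, seconds=1):
    """Write `seconds` of silence as uncompressed PCM WAV into `buf`."""
    n_frames = sample_rate * seconds
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)        # number of channels
        w.setsampwidth(sample_width)    # resolution in bytes (2 bytes = 16 bits)
        w.setframerate(sample_rate)     # sample rate in Hz
        w.writeframes(b"\x00" * (n_frames * channels * sample_width))


buf = io.BytesIO()
write_silence(buf)

# Read the parameters back from the WAV header.
buf.seek(0)
with wave.open(buf, "rb") as r:
    params = r.getparams()
    data_bytes = r.getnframes() * r.getnchannels() * r.getsampwidth()

print(params.framerate, params.sampwidth * 8, params.nchannels, data_bytes)
```

One second of audio at these settings occupies 44100 × 2 × 2 = 176400 bytes of sample data, which is why lossless and lossy compression formats exist alongside the uncompressed ones.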

  4. File Formats: Text Files • File formats that are structured as plain text, representing a sequence of lines • ASCII, TXT

  5. File Formats: Compound File Formats • Used to structure the contents of a document within a single file • Contain a number of independent data streams organized in a hierarchy • Stream: analogous to a file in a file system • Storage: analogous to a sub-directory in a file system • MS Office, OpenOffice

  6. Characteristics of Generic File Formats • Can handle one or two data types • Numeric data or alphanumeric data • May limit the file size • Mostly limited to a maximum file size of 2 GB • File I/O time may increase linearly as the file size grows • Source: An In-Depth Examination of Java I/O Performance and Possible Tuning Strategies http://pages.cs.wisc.edu/~remzi/Classes/736/Fall2000/Project-Writeups/KaiHongfei.html

  7. Characteristics of Generic File Formats • Can handle one or two data types • Numeric data or alphanumeric data • May limit the file size • Mostly limited to a maximum file size of 2 GB • File I/O time may increase linearly as the file size grows • These generic file formats are not appropriate for storing and retrieving scientific data because they were not designed to hold high volumes of complex scientific data, such as high-resolution images, massive numerical data, and graphs. • Source: An In-Depth Examination of Java I/O Performance and Possible Tuning Strategies http://pages.cs.wisc.edu/~remzi/Classes/736/Fall2000/Project-Writeups/KaiHongfei.html

  8. Scientific Data Format: NetCDF3 • Network Common Data Form • Machine-independent file format • Supports a wide variety of platforms, including Linux, MacOS, & Windows • Represents multi-dimensional arrays with ancillary data • [Figure: array slices from Time = 1 to Time = n]
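
To make "multi-dimensional arrays with ancillary data" concrete, here is a small pure-Python sketch of a variable with named dimensions, attributes, and one slice per time step. It is not the NetCDF library API; the `Variable` class, its fields, and the sample values are all hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    """Hypothetical stand-in for a multi-dimensional variable."""
    name: str
    dims: tuple                 # dimension names, slowest-varying first
    shape: tuple                # length of each dimension
    data: list                  # values stored flat, in row-major order
    attrs: dict = field(default_factory=dict)   # ancillary data (units, etc.)

    def offset(self, *index):
        """Convert a multi-dimensional index into a flat row-major offset."""
        if len(index) != len(self.shape):
            raise IndexError("wrong number of indices")
        off = 0
        for i, n in zip(index, self.shape):
            if not 0 <= i < n:
                raise IndexError("index out of range")
            off = off * n + i
        return off

    def get(self, *index):
        return self.data[self.offset(*index)]


# Two time steps of a 2 x 3 (lat x lon) grid: 12 values in total.
temperature = Variable(
    name="temperature",
    dims=("time", "lat", "lon"),
    shape=(2, 2, 3),
    data=[float(i) for i in range(12)],
    attrs={"units": "K"},
)

print(temperature.get(1, 0, 2), temperature.attrs["units"])
```

Each value of the `time` index selects one 2 × 3 slice, matching the "Time = 1 … Time = n" layout in the slide's figure.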

  9. Scientific Data Format: HDF5 • Hierarchical Data Format • File format for managing any kind of data • Supports high-volume and/or complex data • Platform-independent • Flexible, efficient storage and I/O

  10. Characteristics of the Scientific Data File Formats • Self-Descriptive • Contain metadata describing the contained data types and their organization • Directly Accessible • Arbitrary data can be accessed through APIs • Concurrently Accessible • Multiple threads or processes can access data simultaneously • Enables high-performance computing and faster access • Archivable • Have their own archiving mechanisms to back up and restore high volumes of data
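
The first two characteristics can be illustrated with a toy container. This is a sketch under stated assumptions: the byte layout (a 4-byte header length, a JSON header, then little-endian 64-bit floats) and the function names are invented for illustration and do not correspond to NetCDF or HDF5.

```python
import io
import json
import struct


def write_dataset(buf, name, shape, values):
    """Write a JSON metadata header followed by the raw float64 values."""
    header = json.dumps({"name": name, "dtype": "<f8", "shape": shape}).encode("utf-8")
    buf.write(struct.pack("<I", len(header)))   # 4-byte header length
    buf.write(header)                           # self-description
    buf.write(struct.pack(f"<{len(values)}d", *values))


def read_dataset(buf):
    """Self-descriptive: the reader learns the layout from the file itself."""
    buf.seek(0)
    (header_len,) = struct.unpack("<I", buf.read(4))
    meta = json.loads(buf.read(header_len).decode("utf-8"))
    count = 1
    for dim in meta["shape"]:
        count *= dim
    values = list(struct.unpack(f"<{count}d", buf.read(8 * count)))
    return meta, values


def read_element(buf, index):
    """Directly accessible: seek to one value without reading the rest."""
    buf.seek(0)
    (header_len,) = struct.unpack("<I", buf.read(4))
    buf.seek(4 + header_len + 8 * index)
    (value,) = struct.unpack("<d", buf.read(8))
    return value


buf = io.BytesIO()
write_dataset(buf, "pressure", [2, 3], [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
meta, values = read_dataset(buf)
print(meta["shape"], values, read_element(buf, 4))
```

Real scientific formats add much more (compression, chunking, concurrent access), but the principle is the same: the metadata travels with the data, so a generic reader needs no out-of-band description.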

  11. Challenges in Using the Scientific Data File Formats • Each format uses a different representation to organize the file structure • Each file format needs its own tools for data visualization and composition • It is difficult to exchange data between two or more scientific data formats • Managing the evolution of APIs • It is challenging to verify that APIs evolve in accordance with the file specification • Maintaining the stability of existing applications as APIs evolve • User applications are subject to API changes • Limited support for data integration among heterogeneous scientific data formats

  12. Framework for Scientific Data File Management

  13. NEW SLIDES NEEDED HERE TO INTRODUCE DSM!

  14. Model-Driven Engineering (MDE) and Domain-Specific Modeling (DSM) • MDE: specifies and generates software systems based on high-level models • Domain-Specific Modeling (DSM): a paradigm of MDE that uses notations and rules from an application domain • Metamodel: defines a Domain-Specific Modeling Language (DSML) by specifying the entities and their relationships in an application domain • Model: an instance of the metamodel • Model Transformation: a process that converts one or more models to various levels of software artifacts (e.g., other models, source code)
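
The relationship between metamodel, model, and model transformation can be sketched in a few lines. Everything here is hypothetical: the entity names (`File`, `Group`, `Dataset`), the containment rules, and the text output are illustrative choices, not the metamodel used by the presented framework.

```python
from dataclasses import dataclass, field


@dataclass
class EntityType:
    """Metamodel element: an entity and the child entities it may contain."""
    name: str
    allowed_children: tuple = ()


# Metamodel: entities and their containment relationships.
METAMODEL = {
    "File": EntityType("File", ("Group", "Dataset")),
    "Group": EntityType("Group", ("Group", "Dataset")),
    "Dataset": EntityType("Dataset"),
}


@dataclass
class Node:
    """Model element: an instance of one metamodel entity."""
    type_name: str
    name: str
    children: list = field(default_factory=list)


def conforms(node):
    """Check that a model only uses relationships the metamodel allows."""
    etype = METAMODEL.get(node.type_name)
    if etype is None:
        return False
    return all(
        child.type_name in etype.allowed_children and conforms(child)
        for child in node.children
    )


def to_text(node, depth=0):
    """Model transformation: convert a model into a text artifact."""
    lines = ["  " * depth + f"{node.type_name} {node.name}"]
    for child in node.children:
        lines.extend(to_text(child, depth + 1))
    return lines


model = Node("File", "run1", [Node("Group", "ocean", [Node("Dataset", "temperature")])])
invalid = Node("Dataset", "temperature", [Node("Group", "ocean")])

print(conforms(model), conforms(invalid))
print("\n".join(to_text(model)))
```

A DSML tool plays the role of `conforms` interactively: it only lets the user draw models that the metamodel permits, and generators like `to_text` then produce code or files from them.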

  15. Unifying the representation of file structure organization • Adopt a DSML to build a tool for visualizing & composing scientific file formats in a unified way • [Figure: workflow. Analyze data model of each scientific file format (Common Data Model, Feature Model, Variable Data Model); Define DSML from Feature Model (Grammar & Syntax); Implement DSML (DSML Tool)]

  16. Unifying the representation of file structure organization • Feature Model for Scientific File Format • Describe some highlights here • And here

  17. Unifying the representation of file structure organization • Content Composer • DSML modeling tool for scientific data files • Implemented using GEMS

  18. API Abstraction Layer • Helps protect user applications from the evolution of APIs
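
One common way to build such a layer is the adapter pattern, sketched below. The slides do not say how the framework implements it, so this is an assumption; `LegacyBackend`, `RevisedBackend`, and their method names are hypothetical stand-ins for two versions of a format library.

```python
from abc import ABC, abstractmethod


class LegacyBackend:
    """Hypothetical older library version."""
    def __init__(self, data):
        self._data = data

    def get_var(self, name):
        return self._data[name]


class RevisedBackend:
    """Hypothetical evolved version: renamed method, different return shape."""
    def __init__(self, data):
        self._data = data

    def read_variable(self, name):
        return {"name": name, "values": self._data[name]}


class ScientificFile(ABC):
    """Abstraction layer: the only interface user applications see."""
    @abstractmethod
    def read(self, name):
        """Return the values of the named variable as a list."""


class LegacyAdapter(ScientificFile):
    def __init__(self, backend):
        self._backend = backend

    def read(self, name):
        return self._backend.get_var(name)


class RevisedAdapter(ScientificFile):
    def __init__(self, backend):
        self._backend = backend

    def read(self, name):
        return self._backend.read_variable(name)["values"]


def mean_of(store, name):
    """User application code: unchanged when the backend API evolves."""
    values = store.read(name)
    return sum(values) / len(values)


data = {"salinity": [1.0, 2.0, 3.0]}
old_result = mean_of(LegacyAdapter(LegacyBackend(data)), "salinity")
new_result = mean_of(RevisedAdapter(RevisedBackend(data)), "salinity")
print(old_result, new_result)
```

When the underlying API changes, only a new adapter is written; `mean_of` and any other application code stay as they are.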

  19. Integrating data among heterogeneous data formats • Content Mapper • Defines rules for mapping data from one scientific data format to another • Content Verifier • Verifies the correctness of the file composition • Verifies the correctness of the mapping rules
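
A mapper and verifier pair might look like the following sketch. The rule table, the sets of element kinds, and the function names are illustrative assumptions; the slides do not specify the actual rules or checks used by the Content Mapper and Content Verifier.

```python
# Hypothetical element kinds of a source and a target format.
SOURCE_KINDS = {"dimension", "variable", "attribute"}
TARGET_KINDS = {"group", "dataset", "attribute"}

# Hypothetical mapping rules: source kind -> target kind.
RULES = {
    "dimension": "dataset",
    "variable": "dataset",
    "attribute": "attribute",
}


def verify_rules(rules, source_kinds, target_kinds):
    """Verifier: every source kind has a rule, every rule hits a valid target."""
    errors = []
    for kind in sorted(source_kinds):
        if kind not in rules:
            errors.append(f"no rule for source kind '{kind}'")
    for src, dst in sorted(rules.items()):
        if dst not in target_kinds:
            errors.append(f"rule '{src}' maps to unknown target kind '{dst}'")
    return errors


def map_items(items, rules):
    """Mapper: apply the rules to (kind, name) pairs from the source file."""
    return [(rules[kind], name) for kind, name in items]


source_items = [("variable", "temperature"), ("attribute", "units")]
errors = verify_rules(RULES, SOURCE_KINDS, TARGET_KINDS)
mapped = map_items(source_items, RULES) if not errors else []
bad_errors = verify_rules({"variable": "table"}, SOURCE_KINDS, TARGET_KINDS)

print(errors, mapped)
print(bad_errors)
```

Running the verifier before the mapper means an incomplete or inconsistent rule set is rejected up front instead of producing a partially converted file.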

  20. Summary • From the prototype of the framework • A DSML can help build a graphical tool to compose scientific file structures and support interoperability across them • Adopting a layered architecture in the framework helps maintain the independence of each layer • Both the API abstraction layer and the layered architecture are essential to developing and maintaining user applications • Future work • Create metamodels that include the full specification of each scientific file format • Categorize APIs according to their intended use for the API abstraction layer • Develop metamodels for managing API evolution

  21. Thank you!

  22. Example of Scientific Data Format: OPeNDAP • Client-server protocol for scientific data access • Targeted at oceanographic data management
