XML for Science Data Access

XML for Science Data Access. R. Suresh (NASA/MTECH) ( suresh@mayurtech.com ) Kenneth McDonald (NASA/GSFC) (Mcdonald@rattler.gsfc.nasa.gov). CEOS Joint Sub-Group Meeting, Frascati, Italy. Introduction. Earth Science data is exploding in resolution complexity heterogeneity volume


Presentation Transcript


  1. XML for Science Data Access R. Suresh (NASA/MTECH) (suresh@mayurtech.com) Kenneth McDonald (NASA/GSFC) (Mcdonald@rattler.gsfc.nasa.gov) CEOS Joint Sub-Group Meeting, Frascati, Italy

  2. Introduction • Earth Science data is exploding in • resolution • complexity • heterogeneity • volume • Access to data collection is not a mere website • Data Access needs to provide data services across the user community • XML related technologies can provide building blocks to improve data access

  3. XML Technologies • XML is really a set of closely related technologies, including • XML: generalized markup • XLink and URI: interobject reference and linking • XML-Schema: document model definition • XSL: transformation and presentation • RDF: metadata and inference • XQuery: retrieval from XML documents • SOAP: remote procedure calling • Key commonalities: • draft standards from the World Wide Web Consortium (W3C) • text-based • extensible/portable

  4. XML Technologies • Suitable for metadata and "light data" • Structured • Hierarchical • Limited graph-like relationships (e.g. IDs) • Portable across • languages • operating systems • Becoming ubiquitous • standard parser APIs (DOM, SAX) • parsers available in all major languages and platforms
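The two parser APIs named on this slide can be compared with a minimal sketch in Python's standard library. The `<granule>` document, its element names, and the helper functions are invented for illustration and do not come from the presentation; DOM loads the whole tree before navigating it, while SAX reacts to events as the text streams past.

```python
import xml.dom.minidom
import xml.sax

# Hypothetical metadata record, invented for this example.
DOC = """<granule id="g1">
  <parameter name="sea_surface_temperature" units="K"/>
  <parameter name="wind_speed" units="m/s"/>
</granule>"""


def names_via_dom(text):
    """DOM: build the full tree in memory, then query it."""
    dom = xml.dom.minidom.parseString(text)
    return [el.getAttribute("name") for el in dom.getElementsByTagName("parameter")]


class ParameterHandler(xml.sax.ContentHandler):
    """SAX: collect names from start-element events while streaming."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        if name == "parameter":
            self.names.append(attrs.getValue("name"))


def names_via_sax(text):
    handler = ParameterHandler()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.names


if __name__ == "__main__":
    print(names_via_dom(DOC))
    print(names_via_sax(DOC))
```

Both functions return the same list of parameter names; the SAX version never holds the whole document tree, which matters for large metadata files.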

  5. XML Issues • No semantics associated with markup • No random-access • No non-textual content • Document Type Definition • Not itself encoded in XML • No constraints on element content • Context-free • Syntax of element contents independent of element’s position in document tree • No cardinality constraints

  6. XML for Scientific Data Access • Good because it supports more than one data collection across: • discipline or sub-discipline (ocean, atmosphere, land) • multiple data types (e.g. satellite swath, grid, point, vector, raster) • access modality (e.g. browsing, search, visualization, simulation) • Requires the generation of use scenarios • input from scientific community • Develop ontologies • Identify requirements

  7. How to Use XML for Scientific Data Access (cont.) • Develop data and metadata models to enable the scenarios • identify community-wide data semantics • formal, incremental process • ongoing review and documentation • target key semantics for scenarios • use extensible data modeling technologies (e.g. XML, RDF, HDF) to implement data models • Link scenarios to build a network of data services • Other concerns • security • intellectual property • data preservation

  8. Building Blocks • XML • Translators • Description Languages • Applications • Advantages • Fosters evolution • Preserves interoperability • Internationalized text (Unicode) • Structured text

  9. XML-based data format for interoperability [Diagram: XML shown together with the data formats FITS, CEOS, netCDF, CDF, HDF, BUFR, GRIB, and SDTS]

  10. Extensible Data Format (XDF) • What is XDF? • XDF is developed at NASA GSFC • XML-based language for encapsulating scientific data. XDF aims to be the (mathematical) kernel of other fully featured, discipline-oriented scientific formats written in XML. • Key features: • Hierarchical data structures • Arrays of any dimension merged with coordinate information • High-dimensional tables merged with field information, variable resolution • Easy wrapping of existing data • User-specified coordinate systems • Searchable ASCII metadata • Extensibility to new features

  11. XDF Features • Structures, arrays, parameters, axes • Clear coordinate information • Unrestrictive binary and ASCII formats. • Examples: EOS, astronomy, biology, etc. • OO Perl and Java application interfaces • FITSML - adopt FITS keywords and an XML kernel • Converters between FITS, FITSML, HDF, and CDF. • XDF home page: http://tarantella.gsfc.nasa.gov/xml/XDF_home.html

  12. A simplified structure with an image

      <XDF>
        <structure>
          <array>
            <axis name="X-axis">
              <values> a list of values along one dimension</values>
            </axis>
            <axis name="Y-axis">
              <values> a list of values along other dimension</values>
            </axis>
            <read>
              info on the ordering of the data values and record format.
              <recordFormat>...</recordFormat>
            </read>
            <data> The Data goes here </data>
          </array>
          <array> Some other array of data... </array>
        </structure>
      </XDF>
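As a rough illustration of how a client might read the simplified structure on this slide, the sketch below walks an XDF-like document with Python's `xml.etree.ElementTree`. It is not the XDF Perl or Java interface mentioned earlier: the sample values, the whitespace-separated number layout, and the `read_arrays` helper are assumptions for this example, and the `<read>` element, which in the slide carries the ordering and record format, is left out.

```python
import xml.etree.ElementTree as ET

# XDF-like sample modeled on the slide; the numbers and their
# whitespace-separated layout are assumptions for this example.
XDF_TEXT = """<XDF>
  <structure>
    <array>
      <axis name="X-axis"><values>0 1 2</values></axis>
      <axis name="Y-axis"><values>10 20</values></axis>
      <data>1 2 3 4 5 6</data>
    </array>
  </structure>
</XDF>"""


def read_arrays(text):
    """Return one dict per <array>: axis values keyed by name, plus flat data."""
    root = ET.fromstring(text)
    arrays = []
    for array in root.iter("array"):
        axes = {
            axis.get("name"): [float(v) for v in axis.findtext("values", "").split()]
            for axis in array.findall("axis")
        }
        flat = [float(v) for v in array.findtext("data", "").split()]
        arrays.append({"axes": axes, "data": flat})
    return arrays


if __name__ == "__main__":
    for entry in read_arrays(XDF_TEXT):
        print(entry["axes"], len(entry["data"]))
```

The data values are kept as a flat list on purpose: without the ordering information from `<read>`, the sketch cannot know how they map onto the two axes.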

  13. Advantages of XML based translators • Universal acceptance • Separation of information and presentation • Automatic validation • File inclusion (Internal and External Entities) • Hierarchical • Parsers • Stylesheet languages • Field specific languages • Extensible namespace

  14. Earth Science Markup Language (ESML) • ESML is currently developed at the University of Alabama, Huntsville under a NASA grant. • Specialized Markup language for Earth Science Metadata based on XML • Machine readable and interpretable • Representation of the structure and content of any data file, regardless of data format • Human readable • External metadata files that can be generated by either data producer or consumer (at collection, data set or granule level) • Supports data/service interoperability

  15. ESML • Users can describe and publish files using ESML • Users can describe ASCII and binary data • ESML will facilitate data discovery • Metadata can be indexed and searched by web search engines • Allows users to utilize internet search engines to locate data • Web site: http://esml.itsc.uah.edu

  16. ODL – XML Translator • A stand-alone Java program • Extracts ODL metadata from HDF file • Displays metadata using style sheet • This program will be useful to build a metadata catalog system in XML
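The core idea of an ODL-to-XML translation can be sketched in a few lines of Python. This is not the Java program described on the slide: it starts from ODL text that has already been extracted from the HDF file, handles only nested GROUP/OBJECT blocks and single-line KEY = VALUE pairs, and uses an output element layout (`odl`, `group`, `object`) chosen for this example. The sample metadata values are illustrative.

```python
import xml.etree.ElementTree as ET

# ODL fragment in the style of HDF-EOS inventory metadata; the values are
# illustrative and not taken from a real file.
ODL_TEXT = """
GROUP = INVENTORYMETADATA
  OBJECT = SHORTNAME
    NUM_VAL = 1
    VALUE = "MOD021KM"
  END_OBJECT = SHORTNAME
END_GROUP = INVENTORYMETADATA
END
"""


def odl_to_xml(text):
    """Map nested GROUP/OBJECT blocks and KEY = VALUE lines to an XML tree."""
    root = ET.Element("odl")  # assumed wrapper element
    stack = [root]
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "END":
            continue
        key, _, value = (part.strip() for part in line.partition("="))
        value = value.strip('"')
        if key in ("GROUP", "OBJECT"):
            # Open a new nesting level named after the block.
            stack.append(ET.SubElement(stack[-1], key.lower(), name=value))
        elif key in ("END_GROUP", "END_OBJECT"):
            stack.pop()
        else:
            # Plain attribute line: becomes a child element with text.
            ET.SubElement(stack[-1], key.lower()).text = value
    return root


if __name__ == "__main__":
    print(ET.tostring(odl_to_xml(ODL_TEXT), encoding="unicode"))
```

Once the metadata is in XML form, a style sheet or a catalog loader can work on it with ordinary XML tools; multi-line and tuple-valued ODL entries would need a fuller parser than this sketch.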

  17. HDF-EOS Metadata • Each HDF file contains three metadata elements: inventory, archive, and structural • HDF-EOS has three file types or objects: Grid, Point, and Swath • Each file type will contain all three metadata elements [Diagram: HDF-EOS Grid, HDF-EOS Point, and HDF-EOS Swath objects, with an XML label]

  18. Metadata Tools & Systems – XML • Global Change Master Directory (GCMD) - NASA • The Earth Science Markup Language (ESML, University of Alabama-Huntsville); • The DIstributed MEtadata System (DIMES, George Mason University); • The aggregation data catalog that is part of the Distributed Oceanographic Data System (DODS, University of Rhode Island); • GDLIP, General Digital Library Interchange Protocol (Alexandria Digital Library); • Digital Library for Earth System Education (DLESE); and • Web Mapping Testbed (OGC, Digital Earth).

  19. Tools and Systems • VISAD infrastructure from SSEC http://www.ssec.wisc.edu/~billh/visad.html; • Live access server – PMEL http://www.ferret.noaa.gov/nopp/main.pl? • WXWise applets University of Wisconsin-Madison http://itg1.meteor.wisc.edu/wxwise/ • The Virtual Exploratorium http://www.unidata.ucar.edu/workshops/ShapingFuture/Presentations/Mohan_files/frame.htm • EDMI (Earth Data Multimedia Instrument, Bruce Caron, New Media Studio); and • WorldWatcher from Northwestern University University of Northern Colorado http://www.worldwatcher.nwu.edu/

  20. Unified Access to Metadata [Diagram, in order: three User/system boxes; an XML layer (database, access tool); a conceptual/physical layer; various schemas (describing various “types” of metadata); four Meta/Data System boxes]

  21. New Technologies: The Semantic Web • Multiple metadata objects (RDF documents) linked together • Ontologies • Taxonomies • Inference rules • Promise: agents can synthesize information from multiple documents • Like a world-wide ORDBMS • Source: T. Berners-Lee et al., Scientific American, May 2001
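The combination of linked metadata, a taxonomy, and an inference rule can be illustrated with a toy example in plain Python. It uses no RDF library, and the triples, class names, and helper functions are all invented for this sketch. A single rule (subClassOf is transitive) lets a query about a broad category find a granule that was only described with a narrow one.

```python
# Hypothetical (subject, predicate, object) triples, invented for this example.
TRIPLES = {
    ("SeaSurfaceTemperature", "subClassOf", "OceanParameter"),
    ("OceanParameter", "subClassOf", "EarthScienceParameter"),
    ("granule42", "measures", "SeaSurfaceTemperature"),
}


def infer_subclasses(triples):
    """Apply one rule until nothing new appears: subClassOf is transitive."""
    known = set(triples)
    changed = True
    while changed:
        changed = False
        for (a, p1, b) in list(known):
            for (c, p2, d) in list(known):
                if p1 == p2 == "subClassOf" and b == c:
                    new = (a, "subClassOf", d)
                    if new not in known:
                        known.add(new)
                        changed = True
    return known


def granules_measuring(triples, category):
    """Find subjects that measure the category or any of its subclasses."""
    known = infer_subclasses(triples)
    return sorted(
        s for (s, p, o) in known
        if p == "measures" and (o == category or (o, "subClassOf", category) in known)
    )


if __name__ == "__main__":
    print(granules_measuring(TRIPLES, "EarthScienceParameter"))
```

A real deployment would express the taxonomy in RDF and delegate inference to a dedicated engine; the sketch only shows why declared semantics let an agent combine statements that no single document makes.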

  22. Semantic Web • "The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation." -- Tim Berners-Lee, James Hendler, Ora Lassila, The Semantic Web • In the Semantic Web we will need: • Machines talking to machines – semantics need to be unambiguously declared • Joined-up data – enabling complex tasks based on information from various sources • Wide scope – from, say, home to government to commerce • Trust – both in data and who is saying it • This is not going to be easily achieved

  23. Conclusion • XML usage has increased in scientific data applications • Usage is not common across the systems • Web Services and Data Services • Semantic Web for scientific applications is in its infancy
