

Enhancing Linkages Between Projects and Datasets: Examples from LBA-ECO for NACP

Lisa Wilcox, lwilcox@pop900.gsfc.nasa.gov; Amy L. Morrell, amorrell@pop900.gsfc.nasa.gov; Peter C. Griffith, peter.griffith@gsfc.nasa.gov
Organization: Science Systems & Applications, Inc. and the Carbon Cycle and Ecosystems Office, NASA Goddard Space Flight Center

[System diagram: XML metadata files at ORNL and CPTEC → Swish-e spider → single ASCII output file → parsing script (using XML::Simple) → MySQL metadata warehouse → Perl scripts using DBI → cross-linked reports: Investigation Profile (Figure 1), Dataset Profile (Figure 2), Region Profile (Figure 3), Region/Datasets Profile (Figure 4), and Registered Dataset Titles, connected by URLs from titles to dataset profiles and from datasets to region profiles.]

Summary

The Carbon Cycle and Ecosystems Office is developing ideas to build a warehouse of metadata resulting from the North American Carbon Program (NACP). The resulting warehouse and applications would be very similar to those developed for the LBA-ECO Project (www.lbaeco.org) [1]. The harvested metadata would be used to create dynamically generated reports, available at www.nacarbon.org [2], that would facilitate access to NACP datasets.

Our primary goal is to associate, as much as possible, harvested metadata with its corresponding project group profile. This also addresses high-priority goal #4 of the NACP Data System Task Force: to "link the dataset metadata index with the project metadata index generated and maintained by the NACP Office" [3]. Achieving this goal will maximize data discovery by associating each dataset with its corresponding NACP project group profile, which provides a greater understanding of the scientific and social context of each dataset.

This will be challenging, because the datasets exist in many different formats, reside in many thematic data centers, and are also distributed among hundreds of investigators. Among other things, this situation creates a lack of consistency in how the associated metadata is composed, limiting our ability to fully automate metadata harvesting as well as the dynamic generation of a wide variety of associated reports. This presentation demonstrates what we can do for NACP by looking at what we have already done for LBA-ECO.

Example Profiles from the LBA-ECO Project

Figure 1: LBA-ECO Investigation Profile

The LBA-ECO website provides a profile (Figure 1) for each LBA-ECO investigation, including:
• Participants
• Abstract(s)
• Study sites
• Publications

These profiles are very similar to NACP project profiles. Linked from each LBA-ECO profile is a list of associated registered dataset titles, each of which links to a dataset profile (Figure 2) that describes the metadata in a user-friendly way.

Technical Overview of What We Are Currently Doing for LBA-ECO

Metadata for LBA-ECO datasets reside at two repositories: the Centro de Previsão de Tempo e Estudos Climáticos (CPTEC) in Brazil and the Oak Ridge National Laboratory (ORNL) Distributed Active Archive Center (DAAC) in the United States. Each of these metadata records is available online as an XML file. A list containing a file location for each record is provided to the system by a separate process. The harvesting process feeds this list to the Swish-e [4] spider, which retrieves the files into a single ASCII output file.
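For illustration, here is a minimal sketch of this harvesting step, assuming Swish-e's bundled spider.pl program. The file names (record_urls.txt, spider_config.pl, harvest_output.txt) are hypothetical, and the option set shown is one plausible configuration, not necessarily the one we run.

    # spider_config.pl -- hypothetical configuration for Swish-e's spider.pl
    use strict;
    use warnings;

    # Read the externally supplied list of XML metadata record URLs,
    # one URL per line.
    my @urls;
    open my $fh, '<', 'record_urls.txt' or die "record_urls.txt: $!";
    while (my $url = <$fh>) {
        chomp $url;
        push @urls, $url if $url =~ /\S/;
    }
    close $fh;

    our @servers = (
        {
            base_url   => \@urls,   # one entry per metadata record
            max_depth  => 0,        # fetch only the listed files; no crawling
            keep_alive => 1,        # reuse connections to each repository
        },
    );
    1;

Run standalone as, for example, "perl spider.pl spider_config.pl > harvest_output.txt"; spider.pl prints each fetched document to standard output, yielding the single ASCII output file described above.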
After the harvesting process is complete, our parsing script ingests the contents of the output file into the MySQL [5] database developed for the metadata warehouse. The database was designed to be compliant with the Content Standard for Digital Geospatial Metadata (CSDGM), a widely accepted metadata standard for geospatial data [6]. The parsing script is based on a crosswalk that maps the harvested metadata to this standard. It is written in Perl and makes extensive use of the XML::Simple CPAN module [7] to parse the metadata. All of the scripts that create our dynamically generated reports are also written in Perl, using Perl's DBI interface to access our MySQL database. (Minimal sketches of the ingest step and of a report script appear after the acknowledgements below.)

We currently harvest only metadata that is in XML format. However, not all NACP datasets have corresponding metadata in XML, so we will need to expand our current capabilities by creating harvest and ingest scripts that can extract metadata in other formats.

Figure 2: LBA-ECO Dataset Profile

The LBA-ECO dataset profiles are dynamically generated from the harvested metadata stored in our MySQL database. Because of the consistency among LBA-ECO metadata records, we have been able to cross-link controlled vocabulary terms (such as geospatial region) with associated region and site profile reports. Additionally, each dataset profile contains hyperlinks to each associated data file at its home data repository, as well as links to other associated information, such as abstracts of related publications.

We also use the harvested metadata from the LBA Project in administrative applications that support quality assurance. These include processes that check for broken hyperlinks to data files, automated emails that notify our administrators when critical metadata fields are updated, dynamically generated reports of metadata records that link to datasets with questionable file formats, and dynamically generated region/site coordinate quality assurance reports (a sketch of the hyperlink check appears after the acknowledgements below). These applications are as important as those that facilitate access to information, because they help ensure a high standard of quality for that information. Where possible, we hope to create similar reports for NACP.

References

1. LBA-ECO, a component of the Large-Scale Biosphere-Atmosphere Experiment in Amazonia, http://www.lbaeco.org.
2. North American Carbon Program (NACP), http://www.nacarbon.org.
3. Prioritized list of recommendations to CCIWG regarding NACP Data Central, NACP Data System Task Force, July 24, 2006.
4. Simple Web Indexing System for Humans – Enhanced (Swish-e), http://swish-e.org/.
5. MySQL, http://www.mysql.com.
6. Content Standard for Digital Geospatial Metadata (CSDGM), http://www.fgdc.gov/metadata/contstan.html.
7. Comprehensive Perl Archive Network (CPAN), http://www.cpan.org/.

Acknowledgements

• Our CPTEC and ORNL colleagues: Luiz Horta (CPTEC), Merilyn Gentry (ORNL), and Bob Cook (ORNL).
• Scott Dennis Miller, Humberto Ribeiro da Rocha, and all of the investigators of LBA-ECO investigation CD-01.
• The generous support given to us by NASA's Terrestrial Ecology Program.
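Ingest sketch. A minimal, hypothetical illustration of the parsing and ingest step described above: it reads one harvested XML metadata record with XML::Simple and loads a few crosswalked fields into MySQL over DBI. The element names (title, abstract, originator), the dataset table, its columns, and the connection details are all assumptions for illustration, not our actual crosswalk or schema.

    #!/usr/bin/perl
    # Hypothetical sketch: parse one harvested XML metadata record and
    # load selected fields, mapped per a CSDGM crosswalk, into the
    # MySQL metadata warehouse.
    use strict;
    use warnings;
    use XML::Simple;
    use DBI;

    my $record = XMLin('record.xml');

    # Crosswalk: source XML element -> CSDGM-oriented warehouse column.
    my %row = (
        title      => $record->{title},       # CSDGM citation: title
        abstract   => $record->{abstract},    # CSDGM description: abstract
        originator => $record->{originator},  # CSDGM citation: originator
    );

    my $dbh = DBI->connect('DBI:mysql:database=warehouse;host=localhost',
                           'user', 'password', { RaiseError => 1 });
    $dbh->do('INSERT INTO dataset (title, abstract, originator) VALUES (?, ?, ?)',
             undef, @row{qw(title abstract originator)});
    $dbh->disconnect;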
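Report sketch. A minimal, hypothetical illustration of a dynamically generated report: a CGI script that fetches one dataset's harvested metadata over Perl's DBI and emits a bare-bones dataset profile page. The dataset_id parameter, table, and columns are the same illustrative assumptions as above.

    #!/usr/bin/perl
    # Hypothetical sketch: dynamically generate a dataset profile page
    # from the harvested metadata stored in the MySQL warehouse.
    use strict;
    use warnings;
    use CGI qw(param header escapeHTML);
    use DBI;

    my $id  = param('dataset_id');
    my $dbh = DBI->connect('DBI:mysql:database=warehouse;host=localhost',
                           'user', 'password', { RaiseError => 1 });
    my ($title, $abstract) = $dbh->selectrow_array(
        'SELECT title, abstract FROM dataset WHERE id = ?', undef, $id);
    $dbh->disconnect;

    print header('text/html');
    printf "<h1>%s</h1>\n<p>%s</p>\n",
        escapeHTML(defined $title    ? $title    : 'Not found'),
        escapeHTML(defined $abstract ? $abstract : '');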
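QA sketch. A minimal, hypothetical illustration of the broken-hyperlink check mentioned under quality assurance: it issues an HTTP HEAD request (via LWP::UserAgent) for each harvested data-file link and reports the ones that fail. The data_file_url column is an assumption for illustration, and our production check may differ.

    #!/usr/bin/perl
    # Hypothetical sketch: report harvested data-file hyperlinks that no
    # longer resolve at their home repository.
    use strict;
    use warnings;
    use DBI;
    use LWP::UserAgent;

    my $dbh = DBI->connect('DBI:mysql:database=warehouse;host=localhost',
                           'user', 'password', { RaiseError => 1 });
    my $urls = $dbh->selectcol_arrayref('SELECT data_file_url FROM dataset');
    $dbh->disconnect;

    my $ua = LWP::UserAgent->new(timeout => 30);
    for my $url (@$urls) {
        next unless defined $url and length $url;
        my $res = $ua->head($url);   # HEAD avoids downloading the file
        printf "BROKEN (%s): %s\n", $res->status_line, $url
            unless $res->is_success;
    }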
