Science Metadata
Viv Hutchison
Core Science Analytics and Synthesis
US Geological Survey
Denver, CO
[email protected]

Data Management Practices for Early Career Scientists
NACP All Investigator Meeting
February 3, 2013

Topics
Examine information included in a metadata record
Average Temperature of Observation for Each Species
“Please forgive my paranoia about protocols, standards, and data review. I'm in the latter stages of a long career with USGS (30 years, and counting), and have experienced much. Experience is the knowledge you get just after you needed it.
Several times, I've seen colleagues called into court in order to testify about conditions they have observed.
Without a strong tradition of constant review and approval of basic data, they would have been in deep trouble under cross-examination. Instead, they were able to produce field notes, data approval records, and the like to back up their testimony.
It's one thing to be questioned by a college student who is working on a project for school. It's another entirely to be grilled by an attorney under oath with the media present.”
Nelson Williams, Eastern Region
Senior climatologists were accused of manipulating important global temperature data
The climate scientists at the centre of a media storm over leaked emails were yesterday cleared of accusations that they fudged their results and silenced critics, but a review found they had failed to be open enough about their work.
Investigations emphasized need for data to be more open to ensure credibility and avoid future misguided controversy
Metadata aids in open science
Metadata: Why Care?
A new image processing technique reveals something not before seen in this Hubble Space Telescope image taken 11 years ago: a faint planet (arrows), the outermost of three discovered with ground-based telescopes last year around the young star HR 8799. (D. Lafrenière et al., Astrophysical Journal Letters)
“The first thing it tells you is how valuable maintaining long-term archives can be. Here is a major discovery that’s been lurking in the data for about 10 years!” comments Matt Mountain, director of the Space Telescope Science Institute in Baltimore, which operates Hubble.
…Metadata is critical for maintaining data in archives and for understanding the data you discover
Time of data development
Specific details about problems with individual items or specific dates are lost relatively rapidly
General details about datasets are lost through time
Retirement or career change makes access to “mental storage” difficult or unlikely
Accident or technology change may make data unusable
Loss of data developer leads to loss of remaining information
(From Michener et al., 1997)
i checked my 2002 email archives, and here is what i found out: it appears that the current 3rd generation algorithm was implemented into operations around Oct-Nov 2002 time frame. cannot say more precisely, as all email correspondence i am looking at, talks about this indirectly. (maybe it's what's refered to as the Phase II algorithm.) At the same time, we had implemented quite a few other changes fixing data bugs and formats: view angle problem, increased digitization in all channel's reflectances and AODs, etc. The jump is deemed due to introducing 3rd generation algorithm, which replaced the 2nd generation. The new numbers (~0.08) look more realistic than the previous ones (~0.05 or so). The changes seen in the data is close to the expected effect of this change. The 3rd gen alg takes into account the exact spectral response, whereas the 2nd gen is generic ("one size fits all"). hopefully this settles the issue..
50% change in global average
Sound information management, including metadata development, can arrest the loss of dataset detail.
Even if the value of data documentation is recognized, concerns remain as to the effort required to create metadata that effectively describe the data.
Many standards collect similar information. Factors to consider:
Your data type:
- Consider the FGDC Content Standard for Digital Geospatial Metadata with one of its profiles: the Biological Data Profile or the Shoreline Data Profile.
Examples of Tools:
Other factors: Availability of human support; instructional materials; use of controlled vocabularies; output formats
Titles, Titles, Titles…
Greater Yellowstone (where) Rivers (what) from 1:126,700 (scale) U.S. Forest Service (who) Visitor Maps (1961-1983) (when)
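The pattern above can be read as a simple template: place, theme, scale, source, and time span combined into one descriptive string. The sketch below is purely illustrative (the function and field names are not part of any metadata standard); it just shows the pieces a good title carries.

```python
# Illustrative sketch only: assemble a descriptive dataset title from the
# where / what / scale / who / when pieces shown above.
def build_title(where, what, scale, who, when):
    return f"{where} {what} from {scale} {who} ({when})"

title = build_title(
    where="Greater Yellowstone",
    what="Rivers",
    scale="1:126,700",
    who="U.S. Forest Service Visitor Maps",
    when="1961-1983",
)
print(title)
# Greater Yellowstone Rivers from 1:126,700 U.S. Forest Service Visitor Maps (1961-1983)
```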
Vague: We checked our work and it looks complete.
Specific: We checked our work using a random sample of 5 monitoring sites reviewed by 2 different people. We determined our work to be 95% complete based on these visual inspections.
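In an FGDC CSDGM record, a statement like the specific one above belongs in the Data Quality section's completeness report. The sketch below, using Python's standard xml.etree.ElementTree, shows roughly where it would sit; the record is heavily abbreviated, and only the dataqual and complete element short names come from the standard.

```python
# Minimal sketch, not a complete or valid record: placing a specific
# completeness statement in the FGDC CSDGM Data Quality section.
import xml.etree.ElementTree as ET

metadata = ET.Element("metadata")
dataqual = ET.SubElement(metadata, "dataqual")   # Section 2 - Data Quality Information
complete = ET.SubElement(dataqual, "complete")   # Completeness Report
complete.text = (
    "Checked using a random sample of 5 monitoring sites reviewed by "
    "2 different people; determined to be 95% complete based on these "
    "visual inspections."
)

print(ET.tostring(metadata, encoding="unicode"))
```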
Example: USGS Biocomplexity Thesaurus (over 9,500 terms)
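Keywords drawn from a controlled vocabulary are recorded alongside the name of the vocabulary itself, so a reader (or a search portal) knows which thesaurus the terms came from. A rough sketch follows; themekt and themekey are CSDGM element short names, and the example terms are illustrative rather than verified thesaurus entries.

```python
# Sketch only: theme keywords tagged with the controlled vocabulary they
# come from, using FGDC CSDGM element short names (themekt, themekey).
import xml.etree.ElementTree as ET

keywords = ET.Element("keywords")
theme = ET.SubElement(keywords, "theme")
ET.SubElement(theme, "themekt").text = "USGS Biocomplexity Thesaurus"
for term in ["Rivers", "Species diversity"]:     # illustrative terms only
    ET.SubElement(theme, "themekey").text = term

print(ET.tostring(keywords, encoding="unicode"))
```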
Seven Major Metadata Sections:
Section 1 - Identification Information*
Section 2 - Data Quality Information
Section 3 - Spatial Data Organization Information
Section 4 - Spatial Reference Information
Section 5 - Entity and Attribute Information
Section 6 - Distribution Information
Section 7 - Metadata Reference Information*
Three Supporting Sections:
Section 8 - Citation Information*
Section 9 - Time Period Information*
Section 10 - Contact Information*
* Minimum required metadata
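As a rough sketch, the sections above map onto top-level elements of a CSDGM XML record. The skeleton below shows only the seven major sections; the three supporting sections (citation, time period, and contact information) are reused inside them where required, and this fragment is not a valid record on its own.

```python
# Minimal skeleton, not a valid record: the seven major CSDGM sections as
# top-level XML elements.  The supporting sections (citation, time period,
# contact) nest inside these where required.
import xml.etree.ElementTree as ET

metadata = ET.Element("metadata")
ET.SubElement(metadata, "idinfo")    # Section 1 - Identification Information*
ET.SubElement(metadata, "dataqual")  # Section 2 - Data Quality Information
ET.SubElement(metadata, "spdoinfo")  # Section 3 - Spatial Data Organization Information
ET.SubElement(metadata, "spref")     # Section 4 - Spatial Reference Information
ET.SubElement(metadata, "eainfo")    # Section 5 - Entity and Attribute Information
ET.SubElement(metadata, "distinfo")  # Section 6 - Distribution Information
ET.SubElement(metadata, "metainfo")  # Section 7 - Metadata Reference Information*

ET.dump(metadata)                    # prints the bare skeleton
```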
Examples of metadata search portals:
Creating robust metadata is in your OWN best interest!