Science metadata
Science Metadata. Viv Hutchison Core Science Analytics and Synthesis US Geological Survey Denver, CO [email protected] Data Management Practices for Early Career Scientists NACP All Investigator Meeting February 3, 2013. Topics. Examine information included in a metadata record


Science Metadata

Viv Hutchison

Core Science Analytics and Synthesis

US Geological Survey

Denver, CO

[email protected]

Data Management Practices for Early Career Scientists

NACP All Investigator Meeting

February 3, 2013


Topics

  • Examine information included in a metadata record

  • Examples of metadata standards and how to choose one to use

  • Illustrate the value of metadata to data users, data providers, and organizations

  • Tips on how to write quality metadata records

  • Publishing metadata

CC image by Alec Couros on Flickr

Data Collection

CC image by Justin See on Flickr

CC image by CIMMYT on Flickr

CC image by SEDAC on Flickr

CC image by acordova on Flickr

CC image by ISAS on Flickr

CC image by kukkurovaca on Flickr

From Field Notes to Datasets

Average Temperature of Observation for Each Species

From Datasets to Published Papers

CC image by Heather Kennedy on Flickr

Think about Scenarios in Working with Data

  • Providing data to another researcher:

    • Why were the data created?

    • What limitations, if any, do the data have?

    • What do the data mean?

    • How should the data be cited if they are re-used in a new study?

  • Receiving data from another researcher:

    • What are the data gaps?

    • What processes were used for creating the data?

    • Are there any fees associated with the data?

    • In what scale were the data created?

    • What do the values in the tables mean?

    • What software do I need in order to read the data?

    • What projection are the data in?

    • Can I give these data to someone else?

Why Care About Metadata?

  • Fourth Paradigm: scientific breakthroughs will increasingly be powered by advanced computing capabilities that help researchers manipulate and explore massive datasets.

  • “Metadata must be preserved when scientific data is generated…” -- Jim Gray, The Fourth Paradigm

  • The greater the distance in time and space between the data producer and the re-user, the more detailed the metadata must be.

Metadata: Why Care?

“Please forgive my paranoia about protocols, standards, and data review. I'm in the latter stages of a long career with USGS (30 years, and counting), and have experienced much. Experience is the knowledge you get just after you needed it.

Several times, I've seen colleagues called into court in order to testify about conditions they have observed.

Without a strong tradition of constant review and approval of basic data, they would've been in deep trouble under cross-examination. Instead, they were able to produce field notes, data approval records, and the like, to back up their testimony.

It's one thing to be questioned by a college student who is working on a project for school. It's another entirely to be grilled by an attorney under oath with the media present.”

Nelson Williams Eastern Region

USGS Water

Metadata: Why Care?

Senior climatologists were accused of manipulating important global temperature data

The climate scientists at the centre of a media storm over leaked emails were yesterday cleared of accusations that they fudged their results and silenced critics, but a review found they had failed to be open enough about their work.

Investigations emphasized need for data to be more open to ensure credibility and avoid future misguided controversy

Metadata aids in open science


Metadata: Why Care?

“Planet hidden in Hubble archives,” Science News (Feb. 27, 2009)

A new image processing technique reveals something not before seen in this Hubble Space Telescope image taken 11 years ago: A faint planet (arrows), the outermost of three discovered with ground-based telescopes last year around the young star HR 8799. (D. Lafrenière et al., Astrophysical Journal Letters)

“The first thing it tells you is how valuable maintaining long-term archives can be. Here is a major discovery that’s been lurking in the data for about 10 years!” comments Matt Mountain, director of the Space Telescope Science Institute in Baltimore, which operates Hubble.

…Metadata is critical in maintaining data in archives – for understanding data you discover

The Value of Metadata

What is the Value to Data Developers?

  • Metadata allows data developers to:

    • Avoid data duplication

    • Share reliable information

    • Publicize efforts – promote the work of a scientist and his/her contributions to a field of study

    • Reduce Workload

CC image by US Embassy Guyana on Flickr

What is the Value to Data Users?

  • Metadata gives a user the ability to:

    • Search, retrieve, and evaluate data set information from both inside and outside an organization

    • Find data: Determine what data exists for a geographic location and/or topic

    • Determine applicability: Decide if a data set meets a particular need

    • Discover how to acquire the dataset you identified; process and use the dataset

CC image by ASEE on Flickr

What is the Value to Organizations?

  • Metadata helps ensure an organization’s investment in data:

    • Documentation of data processing steps, quality control, definitions, data uses, and restrictions

    • Ability to use data after initial intended purpose

  • Transcends people and time:

    • Offers data permanence

    • Creates institutional memory

  • Advertises an organization’s research:

    • Creates possible new partnerships and collaborations through data sharing

CC image by mambol on Flickr

Information Entropy

Time of data development

Specific details about problems with individual items or specific dates are lost relatively rapidly

General details about datasets are lost through time


Retirement or career change makes access to “mental storage” difficult or unlikely

Accident or technology change may make data unusable

Loss of data developer leads to loss of remaining information


(From Michener et al 1997)

Memory Check

i checked my 2002 email archives, and here is what i found out: it appears that the current 3rd generation algorithm was implemented into operations around Oct-Nov 2002 time frame. cannot say more precisely, as all email correspondence i am looking at, talks about this indirectly. (maybe it's what's refered to as the Phase II algorithm.) At the same time, we had implemented quite a few other changes fixing data bugs and formats: view angle problem, increased digitization in all channel's reflectances and AODs, etc. The jump is deemed due to introducing 3rd generation algorithm, which replaced the 2nd generation. The new numbers (~0.08) look more realistic than the previous ones (~0.05 or so). The changes seen in the data is close to the expected effect of this change. The 3rd gen alg takes into account the exact spectral response, whereas the 2nd gen is generic ("one size fits all"). hopefully this settles the issue..


50% change in global average

Information Entropy

Sound information management, including metadata development, can arrest the loss of dataset detail.



Still…There are Occasional Concerns About Creating Metadata

Even if the value of data documentation is recognized, concerns remain as to the effort required to create metadata that effectively describe the data.

CC image by waterlilysage on Flickr

Let's Address These Concerns…

Choosing a Metadata Standard

Many standards collect similar information…factors to consider:

Your data type:

  • Are you working mainly with GIS data? Raster/vector or point data? Do you have biological or shoreline information in your dataset?

    - Consider the FGDC Content Standard for Digital Geospatial Metadata with one of its profiles: the Biological Data Profile or the Shoreline Data Profile.

  • Are you working with data retrieved from instruments such as monitoring stations or satellites? Are you using geospatial data services such as web-mapping applications or data modeling?

    • If so, then consider using the ISO 19115-2 standard

  • Are you mainly working with ecological data?

    • Consider Ecological Metadata Language (EML)
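The data-type questions above can be read as a simple lookup. The following is only a rough sketch: the category keys (`"gis"`, `"instrument"`, and so on) and the function name `suggest_standard` are hypothetical labels invented for illustration, and organizational policy may override any suggestion it returns.

```python
def suggest_standard(data_kind: str) -> str:
    """Return the standard suggested above for a broad kind of data.

    The keys are hypothetical shorthand for the questions on this slide,
    not an official classification.
    """
    suggestions = {
        "gis": "FGDC Content Standard for Digital Geospatial Metadata (CSDGM)",
        "biological": "FGDC CSDGM with the Biological Data Profile",
        "shoreline": "FGDC CSDGM with the Shoreline Data Profile",
        "instrument": "ISO 19115-2",
        "geospatial_service": "ISO 19115-2",
        "ecological": "Ecological Metadata Language (EML)",
    }
    key = data_kind.strip().lower()
    if key not in suggestions:
        # No match: fall back to policy or human guidance rather than guessing.
        raise ValueError(
            f"no suggestion for {data_kind!r}; check your organization's policy"
        )
    return suggestions[key]


if __name__ == "__main__":
    print(suggest_standard("ecological"))
```

Raising an error for an unrecognized category is a deliberate choice here: an unfamiliar data type is better resolved by asking a data manager than by defaulting to a standard.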

Choosing a Metadata Standard

  • Your organization’s policies: do they state which standard to use?

  • What tools are available to create metadata?

    Examples of Tools:

    • Mermaid (NOAA)

    • Metavist (Forest Service)

    • Online Metadata Editor (USGS)

    • Morpho (KNB)

    • ISO: XML Spy or Oxygen; CatMD

  • Other factors: availability of human support; instructional materials; use of controlled vocabularies; output formats

Steps to Create Quality Metadata

  • Organize your information

    • Did you write a project abstract to obtain funding for your proposal? Re-use it in your metadata!

    • Did you use a lab notebook or other notes during the data development process that define measurements and other parameters?

    • Do you have the contact information for colleagues you worked with?

    • What about citations for other data sources you used in your project?

CC image by on Google Images

Steps to Create Quality Metadata

  • Write your metadata using a metadata tool

Steps to Create Quality Metadata

  • Review for accuracy and completeness

  • Have someone else read your record

  • Revise the record, based on comments from your reviewer

  • Review once more before you publish

CC image by Shelly Munkberg on Flickr

CC image by mujalifah on


Tips for Writing Quality Metadata

  • Do not use jargon; define technical terms and acronyms:

    • CA, LA, GPS, GIS : what do these mean?

  • Clearly state data limitations

    • E.g., data set omissions, completeness of data

    • Express considerations for appropriate re-use of the data

  • Use “none” or “unknown” meaningfully

    • “None” usually means that you looked for the data and nothing existed (e.g., a “0” cubic feet per second discharge value)

    • “Unknown” means that you don’t know whether the data existed or not (e.g., a null value)
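The distinction between "none" and "unknown" maps naturally onto a measured zero versus a null value. Here is a minimal sketch in Python, assuming a discharge reading stored as a float or as `None`; the function name `describe_discharge` and the wording of the returned strings are illustrative only.

```python
from typing import Optional


def describe_discharge(value_cfs: Optional[float]) -> str:
    """Describe a discharge reading without confusing 'none' and 'unknown'."""
    if value_cfs is None:
        # Null: we do not know whether a value existed.
        return "unknown (no measurement recorded)"
    if value_cfs == 0:
        # A real observation: the gauge was read and there was no flow.
        return "none (measured: 0 cubic feet per second)"
    return f"{value_cfs} cubic feet per second"


if __name__ == "__main__":
    for reading in (None, 0.0, 12.5):
        print(describe_discharge(reading))
```

Checking `is None` before comparing to zero matters: a plain truthiness test such as `if not value_cfs` would treat a measured zero and a missing value as the same thing.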

CC image by kruuscht on Flickr

Tips for Writing Quality Metadata

Titles, Titles, Titles…

  • Titles are critical in helping readers find your data

    • While individuals are searching for the most appropriate data sets, they are most likely going to use the title as the first criterion to determine if a dataset meets their needs.

    • Treat the title as the opportunity to sell your dataset.

  • A complete title includes: What, Where, When, Who, and Scale

  • An informative title includes: topic, timeliness of the data, specific information about place and geography

Tips for Writing Quality Metadata

  • A Clear Choice: Which title is better?

  • Rivers


  • Greater Yellowstone Rivers from 1:126,700 U.S. Forest Service Visitor Maps (1961-1983)

    Greater Yellowstone (where) Rivers (what) from 1:126,700 (scale) U.S. Forest Service (who) Visitor Maps (1961-1983) (when)
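One way to make sure a title carries all five elements is to assemble it from named parts. The sketch below is an illustration only: `build_title` and its parameters are hypothetical, and the word order simply mirrors the Yellowstone example above rather than any rule in a metadata standard.

```python
def build_title(where: str, what: str, scale: str, who: str,
                source: str, when: str) -> str:
    """Assemble a dataset title from its what/where/when/who/scale parts."""
    parts = {"where": where, "what": what, "scale": scale,
             "who": who, "source": source, "when": when}
    # Refuse to build a title with a blank element.
    missing = [name for name, text in parts.items() if not text.strip()]
    if missing:
        raise ValueError(f"title is missing: {', '.join(missing)}")
    return f"{where} {what} from {scale} {who} {source} ({when})"


if __name__ == "__main__":
    print(build_title(
        where="Greater Yellowstone",
        what="Rivers",
        scale="1:126,700",
        who="U.S. Forest Service",
        source="Visitor Maps",
        when="1961-1983",
    ))
```

Called with only `what="Rivers"` and the other fields blank, the function raises an error that names the missing elements, which is the point of the comparison on this slide.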

CC image by dolfi on


Tips for Writing Quality Metadata

  • Be specific and quantify when you can! The goal of a metadata record is to give the user enough information to know if they can use the data without contacting the dataset owner.

    Vague: We checked our work and it looks complete.

    Specific: We checked our work using a random sample of 5 monitoring sites reviewed by 2 different people. We determined our work to be 95% complete based on these visual inspections.

CC image by PNASH on Flickr

Tips for Writing Quality Metadata

  • Use descriptive and clear writing

  • Fully qualify geographic locations

  • Select keywords wisely - use thesauri for keywords whenever possible

    Example: USGS Biocomplexity Thesaurus (over 9,500 terms)

CC image by Marco Arment on Flickr

Tips for Writing Quality Metadata

  • Remember: a computer will read your metadata

  • Do not use symbols that could be misinterpreted. Examples: ! @ # % { } | / \ < > ~

  • Do not use tabs, indents, or line feeds/carriage returns

  • When copying and pasting from other sources, use a text editor (e.g., Notepad) to eliminate hidden characters
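The clean-up advice above can be partly automated before text is pasted into a metadata editor. This is a minimal sketch using only the Python standard library; the function names `clean_text` and `find_risky_symbols` are hypothetical, and the symbol set is copied from the examples on this slide rather than from any standard.

```python
import re
import unicodedata
from typing import List

# Symbols listed on the slide as open to misinterpretation.
RISKY_SYMBOLS = set("!@#%{}|/\\<>~")


def clean_text(text: str) -> str:
    """Remove tabs, line breaks, and hidden characters from pasted text."""
    # Replace tabs, line feeds, and carriage returns with a single space.
    text = re.sub(r"[\t\r\n]+", " ", text)
    # Drop hidden characters: Unicode control (Cc) and format (Cf) categories.
    text = "".join(
        ch for ch in text if unicodedata.category(ch) not in ("Cc", "Cf")
    )
    # Collapse any runs of spaces left behind.
    return re.sub(r" {2,}", " ", text).strip()


def find_risky_symbols(text: str) -> List[str]:
    """Return the risky symbols present in the text, sorted, for human review."""
    return sorted({ch for ch in text if ch in RISKY_SYMBOLS})


if __name__ == "__main__":
    pasted = "Stream\tgauge\r\ndata\u200b 2012"
    print(clean_text(pasted))
    print(find_risky_symbols("pH < 7 & temp > 20%"))
```

The sketch only reports risky symbols instead of deleting them, since a symbol such as `<` may carry meaning that should be rewritten in words ("less than") rather than silently dropped.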

CC image by Ben on Google Images

Tips for Writing Quality Metadata

  • Fully define entities, attributes, units of measure

  • Resist the temptation to fill in only the mandatory fields: sections labeled “mandatory if applicable” or “optional” are often critical portions of the standard

    • Example:

Seven Major Metadata Sections:

Section 1 - Identification Information*

Section 2 - Data Quality Information

Section 3 - Spatial Data Information

Section 4 - Spatial Reference Information

Section 5 - Entity and Attribute Information

Section 6 - Distribution Information

Section 7 - Metadata Information*

Three Supporting Sections:

Section 8 - Citation Information*

Section 9 - Time Period Information*

Section 10 - Contact Information*

* Minimum required metadata
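A quick completeness check can show which sections of a record are still empty, and which of those are part of the minimum required set. The sketch below takes the section numbers and names from the list above; the function `review_sections` and its output keys are hypothetical, and a real validator would check the contents of each section rather than just its presence.

```python
from typing import Dict, Iterable, List

# Section numbers and names as listed on the slide.
SECTIONS: Dict[int, str] = {
    1: "Identification Information",
    2: "Data Quality Information",
    3: "Spatial Data Information",
    4: "Spatial Reference Information",
    5: "Entity and Attribute Information",
    6: "Distribution Information",
    7: "Metadata Information",
    8: "Citation Information",
    9: "Time Period Information",
    10: "Contact Information",
}

# Sections marked with an asterisk (minimum required metadata).
MINIMUM_REQUIRED = {1, 7, 8, 9, 10}


def review_sections(filled: Iterable[int]) -> Dict[str, List[str]]:
    """Report which sections are still empty, split by required or not."""
    filled_set = set(filled)
    unknown = filled_set - set(SECTIONS)
    if unknown:
        raise ValueError(f"unknown section numbers: {sorted(unknown)}")
    missing = [number for number in SECTIONS if number not in filled_set]
    return {
        "missing_required": [
            SECTIONS[n] for n in missing if n in MINIMUM_REQUIRED
        ],
        "missing_recommended": [
            SECTIONS[n] for n in missing if n not in MINIMUM_REQUIRED
        ],
    }


if __name__ == "__main__":
    report = review_sections([1, 7, 8, 9, 10])
    for group, names in report.items():
        print(group, names)
```

A record that fills only the starred sections passes the minimum, yet the report still lists data quality, entity and attribute, and distribution information as missing, which are the sections a re-user is most likely to need.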

Share Your Metadata: Distribution

  • Share your metadata with other researchers

    Examples of metadata search portals:


      • Federal e-gov geospatial data portal

    • Metacat

      • Repository for data and metadata


    • US Geological Survey

      • USGS Core Science Metadata Clearinghouse:

    • ArcGIS Online

      • ESRI sponsored national geospatial data portal

CC image by RGB12 on Flickr

DataONE Search

Summary

  • Metadata is documentation of data

  • A metadata record captures critical information about the content of a dataset

  • Metadata allows data to be discovered, accessed, and re-used

  • A metadata standard provides structure and consistency to data documentation

  • Standards and tools vary – select according to defined criteria such as data type, organizational guidance, and available resources

  • Metadata is of critical importance to data developers, data users, and organizations

  • Writing quality metadata is important because records are expected to last with the data over decades

  • Metadata completes a dataset.

    Creating robust metadata is in your OWN best interest!

Additional Slides

Science Metadata