Science Metadata - PowerPoint PPT Presentation

Science Metadata. Viv Hutchison Core Science Analytics and Synthesis US Geological Survey Denver, CO Data Management Practices for Early Career Scientists NACP All Investigator Meeting February 3, 2013. Topics. Examine information included in a metadata record


Science Metadata

Viv Hutchison

Core Science Analytics and Synthesis

US Geological Survey

Denver, CO

Data Management Practices for Early Career Scientists

NACP All Investigator Meeting

February 3, 2013



Topics

  • Examine information included in a metadata record

  • Examples of metadata standards and how to choose one to use

  • Illustrate the value of metadata to data users, data providers, and organizations

  • Tips on how to write quality metadata records

  • Publishing metadata

CC image by Alec Couros on Flickr


The Data Life Cycle


Data Collection

CC image by Justin See on Flickr

CC image by CIMMYT on Flickr

CC image by SEDAC on Flickr

CC image by acordova on Flickr

CC image by ISAS on Flickr

CC image by kukkurovaca on Flickr


From Field Notes to Datasets

Average Temperature of Observation for Each Species


From Datasets to Published Papers

CC image by Heather Kennedy on Flickr


Metadata is a critical part of the data picture

CC image by I like on Flickr


Think about Scenarios in Working with Data

  • Providing data to another researcher:

    • Why were the data created?

    • What limitations, if any, do the data have?

    • What do the data mean?

    • How should the data be cited if they are re-used in a new study?

  • Receiving data from another researcher:

    • What are the data gaps?

    • What processes were used for creating the data?

    • Are there any fees associated with the data?

    • In what scale were the data created?

    • What do the values in the tables mean?

    • What software do I need in order to read the data?

    • What projection are the data in?

    • Can I give these data to someone else?


Why Care About Metadata?

  • Fourth Paradigm: scientific breakthroughs will increasingly be powered by advanced computing capabilities that help researchers manipulate and explore massive datasets.

  • “Metadata must be preserved when scientific data is generated…” -- Jim Gray, The Fourth Paradigm

  • The greater the distance in time and space between the data producer and the re-user, the more detailed the metadata must be.


Metadata: Why Care?

“Please forgive my paranoia about protocols, standards, and data review. I'm in the latter stages of a long career with USGS (30 years, and counting), and have experienced much. Experience is the knowledge you get just after you needed it.

Several times, I've seen colleagues called into court in order to testify about conditions they have observed.

Without a strong tradition of constant review and approval of basic data, they would've been in deep trouble under cross-examination. Instead, they were able to produce field notes, data approval records, and the like, to back up their testimony.

It's one thing to be questioned by a college student who is working on a project for school. It's another entirely to be grilled by an attorney under oath with the media present.”

Nelson Williams Eastern Region

USGS Water


Metadata: Why Care?

Senior climatologists were accused of manipulating important global temperature data

The climate scientists at the centre of a media storm over leaked emails were yesterday cleared of accusations that they fudged their results and silenced critics, but a review found they had failed to be open enough about their work.

Investigations emphasized need for data to be more open to ensure credibility and avoid future misguided controversy

Metadata aids in open science


Metadata: Why Care?

“Planet hidden in Hubble archives,” Science News (Feb. 27, 2009)

A new image processing technique reveals something not before seen in this Hubble Space Telescope image taken 11 years ago: A faint planet (arrows), the outermost of three discovered with ground-based telescopes last year around the young star HR 8799. (D. Lafrenière et al., Astrophysical Journal Letters)

“The first thing it tells you is how valuable maintaining long-term archives can be. Here is a major discovery that’s been lurking in the data for about 10 years!” comments Matt Mountain, director of the Space Telescope Science Institute in Baltimore, which operates Hubble.

…Metadata is critical in maintaining data in archives – for understanding data you discover

The Value of Metadata


What is the Value to Data Developers?

  • Metadata allows data developers to:

    • Avoid data duplication

    • Share reliable information

    • Publicize efforts – promote the work of a scientist and his/her contributions to a field of study

    • Reduce Workload

CC image by US Embassy Guyana on Flickr


What is the Value to Data Users?

  • Metadata gives a user the ability to:

    • Search, retrieve, and evaluate data set information from both inside and outside an organization

    • Find data: Determine what data exists for a geographic location and/or topic

    • Determine applicability: Decide if a data set meets a particular need

    • Discover how to acquire the dataset you identified; process and use the dataset

CC image by ASEE on Flickr


What is the Value to Organizations?

  • Metadata helps protect an organization’s investment in data:

    • Documentation of data processing steps, quality control, definitions, data uses, and restrictions

    • Ability to use data after initial intended purpose

  • Transcends people and time:

    • Offers data permanence

    • Creates institutional memory

  • Advertises an organization’s research:

    • Creates possible new partnerships and collaborations through data sharing

CC image by mambol on Flickr


Information Entropy

Time of data development

Specific details about problems with individual items or specific dates are lost relatively rapidly

General details about datasets are lost through time


Retirement or career change makes access to “mental storage” difficult or unlikely

Accident or technology change may make data unusable

Loss of data developer leads to loss of remaining information


(From Michener et al. 1997)


Memory Check

i checked my 2002 email archives, and here is what i found out: it appears that the current 3rd generation algorithm was implemented into operations around Oct-Nov 2002 time frame. cannot say more precisely, as all email correspondence i am looking at, talks about this indirectly. (maybe it's what's refered to as the Phase II algorithm.) At the same time, we had implemented quite a few other changes fixing data bugs and formats: view angle problem, increased digitization in all channel's reflectances and AODs, etc. The jump is deemed due to introducing 3rd generation algorithm, which replaced the 2nd generation. The new numbers (~0.08) look more realistic than the previous ones (~0.05 or so). The changes seen in the data is close to the expected effect of this change. The 3rd gen alg takes into account the exact spectral response, whereas the 2nd gen is generic ("one size fits all"). hopefully this settles the issue..


50% change in global average


Information Entropy

Sound information management, including metadata development, can arrest the loss of dataset detail.




Still…There are Occasional Concerns About Creating Metadata

Even if the value of data documentation is recognized, concerns remain as to the effort required to create metadata that effectively describe the data.

CC image by waterlilysage on Flickr


Let’s Address these Concerns…


Selecting a Standard


Choosing a Metadata Standard

Many standards collect similar information…factors to consider:

Your data type:

  • Are you working mainly with GIS data? Raster/vector or point data? Do you have biological or shoreline information in your dataset?

    • Consider the FGDC Content Standard for Digital Geospatial Metadata with one of its profiles: the Biological Data Profile or the Shoreline Data Profile.

  • Are you working with data retrieved from instruments such as monitoring stations or satellites? Are you using geospatial data services such as web-mapping applications or data modeling?

    • If so, then consider using the ISO 19115-2 standard

  • Are you mainly working with ecological data?

    • Consider Ecological Metadata Language (EML)
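The data-type questions above can be read as a simple lookup. The Python sketch below is only an illustration of that reading: the category labels and the function name `suggest_standard` are hypothetical, and the result is a starting point that still has to be weighed against organizational policy and available tools.

```python
# Hypothetical category labels mapped to the standards named on this slide.
SUGGESTED_STANDARD = {
    "gis": "FGDC Content Standard for Digital Geospatial Metadata",
    "biological": "FGDC Content Standard with the Biological Data Profile",
    "shoreline": "FGDC Content Standard with the Shoreline Data Profile",
    "instrument": "ISO 19115-2",
    "geospatial_service": "ISO 19115-2",
    "ecological": "Ecological Metadata Language (EML)",
}


def suggest_standard(data_type: str) -> str:
    """Return a starting-point standard for a data type, or a prompt to check policy."""
    key = data_type.strip().lower()
    return SUGGESTED_STANDARD.get(key, "No suggestion: check your organization's policy")


if __name__ == "__main__":
    for kind in ("GIS", "ecological", "genomic"):
        print(f"{kind}: {suggest_standard(kind)}")
```

An unrecognized data type falls through to the policy prompt rather than to a default standard, so the lookup never implies a recommendation the slide does not make.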


Choosing a Metadata Standard

  • Your organization’s policies: do they state which standard to use?

  • What tools are available to create metadata?

    Examples of Tools:

    • Mermaid (NOAA)

    • Metavist (Forest Service)

    • Online Metadata Editor (USGS)

    • Morpho (KNB)

    • ISO: XML Spy or Oxygen; CatMD

  • Other factors: availability of human support; instructional materials; use of controlled vocabularies; output formats


Writing Quality Metadata


Steps to Create Quality Metadata

  • Organize your information

    • Did you write a project abstract to obtain funding for your proposal? Re-use it in your metadata!

    • Did you use a lab notebook or other notes during the data development process that define measurements and other parameters?

    • Do you have the contact information for colleagues you worked with?

    • What about citations for other data sources you used in your project?

CC image by on Google Images


Steps to Create Quality Metadata

  • Write your metadata using a metadata tool


Steps to Create Quality Metadata

  • Review for accuracy and completeness

  • Have someone else read your record

  • Revise the record, based on comments from your reviewer

  • Review once more before you publish

CC image by Shelly Munkberg on Flickr

CC image by mujalifah on



Tips for Writing Quality Metadata

  • Do not use jargon -- define technical terms and acronyms:

    • CA, LA, GPS, GIS : what do these mean?

  • Clearly state data limitations

    • E.g., data set omissions, completeness of data

    • Express considerations for appropriate re-use of the data

  • Use “none” or “unknown” meaningfully

    • None usually means that you knew about data and nothing existed (e.g., a “0” cubic feet per second discharge value)

    • Unknown means that you don’t know whether that data existed or not (e.g., a null value)
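The difference between “none” and “unknown” carries over to the data values themselves. As a minimal sketch, assuming a discharge reading is stored as a number or as a missing value (the function name `describe_discharge` is hypothetical), the two cases can be kept apart like this:

```python
from typing import Optional


def describe_discharge(value_cfs: Optional[float]) -> str:
    """Describe a discharge reading, keeping 'none' (0) distinct from 'unknown' (missing)."""
    if value_cfs is None:
        # Unknown: no measurement was recorded, so we cannot say what the flow was.
        return "unknown"
    if value_cfs == 0:
        # None: a measurement was taken and there was no flow.
        return "none (0 cubic feet per second)"
    return f"{value_cfs} cubic feet per second"


if __name__ == "__main__":
    for reading in (None, 0, 12.5):
        print(repr(reading), "->", describe_discharge(reading))
```

Testing for the missing value first matters: a check such as `if not value_cfs` would treat a measured zero and a missing reading as the same thing.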

CC image by kruuscht on Flickr


Tips for Writing Quality Metadata

Titles, Titles, Titles…

  • Titles are critical in helping readers find your data

    • While individuals are searching for the most appropriate data sets, they are most likely going to use the title as the first criterion to determine if a dataset meets their needs.

    • Treat the title as the opportunity to sell your dataset.

  • A complete title includes: What, Where, When, Who, and Scale

  • An informative title includes: topic, timeliness of the data, specific information about place and geography


Tips for Writing Quality Metadata

  • A Clear Choice: Which title is better?

  • Rivers


  • Greater Yellowstone Rivers from 1:126,700 U.S. Forest Service Visitor Maps (1961-1983)

    Greater Yellowstone (where) Rivers (what) from 1:126,700 (scale) U.S. Forest Service (who) Visitor Maps (1961-1983) (when)
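The annotated title above can be treated as a template with one slot per element. The following Python sketch is one hypothetical way to assemble it; the function `build_title` and its parameter names are illustrative, and the word order simply follows the example title.

```python
def build_title(where: str, what: str, scale: str, who: str, source: str, when: str) -> str:
    """Assemble a dataset title from the What, Where, When, Who, and Scale parts."""
    return f"{where} {what} from {scale} {who} {source} ({when})"


title = build_title(
    where="Greater Yellowstone",
    what="Rivers",
    scale="1:126,700",
    who="U.S. Forest Service",
    source="Visitor Maps",
    when="1961-1983",
)
print(title)
```

Making every element a required argument means a title such as “Rivers” cannot be produced without someone deciding what the where, when, who, and scale are.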

CC image by dolfi on



Tips for Writing Quality Metadata

  • Be specific and quantify when you can! The goal of a metadata record is to give the user enough information to know if they can use the data without contacting the dataset owner.

    Vague: We checked our work and it looks complete.

    Specific: We checked our work using a random sample of 5 monitoring sites reviewed by 2 different people. We determined our work to be 95% complete based on these visual inspections.

CC image by PNASH on Flickr


Tips for Writing Quality Metadata

  • Use descriptive and clear writing

  • Fully qualify geographic locations

  • Select keywords wisely - use thesauri for keywords whenever possible

    Example: USGS Biocomplexity Thesaurus (over 9,500 terms)

CC image by Marco Arment on Flickr


Tips for Writing Quality Metadata

  • Remember: a computer will read your metadata

  • Do not use symbols that could be misinterpreted: Examples: ! @ # % { } | / \ < > ~

  • Do not use tabs, indents, or line feeds/carriage returns

  • When copying and pasting from other sources, use a text editor (e.g., Notepad) to eliminate hidden characters
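A quick automated check can flag these characters before a record is published. The sketch below is an assumption about how such a check might look, not part of any metadata tool: the name `find_risky_characters` is hypothetical, and the character set is the list from this slide plus tab, line feed, and carriage return.

```python
from typing import List, Tuple

# Symbols listed on the slide, plus tab, line feed, and carriage return.
RISKY_CHARACTERS = set("!@#%{}|/\\<>~\t\n\r")


def find_risky_characters(text: str) -> List[Tuple[int, str]]:
    """Return (position, character) pairs for characters that may be misread.

    Non-printable characters (for example, hidden characters pasted from a
    word processor) are reported as well.
    """
    return [
        (index, char)
        for index, char in enumerate(text)
        if char in RISKY_CHARACTERS or not char.isprintable()
    ]


if __name__ == "__main__":
    sample = "Stream flow > 5 cfs\tmeasured daily"
    for position, char in find_risky_characters(sample):
        print(f"position {position}: {char!r}")
```

The function only reports positions and leaves the text unchanged, because the right fix (spelling out a symbol, removing a line break) depends on what the author meant.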

CC image by Ben on Google Images


Tips for Writing Quality Metadata

  • Fully define entities, attributes, units of measure

  • Resist the temptation to fill in only the mandatory fields: sections of the metadata standard labeled “mandatory if applicable” or “optional” are often critical portions of the standard

    • Example:

Seven Major Metadata Sections:

Section 1 - Identification Information*

Section 2 - Data Quality Information

Section 3 - Spatial Data Organization Information

Section 4 - Spatial Reference Information

Section 5 - Entity and Attribute Information

Section 6 - Distribution Information

Section 7 - Metadata Reference Information*

Three Supporting Sections:

Section 8 - Citation Information*

Section 9 - Time Period Information*

Section 10 - Contact Information*

* Minimum required metadata
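To see how far a record goes beyond the minimum, one can list which of the seven main sections it leaves out. This is a rough sketch that assumes the record is stored as XML and that the main sections use the short tag names `idinfo`, `dataqual`, `spdoinfo`, `spref`, `eainfo`, `distinfo`, and `metainfo`; those names and the function `missing_sections` are assumptions to confirm against the standard and your metadata tool.

```python
import xml.etree.ElementTree as ET
from typing import List

# Assumed short tag names for main sections 1 to 7 of the record.
MAIN_SECTIONS = {
    "idinfo": 1,
    "dataqual": 2,
    "spdoinfo": 3,
    "spref": 4,
    "eainfo": 5,
    "distinfo": 6,
    "metainfo": 7,
}

# Illustrative record that contains only the two starred main sections.
SAMPLE = """
<metadata>
  <idinfo><descript><abstract>Example abstract.</abstract></descript></idinfo>
  <metainfo><metd>20130203</metd></metainfo>
</metadata>
"""


def missing_sections(xml_text: str) -> List[int]:
    """Return the numbers of the main sections absent from a metadata record."""
    root = ET.fromstring(xml_text.strip())
    return [number for tag, number in MAIN_SECTIONS.items() if root.find(tag) is None]


if __name__ == "__main__":
    print("Missing main sections:", missing_sections(SAMPLE))
```

A record like the sample meets the starred minimum for the main sections yet says nothing about data quality, attributes, or distribution, which is the gap this slide warns about.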


Share Your Metadata: Distribution

  • Share your metadata with other researchers

    Examples of metadata search portals:


      • Federal e-gov geospatial data portal

    • Metacat

      • Repository for data and metadata


    • US Geological Survey

      • USGS Core Science Metadata Clearinghouse:

    • ArcGIS Online

      • ESRI sponsored national geospatial data portal

CC image by RGB12 on Flickr


DataONE Search



Summary

  • Metadata is documentation of data

  • A metadata record captures critical information about the content of a dataset

  • Metadata allows data to be discovered, accessed, and re-used

  • A metadata standard provides structure and consistency to data documentation

  • Standards and tools vary – select according to defined criteria such as data type, organizational guidance, and available resources

  • Metadata is of critical importance to data developers, data users, and organizations

  • Writing quality metadata is important because records are expected to last with the data over decades

  • Metadata completes a dataset.

    Creating robust metadata is in your OWN best interest!


Additional Slides
