

Steve Morris

North Carolina State University Libraries

Ingest Workflow Issues: Metadata

North Carolina Geospatial Data Archiving Project

How the Data is Received

  • Data is delivered as is – no control over organization of received data

  • Contributing organizations

    • County and municipal agencies

    • State agencies

    • Regional councils of government

  • Data transfer modes

    • CD/DVD, External Drive

    • FTP or Web Download

Ingest Challenges: General

Data consists of multi-file, multi-format objects

Ancillary data files can be shared by datasets

Some formats require conversion now

Some format conversions involve one-to-many relationships

Compressed archive files are common and behave unpredictably

And all the usual challenges: format validation, validity checking, threat scanning, …

Ingest Challenges: Metadata

  • Metadata is encoded in a variety of ways

    • The FGDC content standard for metadata lacks an encoding standard (it arrived pre-XML); this will soon be addressed in the ISO 19115/19139 FGDC implementation

    • XML (varied schemas), TXT, HTML

  • Metadata is missing

    • Only about 25% of local agencies use FGDC

  • Metadata is wrong

    • Metadata is commonly asynchronous with the data

  • Inconsistent use of dataset naming, etc.

Some Key Decisions

  • Capture “transfer set” metadata

  • Normalize, synchronize, and remediate existing metadata, and retain original metadata record

  • Treat contact information as archival

  • Update metadata with format conversions

  • Use ESRI Profile of FGDC

    • Adds technical and administrative elements

    • Has an XML schema

    • ArcCatalog tool support

  • Use simple rights encoding scheme

  • Record metadata in a workflow management database

What is Transfer Set Metadata?

  • Administrative and technical metadata associated with a transfer device or download

  • Propagates to individual data objects

PHP Application Interface for Transfer Set Metadata Capture
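
The propagation of transfer set metadata to individual data objects can be sketched roughly as follows. This is a minimal illustration in Python, not the project's PHP application: the class names, field names, and sample values are all assumptions invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class TransferSet:
    """Administrative and technical metadata for one delivery (hypothetical fields)."""
    transfer_id: str
    source_agency: str
    transfer_mode: str   # e.g. "DVD", "External Drive", "FTP", "Web Download"
    date_received: str   # ISO 8601 date string


@dataclass
class DataObject:
    """One dataset found in the delivery (hypothetical structure)."""
    name: str
    metadata: dict = field(default_factory=dict)


def propagate(transfer_set, objects):
    """Copy transfer-set metadata onto each data object without overwriting
    values the object already carries."""
    inherited = {
        "transfer_id": transfer_set.transfer_id,
        "source_agency": transfer_set.source_agency,
        "transfer_mode": transfer_set.transfer_mode,
        "date_received": transfer_set.date_received,
    }
    for obj in objects:
        for key, value in inherited.items():
            obj.metadata.setdefault(key, value)
    return objects


# Invented sample values, for illustration only.
ts = TransferSet("ts-0001", "Example County", "DVD", "2006-01-01")
datasets = propagate(ts, [DataObject("Streets"), DataObject("Parcels")])
```

Using `setdefault` means object-level values take precedence over inherited ones; whether the real workflow resolves conflicts this way is not stated in the slides.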

If No Metadata, What Then?

  • Autoextract a subset of technical and descriptive metadata through ArcCatalog

  • Apply an agency-specific metadata template (many elements are static within the context of the agency)

  • Acquire information from the NC OneMap Inventory

    • Data Source

    • Contact Info

    • Datum, Coordinate System

  • Acquire information from agency web site

  • Avoid direct inquiries to local agencies (“contact fatigue”)
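
One way to picture the agency-specific template step is as a merge in which auto-extracted values win and the template only fills gaps. The sketch below assumes metadata records are flat dictionaries; the element names, the agency, and the sample values are hypothetical.

```python
def apply_template(extracted, template):
    """Fill gaps in an extracted record from an agency template.

    Values already present (and non-empty) in `extracted` win; the template
    only supplies what is missing.
    """
    merged = dict(template)
    merged.update({k: v for k, v in extracted.items() if v not in (None, "")})
    return merged


# Hypothetical template: elements that rarely change within one agency.
example_template = {
    "originator": "Example County",
    "contact": "GIS Department",
    "datum": "NAD83",
}

# Hypothetical auto-extracted record with an empty contact element.
extracted = {"title": "Streets", "format": "Shapefile", "contact": ""}

record = apply_template(extracted, example_template)
```

Here `record` keeps the extracted title and format and takes originator, contact, and datum from the template.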

What Gets Remediated and Why?

  • Key technical elements that are wrong

    • Datum, coordinate system, format, …

  • Title

    • Qualify to the agency (e.g. “Streets” becomes “Henderson County Streets”)

  • Keywords

    • Add ISO keywords

    • NCSU GIS Lookup terms added later if needed for access

These are basic requirements for access and use
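
The title remediation rule above is simple enough to sketch directly. The function name is an assumption, and the check for an existing prefix is a guess at sensible behavior rather than a documented rule.

```python
def qualify_title(title, agency):
    """Prefix a generic dataset title with the agency name, unless the
    title already starts with it (case-insensitive)."""
    title = title.strip()
    agency = agency.strip()
    if title.lower().startswith(agency.lower()):
        return title
    return f"{agency} {title}"


print(qualify_title("Streets", "Henderson County"))  # Henderson County Streets
```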

Metadata Tools

  • ArcCatalog

    • Automated metadata extraction

  • ArcGIS Toolbar

    • Metadata synchronization, normalization, templating

  • cns and mp

    • Raw text handling

  • Python classes

    • Ingest workflow

Source Metadata Translation

  • Hub-and-spoke model a la Echo DEPository

    • repository agnostic

    • modular conversion hub

    • facilitate repository software migration & inter-archive exchange
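
In a hub-and-spoke design, each format needs only one converter into the hub representation and one out of it, so N formats require 2N converters instead of N(N-1) pairwise ones. The sketch below illustrates the idea with two toy text formats; the registry, the function names, and the formats themselves are hypothetical and stand in for real metadata schemas.

```python
# Each format registers one reader (format -> hub) and one writer (hub -> format).
READERS = {}
WRITERS = {}


def register(fmt, reader, writer):
    READERS[fmt] = reader
    WRITERS[fmt] = writer


def translate(record, source_fmt, target_fmt):
    """Convert via the hub: source text -> hub dict -> target text."""
    hub = READERS[source_fmt](record)
    return WRITERS[target_fmt](hub)


# Two toy formats (hypothetical): "key=value" lines and "key: value" lines.
def read_kv(text, sep):
    pairs = (line.split(sep, 1) for line in text.splitlines() if sep in line)
    return {k.strip(): v.strip() for k, v in pairs}


def write_kv(hub, sep):
    return "\n".join(f"{k}{sep}{v}" for k, v in hub.items())


register("equals", lambda t: read_kv(t, "="), lambda h: write_kv(h, "="))
register("colon", lambda t: read_kv(t, ":"), lambda h: write_kv(h, ": "))

converted = translate("title=Streets\ndatum=NAD83", "equals", "colon")
```

Adding a third format means registering one more reader and writer; the existing converters are untouched, which is what keeps the hub modular and repository agnostic.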

What is the Rights Encoding?

  • Purpose: Define a basic set of codes that hold dataset rights information in a script-actionable form, and assign related text for use in constructing brief rights statements. The encoding propagates to individual data objects.

  • Structure: Codes are assigned on a fixed string position basis. Rights assigned to particular user types are grouped after a flag character for that user group.

  • Initial User Groups:

    • NCSU Faculty/Staff/Students (Code “N”)

    • General Public (Code “P”)

    • Library of Congress (Code “L”)

  • Initial Rights Types:

    • Use

    • Redistribute

    • Commercial Use

Sample Rights Record


Interpretation: This dataset was acquired in a mediated transaction directly from the data producer (acquired on media or via arranged download). There is no data agreement but there is a data disclaimer. NCSU, General Public, and LC all can use and redistribute the data but commercial use is not allowed.
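
To make the fixed-position idea concrete, here is a sketch of how a script might parse such a code and build a brief statement. The layout is hypothetical and is not the project's actual code table: it assumes each user-group flag ("N", "P", "L") is followed by three characters for Use, Redistribute, and Commercial Use, with "1" meaning allowed and "0" not allowed. It also leaves out the acquisition, agreement, and disclaimer information mentioned in the interpretation above.

```python
RIGHTS = ("use", "redistribute", "commercial_use")   # order = string position
GROUPS = {"N": "NCSU", "P": "General Public", "L": "Library of Congress"}


def parse_rights(code):
    """Parse a hypothetical code such as 'N110P110L110'.

    Each group flag is followed by one character per right:
    '1' means allowed, '0' means not allowed.
    """
    width = 1 + len(RIGHTS)
    if len(code) % width != 0:
        raise ValueError(f"unexpected code length: {code!r}")
    parsed = {}
    for start in range(0, len(code), width):
        flag, bits = code[start], code[start + 1:start + width]
        if flag not in GROUPS or set(bits) - {"0", "1"}:
            raise ValueError(f"bad segment: {code[start:start + width]!r}")
        parsed[GROUPS[flag]] = {r: b == "1" for r, b in zip(RIGHTS, bits)}
    return parsed


def rights_statement(code):
    """Build a brief statement listing the rights each group holds."""
    parts = []
    for group, rights in parse_rights(code).items():
        allowed = [r.replace("_", " ") for r, ok in rights.items() if ok]
        parts.append(f"{group}: {', '.join(allowed) or 'no rights'}")
    return "; ".join(parts)


print(rights_statement("N110P110L110"))
```

Under this assumed layout, "N110P110L110" expresses the situation described in the interpretation: all three groups may use and redistribute the data, and none may use it commercially.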

Deferred Activities

  • Implementing METS and PREMIS

  • Developing a serial object metadata scheme

Ongoing Challenges

  • When to automate and when not to

    • Learn first from human intervention

    • Minimizing risk of error related to human intervention

  • Accepting that ingest packages used will evolve over time (implications for archive?)

  • Handling post-ingest migrations

Engagement Opportunities

  • NCGDAP partner NCCGIA runs the NC OneMap Metadata Outreach Program

  • Provide feedback to spatial data infrastructure about metadata inconsistencies, lack of adherence to best practices

  • Partner with industry and standards organizations on addressing metadata issues such as poor standards support for versioned data (e.g., through OGC Data Preservation Working Group)