Ingest Workflow Issues: Metadata North Carolina Geospatial Data Archiving Project

Steve Morris

North Carolina State University Libraries

Ingest Workflow Issues: Metadata North Carolina Geospatial Data Archiving Project
How the Data is Received
  • Data is delivered as is – no control over organization of received data
  • Contributing organizations
    • County and municipal agencies
    • State agencies
    • Regional councils of government
  • Data transfer modes
    • CD/DVD, External Drive
    • FTP or Web Download
Ingest Challenges: General

  • Data consists of multi-file, multi-format objects
  • Ancillary data files can be shared by datasets
  • Some formats require conversion now
  • Some format conversions involve one-to-many relationships
  • Compressed archive files are common and behave unpredictably
  • And all the usual challenges: format validation, validity checking, threat scanning, …
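
One way to picture the multi-file object problem is a grouping pass that collects files sharing a directory and base name into a single candidate object. The sketch below is illustrative only: the function name `group_by_stem` and the sample file names are hypothetical, and the slides do not describe how NCGDAP actually groups files.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_stem(paths):
    """Group paths that share a directory and base name into one object."""
    groups = defaultdict(list)
    for p in paths:
        path = PurePosixPath(p)
        # Key on the directory plus the lower-cased name before the first dot,
        # so "streets.shp" and "streets.shp.xml" land in the same group.
        stem = path.name.split(".", 1)[0].lower()
        groups[str(path.parent / stem)].append(p)
    return {key: sorted(members) for key, members in groups.items()}


# Hypothetical delivery: a vector layer and a georeferenced image.
files = [
    "roads/streets.shp", "roads/streets.shx", "roads/streets.dbf",
    "roads/streets.shp.xml", "parcels/parcels.tif", "parcels/parcels.tfw",
]
objects = group_by_stem(files)
```

A pass like this does not cover ancillary files shared by several datasets, which would need to be linked to more than one object.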

Ingest Challenges: Metadata
  • Metadata is encoded in a variety of ways
    • The FGDC content standard for metadata lacked an encoding standard (it arrived pre-XML); this will soon be addressed in the ISO 19115/19139 FGDC implementation
    • XML (varied schemas), TXT, HTML
  • Metadata is missing
    • Only about 25% of local agencies use FGDC
  • Metadata is wrong
    • Metadata is commonly out of sync with the data
  • Inconsistent use of dataset naming, etc.
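
Because metadata arrives as XML, TXT, or HTML, an ingest script might begin by guessing the encoding of each metadata file. The following is a rough sketch under stated assumptions: the function name `sniff_metadata_encoding` is hypothetical, and the check relies only on the leading characters of the file, not on any schema validation.

```python
def sniff_metadata_encoding(text):
    """Guess whether a metadata file is HTML, XML, or plain text."""
    head = text.lstrip()[:512].lower()
    # Check HTML first, since XHTML files may also start with an XML declaration.
    if "<html" in head or head.startswith("<!doctype html"):
        return "html"
    # Assumption: XML records start with a declaration or a <metadata> root.
    if head.startswith("<?xml") or head.startswith("<metadata"):
        return "xml"
    return "txt"
```

Files classified as XML would still need a second step to determine which of the varied schemas they follow.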
Some Key Decisions
  • Capture “transfer set” metadata
  • Normalize, synchronize, and remediate existing metadata, and retain original metadata record
  • Treat contact information as archival
  • Update metadata with format conversions
  • Use ESRI Profile of FGDC
    • Added technical and administrative elements
    • Has an XML schema
    • ArcCatalog tool support
  • Use simple rights encoding scheme
  • Record metadata in a workflow management database
What is Transfer Set Metadata?
  • Administrative and technical metadata associated with a transfer device or download
  • Propagates to individual data objects

PHP Application Interface for Transfer Set Metadata Capture
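
The propagation of transfer set metadata to individual data objects can be sketched as follows. This is a minimal illustration, not the project's PHP application: the class names, field names, and sample values are all hypothetical.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class TransferSet:
    # Hypothetical fields describing one transfer device or download.
    agency: str
    mode: str          # e.g. "CD/DVD", "External Drive", "FTP"
    received: str      # date of receipt as an ISO 8601 string


@dataclass
class DataObject:
    name: str
    metadata: dict = field(default_factory=dict)


def propagate(transfer_set, objects):
    """Copy transfer set metadata onto each object without overwriting."""
    inherited = asdict(transfer_set)
    for obj in objects:
        for key, value in inherited.items():
            # setdefault keeps any value already recorded on the object.
            obj.metadata.setdefault(f"transfer_{key}", value)
    return objects


ts = TransferSet(agency="Henderson County", mode="FTP", received="2006-01-15")
streets = DataObject("streets")
propagate(ts, [streets])
```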

If No Metadata, What Then?
  • Autoextract a subset of technical and descriptive metadata through ArcCatalog
  • Apply an agency-specific metadata template (many elements are static within the context of the agency)
  • Acquire information from the NC OneMap Inventory
    • Data Source
    • Contact Info
    • Datum, Coordinate System
  • Acquire information from agency web site
  • Avoid direct inquiries to local agencies (“contact fatigue”)
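
The fallback sources above can be combined so that each later source only fills elements left empty by an earlier one. The sketch below assumes a precedence of autoextracted values, then the agency template, then the NC OneMap Inventory; that ordering, the function name `fill_from_sources`, and the element names are assumptions for illustration, not something the slides specify.

```python
def fill_from_sources(extracted, template, inventory):
    """Build a record where later sources only fill still-empty elements."""
    record = {}
    for source in (extracted, template, inventory):
        for key, value in source.items():
            # Skip empty values and keep anything an earlier source supplied.
            if value and not record.get(key):
                record[key] = value
    return record


# Hypothetical inputs for one dataset.
extracted = {"format": "Shapefile", "datum": ""}
template = {"contact": "County GIS Office", "datum": "NAD83"}
inventory = {"datum": "NAD27", "data_source": "County survey"}
record = fill_from_sources(extracted, template, inventory)
```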
What Gets Remediated and Why?
  • Key technical elements that are wrong
    • Datum, coordinate system, format, …
  • Title
    • Qualify to the agency (e.g. “Streets” becomes “Henderson County Streets”)
  • Keywords
    • Add ISO keywords
    • NCSU GIS Lookup terms added later if needed for access

These are basic requirements for access and use
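
Title remediation of the kind shown in the “Henderson County Streets” example could be scripted roughly as below. This is a sketch only; the function name `qualify_title` is hypothetical, and it assumes the agency name is already known from the transfer set.

```python
def qualify_title(title, agency):
    """Prefix a dataset title with the agency name unless already present."""
    title = " ".join(title.split())  # collapse stray whitespace
    if agency.lower() in title.lower():
        return title
    return f"{agency} {title}"


qualified = qualify_title("Streets", "Henderson County")
```

The check for an existing agency name keeps the step safe to run more than once on the same record.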

Metadata Tools
  • ArcCatalog
    • Automated metadata extraction
  • ArcGIS Toolbar
    • Metadata synchronization, normalization, templating
  • cns and mp
    • Raw text handling
  • Python classes
    • Ingest workflow
Source Metadata Translation
  • Hub-and-spoke model à la ECHO DEPository
    • repository agnostic
    • modular conversion hub
    • facilitate repository software migration & inter-archive exchange
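
In a hub-and-spoke design, each format needs only a reader into a common hub record and a writer out of it, so adding a format means adding one spoke instead of a converter for every pair of formats. The sketch below shows the general pattern with two toy formats; the class `MetadataHub`, the format names, and the dictionary hub record are hypothetical and are not taken from the ECHO DEPository or NCGDAP code.

```python
import json


class MetadataHub:
    """Registry of per-format readers and writers around a common record."""

    def __init__(self):
        self._readers = {}
        self._writers = {}

    def register(self, fmt, reader, writer):
        self._readers[fmt] = reader
        self._writers[fmt] = writer

    def translate(self, raw, source, target):
        # Every translation passes through the hub record (a plain dict here).
        hub_record = self._readers[source](raw)
        return self._writers[target](hub_record)


def read_text(raw):
    """Parse 'Key: value' lines into a dict with lower-cased keys."""
    pairs = (line.split(":", 1) for line in raw.splitlines() if ":" in line)
    return {key.strip().lower(): value.strip() for key, value in pairs}


def write_text(record):
    return "\n".join(f"{key}: {value}" for key, value in sorted(record.items()))


hub = MetadataHub()
hub.register("text", read_text, write_text)
hub.register("json", json.loads, lambda rec: json.dumps(rec, sort_keys=True))

out = hub.translate("Title: Henderson County Streets\nDatum: NAD83", "text", "json")
```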
What is the Rights Encoding?
  • Purpose: Define a basic set of codes to hold dataset rights information in a script-actionable form, and assign related text for use in constructing brief rights statements. Rights information propagates to individual data objects.
  • Structure: Codes are assigned on a fixed string position basis. Rights assigned to particular user types are grouped after a flag character for that user group.
  • Initial User Groups:
    • NCSU Faculty/Staff/Students (Code “N”)
    • General Public (Code “P”)
    • Library of Congress (Code “L”)
  • Initial Rights Types:
    • Use
    • Redistribute
    • Commercial Use
Sample Rights Record

M01N110P110L110

Interpretation: This dataset was acquired in a mediated transaction directly from the data producer (acquired on media or via arranged download). There is no data agreement but there is a data disclaimer. NCSU, General Public, and LC all can use and redistribute the data but commercial use is not allowed.
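
A script could decode a record such as the sample above along the following lines. The layout used here is inferred from this single sample and its interpretation: one acquisition character, one data agreement flag, one data disclaimer flag, then a four-character group per user type (flag character plus Use, Redistribute, and Commercial Use bits). The full code table is not given in the slides, so the function `parse_rights` and the field names are assumptions.

```python
USER_GROUPS = {
    "N": "NCSU Faculty/Staff/Students",
    "P": "General Public",
    "L": "Library of Congress",
}
RIGHTS = ("use", "redistribute", "commercial_use")


def parse_rights(record):
    """Decode a fixed-position rights record (layout inferred from one sample)."""
    header, body = record[:3], record[3:]
    if len(header) < 3 or len(body) % 4 != 0:
        raise ValueError(f"unexpected record length: {record!r}")
    parsed = {
        "acquisition": header[0],            # "M" = mediated transaction
        "data_agreement": header[1] == "1",  # assumed flag position
        "data_disclaimer": header[2] == "1", # assumed flag position
        "groups": {},
    }
    for i in range(0, len(body), 4):
        flag, bits = body[i], body[i + 1:i + 4]
        if flag not in USER_GROUPS or any(b not in "01" for b in bits):
            raise ValueError(f"unrecognized group {body[i:i + 4]!r}")
        parsed["groups"][USER_GROUPS[flag]] = {
            right: bit == "1" for right, bit in zip(RIGHTS, bits)
        }
    return parsed


rights = parse_rights("M01N110P110L110")
```

Keeping the user-group flag characters in the string lets a script locate each group even if further groups are added later.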

Deferred Activities
  • Implementing METS and PREMIS
  • Developing a serial object metadata scheme
Ongoing Challenges
  • When to automate and when not to
    • Learn first from human intervention
    • Minimize the risk of error related to human intervention
  • Accepting that ingest packages used will evolve over time (implications for archive?)
  • Handling post-ingest migrations
Engagement Opportunities
  • NCGDAP partner NCCGIA runs the NC OneMap Metadata Outreach Program
  • Provide feedback to spatial data infrastructure about metadata inconsistencies, lack of adherence to best practices
  • Partner with industry and standards organizations on addressing metadata issues such as poor standards support for versioned data (e.g., through OGC Data Preservation Working Group)