
Ingest Workflow Issues: Metadata North Carolina Geospatial Data Archiving Project




  1. Steve Morris, North Carolina State University Libraries • Ingest Workflow Issues: Metadata • North Carolina Geospatial Data Archiving Project

  2. How the Data is Received • Data is delivered as is – no control over organization of received data • Contributing organizations • County and municipal agencies • State agencies • Regional councils of government • Data transfer modes • CD/DVD, External Drive • FTP or Web Download

  3. Ingest Challenges: General • Data consists of multi-file, multi-format objects • Ancillary data files can be shared by datasets • Some formats require conversion now • Some format conversions involve one-to-many relationships • Compressed archive files are common and behave unpredictably • And all the usual challenges: format validation, validity checking, threat scanning, …

  4. Ingest Challenges: Metadata • Metadata is encoded in a variety of ways • The FGDC content standard for metadata lacked an encoding standard (it arrived pre-XML); this will soon be addressed in the ISO 19115/19139 FGDC implementation • XML (varied schemas), TXT, HTML • Metadata is missing • Only about 25% of local agencies use FGDC • Metadata is wrong • Metadata is commonly asynchronous with the data • Inconsistent use of dataset naming, etc.

  5. Some Key Decisions • Capture “transfer set” metadata • Normalize, synchronize, and remediate existing metadata, and retain the original metadata record • Treat contact information as archival • Update metadata with format conversions • Use the ESRI Profile of FGDC • Adds technical and administrative elements • Has an XML schema • ArcCatalog tool support • Use a simple rights encoding scheme • Record metadata in a workflow management database

  6. What is Transfer Set Metadata? • Administrative and technical metadata associated with a transfer device or download • Propagates to individual data objects • [Screenshot: PHP application interface for transfer set metadata capture]
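The propagation step on slide 6 could be sketched roughly as follows. This is a minimal illustration, not the project's PHP application: the field names (`source_agency`, `transfer_mode`, `received_date`) and both classes are hypothetical, since the slides do not list the elements actually captured.

```python
from dataclasses import dataclass, field


@dataclass
class TransferSet:
    # Hypothetical fields; the slides do not enumerate the real elements.
    source_agency: str
    transfer_mode: str   # e.g. "CD/DVD", "External Drive", "FTP"
    received_date: str


@dataclass
class DataObject:
    name: str
    metadata: dict = field(default_factory=dict)


def propagate(transfer_set, objects):
    """Copy transfer-set metadata onto each data object.

    Values already present on an object are kept, on the assumption
    that object-level metadata is more specific than the transfer set.
    """
    inherited = {
        "source_agency": transfer_set.source_agency,
        "transfer_mode": transfer_set.transfer_mode,
        "received_date": transfer_set.received_date,
    }
    for obj in objects:
        for key, value in inherited.items():
            obj.metadata.setdefault(key, value)
    return objects
```

Using `setdefault` is one possible policy; an archive that treats the transfer record as authoritative would overwrite instead.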

  7. If No Metadata, What Then? • Autoextract a subset of technical and descriptive metadata through ArcCatalog • Apply an agency-specific metadata template (many elements are static within the context of the agency) • Acquire information from the NC OneMap Inventory • Data Source • Contact Info • Datum, Coordinate System • Acquire information from agency web site • Avoid direct inquiries to local agencies (“contact fatigue”)
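As a rough sketch of the agency-specific template idea on slide 7, static agency values can fill only the elements a record lacks. The element names used in the usage note are invented for illustration and are not taken from the ESRI or FGDC schemas.

```python
def apply_template(record, template):
    """Return a copy of `record` with missing or empty elements
    filled from the agency template; existing values are untouched."""
    merged = dict(record)
    for element, default in template.items():
        if not merged.get(element):
            merged[element] = default
    return merged
```

For example, `apply_template({"title": "Streets", "contact": ""}, {"contact": "County GIS Office", "datum": "NAD83"})` keeps the title, fills the empty contact, and adds the datum.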

  8. What Gets Remediated and Why? • Key technical elements that are wrong • Datum, coordinate system, format, … • Title • Qualify to the agency (e.g. “Streets” becomes “Henderson County Streets”) • Keywords • Add ISO keywords • NCSU GIS Lookup terms added later if needed for access • These are basic requirements for access and use
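The title remediation on slide 8 amounts to a small normalization rule. A minimal sketch, assuming the agency name is simply prefixed when it is not already present (the function name is hypothetical):

```python
def qualify_title(title, agency):
    """Prefix the agency name unless the title already mentions it."""
    title = title.strip()
    if agency.lower() in title.lower():
        return title
    return f"{agency} {title}"


print(qualify_title("Streets", "Henderson County"))  # Henderson County Streets
```

The containment check keeps the rule idempotent, so rerunning remediation does not produce "Henderson County Henderson County Streets".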

  9. Metadata Tools • ArcCatalog • Automated metadata extraction • ArcGIS Toolbar • Metadata synchronization, normalization, templating • cns and mp • Raw text handling • Python classes • Ingest workflow

  10. Source Metadata Translation • Hub-and-spoke model à la ECHO DEPository • Repository agnostic • Modular conversion hub • Facilitates repository software migration & inter-archive exchange
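The hub-and-spoke idea on slide 10 can be illustrated with a toy converter in which every format is read into one neutral in-memory form and written back out from it, so N formats need N readers and N writers rather than a converter for every pair. The two formats here (`key: value` text and JSON) and all function names are stand-ins chosen for brevity, not the project's actual metadata encodings.

```python
import json


def read_txt(text):
    """Parse 'key: value' lines into the hub dictionary."""
    hub = {}
    for line in text.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            hub[key.strip()] = value.strip()
    return hub


def write_txt(hub):
    """Serialize the hub dictionary as 'key: value' lines."""
    return "\n".join(f"{k}: {v}" for k, v in hub.items())


# Spokes: each format registers one reader and one writer.
READERS = {"txt": read_txt, "json": json.loads}
WRITERS = {"txt": write_txt, "json": lambda hub: json.dumps(hub, sort_keys=True)}


def translate(source, src_fmt, dst_fmt):
    """Convert between any two registered formats via the hub form."""
    hub = READERS[src_fmt](source)
    return WRITERS[dst_fmt](hub)
```

Adding a new repository format then means registering one reader and one writer, which is what makes the hub modular and repository agnostic.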

  11. What is the Rights Encoding? • Purpose: Define a basic set of codes to hold dataset rights information in a script-actionable form, and assign related text for use in constructing brief rights statements. Rights propagate to individual data objects • Structure: Codes are assigned on a fixed string position basis. Rights assigned to particular user types are grouped after a flag character for that user group. • Initial User Groups: • NCSU Faculty/Staff/Students (Code “N”) • General Public (Code “P”) • Library of Congress (Code “L”) • Initial Rights Types: • Use • Redistribute • Commercial Use

  12. Sample Rights Record • M01N110P110L110 • Interpretation: This dataset was acquired in a mediated transaction directly from the data producer (acquired on media or via arranged download). There is no data agreement but there is a data disclaimer. NCSU, General Public, and LC all can use and redistribute the data but commercial use is not allowed.
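Because the codes are position-based, a record like the sample can be decoded with a few lines of script. The sketch below is inferred from that single sample and its interpretation: it assumes a three-character header (acquisition code, data-agreement flag, data-disclaimer flag) followed by four-character groups (user-group flag plus use, redistribute, and commercial-use digits), and it assumes each digit is 0 or 1. The real scheme may define other code values.

```python
USER_GROUPS = {
    "N": "NCSU Faculty/Staff/Students",
    "P": "General Public",
    "L": "Library of Congress",
}
RIGHTS = ("use", "redistribute", "commercial_use")


def parse_rights(record):
    """Decode a fixed-position rights record (layout assumed from the sample)."""
    header, body = record[:3], record[3:]
    if len(header) < 3 or len(body) % 4 != 0:
        raise ValueError(f"unexpected record length: {record!r}")
    parsed = {
        "acquisition": header[0],            # "M" = mediated, per the sample
        "data_agreement": header[1] == "1",
        "data_disclaimer": header[2] == "1",
        "groups": {},
    }
    for i in range(0, len(body), 4):
        flag, bits = body[i], body[i + 1:i + 4]
        if flag not in USER_GROUPS or set(bits) - {"0", "1"}:
            raise ValueError(f"bad group segment: {body[i:i + 4]!r}")
        parsed["groups"][USER_GROUPS[flag]] = {
            right: bit == "1" for right, bit in zip(RIGHTS, bits)
        }
    return parsed
```

Applied to `M01N110P110L110`, this yields no data agreement, a data disclaimer, and for each of the three groups use and redistribution allowed with commercial use disallowed, matching the interpretation on the slide.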

  13. Deferred Activities • Implementing METS and PREMIS • Developing a serial object metadata scheme

  14. Ongoing Challenges • When to automate and when not to • Learn first from human intervention • Minimizing risk of error related to human intervention • Accepting that ingest packages used will evolve over time (implications for archive?) • Handling post-ingest migrations

  15. Engagement Opportunities • NCGDAP partner NCCGIA runs the NC OneMap Metadata Outreach Program • Provide feedback to spatial data infrastructure about metadata inconsistencies, lack of adherence to best practices • Partner with industry and standards organizations on addressing metadata issues such as poor standards support for versioned data (e.g., through OGC Data Preservation Working Group)
