This study explores the suitability of geodatabases as long-term archival formats, focusing on North Carolina's geospatial data. It delves into the preservation challenges of vector data formats and different approaches for data retention. The research discusses technical solutions and cultural/organizational strategies for preserving geospatial content effectively, while emphasizing the importance of metadata and best practices for archiving. The study also highlights the collaboration between state agencies and archives to ensure long-term access to valuable geospatial data.
Are Geodatabases a Suitable Long-Term Archival Format?
Jeff Essic, Matt Sumner
North Carolina State University Libraries
2009 ESRI International Users Conference, July 14, 2009
NC Geospatial Data Archiving Project (NCGDAP)
• Partnership between a university library (NCSU) and a state agency (NCCGIA), with the Library of Congress under the National Digital Information Infrastructure and Preservation Program (NDIIPP)
• Focus on state and local geospatial content in North Carolina (state demonstration)
• Website: http://www.lib.ncsu.edu/ncgdap
Geospatial Data Preservation Challenge: Vector Data Formats
• No widely supported, open vector formats for geospatial data
• Spatial Data Transfer Standard (SDTS) not widely supported
• Geography Markup Language (GML): the diversity of application schemas and profiles is a challenge for “permanent access”
Spatial Databases
• The whole is more than the sum of the parts, and the whole is very difficult to preserve
• Individual data layers can be exported for curation, but relationships and other context are lost
Challenge: Other Data Types
• Cartographic representation
• Software project files, PDFs, GeoPDFs, WMS images
• Web 2.0 content: street views, mashups
• Oblique imagery
• 3D models
Different Ways to Approach Preservation
• Technical solutions: How do we preserve acquired content over the long term?
• Cultural/organizational solutions: How do we make the data more preservable, and more likely to be preserved, from the point of production?
• Current use and data-sharing requirements, not archiving needs, are the most likely drivers of improved preservability of content and better metadata
Question: Frequency of Capture?
[Diagram: Content Exchange (Getting Data in Motion); Repository Development; Repository of Temporal Data Snapshots]
Repository Development
• Downloading or acquiring “low-hanging fruit”
• Tapping into current data flows
• Developing our own metadata when necessary
• Converting and preserving vector data in shapefile format
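Converting to shapefile brings the limits of its dBASE (.dbf) attribute table along, and one that is easy to check before export is the 10-character cap on field names. A minimal Python sketch, assuming plain truncation (the renaming rules ArcGIS actually applies may differ) and a hypothetical helper name `truncation_collisions`, that flags geodatabase field names which would collide after export:

```python
# Hypothetical pre-conversion check. Shapefile attributes live in a dBASE
# (.dbf) table whose field names are limited to 10 characters, so longer
# geodatabase field names must be shortened. This assumes naive truncation;
# real export tools may rename differently.
from collections import defaultdict

DBF_MAX_FIELD_NAME = 10

def truncation_collisions(field_names):
    """Map each truncated name to the original names sharing it (only if more than one)."""
    groups = defaultdict(list)
    for name in field_names:
        groups[name[:DBF_MAX_FIELD_NAME]].append(name)
    return {short: originals for short, originals in groups.items() if len(originals) > 1}

if __name__ == "__main__":
    # Illustrative field names, not taken from any real dataset.
    fields = ["PARCEL_ID", "OWNER_NAME_FIRST", "OWNER_NAME_LAST", "ZONING"]
    print(truncation_collisions(fields))
    # {'OWNER_NAME': ['OWNER_NAME_FIRST', 'OWNER_NAME_LAST']}
```

Any collision reported here marks a place where the shapefile copy would need a documented renaming in its metadata.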
Data Preservation Like Fruit Desiccation?
• Complex data representations can be made more preservable (yet less useful) through simplification:
  - Conversion of various formats to shapefile
  - Image outputs (web services, PDF maps, map image files)
• Open GeoPDF standard
  - Analogous to paper maps
  - Combines data, symbology, annotation
  - More data intelligence than a simple image PDF
• Content retained in addition to, NOT instead of, the data
Archival and Long Term Access Working Group
• Initiated by the NC Geographic Information Coordinating Council in 2008 to address growing concerns of state and local agencies about long-term access to data
• Federal, state, regional, and local agency representation
• Key focus:
  - Best practices for data snapshots and retention
  - State Archives processes: appraisal, selection, retention schedules, etc.
• A valuable outcome of NCGDAP: multiple parties and levels of government discussing data archiving on their own
Archival and Long Term Access Working Group
• Final Report approved by NC GICC in November 2008 (http://www.ncgicc.org/)
• Best practices for: Archiving Schedule, Inventory, Storage Medium, Formats, Naming, Metadata, Distribution, Periodic Review, Data Integrity, Publicity
• Wake County adopted them and provides archived data online: http://www.wakegov.com/gis/download_data.htm
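Naming appears among the best practices, but the convention itself is not given here. The Python sketch below only illustrates the general idea of predictable, date-stamped snapshot names; the agency_layer_YYYYMMDD pattern and the helper `snapshot_name` are hypothetical, not the working group's actual rule.

```python
# Hypothetical snapshot naming helper: lowercase, no spaces, capture date
# embedded as YYYYMMDD so snapshots of the same layer sort chronologically.
# This is NOT the NC GICC convention, only an illustration of the idea.
import re
from datetime import date

def snapshot_name(agency: str, layer: str, captured: date, ext: str = "zip") -> str:
    """Build a name like 'wake_county_parcels_20090714.zip'."""
    def slug(text: str) -> str:
        # Collapse every run of characters other than a-z/0-9 into one underscore.
        return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")
    return f"{slug(agency)}_{slug(layer)}_{captured:%Y%m%d}.{ext}"

if __name__ == "__main__":
    print(snapshot_name("Wake County", "Parcels", date(2009, 7, 14)))
    # wake_county_parcels_20090714.zip
```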
NDIIPP Multi-State Geospatial Project
• Lead organizations: North Carolina Center for Geographic Information & Analysis (NCCGIA) and State Archives of NC
• Partners: leading state geospatial organizations of Kentucky and Utah; State Archives of Kentucky and Utah
• NCSU Libraries in a catalytic/advisory role
• State-to-state and geo-to-Archives collaboration
• Archives as part of the statewide Spatial Data Infrastructure
Geodatabase Curation Study: Overview
• Three types of geodatabases: Personal, File, SDE
• Curation/conversion options:
  - Archive the GDB object
  - Export to XML, shapefiles, GML Simple Features (open, published formats)
• Consideration given to objects and export files created in older ArcGIS versions: will they be compatible with newer versions?
Caveats
• Only tested what appeared to be the most reasonable and logical conversion options; numerous other possibilities were not tested
• Some conversions required running overnight
• Limited time for testing multiple datasets and scenarios
• Did not explore GDBs with rasters
• Very limited geodatabase experience or expertise
Personal Geodatabase
• Not an ideal archival object
• Very proprietary: ArcGIS / MS Access formats
• ESRI now recommends using File GDB instead (http://webhelp.esri.com/arcgisdesktop/9.3/index.cfm?TopicName=Types_of_geodatabases)
• Archive export formats: XML, shapefiles
File Geodatabase
• Potential archival object
• Kentucky KYGEONET
• ESRI working on a low-level (non-ArcObjects-based) API (http://moreati.org.uk/blog/2009/03/01/shapefile-20-manifesto/ and http://events.esri.com/uc/QandA/index.cfm?fuseaction=answer&conferenceId=2A8E2713-1422-2418-7F20BB7C186B5B83&questionId=2578)
• Folder/file structure
  - Can see “under the hood”
  - Requires knowledge of component parts
• Archive export formats: XML, shapefiles, GML
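Because a file geodatabase is an ordinary folder of component files, standard fixity practice (the “Data Integrity” item among the working group's best practices) can be applied to it without knowing what each component does. A sketch, assuming the archive stores the .gdb directory as-is and records a SHA-256 checksum per file; the function names are hypothetical.

```python
# Sketch: fixity manifest for a file geodatabase folder. Assumes the .gdb
# directory is archived as an opaque set of files; each component gets a
# SHA-256 checksum so later integrity checks can detect any change.
import hashlib
import sys
from pathlib import Path

def gdb_manifest(gdb_dir):
    """Return {relative_path: sha256_hex} for every file under gdb_dir."""
    root = Path(gdb_dir)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256()
            with path.open("rb") as fh:
                # Read in 1 MiB chunks so large components don't fill memory.
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    digest.update(chunk)
            manifest[path.relative_to(root).as_posix()] = digest.hexdigest()
    return manifest

def changed_files(old, new):
    """List component files added, removed, or altered between two manifests."""
    return sorted(k for k in old.keys() | new.keys() if old.get(k) != new.get(k))

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python gdb_manifest.py path/to/snapshot.gdb
    for name, checksum in gdb_manifest(sys.argv[1]).items():
        print(f"{checksum}  {name}")
```

Comparing a stored manifest with a freshly computed one during periodic review shows exactly which component files, if any, have changed.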
File Geodatabase
KYGEONET: “Snapshot File Format – Kentucky has chosen to archive its data in the form of an ESRI’s file-based geodatabase (fGDB). This file-based relational database format will allow the entire archive set to exist within it’s own container with groupings of data based upon the FGDC Metadata model (same as groupings on KYGEONET and GOS). This file format is appropriate for the storage of both raster and vector data and allows for compression. Additionally, the fGDB allows for vector topology, the inclusions of route data, and other advanced relationships that cannot be supported with the old Shapefile format.”
Source: http://www.geomapp.net/docs/ky_geoarchives_procedures.pdf
SDE Geodatabase
• Stored in an RDBMS, so it can’t be archived as a stand-alone object unless exported
• Supports historical archiving
• Commonly used among local governments for enterprise data management
• Archive export formats: XML, fGDB, shapefiles
Questions for Testing
• Will pGDB XML export files round-trip between 9.1 and 9.3.1?
• Will fGDB XML export files round-trip between 9.2 and 9.3.1?
• Will fGDB GML round-trip within 9.3.1?
• Do GDBs have added value that is not represented in shapefile exports?
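One way to make the round-trip questions testable is to summarize each dataset's schema before and after the export/import cycle and diff the two. A Python sketch, assuming the summaries have already been extracted as {layer: {field: type}} dictionaries (extracting them, for example from the XML workspace document, is not shown); `compare_schemas` is a hypothetical name, and the check covers only layers, fields, and types, not domains, relationships, or topology.

```python
# Sketch of a round-trip schema check. Input summaries are assumed to be
# {layer_name: {field_name: field_type}}; an empty report means no layer,
# field, or field-type change was detected at this (shallow) level.
def compare_schemas(before, after):
    """Report layers and fields that were dropped, added, or retyped."""
    report = []
    for layer in sorted(before.keys() | after.keys()):
        if layer not in after:
            report.append(f"layer dropped: {layer}")
            continue
        if layer not in before:
            report.append(f"layer added: {layer}")
            continue
        old, new = before[layer], after[layer]
        for field in sorted(old.keys() | new.keys()):
            if field not in new:
                report.append(f"{layer}.{field}: dropped")
            elif field not in old:
                report.append(f"{layer}.{field}: added")
            elif old[field] != new[field]:
                report.append(f"{layer}.{field}: {old[field]} -> {new[field]}")
    return report

if __name__ == "__main__":
    # Illustrative summaries, not results from the study.
    before = {"parcels": {"PIN": "String", "ACRES": "Double"}, "zoning": {"CODE": "SmallInteger"}}
    after = {"parcels": {"PIN": "String", "ACRES": "Float"}}
    for line in compare_schemas(before, after):
        print(line)
    # parcels.ACRES: Double -> Float
    # layer dropped: zoning
```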
Personal and File GDB Export
[Screenshots: Export to XML interface; Export to shapefiles; Export to XML]
pGDB Import of 9.1 XML
[Screenshots: 9.3.1 failure message; 9.2 failure message; import in progress]
pGDB Export to Shapefiles
• Sub-domain attribute text is lost in the conversion to shapefile
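If the lost text is the description half of a coded-value domain (the shapefile keeps only the stored codes), one workaround in the spirit of exporting relational tables alongside the shapefile is to archive each domain as a small lookup table. A sketch, assuming the code/description pairs are already known; the `LandUseCode` domain and the helper `domain_lookup_csv` are made up for illustration.

```python
# Sketch: keep coded-value domain descriptions that a shapefile export drops
# by writing each domain as a CSV lookup table stored next to the shapefile.
# The domain below is hypothetical; real pairs would come from the geodatabase.
import csv
import io

def domain_lookup_csv(domain_name, coded_values):
    """Return CSV text with one row per (code, description) pair, sorted by code."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["domain", "code", "description"])
    for code, description in sorted(coded_values.items()):
        writer.writerow([domain_name, code, description])
    return buf.getvalue()

if __name__ == "__main__":
    land_use = {1: "Residential", 2: "Commercial", 3: "Agricultural"}
    print(domain_lookup_csv("LandUseCode", land_use), end="")
```

Joining the shapefile's code column to this table restores the readable text, at the cost of one more file to keep with the snapshot.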
GML Export
• GML “Simple Features Profile” now supported by 9.3 ArcToolbox/Data Interoperability Tools
• GML support available out of the box to all users
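For orientation, a GML feature is plain XML: a feature element in an application-schema namespace whose geometry property holds a gml:Point with a gml:pos coordinate string. The sketch below builds one with Python's standard xml.etree.ElementTree. The `app` namespace, the element names `Site`, `name`, and `geometry`, and the CRS code are illustrative assumptions, and the output is not claimed to validate against the GML Simple Features profile.

```python
# Minimal GML-style point feature built with the standard library.
# GML namespace is the GML 3.1 one; the "app" namespace and element names
# are hypothetical stand-ins for a real application schema.
import xml.etree.ElementTree as ET

GML = "http://www.opengis.net/gml"
APP = "http://example.org/ncgdap-demo"  # hypothetical application namespace

ET.register_namespace("gml", GML)
ET.register_namespace("app", APP)

def point_feature(fid, name, x, y, srs):
    """Serialize one point feature; coordinate order must match the CRS axis order."""
    feature = ET.Element(f"{{{APP}}}Site", {f"{{{GML}}}id": fid})
    ET.SubElement(feature, f"{{{APP}}}name").text = name
    geometry = ET.SubElement(feature, f"{{{APP}}}geometry")
    point = ET.SubElement(geometry, f"{{{GML}}}Point", {"srsName": srs})
    ET.SubElement(point, f"{{{GML}}}pos").text = f"{x} {y}"
    return ET.tostring(feature, encoding="unicode")

if __name__ == "__main__":
    # EPSG:32119 (NAD83 / North Carolina, metres) used purely as an example CRS.
    print(point_feature("site.1", "Example site", 640000.0, 225000.0, "EPSG:32119"))
```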
Conclusions
• For archival, a pGDB must be regularly upgraded, exported to shapefiles (including relational tables), and/or imported into an fGDB
• A stand-alone fGDB may be a safe archival format, following KYGEONET’s lead
  - Risk: format newness and unknown future
  - Will feel safer after ESRI releases the API
Future Study Needs
• Round-trip fGDB via XML: are complex functions, properties, and relationships preserved?
• SDE export options: what are the best practices for preserving as much as possible via XML, fGDB, and/or shapefiles?
• What’s the problem with the GML import?
Slide Presentation: http://www.lib.ncsu.edu/ncgdap/presentations.html
Jeff Essic, Matt Sumner
Data Services Librarians, NCSU Libraries
jeff_essic@ncsu.edu, matt_sumner@ncsu.edu