
2009 ESRI International Users Conference

Are Geodatabases a Suitable Long-Term Archival Format? Jeff Essic, Matt Sumner North Carolina State University Libraries. 2009 ESRI International Users Conference. July 14, 2009. NC Geospatial Data Archiving Project (NCGDAP).

Presentation Transcript


  1. Are Geodatabases a Suitable Long-Term Archival Format? Jeff Essic, Matt Sumner, North Carolina State University Libraries. 2009 ESRI International Users Conference, July 14, 2009

  2. NC Geospatial Data Archiving Project (NCGDAP) • Partnership between university library (NCSU) and state agency (NCCGIA), with Library of Congress under the National Digital Information Infrastructure and Preservation Program (NDIIPP) • Focus on state and local geospatial content in North Carolina (state demonstration) • Website: http://www.lib.ncsu.edu/ncgdap

  3. Geospatial Data Preservation Challenge: Vector Data Formats • No widely supported, open vector formats for geospatial data • Spatial Data Transfer Standard (SDTS) not widely supported • Geography Markup Language (GML): diversity of application schemas and profiles a challenge for “permanent access” • Spatial Databases • The whole is more than the sum of the parts, and the whole is very difficult to preserve • Can export individual data layers for curation, but relationships and other context are lost

  4. Challenge: Other Data Types • Cartographic Representation: software project files, PDFs, GeoPDFs, WMS images • Web 2.0 content: street views, mashups • Oblique Imagery • 3D Models

  5. Different Ways to Approach Preservation • Technical solutions: How do we preserve acquired content over the long term? • Cultural/Organizational solutions: How do we make the data more preservable, and more prone to be preserved, from the point of production? • Current use and data sharing requirements, not archiving needs, are most likely to drive improved preservability of content and improvement of metadata

  6. Question: Frequency of Capture? Content Exchange – Getting Data in Motion Repository Development Repository of Temporal Data Snapshots

  7. Repository Development • Downloading or acquiring “low hanging fruit” • Tapping into current data flows • Developing our own metadata when necessary • Converting and preserving vector data in shapefile format

  8. Data Preservation Like Fruit Desiccation? • Complex data representations can be made more preservable (yet less useful) through simplification. • Conversion of various formats to shp • Image outputs (web services, PDF maps, map image files) • Open GeoPDF standard: analogous to paper maps; combines data, symbology, annotation; more data intelligence than simple image • PDF content retained in addition to, NOT instead of, data

  9. Archival and Long Term Access Working Group • Initiated by NC Geographic Information Coordinating Council in 2008 to address growing concerns of state and local agencies about long-term access to data • Federal, state, regional, and local agency representation • Key focus: best practices for data snapshots and retention; State Archives processes (appraisal, selection, retention schedules, etc.) • Valuable outcome of NCGDAP: multiple parties and levels discussing data archiving on their own.

  10. Archival and Long Term Access Working Group • Final Report approved by NC GICC in November 2008 • Best Practices for: Archiving Schedule, Inventory, Storage Medium, Formats, Naming, Metadata, Distribution, Periodic Review, Data Integrity, Publicity • http://www.ncgicc.org/ • Wake County adopted, providing archived data online: http://www.wakegov.com/gis/download_data.htm

  11. NDIIPP Multi-State Geospatial Project • Lead organizations: North Carolina Center for Geographic Information & Analysis (NCCGIA) and State Archives of NC • Partners: leading state geospatial organizations of Kentucky and Utah; State Archives of Kentucky and Utah; NCSU Libraries in catalytic/advisory role • State-to-state and geo-to-Archives collaboration • Archives as part of statewide Spatial Data Infrastructure

  12. Geodatabase Curation Study: Overview • Three types of Geodatabases: Personal, File, SDE • Curation/Conversion options: • Archive GDB object • Export to: XML, shapefiles, GML Simple Features (open published formats) • Consideration given to objects and export files created in older ArcGIS versions: will they be compatible with newer versions?

  13. Caveats • Only tested what appeared to be the most reasonable and logical conversion options. • Numerous other possibilities not tested. • Some conversions required running overnight. • Limited time for testing multiple datasets and scenarios. • Didn’t explore GDBs with rasters. • Very limited geodatabase experience or expertise.

  14. Personal Geodatabase • Not ideal archival object • Very proprietary – ArcGIS / MS Access formats • ESRI now recommends using File GDB instead http://webhelp.esri.com/arcgisdesktop/9.3/index.cfm?TopicName=Types_of_geodatabases • Archive export formats: XML, shapefiles

  15. File Geodatabase • Potential archival object • Kentucky KYGEONET • ESRI working on low-level (non ArcObjects based) API (http://moreati.org.uk/blog/2009/03/01/shapefile-20-manifesto/ and http://events.esri.com/uc/QandA/index.cfm?fuseaction=answer&conferenceId=2A8E2713-1422-2418-7F20BB7C186B5B83&questionId=2578) • Folder/File structure • Can see “under the hood” • Requires knowledge of component parts • Archive export formats: XML, shapefiles, GML
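
Because a file geodatabase is an ordinary folder of files, it can be inventoried like any other directory. The following is a minimal sketch, not part of the original study, of recording a checksum for each component file so that a later copy can be compared against the archived one. The function names and the choice of SHA-256 are assumptions made for illustration, and the sketch says nothing about what the individual files contain.

```python
import hashlib
from pathlib import Path


def build_manifest(gdb_dir):
    """Return {relative_path: sha256_hex} for every file under gdb_dir.

    Hypothetical helper: it treats the geodatabase folder as plain files
    and does not interpret their contents.
    """
    root = Path(gdb_dir)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256()
            with path.open("rb") as handle:
                # Read in 1 MiB chunks so large tables do not fill memory.
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    digest.update(chunk)
            manifest[path.relative_to(root).as_posix()] = digest.hexdigest()
    return manifest


def changed_files(old_manifest, new_manifest):
    """List files that were added, removed, or altered between two manifests."""
    names = set(old_manifest) | set(new_manifest)
    return sorted(n for n in names if old_manifest.get(n) != new_manifest.get(n))
```

A manifest built at archiving time could be stored next to the geodatabase folder and rebuilt at each periodic review; an empty result from `changed_files` would suggest the stored copy is byte-for-byte unchanged.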

  16. File Geodatabase KYGEONET: “Snapshot File Format – Kentucky has chosen to archive its data in the form of an ESRI’s file-based geodatabase (fGDB). This file-based relational database format will allow the entire archive set to exist within it’s own container with groupings of data based upon the FGDC Metadata model (same as groupings on KYGEONET and GOS). This file format is appropriate for the storage of both raster and vector data and allows for compression. Additionally, the fGDB allows for vector topology, the inclusions of route data, and other advanced relationships that cannot be supported with the old Shapefile format.” http://www.geomapp.net/docs/ky_geoarchives_procedures.pdf

  17. SDE Geodatabase • Stored in RDBMS, so can’t be archived as a stand-alone object unless exported • Supports Historical Archiving • Commonly used among local govts. for enterprise data management • Archive export formats: XML, fGDB, shapefiles

  18. Questions for Testing • Will pGDB XML export files round-trip between 9.1 and 9.3.1? • Will fGDB XML export files round-trip between 9.2 and 9.3.1? • Will fGDB GML round-trip within 9.3.1? • Do GDBs have added value that is not represented in shapefile exports?
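
One way to make a round-trip question concrete is to summarize a dataset before export and again after import, then list what differs. The sketch below is an illustration only and was not the procedure used in the tests. It assumes each summary is a plain dictionary of layer names mapped to a feature count and a field-name-to-type table, gathered by whatever GIS tool is available; the layer and field names in the usage note are hypothetical.

```python
def compare_round_trip(before, after):
    """Return a list of differences between two dataset summaries.

    Each summary is assumed to look like:
        {layer_name: {"count": int, "fields": {field_name: type_name}}}
    """
    problems = []
    for name in sorted(set(before) | set(after)):
        if name not in after:
            problems.append(f"{name}: missing after import")
            continue
        if name not in before:
            problems.append(f"{name}: unexpected new layer")
            continue
        b, a = before[name], after[name]
        if b["count"] != a["count"]:
            problems.append(f"{name}: feature count {b['count']} -> {a['count']}")
        # Compare field types, including fields present on only one side.
        for field in sorted(set(b["fields"]) | set(a["fields"])):
            old_type = b["fields"].get(field)
            new_type = a["fields"].get(field)
            if old_type != new_type:
                problems.append(f"{name}.{field}: type {old_type} -> {new_type}")
    return problems
```

For example, if a hypothetical `parcels` layer came back with its `AREA` field changed from `Double` to `Single`, the function would report that one field; an empty list would mean the two summaries agree. Counts and field types alone would not show whether relationships, topology, or domains survived.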

  19. Personal and File GDB Export Export to XML interface Export to shapefiles Export to XML

  20. Personal GDB Tests

  21. pGDB Import of 9.1 XML 9.3.1 Failure Message 9.2 Failure Message Import in progress

  22. pGDB Export to Shapefiles Sub-domain attribute text is lost in the conversion to shapefile
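
The loss noted above can be pictured with a small sketch. It assumes, as an interpretation of the slide, that the lost text is the description attached to coded attribute values: the export keeps the code but not its meaning. The helpers below are hypothetical and use plain Python data in place of a real geodatabase; they copy each description into an extra column before export and write the lookup tables to a sidecar CSV that could be archived with the shapefile.

```python
import csv
import io


def flatten_domains(rows, domains):
    """Return copies of rows with a '<field>_TXT' description column added.

    rows:    list of attribute dictionaries (hypothetical export input)
    domains: {field_name: {code: description}}
    """
    flattened = []
    for row in rows:
        out = dict(row)  # leave the caller's row untouched
        for field, lookup in domains.items():
            code = row.get(field)
            # Codes with no known description get an empty string.
            out[field + "_TXT"] = lookup.get(code, "")
        flattened.append(out)
    return flattened


def domains_to_csv(domains):
    """Serialize the lookup tables as CSV text: field, code, description."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["field", "code", "description"])
    for field in sorted(domains):
        for code in sorted(domains[field]):
            writer.writerow([field, code, domains[field][code]])
    return buffer.getvalue()
```

Keeping both the flattened column and the sidecar table is a design choice for this sketch: the column keeps each record readable on its own, while the table preserves descriptions for codes that no record happens to use.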

  23. pGDB Upgrade to 9.3.1

  24. pGDB conversion to fGDB

  25. File GDB Tests

  26. GML Export GML “Simple Features Profile” now supported by 9.3 ArcToolbox/Data Interoperability Tools: GML support available out-of-the-box to all users

  27. File GDB/GML Test

  28. Conclusions • For archival, pGDB must be regularly upgraded, exported to shapefiles (including relational tables), and/or imported to a fGDB. • Stand-alone fGDB may be a safe archival format, following KYGEONET’s lead. • Risk: format newness & unknown future • Will feel safer after ESRI release of API.

  29. Future Study Needs • Round-trip fGDB via XML: Are complex functions, properties, and relationships preserved? • SDE Export Options: Best practices to preserve as much as possible via XML, fGDB, and/or shapefiles? • What’s the problem with the GML import?

  30. Slide Presentation: http://www.lib.ncsu.edu/ncgdap/presentations.html • Jeff Essic, Matt Sumner, Data Services Librarians, NCSU Libraries • jeff_essic@ncsu.edu, matt_sumner@ncsu.edu
