
Project Planning Workshop Woods Hole July 11-13, 2005 Multi-Institution Testbed for

Project Planning Workshop Woods Hole July 11-13, 2005 Multi-Institution Testbed for Scalable Digital Archiving NSF CISE/Library of Congress DIGARCH Award Stephen Miller Scripps Institution of Oceanography Bob Detrick Woods Hole Oceanographic Institution John Helly


Presentation Transcript


  1. Project Planning Workshop Woods Hole July 11-13, 2005 Multi-Institution Testbed for Scalable Digital Archiving NSF CISE/Library of Congress DIGARCH Award Stephen Miller Scripps Institution of Oceanography Bob Detrick Woods Hole Oceanographic Institution John Helly San Diego Supercomputer Center

  2. Our DIGARCH project website SIO/WHOI/SDSC Plone-driven All members can upload documents http://gdc.ucsd.edu:8080/digarch

  3. Why are NSF and the Library of Congress interested? Digital archiving and preservation

  4. A long history of backup and recovery Capitol burned August 24, 1814 Library of Congress offsite recovery Thomas Jefferson’s Library

  5. What is the national DIGARCH program? Bill Lefurgy Library of Congress Larry Brandt NSF/CISE 10 new awards “Produce results within 1 year” http://gdc.ucsd.edu:8080/digarch/about-project/DIGARCH-Initiatiative/

  6. Alignment: SIO/WHOI needs Match DIGARCH interests

  7. 1. Community Goals 2. Barriers to advances 3. Cyber-capabilities

  8. 1. Community Goals • Broad support • Across disciplines • And institutions • Research • And education

  9. Guarantee long-term preservation Gulf of California 1939 Expedition, R/V E W Scripps

  10. Need more than data storage • Need metadata • Enable re-use • Also need infrastructure • Networked community tools, archives, understanding

  11. Why re-use data? • New ship time expensive ($22K/day) • Use archives for: • 1. Regional synthesis projects • 2. Support other disciplines • 3. Monitor environmental changes through time • Before and after • Earthquakes, slumps, seeps • Volcanoes …

  12. 2. Barriers to advances

  13. Data from a firehose • Can we keep up? • Shipboard data rates – yes • Satellite links – maybe • depends on heading • Metadata – yes, but • not widely implemented • Preservation – maybe • Community usage – help needed from Cyberinfrastructure Tiffany Houghton, SDSC, on R/V Sproul

  14. We can archive from paper documents Track plots Cruise reports Handwritten and printed data

  15. But digital preservation is risky business • Endangered Species • 9-track tapes • Exabytes fail • Even CDs fail • RAIDS fail “Shoe-box” archiving not to be trusted

  16. Solution: Active Archiving • “Don’t trust any media, person or process” • Actively monitor status • Migrate to new storage media • Mirror on multiple systems daily • Backup to independent sites Technology makes this possible, just need to do it
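
The "actively monitor status" and "mirror on multiple systems" points can be illustrated with a checksum audit. The following is only a minimal sketch, not the project's actual tooling: the function names (`sha256_of`, `build_manifest`, `audit_mirror`) and the choice of SHA-256 are assumptions made for illustration. It compares a mirror directory against the primary copy and reports files that are missing, altered, or unexpected.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in root.rglob("*")
        if p.is_file()
    }


def audit_mirror(primary: Path, mirror: Path) -> dict[str, list[str]]:
    """Compare a mirror against the primary copy (hypothetical helper)."""
    expected = build_manifest(primary)
    actual = build_manifest(mirror)
    return {
        # Present on the primary but absent from the mirror.
        "missing": sorted(set(expected) - set(actual)),
        # Present on both, but the contents differ.
        "corrupt": sorted(
            name
            for name in expected.keys() & actual.keys()
            if expected[name] != actual[name]
        ),
        # Present only on the mirror.
        "extra": sorted(set(actual) - set(expected)),
    }
```

Run on a schedule, an audit like this turns "don't trust any media" into a routine check: any non-empty list in the report is a prompt to re-copy from a known-good site before the damage spreads.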

  17. 3. Emerging Cyber-capabilities • SIOExplorer digital library • Design for scalability • Automate harvesting • Collection Builder’s Toolkit for other projects • Crossing institutional boundaries • Multi-Institution Testbed • SIO, WHOI, SDSC

  18. SIOExplorer • Digital Library • Community access • Data • Images • Documents • 647 cruises • 150,000 objects • Multiple federated collections http://SIOExplorer.ucsd.edu

  19. Collection status board • Live on web • Auto-updated • Monitor status of 800 cruises, work in progress • 4000 files, 10 GB per cruise Click for individual cruise status
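
As a rough scale check on the status-board figures, the short calculation below multiplies them out. It assumes that "4000 files, 10 GB" is a typical per-cruise figure that applies to all 800 cruises, which the slide does not state explicitly, and it uses decimal terabytes.

```python
# Figures taken from the slide; treating them as uniform per-cruise
# averages is an assumption made for this estimate.
CRUISES = 800
FILES_PER_CRUISE = 4_000
GB_PER_CRUISE = 10

total_files = CRUISES * FILES_PER_CRUISE
total_tb = CRUISES * GB_PER_CRUISE / 1_000  # decimal terabytes

print(f"{total_files:,} files, about {total_tb:g} TB")
# prints "3,200,000 files, about 8 TB"
```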

  20. Issue for future use: • Access to complete cruise collections • Current practice hit-or-miss • Only selected data streams archived • Cyberinfrastructure allows comprehensive solution • Auto-harvesting and archiving • Alvin and Jason data in context of entire cruise Claim: Very little additional cost to archive everything
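
The auto-harvesting idea can be sketched as a directory walk that records every file on a cruise disk, not only the data streams someone selected by hand. This is an illustrative sketch under stated assumptions: the `STREAMS` suffix table and the `harvest` and `summarize` helpers are hypothetical and do not describe the project's harvesting rules.

```python
from collections import Counter
from pathlib import Path

# Hypothetical mapping from file suffix to data stream; a real
# harvester would apply the project's own metadata rules.
STREAMS = {
    ".mb": "multibeam",
    ".nav": "navigation",
    ".jpg": "images",
    ".pdf": "documents",
}


def harvest(cruise_dir: Path) -> list[dict]:
    """Record every file under cruise_dir, classified or not."""
    records = []
    for path in cruise_dir.rglob("*"):
        if path.is_file():
            records.append(
                {
                    "path": path.relative_to(cruise_dir).as_posix(),
                    "bytes": path.stat().st_size,
                    # Unknown file types are kept and flagged, never skipped.
                    "stream": STREAMS.get(path.suffix.lower(), "unclassified"),
                }
            )
    return records


def summarize(records: list[dict]) -> dict[str, int]:
    """Count harvested files per data stream."""
    return dict(Counter(record["stream"] for record in records))
```

Keeping an "unclassified" bucket reflects the slide's claim: once the walk is automated, archiving the whole cruise costs little more than archiving selected streams.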

  21. Design to Overcome Project Barriers Build scalable digital library Federate independent authorities 4 Operational collections 3 Work-in-progress John Helly, IT Architect, SDSC

  22. Multiple access methods • Google • No interface • Just type name of cruise • Basic web form • Text-based search for experts • Java CruiseViewer • Full graphical search • Web services • Computer-to-computer • Enable next generation interoperability

  23. Java CruiseViewer • Full graphical search • All capabilities • Any combination of collections • Metadata • Oracle or PostgreSQL • Data • Storage Resource Broker • User • Graphical search • Keyword search Discover content Browse metadata View or download objects Don Sutton, SDSC Search results for visualization objects
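
The split between a metadata database and a separate data store can be sketched as follows. This is a simplified illustration, not the CruiseViewer implementation: `sqlite3` stands in for Oracle or PostgreSQL, and the table layout, object identifiers, collection names, and storage paths are all hypothetical placeholders. The search touches only metadata and returns a pointer to where the object lives in the data store.

```python
import sqlite3

# sqlite3 stands in for the Oracle or PostgreSQL metadata catalog.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE objects (
           object_id    TEXT PRIMARY KEY,
           collection   TEXT NOT NULL,
           keywords     TEXT NOT NULL,
           storage_path TEXT NOT NULL  -- location in the data store
       )"""
)
# Placeholder records for illustration only.
conn.executemany(
    "INSERT INTO objects VALUES (?, ?, ?, ?)",
    [
        ("obj-001", "collection-a", "multibeam bathymetry", "/store/a/obj-001"),
        ("obj-002", "collection-a", "cruise report", "/store/a/obj-002"),
        ("obj-003", "collection-b", "multibeam visualization", "/store/b/obj-003"),
    ],
)


def keyword_search(conn, keyword, collections):
    """Search metadata across any combination of collections."""
    marks = ", ".join("?" for _ in collections)
    query = (
        "SELECT object_id, storage_path FROM objects "
        f"WHERE keywords LIKE ? AND collection IN ({marks}) "
        "ORDER BY object_id"
    )
    return conn.execute(query, [f"%{keyword}%", *collections]).fetchall()


print(keyword_search(conn, "multibeam", ["collection-a", "collection-b"]))
# prints [('obj-001', '/store/a/obj-001'), ('obj-003', '/store/b/obj-003')]
```

Because the catalog holds only descriptions and pointers, the bulk data can be migrated or replicated without changing how users discover it.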

  24. Launch visualization experiences • Visualization of multibeam • seafloor mapping • swath sonar data • 300 cruises since 1982 • 20-km wide swaths • Sonar quality control • Geological research • Education Download free viewer http://www.ivs.unb.ca/products/iview3d/

  25. Other organizations using mtf technology • CUAHSI Consortium of Universities for Advanced Hydrologic Science, Inc. • Major technology co-development • 95 institutional members • WHOI – DIGARCH Multi-Institution Testbed project • Bob Detrick • CCOM/UNH cruise and multibeam archives • Jim Case, Larry Mayer • MBARI – Monterey Bay Aquarium Research Institute collection building in progress • Dave Caress, Andrew Chase • SOEST/HAWAII – April 4-26, 2005 realtime digital library testing R/V Kilo Moana • NIWA – Digital-Library-in-a-Box tested on R/V Tangaroa in New Zealand • John Helly, Don Robertson • Arctic DMS - Data Management System under development • Margo Edwards (Hawaii), Dawn Wright (Oregon State)

  26. Closely related project – IODP Site Survey Data Bank • 6-9 years of support • Digital Library Technology • Modular metadata tools • Webform user interfaces • Reliable servers and storage • IODP interested in access to • SIO and WHOI collections • Cruise • Alvin • Jason

  27. Multi-Institution Testbed • for • Scalable Digital Archiving • Extend SIOExplorer approach to WHOI • Integrate SIO, SDSC and WHOI tools and data • 30 years of WHOI cruise data • 4098 Alvin submersible dives • Jason ROV surveys (200 DVD per cruise) Results from 1600 NSF awards online

  28. WHOI cruises 800 cruises since 1930

  29. 4098 Alvin dives Since June 26, 1964

  30. Project Challenges • Auto-harvest data, metadata • “Shoe-box archives” only • prior to 2002 • Build distributed digital library • Both institutions • Ships and submersibles • Extend WHOI data exploration tools • Persistent digital library objects Interoperability across institutions

  31. Project Facilities • UCSD server • San Diego Supercomputer Center • Dell PowerEdge 2850 server • Dell PowerVault 220S SCSI storage (4 TB) • basalt.sdsc.edu • Staging and backup area • Geological Data Center, SIO • Dell PowerEdge 2850 server • Dell PowerVault 220S SCSI storage (2 TB) • gdcdb.ucsd.edu • Also Sun workstations • 4 RAID systems • WHOI server • Dell PowerEdge • Storage Dru Clark, Uta Peckman at GDC

  32. Project Identity Decision • Do we maintain separate identities? • SIOExplorer • WHOIexplorer • Or create new integrated system • OceanExplorer (or other name) • Select collections SIO or WHOI • Future expansion LDEO, UH, UW, NGDC, even IFREMER • In either case archives will be distributed and replicated

  33. What do we need to accomplish this year? • Proof of concept for Library of Congress / NSF • Working multi-institution testbed for archiving • Define achievable goals • Presentations • AGU • Abstracts due Sept 8, meeting Dec 5-9 (San Francisco) • DIGARCH All-PI and digital government conference • May 21-24 2006 (Marina del Rey?) • Preparation for continued effort • Identify sources of funding

  34. Future plans • 1 year no-cost extension • Complete the prototype testbed • New support for • Harvesting at-risk legacy data • Cruises, Alvin, Jason • Harvesting data from new cruises • Other ideas? • Datasets to add • Technology for archiving and display • Partnerships
