
Bridging the Gap between Libraries and Data Archives: Progress Report

Bridging the Gap between Libraries and Data Archives: Progress Report. Roger Revelle, Gulf of California Expedition, 1939. JISC/NSF Digital Libraries Initiative All Projects Meeting 24-25 June 2002, Edinburgh. Two new NSF Projects …. “Bridging the Gap between Libraries and Data Archives”


Presentation Transcript


  1. Bridging the Gap between Libraries and Data Archives: Progress Report Roger Revelle, Gulf of California Expedition, 1939 • JISC/NSF Digital Libraries Initiative All Projects Meeting • 24-25 June 2002, Edinburgh

  2. Two new NSF Projects … • “Bridging the Gap between Libraries and Data Archives” • NSDL Collections Track • “SIOExplorer: Web Exploration of Seagoing Archives” • Information Technology Research (ITR) • Started October 2001

  3. Collaborative effort • UCSD Libraries • Scripps Institution of Oceanography • San Diego Supercomputer Center • Advisory Board • NOAA • US Naval Oceanographic Office • Private Industry • Other oceanographic institutions

  4. Combine … • Data • 50 years of digital data • Growing 200 GB per year • Images • 99 years of SIO Archives • Documents • Reports, publications, books … into one digital library

  5. Data in the collection …

  6. Approx. 3000 cruise legs online at SIO • Bathymetry, magnetics, gravity • Gathered from worldwide sources • 795 SIO cruise legs • Swath bathymetry since 1981

  7. Multibeam sonar revolutionizes seafloor understanding • Map a wide swath • Not just a single profile • SeaBeam Classic, 1981-1992 • 16 beams • SeaBeam 2000, 1992- • 121 beams • SeaBeam 2100, 1996-2000 • 151 beams • Simrad EM120, 2001- • 191 beams • 150 degree swath width • Also backscatter • Determine bottom type • Sediment • Lava flow Realtime swath 20 km across-track
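As a rough illustration of what a 150 degree swath means in practice, the sketch below uses the flat-seafloor approximation, width = 2 × depth × tan(angle / 2). The function name and the 1000 m example depth are assumptions for illustration, not values from the slides. Real coverage also depends on sound-speed structure and bottom slope.

```python
import math


def swath_width(depth_m: float, swath_angle_deg: float) -> float:
    """Across-track coverage on a flat seafloor (idealized geometry)."""
    if not 0 < swath_angle_deg < 180:
        raise ValueError("swath angle must be between 0 and 180 degrees")
    half_angle = math.radians(swath_angle_deg / 2)
    return 2 * depth_m * math.tan(half_angle)


# Assumed water depth of 1000 m, with the 150 degree swath quoted
# for the Simrad EM120 on the slide.
width = swath_width(1000.0, 150.0)
print(f"{width / 1000:.1f} km")
```

Under this approximation a 150 degree swath covers roughly 7.5 times the water depth.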

  8. SIO Swath Mapping Expeditions • 244 swath mapping cruises on vessels, since 1981 • Thomas Washington • Melville • Revelle • 600 GB multibeam holdings • Adding 200 GB/year
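The storage figures on this slide imply a simple linear projection. The sketch below assumes the growth rate stays constant at 200 GB/year, which the slides do not promise; the function name is hypothetical.

```python
def projected_holdings_gb(years_ahead: float,
                          current_gb: float = 600.0,
                          growth_gb_per_year: float = 200.0) -> float:
    """Linear projection of multibeam holdings, assuming constant growth."""
    if years_ahead < 0:
        raise ValueError("years_ahead must be non-negative")
    return current_gb + growth_gb_per_year * years_ahead


# Holdings after five more years at the quoted rate: 600 + 5 * 200 GB.
print(projected_holdings_gb(5))
```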

  9. Deliver sampling information • Sample index, 1968- • 100,000 entries • 500 types • Dredged rocks, cores • Biological trawls • Water samples • CTD • Build on www.EarthRef.org • Seamount catalog Roger Revelle, MidPac, 1950 (Amelia Earhart)

  10. Images in the collection …

  11. Access Voyages of Discovery • Encourage inquiry • “What’s this?” links from image • Data (“What”) • Instruments (“How”) • Other voyages • Dual use • Research and education Naga Expedition, 1959-61 (artist’s illustrations from logbook)

  12. R/V Albatross departed SIO 1904 Sigsbee sounding machine

  13. Voyages of Discovery in the Pacific • La Perouse 1780’s • R/V Revelle • “La Perouse Expedition” • Departed June 8 • R/V Melville • “Cook Expedition” • Returns July 17 James Cook By Nathaniel Dance, 1776 Special Collections, UCSD Library

  14. Voyages of Discovery in the Pacific • 1950’s Ed Hamilton, MidPac, 1950 Samoa, Capricorn, 1952

  15. Query for ideas and careers • Not just data • Track a scientist’s expeditions and publications R/V Spencer F. Baird, Capricorn Expedition, 1952-53. L to R, back row: Dick Von Herzen, Roger Revelle, Willard Bascom, Ted Folsom, Alan Jones, Gustaf Arrhenius, Henri Rotschi, Robert Livingston, Russell Raitt. Seated: Dick Blumberg, Ronald Mason, Bob Dill, Art Maxwell, Winter Horton, Walter Munk, Helen Raitt.

  16. Documents in the collection …

  17. Full text of publications • The Challenger Expedition • 30,000 scanned pages • Anatomy of an Expedition • Bill Menard, 1967 Nova Expedition • Link to 1998 Avon Expedition • Exploring the Deep Pacific • Helen Raitt, 1952 Capricorn Expedition

  18. Cruise reports • 50 years available • Scan older versions • Currently generate .pdf automatically Page with swath bathymetry every 6 hours

  19. Bridging the Gap: Progress Report

  20. The Problem • Archives are search-impaired • Content not a problem • Material exists in great abundance • Data archives • Historical archives • But it is hard to get • Litany of woes …

  21. Litany of archive woes • Magnetic media at risk • Need to migrate to new storage • Local access only • Some online, but sprawling directories • Tapes and CDs in drawers • Inconsistent naming over 30 years • Home-grown software • Pre-database technology • Minimal documentation • Formal metadata non-existent • Creators now retired • What to do? Shipboard archives for one recent cruise

  22. Steps toward a Solution • Seek professional help • Computer scientists • Advisory Board • (Similar problems faced in many fields) • Review the problem • Seven issues from national workshop • Analyze the dataflow • Build a prototype • Test the prototype • New Zealand – Samoa Expedition

  23. Review archive problems • NSF/ONR Marine Geology and Geophysics Workshop

  24. Dataflow • First, create a conceptual data model • Spend time to review with all participants • Design a robust model • Define common categories • 9 basic directories • Specific subdirectories • Controlled design document • Map existing digital objects to categories • Both documents and data • Accommodate variations • Data types and names over 50 years • Valid for future developments • Result “CCDS” – Canonical Cruise Data Structure
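The step of mapping existing digital objects to common categories could be sketched as a small rule table. The slides mention 9 basic directories but do not name them, so the category names and file patterns below are hypothetical placeholders, not the actual CCDS layout.

```python
from fnmatch import fnmatchcase

# Hypothetical (pattern, category) rules; the real CCDS directory names
# are not given in the slides.
RULES = [
    ("*.mb*", "data/multibeam"),
    ("*.pdf", "documents/reports"),
    ("*.jpg", "images"),
]


def categorize(filename: str, rules=RULES, default: str = "unsorted") -> str:
    """Return the first category whose pattern matches the file name."""
    name = filename.lower()  # tolerate inconsistent naming over the years
    for pattern, category in rules:
        if fnmatchcase(name, pattern):
            return category
    return default


for f in ["AVON02MV.mb41", "Report.PDF", "notes.txt"]:
    print(f, "->", categorize(f))
```

Keeping the rules in one table mirrors the idea of a controlled design document: variations in legacy names are handled by adding patterns, not by changing the categories.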

  25. Second, organize domain-specific content • Work inside a “Staging Area” • Deal with complexity • Extract from 3 archive levels • Shipboard (tape, CD) • Post-processing lab (tape) • Current online content • (not always “best”) • Opportunity for data cleanup • Apply corrections • Weed out intermediate and duplicate versions • Gather information for metadata

  26. Third, load the “CCDS” • Clear transition in activities • Domain specialists give final approval • IT team takes over • Early mistake • “Pushed” content from legacy data directories • These are complex and vary over the years • Revised to “pull” into Canonical Structure • IT lesson learned • Dataflow needs to be “template-driven” • Template can incorporate • Rules for automatic loading • Adaptive choice among multiple alternatives • Maintain flexibility as project evolves • Team members negotiate content of template
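A minimal sketch of a template-driven "pull", assuming the template lists, for each canonical slot, an ordered set of alternative source patterns. The slot names, patterns, and file paths are invented for illustration; the project's actual template format is not shown in the slides.

```python
from fnmatch import fnmatchcase

# Hypothetical template: each canonical slot lists source patterns in
# order of preference ("adaptive choice among multiple alternatives").
TEMPLATE = {
    "bathymetry": ["processed/*.mb", "raw/*.mb"],
    "cruise_report": ["docs/report*.pdf"],
}


def pull(template: dict, available: list) -> dict:
    """For each slot, pull files matching the first pattern that has any match."""
    plan = {}
    for slot, alternatives in template.items():
        plan[slot] = []
        for pattern in alternatives:
            matches = sorted(p for p in available if fnmatchcase(p, pattern))
            if matches:
                plan[slot] = matches
                break  # earlier alternatives take precedence
    return plan


legacy_files = ["raw/a.mb", "raw/b.mb", "docs/report1.pdf"]
print(pull(TEMPLATE, legacy_files))
```

The canonical structure drives the loop and the legacy directories are only consulted through patterns, so a change in legacy layout means editing the template rather than the loader.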

  27. Fourth, load the data • Persistent data archive management • Use the “Storage Resource Broker” • San Diego Supercomputer Center product • Fifth, load the metadata • Harvest metadata from data files, automatically • Provide tools for metadata editing • Load into Oracle
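Automatic harvesting might look like the sketch below, which derives a time range and bounding box from navigation fixes. The record layout, field names, and sample values are assumptions for illustration; the slides do not describe the actual file formats or the Oracle schema.

```python
from typing import Iterable, NamedTuple


class Fix(NamedTuple):
    time: str   # ISO 8601 UTC timestamp, all in the same format
    lat: float  # degrees north
    lon: float  # degrees east


def harvest_extent(fixes: Iterable[Fix]) -> dict:
    """Summarize a track as start/end time and a bounding box.

    Naive about the antimeridian: a track crossing 180 degrees would
    need extra handling.
    """
    fixes = list(fixes)
    if not fixes:
        raise ValueError("no navigation fixes to harvest from")
    times = [f.time for f in fixes]
    lats = [f.lat for f in fixes]
    lons = [f.lon for f in fixes]
    return {
        "start": min(times), "end": max(times),
        "south": min(lats), "north": max(lats),
        "west": min(lons), "east": max(lons),
    }


# Made-up sample fixes, not real cruise data.
track = [
    Fix("2000-01-01T00:00:00Z", -40.0, 170.0),
    Fix("2000-01-02T00:00:00Z", -35.5, 172.5),
]
print(harvest_extent(track))
```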

  28. Building a Collection Developer’s Toolkit

  29. Collection Developer’s Toolkit • Make it easy to build and maintain • Not just for IT experts • Portable and scalable for other projects • Integrate • Metadata tools • Data tools • Interactive search and display console

  30. Make use of existing resources • Alexandria Digital Library • Geospatial content • OAI-compliant server • Environmental data archive and delivery tools • John Helly, http://ceed.sdsc.edu/ • Storage Resource Broker • http://www.npaci.edu/DICE/SRB/index.html/ • Domain-specific toolkits • GMT, MB-System, ARC/IMS

  31. Build metadata tools • Automate • Bulk harvesting from data files • Bulk loading into Oracle database • Use NSDL community standards • Dublin Core + “ADN” metadata • Alexandria Digital Library (UCSB) • DLESE (Digital Library for Earth System Education) • NASA • Controlled vocabularies • Science themes • Geographic names • Embed domain-specific metadata into standards • Multibeam, cruise, sampling

  32. MOBE • Metadata Object Browser and Editor • Inherit metadata from • Dublin Core • Cruise • Flexible • Expand for projects as needed • Generic ASCII metadata interchange format “MIF” • Export to XML • Java
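The slides do not show the MIF syntax, so the sketch below assumes a simple "name: value" line format purely to illustrate exporting ASCII metadata to XML. The function name, root tag, and sample values are hypothetical, and MOBE itself is written in Java; Python is used here only for brevity.

```python
import xml.etree.ElementTree as ET


def mif_to_xml(text: str, root_tag: str = "record") -> str:
    """Convert assumed 'name: value' lines into a flat XML record."""
    root = ET.Element(root_tag)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"expected 'name: value', got {line!r}")
        child = ET.SubElement(root, name.strip())
        child.text = value.strip()
    return ET.tostring(root, encoding="unicode")


# title and creator are Dublin Core element names; the values are made up.
sample = "title: Test cruise\ncreator: Example"
print(mif_to_xml(sample))
```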

  33. Search interface • Design for alternative approaches • Geospatial • Lat, lon • Temporal • “1995-2000” • Keyword • Region “Samoa” • Vessel “Melville” • Cruise “AVON02MV” • Data type “dredge” • Scientist “Staudigel” • Expert-level • Research, teacher, student, public Prototype search interface
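The alternative search approaches (geospatial, temporal, keyword) can be combined as optional filters. The sketch below is an assumption about how such a query could be expressed, not the prototype's implementation; the record fields and the demo cruises are invented.

```python
from dataclasses import dataclass


@dataclass
class Cruise:
    cruise_id: str
    vessel: str
    year: int
    lat: float  # representative position, degrees north
    lon: float  # representative position, degrees east


def search(cruises, bbox=None, years=None, vessel=None):
    """Filter cruises; each criterion is optional and all must match.

    bbox is (south, west, north, east); years is (first, last), inclusive.
    """
    results = []
    for c in cruises:
        if bbox is not None:
            south, west, north, east = bbox
            if not (south <= c.lat <= north and west <= c.lon <= east):
                continue
        if years is not None and not (years[0] <= c.year <= years[1]):
            continue
        if vessel is not None and c.vessel.lower() != vessel.lower():
            continue
        results.append(c)
    return results


# Invented demo records, not actual SIO holdings.
demo = [
    Cruise("DEMO01", "Melville", 1996, -14.0, -171.0),
    Cruise("DEMO02", "Revelle", 1999, 30.0, -120.0),
    Cruise("DEMO03", "Melville", 2001, -15.0, -172.0),
]
hits = search(demo, bbox=(-20, -180, -10, -160), years=(1995, 2000), vessel="melville")
print([c.cruise_id for c in hits])
```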

  34. CruiseViewer • Interactive browser and query interface • Display tracks and samples • Download library objects • Java

  35. Manage interfaces for multiple projects • Both data and metadata

  36. Lessons learned (so far)…

  37. Make it easier to collaborate • Interactions between groups • Not just a technology project • Diverse goals, vocabularies and audiences • Interoperate • Each domain has own sphere of responsibility • Don’t engineer someone else’s domain • Work through interfaces • Re-negotiate as needed • Avoid long-term maintenance headaches between domains

  38. Build tools for collaborative projects • 3 “cultures” in this project • Oceanographers • Computer scientists • Librarians • Example: bridge vocabularies between separate domains • Use metadata “triples,” not “pairs” • Reduce phone calls by including narrative label
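One reading of "triples, not pairs" is that each metadata item carries a human-readable narrative label alongside its tag and value. The sketch below illustrates that reading; the field names and the tag `mb_nbeams` are hypothetical.

```python
from typing import NamedTuple


class MetaTriple(NamedTuple):
    tag: str    # machine-oriented name used by one domain
    label: str  # narrative label readable by the other domains
    value: str


def describe(t: MetaTriple) -> str:
    """Render a triple so a non-specialist can interpret the tag."""
    return f"{t.label} ({t.tag}): {t.value}"


# 121 beams is the SeaBeam 2000 figure from the multibeam slide.
item = MetaTriple("mb_nbeams", "Number of sonar beams", "121")
print(describe(item))
```

With only the pair (`mb_nbeams`, `121`), a librarian would have to ask an oceanographer what the tag means; the label carries that explanation along with the data.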

  39. Adding new projects to SIOExplorer • Make use of • Collection Developer’s Toolkit • NSDL server • Metadata interchange • Query processing • SDSC • Managed storage • Web service

  40. Test the prototype Melville departs Lyttelton harbor

  41. Floating Digital Library Workshop • R/V Melville • March 7-21, New Zealand to Samoa • Realtime acquisition of library objects? • Load metadata into swath files • At acquisition time • Specify cruise metadata • Sensor documentation database • Load the CCDS • Learn from a common experience

  42. A good day at 51° S Renewed appreciation for the collection of field data

  43. Common experience • Librarians • Computer scientists • Oceanographers • Royal New Zealand Navy Collaboration between SIO and RNZN Melville in Lyttelton

  44. Floating Digital Library Workshop Librarian at sea Computer scientist in galley Oceanographer holding onto computer

  45. Bollons Gap survey • New Zealand Law of the Sea Claim Librarian at sea Visualization of swath bathymetry, looking north

  46. Heading for Samoa • Crossing the Louisville Ridge • Tonga Trench • Osbourn Trough (ancient spreading center) Visualization of Global Topography, looking north

  47. Relate cruise to SIO holdings • Display search results • Red • SIO multibeam • Black • Other cruises • Yellow • SIO dredged rock samples • Also • Volcanoes • Earthquakes • Plate boundaries • Typical research support product • Make it available on web • Select cruises for further study • Export for ArcView • Related NSF/ITR project

  48. Next steps • Data Publishing Toolkit for Digital Library Interoperability: Integrating the Albatross Cruise Holdings into SIOExplorer • NSF Division of Biological Infrastructure • Collaboration with Smithsonian Institution • Biogeography and Geology of the Oceans: SIO Collections Gateway for the NSDL • NSF NSDL Collections Track Track of the Albatross, 1884-1921

  49. SIOExplorer: Expedition Planner • Open research data for student discovery • Leverage Digital Library efforts • Students design a virtual expedition • Explore relationships • Depth, Sediment thickness, Crustal age • More … • Earthquakes, volcanoes, trenches • Wind, waves, currents • Climate • Students publish expedition report • On the web • Teacher workshops • At the Birch Aquarium Global Topography Sediment thickness Crustal Age

  50. SIO 100th Anniversary • September 26, 2003 R/V Alexander Agassiz, 1907 SIO, 1909 • http://SIOExplorer.ucsd.edu
