
Breaking down the walls

Breaking down the walls. Moving libraries from collectors to portals. Carl Lagoze Cornell University lagoze@cs.cornell.edu.


Presentation Transcript


  1. Breaking down the walls: Moving libraries from collectors to portals. Carl Lagoze, Cornell University, lagoze@cs.cornell.edu

  2. The Library should selectively adopt the portal model for targeted program areas. By creating links from the Library’s Web site, this approach would make available the ever-increasing body of research materials distributed across the Internet. The Library would be responsible for carefully selecting and arranging for access to licensed commercial resources for its users, but it would not house local copies of materials or assume responsibility for long-term preservation. LC21: Digital Strategy for the Library of Congress, page 5

  3. LC21: Digital Strategy for the Library of Congress, page 5

  4. Towards a Virtual Control Zone. Some of the most fundamental aspects of library operations entail the existence of a border, across which objects of information are transferred and maintained. Such a parameter, demarcating a single, distributed digital library (the "control zone"), needs to be created and managed by the academic library community at the earliest opportunity. Ross Atkinson, Library Quarterly, 1996

  5. Why distributed collections? • Scale of the Web • Prevalence of new publishing models and agents • Increasing complexity of licensing and access management • Dynamic nature of content

  6. Towards Hybrid Portals • Traditional portal (e.g., Yahoo!) • linkage without responsibility • Hybrid Portal • assertion of (some semblance of) a curatorial role over linked objects

  7. New models have cultural/organizational ramifications… • Performance and ranking metrics – "bigger is better" • Levels of confidence • Trust

  8. …that can be assisted by new technical foundations • Digital object architectures • that enable aggregating and customizing content for local access and management • Metadata frameworks • that model changes of objects and their management over time • OAI Harvesting Protocol • for exchange of structured information • Preservation models • that enable non-cooperative and cooperative offsite monitoring

  9. Digital Object Architectures: aggregating & localizing distributed content. Acknowledgements: Naomi Dushay, Sandy Payette, Thornton Staples (U. Va.), Ross Wayland (U. Va.)

  10. From Mediators to Value-Added Surrogates • Wiederhold – mediators between raw data and end-user applications for integration and transformation • Paepcke – mediators as foundation for digital library interoperability • Payette and Lagoze – mediators (V-A surrogates) to aggregate and create a localized service layer for distributed resources

  11. FEDORA Digital Object Model

  12. Establishing a Virtual Control Zone

  13. V-A Surrogate Applications • Access management • Shared responsibility among trusted partners • Enhanced and customized functionality • Examples: reference linking, format translation, special needs • Preservation • Monitoring "significant" events and acting on them

  14. [Figure: DigitalObject A (structural characteristics: PowerPoint presentation, SMIL synchronization metadata, RealAudio video) reached through two context brokers, each backed by tools.] Context Broker A offers for DigitalObject A: • View Slides • View Video • View synchronized presentation using applet. Context Broker B offers for DigitalObject A: • Get Transcript of Audio • Search for keyword • Get Slides translated to French
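
The idea in the figure, one digital object seen through different brokers that each expose their own operations, can be illustrated with a minimal Python sketch. This is not the FEDORA implementation: the class names `DigitalObject` and `ContextBroker`, the `register` and `invoke` methods, and the lambda "tools" are all hypothetical stand-ins chosen for illustration.

```python
class DigitalObject:
    """A digital object bundling several content parts (its structure)."""
    def __init__(self, identifier, parts):
        self.identifier = identifier
        self.parts = parts  # part name -> content (placeholder strings here)


class ContextBroker:
    """Exposes a broker-specific set of operations over a shared object."""
    def __init__(self, name):
        self.name = name
        self._tools = {}  # operation name -> callable taking the object

    def register(self, operation, tool):
        self._tools[operation] = tool

    def operations(self):
        return sorted(self._tools)

    def invoke(self, operation, obj):
        if operation not in self._tools:
            raise KeyError(f"{self.name} does not offer {operation!r}")
        return self._tools[operation](obj)


obj_a = DigitalObject("A", {
    "slides": "PowerPoint presentation",
    "sync": "SMIL synchronization metadata",
    "video": "RealAudio video",
})

broker_a = ContextBroker("Context Broker A")
broker_a.register("View Slides", lambda o: o.parts["slides"])
broker_a.register("View Video", lambda o: o.parts["video"])

broker_b = ContextBroker("Context Broker B")
# Stand-in tool: a real one would call a translation service.
broker_b.register("Get Slides translated to French",
                  lambda o: f"[fr] {o.parts['slides']}")

print(broker_a.operations())
print(broker_b.invoke("Get Slides translated to French", obj_a))
```

The object itself is unchanged by either broker; only the set of operations layered over it differs, which is the sense in which a surrogate adds a localized service layer.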

  15. Where we are now… • Ongoing FEDORA reference prototype • http://www.cs.cornell.edu/cdlrg/FEDORA.html • Policy enforcement research • Content mediation • Proposed joint deployment with University of Virginia • Open source scalable implementation of FEDORA architecture • Testing and deployment with a number of research library partners.

  16. Event-Aware Metadata Frameworks: describing changes over time • Acknowledgements: • Dan Brickley (ILRT, Bristol) • Martin Doerr (FORTH, Crete) • Jane Hunter (DSTC, Brisbane)

  17. Distributed Content: The Metadata Challenge • From fixed, contained physical artifacts to fluid, distributed digital objects • Need for basis of trust and authenticity in network environment • Decentralization and specialization of resource description and need for mapping formalisms

  18. Multi-entity nature of object description [Figure labels: Photographer, Computer artist, Camera type, Software]

  19. Attribute/Value approaches to metadata… "The playwright of Hamlet was Shakespeare" is read as "Hamlet has a creator Shakespeare": Hamlet (subject) has a (implied verb) creator (metadata noun) Shakespeare (literal), with Playwright as a metadata adjective. [Figure: R1 with dc:title “Hamlet” and dc:creator.playwright “Shakespeare”]

  20. …run into problems for richer descriptions… "The playwright of Hamlet was Shakespeare, who was born in Stratford" flattens to "Hamlet has a creator birthplace Stratford". [Figure: R1 with dc:creator.playwright “Shakespeare” and dc:creator.birthplace “Stratford”]

  21. …because of their failure to model entity distinctions [Figure: R1 has title “Hamlet” and creator R2; R2 has name “Shakespeare” and birthplace “Stratford”]
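
The problem on slides 19 to 21 can be made concrete with a short Python sketch. The property names and the identifiers R1 and R2 follow the slide labels; the list-of-triples representation and the helper `objects_of` are illustrative assumptions, not part of any metadata standard.

```python
# Flat attribute/value record: every property hangs off the resource R1,
# so the birthplace reads as a property of the play, not of the playwright.
flat_record = {
    "dc:title": "Hamlet",
    "dc:creator.playwright": "Shakespeare",
    "dc:creator.birthplace": "Stratford",
}

# Entity-aware description: the creator is its own node (R2), so
# properties of the person attach to the person.
triples = [
    ("R1", "title", "Hamlet"),
    ("R1", "creator", "R2"),
    ("R2", "name", "Shakespeare"),
    ("R2", "birthplace", "Stratford"),
]


def objects_of(subject, predicate):
    """Return all objects of triples matching subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


creator = objects_of("R1", "creator")[0]
print(objects_of(creator, "birthplace"))  # ['Stratford']
print(objects_of("R1", "birthplace"))     # []
```

In the flat record nothing says whose birthplace is meant; in the graph the question "where was the creator of R1 born?" is answered by following the creator link first.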

  22. ABC/Harmony Event-aware metadata model • Recognizing inherent lifecycle aspects of description (esp. of digital content) • Modeling incorporates time (events and situations) as first-class objects • Supplies clear attachment points for agents, roles, occurrent properties • Resource description as a “story-telling” activity

  23. ? Resource-centric Metadata

  24. [Figure: descriptive graph for “Anna Karenina”. Node labels: “Leo Tolstoy”, “Margaret Wettlin”, “Orest Vereisky”, "Moscow", “author”, “translator”, “illustrator”, “1828”, “1877”, “1978”, “creation”, “translation”, “Russian”, “English”, “Tragic adultery and the search for meaningful love”]
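
A rough Python sketch of what treating events as first-class objects might look like for this example. The transcript preserves only the node labels, not the edges of the graph, so the assignment of agents, dates, and places to events below is an assumed reading; the `Event` class and its field names are hypothetical and are not the ABC/Harmony vocabulary. The label “1828” is left out because its attachment point cannot be recovered from the transcript.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """An event carries its own time, place, and agents-in-roles."""
    kind: str
    year: str
    place: str = ""
    agents: dict = field(default_factory=dict)  # role -> agent name
    output_language: str = ""


# Assumed reading of the figure: a creation event, then a translation event.
story = [
    Event("creation", "1877",
          agents={"author": "Leo Tolstoy"},
          output_language="Russian"),
    Event("translation", "1978", place="Moscow",
          agents={"translator": "Margaret Wettlin",
                  "illustrator": "Orest Vereisky"},
          output_language="English"),
]


def agents_with_role(events, role):
    """List the agents who played the given role in any event."""
    return [e.agents[role] for e in events if role in e.agents]


print(agents_with_role(story, "translator"))
print([(e.kind, e.year) for e in story])
```

Because roles attach to events rather than to the resource, the same work can accumulate further events (a new edition, a digitization) without reshaping the existing description.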

  25. Queries over descriptive graphs. Rudolf Squish – http://swordfish.rdfweb.org/rdfquery. Example: list details of events where Lagoze is a participating agent.

        SELECT ?title, ?type, ?time, ?place, ?name
        FROM http://ilrt.org/discovery/harmony/oai.rdf
        WHERE (web::type ?event abc::Event)
              (abc::context ?event ?context)
              …..
        AND ?name ~ lagoze
        USING web FOR http://www.w3.org/1999/02/22-rdf-syntax-ns#
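
To show how a query of this shape is evaluated, here is a small triple-pattern matcher in Python. It is a toy, not the Squish engine: the sample triples are invented, the property `abc::agentName` is a hypothetical placeholder for the elided part of the query, and the `~` comparison is approximated as a case-insensitive substring test.

```python
def match(patterns, triples, binding=None):
    """Yield variable bindings satisfying every (s, p, o) pattern.

    Terms starting with '?' are variables; all other terms must match exactly.
    """
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against its earlier binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, triples, candidate)


# Invented sample data, written as (subject, predicate, object).
triples = [
    ("e1", "web::type", "abc::Event"),
    ("e1", "abc::context", "c1"),
    ("c1", "abc::agentName", "Carl Lagoze"),
    ("e2", "web::type", "abc::Event"),
    ("e2", "abc::context", "c2"),
    ("c2", "abc::agentName", "Jane Hunter"),
]

patterns = [
    ("?event", "web::type", "abc::Event"),
    ("?event", "abc::context", "?context"),
    ("?context", "abc::agentName", "?name"),
]

results = [b for b in match(patterns, triples)
           if "lagoze" in b["?name"].lower()]
print(results)
```

Shared variables such as `?event` and `?context` are what join the patterns together, which is how a query walks from an event to the agents in its context.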

  26. Where we are now • Stabilization of model • Collaboration with museum/CIDOC community for joint modeling principles • Plans • RDF api for model elements • UI for metadata creation • Query engine testing

  27. Open Archives Initiative: facilitating exchange of structured information • Acknowledgements: • Herbert Van de Sompel • OAI Steering and Technical Committees

  28. Open Archives Initiative • Testing the hypotheses • exposing metadata in various forms will facilitate creation of value-added services • key to deployable DL infrastructure is low-entry cost • Individual communities can/will customize common infrastructure

  29. Where we’ve come from • Late 1999 Santa Fe UPS meeting – increase impact of eprint initiatives through federation • Santa Fe Convention – metadata harvesting among eprint archives • Increasing interest outside the eprint community • Research libraries • Museums • Publishers

  30. Progress over the past year • OAI workshops at US and EC DL conferences • Organizational stability • Executive committee and steering committee • September 2000 technical meeting • Reframe and rethink technical solutions for broader domain • Extensive testing and refinement of technical infrastructure

  31. Technical Infrastructure – key technical features • Deploy now technology – 80/20 rule • Two-party model – providers and consumers • Simple HTTP encoding • XML schema for some degree of protocol conformance • Extensibility • Multiple item-level metadata • Collection level metadata

  32. [Figure: OAI protocol requests flow from the harvester (service provider) to the repository (data provider).] • Supporting protocol requests: • Identify • ListMetadataFormats • ListSets • Harvesting protocol requests: • ListRecords • ListIdentifiers • GetRecord
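
The "simple HTTP encoding" of these six requests can be sketched in Python as plain URL construction. The base URL is a hypothetical data provider. The `verb` and `metadataPrefix` argument names and the `oai_dc` prefix are taken from the published OAI harvesting protocol, but should be checked against the specification version in use; the sketch builds URLs only and performs no network access.

```python
from urllib.parse import urlencode

BASE_URL = "http://repository.example.org/oai"  # hypothetical data provider

SUPPORTING = {"Identify", "ListMetadataFormats", "ListSets"}
HARVESTING = {"ListRecords", "ListIdentifiers", "GetRecord"}


def build_request(verb, **arguments):
    """Encode one protocol request as an HTTP GET URL."""
    if verb not in SUPPORTING | HARVESTING:
        raise ValueError(f"unknown protocol request: {verb}")
    query = urlencode({"verb": verb, **arguments})
    return f"{BASE_URL}?{query}"


print(build_request("Identify"))
print(build_request("ListRecords", metadataPrefix="oai_dc"))
```

A harvester would typically issue the supporting requests first to learn what a repository offers, then page through the harvesting requests and parse the XML responses.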

  33. Where we are now • “Stable” 1.0 protocol specification • Hopefully, self-documenting infrastructure • http://www.openarchives.org • 27 registered data providers • Increasing number of tools available • Research initiatives • NSF-funded NSDL • EC-funded Cyclades • Andrew W. Mellon service proposals • EC-funded community building

  34. Where do we go from here • Controlling the stampede • Maintaining the organizational model – lean and mean while encouraging community-specific exploitation • Encouraging testing especially through deployment and especially service development • Encouraging metadata diversification – this isn’t just about Dublin Core!!! • Preservation • Document access • Authentication

  35. OAI & Metadata Research • Dictionary of metadata terms (Tom Baker) • Mandating usage rules has only limited effectiveness • Compiling usage of those terms is vital to machine understanding and interoperability • Provide context heuristics for search engine and indexer processing • Large-scale deployment of OAI and web crawling enables (partial) automation of usage compilation (e.g., data mining of term usage)
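
As a minimal illustration of compiling term usage from harvested metadata, the Python sketch below counts how often each metadata term appears across a set of records. The records are invented sample data and `term_usage` is a hypothetical helper; real usage compilation would also examine values and context, not just term frequency.

```python
from collections import Counter

# Invented harvested records: each maps metadata terms to values.
harvested_records = [
    {"dc:title": "Hamlet", "dc:creator": "Shakespeare"},
    {"dc:title": "Anna Karenina", "dc:creator": "Tolstoy",
     "dc:language": "Russian"},
    {"dc:title": "Untitled", "dc:date": "1999"},
]


def term_usage(records):
    """Count in how many records each metadata term is used."""
    counts = Counter()
    for record in records:
        counts.update(record.keys())
    return counts


usage = term_usage(harvested_records)
print(usage.most_common(2))  # [('dc:title', 3), ('dc:creator', 2)]
```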

  36. Preservation Models: monitoring threats to distributed content • Acknowledgements: • Bill Arms • Peter Botticelli (CUL) • Anne Kenney (CUL)

  37. Preservation & Remote Control • Organization Issues • “assured preservation” may not be possible without direct custodial control. • what are the levels of acceptability and for which types of resources? • Technical Issues • what are the technologies for remote control at the various levels of assurance deemed acceptable by the library? • what is the probability of a reasonable level of preservation in the context of such technologies?

  38. Cost vs. Functionality

  39. Leveraging Current Work • Event-based metadata • Metadata harvesting • Longevity and threats to digital resources
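
The slides do not describe the monitoring mechanism itself, so the following Python sketch shows only one generic way remote monitoring without custody could work: compare a digest of the current remote content with a previously recorded digest, and emit an event when they differ. The names `ChangeEvent`, `digest`, and `check`, and the sample identifier, are hypothetical; content is passed in as bytes rather than fetched over the network.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class ChangeEvent:
    """Records that a monitored resource no longer matches its known digest."""
    identifier: str
    old_digest: str
    new_digest: str


def digest(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()


def check(identifier, recorded_digest, current_content):
    """Return a ChangeEvent if the content changed, otherwise None."""
    new = digest(current_content)
    if new != recorded_digest:
        return ChangeEvent(identifier, recorded_digest, new)
    return None


recorded = digest(b"original bytes")
print(check("oai:example:1", recorded, b"original bytes"))
print(check("oai:example:1", recorded, b"modified bytes"))
```

Events produced this way could be recorded as event-based metadata and exchanged through harvesting, which is one reading of how the three strands on this slide fit together.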

  40. Level 0 Experiment

  41. Level 1 Experiment

  42. One of Six Core Integration Demonstration Projects for the NSDL

  43. How Big might the NSDL be? The NSDL aims to be comprehensive -- all branches of science, all levels of education, very broadly defined. Five year targets: • 1,000,000 different users • 10,000,000 digital objects • 100,000 independent sites Requires: • low-cost, scalable technology • automated collection building and maintenance

  44. Levels of Interoperability: Metadata Harvesting. Agreements on simple protocol and metadata standard(s). Example: Metadata harvesting protocol of the Open Archives Initiative (MHP) • Moderate-quality services • Low cost of entry to participating sites. Moderately large numbers of loosely collaborating sites. Promising but still an emerging approach.

  45. Levels of Interoperability: Gathering. Robots gather collections automatically with no participation from individual sites. Examples: Web search services (e.g., Google), CiteSeer (a.k.a. ResearchIndex) • Restricted but useful services • Zero cost of entry to gathered sites. Very large numbers of independent sites. Only suitable for open access collections.
