Awareness Services for Digital Libraries

Arturo Crespo, Hector Garcia-Molina. Stanford University.

Presentation Transcript


  1. Awareness Services for Digital Libraries Arturo Crespo Hector Garcia-Molina Stanford University

  2. Motivation • Our Objective: create the next generation of Data Repositories tailored to the needs of Digital Libraries: • Persistence, Distribution, Intellectual Property, Indexing and Cataloging, Replication, ... [Diagram labels: Data Storage, Clients, Naming, Indexers, Replica, Data Storage]

  3. Data Stores and Clients [Diagram: Data Stores (DB Tech Reports, AI Tech Reports, HCI Tech Reports) and Clients (DB Indexer, CS Indexer)]

  4. Data Store Services • Object access • Via a handle • Object awareness • Clients must be aware of changes at the store

  5. A Case Study: CS-TR and SIFT • SIFT: a selective dissemination service • CS-TR: A digital library of technical reports from about 50 universities • Awareness based on timestamps • Problems: • File system timestamps • Application timestamps • Deletions
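One way to see why deletions are listed as a problem for timestamp-based awareness is the minimal sketch below. It is not the CS-TR or SIFT implementation; the `Doc` record, the `changed_since` helper, and the handles are hypothetical names chosen for illustration. The assumption is that the client pulls only the objects whose timestamp is newer than its last update.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    handle: str
    modified: float  # timestamp kept by the store (hypothetical field)


def changed_since(store: dict[str, Doc], last_update: float) -> set[str]:
    """Handles whose timestamp is newer than the client's last update."""
    return {h for h, d in store.items() if d.modified > last_update}


# State of the store when the client last updated (time 25).
store = {"tr-1": Doc("tr-1", 10.0), "tr-2": Doc("tr-2", 20.0)}
client_view = set(store)
last_update = 25.0

# Afterwards one report is deleted and a new one is added.
del store["tr-1"]
store["tr-3"] = Doc("tr-3", 30.0)

delta = changed_since(store, last_update)

# The deleted report carries no timestamp any more, so it never shows up
# in the delta; the client finds it only by comparing full handle lists.
stale = client_view - set(store)
```

Here `delta` contains only the new report, while the deleted one appears only in `stale`, which requires listing every handle at the store.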

  6. The Problem • How can a Data Storage Client detect the changes that have happened in remote Data Storages since its last update? • There is no “perfect algorithm”: • The best algorithm for this problem depends on the characteristics of the relationship between the Data Storage and the client

  7. The Design Space • Ratio of Data Storages per Client • Stateful versus Stateless Data Storages in relation to the Clients • Push versus Pull Model • Update Frequency: how often the repository changes, and how often the client is updated • Client awareness of Data Storages • Complexity of the Algorithm

  8. Standard Mechanisms for Client Updating • Key Query Algorithm • Snapshot Differential Algorithm • Timestamps and Versions • Logs • Triggers • Signatures

  9. Contributions • Survey of the spectrum of awareness options • Advantages and disadvantages of each one • All mechanisms can be captured by a single algorithm: the UNI-AWARE algorithm • Enhancements for signature-based schemes • Reduced computation • Reduced communication costs

  10. Related Work • Database replica maintenance • Remote file comparison • Deployment of programs over the network

  11. The UNI-AWARE Algorithm • A unified algorithm that “covers” known schemes: • Snapshot algorithm • Timestamps and versions • Logs • Triggers • Signatures • Algorithm is tailored to a specific scheme through the definition of “custom functions”

  12. UNI-AWARE: Signature Algorithm • Signature: a token associated with each document that has a high probability of being unique and that changes when the content of the document changes • Examples: CRCs, checksums • Advantages: • Robust, as it does not require metadata maintenance • Easy to manage consistently when a store fails or an object migrates

  13. UNI-AWARE: Signature Algorithm [Diagram: all signatures are transferred from the Data Store to the Client, which then requests documents. Table columns: Document, Signature]
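The flat signature exchange on slides 12 and 13 can be sketched roughly as follows. This is an illustrative sketch, not the UNI-AWARE algorithm itself: the function names are hypothetical, CRC-32 from Python's `zlib` stands in for whatever signature the store uses, and the client is assumed to keep the signatures it saw at its last update.

```python
import zlib


def signature(content: bytes) -> int:
    """CRC-32 of the document content, used as its signature."""
    return zlib.crc32(content)


def store_signatures(store: dict[str, bytes]) -> dict[str, int]:
    """What the data store transfers: one signature per document handle."""
    return {handle: signature(content) for handle, content in store.items()}


def diff_signatures(known: dict[str, int], remote: dict[str, int]):
    """Compare the client's last-known signatures with the store's."""
    new = set(remote) - set(known)
    deleted = set(known) - set(remote)
    changed = {h for h in set(known) & set(remote) if known[h] != remote[h]}
    return new, changed, deleted


# Client state from the previous update.
known = store_signatures({"a": b"v1", "b": b"v1", "c": b"v1"})

# Store state now: "a" modified, "b" deleted, "d" added.
remote = store_signatures({"a": b"v2", "c": b"v1", "d": b"v1"})

new, changed, deleted = diff_signatures(known, remote)
to_request = new | changed  # documents the client asks the store for
```

Because the comparison is driven by handles and content, deletions are detected without any timestamps or logs at the store, at the cost of transferring one signature per document.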

  14. DIST-UNI-AWARE Algorithm • Objective: reduce amount of data exchanged between data store and clients • DIST-UNI-AWARE: • Unified algorithm that can be tailored to different schemes: • Hierarchical signatures • Hierarchical timestamps

  15. DIST-UNI-AWARE [Diagram: signatures of buckets are transferred from the Data Store to the Client; the Client then requests more signatures and requests documents. Table columns: Document, Signature]
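A two-level version of the bucket idea might look like the sketch below. It is an assumption-laden illustration rather than DIST-UNI-AWARE as published: assigning documents to buckets by hashing their handles, the fixed bucket count, and all function names are choices made here, not taken from the slides. The client first compares one signature per bucket and asks for per-document signatures only in the buckets that differ.

```python
import zlib


def doc_sig(content: bytes) -> int:
    return zlib.crc32(content)


def bucket_of(handle: str, n_buckets: int) -> int:
    # Assumed bucket assignment: a deterministic hash of the handle.
    return zlib.crc32(handle.encode()) % n_buckets


def bucket_table(store: dict[str, bytes], n_buckets: int) -> dict[int, dict[str, int]]:
    """Map bucket id -> {handle: document signature}."""
    table: dict[int, dict[str, int]] = {b: {} for b in range(n_buckets)}
    for handle, content in store.items():
        table[bucket_of(handle, n_buckets)][handle] = doc_sig(content)
    return table


def bucket_sig(entries: dict[str, int]) -> int:
    """Signature of a bucket: CRC-32 over its sorted (handle, signature) pairs."""
    data = "".join(f"{h}:{s};" for h, s in sorted(entries.items()))
    return zlib.crc32(data.encode())


def sync(client_table, store_table):
    """Return (handles to fetch, handles to drop, document signatures transferred)."""
    fetch, drop, transferred = set(), set(), 0
    for b, remote in store_table.items():
        local = client_table[b]
        if bucket_sig(local) == bucket_sig(remote):
            continue  # bucket unchanged: no per-document signatures needed
        transferred += len(remote)  # "request more signatures" for this bucket
        fetch |= {h for h, s in remote.items() if local.get(h) != s}
        drop |= set(local) - set(remote)
    return fetch, drop, transferred


old = {f"tr-{i}": b"v1" for i in range(100)}
new = dict(old)
new["tr-42"] = b"v2"  # a single document changes at the store

fetch, drop, transferred = sync(bucket_table(old, 10), bucket_table(new, 10))
```

With one changed document, only the bucket holding it is expanded, so the number of document signatures transferred is the size of that bucket rather than the size of the whole store.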

  16. Advantages of Signature Algorithms • Support the push and pull models • No need for reliable storage of additional data structures: if signatures are lost or corrupted, they can be recomputed • Efficient in usage of network resources, clients and data stores • Scales well in number of clients and documents

  17. DIST-UNI-AWARE: Enhancements • Increase group split factor • Client sends additional information at split time • Clustering of changed objects

  18. Conclusions • Awareness mechanism for digital libraries • Separation of storage functionality and other services • Awareness schemes must be resilient to computer environment changes and bugs • UNI-AWARE and DIST-UNI-AWARE

  19. Reference • Arturo Crespo, Hector Garcia-Molina. "Awareness Services for Digital Libraries." ECDL'97. http://www-db.stanford.edu/~crespo/publications/

  20. Awareness Services for Digital Libraries Arturo Crespo Hector Garcia-Molina Stanford University
