
Awareness Services for Digital Libraries

Arturo Crespo and Hector Garcia-Molina, Stanford University


Presentation Transcript


  1. Awareness Services for Digital Libraries Arturo Crespo Hector Garcia-Molina Stanford University

  2. Awareness Services for Digital Libraries Digital library repository: • Data store • Other components: • Indexers • Name manager • Replica manager • Etc

  3. Data Stores and Clients [Figure: data stores (DB, AI, and HCI Tech Reports) and clients (DB Indexer, CS Indexer)]

  4. Data Store Services • Object access • Via a handle • Object awareness • Clients must be aware of changes at the store

  5. A Case Study: CS-TR and SIFT • SIFT: a selective dissemination service • CS-TR: A digital library of technical reports from about 50 universities • Awareness based on timestamps • Problems: • File system timestamps • Application timestamps • Deletions
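
One plausible reading of the timestamp problems listed above can be sketched as follows. The store layout, the handle names, and the function `changed_since` are hypothetical and are not taken from CS-TR or SIFT; the point is only that a deleted object has no timestamp left to report, and an update that does not advance its timestamp goes unnoticed.

```python
def changed_since(store, last_check):
    """Return handles whose timestamp is newer than the client's last check."""
    return sorted(h for h, (ts, _content) in store.items() if ts > last_check)

# Hypothetical store: handle -> (modification timestamp, content)
store = {"tr-1": (100, "v1"), "tr-2": (105, "v1")}
last_check = 110  # the client last polled at time 110

# A deletion leaves nothing behind to carry a timestamp.
del store["tr-2"]
# An update whose timestamp was not advanced (assumed: a file restored from backup).
store["tr-1"] = (100, "v2")

# Both changes are invisible to a purely timestamp-based poll.
missed = changed_since(store, last_check)
```

Here `missed` is empty even though one object changed and another disappeared.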

  6. Contributions • Survey of the spectrum of awareness options • Advantages and disadvantages of each one • All mechanisms can be captured by a single algorithm: the UNI-AWARE algorithm • Enhancements for signature-based schemes • Reduced computation • Reduced communication costs

  7. Related Work • Database replica maintenance • Remote file comparison • Deployment of programs over the network

  8. The Client-store Design Space • Push vs. Pull • Stateful vs. stateless stores and clients • Cognizant clients and sources • Number of clients per data store

  9. The UNI-AWARE Algorithm • A unified algorithm that “covers” known schemes: • Snapshot algorithm • Timestamps and versions • Logs • Triggers • Signatures • Algorithm is tailored to a specific scheme through the definition of “custom functions”
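
The slides do not spell out what the "custom functions" are. As an illustration only, the sketch below assumes a single hypothetical hook, `summarize`, that maps an object to the value a client remembers about it; `aware`, `by_timestamp`, and `by_signature` are invented names, not the UNI-AWARE interface.

```python
import zlib

def aware(store, client_state, summarize):
    """One generic pull round: compare per-object summaries with what the client remembers.

    store: handle -> (timestamp, content bytes)
    client_state: handle -> summary recorded at the previous round
    summarize: hypothetical custom function (timestamp, content) -> summary
    """
    current = {h: summarize(ts, content) for h, (ts, content) in store.items()}
    changed = sorted(h for h, s in current.items() if client_state.get(h) != s)
    deleted = sorted(h for h in client_state if h not in current)
    return changed, deleted, current

# Two schemes obtained by swapping the custom function.
def by_timestamp(ts, content):
    return ts

def by_signature(ts, content):
    return zlib.crc32(content)

store = {"tr-1": (100, b"draft")}
_, _, ts_state = aware(store, {}, by_timestamp)
_, _, sig_state = aware(store, {}, by_signature)

# Content changes but the timestamp does not advance.
store["tr-1"] = (100, b"final")
ts_changed, _, _ = aware(store, ts_state, by_timestamp)
sig_changed, _, _ = aware(store, sig_state, by_signature)
```

Under these assumptions the same loop behaves as a timestamp scheme or a signature scheme depending only on the function passed in; in this run only the signature variant reports `tr-1` as changed.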

  10. UNI-AWARE: Signature Algorithm • Signature: a token associated with each document that has a high probability of being unique and that changes when the content of the object changes • Examples: CRCs, checksums • Advantages: • Robust, as it does not require metadata maintenance • Easy to manage consistently when the store fails or an object migrates
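
A minimal sketch of computing such signatures with the Python standard library. The CRC variant follows the slide's example; the SHA-256 variant is an addition here, not something the slides mention, shown because a longer digest makes accidental collisions far less likely. The function names are hypothetical.

```python
import hashlib
import zlib

def crc_signature(content: bytes) -> int:
    """32-bit CRC of the document content."""
    return zlib.crc32(content)

def digest_signature(content: bytes) -> str:
    """SHA-256 digest of the content (an assumed alternative, not from the slides)."""
    return hashlib.sha256(content).hexdigest()

old = b"Technical report, revision 1"
new = b"Technical report, revision 2"

# The signature depends only on content, so it can be recomputed at any time
# and does not rely on stored metadata such as timestamps.
same = crc_signature(old) == crc_signature(bytes(old))
differs = crc_signature(old) != crc_signature(new)
```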

  11. UNI-AWARE: Signature Algorithm [Figure: exchange between data store and client; all document signatures are transferred, then the client requests documents]
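
The exchange on this slide might look roughly like the following, assuming the store sends one signature per document and the client compares them with signatures of its own copies. The helper names `signatures` and `plan_update` and the sample handles are hypothetical.

```python
import zlib

def signatures(docs):
    """Map each handle to a CRC signature of its content."""
    return {handle: zlib.crc32(content) for handle, content in docs.items()}

def plan_update(store_sigs, client_sigs):
    """Decide what the client must request and what it should discard."""
    to_fetch = sorted(h for h, s in store_sigs.items() if client_sigs.get(h) != s)
    to_drop = sorted(h for h in client_sigs if h not in store_sigs)
    return to_fetch, to_drop

store_docs = {"tr-1": b"intro v2", "tr-2": b"methods", "tr-4": b"new report"}
client_docs = {"tr-1": b"intro v1", "tr-2": b"methods", "tr-3": b"withdrawn"}

# Step 1: every signature crosses the network. Step 2: only changed documents do.
to_fetch, to_drop = plan_update(signatures(store_docs), signatures(client_docs))
```

Because the comparison covers every handle, deletions show up as handles the store no longer reports. The cost is that the number of signatures sent grows with the size of the collection, whether or not anything changed.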

  12. DIST-UNI-AWARE Algorithm • Objective: reduce amount of data exchanged between data store and clients • DIST-UNI-AWARE: • Unified algorithm that can be tailored to different schemes: • Hierarchical signatures • Hierarchical timestamps

  13. DIST-UNI-AWARE [Figure: exchange between data store and client; signatures of buckets are transferred, the client requests more signatures, then requests documents]
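
A sketch of the bucket idea under simplifying assumptions that are not from the slides: both sides hold the same documents in the same order, a bucket's signature is a hash over its members' signatures, and each mismatching bucket is split in half. `find_changed` is a hypothetical name and this is not the DIST-UNI-AWARE algorithm itself.

```python
import hashlib

def sig(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def bucket_sig(doc_sigs, lo, hi):
    """Signature of a bucket: hash of the concatenated member signatures."""
    return sig("".join(doc_sigs[lo:hi]).encode())

def find_changed(store_sigs, client_sigs):
    """Return (indices of changed documents, number of comparison rounds)."""
    assert len(store_sigs) == len(client_sigs)  # simplifying assumption
    pending = [(0, len(store_sigs))]
    changed, rounds = [], 0
    while pending:
        rounds += 1
        next_pending = []
        for lo, hi in pending:
            if bucket_sig(store_sigs, lo, hi) == bucket_sig(client_sigs, lo, hi):
                continue  # nothing changed inside this bucket
            if hi - lo == 1:
                changed.append(lo)  # narrowed down to a single document
            else:
                mid = (lo + hi) // 2
                next_pending += [(lo, mid), (mid, hi)]  # request more signatures
        pending = next_pending
    return changed, rounds

store_sigs = [sig(f"doc {i}".encode()) for i in range(8)]
client_sigs = list(store_sigs)
client_sigs[5] = sig(b"stale copy")

changed, rounds = find_changed(store_sigs, client_sigs)
```

With eight documents and one stale copy, this version takes four rounds (the top-level comparison plus three halvings). An unchanged collection is settled in a single round.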

  14. Advantages of Signature Algorithms • Support the push and pull models • No need for reliable storage of additional data structures: if signatures are lost or corrupted, they can be recomputed • Efficient in usage of network resources, clients and data stores • Scales well in number of clients and documents

  15. DIST-UNI-AWARE: Performance • Performance depends on number of changes: • No changes: only one round is required • Single change: log₂ n rounds • 2 changes: log₂ n rounds, but twice as much data … • Eventually, DIST-UNI-AWARE starts behaving worse than UNI-AWARE
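
A rough cost model makes the crossover concrete. It is an assumption made here for illustration, not the authors' analysis: buckets are split in two, there are n documents of which k have changed, and cost is counted in signatures sent. The flat scheme always sends n signatures. The hierarchical scheme sends one top-level signature and then, at each of the log₂ n levels below, at most two signatures per changed bucket:

$$
C_{\text{flat}} = n, \qquad C_{\text{hier}} \le 1 + 2k \log_2 n
$$

For n = 1024 and k = 1 the bound is 1 + 2 · 10 = 21 signatures against 1024. The bound reaches n when k is about n / (2 log₂ n), roughly 51 changes for n = 1024. Beyond that point the hierarchy no longer guarantees a saving, and it still pays for the extra rounds.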

  16. DIST-UNI-AWARE: Enhancements • Increase group split factor • Client sends additional information at split time • Clustering of changed objects

  17. Conclusions • Awareness mechanism for digital libraries • Separation of storage functionality and other services • Awareness schemes must be resilient to computer environment changes and bugs • UNI-AWARE and DIST-UNI-AWARE

  18. Awareness Services for Digital Libraries Arturo Crespo Hector Garcia-Molina Stanford University
