
The Design Of A Web Document Snapshots Delivery System

David Chao, College of Business, San Francisco State University. A web document snapshot is the state of a web document at a point in time (snaptime).


Presentation Transcript


  1. The Design Of A Web Document Snapshots Delivery System David Chao College of Business San Francisco State University

  2. What Is A Web Document Snapshot? • A web document snapshot is the state of a web document at a point in time (snaptime).

  3. Applications Of Web Document Snapshots • It enables an organization to audit a web document’s past contents. • It supports business analyses that use the historical information recorded in it. • It serves as an archived copy of a web document when the document is changed.

  4. The State Of A Web Document At The Snaptime • The code that creates the contents • The rendering of the web document as displayed with a browser: • Dynamic web documents

  5. Factors Affecting The State Of A Web Document • Web document code • The state of internal resources it references: • Internal resources are files managed by a web site and available for creating the web site’s contents. • Images, style sheets, components, script files, databases, etc. • The state of external resources it references: • External resources are files not managed by the web site that can be referenced in creating the web site’s contents. • Web site host environment variables: • System clock

  6. Four Levels Of Web Document Snapshot • Level 1 snapshot: A level 1 snapshot is the state of the web document code at snaptime. • Creating level 1 snapshots enables a web site to trace the changes to the web document code over time. • Level 2 snapshot: A level 2 snapshot is a level 1 snapshot with the additional requirement that all the internal resources it references are at least level 1 snapshots at the same snaptime. • Referencing database snapshots

  7. Level 3 snapshot: A level 3 snapshot is a level 2 snapshot with the additional requirement that all the external resources it references are at least level 2 snapshots at the snaptime. • Level 4 snapshot: A level 4 snapshot is a level 3 snapshot with the additional requirement that all the web site host’s environment variables are reset to their values at snaptime.

  8. Objective Of This Research • The level 3 and level 4 snapshots involve resources that are not managed by the web site and it’s difficult, if not impossible, for the web site to keep track of changes to these resources. This research develops a web document snapshot management system to deliver web documents’ level 1 and level 2 snapshots.

  9. The Design Of The Web Document Snapshot Delivery System • The system consists of two components: • Database Snapshot Manager for maintaining database snapshots • Web Document Snapshot Manager for maintaining web document snapshots. • The system is designed to deliver snapshots with any snaptime requirement, and snapshots are created only when requested.

  10. Database Snapshot Manager • The objective of this module is to provide a database snapshot at any snaptime requested by users. • This requires recording all updates in a log. The log uses a time stamp to record each update time and uses flags to indicate deletions and insertions; a modification is treated as the deletion of the old version followed by the insertion of the new version.
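A minimal sketch of how a snapshot could be rebuilt from such a log, assuming each record carries a time stamp, an insertion/deletion flag, a row key and the row contents. The record layout, the names `LogRecord` and `snapshot_as_of`, and the sample data are hypothetical; the slides do not specify them.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


# Hypothetical log record. The slides only say that each update is logged
# with a time stamp and an insertion/deletion flag; field names are assumed.
@dataclass(frozen=True)
class LogRecord:
    timestamp: int   # update time
    flag: str        # "I" for insertion, "D" for deletion
    key: str         # key of the affected row
    row: Tuple       # row contents


def snapshot_as_of(log: List[LogRecord], snaptime: int) -> Dict[str, Tuple]:
    """Rebuild the table state at snaptime by replaying the log in time order."""
    table: Dict[str, Tuple] = {}
    # At equal time stamps, deletions are replayed before insertions so that
    # a modification (delete old version, insert new version) ends with the
    # new version in place.
    for rec in sorted(log, key=lambda r: (r.timestamp, r.flag != "D")):
        if rec.timestamp > snaptime:
            break
        if rec.flag == "I":
            table[rec.key] = rec.row
        elif rec.flag == "D":
            table.pop(rec.key, None)
    return table


# Illustrative data only.
log = [
    LogRecord(1, "I", "c1", ("Alice", 100)),
    LogRecord(2, "I", "c2", ("Bob", 200)),
    LogRecord(3, "D", "c1", ("Alice", 100)),  # modification of c1 ...
    LogRecord(3, "I", "c1", ("Alice", 150)),  # ... logged as delete + insert
    LogRecord(4, "D", "c2", ("Bob", 200)),
]
```

Calling `snapshot_as_of(log, 2)` replays only the first two records, so any snaptime can be served from the same log without storing snapshots in advance.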

  11. Database Snapshot Management • Defining snapshots: CREATE SNAPSHOT snapshotname AS query AS OF snaptime • Refreshing snapshots: REFRESH SNAPSHOT snapshotname AS OF new snaptime

  12. Web Document Snapshot Manager • The objective of the Web Document Snapshot Manager is to generate level 2 snapshots for all internal non-database files.

  13. The M:M Relationship Between A URL And A Web Document

  14. Historical Links • The historical links of a web site include the URLs invalidated due to: • web site reorganization • document removal, renaming or relocation • and links to document snapshots: • document’s contents as of a specific point in time.

  15. Logging Scheme • The log, named TemporalURLLog, is designed to keep the history of changes to web documents. It has four fields: • URL: document’s URL • PublishDate: document publish time • ExpireDate: document URL expire date • NewURL: document’s new URL if any

  16. TemporalURLLog Maintenance Algorithm • New document: An entry is added with its URL and PublishDate; ExpireDate and NewURL are null. • Deleted document: The ExpireDate of the document’s entry is set to its deletion time. • Modified document: The ExpireDate of the document’s entry is set to its modification time, and a new entry is added with its URL and the modification time as PublishDate; ExpireDate and NewURL are null. • Renamed document: The ExpireDate of the document’s entry is set to the time it is renamed, and the NewURL is set to its new URL. A new entry is then added with the new URL, and its PublishDate is set to the time the document is renamed.
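The four maintenance rules can be sketched as follows. This is an illustration under stated assumptions, not the author's implementation: the in-memory list, the class layout, the method names and the use of integers for times are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    url: str
    publish_date: int
    expire_date: Optional[int] = None   # null while the entry is current
    new_url: Optional[str] = None       # set only when the document is renamed


class TemporalURLLog:
    """In-memory stand-in for the TemporalURLLog (storage is an assumption)."""

    def __init__(self) -> None:
        self.entries: List[Entry] = []

    def _current(self, url: str) -> Entry:
        # The current entry of a URL is the one with a null ExpireDate.
        for e in self.entries:
            if e.url == url and e.expire_date is None:
                return e
        raise KeyError(f"no current entry for {url}")

    def add(self, url: str, now: int) -> None:
        self.entries.append(Entry(url, now))

    def delete(self, url: str, now: int) -> None:
        self._current(url).expire_date = now

    def modify(self, url: str, now: int) -> None:
        self._current(url).expire_date = now
        self.entries.append(Entry(url, now))

    def rename(self, url: str, new_url: str, now: int) -> None:
        old = self._current(url)
        old.expire_date = now
        old.new_url = new_url
        self.entries.append(Entry(new_url, now))


# A document published as P5 at time 1, modified at time 3,
# then renamed to P7 at time 4.
log = TemporalURLLog()
log.add("P5", 1)
log.modify("P5", 3)
log.rename("P5", "P7", 4)
```

After these three calls the log holds the entries (P5, 1, 3, null), (P5, 3, 4, P7) and (P7, 4, null, null).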

  17. Archiving Scheme • Deleted document: The deleted document is saved in the Archive with URL + PublishDate as file name. • Modified document: The old version is saved in the Archive with URL + PublishDate as file name.

  18. With the scheme we can determine: • 1. A URL P2, valid between T0 and T1, has been deleted. • 2. A URL P3 has been modified repeatedly and is eventually deleted. • 3. An old URL P5 is now renamed to P7. It was modified at T3. • 4. A historical link P1 is now renamed to P8. • 5. A URL P12 has never existed in the web site.

  19. Patterns of Log Entries for a URL • 1. If a URL has a log entry with a non-null PublishDate and a null ExpireDate field, then it is a current URL, such as P6 in Figure 3. • 2. If all entries of a URL have a non-null PublishDate, a non-null ExpireDate and a null NewURL field, then this URL has been deleted from the web site, such as P3 in Figure 3. • 3. If a URL has a log entry with a non-null NewURL field, then it has been renamed, and the log entries for the new URL may again have these three patterns of changes.
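The three patterns can be checked mechanically. The sketch below assumes log entries are (URL, PublishDate, ExpireDate, NewURL) tuples with integer times; the function name `classify`, the "never existed" case (taken from the earlier list of things the scheme can determine) and the sample log are illustrative and are not a reproduction of Figure 3.

```python
from typing import List, Optional, Tuple

# (URL, PublishDate, ExpireDate, NewURL); None stands for a null field.
Entry = Tuple[str, int, Optional[int], Optional[str]]


def classify(url: str, log: List[Entry]) -> str:
    """Classify a URL by the pattern of its log entries."""
    own = [e for e in log if e[0] == url]
    if not own:
        return "never existed"
    # Pattern 1: an entry with a null ExpireDate means the URL is current.
    # Assumption: this takes precedence if the URL was renamed away and
    # later reused.
    if any(e[2] is None for e in own):
        return "current"
    # Pattern 3: an entry with a non-null NewURL means the URL was renamed.
    if any(e[3] is not None for e in own):
        return "renamed"
    # Pattern 2: every entry is expired and none points to a new URL.
    return "deleted"


# Illustrative log (times are made up).
sample_log: List[Entry] = [
    ("P2", 0, 1, None),
    ("P3", 0, 2, None),
    ("P3", 2, 3, None),
    ("P5", 1, 3, None),
    ("P5", 3, 4, "P7"),
    ("P6", 1, None, None),
    ("P7", 4, None, None),
]
```

For a renamed URL, the same function can be applied again to the NewURL to learn what became of the document.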

  20. Backward Search For A Document’s Snapshot At A Specific Time • This algorithm processes log entries backward starting from a document’s current entry to trace back its changes in order to locate the snapshot at time T. If the current URL’s PublishDate is less than T, then the current document itself is its snapshot at T. Otherwise the backward search starts.

  21. An entry and its successor satisfy one of the following properties: • 1. If the entry has a null NewURL, then its successor must have the same URL, and the successor’s PublishDate must equal the entry’s ExpireDate. If no such successor is found, then this entry must have been ended by a deletion. • 2. If the entry has a non-null NewURL, then it must have a successor whose URL equals the NewURL and whose PublishDate equals the entry’s ExpireDate. Renaming or relocation must have generated the successor. • The backward search applies these properties in reverse to find an entry’s predecessor.

  22. Retrieving Web Document P7’s Snapshot at time=T2 • Entries processed: • (P5, T1, T3, Null) • (P5, T3, T4, P7) • (P7, T4, Null, Null) • Document retrieved: • Archive(P5 + T1)

  23. Retrieving Web Document P8’s Snapshot at time=T0 • Entries processed: • (P1, T0, T1, P4) • (P4, T1, T4, P8) • (P8, T4, Null, Null) • P8 has been renamed at T1 and T4. It has not been modified since T0. • Document retrieved: • P8
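One way to read the backward search and the two retrieval examples is sketched below. Several points are assumptions rather than statements from the slides: times T0..T4 are modeled as the integers 0..4, an entry is taken to cover T when its PublishDate is not later than T, and once the covering entry is found the sketch follows renames forward (a rename does not change contents) until it reaches either the current document or an entry ended by a modification or deletion, whose old version is in the Archive under URL + PublishDate. All function names are hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple


class Entry(NamedTuple):
    url: str
    publish: int
    expire: Optional[int]
    new_url: Optional[str]


def predecessor(log: List[Entry], e: Entry) -> Optional[Entry]:
    """Entry that expired exactly when e was published and leads to e."""
    for p in log:
        if p is not e and p.expire == e.publish and (
            p.new_url == e.url or (p.new_url is None and p.url == e.url)
        ):
            return p
    return None


def successor(log: List[Entry], e: Entry) -> Optional[Entry]:
    """Entry published exactly when e expired, under e's new or same URL."""
    target = e.new_url if e.new_url is not None else e.url
    for s in log:
        if s.url == target and s.publish == e.expire:
            return s
    return None


def find_snapshot(log: List[Entry], current_url: str, t: int) -> Optional[Tuple[str, str]]:
    """Return ("current", url) or ("archive", "URL+PublishDate"), or None
    if the document did not exist at time t."""
    entry = next((e for e in log if e.url == current_url and e.expire is None), None)
    if entry is None:
        return None
    # Backward search: walk predecessors until an entry covers t.
    while entry.publish > t:
        entry = predecessor(log, entry)
        if entry is None:
            return None
    # Renames keep the contents, so follow them forward (assumption).
    while entry.new_url is not None:
        entry = successor(log, entry)
        if entry is None:
            return None  # malformed log
    if entry.expire is None:
        return ("current", entry.url)
    return ("archive", f"{entry.url}+{entry.publish}")


# The entries from the two retrieval examples, with T0..T4 as 0..4.
log = [
    Entry("P5", 1, 3, None), Entry("P5", 3, 4, "P7"), Entry("P7", 4, None, None),
    Entry("P1", 0, 1, "P4"), Entry("P4", 1, 4, "P8"), Entry("P8", 4, None, None),
]
```

With this log, `find_snapshot(log, "P7", 2)` yields the archived file named P5 + T1, and `find_snapshot(log, "P8", 0)` yields the current document P8, which agrees with the two worked examples.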

  24. Requesting Web Document Snapshot with Temporal URL • A temporal URL is a URL submitted with temporal requirements that the documents associated with the URL must meet. • Entering temporal requirements with QueryString • Example: • URL?SnapshotAsOf=date
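A small sketch of how a server might separate the temporal requirement from the rest of a temporal URL. The parameter name `SnapshotAsOf` comes from the slide; the ISO date format (YYYY-MM-DD), the function name `split_temporal_url` and the example host are assumptions.

```python
from datetime import date
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def split_temporal_url(url: str) -> Tuple[str, Optional[date]]:
    """Split a temporal URL into the plain URL and the requested snaptime.

    Returns (plain_url, None) when no SnapshotAsOf parameter is present.
    Assumes the date is written in ISO format, e.g. 2004-03-15.
    """
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    # Remove the temporal parameter and keep every other parameter as is.
    raw = query.pop("SnapshotAsOf", [None])[0]
    snaptime = date.fromisoformat(raw) if raw else None
    plain = urlunsplit(parts._replace(query=urlencode(query, doseq=True)))
    return plain, snaptime


plain_url, snaptime = split_temporal_url(
    "http://example.com/products.aspx?id=7&SnapshotAsOf=2004-03-15"
)
```

The plain URL and the snaptime can then be handed to the snapshot-retrieving algorithm as its two inputs.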

  25. Requesting Web Document Snapshot With A Web Service • A web service is application logic accessible via the Internet. • The snapshot-retrieving algorithm can be implemented as a web service that takes a URL and a snaptime as its inputs and returns the document snapshot as its output.

  26. Creating A Web Document Snapshots Management Site • This is a site designed to manage web document snapshots and handle requests for snapshots. Its objectives are: • (1) maintaining web document snapshots. • (2) providing an interface for users to enter requests for snapshots. • (3) educating users about the web document snapshot system.

  27. Summary • This paper has two contributions: • 1. It presents an analysis for defining four levels of snapshots for a web document. • 2. It presents a design of the web document snapshot management system that is capable of dynamically creating web document snapshots. • Future research: • Designing a web document snapshots management site.
