
Web Site Management Based on Declarative Specifications

Web Site Management Based on Declarative Specifications. Alon Levy University of Washington Joint work with: Strudel: Dana Florescu (INRIA), Mary Fernandez, Dan Suciu (AT&T), Khaled Yagoub (INRIA) Tiramisu: Corin Anderson and Dan Weld (UW). Problem: Building Web sites.


Presentation Transcript


  1. Web Site Management Based on Declarative Specifications • Alon Levy, University of Washington • Joint work with: • Strudel: Dana Florescu (INRIA), Mary Fernandez, Dan Suciu (AT&T), Khaled Yagoub (INRIA) • Tiramisu: Corin Anderson and Dan Weld (UW)

  2. Problem: Building Web sites • Building Web sites involves three tasks: • Selecting and managing the site’s content • Organizing the site’s structure (pages and links) • Designing the graphical presentation of pages. • In current tools, these tasks are (mostly) interdependent. • Strudel’s key ideas: • Separate the three tasks. • Manage content and structure declaratively.

  3. Content Management and Graphical Presentation • Content may be derived from multiple sources: • Databases: relational, object-oriented • Semi-structured sources (XML, Word, Excel, BibTeX). • Classical data integration problem! (see TSIMMIS, Garlic, Information Manifold, Tukwila) • Graphical presentation: • Need to integrate with tools that create animations, images, Java applets. • Create sets of similar HTML pages using templates.

  4. Web-Site Structure • The structure includes: • Set of pages and contents of each page, and • Links between the pages.

  5. Current practice • Current tools separate only content management from presentation: • Content managed by database: • Embed queries in HTML templates • Simple tools to view and modify structure at the extensional level. • WYSIWYG tools for managing presentation. • But they still cannot: • explicitly manage site's global structure, or • flexibly choose content-management system • As a result it’s hard to: • modify the structure of a web-site, • build multiple versions for different classes of users, • enforce integrity constraints.

  6. Talk Outline • Problem definition • Strudel architecture • Advantages of declarative specifications: • Specifying and verifying integrity constraints. • Automatic generation of run-time plans for managing data-intensive web sites. • Tiramisu: • Separating the design tool from the implementation. • Using a collection of tools to build a site.

  7. Strudel Evolution • Strudel (Nov. 96) [AT&T] • Strudel AT&T Release: http://www.research.att.com/sw/tools/strudel • Strudel-R (INRIA) • Tiramisu (Sept. 98) (U. Washington)

  8. Strudel Architecture and System

  9. Strudel • Features: • Integrates content from multiple sources. • High-level declarative language for managing site’s structure (StruQL). • Advantages: • Derives multiple sites from the same data. • Supports easy restructuring and modification. • Provides platform for: • Enforcing integrity constraints • Designing policies for efficient run-time management of sites.

  10. Strudel Architecture

  11. Data Model • Strudel is based on a semi-structured data model: • labeled directed graphs, • nodes in the graph represent objects, • labels on arcs represent attribute names, • named collections. • Why semi-structured data? • raw data is often semi-structured (and I don’t mean that it’s embedded in HTML) • convenient for data integration (à la TSIMMIS) • web-sites are ultimately graphs.
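The data model on this slide can be illustrated with a minimal Python sketch. The class name `Graph` and its methods are hypothetical, chosen only for illustration; they are not Strudel's API. The sample values are borrowed from the example raw data shown later in the talk. The point is that nodes need not share a schema: one article has two `Image` arcs, the other has none.

```python
class Graph:
    """Labeled directed graph with named collections (illustrative only)."""

    def __init__(self):
        self.edges = {}         # source node -> list of (label, target)
        self.collections = {}   # collection name -> set of member nodes

    def add_edge(self, source, label, target):
        self.edges.setdefault(source, []).append((label, target))

    def add_to_collection(self, name, node):
        self.collections.setdefault(name, set()).add(node)

    def targets(self, source, label):
        """Values reached from `source` over arcs labeled `label`."""
        return [t for (l, t) in self.edges.get(source, []) if l == label]


g = Graph()
g.add_to_collection("Articles", "article1")
g.add_to_collection("Articles", "article2")
g.add_edge("article1", "Title", "Clinton announces new ...")
g.add_edge("article1", "Image", "im1.gif")
g.add_edge("article1", "Image", "im.gif")         # an attribute may repeat
g.add_edge("article1", "RelatedArticle", "article2")
g.add_edge("article2", "Video", "vid1.avi")       # article2 has no Image arc

print(g.targets("article1", "Image"))
print(g.targets("article2", "Image"))
```

A relational table would need a fixed set of columns for every article; the graph simply omits arcs that do not apply.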

  12. The StruQL Query Language • A StruQL query is a function from a set of input graphs to an output graph. • A StruQL expression contains two parts: • A query component, and • A restructuring component. • Formally: • INPUT: graph names • WHERE: conjunction of regular path expression atoms • CREATE: name the nodes in the output graph using Skolem functions • LINK: specify the links in the resulting graph. • StruQL evolved into XML-QL (see WWW8 Conference)

  13. Example Raw Data • Article 1: Date: 8/1/97; Title: “Clinton announces new …”; Priority: Headline; Category: USA News; Images: im1.gif, im.gif; Text: “President Clinton announced…”; Related article: article2 • Article 2: Date: 8/2/97; Title: “FDA approves new cure for…”; Priority: Top Story; Category: Health; Video: vid1.avi; Text: “The Federal Drug Administration…”

  14. CNN Web-site Query (part 1) • Input graph of articles: INPUT CNN-ARTICLES • Create a web page for each article (note the arc variable l): WHERE Articles(a), a -> l -> t, l in { "Title", "Abstract", "Date", "Text", "Image", "Topimage", "RelatedSite"}, a -> "Category" -> c CREATE ArticlePage(a) LINK ArticlePage(a) -> l -> t {WHERE a -> "RelatedArticle" -> r LINK ArticlePage(a) -> "RelatedArticle" -> ArticlePage(r)}
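To make the semantics of this query concrete, here is a rough Python sketch of what evaluating part 1 could look like over a tiny input graph. It is not how Strudel evaluates StruQL; the function names and the sample triples are assumptions made for illustration. The Skolem function `ArticlePage(a)` is modeled as a tuple, so the same article always maps to the same output node.

```python
# Input graph as (source, label, target) triples; hypothetical sample data.
input_edges = [
    ("article1", "Title", "Clinton announces new ..."),
    ("article1", "Date", "8/1/97"),
    ("article1", "Category", "USA News"),
    ("article1", "RelatedArticle", "article2"),
    ("article2", "Title", "FDA approves new cure for ..."),
    ("article2", "Date", "8/2/97"),
    ("article2", "Category", "Health"),
]
articles = {"article1", "article2"}    # the Articles collection
COPIED = {"Title", "Abstract", "Date", "Text", "Image", "Topimage", "RelatedSite"}


def article_page(a):
    """Skolem function: equal inputs always name the same output node."""
    return ("ArticlePage", a)


def build_site(edges, articles):
    out = set()
    for (a, l, t) in edges:                          # a -> l -> t
        if a not in articles or l not in COPIED:     # Articles(a), l in {...}
            continue
        if not any(s == a and lab == "Category" for (s, lab, _) in edges):
            continue                                 # a -> "Category" -> c
        out.add((article_page(a), l, t))             # LINK ArticlePage(a) -> l -> t
        for (s, lab, r) in edges:                    # nested WHERE a -> "RelatedArticle" -> r
            if s == a and lab == "RelatedArticle":
                out.add((article_page(a), "RelatedArticle", article_page(r)))
    return out


site = build_site(input_edges, articles)
for edge in sorted(site, key=repr):
    print(edge)
```

Because output nodes are named by Skolem terms, the link to `ArticlePage("article2")` can be emitted before that page's own attributes are processed; both references resolve to the same node.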

  15. CNN Site Schema [Diagram of the site schema. Node labels: RootPage(), RootPageEntry(a), CategoryEntry(c), CategoryPage(c), ArticlePage(a). Edge conditions: a -> priority -> “headline”; a -> category -> c. Data rules: Data(t) :- a -> l -> t, l in {“title”, “top-image”}; Data(t) :- a -> l -> t, l in {"Title", "Abstract", …}]

  16. CNN Web-site Query (part 2) CREATE RootPage {WHERE a -> "Priority" -> "headline", l in { "Title", "Date", "Topimage"} CREATE RootEntry(a) LINK RootPage -> "HeadlineStory" -> RootEntry(a), RootEntry(a) -> "FullStory" -> ArticlePage(a), RootEntry(a) -> l -> t} (The LINK clause links each headline story to its title, date, top image and full article.)

  17. HTML Templates <h1> <SFMT title EMBED> </h1> <h2> <SFMT date EMBED> </h2> <SIF top-image>, <SFMT top-image EMBED> <SFMT text EMBED> </SIF> <SFOR a IN related-article ORDER=descend KEY=date> <SFMT @a LINK=title> </SFOR> <BR>

  18. CNN Sports Query INPUT CNN WHERE TopCategory(c), c -> "CategoryName" -> cn, cn="Sports", c -> "SubTopic" -> top, Articles(a), a -> l -> t, l in { "Title", "Abstract", "Date", "Text", "Image", "Topimage", "RelatedSite"}, a -> "Category" -> c, c=top CREATE ArticlePage(a) LINK ArticlePage(a) -> l -> t

  19. StruQL Details • Regular path expressions are constructed by a grammar: R <- “a” | ε | R1.R2 | (R1 | R2) | R1* | L | _ • Atoms in the WHERE clause are of the form X -> R -> Y or C(X) • The LINK clause includes atoms of the form: • LINK f(X) -> “new link” -> g(X) or • LINK f(X) -> L -> g(X) • Queries can be nested, inheriting the WHERE clauses of their outer blocks. • Note separation between querying part and restructuring part!
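As an illustration of how an atom X -> R -> Y can be evaluated, the following Python sketch computes the set of nodes reachable from a start set along paths matching a regular path expression. The tuple encoding of expressions and the function name `reach` are assumptions made for this sketch, `_` is read as a wildcard matching any single label (also an assumption), and arc variables (L) are left out to keep it short.

```python
def reach(edges, r, starts):
    """Nodes reachable from `starts` along a path that matches expression `r`.

    `edges` is a list of (source, label, target) triples. `r` is a tuple:
    ("label", a), ("eps",), ("any",), ("seq", r1, r2), ("alt", r1, r2), ("star", r1).
    """
    kind = r[0]
    if kind == "eps":                       # empty path
        return set(starts)
    if kind == "label":                     # a single arc with a fixed label
        return {t for (s, l, t) in edges if s in starts and l == r[1]}
    if kind == "any":                       # "_": a single arc with any label
        return {t for (s, l, t) in edges if s in starts}
    if kind == "seq":                       # R1.R2
        return reach(edges, r[2], reach(edges, r[1], starts))
    if kind == "alt":                       # R1 | R2
        return reach(edges, r[1], starts) | reach(edges, r[2], starts)
    if kind == "star":                      # R1*: iterate to a fixpoint
        seen = set(starts)
        frontier = set(starts)
        while frontier:
            frontier = reach(edges, r[1], frontier) - seen
            seen |= frontier
        return seen
    raise ValueError(f"unknown expression kind: {kind}")


edges = [
    ("root", "Category", "cat"),
    ("cat", "Article", "a1"),
    ("a1", "RelatedArticle", "a2"),
    ("a2", "RelatedArticle", "a1"),         # a cycle; star must still terminate
]

two_steps = ("seq", ("label", "Category"), ("label", "Article"))
print(reach(edges, two_steps, {"root"}))
print(reach(edges, ("star", ("any",)), {"root"}))
```

The star case tracks nodes already seen, so cyclic graphs such as the related-article loop above do not cause infinite iteration.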

  20. More on StruQL • Bare bones language for semi-structured data: includes the essential features. • More expressive than Lorel or UnQL (e.g., can reverse graphs) • Conceptually and in practice: separation between query component and restructuring component is important. • Containment is decidable for StruQL-WHERE (Florescu, Levy & Suciu, PODS-98)

  21. Advantages of Declarative Specifications

  22. Enforcing Integrity Constraints • We often want to verify some constraints on site structure: • all articles from the last two days are reachable from the root • all paths to confidential data must go through an authentication node • Good site design principles are summarized as integrity constraints [Lohse, CACM, 98]. • When site specs are long, constraints are hard to enforce. • Want to verify constraints intensionally, i.e., against the site specification rather than against each generated site.

  23. Intensional IC Verification • Formally, we want to check whether S(D) |= IC for any instance D of the underlying data, where: • S is the site specification (e.g., a StruQL query) • IC is a formula describing the constraint, e.g.: ∀a, Article(a) & date(a) > today-2 => Root -> * -> ArticlePage(a). • Results: • Sound and complete algorithms for verification of a class of integrity constraints (path constraints). • Algorithms will also propose corrections when IC’s are violated.
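For contrast with the intensional approach, the sketch below checks the example constraint extensionally: on one already-generated site graph S(D), for one particular D. This is not the verification algorithm from the talk, which reasons over the specification for all instances; the function names and sample site are hypothetical. It only shows what the constraint Root -> * -> ArticlePage(a) means on a concrete site.

```python
from collections import deque


def reachable(edges, root):
    """All nodes reachable from `root` over (source, label, target) edges."""
    adjacency = {}
    for (s, _label, t) in edges:
        adjacency.setdefault(s, []).append(t)
    seen, queue = {root}, deque([root])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def violations(site_edges, root, recent_articles):
    """Recent articles whose ArticlePage cannot be reached from the root."""
    seen = reachable(site_edges, root)
    return sorted(a for a in recent_articles if ("ArticlePage", a) not in seen)


# Hypothetical generated site: article2's page exists but nothing links to it.
site_edges = [
    ("RootPage", "HeadlineStory", ("RootEntry", "article1")),
    (("RootEntry", "article1"), "FullStory", ("ArticlePage", "article1")),
    (("ArticlePage", "article2"), "Title", "FDA approves new cure for ..."),
]

print(violations(site_edges, "RootPage", {"article1", "article2"}))
```

A check like this has to be rerun whenever the data changes, which is the motivation for verifying the constraint once against the specification instead.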

  24. Run-time Management of Sites • When do we compute web pages? • Static approach: completely precompute site • Doesn’t work for large sites, forms, hard to update. • Dynamic approach: compute pages on request • Users may wait, a lot of repeated computation, structure of the site is not exploited. • Current tools use one of the extremes, or specify policy per collection of pages. • The specification is implicit in code. • Our goal: use site specification to automatically find optimal strategy.

  25. Possible Run-time Optimizations • View materialization • Function caching: • when web sites represent hierarchically structured data, successive queries in the site differ only in their projected attributes. • Simplification under preconditions: • previous queries on the path may have already verified some conditions for current query. • Lookahead computation: • often it is possible with little cost to compute the data necessary for subsequent pages.
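The function-caching idea can be sketched in Python with the standard `functools.lru_cache` decorator. The data and function names are hypothetical and this is not the Strudel-R implementation. Two page queries that differ only in the attribute they project share one cached evaluation of the underlying selection.

```python
from functools import lru_cache

# Hypothetical article data keyed by article id.
ARTICLES = {
    "a1": {"Category": "USA News", "Title": "Clinton announces new ...", "Date": "8/1/97"},
    "a2": {"Category": "Health", "Title": "FDA approves new cure for ...", "Date": "8/2/97"},
}
CALLS = {"count": 0}    # counts how often the selection is really evaluated


@lru_cache(maxsize=None)
def articles_in_category(category):
    """The shared selection; evaluated once per distinct category."""
    CALLS["count"] += 1
    return tuple(sorted(a for a, attrs in ARTICLES.items()
                        if attrs["Category"] == category))


def category_titles(category):
    """Page query projecting the Title attribute."""
    return [ARTICLES[a]["Title"] for a in articles_in_category(category)]


def category_dates(category):
    """Page query projecting the Date attribute."""
    return [ARTICLES[a]["Date"] for a in articles_in_category(category)]


titles = category_titles("Health")
dates = category_dates("Health")     # reuses the cached selection
print(titles, dates, CALLS["count"])
```

A real system would also need to invalidate cached results when the underlying data changes, which this sketch does not attempt.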

  26. Problem Statement • Given: • site specification • knowledge about browsing patterns • cost function • Produce: • Operational plan: operational schema + a set of queries to compute on a given page request. • Results: (in Strudel-R): framework + • Performance study of the optimizations. • Algorithm for generating operational plans. • Identification of many open problems.
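As a toy illustration of the problem statement, the sketch below picks, per class of pages, between precomputing (materializing) and computing on request. The cost model is an assumption invented for this example: materialized pages are recomputed on every update and served by a cheap lookup, while on-demand pages are recomputed on every request. The plans generated in Strudel-R are richer than this; the sketch only shows how browsing patterns and a cost function can drive the choice.

```python
from dataclasses import dataclass


@dataclass
class PageClass:
    name: str
    requests_per_hour: float    # knowledge about browsing patterns
    updates_per_hour: float     # how often the underlying data changes
    compute_cost: float         # cost of evaluating the page's queries
    lookup_cost: float          # cost of serving a precomputed page


def choose_policy(page):
    """Return the cheaper policy under the assumed hourly cost model."""
    materialize = (page.updates_per_hour * page.compute_cost
                   + page.requests_per_hour * page.lookup_cost)
    on_demand = page.requests_per_hour * page.compute_cost
    return "materialize" if materialize < on_demand else "on-demand"


root = PageClass("RootPage", requests_per_hour=1000, updates_per_hour=10,
                 compute_cost=5, lookup_cost=1)
archive = PageClass("ArchivePage", requests_per_hour=2, updates_per_hour=60,
                    compute_cost=5, lookup_cost=1)

plan = {p.name: choose_policy(p) for p in (root, archive)}
print(plan)
```

Under these made-up numbers the frequently requested root page is worth precomputing, while the rarely requested, frequently updated archive page is cheaper to compute on request.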

  27. Strudel Experience --> Tiramisu

  28. Experiences with Strudel (except for the lousy GUI) • Integrating data from multiple sources when building a Web site is a prime concern. Sources are semi-structured! • Declarative specification of site structure is very important because: • site creation is a highly iterative process • site owners often need redesign after experience from deployment • we often generate multiple versions of sites from the same data. • Design of web-sites is done in a top-down fashion. • Strudel can’t be the all-encompassing web-site management tool.

  29. Tiramisu: the Second Generation • Strudel and its siblings (Araneus, YAT, WebOQL, WIRM) force the design and implementation of the site to be done in the same tool. • Furthermore, there will always be tools that are specialized for specific tasks. • Tiramisu: • Separate design phase from implementation. • Allow the implementation to be done by a set of cooperating tools.

  30. Tiramisu Architecture [Diagram. Labels: three data sources; mediator; E/R-style diagram of the site (site schema); implementation manager; three wrappers; tools (ASP, FrontPage, Strudel); web site]

  31. Screenshot of a TERD

  32. Conclusions • Web-site management is an important area for Database research. • First-generation systems (Strudel, Araneus, YAT, WebOQL) offer important advantages: • Easy modification, creation of multiple versions • enforcing constraints, run-time management • Second generation: (Tiramisu) • Emphasize design phase of site • Implement with a collection of cooperating tools.
