
Modeling and Querying Web Data: A Survey

Modeling and Querying Web Data: A Survey. By Li Lu. Overview: Introduction; Data Representation for Querying the Web; Modeling and Querying the Web; Summary and Future.




Presentation Transcript


  1. Modeling and Querying Web Data: A Survey. By Li Lu

  2. Overview • Introduction • Data Representation for Querying the Web • Modeling and Querying the Web • Summary and Future

  3. Introduction • Background • The most common techniques for finding information on the Web are based on sending information retrieval requests to index servers. • Web query techniques can instead be used to locate, filter and present Web information. • Challenges • It is difficult to build a common model for the Web. • It is hard to extract information from Web data.

  4. Data Representation for Querying the Web • Graph Data Models • Based on a labeled graph in which the nodes represent web pages, the edges represent links between web pages, and the labels on the edges can be attribute names. • Capable of expressing navigational queries over the graph structure. • Semistructured Data Models • Based on labeled directed graphs. There is no restriction on the number of edges that can go out from a given node, or on the type of an attribute value. • Able to query the schema, that is, the labels on the edges of the graph.
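To make the two bullet groups above concrete, here is a minimal sketch of a labeled graph in Python. The URLs, labels, and function names are all invented for illustration and do not come from any of the surveyed systems. `follow` plays the role of a navigational query, and `labels_used` shows the kind of question a semistructured model allows about the labels themselves.

```python
# Toy labeled graph: node -> ordered list of (edge_label, target_node).
# Every URL and label here is hypothetical.
GRAPH = {
    "http://example.org/": [
        ("dept", "http://example.org/cs"),
        ("news", "http://example.org/news"),
    ],
    "http://example.org/cs": [
        ("paper", "http://example.org/cs/p1"),
        ("paper", "http://example.org/cs/p2"),
    ],
    "http://example.org/news": [],
    "http://example.org/cs/p1": [],
    "http://example.org/cs/p2": [("cites", "http://example.org/cs/p1")],
}


def follow(graph, start, labels):
    """Navigational query: follow a fixed sequence of edge labels from start."""
    frontier = {start}
    for wanted in labels:
        frontier = {
            target
            for node in frontier
            for (label, target) in graph.get(node, [])
            if label == wanted
        }
    return frontier


def labels_used(graph):
    """Query over the labels (the implicit schema): which edge labels occur?"""
    return {label for edges in graph.values() for (label, _) in edges}


papers = follow(GRAPH, "http://example.org/", ["dept", "paper"])
```

Because a node may have any number of outgoing edges with any labels, nothing in the structure has to be declared up front, which is the flexibility the semistructured models rely on.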

  5. Data Representation for Querying the Web (cont.) Figure: A Hypertree Containing a Publications Database (WebOQL) [AM98]

  6. Data Representation for Querying the Web (cont.) • Semantic Web Data Models • The Semantic Web is a Web whose content can be annotated with metadata and processed automatically by machines. • Semantic assertions on the Semantic Web are formulated in the Resource Description Framework (RDF) model [LS99], which can be viewed as a partially labeled directed graph. • These models can exploit the semantics of Web content and can provide better query results than counterparts that are based only on the content and structure of Web data.
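An RDF graph can be read as a set of (subject, predicate, object) triples, where each triple is one labeled edge. The sketch below stores a few made-up triples in a plain Python list and matches them against a single pattern; the resource names and the `match` helper are assumptions for illustration, not part of any RDF library.

```python
# Each triple is one labeled edge: subject --predicate--> object.
# All identifiers and values are hypothetical.
TRIPLES = [
    ("http://example.org/doc1", "dc:creator", "Alice"),
    ("http://example.org/doc1", "dc:title", "Querying the Web"),
    ("http://example.org/doc2", "dc:creator", "Bob"),
]


def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t
        for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


creators = match(TRIPLES, p="dc:creator")
```

Matching on the predicate rather than on page text is what lets a query ask for "creators" instead of searching for a string that happens to appear in the document.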

  7. Data Representation for Querying the Web (cont.) Figure: An Example RDF Graph [WWW1]

  8. Modeling and Querying the Web • Query Languages for Graph Representations of Websites • These query languages combine content-based queries with structure-based queries. They can therefore formulate regular path expression queries and express navigational queries over the graph structure. • WebSQL [MMM97], W3QL [KS95], WebLog [LSS96] • Example: WebSQL [MMM97]
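A regular path expression constrains the sequence of edge labels along a path. The following sketch is one simple way to evaluate such an expression, assuming a small acyclic-friendly toy graph and a bounded path length; it joins the labels of each path with spaces and checks them with Python's `re.fullmatch`. The graph, the labels, and `path_query` are hypothetical and are not how WebSQL, W3QL, or WebLog are implemented.

```python
import re

# Toy site graph: node -> list of (edge_label, target_node). Hypothetical data.
EDGES = {
    "home": [("dept", "cs")],
    "cs": [("people", "staff"), ("paper", "p1")],
    "staff": [("paper", "p2")],
    "p1": [],
    "p2": [],
}


def path_query(edges, start, pattern, max_len=4):
    """Nodes reachable from start by a path whose labels match the pattern.

    The labels along a path are joined with single spaces and must match the
    regular expression in full. max_len bounds the search so that cycles in
    the graph cannot cause an endless traversal.
    """
    regex = re.compile(pattern)
    hits = set()
    stack = [(start, [])]
    while stack:
        node, labels = stack.pop()
        if regex.fullmatch(" ".join(labels)):
            hits.add(node)
        if len(labels) < max_len:
            for label, target in edges.get(node, []):
                stack.append((target, labels + [label]))
    return hits


# "dept", then optionally "people", then "paper".
found = path_query(EDGES, "home", r"dept( people)? paper")
```

Enumerating paths is exponential in the worst case; it is used here only because it keeps the link between the expression and the label sequence easy to see.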

  9. Modeling and Querying the Web (cont.) WebSQL [MMM97] • Models the Web as a relational database with two virtual relations, Document and Anchor: “Document[url, title, text, type, length, modif]” and “Anchor[base, href, label]”. • To map these relations onto the graph structure of the WWW, each document in the Document relation is mapped to a node object in the graph, and each hypertext link between two documents in the Anchor relation is represented by a link object.
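The two virtual relations and their mapping onto a graph can be sketched with plain Python tuples. The attribute names follow the slide; the field types, the sample rows, and the `to_graph` helper are assumptions made for this example.

```python
from typing import NamedTuple


class Document(NamedTuple):
    """One row of the virtual Document relation (field types are assumed)."""
    url: str
    title: str
    text: str
    type: str
    length: int
    modif: str


class Anchor(NamedTuple):
    """One row of the virtual Anchor relation: a link from base to href."""
    base: str
    href: str
    label: str


def to_graph(documents, anchors):
    """Map documents to nodes (keyed by URL) and anchors to labeled edges."""
    nodes = {d.url: d for d in documents}
    edges = [(a.base, a.href, a.label) for a in anchors]
    return nodes, edges


# Hypothetical sample rows.
docs = [
    Document("http://example.org/", "Home", "Welcome", "text/html", 7, "1998-02-01"),
    Document("http://example.org/a", "A", "Page A", "text/html", 6, "1998-02-02"),
]
links = [Anchor("http://example.org/", "http://example.org/a", "Page A")]
nodes, edges = to_graph(docs, links)
```

The relations are called virtual because no system can materialize them for the whole Web; rows are produced on demand as the query navigates.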

  10. Modeling and Querying the Web (cont.) • Sample query [FLM98]: find a list of tuples of the form (d1, d2, label), where d1 is a document stored at the local site, d2 is a document stored somewhere else, and d1 points to d2 by a link labeled label. Suppose all the local documents are reachable from www.mysite.start. In the query, ->* denotes a path of zero or more local links and => denotes a single global link. “SELECT d.url, e.url, a.label FROM Document d SUCH THAT www.mysite.start ->* d, Document e SUCH THAT d => e, Anchor a SUCH THAT a.base = d.url WHERE a.href = e.url”
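The intent of the sample query can be imitated over a small in-memory Anchor table. This is a sketch under stated assumptions: a link is treated as local when base and href share a host name, the anchor rows are invented, and `local_closure` and `external_links` are hypothetical helpers rather than anything WebSQL provides.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical Anchor rows: (base, href, label).
ANCHORS = [
    ("http://www.mysite.start/", "http://www.mysite.start/a.html", "A"),
    ("http://www.mysite.start/a.html", "http://other.example/x.html", "X"),
    ("http://www.mysite.start/", "http://other.example/y.html", "Y"),
    ("http://www.mysite.start/orphan.html", "http://other.example/z.html", "Z"),
]


def is_local(base, href):
    """Assumption: a link is local when both ends are on the same host."""
    return urlparse(base).netloc == urlparse(href).netloc


def local_closure(start, anchors):
    """Documents reachable from start through zero or more local links."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for base, href, _ in anchors:
            if base == current and is_local(base, href) and href not in seen:
                seen.add(href)
                queue.append(href)
    return seen


def external_links(start, anchors):
    """(d1, d2, label) for links from locally reachable d1 to a remote d2."""
    local_docs = local_closure(start, anchors)
    return sorted(
        (base, href, label)
        for base, href, label in anchors
        if base in local_docs and not is_local(base, href)
    )


result = external_links("http://www.mysite.start/", ANCHORS)
```

The row starting at `orphan.html` is left out of the result because that page is not reachable from the start page through local links, which mirrors the reachability assumption stated with the query.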

  11. Modeling and Querying the Web (cont.) • Query Languages for Semistructured Representations of Websites • These languages discover the implicit structure within semistructured Web data and then recast the data to fit the discovered structure. • WG-Log [CDPT98], ULIXES and PENELOPE [AMM97a], WebOQL [AM98] • Example: WebOQL [AM98]

  12. Modeling and Querying the Web (cont.) WebOQL [AM98] • Introduces the hypertree data structure. A hypertree is an ordered arc-labeled tree with two kinds of arcs, internal arcs and external arcs. Internal arcs represent structured objects and external arcs represent hyperlinks among objects. Arcs are labeled with records. Figure: A Hypertree Containing a Publications Database (WebOQL) [AM98]

  13. Modeling and Querying the Web (cont.) • Web pages are represented by hypertrees together with a mapping function, which maps URLs to their corresponding hypertrees. The hypertrees and the mapping function are also called the schema and the browsing function of the Web, respectively. • Sample query [FLM98]: extract the title and the URL of the full version of papers authored by "Smith" from the csPapers database. “SELECT [y.Title, y'.Url] FROM x in csPapers, y in x' WHERE y.Authors ~ "Smith"”
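A hypertree and the sample query can be approximated in Python as follows. This is a simplified reading, not WebOQL itself: each arc carries a record and an ordered list of child arcs, the primed variables are taken to mean "the subtree hanging from this arc", the full version is assumed to be the first arc under a paper, and the `~` pattern match is replaced by a plain substring test. The `Arc` class, the field names in the sample data, and `titles_and_urls` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Arc:
    """An arc labeled with a record, leading to an ordered list of child arcs."""
    record: dict
    children: list = field(default_factory=list)
    external: bool = False  # True for hyperlinks, False for structure


# Hypothetical publications database: research groups, then papers,
# then external arcs pointing at versions of each paper.
cs_papers = [
    Arc({"Group": "Databases"}, [
        Arc({"Title": "Paper One", "Authors": "Smith, Jones"}, [
            Arc({"Label": "Full version", "Url": "http://example.org/p1.ps"}, external=True),
            Arc({"Label": "Abstract", "Url": "http://example.org/p1-abs.html"}, external=True),
        ]),
        Arc({"Title": "Paper Two", "Authors": "Lee"}, [
            Arc({"Label": "Full version", "Url": "http://example.org/p2.ps"}, external=True),
        ]),
    ]),
]


def titles_and_urls(tree, author):
    """Imitates: SELECT [y.Title, y'.Url] FROM x in tree, y in x' WHERE y.Authors ~ author."""
    results = []
    for x in tree:             # x ranges over the top-level arcs
        for y in x.children:   # y ranges over the arcs of x's subtree
            if author in y.record.get("Authors", "") and y.children:
                first = y.children[0]  # assumed to be the full version
                results.append((y.record["Title"], first.record["Url"]))
    return results


smith = titles_and_urls(cs_papers, "Smith")
```

Keeping the arcs ordered matters here: the query relies on position to pick the full version rather than on a field name.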

  14. Modeling and Querying the Web (cont.) • Query Languages for the Semantic Web • The Semantic Web is a Web whose content can be annotated with metadata and processed automatically by machines. • Semantic queries can exploit the semantics of Web content. • RQL [KACPS02], SquishQL [MSR02], TRIPLE [SBAHKW02].
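One common way to query annotated content is to match several triple patterns at once, with variables shared between patterns acting as joins. The sketch below shows that general idea only; it does not use the syntax or semantics of RQL, SquishQL, or TRIPLE, and the data, the `?name` variable convention, and `solve` are assumptions for illustration.

```python
# Hypothetical metadata as (subject, predicate, object) triples.
TRIPLES = [
    ("doc1", "creator", "alice"),
    ("alice", "worksFor", "acme"),
    ("doc2", "creator", "bob"),
    ("bob", "worksFor", "initech"),
]


def is_var(term):
    """Terms starting with '?' are variables; everything else is a constant."""
    return term.startswith("?")


def solve(patterns, triples, binding=None):
    """Return every variable binding that satisfies all patterns together."""
    binding = binding or {}
    if not patterns:
        return [binding]
    first, rest = patterns[0], patterns[1:]
    results = []
    for triple in triples:
        candidate = dict(binding)
        matched = True
        for term, value in zip(first, triple):
            if is_var(term):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(term, value) != value:
                    matched = False
                    break
            elif term != value:
                matched = False
                break
        if matched:
            results.extend(solve(rest, triples, candidate))
    return results


# Which documents were created by someone who works for acme?
answers = solve([("?doc", "creator", "?who"), ("?who", "worksFor", "acme")], TRIPLES)
```

The shared variable `?who` is what ties the document to its creator's employer, a connection that a purely content-based or link-based query has no direct way to express.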

  15. Summary and Future • Summary • Web data models are divided into three main categories: graph data models, semistructured data models and Semantic Web data models. • Based on these data models, Web query languages are also classified into three primary groups. • Future • Developing techniques to manipulate dynamic pages could benefit Web query applications and is a promising direction for future research. • Combining query results from different resources on the Web, especially results from both structured and unstructured data sources, also poses challenges for future research.

  16. References
      [AM98] G. Arocena, A. Mendelzon, “WebOQL: Restructuring Documents, Databases, and Webs”, Proc. ICDE'98, Orlando, Florida, Feb. 1998.
      [CDPT98] S. Comai, E. Damiani, R. Posenato, L. Tanca, “A Schema-based Approach to Modeling and Querying WWW Data”, Proc. of FQAS'98, Roskilde, May 1998, LNAI 1495.
      [AMM97a] P. Atzeni, G. Mecca, P. Merialdo, “To Weave the Web”, International Conference on Very Large Data Bases (VLDB'97), Athens, Greece, August 26-29, 1997, pages 206-215.
      [FLM98] D. Florescu, A. Levy, A. Mendelzon, “Database Techniques for the World-Wide Web: A Survey”, SIGMOD Record 27, 3 (1998), 59-74.
      [KACPS02] G. Karvounarakis, S. Alexaki, V. Christophides, D. Plexousakis, M. Scholl, “RQL: A Declarative Query Language for RDF”, WWW2002, May 2002, Honolulu, Hawaii.
      [KS95] D. Konopnicki, O. Shmueli, “W3QS: A query system for the World Wide Web”, Proc. of the Int. Conf. on Very Large Data Bases (VLDB), pages 54-65, Zurich, Switzerland, 1995.
      [LSS96] L. V. S. Lakshmanan, F. Sadri, L. N. Subramanian, “A declarative language for querying and restructuring the Web”, Proc. of the Sixth International Workshop on Research Issues in Data Engineering, RIDE'96, New Orleans, February 1996.
      [MM97] A. O. Mendelzon, T. Milo, “Formal Models of Web Queries”, Proceedings of the Sixteenth ACM Symposium on Principles of Database Systems, pages 134-143, 1997.
      [MMM97] A. Mendelzon, G. Mihaila, T. Milo, “Querying the world wide web”, International Journal on Digital Libraries, 1(1):54-67, 1997.

  17. References (cont.)
      [MSR02] L. Miller, A. Seaborne, A. Reggiori, “Three Implementations of SquishQL, a Simple RDF Query Language”, Proceedings of the 1st International Semantic Web Conference, ISWC2002, Sardinia, Italy, June 9-12, 2002.
      [SBAHKW02] A. Sheth, C. Bertram, D. Avant, B. Hammond, K. Kochut, Y. Warke, “Semantic Content Management for Enterprises and the Web”, IEEE Internet Computing, July/August 2002, pp. 80-87.
      [WWW1] “Introduction to the Semantic Web and RDF”, http://www.amk.ca/talks/semweb-intro
