Schema-Free XQuery
Based on the work of: Yunyao Li, Cong Yu and H. V. Jagadish, from the University of Michigan
Presented by Gil Barash in the course SDBI 05’

Content
• What is XQuery
• The problem of Schema-Based queries
• MLCAS
• Integrating MLCAS with XQuery
• Conclusion

XQuery
• XQuery is an XML query language.
• It is sometimes referred to as the SQL of XML files.
• It is built on XPath expressions.
• It is supported by all major database engines.
• It will soon become a W3C standard.

XPath
• XPath is used to navigate through XML documents.
• Before we can write an XQuery query, we should first get familiar with XPath…

Bibliography XML (version 1)
    <bibliography>
      <bib>
        <year> 1999 </year>
        <book>
          <title> SQL </title>
          <author> Bob </author>
        </book>
        <article>
          <title> XML </title>
          <author> Mary </author>
        </article>
      </bib>
      … …
    </bibliography>
[Figure: the same document drawn as a tree. First bib: year 1999, book (title SQL, author Bob), article (title XML, author Mary). Second bib: year 2000, book (title D.B., author David), article (title .NET, author Bill).]

XPath - example
On the Bibliography XML (version 1) document, the expression: /bibliography/bib/*
will return the nodes <year>, <book> and <article>.
• /  Look from the root of the document
• bibliography/bib  under the path “bibliography/bib”
• /*  for all child nodes

XPath - example
On the Bibliography XML (version 1) document, the expression: /bibliography//title
will return both the titles “SQL” and “XML”.
• /bibliography  Among the child nodes of the root, take the one named “bibliography”
• //  look at any descendant (not only direct children)
• title  for the nodes named “title”

XPath - example
On the Bibliography XML (version 1) document, the expression: //bib[1]
will return the sub tree rooted at the first ‘bib’. (Strictly, it selects every bib that is the first bib child of its parent; in this document there is only one such node.)
• //  Look anywhere in the document
• bib[1]  for the 1st bib node
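
The three XPath examples can be tried out with a small script. This is only a sketch in Python, not the tool used in the talk: it relies on the standard xml.etree.ElementTree module, which supports a subset of XPath, so the paths are written relative to the <bibliography> root element. The second bib is filled in from the tree drawing of version 1.

```python
import xml.etree.ElementTree as ET

# Bibliography XML (version 1); whitespace inside the tags is dropped for brevity.
DOC = """<bibliography>
  <bib>
    <year>1999</year>
    <book><title>SQL</title><author>Bob</author></book>
    <article><title>XML</title><author>Mary</author></article>
  </bib>
  <bib>
    <year>2000</year>
    <book><title>D.B.</title><author>David</author></book>
    <article><title>.NET</title><author>Bill</author></article>
  </bib>
</bibliography>"""

root = ET.fromstring(DOC)  # root is the <bibliography> element

# /bibliography/bib/*  : every child node of every bib
children = [node.tag for node in root.findall("./bib/*")]

# /bibliography//title : every title descendant, at any depth
titles = [node.text for node in root.findall(".//title")]

# //bib[1]             : each bib that is the first bib child of its parent
first_bib = root.findall(".//bib[1]")

print(children)
print(titles)
print([bib.find("year").text for bib in first_bib])
```

With both bib elements present, the first two expressions return nodes from both of them, not only from the 1999 entry shown on the slide.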

XQuery queries
• Suppose we want to find the title of the book of which Mary is an author.
• Our query will be:
    for $x in doc("doc.xml")/bibliography/bib/book
    where $x/author/text() = "Mary"
    return $x/title

XQuery - example
• for $x in doc("doc.xml")/bibliography/bib/book
  For every sub tree (bound to $x) in the document “doc.xml” that matches the XPath /bibliography/bib/book
• where $x/author/text() = "Mary"
  keep $x only if it contains a path /author whose text node is “Mary”.
• return $x/title
  Return the node found under the path /title within the sub tree $x.

Bibliography XML (version 1)
[Figure: tree of the document. First bib: year 2000, book (title D.B., author David), article (title .NET, author Bill). Second bib: year 1999, book (title SQL, author Mary), article (title XML, author Mary).]
    for $x in doc("doc.xml")/bibliography/bib/book
    where $x/author/text() = "Mary"
    return $x/title
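
To see what the for / where / return clauses do, the query can be mimicked with a list comprehension. This is a rough Python analogue using xml.etree.ElementTree, not an XQuery engine; the data follows the tree on the slide, where Mary is the author of both the book SQL and the article XML.

```python
import xml.etree.ElementTree as ET

DOC = """<bibliography>
  <bib>
    <year>2000</year>
    <book><title>D.B.</title><author>David</author></book>
    <article><title>.NET</title><author>Bill</author></article>
  </bib>
  <bib>
    <year>1999</year>
    <book><title>SQL</title><author>Mary</author></book>
    <article><title>XML</title><author>Mary</author></article>
  </bib>
</bibliography>"""

root = ET.fromstring(DOC)

result = [
    book.find("title").text                                    # return $x/title
    for book in root.findall("./bib/book")                     # for $x in .../bibliography/bib/book
    if any(a.text == "Mary" for a in book.findall("author"))   # where $x/author/text() = "Mary"
]

print(result)
```

Only the book is returned: the article XML also has Mary as an author, but the path in the for clause never visits article nodes.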

XQuery - example
• Suppose we want to find the authors that wrote a book with Mary.
[Figure: tree of the document. First bib: year 2000, book (title D.B., author David), article (title XML, author Mary). Second bib: year 1999, book (title SQL, authors Mary and Bill).]

XQuery - example
• Suppose we want to find the authors that wrote a book with Mary.
    for $b in doc("doc.xml")/bibliography/bib/book,
        $a in $b/author
    where $b/author/text() = "Mary" and $a/text() != "Mary"
    return $a

XQuery - example
• for $b in doc("doc.xml")/bibliography/bib/book, $a in $b/author
• For every sub tree (bound to $b) in the document “doc.xml” that matches the XPath /bibliography/bib/book
• and every sub tree (bound to $a) in the tree $b that matches the XPath /author
Ahhh… $b is a book and $a is an author of the book.

XQuery - example
• where $b/author/text() = "Mary" and $a/text() != "Mary"
• If $b contains a path /author ending with “Mary”
• and $a isn’t “Mary”
• return $a
  Return the sub tree $a.
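
The two-variable query behaves like a nested loop: $b ranges over books and $a over the authors of that book. A minimal Python sketch of the same logic, using xml.etree.ElementTree and the data from the co-author slide (the book SQL has the two authors Mary and Bill):

```python
import xml.etree.ElementTree as ET

DOC = """<bibliography>
  <bib>
    <year>2000</year>
    <book><title>D.B.</title><author>David</author></book>
    <article><title>XML</title><author>Mary</author></article>
  </bib>
  <bib>
    <year>1999</year>
    <book><title>SQL</title><author>Mary</author><author>Bill</author></book>
  </bib>
</bibliography>"""

root = ET.fromstring(DOC)

coauthors = [
    a.text                                    # return $a
    for b in root.findall("./bib/book")       # for $b in .../bibliography/bib/book,
    for a in b.findall("author")              #     $a in $b/author
    if any(x.text == "Mary" for x in b.findall("author"))  # where $b/author/text() = "Mary"
    and a.text != "Mary"                      #   and $a/text() != "Mary"
]

print(coauthors)
```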

Content
• What is XQuery
• The problem of Schema-Based queries
• MLCAS
• Integrating MLCAS with XQuery
• Conclusion

The Schema-Based problem
• Remember the first query?
    for $x in doc("doc.xml")/bibliography/bib/book
    where $x/author/text() = "Mary"
    return $x/title
• We wanted to find the title of a book of which Mary is an author.
• We never said that it would be under the path /bibliography/bib/book.

The Schema-Based problem
• Furthermore, suppose we want to get the year of the book that Mary wrote…
    <bibliography>
      <bib>
        <year> 1999 </year>
        <book>
          <title> SQL </title>
          <author> Mary </author>
        </book>
        <article> …
Notice that the year of the book IS NOT a descendant node of the book node, but of the bib node.

The Schema-Based problem
• Before (getting the title):
    for $x in doc("doc.xml")/bibliography/bib/book
    where $x/author/text() = "Mary"
    return $x/title
• After (getting the year):
    for $x in doc("doc.xml")/bibliography/bib
    where $x/book/author/text() = "Mary"
    return $x/year
$x is now the bib node. If there is a book written by Mary under that bib, then the year of that bib is returned.

The Schema-Based problem
• We could never have written that query without knowing the structure of the XML file.
• The query we wrote will not work on other files that represent the same data under a different structure.

Bibliography XML (version 2)
    <bibliography>
      <bib>
        <book>
          <year> 1999 </year>
          <title> SQL </title>
          <author> Bob </author>
        </book>
        <book>
          <year> 2000 </year>
          <title> D.B. </title>
          <author> David </author>
        </book>
      </bib>
      … …
    </bibliography>
[Figure: before/after tree views. In version 2 the year is a child of each book, next to title and author: book (year 1999, title SQL, author Bob) and book (year 2000, title D.B., author David).]

The Schema-Based problem
[Figure: version 2 tree. Each bib holds a book with the children year, title and author: (2000, D.B., David) and (1999, SQL, Bob).]
• Our query (getting the year) from before:
    for $x in doc("doc.xml")/bibliography/bib
    where $x/book/author/text() = "Mary"
    return $x/year
$x is a ‘bib’ node, and in version 2 it has no child named year, so nothing is returned.
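
The breakage can be reproduced by running one fixed query against both layouts. In this Python sketch (xml.etree.ElementTree, not XQuery) the function years_of_mary is a hypothetical name, and both documents hold the same illustrative record, a 1999 book SQL by Mary, so that only the structure differs.

```python
import xml.etree.ElementTree as ET

# Version 1: year is a child of bib.
V1 = """<bibliography>
  <bib>
    <year>1999</year>
    <book><title>SQL</title><author>Mary</author></book>
  </bib>
</bibliography>"""

# Version 2: year is a child of book.
V2 = """<bibliography>
  <bib>
    <book><year>1999</year><title>SQL</title><author>Mary</author></book>
  </bib>
</bibliography>"""


def years_of_mary(xml_text):
    """Schema-based query: for $x in /bibliography/bib
    where $x/book/author/text() = "Mary" return $x/year."""
    root = ET.fromstring(xml_text)
    return [
        year.text
        for bib in root.findall("./bib")
        if any(a.text == "Mary" for a in bib.findall("./book/author"))
        for year in bib.findall("year")
    ]


print(years_of_mary(V1))
print(years_of_mary(V2))
```

The where clause still matches in version 2; it is the return path $x/year that no longer finds anything.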

3 kinds of people…
• If the user has FULL knowledge of the structure, she can simply use XQuery.
• If the user has NO knowledge of the structure, she can use keyword-based queries (like XKeyword).
• If the user has PARTIAL knowledge of the structure, she can use schema-free queries, and make good use of her knowledge.

Partial knowledge
• Suppose you want to search for all the books about Albert Einstein…
• With a keyword-based search, you will enter the keyword “Albert Einstein”.
• Now, what if you want all the books written by Albert Einstein?
• Your query will not change, even though you know what you are really looking for.

XQuery with partial knowledge
Suppose we want to find the title and year of the publications of which Mary is an author:
    for $a in doc("doc.xml")//author,
        $b in doc("doc.xml")//title,
        $c in doc("doc.xml")//year
    where $a/text() = "Mary"
    return ($b, $c)
All we know are the names of the nodes we are looking for.

XQuery with partial knowledge
[Figure: tree of the document. First bib: year 2000, book (title D.B., author David), article (title .NET, author Bill). Second bib: year 1999, book (title SQL, author Mary), article (title XML, author Mary).]
    for $a in doc("doc.xml")//author,
        $b in doc("doc.xml")//title,
        $c in doc("doc.xml")//year
    where $a/text() = "Mary"
    return ($b, $c)
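
Taken literally, this query ties $a, $b and $c together by nothing but the document they come from, so it amounts to a cross product filtered on the author. A small Python sketch (itertools.product over xml.etree.ElementTree nodes, with the data from the tree on the slide) makes the over-generation visible:

```python
import xml.etree.ElementTree as ET
from itertools import product

DOC = """<bibliography>
  <bib>
    <year>2000</year>
    <book><title>D.B.</title><author>David</author></book>
    <article><title>.NET</title><author>Bill</author></article>
  </bib>
  <bib>
    <year>1999</year>
    <book><title>SQL</title><author>Mary</author></book>
    <article><title>XML</title><author>Mary</author></article>
  </bib>
</bibliography>"""

root = ET.fromstring(DOC)

# for $a in //author, $b in //title, $c in //year where $a/text() = "Mary"
rows = [
    (b.text, c.text)
    for a, b, c in product(root.iter("author"), root.iter("title"), root.iter("year"))
    if a.text == "Mary"
]

print(len(rows))          # one row per (Mary node, title, year) combination
print(sorted(set(rows)))  # distinct (title, year) pairs
```

Pairs such as ("D.B.", "2000") show up although Mary has nothing to do with that book, which is the gap the LCA-based ideas on the following slides try to close.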

Content
• What is XQuery
• The problem of Schema-Based queries
• MLCAS
    • LCA
    • MLCA
    • MLCAS
• Integrating MLCAS with XQuery
• Conclusion

LCA
• We would like to guess which part of the XML document is relevant to our search.
• By reducing the XML tree, we get more precise answers and avoid wrong ones.
[Figure: tree of the document. First bib: year 2000, book (title D.B., author David), article (title .NET, author Bill). Second bib: year 1999, book (title SQL, author Mary), article (title XML, author Mary).]

LCA
• Lowest Common Ancestor
• What is the LCA of “title” and “author”?
• For the title and author of the same book, the LCA is “book”: it is the root of the tree we should look within.
• For a title and an author that sit under different publications, the LCA is “bib”, which doesn’t help us refine our search.
[Figure: tree of one bib: year 1999, book (title SQL, author Bob), article (title XML, author Mary).]
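
Computing an LCA only needs a child-to-parent map. The sketch below is one straightforward way to do it in Python on top of xml.etree.ElementTree; the helper names ancestors and lca are made up for the illustration, and no claim is made that the original system computes it this way.

```python
import xml.etree.ElementTree as ET

DOC = """<bibliography>
  <bib>
    <year>1999</year>
    <book><title>SQL</title><author>Bob</author></book>
    <article><title>XML</title><author>Mary</author></article>
  </bib>
</bibliography>"""

root = ET.fromstring(DOC)
# ElementTree nodes do not know their parent, so build the map once.
parent = {child: p for p in root.iter() for child in p}


def ancestors(node):
    """Return [node, its parent, its grandparent, ...] up to the root."""
    chain = [node]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


def lca(a, b):
    """Lowest node that is an ancestor-or-self of both a and b."""
    up_a = ancestors(a)
    for node in ancestors(b):          # walk upwards from b
        if any(node is x for x in up_a):
            return node
    return None


book = root.find("./bib/book")
article = root.find("./bib/article")

print(lca(book.find("title"), book.find("author")).tag)     # same publication
print(lca(book.find("title"), article.find("author")).tag)  # different publications
```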

Content
• What is XQuery
• The problem of Schema-Based queries
• MLCAS
    • LCA
    • MLCA
    • MLCAS
• Integrating MLCAS with XQuery
• Conclusion

MLCA
• Blindly computing the LCA might bring undesired results.
• What we are looking for is: Meaningful Lowest Common Ancestor

Entity Type
• The type of a node is its tag name.
[Figure: tree of one bib: year 1999, book (title SQL, author Bob), article (title XML, author Mary), with the nodes of the “title” type highlighted.]

Meaningfully Related
• Consider two nodes “A” and “B”, of type “T1” and “T2” respectively.
• If A is an ancestor of B, we say that A and B are meaningfully related.
• If A and B have a common ancestor C, we say that A and B are related, being descendants of node C.
• So far, this is much like LCA…
[Figure: two small trees, one with A above B, one with C above both A and B.]
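
Meaningfully Related
• There is an exception to the second case: suppose that node B* is of the same type as B, and that A and B* share an ancestor C that lies below D, the common ancestor of A and B.
• In this case, nodes “A” and “B” are NOT meaningfully related.
[Figure: D (bib) is above C (book) and B (Title); C (book) is above A (Author) and B* (Title).]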

MLCA
• So we say that a node “D” is the MLCA of nodes “A” and “B” if:
• “D” is a common ancestor of nodes “A” and “B”.
• There is no node “C” that is the LCA of types “T1” and “T2” and is a descendant of node “D”.
[Figure: D above C and B, C above A and B*; the pairing of A with B through D is crossed out.]
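
The exception can be written as a check on top of a plain LCA routine. The Python sketch below follows the B* reading from the previous slide: the pair (A, B) is rejected when swapping one of the two nodes for another node of the same type gives an LCA strictly below D. The function names are hypothetical and this is a simplified reading of the definition, not the authors' algorithm.

```python
import xml.etree.ElementTree as ET

DOC = """<bibliography>
  <bib>
    <year>1999</year>
    <book><title>SQL</title><author>Bob</author></book>
    <article><title>XML</title><author>Mary</author></article>
  </bib>
</bibliography>"""

root = ET.fromstring(DOC)
parent = {child: p for p in root.iter() for child in p}


def ancestors(node):
    chain = [node]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


def lca(a, b):
    up_a = ancestors(a)
    return next(n for n in ancestors(b) if any(n is x for x in up_a))


def meaningfully_related(a, b):
    """True if LCA(a, b) is not undercut by a same-type substitute of a or b."""
    d = lca(a, b)

    def strictly_below_d(c):
        return c is not d and any(c is x for x in d.iter())

    for b_star in root.iter(b.tag):    # other nodes of B's type
        if b_star is not b and strictly_below_d(lca(a, b_star)):
            return False
    for a_star in root.iter(a.tag):    # other nodes of A's type
        if a_star is not a and strictly_below_d(lca(a_star, b)):
            return False
    return True


bob = root.find("./bib/book/author")        # A
sql = root.find("./bib/book/title")         # B*
xml_title = root.find("./bib/article/title")  # B

print(meaningfully_related(bob, sql))        # LCA is book
print(meaningfully_related(bob, xml_title))  # LCA is bib, undercut by book
```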

MLCA
• For multiple nodes, we require that every subset has an MLCA and that the MLCA of the whole set is an ancestor of the MLCAs of the subsets.
For example, if we are looking at the types year, title and author:
• bib is the MLCA of the types: year, title and author
• book is the MLCA of the types: title and author
[Figure: a bib with year 2000 and two books, (title D.B., author David) and (title .NET, author Bill).]

MLCA
• Let’s try the query again…
    for $a in doc("doc.xml")//author,
        $b in doc("doc.xml")//title,
        $c in doc("doc.xml")//year
    where $a/text() = "Mary"
    return ($b, $c)

Bibliography XML
    for $a in doc("doc.xml")//author,
        $b in doc("doc.xml")//title,
        $c in doc("doc.xml")//year
    where $a/text() = "Mary"
    return ($b, $c)
• “bib” is the MLCA of “author”, “title” and “year”
• “author” = Mary
[Figure: tree of the document. First bib: year 2000, book (title D.B., author David), article (title .NET, author Bill). Second bib: year 1999, book (title SQL, author Bob), article (title XML, author Mary). Result shown: (year 1999, title SQL) and (year 1999, title XML).]

Content
• What is XQuery
• The problem of Schema-Based queries
• MLCAS
    • LCA
    • MLCA
    • MLCAS
• Integrating MLCAS with XQuery
• Conclusion

MLCAS
• The result of the query was almost right.
• The problem was that “bib” is the MLCA of several groups of nodes which satisfy the query.
• To solve this, we use: Meaningful Lowest Common Ancestor Structure
Nodes requested:
• Title
• Author
• Year
[Figure: bib with year 1999, book (title SQL, author Bob), article (title XML, author Mary); the result so far is (year 1999, title SQL) and (year 1999, title XML).]

MLCAS
• Given a set of types {t1 … tm} from the query,
• an MLCAS is a set of nodes {r, a1, …, am}
• where {a1 … am} are nodes matching the types {t1 … tm}
• and r is the MLCA of {a1 … am}.

MLCAS example
• We are looking for the types: Author, Title and Year.
• Sets of nodes matching those types, and the MLCA of each set:
    {Mary, SQL, 1999}: there is none. bib is the LCA of the nodes Mary and SQL, but book is the MLCA of the types Title and Author. So this set isn’t good for us.
    {David, SQL, 1999}: there is none. bibliography is the LCA of the nodes David, SQL and 1999, but the bib nodes are the MLCA of the types Author, Title and Year. So this set isn’t good for us.
    {Bob, SQL, 1999}: bib[2]. So this set is good for us.
[Figure: tree of the document. bib[1]: year 2000, (title D.B., author David), (title .NET, author Bill). bib[2]: year 1999, (title SQL, author Bob), (title XML, author Mary).]
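
The three candidate sets can be checked mechanically. The Python sketch below simplifies the multi-node rule to "every pair must be meaningfully related, and r is the LCA of the whole set"; that simplification, the B*-style pair test and all function names are assumptions made for this illustration rather than the definition used by the authors.

```python
import xml.etree.ElementTree as ET
from functools import reduce
from itertools import combinations

DOC = """<bibliography>
  <bib>
    <year>2000</year>
    <book><title>D.B.</title><author>David</author></book>
    <article><title>.NET</title><author>Bill</author></article>
  </bib>
  <bib>
    <year>1999</year>
    <book><title>SQL</title><author>Bob</author></book>
    <article><title>XML</title><author>Mary</author></article>
  </bib>
</bibliography>"""

root = ET.fromstring(DOC)
parent = {child: p for p in root.iter() for child in p}
by_text = {e.text: e for e in root.iter() if len(e) == 0}  # leaf nodes by value


def ancestors(node):
    chain = [node]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


def lca(a, b):
    up_a = ancestors(a)
    return next(n for n in ancestors(b) if any(n is x for x in up_a))


def meaningfully_related(a, b):
    d = lca(a, b)
    below = lambda c: c is not d and any(c is x for x in d.iter())
    return not (
        any(s is not b and below(lca(a, s)) for s in root.iter(b.tag))
        or any(s is not a and below(lca(s, b)) for s in root.iter(a.tag))
    )


def mlcas_root(nodes):
    """Return r for the node set, or None if some pair is not meaningfully related."""
    if not all(meaningfully_related(a, b) for a, b in combinations(nodes, 2)):
        return None
    return reduce(lca, nodes)


for names in (["Mary", "SQL", "1999"], ["David", "SQL", "1999"], ["Bob", "SQL", "1999"]):
    r = mlcas_root([by_text[n] for n in names])
    print(names, "->", None if r is None else r.tag)
```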

MLCAS query example
    for $a in doc("doc.xml")//year,
        $b in doc("doc.xml")//title,
        $c in doc("doc.xml")//author
    where $c/text() = "Mary"
    return ($a, $b)
[Figure: the 1999 bib with book (title SQL, author Bob) and article (title XML, author Mary), split into two structures: (year 1999, title SQL, author Bob) and (year 1999, title XML, author Mary). Only the second one has author = Mary.]

Other work on creating meaningful results
• “Integrating Keyword Search into XML Query Processing (XML-QL)” - Daniela Florescu and Ioana Manolescu from INRIA Rocquencourt, France and Donald Kossmann from Univ. of Passau, Germany.
    • Use of the hierarchical location in the XML (at what level the keyword should be).
    • Use of the semantic location in the XML (tag name, CDATA, attribute …).
    • Use of the user’s knowledge of the structure of the XML file (e.g. if she knows that books are under the bib tag, she can ask for those elements only).

Other work on creating meaningful results
• “XSEarch: A Semantic Search Engine for XML” - Sara Cohen, Jonathan Mamou, Yaron Kanza and Yehoshua Sagiv from the Hebrew University.
    • Enables the user to specify a tag name under which the keyword should be found.
    • Uses the fact that if the shortest path between two elements goes through the same tag name more than once, they are probably not meaningfully related.
    • Ranks the results.
[Figure: a bib with two books, (title D.B., author David) and (title .NET, author Bill).]

Content
• What is XQuery
• The problem of Schema-Based queries
• MLCAS
• Integrating MLCAS with XQuery
    • mlcas
    • Expand
• Conclusion

Integrating MLCAS with XQuery
• To integrate MLCAS into XQuery, we introduce a new function into XQuery: mlcas (surprising, isn't it?)
• Whenever we want to make sure that the nodes exist in an MLCAS, we will add the condition:
    exists mlcas($a, $b, $c)
  (exists is already part of XQuery, as the built-in function fn:exists)
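
The effect of adding the mlcas condition to the where clause can be imitated in Python. In this sketch exists_mlcas is a hypothetical stand-in for "exists mlcas($a, $b, $c)": it accepts a combination only if every pair of nodes is meaningfully related, which is a simplified reading of the definition and not the authors' implementation. The data is the bibliography in which Bob wrote SQL and Mary wrote XML.

```python
import xml.etree.ElementTree as ET
from itertools import combinations, product

DOC = """<bibliography>
  <bib>
    <year>2000</year>
    <book><title>D.B.</title><author>David</author></book>
    <article><title>.NET</title><author>Bill</author></article>
  </bib>
  <bib>
    <year>1999</year>
    <book><title>SQL</title><author>Bob</author></book>
    <article><title>XML</title><author>Mary</author></article>
  </bib>
</bibliography>"""

root = ET.fromstring(DOC)
parent = {child: p for p in root.iter() for child in p}


def ancestors(node):
    chain = [node]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


def lca(a, b):
    up_a = ancestors(a)
    return next(n for n in ancestors(b) if any(n is x for x in up_a))


def meaningfully_related(a, b):
    d = lca(a, b)
    below = lambda c: c is not d and any(c is x for x in d.iter())
    return not (
        any(s is not b and below(lca(a, s)) for s in root.iter(b.tag))
        or any(s is not a and below(lca(s, b)) for s in root.iter(a.tag))
    )


def exists_mlcas(*nodes):
    """Hypothetical analogue of: exists mlcas($a, $b, $c)."""
    return all(meaningfully_related(x, y) for x, y in combinations(nodes, 2))


# for $a in //author, $b in //title, $c in //year
# where $a/text() = "Mary" and exists mlcas($a, $b, $c)
# return ($b, $c)
result = [
    (b.text, c.text)
    for a, b, c in product(root.iter("author"), root.iter("title"), root.iter("year"))
    if a.text == "Mary" and exists_mlcas(a, b, c)
]

print(result)
```

The query still names only the node types, yet the extra condition filters out the combinations that mix nodes from unrelated publications.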