
Introduction To XML Algebra

Introduction To XML Algebra. Wan Liu Bintou Kane Advanced Database Instructor: Elka 2/11/2002 1. Outline. Reasons for XML algebra Niagara algebra AT&T Algebra. Data Model and Design. We need a clear framework to design a database


Presentation Transcript


  1. Introduction To XML Algebra • Wan Liu, Bintou Kane • Advanced Database • Instructor: Elka • 2/11/2002

  2. Outline • Reasons for XML algebra • Niagara algebra • AT&T Algebra

  3. Data Model and Design • We need a clear framework to design a database. • A data model plays the role that data structures play in programming: it is an abstract type system for the data. • Relational databases are implemented with tables; XML is a newer format, used for information integration.

  4. Why XML Algebra? • It is common to translate a query language into the algebra. • First, the algebra is used to give a semantics for the query language. • Second, the algebra is used to support query optimization.

  5. XML Algebra History • Lore Algebra (August 1999): Stanford University • IBM Algebra (September 1999): Oracle, IBM, Microsoft Corp • YAT Algebra (May 2000) • AT&T Algebra (June 2000): AT&T, Bell Labs • Niagara Algebra (2001): University of Wisconsin-Madison

  6. NIAGARA • Title: Following the Paths of XML Data: An Algebraic Framework for XML Query Evaluation • By: Leonidas Galanis, Efstratios Viglas, David J. DeWitt, Jeffrey F. Naughton, and David Maier.

  7. Outline • Concepts of Niagara Algebra • Operations • Optimization

  8. Goals of Niagara Algebra • Be independent of schema information • Query on both structure and content • Generate simple, flexible, yet powerful algebraic expressions • Allow re-use of traditional optimization techniques

  9. Example: XML Source Documents

     Invoice.xml

         <Invoice_Document>
           <invoice No="1">
             <account_number>2</account_number>
             <carrier>AT&T</carrier>
             <total>$0.25</total>
           </invoice>
           <invoice>
             <account_number>1</account_number>
             <carrier>Sprint</carrier>
             <total>$1.20</total>
           </invoice>
           <invoice>
             <account_number>1</account_number>
             <carrier>AT&T</carrier>
             <total>$0.75</total>
           </invoice>
         </Invoice_Document>

     Customer.xml

         <Customer_Document>
           <customer>
             <account>1</account>
             <name>Tom</name>
           </customer>
           <customer>
             <account>2</account>
             <name>George</name>
           </customer>
         </Customer_Document>

  10. XML Data Model and Tree Graph • Example: Invoice_Document

         <Invoice_Document>
           <invoice>
             <number>2</number>
             <carrier>Sprint</carrier>
             <total>$0.25</total>
           </invoice>
           <invoice>
             <number>1</number>
             <carrier>Sprint</carrier>
             <total>$1.20</total>
           </invoice>
         </Invoice_Document>

     [Figure: ordered tree graph of Invoice_Document; each invoice node has number, carrier, and total children.] • Ordered tree graph, semi-structured data

  11. XML Data Model [GVDNM01] • Collection of bags of vertices. • Vertices in a bag have no order. • Example: the bag [Root"invoice.xml", invoice, invoice.account_number], where invoice holds <invoice> invoice-element-content </invoice> and invoice.account_number holds <account_number> element-content </account_number>.

  12. Data Model • Bag elements are reachable by path expressions. • A path expression consists of two parts: • An entry point • A relative forward part • Example: account_number:invoice (entry point invoice, relative forward part account_number)

  13. Operators • Source (S), Follow, Select, Join, Rename, Expose, Vertex, Group, Union, Intersection, Difference (-), Cartesian Product.

  14. Source Operator S • Input: a list of documents • Output: a collection of singleton bags • Examples: • S (*): all known XML documents • S (invoice*.xml): all XML documents whose filename matches "invoice*.xml" • S (*, schema.dtd): all known XML documents that conform to schema.dtd

  15. Follow operator • Input: a path expression in entry point notation • Functionality: extracts the vertices reachable by the path expression • Output: a new bag that consists of the extracted vertex plus all the contents of the original bag (in the case of an unnesting follow)
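The unnesting follow described above can be sketched in Haskell, the notation the deck uses later for the AT&T algebra. This is a toy model under stated assumptions, not the Niagara implementation: the `Vertex` and `Bag` types, the `followU` name, and the `entry.step` labelling scheme are all hypothetical, and only one-step paths such as `carrier:invoice` are handled.

```haskell
-- Hypothetical model of vertices and bags (not taken from the Niagara paper).
data Vertex = Elem String [Vertex] | Text String
  deriving (Eq, Show)

-- A bag maps entry-point labels to vertices; its order carries no meaning.
type Bag = [(String, Vertex)]

-- Child elements of a vertex that carry the given tag.
childrenNamed :: String -> Vertex -> [Vertex]
childrenNamed t (Elem _ cs) = [c | c@(Elem t' _) <- cs, t' == t]
childrenNamed _ (Text _)    = []

-- Unnesting follow for a one-step path "step:entry": every reachable vertex
-- yields one output bag holding that vertex plus the original bag contents.
followU :: String -> String -> [Bag] -> [Bag]
followU step entry bags =
  [ (entry ++ "." ++ step, v) : bag
  | bag <- bags
  , Just e <- [lookup entry bag]
  , v <- childrenNamed step e
  ]

-- Two invoices; the second has no carrier element.
input :: [Bag]
input =
  [ [("invoice", Elem "invoice" [Elem "carrier" [Text "AT&T"], Elem "total" [Text "$0.25"]])]
  , [("invoice", Elem "invoice" [Elem "total" [Text "$0.75"]])]
  ]
```

Evaluating `followU "carrier" "invoice" input` in GHCi returns a single bag labelled `invoice.carrier` and `invoice`; the invoice without a carrier produces no output bag.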

  16. Follow operator (Example*) • Input: {[Rootinvoice.xml, invoice]}, where invoice holds <invoice> invoice-element-content </invoice> • Follow (carrier:invoice) • Output: {[Rootinvoice.xml, invoice, invoice.carrier]}, where invoice.carrier holds <carrier> carrier-element-content </carrier> • *Unnesting follow

  17. Select operator • Input: a set of bags • Functionality: filters the bags of a collection using a predicate • Output: the set of bags that satisfy the predicate • Predicate: logical operators (∧, ∨, ¬) or simple qualifications (<, >, ≤, ≥, =, ≠)

  18. Select operator (Example) • Input: {[Rootinvoice.xml, invoice], [Rootinvoice.xml, invoice], …}, where each invoice holds <invoice> invoice-element-content </invoice> • Select (invoice.carrier = Sprint) • Output: {[Rootinvoice.xml, invoice], …}

  19. Join operator • Input: two collections of bags • Functionality: joins the two collections based on a predicate • Output: the concatenation of the pairs of bags that satisfy the predicate

  20. Join operator (Example) • Inputs: {[Rootinvoice.xml, invoice]} and {[Rootcustomer.xml, customer]}, where invoice holds <invoice> invoice-element-content </invoice> and customer holds <customer> customer-element-content </customer> • Join (account_number:invoice = number:customer) • Output: {[Rootinvoice.xml, invoice, Rootcustomer.xml, customer]}

  21. Expose operator • Input: a list of path expressions naming the vertices to be exposed • Output: a set of bags that contain the vertices in the parameter list, in the same order

  22. Expose operator (Example) • Input: {[Rootinvoice.xml, invoice, invoice.carrier, invoice.bill_period]} • Expose (bill_period, carrier) • Output: {[Rootinvoice.xml, invoice.bill_period, invoice.carrier]}

  23. Vertex operator • Creates the actual XML vertex that will encompass everything created by an expose operator • Example: Vertex (Customer_invoice)[Expose (Vertex (account)[invoice.account_number], Vertex (inv_total)[invoice.total])]

  24. Other operators • Group: used for arbitrary grouping of elements based on their values • Aggregate functions (e.g., average) can be used with the group operator • Rename: changes the entry point annotation of the elements of a bag • Example: Rename (invoice.bill_period, date)

  25. Example: XML Source Documents

     Invoice.xml

         <Invoice_Document>
           <invoice>
             <account_number>2</account_number>
             <carrier>AT&T</carrier>
             <total>$0.25</total>
           </invoice>
           <invoice>
             <account_number>1</account_number>
             <carrier>Sprint</carrier>
             <total>$1.20</total>
           </invoice>
           <invoice>
             <account_number>1</account_number>
             <total>$0.75</total>
           </invoice>
           <auditor>maria</auditor>
         </Invoice_Document>

     Customer.xml

         <Customer_Document>
           <customer>
             <account>1</account>
             <name>Tom</name>
           </customer>
           <customer>
             <account>2</account>
             <name>George</name>
           </customer>
         </Customer_Document>

  26. XQuery Example • List the account number, customer name, and invoice total for all invoices that have carrier = "Sprint".

         FOR $i in (invoices.xml)//invoice,
             $c in (customers.xml)//customer
         WHERE $i/carrier = "Sprint" and $i/account_number = $c/account
         RETURN <Sprint_invoices> $i/account_number, $c/name, $i/total </Sprint_invoices>

  27. Example: XQuery output

         <Sprint_invoices>
           <account_number>1</account_number>
           <name>Tom</name>
           <total>$1.20</total>
         </Sprint_invoices>

  28. Algebra Tree Execution (read bottom-up) • Source (Invoices.xml), then Follow (*.invoice): invoice (1), invoice (2), invoice (3) • Select (carrier = "Sprint"): invoice (2) • Source (customers.xml), then Follow (*.customer): customer (1), customer (2) • Join (*.invoice.account_number = *.customer.account): invoice (2), customer (1) • Expose (*.account_number, *.name, *.total): account_number, name, total
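The execution tree above can be traced with a small Haskell sketch. Everything here is an illustrative assumption rather than Niagara's actual code: the `Vertex` and `Bag` types, the operator functions, the `reach` helper, and the `entry.step` labels are hypothetical, and the documents are the slide 25 data with whitespace trimmed.

```haskell
data Vertex = Elem String [Vertex] | Text String
  deriving (Eq, Show)

-- A bag maps entry-point labels to vertices (hypothetical representation).
type Bag = [(String, Vertex)]

childrenNamed :: String -> Vertex -> [Vertex]
childrenNamed t (Elem _ cs) = [c | c@(Elem t' _) <- cs, t' == t]
childrenNamed _ (Text _)    = []

textOf :: Vertex -> String
textOf (Text s)    = s
textOf (Elem _ cs) = concat [s | Text s <- cs]

-- Vertices reached from the vertex labelled `entry` by one child step.
reach :: String -> String -> Bag -> [Vertex]
reach entry step bag = [c | Just v <- [lookup entry bag], c <- childrenNamed step v]

source :: String -> Vertex -> [Bag]
source label root = [[(label, root)]]

followU :: String -> String -> [Bag] -> [Bag]
followU step entry bags =
  [(entry ++ "." ++ step, v) : bag | bag <- bags, v <- reach entry step bag]

select :: (Bag -> Bool) -> [Bag] -> [Bag]
select = filter

join :: (Bag -> Bag -> Bool) -> [Bag] -> [Bag] -> [Bag]
join p ls rs = [l ++ r | l <- ls, r <- rs, p l r]

-- Expose: for each bag, the requested vertices in parameter order.
expose :: [(String, String)] -> [Bag] -> [[Vertex]]
expose paths bags = [concat [reach e s bag | (e, s) <- paths] | bag <- bags]

leaf :: String -> String -> Vertex
leaf t s = Elem t [Text s]

invoiceDoc, customerDoc :: Vertex
invoiceDoc = Elem "Invoice_Document"
  [ Elem "invoice" [leaf "account_number" "2", leaf "carrier" "AT&T", leaf "total" "$0.25"]
  , Elem "invoice" [leaf "account_number" "1", leaf "carrier" "Sprint", leaf "total" "$1.20"]
  , Elem "invoice" [leaf "account_number" "1", leaf "total" "$0.75"]
  ]
customerDoc = Elem "Customer_Document"
  [ Elem "customer" [leaf "account" "1", leaf "name" "Tom"]
  , Elem "customer" [leaf "account" "2", leaf "name" "George"]
  ]

-- Source, Follow, Select, Join, Expose, in the order of the tree.
sprintInvoices :: [[Vertex]]
sprintInvoices =
  expose [(inv, "account_number"), (cust, "name"), (inv, "total")] joined
  where
    inv    = "invoices.invoice"
    cust   = "customers.customer"
    texts e s = map textOf . reach e s
    sprint = select (\b -> texts inv "carrier" b == ["Sprint"])
                    (followU "invoice" "invoices" (source "invoices" invoiceDoc))
    custs  = followU "customer" "customers" (source "customers" customerDoc)
    joined = join (\l r -> texts inv "account_number" l == texts cust "account" r) sprint custs
```

In GHCi, `map (map textOf) sprintInvoices` gives `[["1","Tom","$1.20"]]`, which agrees with the XQuery output on slide 27. Placing the selection before the join means only one invoice bag reaches the join.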

  29. Optimization with Niagara • The optimizer is based on the Niagara algebra • Use the operations more efficiently • Produce simpler expressions by combining operations

  30. Language Convention • A and B are path expressions • A < B: path expression A is a prefix of B • AnB: common prefix of paths A and B • AńB: greatest common prefix of paths A and B • ┴: null path expression

  31. Use of Rule 8.5 • Taking advantage of rule 8.5 allows optimization based on path selectivity when applying the unnesting follow operation Φμ

  32. Interchangeability of Follow operations • Φμ(A)[Φμ(B)] = Φμ(B)[Φμ(A)] • True when there exists C such that C < A and C < B, with C = AńB • Or when AnB = ┴
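The interchange rule can be checked on a toy model. The sketch below is an assumption-laden illustration, not the Niagara optimizer: the `Vertex` and `Bag` types, `followU`, and `normalize` are hypothetical names, and the data mimics slide 25, where one invoice has no carrier. It applies the two unnesting follows in both orders and compares the results up to ordering, since bags and collections are unordered.

```haskell
import Data.List (sort)

data Vertex = Elem String [Vertex] | Text String
  deriving (Eq, Ord, Show)

type Bag = [(String, Vertex)]

childrenNamed :: String -> Vertex -> [Vertex]
childrenNamed t (Elem _ cs) = [c | c@(Elem t' _) <- cs, t' == t]
childrenNamed _ (Text _)    = []

-- Unnesting follow for a one-step path "step:entry".
followU :: String -> String -> [Bag] -> [Bag]
followU step entry bags =
  [ (entry ++ "." ++ step, v) : bag
  | bag <- bags
  , Just e <- [lookup entry bag]
  , v <- childrenNamed step e
  ]

-- An invoice always has acc_Num; the carrier is optional.
invoice :: String -> Maybe String -> Vertex
invoice acc carrier =
  Elem "invoice" (Elem "acc_Num" [Text acc] : maybe [] (\c -> [Elem "carrier" [Text c]]) carrier)

invoices :: [Bag]
invoices =
  [ [("invoice", invoice "2" (Just "AT&T"))]
  , [("invoice", invoice "1" (Just "Sprint"))]
  , [("invoice", invoice "1" Nothing)]
  ]

-- Sort inside each bag and across the collection before comparing.
normalize :: [Bag] -> [Bag]
normalize = sort . map sort

carrierFirst, accNumFirst :: [Bag]
carrierFirst = followU "acc_Num" "invoice" (followU "carrier" "invoice" invoices)
accNumFirst  = followU "carrier" "invoice" (followU "acc_Num" "invoice" invoices)
```

Here `normalize carrierFirst == normalize accNumFirst` holds, so the two orders agree. The intermediate results differ in size, though: following `carrier` first leaves 2 bags for the second follow, while following `acc_Num` first leaves 3. That difference is the path selectivity the optimizer can exploit.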

  33. Application of 8.5 With Invoice • Φμ(acc_Num:invoice)[Φμ(carrier:invoice)] (*) • ?= Φμ(carrier:invoice)[Φμ(acc_Num:invoice)] (**) • Both paths share the common prefix invoice • Case: AńB = invoice

  34. Benefit of Rule Application • Note: if acc_Num is required for each invoice element and carrier is not required for an invoice element • Then using (*) Φμ(acc_Num:invoice)[Φμ(carrier:invoice)] makes more sense than (**) • Why?

  35. Reduction of Input Size • Reduction of input size on the first sub-operation Φμ(carrier:invoice) • Should we, or can we, apply rule 8.5 to the expression below? • Φμ(acc_Num:invoice)[Φμ(acc_Num:Customer)] • Why?

  36. acc_Num:invoice and acc_Num:Customer are totally different paths • The case is AnB = ┴ • So yes, the rule applies

  37. Rules 8.7, 8.9, 8.11 • Interesting: they help identify when and where to use selection to decrease the size of the input to a subsequent operation • Example: in the algebra tree on slide 28, the selection is applied before the join.

  38. A useful addition would be • A procedure that determines automatically when a rule can be applied in a given case, and then applies it.

  39. AT&T Algebra

  40. AT&T Algebra Introduction • The algebra is derived from the nested relational algebra. • The AT&T algebra makes heavy use of list comprehensions, a standard notation in the functional programming community. • The AT&T algebra uses the functional programming language Haskell as a notation for presenting the algebra.

  41. AT&T data model • The data model merges attribute and element nodes, and eliminates comments. • Basic type: Node, built with

         text :: String -> Node
         elem :: Tag -> [Node] -> Node
         ref  :: Node -> Node

     • Example, as XML:

         <bib>
           <book year="1999">
             <title>Data on the Web</title>
             <year>1999</year>
           </book>
         </bib>

     • Example, as a Node expression:

         elem "bib" [
           elem "book" [
             elem "@year" [ text "1999" ],
             elem "title" [ text "Data on the Web" ] ] ]

  42. Basic Type Declarations • To find the type of a node: isText :: Node -> Bool, isElem :: Node -> Bool, isRef :: Node -> Bool • For a text node: string :: Node -> String • For an element node: tag :: Node -> Tag and children :: Node -> [Node] • For a reference node: dereference :: Node -> Node
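One way to realise these declarations in actual Haskell is sketched below. The constructor names `TextNode`, `ElemNode`, and `RefNode` are assumptions, since the slides only give the function signatures. Haskell's `Prelude` already exports a function called `elem`, so it is hidden here to keep the slide's name.

```haskell
import Prelude hiding (elem)

type Tag = String

-- Constructor names are hypothetical; the slides name only the functions.
data Node
  = TextNode String
  | ElemNode Tag [Node]
  | RefNode Node
  deriving (Eq, Show)

text :: String -> Node
text = TextNode

elem :: Tag -> [Node] -> Node
elem = ElemNode

ref :: Node -> Node
ref = RefNode

isText, isElem, isRef :: Node -> Bool
isText (TextNode _) = True
isText _            = False
isElem (ElemNode _ _) = True
isElem _              = False
isRef (RefNode _) = True
isRef _           = False

-- The accessors are partial: each is meant for one kind of node only.
string :: Node -> String
string (TextNode s) = s
string _            = error "string: not a text node"

tag :: Node -> Tag
tag (ElemNode t _) = t
tag _              = error "tag: not an element node"

children :: Node -> [Node]
children (ElemNode _ cs) = cs
children _               = error "children: not an element node"

dereference :: Node -> Node
dereference (RefNode n) = n
dereference _           = error "dereference: not a reference node"

-- The bib example from the data model slide.
bib0 :: Node
bib0 =
  elem "bib"
    [ elem "book"
        [ elem "@year" [text "1999"]
        , elem "title" [text "Data on the Web"]
        ]
    ]
```

For example, `map tag (children bib0)` evaluates to `["book"]`. Making the accessors raise an error on the wrong kind of node is a design choice of this sketch; returning `Maybe` values would be a safer alternative.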

  43. Nested relational algebra… • In the nested relational approach, data is composed of tuples and lists. • Tuple values and tuple types are written in round brackets:

         (1999, "Data on the Web", ["Abiteboul"]) :: (Int, String, [String])

     • Values are decomposed by pattern matching:

         year :: (Int, String, [String]) -> Int
         year (x, y, l) = x

  44. Nested relational algebra… • Comprehensions: list comprehensions can be used to express the fundamental query operations: navigation, cartesian product, nesting, and joins. • Example:

         [ value x | x <- children book0, is "author" x ] ==> [ "Abiteboul" ]

     • General form: [ exp | qual1, ..., qualn ], where each qualifier is either a filter (bool-exp) or a generator (pat <- list-exp)

  45. Nested relational algebra… • Using comprehensions to write queries. • Navigation:

         follow :: Tag -> Node -> [Node]
         follow t x = [ y | y <- children x, is t y ]

     • Cartesian product:

         [ (value y, value z) | x <- follow "book" bib0,
                                y <- follow "title" x,
                                z <- follow "author" x ]
         ==> [ ("Data on the Web", "Abiteboul") ]
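The navigation and cartesian-product comprehensions can be run as written once the helper functions exist. The sketch below fills them in under assumptions: the constructors `TextNode` and `ElemNode` are hypothetical, `value` is read as "the text directly under an element", and a second author is added to `bib0` so that the product yields more than one pair.

```haskell
import Prelude hiding (elem)

type Tag = String

-- Constructor names are hypothetical; reference nodes are left out for brevity.
data Node = TextNode String | ElemNode Tag [Node]
  deriving (Eq, Show)

text :: String -> Node
text = TextNode

elem :: Tag -> [Node] -> Node
elem = ElemNode

children :: Node -> [Node]
children (ElemNode _ cs) = cs
children (TextNode _)    = []

-- True when the node is an element with the given tag.
is :: Tag -> Node -> Bool
is t (ElemNode t' _) = t == t'
is _ (TextNode _)    = False

-- Text directly under an element (an assumed reading of the slides' `value`).
value :: Node -> String
value n = concat [s | TextNode s <- children n]

follow :: Tag -> Node -> [Node]
follow t x = [y | y <- children x, is t y]

bib0 :: Node
bib0 =
  elem "bib"
    [ elem "book"
        [ elem "@year" [text "1999"]
        , elem "title" [text "Data on the Web"]
        , elem "author" [text "Abiteboul"]
        , elem "author" [text "Buneman"]
        ]
    ]

-- Every (title, author) pair within each book.
titleAuthorPairs :: [(String, String)]
titleAuthorPairs =
  [ (value y, value z)
  | x <- follow "book" bib0
  , y <- follow "title" x
  , z <- follow "author" x
  ]
```

With two authors, `titleAuthorPairs` evaluates to `[("Data on the Web","Abiteboul"),("Data on the Web","Buneman")]`: one pair per combination of title and author inside the same book.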

  46. Nested relational algebra… • Joins. • The two inputs:

         bib0 = elem "bib" [
                  elem "book" [
                    elem "@year" [ text "1999" ],
                    elem "title" [ text "Data on the Web" ] ] ]

         reviews0 = elem "reviews" [
                      elem "book" [
                        elem "title" [ text "Data on the Web" ],
                        elem "review" [ text "This is great!" ] ] ]

     • The join on title:

         [ (value y, int (value z), value w) | x <- follow "book" bib0,
                                               y <- follow "title" x,
                                               z <- follow "@year" x,
                                               u <- follow "book" reviews0,
                                               v <- follow "title" u,
                                               w <- follow "review" u,
                                               y == v ]
         ==> [ ("Data on the Web", 1999, "This is great!") ]
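A runnable version of the join is sketched below. The helper definitions are assumptions: `TextNode` and `ElemNode` are hypothetical constructor names, `value` is taken to mean the text directly under an element, and the slide's `int` is replaced by the standard `read`, which fails at run time if the year is not numeric.

```haskell
import Prelude hiding (elem)

type Tag = String

-- Constructor names are hypothetical; reference nodes are left out for brevity.
data Node = TextNode String | ElemNode Tag [Node]
  deriving (Eq, Show)

text :: String -> Node
text = TextNode

elem :: Tag -> [Node] -> Node
elem = ElemNode

children :: Node -> [Node]
children (ElemNode _ cs) = cs
children (TextNode _)    = []

-- Text directly under an element (an assumed reading of the slides' `value`).
value :: Node -> String
value n = concat [s | TextNode s <- children n]

follow :: Tag -> Node -> [Node]
follow t x = [y | y@(ElemNode t' _) <- children x, t' == t]

bib0, reviews0 :: Node
bib0 =
  elem "bib"
    [ elem "book"
        [ elem "@year" [text "1999"]
        , elem "title" [text "Data on the Web"]
        ]
    ]
reviews0 =
  elem "reviews"
    [ elem "book"
        [ elem "title" [text "Data on the Web"]
        , elem "review" [text "This is great!"]
        ]
    ]

-- Join books with reviews on equal title nodes.
titleYearReview :: [(String, Int, String)]
titleYearReview =
  [ (value y, read (value z), value w)
  | x <- follow "book" bib0
  , y <- follow "title" x
  , z <- follow "@year" x
  , u <- follow "book" reviews0
  , v <- follow "title" u
  , w <- follow "review" u
  , y == v
  ]
```

`titleYearReview` evaluates to `[("Data on the Web",1999,"This is great!")]`. The final qualifier `y == v` is the join predicate; it compares whole title nodes, which works here because `Node` derives `Eq`.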

  47. Nested relational algebra… • Regular expression matching:

         ( [ (x,y,u) | x <- item "@year",
                       y <- item "title",
                       u <- rep (item "author") ] ) :: Reg (Node,Node,[Node])

         match :: Reg a -> Node -> [a]

     • Result:

         match reg0 book0
         ==> [ ( elem "@year" [text "1999"],
                 elem "title" [text "Data on the Web"],
                 [ elem "author" [text "Abiteboul"],
                   elem "author" [text "Buneman"],
                   elem "author" [text "Suciu"] ] ) ]

  48. Nested relational algebra… • Sorting:

         sortBy :: (a -> a -> Bool) -> [a] -> [a]
         sortBy (<=) [3,1,2,1] ==> [1,1,2,3]

     • Grouping:

         groupBy :: (a -> a -> Bool) -> [a] -> [[a]]
         groupBy (==) [3,1,2,1] ==> [[2],[1,1],[3]]
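The signatures above differ from the standard library: `Data.List.sortBy` takes a comparison returning `Ordering`, and `Data.List.groupBy` only groups adjacent elements. A minimal sketch of functions with the slide's signatures follows; the implementations are assumptions (an insertion sort and a filter-based grouping), and the groups come out in order of first occurrence rather than in the order shown on the slide.

```haskell
-- Insertion sort driven by a "less than or equal" test, as in the slide's signature.
sortBy :: (a -> a -> Bool) -> [a] -> [a]
sortBy le = foldr insert []
  where
    insert x [] = [x]
    insert x (y:ys)
      | le x y    = x : y : ys
      | otherwise = y : insert x ys

-- Collects all elements related by eq into one group, not only adjacent ones.
groupBy :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy _ [] = []
groupBy eq (x:xs) = (x : same) : groupBy eq rest
  where
    same = filter (eq x) xs
    rest = filter (not . eq x) xs
```

With these definitions, `sortBy (<=) [3,1,2,1]` gives `[1,1,2,3]` and `groupBy (==) [3,1,2,1]` gives `[[3],[1,1],[2]]`, the same groups as on the slide in a different order.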

  49. Cross Comparisons of Algebra • Niagara and AT&T are standalone XML algebras • Niagara was proposed after the W3C had selected its proposed standard, and its operators work on sets of bags • The AT&T algebra was chosen as the proposed standard by the W3C: its expressions resemble a high-level query language, and the latest version of the document is referred to as "Semantics of XML Query Language XQuery"
