
Querying Streaming XML Data


sagira




Presentation Transcript


  1. Querying Streaming XML Data

  2. Layout of the presentation • Introduction • Common Problems faced • Solution proposed • Basic Building blocks of the solution • How to build up a solution to a given query • Features of the system

  3. Streaming XML • XML – standard for information exchange. • Some XML documents only available in streaming format. • Streaming is like reading data from a tape drive. • Used in Stock Market, News, Network Statistics. • Predecessor systems used to filter documents.

  4. Structure of an XPath Query • Consists of a Location path and an Output Expression (name). • Location path consists of closure axis(//), node test (book) and predicate (year>2000). • e.g. //book[year>2000]/name

  5. Features of our Approach • Efficient • Easy to understand design. • Design of BPDT is tricky

  6. Common Problems faced
     Query: /pub[year=2002]/book[price<11]/author
     <root>
       <pub>
         <book id="1">
           <price> 12.00 </price>
           <name> First </name>
           <author> A </author>
           <price type="discount"> 10.00 </price>
         </book>
         <book id="2">
           <price> 14.00 </price>
           <name> Second </name>
           <author> A </author>
           <author> B </author>
           <price type="discount"> 12.00 </price>
         </book>
         <year> 2002 </year>
       </pub>
     </root>

  7. Common Problems faced (same document and query as slide 6) • <author> A </author> in book 1: element satisfies the path.

  8. Common Problems faced (same document and query as slide 6) • <price> 12.00 </price> in book 1: failure?? • <author> A </author> in book 1: element satisfies the path.

  9. Common Problems faced (same document and query as slide 6) • <price> 12.00 </price> in book 1: failure?? • <author> A </author> in book 1: element satisfies the path. • <price type="discount"> 10.00 </price> in book 1: test passed. But year=2002?

  10. Common Problems faced (same document and query as slide 6) • <price> 12.00 </price> in book 1: failure?? • <author> A </author> in book 1: element satisfies the path. • <price type="discount"> 10.00 </price> in book 1: test passed. But year=2002? • Authors of book 2: buffer both A & B.

  11. Common Problems faced (same document and query as slide 6) • <price> 12.00 </price> in book 1: failure?? • <author> A </author> in book 1: element satisfies the path. • <price type="discount"> 10.00 </price> in book 1: test passed. But year=2002? • Authors of book 2: buffer both A & B. • <price type="discount"> 12.00 </price> in book 2: failed price<11. Remove A & B from the buffer.

  12. Common Problems faced (same document and query as slide 6) • <price> 12.00 </price> in book 1: failure?? • <author> A </author> in book 1: element satisfies the path. • <price type="discount"> 10.00 </price> in book 1: test passed. But year=2002? • Authors of book 2: buffer both A & B. • <price type="discount"> 12.00 </price> in book 2: failed price<11. Remove A & B from the buffer. • <year> 2002 </year>: test passed. Output the buffered author A of book 1.
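The buffering walk-through on slides 6 to 12 can be sketched as a hand-written SAX handler. This is only an illustration for this one query, not the PDT-based system the slides describe: the class name `PubBookAuthor` and its fields are hypothetical, the `<root>` wrapper is ignored as on the slides, nested `<pub>` elements are not handled, and results are released only at `</pub>` for simplicity.

```python
import xml.sax

DOC = """<root><pub>
<book id="1"><price>12.00</price><name>First</name><author>A</author>
<price type="discount">10.00</price></book>
<book id="2"><price>14.00</price><name>Second</name><author>A</author>
<author>B</author><price type="discount">12.00</price></book>
<year>2002</year></pub></root>"""


class PubBookAuthor(xml.sax.ContentHandler):
    """Evaluates only /pub[year=2002]/book[price<11]/author (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.path = []        # names of the currently open elements
        self.text = []        # character data of the innermost element
        self.book_buf = []    # authors of the open book; price<11 still undecided
        self.pub_buf = []     # authors whose book passed; year=2002 still undecided
        self.price_ok = False
        self.year_ok = False
        self.results = []

    def startElement(self, name, attrs):
        self.path.append(name)
        self.text = []
        if name == "book":
            self.book_buf, self.price_ok = [], False
        elif name == "pub":
            self.pub_buf, self.year_ok = [], False

    def characters(self, content):
        self.text.append(content)

    def endElement(self, name):
        value = "".join(self.text).strip()
        parent = self.path[-2] if len(self.path) > 1 else None
        if name == "author" and parent == "book":
            self.book_buf.append(value)              # candidate result: buffer it
        elif name == "price" and parent == "book" and float(value) < 11:
            self.price_ok = True                     # any one price < 11 suffices
        elif name == "year" and parent == "pub" and value == "2002":
            self.year_ok = True
        elif name == "book":
            if self.price_ok:                        # book passed: keep its authors
                self.pub_buf.extend(self.book_buf)
            self.book_buf = []                       # otherwise they are dropped
        elif name == "pub":
            if self.year_ok:                         # pub passed: output the buffer
                self.results.extend(self.pub_buf)
            self.pub_buf = []
        self.path.pop()
        self.text = []


handler = PubBookAuthor()
xml.sax.parseString(DOC.encode("utf-8"), handler)
print(handler.results)
```

Only author A of book 1 survives: book 2's authors are buffered and then discarded when neither of its prices is below 11.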

  13. Problems caused by closure axis
     Query: //pub[year=2002]//book[author]//name
     <root>
       <pub>
         <book>
           <name> X </name>
           <author> A </author>
         </book>
         <book>
           <name> Y </name>
           <pub>
             <book>
               <name> Z </name>
               <author> B </author>
             </book>
             <year> 1999 </year>
           </pub>
         </book>
         <year> 2002 </year>
       </pub>
     </root>

  14. Problems caused by closure axis (same document and query as slide 13) • Inner <pub> (<year> 1999 </year>): fails year=2002.

  15. Problems caused by closure axis (same document and query as slide 13) • Inner <pub> (<year> 1999 </year>): fails year=2002. • Outer <pub> (<year> 2002 </year>): passes year=2002.

  16. Problems caused by closure axis (document and query of slide 13, with <author> B </author> added after <name> Y </name> in the second book) • Added <author> B </author>: let's add an author. Result? • Inner <pub> (<year> 1999 </year>): fails year=2002. • Outer <pub> (<year> 2002 </year>): passes year=2002.
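As a cross-check for the closure-axis slides, the following non-streaming sketch computes the answer to //pub[year=2002]//book[author]//name on an in-memory tree, under my reading of standard XPath semantics. It is a reference evaluation, not the streaming technique of the presentation; `closure_query`, `ORIGINAL` and `WITH_AUTHOR` are hypothetical names, and whitespace inside the elements has been dropped for brevity.

```python
import xml.etree.ElementTree as ET

ORIGINAL = (
    "<root><pub>"
    "<book><name>X</name><author>A</author></book>"
    "<book><name>Y</name>"
    "<pub><book><name>Z</name><author>B</author></book><year>1999</year></pub>"
    "</book>"
    "<year>2002</year>"
    "</pub></root>"
)
# Variant of slide 16: the second book also gets an author.
WITH_AUTHOR = ORIGINAL.replace("<name>Y</name>", "<name>Y</name><author>B</author>")


def closure_query(xml_text):
    """Reference answer to //pub[year=2002]//book[author]//name, in document order."""
    root = ET.fromstring(xml_text)
    wanted = set()
    for pub in root.iter("pub"):
        years = [(y.text or "").strip() for y in pub.findall("year")]
        if "2002" not in years:
            continue                      # predicate on this pub fails
        for book in pub.iter("book"):     # any descendant book of the pub
            if book.find("author") is None:
                continue                  # predicate on this book fails
            for name in book.iter("name"):
                wanted.add(id(name))      # one name may be reached via several paths
    return [n.text.strip() for n in root.iter("name") if id(n) in wanted]


print(closure_query(ORIGINAL))
print(closure_query(WITH_AUTHOR))
```

In the modified document, name Z is reachable through two different books, which is why a streaming evaluator has to track several candidate matches at once and avoid emitting duplicates.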

  17. Handling XML Stream • Input: well-formed XML stream. • Use the SAX API to parse the XML. • Each event belongs to one of three sets, where a is the tag name, attrs its attributes, and d the depth: • Begin = {(a, attrs, d)} • End = {(/a, d)} • Text = {(a, text(), d)} • XML stream: {e1, e2, …, ei, …} | ei ∈ Begin ∪ End ∪ Text
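The event model of slide 17 can be reproduced with Python's standard `xml.sax` module. This is a minimal sketch: the class `EventCollector`, the helper `to_events`, and the convention that the outermost element has depth 1 are assumptions made for illustration.

```python
import xml.sax


class EventCollector(xml.sax.ContentHandler):
    """Turns SAX callbacks into Begin, End and Text tuples carrying a depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.open_tags = []
        self.events = []

    def startElement(self, name, attrs):
        self.depth += 1
        self.open_tags.append(name)
        self.events.append((name, dict(attrs.items()), self.depth))   # Begin

    def endElement(self, name):
        self.events.append(("/" + name, self.depth))                  # End
        self.open_tags.pop()
        self.depth -= 1

    def characters(self, content):
        # Whitespace-only text is skipped; a parser may deliver long text in chunks.
        if content.strip():
            self.events.append((self.open_tags[-1], content.strip(), self.depth))  # Text


def to_events(xml_text):
    handler = EventCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.events


for event in to_events('<book id="1"><name>First</name></book>'):
    print(event)
```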

  18. Grammar for XPath Queries • Q → N+ [/O] • N → [/ | //] tag [F] • F → [FO [OP constant]] • FO → @attribute | tag [@attribute] | text() • O → @attribute | text() • OP → > | ≥ | = | < | ≤ | ≠ | contains • XPath query of the form N1N2…Nn/O • Cannot handle reverse axes or positional functions.
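To make the shape N1N2…Nn/O concrete, here is a small sketch that splits a query into its location steps and optional output expression. It is a simplified, regex-based reading of the grammar (predicates are kept as raw strings and not parsed further); `parse_query`, `STEP` and `OUTPUT` are hypothetical names.

```python
import re

# One location step N: axis (/ or //), a tag, and an optional [predicate].
STEP = re.compile(r"(//|/)([A-Za-z_][\w.-]*)(?:\[([^\]]*)\])?")
# Optional trailing output expression O: /@attribute or /text().
OUTPUT = re.compile(r"/(@[A-Za-z_][\w.-]*|text\(\))$")


def parse_query(query):
    """Return (steps, output) for a query of the form N1 N2 ... Nn [/O]."""
    output = None
    m = OUTPUT.search(query)
    if m:
        output = m.group(1)
        query = query[:m.start()]
    steps, pos = [], 0
    while pos < len(query):
        m = STEP.match(query, pos)
        if not m:
            raise ValueError(f"cannot parse at offset {pos}: {query[pos:]!r}")
        axis, tag, predicate = m.groups()
        steps.append({"axis": axis, "tag": tag, "predicate": predicate})
        pos = m.end()
    return steps, output


steps, output = parse_query("//pub[year>2000]//book[author]//name/text()")
for step in steps:
    print(step)
print("output:", output)
```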

  19. Solution to Query • Query: /pub[year=2002]/book[price<11]/author • [Figure: PDA and PDT for the query]

  20. Basic PushDown Transducer (BPDT) • Similar to PushDown Automata • Actions defined on Transition Arcs • Finite set of states • A Start state • A set of final states • Set of input symbols • Set of Stack symbols

  21. Building a BPDT Query: /pub[year>2000]/book[author]/name/text() Consider location step: /book[author] • Book – Author: Buffer for future: Begin event of Author. • Book – Author: Remove from Buffer: End event of Book. • Book – Author: Output result if predicates true: Begin event of Author.

  22. Basic Building Blocks XPath Expression: /tag[child]
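Since the diagram for this building block did not survive in the transcript, the following sketch shows one plausible way a transducer for /tag[child] could behave. The state names NA and TRUE follow the wording used for the HPDT later in the deck; the START state, the function `tag_with_child`, the `(kind, value)` event pairs, and the choice to match `tag` at any depth are assumptions for illustration only.

```python
def tag_with_child(events, tag, child):
    """Emit the events of every `tag` element that has a `child` child element.

    events: iterable of (kind, value) pairs with kind in {"begin", "end", "text"}.
    """
    state, depth, buffer, out = "START", 0, [], []
    for kind, value in events:
        if state == "START":
            if kind == "begin" and value == tag:
                state, depth, buffer = "NA", 1, [(kind, value)]
            continue
        if kind == "begin":
            depth += 1                              # nesting relative to `tag`
        if state == "NA":
            buffer.append((kind, value))            # predicate undecided: buffer
            if kind == "begin" and value == child and depth == 2:
                out.extend(buffer)                  # predicate true: flush buffer
                buffer, state = [], "TRUE"
        else:
            out.append((kind, value))               # TRUE: output directly
        if kind == "end":
            depth -= 1
            if depth == 0:                          # end of the `tag` element
                buffer, state = [], "START"         # clear anything still buffered
    return out


EVENTS = [
    ("begin", "book"), ("begin", "name"), ("text", "X"), ("end", "name"),
    ("begin", "author"), ("text", "A"), ("end", "author"), ("end", "book"),
    ("begin", "book"), ("begin", "name"), ("text", "Y"), ("end", "name"),
    ("end", "book"),
]
print(tag_with_child(EVENTS, "book", "author"))
```

The first book is emitted in full once its author begins; the second book never leaves the NA state, so its buffered events are cleared at its end tag.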

  23. Buffer Operations needed • Enqueue(x): Add x to the end of the queue. • Clear(): Removes all items from the queue. • Flush(): Outputs all items in the queue in FIFO order. • Upload(): Moves all items to the end of the queue of a parent BPDT. • No Dequeue operation needed.
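The four buffer operations can be sketched as a small queue class. The class name `BpdtBuffer`, the `parent` link, and passing an `output` list to `flush` are assumptions made for illustration; the slides only name the operations.

```python
from collections import deque


class BpdtBuffer:
    """FIFO buffer with the four operations listed on the slide (no dequeue)."""

    def __init__(self, parent=None):
        self.items = deque()
        self.parent = parent          # buffer of the parent BPDT, if any

    def enqueue(self, x):
        self.items.append(x)          # add x to the end of the queue

    def clear(self):
        self.items.clear()            # drop all items without outputting them

    def flush(self, output):
        output.extend(self.items)     # output all items in FIFO order
        self.items.clear()

    def upload(self):
        if self.parent is None:
            raise ValueError("no parent buffer to upload to")
        self.parent.items.extend(self.items)   # append to the parent's queue
        self.items.clear()


parent = BpdtBuffer()
child = BpdtBuffer(parent)
parent.enqueue("first")
child.enqueue("A")
child.enqueue("B")
child.upload()                        # A and B now wait in the parent's queue
out = []
parent.flush(out)
print(out)
```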

  24. Basic Building Blocks XPath Expression: /tag[@attr=val]

  25. Basic Building Blocks XPath Expression: /tag[text()=val]

  26. Basic Building Blocks XPath Expression: /tag[child@attr=val]

  27. Basic Building Blocks XPath Expression: /tag[child=val]

  28. A sample BPDT Query: /pub[year>2000]

  29. Building a solution HPDT for Query: //pub[year>2000]//book[author]//name/text()

  30. HPDT Structure • Each BPDT in the HPDT has: • Position: BPDT position (l, k), where l = depth of the BPDT in the HPDT and k = sequence # from right to left • BPDT position (i-1, k) has its right child BPDT at position (i, 2k), connected to the NA state • BPDT position (i-1, k) has its left child BPDT at position (i, 2k+1), connected to the True state • BPDT position (i, 2^i – 1) means the predicates in all higher-level BPDTs evaluate to true • Buffer: potential results • Stack: stack of elements (SAX events) • Depth vector
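The position numbering can be checked with a few lines of arithmetic. This sketch assumes the root BPDT sits at position (0, 0), which the slide does not state; the function names are hypothetical.

```python
def children(level, k):
    """Positions of the child BPDTs of the BPDT at (level, k)."""
    return {
        "right_NA": (level + 1, 2 * k),         # reached from the NA state
        "left_TRUE": (level + 1, 2 * k + 1),    # reached from the True state
    }


def predicate_history(level, k):
    """Read k as `level` bits, root first: True means the True-state branch."""
    return [bool((k >> (level - 1 - j)) & 1) for j in range(level)]


def all_predicates_true(level, k):
    """True exactly for the position (level, 2**level - 1)."""
    return k == 2 ** level - 1


print(children(1, 1))
print(predicate_history(2, 2), all_predicates_true(2, 2))
print(predicate_history(2, 3), all_predicates_true(2, 3))
```

Under this numbering, each step to a left child appends a 1 bit to k, so a position whose k is all ones was reached only through True states.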

  31. Example Query (same document and query as slide 13) • 3 paths from $1 to $14

  32. System Features

  33. Reference • Feng Peng and Sudarshan S. Chawathe. XPath Queries on Streaming Data. In SIGMOD 2003.

  34. Thank You ???
