sagira
Presentation Transcript

  1. Querying Streaming XML Data

  2. Layout of the presentation • Introduction • Common Problems faced • Solution proposed • Basic Building blocks of the solution • How to build up a solution to a given query • Features of the system

  3. Streaming XML • XML – standard for information exchange. • Some XML documents are available only in streaming format. • Streaming is like reading data from a tape drive. • Used for stock market data, news, and network statistics. • Predecessor systems were used to filter documents.

  4. Structure of an XPath Query • Consists of a location path and an output expression (name). • The location path consists of a closure axis (//), a node test (book), and a predicate (year>2000). • e.g. //book[year>2000]/name

  5. Features of our Approach • Efficient • Easy to understand design. • Design of BPDT is tricky

  6. Common Problems faced
     <root>
       <pub>
         <book id="1">
           <price> 12.00 </price>
           <name> First </name>
           <author> A </author>
           <price type="discount"> 10.00 </price>
         </book>
         <book id="2">
           <price> 14.00 </price>
           <name> Second </name>
           <author> A </author>
           <author> B </author>
           <price type="discount"> 12.00 </price>
         </book>
         <year> 2002 </year>
       </pub>
     </root>
     Query: /pub[year=2002]/book[price<11]/author

  7. Common Problems faced (document and query as on slide 6) • Element satisfies the path: an author element matches /pub/book/author.

  8. Common Problems faced • Failure?? A price of 12.00 does not satisfy price<11, yet the same book has a later price that does.

  9. Common Problems faced • Test passed: the discount price of book 1 (10.00) satisfies price<11. • But year=2002? The year element arrives only at the end of pub, so author A of book 1 has to wait.

  10. Common Problems faced • Buffer both A & B: the authors of book 2 are buffered while its predicates are undecided.

  11. Common Problems faced • Failed price<11: neither price of book 2 (14.00, 12.00) is below 11. • Remove its authors A and B from the buffer.

  12. Common Problems faced • Test passed: <year> 2002 </year> satisfies year=2002. • Output the buffered author A of book 1.
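The buffering steps on slides 6 to 12 can be sketched in Python with the standard `xml.sax` parser. This is a hand-coded evaluator for this one query only, not the transducer construction the slides go on to describe. The class name `AuthorQuery`, the two buffers, and the decision to treat `<root>` as a wrapper above `/pub` are assumptions made for illustration.

```python
import xml.sax

DOC = b"""<root><pub>
<book id="1"><price> 12.00 </price><name> First </name>
<author> A </author><price type="discount"> 10.00 </price></book>
<book id="2"><price> 14.00 </price><name> Second </name>
<author> A </author><author> B </author>
<price type="discount"> 12.00 </price></book>
<year> 2002 </year></pub></root>"""


class AuthorQuery(xml.sax.ContentHandler):
    """Hand-coded evaluator for /pub[year=2002]/book[price<11]/author."""

    def __init__(self):
        super().__init__()
        self.path = []        # names of the currently open elements
        self.text = []        # text chunks of the innermost open element
        self.book_buf = []    # authors waiting for price<11 on their book
        self.pub_buf = []     # authors whose book passed, waiting for year=2002
        self.book_ok = False
        self.pub_ok = False
        self.results = []

    def startElement(self, name, attrs):
        self.path.append(name)
        self.text = []

    def characters(self, content):
        self.text.append(content)

    def _release(self, authors):
        # Authors whose book passed go to the output if the year test has
        # already passed; otherwise they wait in the pub-level buffer.
        (self.results if self.pub_ok else self.pub_buf).extend(authors)

    def endElement(self, name):
        rel = self.path[1:]   # path below the <root> wrapper (assumption)
        value = "".join(self.text).strip()
        if rel == ["pub", "book", "author"]:
            if self.book_ok:
                self._release([value])
            else:
                self.book_buf.append(value)
        elif rel == ["pub", "book", "price"]:
            if not self.book_ok and float(value) < 11:
                self.book_ok = True
                self._release(self.book_buf)
                self.book_buf = []
        elif rel == ["pub", "book"]:
            self.book_buf = []    # drop authors still waiting for price<11
            self.book_ok = False
        elif rel == ["pub", "year"]:
            if not self.pub_ok and float(value) == 2002:
                self.pub_ok = True
                self.results.extend(self.pub_buf)
                self.pub_buf = []
        elif rel == ["pub"]:
            self.pub_buf = []     # drop authors still waiting for year=2002
            self.pub_ok = False
        self.path.pop()
        self.text = []


def run_query(doc):
    handler = AuthorQuery()
    xml.sax.parseString(doc, handler)
    return handler.results


print(run_query(DOC))  # ['A']
```

Author A of book 1 moves from the book-level buffer to the pub-level buffer when the 10.00 price arrives, and is output only when the year element is read. Authors A and B of book 2 are discarded when that book ends.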

  13. Problems caused by closure axis
     <root>
       <pub>
         <book>
           <name> X </name>
           <author> A </author>
         </book>
         <book>
           <name> Y </name>
           <pub>
             <book>
               <name> Z </name>
               <author> B </author>
             </book>
             <year> 1999 </year>
           </pub>
         </book>
         <year> 2002 </year>
       </pub>
     </root>
     Query: //pub[year=2002]//book[author]//name

  14. Problems caused by closure axis (document and query as on slide 13) • Fails year=2002: the inner pub has year 1999.

  15. Problems caused by closure axis • Passes year=2002: the outer pub has year 2002.

  16. Problems caused by closure axis • Let's add author. Result? (<author> B </author> is added to the book named Y, directly after its name element.) • Fails year=2002 (inner pub) • Passes year=2002 (outer pub)
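To see why the closure axis is harder, it helps to count how many (pub, book) ancestor pairs let each name element match. The sketch below is a non-streaming reference check built on `xml.etree.ElementTree`, not the streaming method of the slides. The function name `matchings` and the hard-coded predicates are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

DOC = """<root>
  <pub>
    <book><name> X </name><author> A </author></book>
    <book><name> Y </name>
      <pub>
        <book><name> Z </name><author> B </author></book>
        <year> 1999 </year>
      </pub>
    </book>
    <year> 2002 </year>
  </pub>
</root>"""

# Variant from slide 16: the book named Y gets its own author child.
DOC_WITH_AUTHOR = DOC.replace("<name> Y </name>",
                              "<name> Y </name><author> B </author>")


def matchings(doc):
    """Map each name's text to the number of (pub, book) ancestor pairs
    that satisfy //pub[year=2002]//book[author]//name."""
    root = ET.fromstring(doc)
    parent = {child: p for p in root.iter() for child in p}

    def ancestors(node):
        while node in parent:
            node = parent[node]
            yield node

    def pub_ok(pub):
        return any(float(y.text) == 2002 for y in pub.findall("year"))

    def book_ok(book):
        return book.find("author") is not None

    counts = {}
    for name in root.iter("name"):
        n = 0
        for book in ancestors(name):
            if book.tag != "book" or not book_ok(book):
                continue
            for pub in ancestors(book):
                if pub.tag == "pub" and pub_ok(pub):
                    n += 1
        counts[name.text.strip()] = n
    return counts


print(matchings(DOC))              # {'X': 1, 'Y': 0, 'Z': 1}
print(matchings(DOC_WITH_AUTHOR))  # {'X': 1, 'Y': 1, 'Z': 2}
```

In the modified document, Z matches through two different book ancestors, so a streaming evaluator has to keep track of several pending matchings for the same element and still output it only once.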

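  17. Handling XML Stream • Input: well-formed XML stream. • Use the SAX API to parse XML. • Each event belongs to one of three sets, where a is the tag name, attrs the attribute list, and d the depth of the element: • Begin = {(a, attrs, d)} • End = {(/a, d)} • Text = {(a, text(), d)} • XML stream: {e1, e2, …, ei, …} | ei ∈ Begin ∪ End ∪ Text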
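The event model above maps directly onto a SAX handler. The following is a minimal sketch using Python's `xml.sax`. The class name `EventCollector`, the tuple layout with a leading kind label, and the choice to give the root element depth 1 are assumptions, not details taken from the slides.

```python
import xml.sax


class EventCollector(xml.sax.ContentHandler):
    """Turns SAX callbacks into Begin, Text, and End event tuples."""

    def __init__(self):
        super().__init__()
        self.stack = []    # names of the currently open elements
        self.events = []

    def startElement(self, name, attrs):
        self.stack.append(name)
        # Begin event: (a, attrs, d)
        self.events.append(("begin", name, dict(attrs.items()), len(self.stack)))

    def characters(self, content):
        # A parser may deliver one text node in several chunks; this sketch
        # emits one Text event per non-blank chunk.
        if content.strip():
            self.events.append(("text", self.stack[-1], content.strip(),
                                len(self.stack)))

    def endElement(self, name):
        # End event: (/a, d)
        self.events.append(("end", "/" + name, len(self.stack)))
        self.stack.pop()


def to_events(doc):
    handler = EventCollector()
    xml.sax.parseString(doc, handler)
    return handler.events


for event in to_events(b'<book id="1"><name>First</name></book>'):
    print(event)
```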

  18. Grammar for XPath Queries
     • Q → N+ [/O]
     • N → [/ | //] tag [F]
     • F → [FO [OP constant]]
     • FO → @attribute | tag [@attribute] | text()
     • O → @attribute | text()
     • OP → > | ≥ | = | < | ≤ | ≠ | contains
     • XPath query of the form N1 N2 … Nn /O
     • Cannot handle reverse axes or positional functions.
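A query in this grammar can be split into its location steps and optional output expression with two regular expressions. This is only a sketch: the names `Step`, `STEP`, `OUTPUT`, and `parse_query` are hypothetical, the predicate is kept as an unparsed string, and predicates containing `]` are not handled.

```python
import re
from typing import NamedTuple, Optional


class Step(NamedTuple):
    axis: str                  # "/" (child) or "//" (closure)
    tag: str                   # node test
    predicate: Optional[str]   # text between [ and ], if any


# N → [/ | //] tag [F]
STEP = re.compile(r"(//|/)([A-Za-z_][\w.-]*)(?:\[([^\]]*)\])?")
# O → @attribute | text(), only at the very end of the query
OUTPUT = re.compile(r"/(@[A-Za-z_][\w.-]*|text\(\))$")


def parse_query(query):
    """Return (steps, output) for a query of the form N1 N2 ... Nn [/O]."""
    output = None
    m = OUTPUT.search(query)
    if m:
        output = m.group(1)
        query = query[:m.start()]
    steps, pos = [], 0
    while pos < len(query):
        m = STEP.match(query, pos)
        if not m:
            raise ValueError(f"cannot parse location step at offset {pos}")
        steps.append(Step(*m.groups()))
        pos = m.end()
    if not steps:
        raise ValueError("a query needs at least one location step")
    return steps, output


print(parse_query("//book[year>2000]/name"))
print(parse_query("/pub[year>2000]/book[author]/name/text()"))
```

For `//book[year>2000]/name` there is no explicit O part, so the parser reports two steps and no output expression; the element matched by the last step is then what the query returns.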

  19. Solution to Query • Query: /pub[year=2002]/book[price<11]/author • [Figure: PDA and PDT]

  20. Basic PushDown Transducer (BPDT) • Similar to PushDown Automata • Actions defined on Transition Arcs • Finite set of states • A Start state • A set of final states • Set of input symbols • Set of Stack symbols

  21. Building a BPDT • Query: /pub[year>2000]/book[author]/name/text() • Consider location step: /book[author] • Book – Author: Buffer for future: Begin event of Author. • Book – Author: Remove from Buffer: End event of Book. • Book – Author: Output result if predicates true: Begin event of Author.

  22. Basic Building Blocks • XPath Expression: /tag[child]

  23. Buffer Operations needed • Enqueue(x): Add x to the end of the queue. • Clear(): Removes all items from the queue. • Flush(): Outputs all items in the queue in FIFO order. • Upload(): Moves all items to the end of the queue of a parent BPDT. • No Dequeue operation needed.
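The four buffer operations fit in a small queue class. This is a minimal sketch: the class name `BpdtBuffer`, the `parent` link, and the `out` list that stands in for the output stream are assumptions, as is the choice to empty the queue after a flush.

```python
from collections import deque


class BpdtBuffer:
    """FIFO buffer of potential results held by one BPDT."""

    def __init__(self, parent=None):
        self.items = deque()
        self.parent = parent   # buffer of the parent BPDT, if any

    def enqueue(self, x):
        """Add x to the end of the queue."""
        self.items.append(x)

    def clear(self):
        """Remove all items without outputting them."""
        self.items.clear()

    def flush(self, out):
        """Output all items in FIFO order, then empty the queue (assumption)."""
        out.extend(self.items)
        self.items.clear()

    def upload(self):
        """Move all items to the end of the parent BPDT's queue."""
        if self.parent is None:
            raise ValueError("no parent buffer to upload to")
        self.parent.items.extend(self.items)
        self.items.clear()


parent = BpdtBuffer()
child = BpdtBuffer(parent)
child.enqueue("A")
child.enqueue("B")
child.upload()          # the child's predicate passed; the parent's is pending
out = []
parent.flush(out)       # the parent's predicate passed as well
print(out)              # ['A', 'B']
```

No dequeue is needed because items never leave one at a time: a whole buffer is output, discarded, or handed to the parent.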

  24. Basic Building Blocks • XPath Expression: /tag[@attr=val]

  25. Basic Building Blocks • XPath Expression: /tag[text()=val]

  26. Basic Building Blocks • XPath Expression: /tag[child@attr=val]

  27. Basic Building Blocks • XPath Expression: /tag[child=val]

  28. A sample BPDT • Query: /pub[year>2000]

  29. Building a solution • HPDT for Query: //pub[year>2000]//book[author]//name/text()

  30. HPDT Structure
     • Each BPDT in the HPDT has a position (l, k): l = depth of the BPDT in the HPDT, k = sequence number from right to left.
     • The BPDT at position (i-1, k) has a right child BPDT at position (i, 2k), connected to its NA state.
     • The BPDT at position (i-1, k) has a left child BPDT at position (i, 2k+1), connected to its True state.
     • BPDT at position (i, 2^i − 1): all predicates in higher-level BPDTs evaluate to true.
     • Buffer: potential results.
     • Stack: stack of element (SAX) events.
     • Depth vector.
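The position rules can be checked with a few lines of arithmetic. The sketch below assumes the root BPDT sits at position (0, 0), which the slide does not state; the function names are hypothetical.

```python
def children(i, k):
    """Positions of the two child BPDTs of the BPDT at (i, k)."""
    return {
        "NA": (i + 1, 2 * k),        # right child, entered from the NA state
        "TRUE": (i + 1, 2 * k + 1),  # left child, entered from the True state
    }


def all_predicates_true(i, k):
    """True when (i, k) is reached through True states only."""
    return k == 2 ** i - 1


# Follow the True branch from the assumed root (0, 0) for three levels.
pos = (0, 0)
for _ in range(3):
    pos = children(*pos)["TRUE"]
    print(pos, all_predicates_true(*pos))
```

Starting from k = 0 and applying k → 2k+1 at each level gives 1, 3, 7, which is 2^i − 1 for i = 1, 2, 3. Any position reached through at least one NA branch has a smaller k at the same depth.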

  31. Example Query • Document as on slide 13 • Query: //pub[year=2002]//book[author]//name • 3 paths from $1 to $14

  32. System Features

  33. Reference • Feng Peng and Sudarshan S. Chawathe. XPath Queries on Streaming Data. In SIGMOD 2003.

  34. Thank You • Questions?