
FoXtrot: Distributed Structural & Value XML Filtering

Iris Miliaraki*, Department of Informatics and Telecommunications, National and Kapodistrian University of Athens. *Supported by Microsoft Research through European PhD Scholarship Programme.


Presentation Transcript


  1. FoXtrot: Distributed Structural & Value XML Filtering • Iris Miliaraki*, Department of Informatics and Telecommunications, National and Kapodistrian University of Athens • *Supported by Microsoft Research through European PhD Scholarship Programme

  2. Outline of the talk • XML Filtering scenario • FoXtrot system • Distributed structural matching • Distributed value matching • Experimental evaluation • Sum up and future work

  3. XML Filtering scenario • Centralized: YFilter, Index-Filter, XTrie, FiST, XPush • Distributed (parallel/hierarchical): ONYX, Gong et al. [ICDE05], Li et al. [ICDCS08], Snoeren [SOSP 2001] [Figure: subscribers send XPath/XQuery queries to the XML filtering system and publishers send XML documents]

  4. XML Filtering scenario • Mesh or tree-based overlays • Load imbalances • Potential bottlenecks due to centralized control [Figure: subscribers with XPath/XQuery queries and publishers attached to the overlay]

  5. FoXtrot: Filtering of XML data using structured overlay networks • Load balanced • Scalable • Fully distributed [Figure: subscribers with XPath/XQuery queries and publishers connected through a DHT]

  6. XML data model - example <bib> <article title="XML Filtering" conf="VLDB" year="2007"> <author institute="Harvard"> John Smith </author> </article> </bib> • Structural Matching • Value Matching Q1: /bib/*/author[text()="John Smith"] Q2: /bib/phdthesis[@published=2005]/author[@nationality=greek] Q3: /bib/article[@conf=www] Q4: /bib/article[@year=2009]/author[@degree-from="UOA"]

  7. Automata-based approaches • XFilter and YFilter, ONYX, XTrie, IndexFilter, FiST etc. • Main idea • Construct an automaton from a set of XPath/XQuery queries • Use it as a matching engine against the XML documents • Structural matching!
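
The main idea on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the YFilter or FoXtrot implementation: it assumes queries use only the child axis with element names and `*` (no `//`, no predicates), and the class name `PathNFA` and its methods are hypothetical.

```python
from collections import defaultdict


class PathNFA:
    """Shared-prefix automaton for child-axis paths made of labels and '*'."""

    def __init__(self):
        self.transitions = defaultdict(dict)  # state -> {label: next state}
        self.accepting = defaultdict(set)     # state -> ids of queries ending here
        self.next_id = 1                      # state 0 is the start state

    def add_query(self, query_id, path):
        """Insert a path such as '/bib/*/author', reusing existing prefixes."""
        state = 0
        for label in path.strip("/").split("/"):
            if label not in self.transitions[state]:
                self.transitions[state][label] = self.next_id
                self.next_id += 1
            state = self.transitions[state][label]
        self.accepting[state].add(query_id)

    def match(self, element_path):
        """Return ids of queries that select the element at this root-to-element path."""
        active = {0}
        for tag in element_path:
            active = {
                self.transitions[s][label]
                for s in active
                for label in (tag, "*")
                if label in self.transitions.get(s, {})
            }
        matched = set()
        for s in active:
            matched |= self.accepting.get(s, set())
        return matched
```

Because queries that start with the same steps share states, adding `/bib/phdthesis/year`, `/bib/proceedings/title` and `/bib/*/author` creates a single `bib` transition out of the start state rather than three.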

  8. Example NFA (YFilter) Q1: /bib/phdthesis/year = ‘2010’ Q2: /bib/proceedings/school = ‘Univ. of Athens’ Q3: /bib/proceedings/title = ‘XML Dissemination’ Q4: /bib/*/author = ‘Michael Smith’ Q5: //*/cite [@id = 12743] [Figure: NFA with states 0-11, transitions labeled bib, phdthesis, proceedings, year, school, title, author, cite, * and ε, and accepting states for Q1-Q5]

  9. Designing FoXtrot • Moving to a distributed solution • Utilize automata-based techniques • Instead of a single centralized automaton, the automaton is shared by the DHT peers • Design and employ methods for filtering of XML data against a distributed automaton
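
One way to picture how an automaton can be shared by DHT peers is to hash a key for each NFA state onto the identifier ring and store the state at the responsible peer. The sketch below is an assumption-laden illustration: the state keys (`state-0`, ...), the 32-bit identifier space, and the Chord-style successor rule are all choices made here for brevity, not details taken from the talk.

```python
import hashlib
from bisect import bisect_left

ID_BITS = 32  # assumed identifier length; deployed DHTs use longer identifiers


def ring_id(key):
    """Hash an arbitrary string key onto the identifier ring."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << ID_BITS)


def successor(key_id, ring):
    """Return the first peer id >= key_id in the sorted list `ring`, wrapping around."""
    idx = bisect_left(ring, key_id)
    return ring[idx % len(ring)]


def place_states(state_keys, peer_names):
    """Map every state key to the name of the peer responsible for it."""
    by_id = {ring_id(name): name for name in peer_names}
    ring = sorted(by_id)
    return {key: by_id[successor(ring_id(key), ring)] for key in state_keys}


# Hypothetical usage: 12 NFA states spread over 10 peers.
placement = place_states([f"state-{s}" for s in range(12)],
                         [f"P{i}" for i in range(1, 11)])
```

With a placement like this no peer holds the whole automaton, so filtering has to hop from the peer holding one state to the peers holding its successor states.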

  10. Distributing the NFA on top of DHT [Figure: NFA states 0-11 assigned to peers P1-P10 of the FoXtrot DHT ring]

  11. Distributing the NFA on top of DHT [Figure: the same ring of peers P1-P10 holding NFA states 0-11, with states 2, 4 and 7 highlighted]

  12. Distributing the NFA on top of DHT [Figure: assignment of NFA states 0-11 to peers P1-P10 for ℓ=0 and for ℓ=1]

  13. Load balancing in FoXtrot • Static replication • Create a fixed number r of replicas for each state • Load previously carried by 1 peer is now shared by r+1 peers

  14. Load balancing in FoXtrot cont. Assumption: the frequency f of visiting an NFA state during filtering is inversely proportional to the NFA depth d of this state. Dynamic replication • Create r/d replicas for each state where d is the NFA depth of the state
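
The two replication policies can be compared with a small calculation. This is a sketch under stated assumptions: the slides give r/d without saying how it is rounded or what happens at depth 0, so the floor division and the treatment of the start state as depth 1 are choices made here, and the function names are hypothetical.

```python
def static_replicas(r):
    """Static replication: every state gets the same number r of replicas."""
    return r


def dynamic_replicas(r, depth):
    """Dynamic replication: r/d replicas for a state at NFA depth d.

    Assumptions: the result is rounded down, and depth 0 is treated as depth 1.
    """
    return r // max(depth, 1)


def load_per_copy(visits, replicas):
    """Visits handled by each copy when the original and its replicas share the load evenly."""
    return visits / (replicas + 1)
```

Under these assumptions, with r = 8 a state at depth 1 gets 8 replicas while a state at depth 4 gets 2, which follows the slide's premise that shallow states are visited more often.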

  15. Centralized NFA Execution (YFilter) • Incoming XML document: <bib> <proceedings> <school> Univ. of Athens </school> <title> XML and DHTs </title> </proceedings> </bib> • Runtime stack, bottom to top: {0}, {1, 9, 10}, {4, 7, 9, 10}, then {5, 9, 10} for <school> and {6, 9, 10} for <title> • These paths can be executed in parallel!
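
Stack-based execution can be traced with the following sketch. The transition table is a reconstruction of the example NFA from slide 8 and should be read as an assumption, since the figure itself did not survive in the transcript. Value predicates are ignored, so the result is structural matching only.

```python
import io
import xml.etree.ElementTree as ET

EPSILON = "ε"
# Assumed reconstruction of the slide-8 NFA: state -> {label: set of next states}
TRANSITIONS = {
    0: {"bib": {1}, EPSILON: {9}},
    1: {"phdthesis": {2}, "proceedings": {4}, "*": {7}},
    2: {"year": {3}},
    4: {"school": {5}, "title": {6}},
    7: {"author": {8}},
    9: {"*": {9, 10}},   # self-loop that implements the descendant axis
    10: {"cite": {11}},
}
ACCEPTING = {3: "Q1", 5: "Q2", 6: "Q3", 8: "Q4", 11: "Q5"}

DOC = ("<bib><proceedings><school>Univ. of Athens</school>"
       "<title>XML and DHTs</title></proceedings></bib>")


def step(active, tag):
    """Follow ε-transitions one level (enough for this NFA), then `tag` and '*'."""
    sources = set(active)
    for s in active:
        sources |= TRANSITIONS.get(s, {}).get(EPSILON, set())
    targets = set()
    for s in sources:
        for label in (tag, "*"):
            targets |= TRANSITIONS.get(s, {}).get(label, set())
    return targets


def run(xml_text):
    """Push a state set on every start tag and pop it on the matching end tag."""
    stack = [{0}]
    trace, matched = [], set()
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(step(stack[-1], elem.tag))
            trace.append((elem.tag, sorted(stack[-1])))
            matched |= {ACCEPTING[s] for s in stack[-1] if s in ACCEPTING}
        else:
            stack.pop()
    return trace, matched
```

Each entry on the stack holds several active states at once. Every one of them is an independent path through the automaton, which is why the slide notes that these paths can be executed in parallel.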

  16. Distributed NFA execution – Iterative • Publisher becomes overloaded! [Figure: the publisher keeps the runtime stack and contacts the peers holding the NFA states one step at a time]

  17. Distributed NFA execution – Recursive • Several parallel executions [Figure: execution is forwarded from peer to peer along the ring P1-P10, branching into several parallel paths]

  18. Distributed NFA • Structural matching! • What about value matching? I. Miliaraki, Z. Kaoudi and M. Koubarakis. XML Data Dissemination using Automata on top of Structured Overlay Networks. In WWW 2008.

  19. What about value matching? • Automata-based approaches efficient for structural matching • Queries apart from defining a structural path also contain value-based predicates /bib/phdthesis[@year>2005]/author[@nationality=greek] • We want FoXtrot to scale for both the size of the query set and the number of predicates per query

  20. Definitions • Attribute predicates: element[@attr op value], e.g. /bib/phdthesis[@published=2007] • Textual predicates: element[text() op value], e.g. /bib/*/author[text()="John Smith"] • So, how can we deal with value matching along with structural matching?

  21. Direct evaluation with automaton/trie • Treat predicates as elements! Q1: /dblp/phdthesis[@year=2005]/author[@nationality=greek] Q2: /bib/*/author[text()=Michael Smith] Q3: /bib/article/conference[text()=WWW 2009] • Huge increase of NFA states! • Destroys sharing of path expressions! [Figure: NFA in which predicate names such as year, nationality and text() appear as transitions alongside element names]

  22. Bottom-up evaluation • Common rule in relational query optimization: apply selections as early as possible • Works well for relational query processing • A lot of effort is spent evaluating predicates while the structure may not be matched

  23. Top-down evaluation • Check predicates after structural matching • Depending on predicate selectivity, the number of false positives may be very large

  24. Step-by-step evaluation • XPath queries consist of distinct steps • Each step contains one or more value-based predicates • Perform value matching with structural matching in a stepwise manner • Effort is spent evaluating predicates while the structure may not be fully matched

  25. Moving on to details • Parse XML document and generate a set of candidate predicates to perform predicate evaluation • Enriched parsing events • Candidate predicates CP1: article[@title="XML Filtering"] CP2: article[@conf=VLDB] CP3: article[@year=2007] CP4: author[text()="John Smith"] CP5: author[@institute=Harvard]
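
Generating candidate predicates from a document can be sketched as below. The function name `candidate_predicates` is hypothetical, and for simplicity every value is quoted, whereas the slide quotes only values that contain spaces.

```python
import xml.etree.ElementTree as ET


def candidate_predicates(xml_text):
    """List one candidate predicate per text value and per attribute, in document order."""
    predicates = []
    for elem in ET.fromstring(xml_text).iter():
        text = (elem.text or "").strip()
        if text:
            predicates.append(f'{elem.tag}[text()="{text}"]')
        for name, value in elem.attrib.items():
            predicates.append(f'{elem.tag}[@{name}="{value}"]')
    return predicates


DOC = ('<bib><article title="XML Filtering" conf="VLDB" year="2007">'
       '<author institute="Harvard"> John Smith </author></article></bib>')
```

Calling `candidate_predicates(DOC)` on the example document of slide 6 yields five predicates corresponding to CP1-CP5.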

  26. Delay value matching after structural matching Top-down evaluation • Execute distributed NFA • Only check predicates if a final state is reached • Each peer uses a local index mapping predicates to the list of queries that contain them (hash index)

  27. Example Q1: /bib/phdthesis[@published=2005]/author[@nationality=greek] Q2: /bib/*/author[text()="John Smith"] Q3: /bib/article[@conf=www] Q4: /bib/article[@year=2009]/author[@degree-from="UOA"] Q5: /bib/article[@year=2009]/cite[@paper-id=2392] Q6: /bib/article/cite[@paper-id=2770] • Candidate predicates CP1: article[@title="XML Filtering"] CP2: article[@conf=VLDB] CP3: article[@year=2007] CP4: author[text()="John Smith"] CP5: author[@institute=Harvard] [Figure: NFA with states 0-8 and transitions labeled bib, phdthesis, article, *, author, cite and conference]
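
The top-down check on this example can be sketched as follows: a hash index maps each predicate to the queries containing it, and a structurally matched query is reported only if all of its predicates appear among the candidate predicates. This is a simplification made here for illustration. Predicates are compared as plain strings exactly as written on the slide, the set of structural matches is supplied by hand, and the sketch does not check that a predicate holds on the specific element that matched the corresponding step.

```python
from collections import defaultdict

# Predicates of Q1-Q6 as written on the slide.
QUERIES = {
    "Q1": ["phdthesis[@published=2005]", "author[@nationality=greek]"],
    "Q2": ['author[text()="John Smith"]'],
    "Q3": ["article[@conf=www]"],
    "Q4": ["article[@year=2009]", 'author[@degree-from="UOA"]'],
    "Q5": ["article[@year=2009]", "cite[@paper-id=2392]"],
    "Q6": ["cite[@paper-id=2770]"],
}
CANDIDATES = [
    'article[@title="XML Filtering"]',
    "article[@conf=VLDB]",
    "article[@year=2007]",
    'author[text()="John Smith"]',
    "author[@institute=Harvard]",
]


def build_index(queries):
    """Hash index: predicate -> ids of the queries that contain it."""
    index = defaultdict(set)
    for query_id, predicates in queries.items():
        for predicate in predicates:
            index[predicate].add(query_id)
    return index


def value_match(structural_matches, queries, index, candidates):
    """Keep the structurally matched queries whose predicates are all satisfied."""
    hits = defaultdict(int)
    for candidate in set(candidates):
        for query_id in index.get(candidate, ()):
            hits[query_id] += 1
    return {q for q in structural_matches if hits[q] == len(set(queries[q]))}


# For the document /bib/article/author, Q2, Q3 and Q4 match structurally.
result = value_match({"Q2", "Q3", "Q4"}, QUERIES, build_index(QUERIES), CANDIDATES)
```

Here Q3 and Q4 are false positives of structural matching: they reach a final state but fail on `@conf` and `@year`, leaving only Q2.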

  28. Top-down evaluation with pruning • At each step of the execution, part of the NFA is revealed • Applies to equality predicates • IDEA: Use a compact summary of predicate information to stop NFA execution (prune) if we can deduce that no match can be found [Figure: NFA with states 0-8 and transitions labeled bib, phdthesis, article, *, author, cite and conference]

  29. Experimental evaluation • Implemented FoXtrot in Java using FreePastry release (http://freepastry.org) • Environment • 400 peers in PlanetLab (http://www.planet-lab.org/) • 112 peers in a local shared cluster (http://www.grid.tuc.gr)

  30. Experimental evaluation – Datasets • Sets of 106 distinct XPath queries • depth 5-15 • predicates 1-3 • wildcard probability 0.2 • descendant axis probability 0.2 • 1000 XML documents • depth 5-25

  31. Indexing throughput

  32. Filtering latency & notifications

  33. Load balancing I – 10 most loaded peers

  34. Load balancing II – storage overhead

  35. Network size

  36. Parameter l

  37. Cluster (4 predicates per query)

  38. Sum up & future work • Overcome weaknesses of distributed XML filtering systems • Described methods to combine both structural and value XML filtering in a distributed environment • Future work • ….

  39. Other research Atlas system • Distributed RDF query processing

  40. Thank you for your attention • Questions? • References: I. Miliaraki, Z. Kaoudi and M. Koubarakis. XML Data Dissemination using Automata on top of Structured Overlay Networks. 17th International World Wide Web Conference (WWW 2008), Beijing, China, April 21-25, 2008. I. Miliaraki and M. Koubarakis. Distributed Structural and Value XML Filtering. 4th ACM International Conference on Distributed Event-Based Systems (DEBS 2010), Cambridge, United Kingdom, July 12-15, 2010. I. Miliaraki and M. Koubarakis. FoXtrot: Distributed Structural and Value XML Filtering. Journal paper, to be submitted to ACM TWEB.
