XML Data Dissemination using Automata on top of Structured Overlay Networks

This publication discusses the challenges of XML dissemination and proposes an automata-based approach for efficient data dissemination on structured overlay networks.

Presentation Transcript


  1. XML Data Dissemination using Automata on top of Structured Overlay Networks Iris Miliaraki Zoi Kaoudi Manolis Koubarakis Department of Informatics and Telecommunications National and Kapodistrian University of Athens

  2. Outline • XML Dissemination scenario • Problems • Background: DHTs • Our approach • Experiments • Future work

  3. Dissemination scenario [Figure: publishers (e.g. publication monitoring, news monitoring) send XML documents to an XML dissemination system, which matches them against the XPath/XQuery queries of subscribers. Centralized systems shown: XTrie, Index-Filter, YFilter, FiST, XPush. Distributed (parallel/hierarchical) systems shown: ONYX, Snoeren [SOSP 2001], Gong et al. [ICDE05]]

  4. Dissemination: Broker-based architecture • Mesh or tree-based overlays [Figure: XML documents flow from publishers through an overlay of brokers to subscribers holding XPath/XQuery queries]

  5. Problems • Load imbalances

  6. Dissemination: Broker-based architecture • Systems like ONYX and work of Gong et al. [ICDE05] • Mesh or tree-based overlays [Figure: XML documents flow from publishers through an overlay of brokers to subscribers holding XPath/XQuery queries]

  7. Problems • Load imbalances • Centralized control → single point of failure and bottleneck

  9. Problems • Load imbalances • Centralized control → single point of failure and bottleneck • Scalability (size of routing tables)

  11. Background: DHTs • Structured overlay networks • Solve the item location problem in a distributed and dynamic network of nodes (in O(log N) hops): given some data item x, find x • Distributed version of the hash table data structure • id = Hash(K) • Main operations: • Put: given the key of a data item, map the key onto a node • Get: find the location of a data item given its key • The successor peer of id is the peer responsible for K
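To make Put and Get concrete, here is a minimal single-process sketch of successor-based placement, assuming SHA-1 hashing onto a 16-bit identifier circle and the peer names P1 to P10 used in the later slides. The class `ToyDHT` and the constant `M` are illustrative names, not part of any DHT library. The sketch omits Chord's finger tables, so it does not perform O(log N)-hop routing; it only shows which peer ends up responsible for a key.

```python
import bisect
import hashlib

M = 16  # identifier bits; an assumption for readability (Chord itself uses SHA-1's 160 bits)

def chord_id(key: str) -> int:
    """id = Hash(K): place a key or a peer name on the 2^M identifier circle."""
    digest = hashlib.sha1(key.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** M)

class ToyDHT:
    """Single-process stand-in for a DHT: successor placement only, no routing."""

    def __init__(self, peer_names):
        self.ring = sorted((chord_id(name), name) for name in peer_names)
        self.ids = [pid for pid, _ in self.ring]
        self.store = {name: {} for name in peer_names}

    def successor(self, ident: int) -> str:
        """First peer whose id is >= ident, wrapping around the circle."""
        i = bisect.bisect_left(self.ids, ident) % len(self.ids)
        return self.ring[i][1]

    def put(self, key: str, value) -> str:
        """Map the key onto its responsible peer and store the value there."""
        peer = self.successor(chord_id(key))
        self.store[peer][key] = value
        return peer

    def get(self, key: str):
        """Locate the responsible peer for the key and read the value from it."""
        return self.store[self.successor(chord_id(key))].get(key)

dht = ToyDHT([f"P{i}" for i in range(1, 11)])
owner = dht.put("dblp", "some data item")
print(owner, dht.get("dblp"))
```

Because every peer applies the same hash function, Put and Get for the same key resolve to the same successor peer without any central directory.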

  12. Dissemination revisited: Structured overlay network architecture [Figure: publishers and subscribers with XPath/XQuery queries connected through a structured overlay network instead of brokers]

  13. Problems revisited • Load imbalances • Centralized control → single point of failure and bottleneck • Scalability (size of routing tables)

  14. Automata-based approaches • XFilter and YFilter, ONYX, XTrie, IndexFilter, FiST, etc. • Main idea • Construct an automaton from a set of XPath/XQuery queries • Use it as a matching engine against incoming XML documents

  15. YFilter – NFA Construction • Q1: /dblp/phdthesis/year = ‘2008’ [Figure: NFA 0 -dblp-> 1 -phdthesis-> 2 -year-> 3; state 3 accepts Q1]

  16. YFilter – NFA Construction • Q1: /dblp/phdthesis/year = ‘2008’ • Q2: /dblp/proceedings/school = ‘Univ. of Athens’ [Figure: NFA 0 -dblp-> 1 -phdthesis-> 2 -year-> 3 (Q1); 1 -proceedings-> 4 -school-> 5 (Q2); Q2 shares the prefix 0 -dblp-> 1 with Q1]

  17. YFilter – NFA Construction • Q1: /dblp/phdthesis/year = ‘2008’ • Q2: /dblp/proceedings/school = ‘Univ. of Athens’ • Q3: /dblp/proceedings/title = ‘XML Dissemination’ [Figure: NFA 0 -dblp-> 1 -phdthesis-> 2 -year-> 3 (Q1); 1 -proceedings-> 4 -school-> 5 (Q2); 4 -title-> 6 (Q3)]

  18. YFilter – NFA Construction • Q1: /dblp/phdthesis/year = ‘2008’ • Q2: /dblp/proceedings/school = ‘Univ. of Athens’ • Q3: /dblp/proceedings/title = ‘XML Dissemination’ • Q4: /dblp/*/author = ‘John Doe’ [Figure: NFA 0 -dblp-> 1 -phdthesis-> 2 -year-> 3 (Q1); 1 -proceedings-> 4 -school-> 5 (Q2); 4 -title-> 6 (Q3); 1 -*-> 7 -author-> 8 (Q4)]

  19. YFilter – NFA Construction • Q1: /dblp/phdthesis/year = ‘2008’ • Q2: /dblp/proceedings/school = ‘Univ. of Athens’ • Q3: /dblp/proceedings/title = ‘XML Dissemination’ • Q4: /dblp/*/author = ‘John Doe’ • Q5: //*/cite = [12743] [Figure: NFA 0 -dblp-> 1 -phdthesis-> 2 -year-> 3 (Q1); 1 -proceedings-> 4 -school-> 5 (Q2); 4 -title-> 6 (Q3); 1 -*-> 7 -author-> 8 (Q4); 0 -ε-> 9, 9 -*-> 9 (self-loop), 9 -*-> 10 -cite-> 11 (Q5)]
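The construction steps above can be reproduced with a small sketch that builds a shared-prefix NFA from linear XPath expressions using only `/`, `//` and `*`. It is an illustrative reimplementation of the idea, not YFilter's own code: the class `SharedNFA` is hypothetical, and the value predicates (= ‘2008’ and so on) are dropped because this sketch only builds the structural part of the automaton. With the queries inserted in the order Q1 to Q5, it assigns the same state numbers as the slides.

```python
import re
from collections import defaultdict

class SharedNFA:
    """YFilter-style NFA where queries share common prefixes (predicates omitted)."""

    def __init__(self):
        self.next_id = 1                  # state 0 is the start state
        self.trans = defaultdict(dict)    # state -> {label: next state}
        self.eps = {}                     # state -> target of its epsilon move ('//')
        self.self_loops = set()           # states carrying a '*' self-loop
        self.accept = defaultdict(list)   # state -> ids of queries accepted there

    def _new_state(self):
        state = self.next_id
        self.next_id += 1
        return state

    def _child(self, state, label):
        # Reuse an existing transition so that common prefixes are shared.
        if label not in self.trans[state]:
            self.trans[state][label] = self._new_state()
        return self.trans[state][label]

    def _descendant(self, state):
        # '//' becomes an epsilon move to a state with a '*' self-loop.
        if state not in self.eps:
            loop = self._new_state()
            self.eps[state] = loop
            self.self_loops.add(loop)
        return self.eps[state]

    def add_query(self, qid, path):
        state = 0
        # '//*/cite' -> [('//', '*'), ('/', 'cite')]
        for axis, label in re.findall(r"(//|/)([^/]+)", path):
            if axis == "//":
                state = self._descendant(state)
            state = self._child(state, label)
        self.accept[state].append(qid)
        return state

nfa = SharedNFA()
queries = {
    "Q1": "/dblp/phdthesis/year",
    "Q2": "/dblp/proceedings/school",
    "Q3": "/dblp/proceedings/title",
    "Q4": "/dblp/*/author",
    "Q5": "//*/cite",
}
final = {qid: nfa.add_query(qid, path) for qid, path in queries.items()}
print(final)  # {'Q1': 3, 'Q2': 5, 'Q3': 6, 'Q4': 8, 'Q5': 11}
```

Keeping the self-loop in a separate set, rather than as a `'*'` entry in `trans`, lets the `*` step of `//*/cite` create its own state 10 instead of being absorbed by the loop on state 9.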

  21. Main idea • Use a distributed version of YFilter, a state-of-the-art approach • Instead of a centralized NFA, distribute the NFA over the DHT

  22. Distributing the NFA on top of DHT [Figure: NFA states placed on peers P1 to P10 of the DHT]

  24. Distributing the NFA on top of DHT [Figure: NFA states placed on peers P1 to P10 of the DHT, annotated ℓ=0 and ℓ=1]

  25. Distributing the NFA on top of DHT [Figure: NFA states placed on peers P1 to P10 of the DHT]
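The slides do not spell out how states are keyed, so the following sketch simply assumes one: each NFA state is stored under a hypothetical key `state:<id>`, hashed onto the ring, and kept (together with its outgoing transitions) at the successor peer. The transition table is the example NFA for Q1 to Q5; `ring_id` and `responsible_peer` are illustrative helper names, and SHA-1 over a 16-bit circle is an assumption.

```python
import bisect
import hashlib

# Outgoing transitions of the example NFA for Q1 to Q5 ('eps' marks the '//' move).
NFA = {
    0: {"dblp": [1], "eps": [9]},
    1: {"phdthesis": [2], "proceedings": [4], "*": [7]},
    2: {"year": [3]},
    4: {"school": [5], "title": [6]},
    7: {"author": [8]},
    9: {"*": [9, 10]},
    10: {"cite": [11]},
    3: {}, 5: {}, 6: {}, 8: {}, 11: {},   # accepting states (Q1, Q2, Q3, Q4, Q5)
}

def ring_id(key: str, bits: int = 16) -> int:
    """Hash a key or peer name onto a 2^bits identifier circle (SHA-1, assumed)."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest(), "big") % (2 ** bits)

def responsible_peer(key: str, ring) -> str:
    """ring is a sorted list of (id, peer); return the successor of Hash(key)."""
    ids = [pid for pid, _ in ring]
    i = bisect.bisect_left(ids, ring_id(key)) % len(ring)
    return ring[i][1]

peers = [f"P{i}" for i in range(1, 11)]
ring = sorted((ring_id(p), p) for p in peers)

# Hypothetical key scheme: state s is stored under "state:s", together with
# its outgoing transitions, at the peer responsible for that key.
placement = {}
for state, transitions in NFA.items():
    peer = responsible_peer(f"state:{state}", ring)
    placement.setdefault(peer, {})[state] = transitions

for peer in sorted(placement, key=lambda p: int(p[1:])):
    print(peer, sorted(placement[peer]))
```

A peer that has just advanced the automaton to state s can then find the peer holding each successor state with the same `responsible_peer` call, so no node needs a copy of the whole NFA.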

  26. YFilter - NFA Execution • These paths can be executed in parallel! [Figure: the incoming XML document <dblp> <proceedings> <school> Univ. of Athens </school> <title> XML and DHTs </title> </proceedings> </dblp> is read from start to end of document; the runtime stack starts with {0}, <dblp> pushes {1, 9, 10}, <proceedings> pushes {4, 7, 9, 10}, <school> pushes {5, 9, 10}, and after </school> pops it, <title> pushes {6, 9, 10}]
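The execution above can be checked with a short sketch that hard-codes the example NFA and runs it over the document's start and end tags with a runtime stack: each start tag pushes the set of states reachable from the top of the stack, and each end tag pops it. Names such as `TRANS`, `EPS` and `run` are illustrative, the event list stands in for a SAX-style parser, and only structural matching is done; value predicates are not evaluated.

```python
# Transitions of the example NFA (Q1 to Q5); '*' matches any element name.
TRANS = {
    (0, "dblp"): {1},
    (1, "phdthesis"): {2},
    (2, "year"): {3},
    (1, "proceedings"): {4},
    (4, "school"): {5},
    (4, "title"): {6},
    (1, "*"): {7},
    (7, "author"): {8},
    (9, "*"): {9, 10},    # self-loop plus the '*' step of //*/cite
    (10, "cite"): {11},
}
EPS = {0: {9}}            # the '//' of Q5
ACCEPT = {3: "Q1", 5: "Q2", 6: "Q3", 8: "Q4", 11: "Q5"}

def closure(states):
    """Add every state reachable through epsilon moves."""
    result, todo = set(states), list(states)
    while todo:
        for target in EPS.get(todo.pop(), ()):
            if target not in result:
                result.add(target)
                todo.append(target)
    return result

def step(states, tag):
    """Active states after a start tag (exact-name and '*' transitions)."""
    nxt = set()
    for s in closure(states):
        nxt |= TRANS.get((s, tag), set())
        nxt |= TRANS.get((s, "*"), set())
    return nxt

def run(events):
    stack, trace, matches = [{0}], [], []
    for kind, tag in events:
        if kind == "start":
            active = step(stack[-1], tag)
            stack.append(active)            # push on <tag>
            trace.append(sorted(active))
            matches += [ACCEPT[s] for s in sorted(active) if s in ACCEPT]
        else:
            stack.pop()                     # pop on </tag>: backtrack to the parent's states
    return trace, matches

doc = [("start", "dblp"), ("start", "proceedings"), ("start", "school"),
       ("end", "school"), ("start", "title"), ("end", "title"),
       ("end", "proceedings"), ("end", "dblp")]
trace, matches = run(doc)
print(trace)    # [[1, 9, 10], [4, 7, 9, 10], [5, 9, 10], [6, 9, 10]]
print(matches)  # ['Q2', 'Q3']
```

Q3 matches structurally only: its value predicate (title = ‘XML Dissemination’) would reject this document, whose title is ‘XML and DHTs’. Each state in a pushed set is advanced independently of the others, which is the parallelism the distributed versions can exploit.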

  27. Distributed NFA execution – Iterative [Figure: the publisher reads the incoming XML document <dblp> <proceedings> <school> Univ. of Athens </school> <title> XML and DHTs </title> </proceedings> </dblp> and itself contacts the peers among P1 to P10 that hold the active states (0; 1, 9, 10; 4, 7, 9, 10; 5, 9, 10; 6, 9, 10)] • Publisher becomes overloaded!

  28. Distributed NFA execution - Recursive [Figure: the same incoming XML document <dblp> <proceedings> <school> Univ. of Athens </school> <title> XML and DHTs </title> </proceedings> </dblp> is forwarded by the publisher to the peer holding state 0 and then from peer to peer among P1 to P10; the forwarded messages are annotated with runtime-stack contents such as 0, 1 0 and 4 1 0]

  29. Experimental evaluation • Chord simulator • 2 different document workloads: Aggregated (including DBLP, NITF, ebXML and Auction (XMark)) and NITF • 2 kinds of query sets: Random and Distinct

  30. Metrics • Network traffic: total number of messages • Latency: longest chain of hops • Filtering load: number of messages received during execution
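The three metrics can be computed from a message log in which every message records its sender, its receiver and the message that triggered it. This sketch is illustrative only: the `metrics` function and both logs are hypothetical, hand-written to mimic the two patterns of the iterative and recursive slides (the publisher receiving every reply versus peers forwarding to one another). They are not measurements from the evaluation.

```python
from collections import Counter

def metrics(log):
    """log: {msg_id: (sender, receiver, parent_msg_id or None)}.

    Returns (network traffic, latency, filtering load per receiver).
    """
    depth = {}

    def chain_length(mid):
        # Hops from the initial message to this one, following causal parents.
        if mid not in depth:
            parent = log[mid][2]
            depth[mid] = 1 if parent is None else chain_length(parent) + 1
        return depth[mid]

    traffic = len(log)                                       # total number of messages
    latency = max(chain_length(mid) for mid in log)          # longest chain of hops
    load = Counter(receiver for _, receiver, _ in log.values())  # messages received
    return traffic, latency, load

# Hypothetical logs for one document (peer names and message ids are made up).
recursive = {
    1: ("Publisher", "P3", None),
    2: ("P3", "P5", 1),
    3: ("P3", "P7", 1),
    4: ("P5", "P2", 2),
}
iterative = {
    1: ("Publisher", "P3", None),
    2: ("P3", "Publisher", 1),
    3: ("Publisher", "P5", 2),
    4: ("P5", "Publisher", 3),
    5: ("Publisher", "P7", 2),
    6: ("P7", "Publisher", 5),
}
for name, log in (("recursive", recursive), ("iterative", iterative)):
    traffic, latency, load = metrics(log)
    print(name, traffic, latency, load["Publisher"])
# recursive 4 3 0
# iterative 6 4 3
```

In these toy logs every reply in the iterative pattern lands at the publisher, which is the concentration of filtering load that the iterative slide warns about.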

  31. Iterative vs Recursive

  32. Varying number of queries – Network traffic

  33. Varying number of queries - Latency

  34. Load balancing • Virtual peers • Originally proposed in Chord • Mapping of multiple virtual peers to each real peer • Load-shedding • Replicate on demand
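Virtual peers can be illustrated with a sketch in which each real peer joins the ring under k hypothetical identifiers `<peer>#v<i>`, and every item is charged to the real owner of its successor virtual peer. The workload of 1000 keys and the helper names are invented; with more virtual peers the maximum load usually moves closer to the average, but the printed numbers depend on the hash values and are not results from the slides. Load-shedding and on-demand replication are not modeled here.

```python
import bisect
import hashlib
from collections import Counter

def ring_id(key: str, bits: int = 32) -> int:
    """Hash a key onto a 2^bits identifier circle (SHA-1, assumed)."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest(), "big") % (2 ** bits)

def build_ring(real_peers, virtual_per_peer):
    """Each real peer joins the ring under several virtual identifiers."""
    ring = sorted(
        (ring_id(f"{peer}#v{v}"), peer)   # hypothetical virtual-id naming
        for peer in real_peers
        for v in range(virtual_per_peer)
    )
    return [vid for vid, _ in ring], [peer for _, peer in ring]

def load_per_peer(items, real_peers, virtual_per_peer):
    """Count how many items each real peer ends up responsible for."""
    ids, owners = build_ring(real_peers, virtual_per_peer)
    load = Counter({peer: 0 for peer in real_peers})
    for item in items:
        i = bisect.bisect_left(ids, ring_id(item)) % len(ids)
        load[owners[i]] += 1   # the virtual peer's real owner does the work
    return load

peers = [f"P{i}" for i in range(1, 11)]
items = [f"state:{n}" for n in range(1000)]   # hypothetical workload
for k in (1, 8):
    load = load_per_peer(items, peers, k)
    print(f"{k} virtual peer(s) per peer: max {max(load.values())}, min {min(load.values())}")
```

Splitting each real peer into several points on the ring shortens the longest arc any single peer owns, which is why Chord proposed virtual peers for balancing storage load.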

  35. Load balancing – Filtering load

  36. Conclusions • DHT-based protocols overcoming weaknesses of broker-based architectures • Utilize a distributed YFilter engine • Exploit inherent parallelism of an automaton • Experimental evaluation

  37. Future Work • Implementation and experimenting on an Internet-scale testbed like PlanetLab • More sophisticated methods for predicate evaluation

  38. Thank you for your attention Questions?
