
FoXtrot: Distributed Structural & Value XML Filtering - PowerPoint PPT Presentation



FoXtrot: Distributed Structural & Value XML Filtering. Iris Miliaraki*, Department of Informatics and Telecommunications, National and Kapodistrian University of Athens. *Supported by Microsoft Research through European PhD Scholarship Programme.

Presentation Transcript
FoXtrot: Distributed Structural & Value XML Filtering

Iris Miliaraki*

Department of Informatics and Telecommunications

National and Kapodistrian University of Athens

*Supported by Microsoft Research through European PhD Scholarship Programme


Outline of the talk

  • XML Filtering scenario

  • FoXtrot system

    • Distributed structural matching

    • Distributed value matching

  • Experimental evaluation

  • Sum up and future work


XML Filtering scenario

[Figure: publishers send XML documents to a filtering system, which matches them against the XPath/XQuery queries of subscribers. Two variants are shown, centralized and distributed. Systems named on the slide: YFilter, Index-Filter, Parallel/Hierarchical XTrie, ONYX, Gong et al. [ICDE05], FiST, XTrie, Li et al. [ICDCS08], XPush, Snoeren [SOSP 2001].]


XML Filtering scenario

Mesh or tree-based overlays

  • Load imbalances

  • Potential bottlenecks due to centralized control

[Figure: publishers and subscribers (XPath/XQuery queries) connected through a mesh or tree-based overlay.]


FoXtrot

Filtering of XML data using structured overlay networks

  • Load balanced

  • Scalable

  • Fully distributed

[Figure: publishers and subscribers (XPath/XQuery queries) connected through a DHT.]


XML data model - example

<bib>
  <article title="XML Filtering" conf="VLDB" year="2007">
    <author institute="Harvard">
      John Smith
    </author>
  </article>
</bib>

  • Structural Matching

  • Value Matching

Q1: /bib/*/author[text()="John Smith"]

Q2: /bib/phdthesis[@published=2005]/author[@nationality=greek]

Q3: /bib/article[@conf=www]

Q4: /bib/article[@year=2009]/author[@degree-from="UOA"]
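
The difference between structural and value matching can be illustrated with a small Python sketch. It is only an illustration, assuming the example document above (with the attribute spelled `institute`) and Python's standard `xml.etree.ElementTree`, whose limited path syntax is used for the structural part while the value check is done by hand. It is not how FoXtrot evaluates queries.

```python
import xml.etree.ElementTree as ET

DOC = """<bib>
  <article title="XML Filtering" conf="VLDB" year="2007">
    <author institute="Harvard">John Smith</author>
  </article>
</bib>"""

root = ET.fromstring(DOC)  # root element is <bib>

# Q1: /bib/*/author[text()="John Smith"]
# Structural part: any child of <bib>, then an <author> child.
q1_structural = root.findall("./*/author")
# Value part: compare the text content of each structurally matched node.
q1 = [a for a in q1_structural if (a.text or "").strip() == "John Smith"]

# Q3: /bib/article[@conf=www]
q3_structural = root.findall("./article")
q3 = [a for a in q3_structural if a.get("conf") == "www"]

print("Q1:", len(q1_structural), "structural,", len(q1), "after value check")
print("Q3:", len(q3_structural), "structural,", len(q3), "after value check")
```

Q1 matches both structurally and by value. Q3 matches the structure but fails the value check, because the document has conf="VLDB".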


Automata-based approaches

  • XFilter and YFilter, ONYX, XTrie, IndexFilter, FiST etc.

  • Main idea

    • Construct an automaton from a set of XPath/XQuery queries

    • Use it as a matching engine against the XML documents

Structural matching!
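
The main idea can be sketched in a few lines of Python. This is a minimal illustration under strong assumptions: queries are linear paths of child steps with element names or `*` (no `//`, no predicates), and the names `build_nfa` and `match_path` are hypothetical, not taken from YFilter or FoXtrot.

```python
def build_nfa(queries):
    """Build a prefix-sharing automaton from linear path queries.

    `queries` maps a query id to a path such as "/bib/*/author".
    Returns (transitions, accepting):
      transitions[state][label] -> next state
      accepting[state] -> set of query ids ending in that state
    """
    transitions = {0: {}}
    accepting = {}
    for qid, path in queries.items():
        state = 0
        for step in path.strip("/").split("/"):
            if step not in transitions[state]:
                new_state = len(transitions)
                transitions[new_state] = {}
                transitions[state][step] = new_state
            state = transitions[state][step]
        accepting.setdefault(state, set()).add(qid)
    return transitions, accepting


def match_path(transitions, accepting, element_path):
    """Return the ids of queries matched by a root-to-node list of element names."""
    active = {0}
    for name in element_path:
        following = set()
        for state in active:
            # An element can follow a transition with its own name or a wildcard.
            for label in (name, "*"):
                if label in transitions[state]:
                    following.add(transitions[state][label])
        active = following
    matched = set()
    for state in active:
        matched |= accepting.get(state, set())
    return matched


queries = {
    "Q1": "/bib/phdthesis/year",
    "Q2": "/bib/proceedings/school",
    "Q3": "/bib/proceedings/title",
    "Q4": "/bib/*/author",
}
transitions, accepting = build_nfa(queries)
print(len(transitions), "states for", len(queries), "queries")
print(match_path(transitions, accepting, ["bib", "proceedings", "title"]))
print(match_path(transitions, accepting, ["bib", "article", "author"]))
```

Common prefixes such as /bib and /bib/proceedings are stored once, which is what lets a single automaton serve a large query set.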


Example NFA (YFilter)

Q1: /bib/phdthesis/year = ‘2010’

Q2: /bib/proceedings/school = ‘Univ. of Athens’

Q3: /bib/proceedings/title = ‘XML Dissemination’

Q4: /bib/*/author = ‘Michael Smith’

Q5: //*/cite [@id = 12743]

[Figure: NFA for Q1-Q5 with states 0-11; transitions are labelled bib, phdthesis, proceedings, year, school, title, author, cite, * and ε, and the accepting states are annotated with Q1-Q5.]


Designing FoXtrot

  • Moving to a distributed solution

    • Utilize automata-based techniques

    • Instead of a single centralized automaton, the automaton is shared by the DHT peers

  • Design and employ methods for filtering of XML data against a distributed automaton
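
One way to picture an automaton shared by DHT peers is to give each NFA state a key and let the peer responsible for that key store the state. The sketch below is an assumption-laden illustration: the key format `state-N`, the SHA-1 identifier space and the successor rule (first peer identifier at or after the key, wrapping around) are chosen for simplicity and are not stated on the slides.

```python
import hashlib
from bisect import bisect_left


def ident(key):
    """Hash a string key into a 160-bit identifier (illustrative choice)."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest(), "big")


def responsible_peer(state_key, peer_ids):
    """Return the first peer id >= the key id, wrapping around the ring."""
    ring = sorted(peer_ids)
    position = bisect_left(ring, ident(state_key))
    return ring[position % len(ring)]


# Ten hypothetical peers P1..P10 and twelve NFA states 0..11.
peers = {ident(f"P{i}"): f"P{i}" for i in range(1, 11)}
placement = {}
for state in range(12):
    peer_id = responsible_peer(f"state-{state}", peers.keys())
    placement.setdefault(peers[peer_id], []).append(state)

for peer, states in sorted(placement.items()):
    print(peer, "stores states", states)
```

Every state ends up on exactly one peer, and any peer can compute which one without central coordination.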


Distributing the NFA on top of DHT

[Figure, shown in three steps: a DHT ring of peers P1-P10 labelled FoXtrot, with the NFA states 0-11 assigned to the peers. The last step shows the assignment for ℓ=0 and for ℓ=1.]


Load balancing in FoXtrot

Static replication

  • Create a fixed number r of replicas for each state

  • The load previously carried by one peer is now shared by r+1 peers


Load balancing in FoXtrot cont.

Assumption: the frequency f of visiting an NFA state during filtering is inversely proportional to the NFA depth d of this state.

Dynamic replication

  • Create r/d replicas for each state where d is the NFA depth of the state
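
The two replication schemes can be compared numerically. The sketch below assumes that depth is counted from 1, that r/d is rounded up, and that the visit frequency is exactly f = c/d for a constant c. None of these details are given on the slides.

```python
import math


def static_replicas(r, depth):
    """Static replication: every state gets r replicas, whatever its depth."""
    return r


def dynamic_replicas(r, depth):
    """Dynamic replication: r/d replicas for a state at depth d (rounded up)."""
    if depth < 1:
        raise ValueError("depth is assumed to start at 1")
    return math.ceil(r / depth)


def load_per_peer(depth, replicas, c=1.0):
    """Visit frequency f = c/d, shared by the original peer plus its replicas."""
    return (c / depth) / (replicas + 1)


r = 8
print("depth static dynamic load_static load_dynamic")
for d in range(1, 5):
    s = static_replicas(r, d)
    dyn = dynamic_replicas(r, d)
    print(d, s, dyn, round(load_per_peer(d, s), 3), round(load_per_peer(d, dyn), 3))
```

Under these assumptions, dynamic replication uses fewer replicas for deep states while keeping the per-peer load roughly level across depths.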


Centralized NFA Execution (YFilter)

Incoming XML document

<bib>
  <proceedings>
    <school> Univ. of Athens </school>
    <title> XML and DHTs </title>
  </proceedings>
</bib>

These paths can be executed in parallel!

[Figure: the example NFA processing the document from start of document to end of document, following the transitions bib, proceedings, school, title, * and ε. Runtime stack entries shown: 0; 1 9 10; 4 7 9 10; 5 9 10; 6 9 10.]
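
The stack-based execution can be reproduced with a short Python sketch. The transition table is an assumption: it is reconstructed from the state numbers and labels on the Example NFA slide, with the ε-transition of state 0 followed lazily when an element is read. Only structural matching is performed, so value conditions such as title = ‘XML Dissemination’ are ignored.

```python
import io
import xml.etree.ElementTree as ET

# (state, label) -> set of next states; reconstructed from the slide (assumption).
TRANSITIONS = {
    (0, "bib"): {1},
    (1, "phdthesis"): {2},
    (2, "year"): {3},
    (1, "proceedings"): {4},
    (4, "school"): {5},
    (4, "title"): {6},
    (1, "*"): {7},
    (7, "author"): {8},
    (9, "*"): {9, 10},  # self-loop for "//" plus the "*" step of Q5
    (10, "cite"): {11},
}
EPSILON = {0: {9}}
ACCEPTING = {3: "Q1", 5: "Q2", 6: "Q3", 8: "Q4", 11: "Q5"}

DOC = """<bib>
  <proceedings>
    <school>Univ. of Athens</school>
    <title>XML and DHTs</title>
  </proceedings>
</bib>"""


def step(active, name):
    """Compute the next set of active states for a start-of-element event."""
    following = set()
    for state in active:
        for source in {state} | EPSILON.get(state, set()):
            for label in (name, "*"):
                following |= TRANSITIONS.get((source, label), set())
    return following


def run(xml_text):
    """Return the stack trace and the structurally matched query ids."""
    stack = [{0}]
    trace = [("start of document", [0])]
    matched = set()
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(step(stack[-1], elem.tag))  # push on start tag
            trace.append((f"<{elem.tag}>", sorted(stack[-1])))
            matched |= {ACCEPTING[s] for s in stack[-1] if s in ACCEPTING}
        else:
            stack.pop()  # backtrack on end tag
    return trace, matched


trace, structurally_matched = run(DOC)
for label, states in trace:
    print(label, states)
print(sorted(structurally_matched))
```

Each start tag pushes a new set of active states and each end tag pops one, so sibling elements such as school and title are processed from the same parent state set.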


Distributed NFA execution – Iterative

Publisher becomes overloaded!

[Figure: DHT ring of peers P1-P10 with numbered labels 0-7 marking the execution of the NFA across the peers, and a runtime stack (entry 1 9 10 shown).]


Distributed NFA execution – Recursive

Several parallel executions

[Figure: DHT ring of peers P1-P10 with numbered labels 1-4 marking several execution paths that proceed across the peers.]


Distributed NFA

Structural matching!

What about value matching?

I. Miliaraki, Z. Kaoudi and M. Koubarakis. XML Data Dissemination using Automata on top of Structured Overlay Networks. In WWW 2008.


What about value matching?

  • Automata-based approaches are efficient for structural matching

  • Besides defining a structural path, queries also contain value-based predicates

    /bib/phdthesis[@year>2005]/author[@nationality=greek]

  • We want FoXtrot to scale for both the size of the query set and the number of predicates per query


Definitions

  • Attribute predicates: element[@attr op value], e.g. /bib/phdthesis[@published=2007]

  • Textual predicates: element[text() op value], e.g. /bib/*/author[text()="John Smith"]

So, how can we deal with value matching along with structural matching?


Direct evaluation with automaton/trie

  • Treat predicates as elements!

Q1: /dblp/phdthesis[@year=2005]/author[@nationality=greek]

Q2: /bib/*/author[text()=Michael Smith]

Q3: /bib/article/conference[text()=WWW 2009]

  • Huge increase of NFA states!

  • Destroys sharing of path expressions!

[Figure: NFA for Q1-Q3 with states 0-11, in which predicate names (year, nationality, text()) appear as additional transitions next to the element names (bib, phdthesis, article, author, conference, *).]


Bottom-up evaluation

  • Common rule in relational query optimization: apply selections as early as possible

  • Works well for relational query processing

A lot of effort evaluating predicates while the structure may not be matched


Top-down evaluation

  • Check predicates after structural matching

Depending on predicate selectivity, the number of false positives may be very large


Step-by-step evaluation

  • XPath queries consist of distinct steps

  • Each step contains one or more value-based predicates

  • Perform value matching with structural matching in a stepwise manner

Effort spent for evaluating predicates while the structure may not be fully matched


Moving on to details

  • Parse XML document and generate a set of candidate predicates to perform predicate evaluation

Enriched parsing events

Candidate predicates

CP1:article[@title="XML Filtering"]

CP2:article[@conf=VLDB]

CP3:article[@year=2007]

CP4:author[text()="John Smith"]

CP5:author[@institute=Harvard]
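
Generating candidate predicates from a document can be sketched as follows. This is a minimal illustration using Python's standard `xml.etree.ElementTree`. The function name `candidate_predicates` is hypothetical, every value is quoted for uniformity (the slide quotes only some of them), and the output order may differ from the CP numbering above.

```python
import xml.etree.ElementTree as ET

DOC = """<bib>
  <article title="XML Filtering" conf="VLDB" year="2007">
    <author institute="Harvard">John Smith</author>
  </article>
</bib>"""


def candidate_predicates(xml_text):
    """Emit one equality predicate per attribute and per non-empty text node."""
    root = ET.fromstring(xml_text)
    predicates = []
    for elem in root.iter():  # document order, root included
        for attr, value in elem.attrib.items():
            predicates.append(f'{elem.tag}[@{attr}="{value}"]')
        text = (elem.text or "").strip()
        if text:
            predicates.append(f'{elem.tag}[text()="{text}"]')
    return predicates


for number, predicate in enumerate(candidate_predicates(DOC), start=1):
    print(f"CP{number}:{predicate}")
```

Elements that carry neither attributes nor text, such as bib here, contribute no candidate predicates.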


Top-down evaluation

Delay value matching until after structural matching

  • Execute distributed NFA

  • Only check predicates if a final state is reached

  • Each peer uses a local index mapping predicates to the list of queries that contain them (hash index)
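
The local hash index can be sketched as a dictionary from predicates to query ids. The sketch below is a simplification under stated assumptions: only equality predicates are handled, predicates are compared as plain strings, and the check ignores which element instance a predicate is attached to. The names `build_index` and `filter_matches` are hypothetical.

```python
from collections import defaultdict

# Equality predicates of some example queries (values quoted uniformly).
QUERY_PREDICATES = {
    "Q2": {'author[text()="John Smith"]'},
    "Q3": {'article[@conf="www"]'},
    "Q4": {'article[@year="2009"]', 'author[@degree-from="UOA"]'},
}

CANDIDATES = [
    'article[@title="XML Filtering"]',
    'article[@conf="VLDB"]',
    'article[@year="2007"]',
    'author[text()="John Smith"]',
    'author[@institute="Harvard"]',
]


def build_index(query_predicates):
    """Hash index: predicate -> list of queries that contain it."""
    index = defaultdict(list)
    for qid, predicates in query_predicates.items():
        for predicate in predicates:
            index[predicate].append(qid)
    return index


def filter_matches(structural_matches, candidates, index, query_predicates):
    """Keep the structurally matched queries whose predicates are all satisfied."""
    satisfied = defaultdict(int)
    for candidate in set(candidates):
        for qid in index.get(candidate, []):
            satisfied[qid] += 1
    return {
        qid
        for qid in structural_matches
        if satisfied[qid] == len(query_predicates[qid])
    }


index = build_index(QUERY_PREDICATES)
# Suppose the NFA reached final states for Q2, Q3 and Q4 (structure only).
result = filter_matches({"Q2", "Q3", "Q4"}, CANDIDATES, index, QUERY_PREDICATES)
print(sorted(result))
```

Here Q3 and Q4 are false positives of the structural phase: they reach a final state but are discarded once their predicates are looked up in the index.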


Example

Q1: /bib/phdthesis[@published=2005]/author[@nationality=greek]

Q2: /bib/*/author[text()="John Smith"]

Q3: /bib/article[@conf=www]

Q4: /bib/article[@year=2009]/author[@degree-from="UOA"]

Q5: /bib/article[@year=2009]/cite[@paper-id=2392]

Q6: /bib/article/cite[@paper-id=2770]

Candidate predicates

CP1:article[@title="XML Filtering"]

CP2:article[@conf=VLDB]

CP3:article[@year=2007]

CP4:author[text()="John Smith"]

CP5:author[@institute=Harvard]

[Figure: NFA for the example queries with states 0-8 and transitions labelled bib, phdthesis, article, author, cite, conference and *.]


Top-down evaluation with pruning

  • At each step of the execution, part of the NFA is revealed

  • Applies to equality predicates

IDEA: Use a compact summary of predicate information to stop NFA execution (prune) if we can deduce that no match can be found

[Figure: NFA for the example queries with states 0-8 and transitions labelled bib, phdthesis, article, author, cite, conference and *.]


Experimental evaluation

  • Implemented FoXtrot in Java using the FreePastry release (http://freepastry.org)

  • Environment

    • 400 peers in PlanetLab (http://www.planet-lab.org/)

    • 112 peers in a local shared cluster (http://www.grid.tuc.gr)


Experimental evaluation – Datasets

  • Sets of 10^6 distinct XPath queries

    • depth 5-15

    • predicates 1-3

    • wildcard probability 0.2

    • descendant axis probability 0.2

  • 1000 XML documents

    • depth 5-25









Sum up & future work

  • Overcome weaknesses of distributed XML filtering systems

    • Described methods to combine both structural and value XML filtering in a distributed environment

  • Future work

    • ….


Other research

Atlas system

  • Distributed RDF query processing


Thank you for your attention

Questions?

References

I. Miliaraki, Z. Kaoudi and M. Koubarakis. XML Data Dissemination using Automata on top of Structured Overlay Networks. 17th International World Wide Web Conference (WWW 2008), Beijing, China, April 21-25, 2008.

I. Miliaraki and M. Koubarakis. Distributed Structural and Value XML Filtering. 4th ACM International Conference on Distributed Event-Based Systems (DEBS 2010), Cambridge, United Kingdom, July 12-15, 2010.

I. Miliaraki and M. Koubarakis. FoXtrot: Distributed Structural and Value XML Filtering. Journal paper. To be submitted to ACM TWEB.

