Managing XML and Semistructured Data. Lecture 4: Path Expressions. Prof. Dan Suciu. Spring 2001. In this lecture: path expressions, regular path expressions, evaluation techniques. Resources: Data on the Web (Abiteboul, Buneman, Suciu), section 4.1.

Presentation Transcript

Managing XML and Semistructured Data

Lecture 4: Path Expressions

Prof. Dan Suciu

Spring 2001

In this lecture
  • Path expressions
  • Regular path expressions
  • Evaluation techniques

Resources:

  • Data on the Web (Abiteboul, Buneman, Suciu), section 4.1

Path Expressions

Examples:

  • Bib.paper
  • Bib.book.publisher
  • Bib.paper.author.lastname

Given an OEM instance, the answer of a path expression p is a set of objects

Path Expressions

Examples:

DB = [Figure: OEM graph with entry point Bib (object &o1). Edge labels shown: paper, book, references, author, title, year, page, http, publisher, firstname, lastname, first, last. Leaf values shown: “Serge”, “Abiteboul”, “Victor”, “Vianu”, 1997, 122, 133.]

Bib.paper={&o12,&o29}

Bib.book.publisher={&o51}

Bib.paper.author.lastname={&o71,&206}

Answer of a Path Expression

Simple evaluation algorithm for Answer(P,DB):

  • Runs in PTIME in size(P), size(DB)

Answer(P, DB) = f(P, root(DB))

Where:

f(ε, x) = {x}

f(L.P, x) = ∪ {f(P,y) | (x,L,y) ∈ edges(DB)}
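
The recursive definition of f can be sketched in Python roughly as follows. This is only an illustration under assumptions: the edge-list encoding, the function name `answer`, and the tiny database `db` with its node names are made up here and are not the OEM instance from the slides. The sketch evaluates f one label at a time over a set of current nodes, which keeps the work at about size(P) passes over edges(DB).

```python
from typing import List, Set, Tuple

# Assumed encoding (not from the slides): DB is a list of labeled edges (x, L, y).
Edge = Tuple[str, str, str]

def answer(path: List[str], edges: List[Edge], root: str) -> Set[str]:
    """Answer(P, DB) = f(P, root(DB)), computed set-at-a-time."""
    current = {root}                      # f(ε, x) = {x}
    for label in path:                    # peel off the leading label L of L.P
        # union of the targets y over all edges (x, L, y) with x in the current set
        current = {y for (x, l, y) in edges if x in current and l == label}
    return current

# Hypothetical toy database; node names are illustrative only.
db: List[Edge] = [
    ("r", "Bib", "b"),
    ("b", "paper", "p1"),
    ("b", "paper", "p2"),
    ("b", "book", "k"),
    ("k", "publisher", "pub"),
]

print(answer(["Bib", "paper"], db, "r"))
print(answer(["Bib", "book", "publisher"], db, "r"))
```

Working on sets rather than recursing once per matching edge avoids re-evaluating the same (suffix, node) pair when several paths reach the same object.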

Regular Path Expressions

R ::= label | _ | R.R | (R|R) | R* | R+ | R?

Examples:

  • Bib.(paper|book).author
  • Bib.book.author.lastname?
  • Bib.book.(references)*.author
  • Bib.(_)*.zip
Applications of Regular Path Expressions
  • Navigating uncertain structure:
    • Bib.book.author.lastname?
  • Syntactic substitute for inheritance:
    • Bib.(paper|book).author
    • Better: Bib.publication.author, but we don’t have inheritance
Applications of Regular Path Expressions
  • Computing transitive closure:
    • Bib.(_)*.zip = everything accessible
    • Bib.book.(references)*.author = everything accessible via references
  • Some regular expressions of doubtful practical use:
    • (references.references)* = a path with an even number of references
    • (_._)* = paths of even length
    • (_._._.(_)?)* = paths of length (3m + 4n) for some m,n
  • But make great examples for illustration
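
The claim about (_._._.(_)?)* can be checked with a small Python sketch. It assumes each edge on a path is written as the single character "x", so the wildcard expression becomes the ordinary regular expression `(?:xxxx?)*`; the helper name `expressible` is hypothetical. Each iteration of the star consumes 3 or 4 edges, so the matched lengths should be exactly the numbers of the form 3m + 4n.

```python
import re

def expressible(length: int) -> bool:
    """True if length = 3m + 4n for some non-negative integers m, n."""
    return any((length - 3 * m) % 4 == 0 for m in range(length // 3 + 1))

# Each iteration of the star matches 3 edges plus an optional 4th one.
pattern = re.compile(r"(?:xxxx?)*")

for k in range(12):
    matched = pattern.fullmatch("x" * k) is not None
    assert matched == expressible(k)

print([k for k in range(12) if not expressible(k)])  # prints [1, 2, 5]
```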
Answer of a Regular Path Expression

Recall:

  • Lang(R) = the set of words P generated by R

Answer of regular path expressions:

  • Answer(R,DB) = {Answer(P,DB) | P  Lang(R)}

Need an evaluation algorithm that copes with cycles

Regular Path Expressions

Recall: each regular expression can be converted to an NDFA

Example: R = (a.a)*.a.b

A = [Figure: automaton with states s1, s2, s3, s4; three transitions labeled a and one labeled b]

states(A) = {s1,s2,s3,s4}

initial(A) = s1

terminal(A) = {s4}
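
As an illustration, an NDFA for R = (a.a)*.a.b can be written down in Python as a transition table and run on a word. The transitions below are an assumption: they are one choice consistent with the four states, the initial state s1, and the terminal state s4 listed above, not necessarily the ones drawn on the slide. The names `DELTA` and `accepts` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# ASSUMED transition relation for (a.a)*.a.b (the slide's drawing is not recoverable):
#   s1 -a-> s2, s2 -a-> s1   loop that consumes pairs of a's
#   s1 -a-> s3, s3 -b-> s4   the final "a.b"
DELTA: Dict[Tuple[str, str], Set[str]] = {
    ("s1", "a"): {"s2", "s3"},
    ("s2", "a"): {"s1"},
    ("s3", "b"): {"s4"},
}
INITIAL = "s1"
TERMINAL = {"s4"}

def accepts(word: List[str]) -> bool:
    """Simulate the NDFA by tracking the set of states reachable so far."""
    current = {INITIAL}
    for label in word:
        current = {t for s in current for t in DELTA.get((s, label), set())}
    return bool(current & TERMINAL)

print(accepts(["a", "b"]))            # True: zero repetitions of a.a
print(accepts(["a", "a", "a", "b"]))  # True: one repetition of a.a
print(accepts(["a", "a", "b"]))       # False: even number of a's before b
```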

Regular Path Expressions

Canonical Evaluation Algorithm

  • Answer(R,DB):
  • construct A from R
  • construct product automaton G = A x DB:
    • nodes(G) = states(A) x nodes(DB)
    • edges(G) = {((s,x),L,(s’,x’)) | (s,L,s’) ∈ edges(A), (x,L,x’) ∈ edges(DB)}
    • root(G) = (initial(A), root(DB))
  • compute Gacc = set of nodes accessible from root(G)
  • return {x | ∃ s ∈ terminal(A) s.t. (s,x) ∈ Gacc}
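
The steps of the canonical algorithm can be sketched in Python as a breadth-first search over product nodes (s, x). Several things here are assumptions rather than part of the slides: the automaton is given directly as an ε-free edge list instead of being constructed from R, the wildcard `_` on an automaton edge is taken to match any database label, only the accessible part of G is built instead of all of G, and the names `answer_regular`, `a_edges`, and `db_edges` as well as the toy data are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, str]   # (source, label, target)
WILDCARD = "_"                # assumed: an automaton edge labeled _ matches any label

def answer_regular(a_edges: List[Edge], initial: str, terminal: Set[str],
                   db_edges: List[Edge], db_root: str) -> Set[str]:
    # Index outgoing edges of the automaton A and of the database DB.
    a_out: Dict[str, List[Tuple[str, str]]] = {}
    for s, label, s2 in a_edges:
        a_out.setdefault(s, []).append((label, s2))
    db_out: Dict[str, List[Tuple[str, str]]] = {}
    for x, label, x2 in db_edges:
        db_out.setdefault(x, []).append((label, x2))

    root = (initial, db_root)             # root(G) = (initial(A), root(DB))
    g_acc = {root}                        # Gacc: product nodes accessible from root(G)
    queue = deque([root])
    while queue:
        s, x = queue.popleft()
        for a_label, s2 in a_out.get(s, []):
            for d_label, x2 in db_out.get(x, []):
                if a_label == WILDCARD or a_label == d_label:
                    node = (s2, x2)
                    if node not in g_acc:  # each product node is visited once
                        g_acc.add(node)
                        queue.append(node)
    # return {x | some terminal s has (s, x) in Gacc}
    return {x for (s, x) in g_acc if s in terminal}

# Assumed automaton for (a.a)*.a.b and a hypothetical cyclic database.
a_edges = [("s1", "a", "s2"), ("s2", "a", "s1"), ("s1", "a", "s3"), ("s3", "b", "s4")]
db_edges = [("n1", "a", "n2"), ("n2", "a", "n1"), ("n1", "b", "n3"), ("n2", "b", "n4")]

print(answer_regular(a_edges, "s1", {"s4"}, db_edges, "n1"))  # prints {'n4'}
```

The database above contains the cycle n1 → n2 → n1, yet the search stops because the set `g_acc` never admits the same product node twice.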
Regular Path Expressions

Example: R = _.(_._)*.a

A = [Figure: automaton with states s1, s2, s3 and transitions labeled _ and a]

DB = [Figure: graph with nodes &o1, &o2, &o3, &o4 and edges labeled a and b]

Answer of R on DB = { &o2, &o3}

Compute Product Automaton G

[Figure: product automaton G with one node (s,x) for each state s in {s1, s2, s3} and each object x in {&o1, &o2, &o3, &o4}; edges labeled _, a and b]

Compute Accessible Part Gacc

[Figure: the product automaton G, showing the nodes accessible from root(G)]

Answer(R,DB) = {&o2, &o3}

Complexity of Regular Path Expressions
  • The evaluation algorithm runs in PTIME in size(R), size(DB)
  • Even when there are cycles in DB
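
One way to see the PTIME bound is to count the size of the product automaton. The sketch below assumes that the automaton A built from R has size polynomial in size(R), which holds for the standard constructions, and that Gacc is computed by an ordinary graph search that visits each product node at most once.

```latex
\begin{align*}
|\mathrm{nodes}(G)| &= |\mathrm{states}(A)| \cdot |\mathrm{nodes}(DB)| \\
|\mathrm{edges}(G)| &\le |\mathrm{edges}(A)| \cdot |\mathrm{edges}(DB)| \\
\text{cost of computing } G_{acc} &= O\big(|\mathrm{nodes}(G)| + |\mathrm{edges}(G)|\big)
\end{align*}
```

Cycles in DB only create cycles in G; since a visited node is never expanded again, they do not add to the cost.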