
# Web Data Management

##### Presentation Transcript

1. Web Data Management Path Expressions

2. In this lecture • Path expressions • Regular path expressions • Evaluation techniques Resources: Data on the Web (Abiteboul, Buneman, Suciu), section 4.1

3. Path Expressions Examples: • Bib.paper • Bib.book.publisher • Bib.paper.author.lastname Given an OEM instance, the answer of a path expression p is a set of objects

4. Path Expressions [Figure: OEM graph DB, entered through the edge Bib into the root object &o1; edge labels include paper, book, references, author, title, year, page, http, publisher, firstname, lastname] Examples: • Bib.paper = {&o12, &o29} • Bib.book.publisher = {&o51} • Bib.paper.author.lastname = {&o71, &206}

5. Answer of a Path Expression Answer(P, DB) = f(P, root(DB)) Where: • f(ε, x) = {x} • f(L.P, x) = ∪ {f(P,y) | (x,L,y) ∈ edges(DB)} A simple evaluation algorithm for Answer(P,DB) follows this definition directly: • Runs in PTIME in size(P), size(DB)
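The recursive definition of f can be sketched as a set-at-a-time loop that follows one label per step. This is a minimal illustration, not the lecture's own code: the `Graph` type, the artificial `root` node carrying the `Bib` edge, and the toy instance (a small fragment loosely modelled on the Bib example) are all assumptions.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: node -> list of (label, target) edges.
Graph = Dict[str, List[Tuple[str, str]]]


def answer(path: List[str], db: Graph, root: str) -> Set[str]:
    """Evaluate a simple path expression given as a list of labels."""
    current = {root}  # f(ε, x) = {x}
    for label in path:
        # f(L.P, x): follow every L-labelled edge, then continue with P.
        current = {y for x in current for (l, y) in db.get(x, []) if l == label}
    return current


# Toy fragment; the "root" node is an assumption used to model the Bib entry edge.
db: Graph = {
    "root": [("Bib", "&o1")],
    "&o1": [("paper", "&o12"), ("book", "&o24"), ("paper", "&o29")],
    "&o24": [("publisher", "&o51")],
}

print(answer(["Bib", "paper"], db, "root"))
print(answer(["Bib", "book", "publisher"], db, "root"))
```

Each step scans the outgoing edges of the current node set at most once, so the loop stays polynomial in size(P) and size(DB).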

6. Regular Path Expressions R ::= label | _ | R.R | (R|R) | R* | R+ | R? Examples: • Bib.(paper|book).author • Bib.book.author.lastname? • Bib.book.(references)*.author • Bib.(_)*.zip

7. Applications of Regular Path Expressions • Navigating uncertain structure: • Bib.book.author.lastname? • Syntactic substitution for inheritance: • Bib.(paper|book).author • Better: Bib.publication.author, but we don’t have inheritance

8. Applications of Regular Path Expressions • Computing transitive closure: • Bib.(_)*.zip = everything accessible • Bib.book.(references)*.author = everything accessible via references • Some regular expressions of doubtful practical use: • (references.references)* = a path with an even number of references • (_._)* = paths of even length • (_._._.(_)?)* = paths of length (3m + 4n) for some m,n • But they make great examples for illustration
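To see which path lengths (_._._.(_)?)* actually admits, the arithmetic can be checked directly: each iteration of the star contributes 3 or 4 edges. The helper name `representable` is hypothetical, and `k` stands in for the slide's n to avoid clashing with the length being tested.

```python
def representable(n: int) -> bool:
    """True if n = 3*m + 4*k for some non-negative integers m and k."""
    return any((n - 3 * m) % 4 == 0 for m in range(n // 3 + 1))


# Lengths below 20 that the expression cannot match.
missing = [n for n in range(20) if not representable(n)]
print(missing)  # → [1, 2, 5]
```

So the expression matches paths of every length except 1, 2 and 5.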

9. Answer of a Regular Path Expression Recall: • Lang(R) = the set of words P generated by R Answer of regular path expressions: • Answer(R,DB) = ∪ {Answer(P,DB) | P ∈ Lang(R)} Need an evaluation algorithm that copes with cycles

10. Regular Path Expressions Recall: each regular expression can be translated into an NDFA Example: R = (a.a)*.a.b • states(A) = {s1,s2,s3,s4} • initial(A) = s1 • terminal(A) = {s4} [Figure: automaton A over states s1, s2, s3, s4 with edges labeled a and b]
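A small simulation shows how such an automaton recognises Lang(R) for R = (a.a)*.a.b. The transition relation below is an assumption: the slide's drawing did not survive in the transcript, so this is one automaton with the stated states, initial state and terminal state that accepts the language, not necessarily the one pictured.

```python
from typing import Dict, List, Set, Tuple

# Assumed transitions: s1 -a-> s2, s2 -a-> s1 (the (a.a)* loop),
# s1 -a-> s3, s3 -b-> s4 (the trailing a.b).
EDGES: Dict[Tuple[str, str], Set[str]] = {
    ("s1", "a"): {"s2", "s3"},
    ("s2", "a"): {"s1"},
    ("s3", "b"): {"s4"},
}
INITIAL = "s1"
TERMINAL = {"s4"}


def accepts(word: List[str]) -> bool:
    """Track the set of states reachable after each label of the word."""
    states = {INITIAL}
    for label in word:
        states = {t for s in states for t in EDGES.get((s, label), set())}
    return bool(states & TERMINAL)


print(accepts(["a", "b"]))            # a.b is in Lang(R)
print(accepts(["a", "a", "a", "b"]))  # a.a.a.b is in Lang(R)
print(accepts(["a", "a", "b"]))       # a.a.b is not
```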

11. Regular Path Expressions Canonical Evaluation Algorithm • Answer(R,DB): • construct A from R • construct product automaton G = A x DB: • nodes(G) = states(A) x nodes(DB) • edges(G) = {((s,x),L,(s’,x’)) | (s,L,s’) ∈ edges(A), (x,L,x’) ∈ edges(DB)} • root(G) = (initial(A), root(DB)) • compute Gacc = set of nodes accessible from root(G) • return {x | ∃ s ∈ terminal(A) s.t. (s,x) ∈ Gacc}
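The steps above can be sketched as a breadth-first search that builds only the accessible part of the product automaton. This is an illustrative sketch under assumptions: the `Edges` representation, the function name `answer_regular`, the treatment of `_` as a wildcard, the automaton for (a.a)*.a.b, and the small cyclic database are all made up for the example.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Edges = Dict[str, List[Tuple[str, str]]]  # node -> [(label, target)]


def answer_regular(nfa: Edges, initial: str, terminal: Set[str],
                   db: Edges, root: str) -> Set[str]:
    """Return the DB nodes paired with a terminal state in Gacc."""
    start = (initial, root)               # root(G)
    seen = {start}                        # Gacc, grown incrementally
    queue = deque([start])
    while queue:
        s, x = queue.popleft()
        for a_label, s2 in nfa.get(s, []):
            for d_label, x2 in db.get(x, []):
                # Assumption: "_" in the automaton matches any DB label;
                # otherwise the labels must be equal, as in edges(G).
                if a_label == "_" or a_label == d_label:
                    node = (s2, x2)
                    if node not in seen:  # each product node is visited once
                        seen.add(node)
                        queue.append(node)
    return {x for (s, x) in seen if s in terminal}


# Assumed automaton for R = (a.a)*.a.b and a hypothetical cyclic database.
nfa: Edges = {"s1": [("a", "s2"), ("a", "s3")], "s2": [("a", "s1")], "s3": [("b", "s4")]}
db: Edges = {"n1": [("a", "n2"), ("b", "n4")], "n2": [("a", "n1"), ("b", "n3")]}

print(answer_regular(nfa, "s1", {"s4"}, db, "n1"))
```

Because `seen` can hold at most |states(A)| x |nodes(DB)| pairs, the search terminates even though n1 and n2 form a cycle.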

12. Regular Path Expressions Example: R = _.(_._)*.a [Figure: automaton A with states s1, s2, s3 and edges labeled _ and a; database DB with objects &o1, &o2, &o3, &o4 and edges labeled a and b] Answer of R on DB = {&o2, &o3}

13. Compute Product Automaton G [Figure: product automaton G with nodes (s,x) for s in {s1, s2, s3} and x in {&o1, &o2, &o3, &o4}; edges labeled _, a and b]

14. Compute Accessible Part Gacc [Figure: product automaton G over nodes (s,x), showing the part accessible from root(G)] Answer(R,DB) = {&o2, &o3}

15. Complexity of Regular Path Expressions • The evaluation algorithm runs in PTIME in size(R), size(DB) • Even when there are cycles in DB