1 / 44

Containment of Partially Specified Tree-Pattern Queries

Containment of Partially Specified Tree-Pattern Queries. Dimitri Theodoratos (NJIT, USA) Theodore Dalamagas (NTUA, GREECE) Pawel Placek (NJIT, USA) Stefanos Souldatos (NTUA, GREECE) Timos Sellis (NTUA, GREECE). Introduction. r. GREECE. USA. ATHENS. YAMAHA. BMW. HONDA. YAMAHA. BMW.

upton
Download Presentation

Containment of Partially Specified Tree-Pattern Queries

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Containment of Partially Specified Tree-Pattern Queries Dimitri Theodoratos (NJIT, USA) Theodore Dalamagas (NTUA, GREECE) Pawel Placek (NJIT, USA) Stefanos Souldatos (NTUA, GREECE) Timos Sellis (NTUA, GREECE)

  2. Introduction

  3. r GREECE USA ATHENS YAMAHA BMW HONDA YAMAHA BMW ON-OFF TRAVEL TRAVEL ON-OFF TRAVEL 200cc F650GS 650cc VARADERO 200cc 650cc SERROW F650GS F650 NJ 125cc 1000cc SERROW Motivating example • Consider a database including shops that sell spare parts for motorbikes. • The authors of the paper need spare parts for their motorbikes. • BUT… Stefanos Souldatos - SSDBM 2006

  4. r GREECE USA ATHENS YAMAHA BMW HONDA YAMAHA BMW ON-OFF TRAVEL TRAVEL ON-OFF TRAVEL 200cc F650GS 650cc VARADERO 200cc 650cc SERROW F650GS F650 NJ 125cc 1000cc SERROW 1: Structural Differences • Dimitri Theodoratos lives in NY. He owns a Yamaha Serrow motorbike in Greece. So, he searches for spare parts in Greece or USA. • A structural difference emerges… ? Stefanos Souldatos - SSDBM 2006

  5. r GREECE USA ATHENS YAMAHA BMW HONDA YAMAHA BMW ON-OFF TRAVEL TRAVEL ON-OFF TRAVEL 200cc F650GS 650cc VARADERO 200cc 650cc SERROW F650GS F650 NJ 125cc 1000cc SERROW 2: Structural Inconsistencies • Theodore Dalamagas owns a BMW motorbike and looks for spare parts worldwide. • A structural inconsistency emerges… 650cc/F650GS vs F650GS/650cc Stefanos Souldatos - SSDBM 2006

  6. r GREECE USA ATHENS YAMAHA BMW HONDA YAMAHA BMW ON-OFF TRAVEL TRAVEL ON-OFF TRAVEL 200cc F650GS 650cc VARADERO 200cc 650cc SERROW F650GS F650 NJ 125cc 1000cc SERROW 3: Unknown Structure • Stefanos Souldatos owns a Honda Varadero. • But, he is not fully aware of the hierarchy... Stefanos Souldatos - SSDBM 2006

  7. r r r GREECE GREECE GREECE USA USA USA ATHENS ATHENS ATHENS YAMAHA YAMAHA YAMAHA BMW BMW BMW HONDA HONDA HONDA YAMAHA YAMAHA YAMAHA BMW BMW BMW ON-OFF ON-OFF ON-OFF TRAVEL TRAVEL TRAVEL TRAVEL TRAVEL TRAVEL ON-OFF ON-OFF ON-OFF TRAVEL TRAVEL TRAVEL 200cc 200cc 200cc F650GS F650GS F650GS 650cc 650cc 650cc VARADERO VARADERO VARADERO 200cc 200cc 200cc 650cc 650cc 650cc SERROW SERROW SERROW F650GS F650GS F650GS F650 F650 F650 NJ NJ NJ 125cc 125cc 125cc 1000cc 1000cc 1000cc SERROW SERROW SERROW 4: Source Integration • Pawel Placek wants to buy a motorbike that he can easily find spare parts for. • He searches in many hierarchies with different structure… Stefanos Souldatos - SSDBM 2006

  8. Motivation • Querying tree-structured data requires exact knowledge of the structure. • BUT, structure is not always strictly defined! • MOREOVER, the user needs to set queries without always dealing with structure: • Find shops for Honda spare parts in Athens, Greece • …where Athens is a descendant of Greece • …but I don’t know if Athens is an ancestor or a descendant of Honda. Stefanos Souldatos - SSDBM 2006

  9. Previous Approaches • Keyword-based search approach • Total absence of structure. • Approximation techniques • Relax the initial query and retrieve more answers. • Naive approach • All possible query patterns are generated and evaluated against the tree structure. • Traditional integration approach • Global structure and mapping rules. Stefanos Souldatos - SSDBM 2006

  10. Our Approach • We have designed a QueryLanguage that allows for partial specification of the structure (Partially Specified Tree-Pattern Queries). • Query formulation and evaluation is based on DimensionGraphs that summarize the structure of the trees. • Dimension graphs group sets of semantically related nodes called Dimensions. • We study the problem of Query Containment for Partially Specified Tree-Pattern Queries. Stefanos Souldatos - SSDBM 2006

  11. Data Model

  12. R C L E B M T r GREECE USA ATHENS YAMAHA BMW HONDA YAMAHA BMW ON-OFF TRAVEL TRAVEL ON-OFF TRAVEL 200cc F650GS 650cc VARADERO 200cc 650cc SERROW F650GS F650 NJ 125cc 1000cc SERROW Dimension Graphs • Dimension graphs summarize the structure of the tree. DIMENSIONS R (oot) C (ountry) = {GREECE, USA} L (ocation) B (rand) T (ype) M (odel) E (ngine) Stefanos Souldatos - SSDBM 2006

  13. R C L E B M T Dimension Graphs… • Offer a summary of the structure of the tree. • Provide the necessary semantics for query formulation. • Set the framework for querying sources with structural differences and inconsistencies. • Support query evaluation and optimization. DIMENSIONS R (oot) C (ountry) L (ocation) B (rand) T (ype) M (odel) E (ngine) Stefanos Souldatos - SSDBM 2006

  14. R C = {Greece} C L B = {BMW} B = {BMW} E B M = ? E = ? M T Partially Specified Tree-pattern Queries • Query: Find shops with spare parts for all models and engines of BMW motorbikes in Greece. DIMENSIONS R (oot) C (ountry) L (ocation) B (rand) T (ype) M (odel) E (ngine) Stefanos Souldatos - SSDBM 2006

  15. R C = {Greece} C L B = {BMW} B = {BMW} E B M = ? E = ? M T PSP p1 PSP *p2 Partially Specified Tree-pattern Queries • Query: Find shops with spare parts for all models and engines of BMW motorbikes in Greece. DIMENSIONS R (oot) C (ountry) partially specified paths (PSP) L (ocation) B (rand) T (ype) M (odel) E (ngine) Stefanos Souldatos - SSDBM 2006

  16. R C = {Greece} C L B = {BMW} B = {BMW} E B M = ? E = ? M T PSP p1 PSP *p2 Partially-Specified Tree-pattern Queries • Query: Find shops with spare parts for all models and engines of BMW motorbikes in Greece. DIMENSIONS R (oot) C (ountry) output path (*) partially specified paths (PSP) L (ocation) B (rand) T (ype) M (odel) E (ngine) Stefanos Souldatos - SSDBM 2006

  17. R C = {Greece} C L B = {BMW} B = {BMW} E B M = ? E = ? M T PSP p1 PSP *p2 Partially-Specified Tree-pattern Queries • Query: Find shops with spare parts for all models and engines of BMW motorbikes in Greece. parent child ancestor descendant DIMENSIONS R (oot) C (ountry) output path (*) partially specified paths (PSP) L (ocation) B (rand) T (ype) M (odel) E (ngine) Stefanos Souldatos - SSDBM 2006

  18. R C = {Greece} C L B = {BMW} B = {BMW} E B M = ? E = ? M T PSP p1 PSP *p2 Partially-Specified Tree-pattern Queries node sharing expression (NSE) • Query: Find shops with spare parts for all models and engines of BMW motorbikes in Greece. parent child ancestor descendant DIMENSIONS R (oot) C (ountry) output path (*) partially specified paths (PSP) L (ocation) B (rand) T (ype) M (odel) E (ngine) Stefanos Souldatos - SSDBM 2006

  19. Dimension Trees and Query Evaluation

  20. R C = {Greece} C = {Greece} C L B = {BMW} B = {BMW} E B M = ? E = ? M T PSP p1 PSP *p2 Dimension Trees • We apply Inference Rules to the query to produce its Full Form. • We search for paths in the graph that match the paths of the query. • Paths in the graph correspond to Dimension Trees. • Dimension trees are translated into XPath queries. Stefanos Souldatos - SSDBM 2006

  21. R C = {Greece} C = {Greece} C L B = {BMW} B = {BMW} E B M = ? E = ? M T PSP p1 PSP *p2 Dimension Trees • We apply Inference Rules to the query to produce its Full Form. • We search for paths in the graph that match the paths of the query. • Paths in the graph correspond to Dimension Trees. • Dimension trees are translated into XPath queries. Stefanos Souldatos - SSDBM 2006

  22. R C = {Greece} C = {Greece} R C L B = {BMW} B = {BMW} C = {Greece} E B B = {BMW} M = ? E = ? M T T PSP p1 PSP *p2 E M Dimension Trees • We apply Inference Rules to the query to produce its Full Form. • We search for paths in the graph that match the paths of the query. • Paths in the graph correspond to Dimension Trees. • Dimension trees are translated into XPath queries. Stefanos Souldatos - SSDBM 2006

  23. R C = {Greece} C = {Greece} R C L B = {BMW} B = {BMW} C = {Greece} E B B = {BMW} M = ? E = ? M T T PSP p1 PSP *p2 E M Dimension Trees • We apply Inference Rules to the query to produce its Full Form. • We search for paths in the graph that match the paths of the query. • Paths in the graph correspond to Dimension Trees. • Dimension trees are translated into XPath queries. r/Greece/BMW/*T[*E]/*M Stefanos Souldatos - SSDBM 2006

  24. R C = {Greece} C = {Greece} R C L B = {BMW} B = {BMW} C = {Greece} E B B = {BMW} M = ? E = ? M T T PSP p1 PSP *p2 E M Dimension Trees So, Dimension Trees are equivalent to the Query in a specific Dimension Graph: “Dimension Trees = Query + Graph” r/Greece/BMW/*T[*E]/*M Stefanos Souldatos - SSDBM 2006

  25. Query Containment

  26. Query Containment • Absolute Containment • Query Q1 is absolutely contained in query Q2 if the answers of Q1 in every dimension graph are also answers of Q2. • Relative Containment (with respect to graph) • Query Q1 is contained in query Q2 with respect to graph G if the answers of Q1 in G are also answers of Q2 in G. Stefanos Souldatos - SSDBM 2006

  27. Query Containment • Absolute Containment • Query Q1 is absolutely contained in query Q2 if the answers of Q1 in every dimension graph are also answers of Q2. • Relative Containment (with respect to graph) • Query Q1 is contained in query Q2 with respect to graph G if the answers of Q1 in G are also answers of Q2 in G. Stefanos Souldatos - SSDBM 2006

  28. C = ? C = ? C = ? C = ? B = ? B = ? E = ? B = ? M = ? E = ? PSP *p3 PSP p4 PSP *p1 PSP p2 Checking Absolute Containment • Query Q1 is absolutely contained in query Q2 if there is a mapping from Q2 to Q1.  Q1 Q2 Stefanos Souldatos - SSDBM 2006

  29. Checking Relative Containment • Query Q1 is contained in query Q2 with respect to graph G if every dimension tree of Q1 is contained in a dimension tree of Q2.  A dimension tree of Q1 A dimension tree of Q2 R R C C B B T T E M E Stefanos Souldatos - SSDBM 2006

  30. Absolute & Relative Containment • Checking Absolute Containment: • 1msec • Checking Relative Containment: • 100msec – 1sec • We suggest a technique that improves the performance of Relative Containment check. (Relative Containment 2 – RC2) • It is a sound but not complete technique • However, it maintains high accuracy for containment check. Stefanos Souldatos - SSDBM 2006

  31. R C L E B M T Relative Containment 2 (RC2) • For each possible parent-child or ancestor-descendant relation in the query, we extract relations that hold in all graph paths that include it: R=>C : RC LB : RC, CL • We add these relations in Q1 and check absolutely whether Q1’ Q2. Stefanos Souldatos - SSDBM 2006

  32. Example • Q1 is NOT absolutely contained in Q2. • But, Q1 is relatively contained in Q2. Q1 Q2 C = ? L = ? L = ? B = ? PSP *p1 PSP *p2 Stefanos Souldatos - SSDBM 2006

  33. Example • Q1 is NOT absolutely contained in Q2. • But, Q1 is relatively contained in Q2. Relative Containment 2 LB : RC, CL G Q1 Q2 R = ? C = ? C = ? L = ? L = ? B = ? PSP *p1 PSP *p2 Stefanos Souldatos - SSDBM 2006

  34. Experiments

  35. Experiments • We measured… • execution time for checking Absolute Containment, Relative Containment and Relative Containment 2 (RC2) • the accuracy of RC2 • …varying • the density of the dimension graphs (dimensions, root-to-leaf paths) • the size of the queries (PSPs, nodes per PSP) Stefanos Souldatos - SSDBM 2006

  36. Varying Graphs RC2 accuracy > 80% RC2 accuracy > 50% 30 dimensions Relative Containment Relative Containment 2 time (sec) Absolute Containment Stefanos Souldatos - SSDBM 2006 graph paths

  37. Varying Queries RC2 accuracy > 93% 2 PSPs Relative Containment time (sec) Relative Containment 2 Absolute Containment Stefanos Souldatos - SSDBM 2006 dimensions per PSP

  38. Conclusionand Future Work

  39. Conclusion • We addressed the problem of Query Containment for Partially Specified Tree-Pattern Queries (PSTPQs). • We suggested a soundtechnique for checking Relative Query Containment. • Experiments showed that the technique improves checking of relative containment • one order of magnitude • with accuracy over 80% for graphs of common sizes (based on XML benchmarks e.g. XMark, XMach etc). Stefanos Souldatos - SSDBM 2006

  40. Future Work • Suggest and experiment with a set of heuristics for checking Relative Query Containment. • Discuss the trade-off between time and accuracy for each of those heuristics. Stefanos Souldatos - SSDBM 2006

  41. Questions?

  42. Links Introduction (2-10) Data Model (11-18) Query Evaluation (19-24) Query Containment (25-32) Experiments (33-36) Conclusion – Future Work (37-39) Stefanos Souldatos - SSDBM 2006

  43. Appendix

  44. R R C = {Greece} C = {Greece} R C = {Greece} C L B = {BMW} B = {BMW} C = {Greece} B = {BMW} E B B = {BMW} M = ? E = ? T M T T R R PSP p1 PSP *p2 M E C = {Greece} C = {Greece} M E B = {BMW} B = {BMW} T T E M E M E M Dimension Trees r/Greece/BMW/ *T[*E]/*M r/Greece/BMW/ *T/*M [*E] r/Greece/BMW/ *T/*E/*M r/Greece/BMW/ *T[*M/*E]/*E*M Stefanos Souldatos - SSDBM 2006

More Related