BLAS: An Efficient XPath Processing System Chen Y., Davidson S., Zheng Y.

Presentation Transcript


  1. BLAS: An Efficient XPath Processing System • Chen Y., Davidson S., Zheng Y. • Νίκος Λούτας

  2. Outline • Problem being addressed in the paper • Related work • BLAS • Experimental Results • Evaluation

  3. Problem • The number of disk accesses and joins is the primary bottleneck in evaluating complex queries efficiently.

  4. Motivation • Can we improve XPath processing that uses relational technology? • D-labeling • Processes descendant axis traversal using a single join rather than a transitive closure of joins. • Observation: D-labeling processes / and // in the same way, using joins. • XPRESS: queryable compressed XML files • Reverse arithmetic encoding • A label path is encoded as a distinct interval in [0.0, 1.0) • Path expressions are handled through containment relationships between intervals

  5. Goals • Process / (simple path expressions) more efficiently • Reduce the number of disk accesses and joins • Optimize the join operations

  6. Outline • Problem being addressed in the paper • Related work • BLAS • Experimental Results • Evaluation

  7. Related work • XML storage and query processing • Store XML data naively as a file • The whole file needs to be traversed whenever a query is processed → not efficient for large XML data sets • Store XML using a commercial RDBMS • Indexing, query processing capabilities

  8. Related work (cont’d) • XML storage and query processing • An XML document as a graph → generate a tuple for every edge • Simple, general and automatic generation of the XML query to SQL mapping • An XML query may involve many self-joins • Self-joins can be eliminated by inlining the distinct child information into the parent tuple → complex XML query to SQL mapping • Problem: In all the above approaches, we typically need to rely on auxiliary code in a general-purpose programming language, together with SQL, to express an XML query

  9. Related work (cont’d) • Indexing • Structural indexes create a structural summary, extracted from the XML document as a directed graph → queries are evaluated by pruning the search space • Path / tree queries • Indexing for branching path queries → restrict the class of queries indexed to achieve performance benefits • Materialized views

  10. Related work (cont’d) • Labeling • D-labeling • Build minimum label size D-labels • Build a B+ tree over D-labels to support tree queries • Effective for translating XQuery to SQL • XPRESS → an XML data compression technique that uses reverse arithmetic encoding to encode each label path as a distinct interval within [0.0, 1.0). It also supports query evaluation over the compressed document, using the containment relationship among the intervals.

  11. Outline • Problem being addressed in the paper • Related work • BLAS • Experimental Results • Evaluation

  12. Bi-LAbeling based System (BLAS) • Based on D-labeling and P-labeling • Processes XPath queries which can be represented as trees • Index generator → stores the D-labeling, P-labeling and data values of an XML document • Query engine → RDBMS or twig join

  13. BLAS (cont’d) • Query translator • Decomposes an XPath query into a set of suffix path queries • Encodes each suffix path query using P-labeling • Generates a corresponding SQL query for each suffix path query • Composes the SQL subqueries into a complete SQL query plan using D-labeling

  14. Architecture of BLAS (figure) • Index generator: SAX parser → XML events → P-labeling generator, D-labeling generator and data loader → storage (P-labelings, D-labelings, data values) • Query translator: XPath query → query decomposition → suffix path queries → subquery generator (based on P-labeling) → subqueries → subquery composition (based on D-labeling, using the ancestor-descendant relationship between the results of the suffix path queries) → query • Query engine: evaluates the query over the storage → query result

  15. BLAS: D-labeling • A D-label of an XML node is a triplet <d1, d2, d3>, such that for any two nodes n and m, n ≠ m: • n.d1 ≤ n.d2 (validation) • m is a descendant of n if and only if n.d1 < m.d1 and n.d2 > m.d2 (descendant) • m is a child of n if and only if m is a descendant of n and n.d3 + 1 = m.d3 (child) • n and m have no ancestor-descendant relationship if and only if n.d2 < m.d1 or n.d1 > m.d2 (nonoverlap)

  16. BLAS: D-labeling (cont’d) • Where for a node n: • d1 → the position of the start tag of n in the XML document • d2 → the position of the end tag of n in the XML document • d3 → the level of n in the XML tree
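
The D-label definition on the two slides above can be sketched in a few lines. This is only an illustration under stated assumptions: positions are counted per tag event (each start tag and each end tag takes one position), and the sample document and the helper names `d_labels`, `is_descendant` and `is_child` are hypothetical, not taken from the paper.

```python
import xml.etree.ElementTree as ET

def d_labels(root):
    """Assign a (start, end, level) triple to every element, depth first.
    Assumption for this sketch: a position is a running count of tag events."""
    labels = {}
    counter = 0

    def visit(elem, level):
        nonlocal counter
        counter += 1              # start tag of elem
        start = counter
        for child in elem:
            visit(child, level + 1)
        counter += 1              # end tag of elem
        labels[elem] = (start, counter, level)

    visit(root, 1)
    return labels

def is_descendant(n, m):
    """m is a descendant of n iff n.d1 < m.d1 and n.d2 > m.d2."""
    return n[0] < m[0] and n[1] > m[1]

def is_child(n, m):
    """m is a child of n iff it is a descendant exactly one level deeper."""
    return is_descendant(n, m) and n[2] + 1 == m[2]

# Hypothetical sample document, loosely modelled on the books example.
doc = ET.fromstring(
    "<books><book><title/><section><title/></section></book></books>"
)
labels = d_labels(doc)
book = labels[doc.find("book")]
outer_title = labels[doc.find("book/title")]
section = labels[doc.find("book/section")]
inner_title = labels[doc.find("book/section/title")]
```

With these labels, `is_descendant(book, inner_title)` holds while `is_child(book, inner_title)` does not, because the levels differ by two.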

  17. BLAS: D-labeling (cont’d) • Descendant axis query //t1//t2 • Retrieve all the nodes reachable by t1 and by t2 → two lists, l1 and l2 • Test for ancestor-descendant relationships between the nodes in l1 and those in l2 (D-join) • Example: //proteinDatabase//refinfo, where pDB and refinfo are the relations which store the nodes tagged proteinDatabase and refinfo • SELECT pDB.start, pDB.end, refinfo.start, refinfo.end • FROM pDB, refinfo • WHERE pDB.start < refinfo.start AND pDB.end > refinfo.end
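
The D-join above can be tried against an in-memory SQLite database. This is a sketch under assumptions: the columns are named `start_pos` and `end_pos` (rather than `start` and `end`, since END is a reserved word in SQLite), and the rows are made-up labels, not data from the paper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pDB (start_pos INTEGER, end_pos INTEGER)")
conn.execute("CREATE TABLE refinfo (start_pos INTEGER, end_pos INTEGER)")

# Hypothetical D-labels: one proteinDatabase node and two refinfo nodes,
# of which only the first lies inside the proteinDatabase interval.
conn.executemany("INSERT INTO pDB VALUES (?, ?)", [(1, 100)])
conn.executemany("INSERT INTO refinfo VALUES (?, ?)", [(10, 20), (150, 160)])

# D-join: keep the pairs where the refinfo interval is strictly nested
# inside the pDB interval, i.e. refinfo is a descendant of pDB.
rows = conn.execute(
    """
    SELECT pDB.start_pos, pDB.end_pos, refinfo.start_pos, refinfo.end_pos
    FROM pDB, refinfo
    WHERE pDB.start_pos < refinfo.start_pos
      AND pDB.end_pos > refinfo.end_pos
    """
).fetchall()
```

Only the pair whose refinfo interval is nested inside the pDB interval survives the join.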

  18. D-labeling scheme (figure) • The labeling (start, end, level) can be used to detect ancestor-descendant relationships between nodes in a tree. • Node labels shown in the example tree: books (1, 20000, 1), book (6, 1200, 2), title (10, 80, 3), section (81, 250, 3), nested section (100, 200, 4) • Text values shown: “The lord of the rings …”, “Locating middle-earth”, “A hall fit for a king” (title of a figure), “King Theoden's golden hall” (description)

  19. BLAS: P-labeling • Efficiently processes consecutive child axis steps (suffix path queries) • A P-label for a suffix path P is an interval I_P = <p1, p2>, such that for any two suffix path expressions P, Q: • P.p1 ≤ P.p2 (validation) • P ⊆ Q if and only if interval I_P is contained in I_Q, i.e. Q.p1 ≤ P.p1 and Q.p2 ≥ P.p2 (containment) • P ∩ Q = ∅ if and only if I_P and I_Q do not overlap, i.e. P.p1 > Q.p2 or P.p2 < Q.p1 (nonintersection)

  20. BLAS: P-labeling (cont’d) • For an XML node n whose source path SP(n) has the P-label <p1, p2>, the P-label of the node, denoted n.plabel, is the integer p1 • Evaluate a suffix path query Q by finding all nodes n such that Q.p1 ≤ SP(n).p1 ≤ Q.p2, i.e. the set of XML nodes whose P-labels are contained in the P-label of Q • [[Q]] = {n | Q.p1 ≤ n.plabel ≤ Q.p2}
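
The selection [[Q]] = {n | Q.p1 ≤ n.plabel ≤ Q.p2} amounts to a range scan over node P-labels. The sketch below assumes the nodes are kept sorted by P-label (as an index would keep them) and uses illustrative label values in the style of the books example; neither the values nor the function name `evaluate` come from the paper.

```python
from bisect import bisect_left, bisect_right

# (plabel, tag) pairs, sorted by plabel. The numbers are illustrative.
nodes = sorted([
    (32100, "title"),     # e.g. a node reached by /books/book/title
    (42100, "section"),   # e.g. a node reached by /books/book/section
    (44210, "section"),   # e.g. a section nested inside another section
])
plabels = [p for p, _ in nodes]

def evaluate(query_interval):
    """Return the nodes whose plabel lies in the inclusive interval (p1, p2)."""
    p1, p2 = query_interval
    return nodes[bisect_left(plabels, p1):bisect_right(plabels, p2)]

all_sections = evaluate((40000, 49999))    # assumed interval of //section
book_sections = evaluate((42000, 42999))   # assumed interval of //book/section
```

A narrower query interval selects fewer nodes, and no join is involved: the whole suffix path is answered by one range lookup.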

  21. BLAS: Intuition for P-labels • Assign each node a number, and each suffix path an interval, such that: • For any two suffix paths Q1 and Q2, Q1 is contained in Q2 iff Q1’s interval is contained in Q2’s • A node is contained in a suffix path iff its number is contained in the path’s interval. • Replaces a sequence of joins by a selection.

  22. BLAS: P-labeling Construction • For paths • For XML trees • Assign / a ratio r0 and each tag ti a ratio ri = 1 / (n + 1), for n distinct tags • Define the domain [0, m - 1], with m ≥ (n + 1)^h for paths of length up to h • Construct P-labels for suffix paths • Assign // the interval <0, m - 1> • Partition the interval in tag order, in proportion to each tag ti’s ratio ri • Allocate <0, p1 - 1> to suffix paths starting with /, and <pi, pi+1 - 1> to suffix paths starting with //ti • Recursively partition each subinterval of a path //ti by tags according to their ratios.
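
The construction steps above can be sketched as a short function. The setup is an assumption chosen to reproduce the numbers of the books example in these slides: nine tags in the order books, book, title, section, ... (so every ratio is 0.1) and m = 10^5. The tag names `t7` to `t9` are placeholders, and the integer arithmetic assumes m is divisible by (n + 1)^h. Intervals are returned with an inclusive upper bound, so /books/book/section comes out as <42100, 42109>.

```python
M = 10**5
# Assumed tag order; t7..t9 are hypothetical fillers to reach nine tags.
TAGS = ["books", "book", "title", "section", "figure",
        "description", "t7", "t8", "t9"]

def p_label(path, tags=TAGS, m=M):
    """Return the inclusive interval (p1, p2) of a suffix path such as
    '//book/section' or '/books/book/section'."""
    parts = len(tags) + 1             # slot 0 is reserved for '/'
    absolute = not path.startswith("//")
    steps = path.strip("/").split("/")
    low, width = 0, m                 # '//' alone covers <0, m - 1>
    for tag in reversed(steps):       # refine from the last tag backwards
        width //= parts
        low += (tags.index(tag) + 1) * width
    if absolute:                      # leading '/' takes the first slot
        width //= parts
    return low, low + width - 1

def contains(q, p):
    """True if interval p is contained in interval q."""
    return q[0] <= p[0] and q[1] >= p[1]

section = p_label("//section")
book_section = p_label("//book/section")
full = p_label("/books/book/section")
```

Here `contains(section, book_section)` and `contains(book_section, full)` both hold, mirroring //section ⊇ //book/section ⊇ /books/book/section.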

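  23. BLAS: Constructing P-label for paths (figure) • Top level: the domain [0, 10^5) is split into intervals of width 10^4, in the order /, //books, //book, //title, //section, ... • Second level: the //book interval [2*10^4, 3*10^4) is split into intervals of width 10^3, in the order /book, //books/book, //book/book, ... • Third level: the //books/book interval [2.1*10^4, 2.2*10^4) is split again, starting with /books/book at [2.1*10^4, 2.11*10^4)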

  24. BLAS: P-labeling Construction (cont’d) • m = 10^12 and 99 tags • Each tag is assigned a ratio r = 0.01 • Construct a P-label for the suffix path P = /ProteinDatabase/ProteinEntry/protein/name

  25. Sample XML Protein Repository

  26. BLAS: Constructing P-label for XML nodes (cont’d) • P-label of an XML node: m, where the P-label for the path from the root is [m, n] • E.g. /books/book/section: [42100, 42110], so the section node is labeled 42100 • Evaluating a suffix path query Q → finding all nodes whose P-label is contained in the P-label of Q • (Figure: the books example tree, with the section node annotated 42100)

  27. BLAS: Query Language • XPath queries containing /, //, *, and predicates (branches) → tree queries • The evaluation of a path expression P returns the set of nodes [[P]] in an XML tree T which are reachable by P starting from the root of T • The source path SP(n) of a node n in an XML tree T is the unique simple path P from the root to n. • A path expression P is contained in a path expression Q, P ⊆ Q, if and only if for any XML tree T, [[P]] ⊆ [[Q]] • Path expressions P and Q are non-overlapping, P ∩ Q = ∅, if and only if for any XML tree T, [[P]] ∩ [[Q]] = ∅

  28. BLAS: Query Translator • Split • Steps: • Descendant axis elimination • Branch elimination • DFS traversal • p//q → p and //q • D-elimination – D-join

  29. BLAS: Query Translator: (I) Decomposition • Q: //book[//title]/section/figure • (Figure: query tree with nodes book, section, title, figure)

  30. BLAS: Query Translator: (I) Decomposition (cont’d) • Q: //book[//title]/section/figure • (Figure: decomposition step with nodes title, book, book, section, figure)

  31. BLAS: Query Translator: (I) Decomposition (cont’d) • Q: //book[//title]/section/figure • (Figure: decomposition step with nodes title, book, section, figure)

  32. BLAS: Query Translator: (I) Decomposition (cont’d) • Q: //book[//title]/section/figure • (Figure: decomposition step with nodes title, book, book, section, figure)

  33. BLAS: Query Translator: (II) Selection on P-labels • Q: //book[//title]/section/figure • (Figure: subqueries over nodes title, book, book, section, figure)

  34. BLAS: Query Translator: (III) Join on D-labels • Q: //book[//title]/section/figure • (Figure: subquery results over nodes title, book, book, section, figure, combined by D-joins)

  35. BLAS: Query Translator - Push-up • Used when schema information is absent • Descendant axis elimination • Push-up branch elimination • p[q1…qn]/r → p, p/q1, …, p/qn, p/r
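
The two rewriting rules named on this slide (p//q → p and //q, and p[q1…qn]/r → p, p/q1, …, p/qn, p/r) can be illustrated on plain strings. This is a toy sketch, not the translator from the paper: it assumes the branches have already been parsed out of the XPath expression, and the function names are hypothetical.

```python
def eliminate_descendant_axes(path):
    """Split a predicate-free path at every interior '//' so that each
    piece is a suffix path: p//q becomes p and //q."""
    head, *rest = path.split("//")
    return ([head] if head else []) + ["//" + piece for piece in rest]

def push_up(p, branches, r):
    """Push-up branch elimination: p[q1...qn]/r becomes
    p, p/q1, ..., p/qn, p/r, so every subquery keeps the prefix p."""
    return [p] + [f"{p}/{q}" for q in branches] + [f"{p}/{r}"]

pieces = eliminate_descendant_axes("/a/b//c/d//e")
subqueries = push_up("//book", ["title"], "section/figure")
```

The resulting suffix paths would each be answered by a selection on P-labels, with D-joins reconnecting the pieces.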

  36. BLAS: Query Translator - Unfold • Used when schema information is present • Both non-recursive and recursive schemas • Replaces D-joins with a process that first performs selections on P-labels and then unions the results → very efficient • Selections using an index are cheap • The union is very simple since there are no duplicates • Subqueries are all simple path queries, which can be implemented as a select operation with equality predicates • Reduces the number of disk accesses
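
The idea of unfolding a descendant step into simple paths can be sketched for the easy case of a non-recursive schema. The schema dictionary below is a hypothetical, simplified version of the books example (no section nested in a section), and `unfold` is an illustrative name; recursive schemas, which the slide says are also supported, are not handled here.

```python
def unfold(schema, root, target):
    """List every simple root-to-target path allowed by a non-recursive
    schema, given as a mapping from a tag to its possible child tags."""
    paths = []

    def walk(tag, prefix):
        path = f"{prefix}/{tag}"
        if tag == target:
            paths.append(path)
        for child in schema.get(tag, []):
            walk(child, path)

    walk(root, "")
    return paths

# Hypothetical non-recursive schema.
schema = {
    "books": ["book"],
    "book": ["title", "section"],
    "section": ["title", "figure"],
}

title_paths = unfold(schema, "books", "title")
```

Each unfolded path is a simple path query, so //title would be answered by one equality selection on P-labels per path, followed by a union of the results.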

  37. BLAS: Query Translator – Unfold (cont’d)

  38. BLAS: Comparison with D-labeling • (Figure: query plans for BLAS and for D-labeling over the book, section, title and figure nodes) • BLAS: fewer joins, fewer disk accesses

  39. Outline • Problem being addressed in the paper • Related work • BLAS • Experimental Results • Evaluation

  40. Experiment Setup • Data sets • Query sets • Suffix path queries • Path queries • XPath queries • Benchmark queries • Query Engine: TwigStack Join

  41. Query Execution Time (chart) • Query names: A = Auction, P = Protein, S = Shakespeare; 1 = suffix path query, 2 = path query, 3 = XPath query

  42. Number of data elements visited (chart) • Query names: A = Auction, P = Protein, S = Shakespeare; 1 = suffix path query, 2 = path query, 3 = XPath query

  43. Benchmark Query Execution Time

  44. Scalability BLAS

  45. Outline • Problem being addressed in the paper • Related work • BLAS • Experimental Results • Evaluation

  46. Contributions • P-labeling scheme is proposed to evaluate suffix path queries efficiently. • BLAS combines P-labeling and D-labeling to evaluate XPath queries. • BLAS is more efficient than state-of-the-art work because the queries translated from XPath queries require: • fewer disk accesses • fewer joins • Experiments show the effectiveness of BLAS

  47. Evaluation • Successful effort • Trade off between additional cost and execution time • BLAS vs RDBMS ?
