Selectivity Estimation of XPath for Cyclic Graphs


Presentation Transcript


  1. Selectivity Estimation of XPath for Cyclic Graphs Yun Peng

  2. Outline • Motivation • Problem definition • Prime number labeling • Selectivity estimation • Implementation

  3. Motivation • Retrieving subgraphs from large graph databases efficiently requires query optimization, and selectivity estimation is one of its most important techniques

  4. An Example • Query q=//faculty[//RA][//TA] lists all faculty nodes that have both an RA and a TA • There are two plans for evaluating this query • Plan 1 • Find the faculty nodes that have an RA. Result set size is 3. • Among these intermediate results, find those that also have a TA • Plan 2 • Find the faculty nodes that have a TA. Result set size is 2. • Among these intermediate results, find those that also have an RA • Plan 2 starts from the smaller intermediate result, so an optimizer that knows both sizes in advance can prefer it
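
The plan choice in this example can be sketched in a few lines of Python. This is only an illustration under the assumption that the optimizer evaluates the predicate with the smallest estimated result first; the function name `order_predicates` and the dictionary layout are hypothetical and do not come from the slides.

```python
def order_predicates(estimated_sizes):
    """Return predicate names sorted by ascending estimated result size."""
    return sorted(estimated_sizes, key=estimated_sizes.get)


# Estimated intermediate result sizes taken from the example slide.
estimates = {"//RA": 3, "//TA": 2}

# The predicate with the smallest estimate is evaluated first.
plan = order_predicates(estimates)
print(plan)
```

With the sizes from the slide, the TA predicate is ordered before the RA predicate, which corresponds to the second plan.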

  5. Problem Definition • Selectivity estimation: given a query, estimate how many results it produces without performing a costly evaluation • Example: q=//faculty[//RA], Selectivity(q) = 3

  6. Our methodology skeleton • Step 1: label the graph nodes (precomputed) • Step 2: estimate query selectivity from the precomputed labels (when a query arrives)

  7. Prime number labeling • Label each graph node with an integer that is a product of prime numbers

  8. Prime number labeling (cont.) • Divisibility of labels implies the ancestor-descendant relationship • For example, 3*5*7*11 is divisible by 11, so node g is a descendant of node a
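
A minimal Python sketch of one labeling that is consistent with the divisibility example above. The exact scheme on the slides is not recoverable from the transcript, so the following is an assumption: each node gets its own distinct prime, and its label is the product of the primes of every node reachable from it (itself included). All function names are hypothetical.

```python
from math import prod


def first_primes(n):
    """Return the first n primes by trial division (fine for small graphs)."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def reachable(graph, start):
    """Nodes reachable from start, including start; safe on cyclic graphs."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for child in graph.get(node, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def prime_labels(graph):
    """Assumed scheme: label = product of the primes of all reachable nodes."""
    nodes = sorted(set(graph) | {c for cs in graph.values() for c in cs})
    own = dict(zip(nodes, first_primes(len(nodes))))
    labels = {n: prod(own[m] for m in reachable(graph, n)) for n in nodes}
    return own, labels


def reaches(own, labels, ancestor, node):
    """True if node is reachable from (or equal to) ancestor, by divisibility."""
    return labels[ancestor] % own[node] == 0


# Small cyclic graph: a -> b, a -> c, b -> d, d -> b.
graph = {"a": ["b", "c"], "b": ["d"], "d": ["b"]}
own, labels = prime_labels(graph)
print(reaches(own, labels, "a", "d"), reaches(own, labels, "c", "b"))
```

Under this assumed scheme, nodes on the same cycle receive the same label, so the divisibility test reports them as descendants of each other.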

  9. Optimization • Replace integers by vectors

  10. Optimization (cont.) • implies node b is a descendant of node a
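
The condition shown on this slide did not survive in the transcript. One plausible reading, offered only as an assumption, is that each integer label is replaced by a 0/1 vector with one position per prime, and that divisibility becomes a componentwise comparison. The helper names `to_vector` and `covers` are hypothetical; the numbers reuse the example from slide 8.

```python
def to_vector(label, primes):
    """0/1 vector with a 1 wherever the corresponding prime divides label."""
    return tuple(1 if label % p == 0 else 0 for p in primes)


def covers(vec_a, vec_b):
    """True if vec_a has a 1 in every position where vec_b has a 1."""
    return all(x >= y for x, y in zip(vec_a, vec_b))


primes = [3, 5, 7, 11]
vec_a = to_vector(3 * 5 * 7 * 11, primes)  # label of node a on slide 8
vec_g = to_vector(11, primes)              # label of node g on slide 8
print(vec_a, vec_g, covers(vec_a, vec_g))
```

For labels that are products of distinct primes, this comparison is equivalent to the divisibility test and avoids storing very large integers.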

  11. Our methodology skeleton • Step 1: label the graph nodes (precomputed) • Step 2: estimate query selectivity from the precomputed labels (when a query arrives)

  12. Selectivity Estimation • Two-dimensional histogram • Originally designed for selectivity estimation on trees [Jagadish 2004] • Label each tree node with an interval (l, r) • Represent each interval as a point (l, r) in the XOY coordinate system • Partition the XOY plane into grid cells that serve as buckets • Estimate result sizes using this histogram
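
The steps above can be sketched as follows. This is a simplified illustration, not the histogram from the cited work: it assumes intervals are assigned by a depth-first traversal with a single counter, that buckets are square grid cells of side `width`, and that points are spread uniformly inside a bucket. All function names are hypothetical.

```python
from collections import Counter


def interval_labels(tree, root):
    """Assign (l, r) so that d is a descendant of a iff l_a < l_d and r_d < r_a."""
    labels, counter = {}, 0

    def visit(node):
        nonlocal counter
        left = counter
        counter += 1
        for child in tree.get(node, ()):
            visit(child)
        labels[node] = (left, counter)
        counter += 1

    visit(root)
    return labels


def build_histogram(labels, width):
    """Count the points (l, r) that fall into each grid bucket."""
    return Counter((l // width, r // width) for l, r in labels.values())


def estimate_descendants(hist, width, l, r):
    """Estimate how many points satisfy l' > l and r' < r from bucket counts."""
    total = 0.0
    for (i, j), count in hist.items():
        # Number of integer coordinates inside the bucket that satisfy each bound.
        ok_l = max(0, (i + 1) * width - max(i * width, l + 1))
        ok_r = max(0, min((j + 1) * width, r) - j * width)
        total += count * (ok_l / width) * (ok_r / width)
    return total


tree = {"a": ["b", "c"], "b": ["d"]}
labels = interval_labels(tree, "a")
exact = estimate_descendants(build_histogram(labels, 1), 1, *labels["a"])
coarse = estimate_descendants(build_histogram(labels, 4), 4, *labels["a"])
print(exact, coarse)
```

With a bucket width of 1 the estimate equals the true descendant count; wider buckets use less space and return an approximation.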

  13. Selectivity Estimation (cont.)

  14. Optimization • Replace integers by vectors • [0,3] [0,1]

  15. Consecutive Ones Property Matrix • Given a 0/1 matrix, if there is an ordering of the columns such that the 1s in every row are consecutive, the matrix is called a consecutive ones property matrix (C1P matrix) • Recognition takes linear time • Finding the largest C1P submatrix is NP-hard, and if each column contains more than 3 ones, the problem cannot be approximated in polynomial time
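
To make the definition concrete, here is a brute-force Python check that tries every column order. It is an illustration only: it runs in exponential time and is not the linear-time recognition algorithm mentioned on the slide. The function names are hypothetical.

```python
from itertools import permutations


def row_is_consecutive(row):
    """True if all 1s in the row form a single contiguous run (or there are none)."""
    ones = [i for i, bit in enumerate(row) if bit]
    return not ones or ones[-1] - ones[0] + 1 == len(ones)


def find_c1p_order(matrix):
    """Return a column order that makes every row consecutive, or None."""
    n_cols = len(matrix[0]) if matrix else 0
    for order in permutations(range(n_cols)):
        if all(row_is_consecutive([row[c] for c in order]) for row in matrix):
            return order
    return None


has_c1p = [[1, 0, 1],
           [0, 1, 1]]
no_c1p = [[1, 1, 0],
          [0, 1, 1],
          [1, 0, 1]]
print(find_c1p_order(has_c1p), find_c1p_order(no_c1p))
```

The second matrix has no valid order because each pair of its three columns would have to be adjacent, which is impossible on a line.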

  16. Add extra columns

  17. Add extra columns • Open question: given a 0/1 matrix, is it NP-hard to add the minimum number of extra columns such that the resulting matrix is a C1P matrix?

  18. Heuristic algorithm • Duplicate • Merge

  19. Heuristic algorithm (cont.)

  20. Heuristic Algorithm (cont.)

  21. Selectivity Estimation (cont.)

  22. Implementation

  23. Implementation

  24. Implementation

  25. Thanks!
