Selectivity Estimation of XPath for Cyclic Graphs

Yun Peng

Outline • Motivation • Problem definition • Prime number labeling • Selectivity estimation • Implementation

Motivation • To retrieve subgraphs from large graph databases efficiently, selectivity estimation is one of the most important query optimization techniques

An Example • Query q=//faculty[//RA][//TA] lists all faculty members that have both an RA and a TA • This query can be evaluated with two plans • Plan 1 • Find the faculty members having an RA. Result set size is 3. • From these intermediate results, find those having a TA • Plan 2 • Find the faculty members having a TA. Result set size is 2. • From these intermediate results, find those having an RA • Plan 2 carries the smaller intermediate result (2 vs. 3), so it is the cheaper choice; selectivity estimation lets an optimizer make this choice before evaluating either plan

Problem Definition • Selectivity estimation: given a query, estimate how many results it produces without costly evaluation • Example: for q=//faculty[//RA], Selectivity(q) = 3
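To make the estimation target concrete, here is a minimal sketch of the costly evaluation that selectivity estimation is meant to avoid: the exact result size of a query of the form //A[//D1][//D2]... on a possibly cyclic graph, computed by traversing from every A node. The adjacency-dict representation, the function names `descendants` and `selectivity`, and the example graph are all hypothetical; the graph is not the one from the slides.

```python
from collections import deque

def descendants(graph, start):
    """Nodes reachable from `start` by one or more edges; `seen` guards against cycles."""
    seen, queue = set(), deque(graph.get(start, []))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(graph.get(node, []))
    return seen

def selectivity(graph, labels, anc_tag, *desc_tags):
    """Exact result size of //anc_tag[//d1][//d2]...: count anc_tag nodes
    that have at least one descendant with each tag in desc_tags."""
    count = 0
    for node, tag in labels.items():
        if tag == anc_tag:
            reached = {labels[d] for d in descendants(graph, node)}
            if all(t in reached for t in desc_tags):
                count += 1
    return count

# Hypothetical cyclic graph (f1 <-> f2 form a cycle); not the graph from the slides.
graph = {"f1": ["r1", "f2"], "f2": ["t1", "f1"], "f3": ["r2"]}
labels = {"f1": "faculty", "f2": "faculty", "f3": "faculty",
          "r1": "RA", "r2": "RA", "t1": "TA"}
print(selectivity(graph, labels, "faculty", "RA"))        # 3
print(selectivity(graph, labels, "faculty", "RA", "TA"))  # 2
```

Each call walks the graph once per candidate node, which is exactly the per-query cost that precomputed labels are meant to replace.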

Our methodology outline • Step 1: label the graph nodes (precomputed offline) • Step 2: estimate query selectivity from the precomputed labels (online, when a query arrives)

Prime number labeling • Label each graph node with an integer that is a product of prime numbers

Prime number labeling (cont.) • Divisibility of labels implies an ancestor-descendant relationship • For example, 3*5*7*11 is divisible by 11, so node g is a descendant of node a
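The slides do not spell out how labels are assigned, so the following is only a sketch under one assumed scheme that works on cyclic graphs: give every node a distinct prime, and let label(u) be the product of the primes of u and of every node reachable from u. Then v is reachable from u exactly when label(u) is divisible by label(v). The function names (`first_primes`, `reach`, `prime_labels`, `is_descendant`) and the example graph are hypothetical.

```python
from math import prod

def first_primes(n):
    """First n primes by trial division (adequate for small examples)."""
    primes, candidate = [], 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

def reach(graph, start):
    """`start` plus every node reachable from it; `seen` guards against cycles."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in graph.get(stack.pop(), []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def prime_labels(graph, nodes):
    """Assumed scheme: label(u) = product of the primes of all nodes in reach(u)."""
    primes = dict(zip(nodes, first_primes(len(nodes))))
    return {u: prod(primes[v] for v in reach(graph, u)) for u in nodes}

def is_descendant(labels, v, u):
    """v is reachable from u (or v == u) iff label(u) is divisible by label(v)."""
    return labels[u] % labels[v] == 0

# Hypothetical cyclic graph: b -> c -> b is a cycle.
graph = {"a": ["b"], "b": ["c"], "c": ["b", "d"]}
labels = prime_labels(graph, ["a", "b", "c", "d"])
print(labels)  # {'a': 210, 'b': 105, 'c': 105, 'd': 7}
print(is_descendant(labels, "d", "a"), is_descendant(labels, "a", "d"))  # True False
```

Under this scheme, nodes on a common cycle (b and c here) receive identical labels, since each is a descendant of the other. The labels also grow multiplicatively with the number of reachable nodes, which motivates a more compact representation.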

Optimization • Replace integers by vectors
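One natural reading of "replace integers by vectors", offered here as an assumption rather than the authors' exact construction: because a label is a product of distinct primes, it can be stored as a 0/1 vector with one position per prime, and divisibility then becomes a subset test on vectors. The sketch below uses Python integers as bit vectors; `vector_labels`, `is_descendant`, and the example graph are hypothetical.

```python
def reach(graph, start):
    """`start` plus every node reachable from it; `seen` guards against cycles."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in graph.get(stack.pop(), []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def vector_labels(graph, nodes):
    """Assumed scheme: bit i of label(u) is 1 iff nodes[i] is in reach(u).
    This mirrors the prime-product label with 'prime i divides' -> 'bit i is set'."""
    index = {v: i for i, v in enumerate(nodes)}
    return {u: sum(1 << index[v] for v in reach(graph, u)) for u in nodes}

def is_descendant(vectors, v, u):
    """Divisibility becomes a subset test: every bit set for v is also set for u."""
    return (vectors[v] & ~vectors[u]) == 0

# Hypothetical cyclic graph: b -> c -> b is a cycle.
graph = {"a": ["b"], "b": ["c"], "c": ["b", "d"]}
nodes = ["a", "b", "c", "d"]
vectors = vector_labels(graph, nodes)
for u in nodes:
    print(u, [(vectors[u] >> i) & 1 for i in range(len(nodes))])
print(is_descendant(vectors, "d", "a"), is_descendant(vectors, "a", "d"))  # True False
```

Stacking these vectors as rows gives a 0/1 matrix (one row per node, one column per prime/node), which is the kind of matrix the consecutive-ones discussion later in the deck operates on.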

Optimization (cont.) implies node b is a descendant of node a


Selectivity Estimation • Two-dimensional histogram • Originally designed for selectivity estimation on trees [Jagadish 2004] • Label each tree node with an interval (l, r) • Represent each interval as a point (l, r) in the XY coordinate plane • Partition the plane into a grid whose cells serve as buckets • Estimate result sizes from the bucket counts
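A minimal sketch of the grid-histogram idea, assuming an equi-width grid and a uniform distribution of points inside each bucket (the cited work may use a different bucketing and estimation formula). With interval labels, a node (l', r') is a descendant of (l, r) when l' > l and r' < r, so the number of descendants of a node is approximated by summing the bucket counts over the region x > l, y < r, scaling partially covered buckets by their overlap. The class `Histogram2D` and the point set are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Histogram2D:
    size: float   # interval endpoints are assumed to lie in [0, size)
    grid: int     # number of buckets per axis
    counts: list  # counts[i][j]: points with x in bucket i and y in bucket j

    @classmethod
    def build(cls, points, size, grid):
        """Drop each point (l, r) into its equi-width grid cell."""
        counts = [[0] * grid for _ in range(grid)]
        width = size / grid
        for x, y in points:
            i = min(int(x // width), grid - 1)
            j = min(int(y // width), grid - 1)
            counts[i][j] += 1
        return cls(size, grid, counts)

    def estimate(self, x_lo, y_hi):
        """Approximate #points with x > x_lo and y < y_hi, i.e. descendants of a
        node labelled (x_lo, y_hi), assuming uniformity within each bucket."""
        width = self.size / self.grid
        total = 0.0
        for i in range(self.grid):
            x0, x1 = i * width, (i + 1) * width
            fx = max(0.0, min(1.0, (x1 - max(x0, x_lo)) / width))  # share of bucket with x > x_lo
            if fx == 0.0:
                continue
            for j in range(self.grid):
                y0, y1 = j * width, (j + 1) * width
                fy = max(0.0, min(1.0, (min(y1, y_hi) - y0) / width))  # share with y < y_hi
                total += self.counts[i][j] * fx * fy
        return total

# Hypothetical interval labels of candidate descendant nodes.
points = [(1, 2), (3, 4), (5, 6), (7, 8)]
hist = Histogram2D.build(points, size=10, grid=2)
print(hist.estimate(0, 10))  # 4.0
print(hist.estimate(2, 9))   # about 2.8 (the exact count is 3)
```

The estimate is exact when the query boundaries align with bucket boundaries and degrades inside partially covered buckets; a finer grid trades memory for accuracy.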

Selectivity Estimation (cont.)

Optimization • Replace integers by vectors (figure example labels: [0,3], [0,1])

Consecutive Ones Property Matrix • A 0/1 matrix is a consecutive ones property matrix (C1P matrix) if its columns can be ordered so that the 1s in every row are consecutive • Such a column order can be found in linear time • Finding the largest C1P submatrix is NP-hard, and if every column contains more than 3 ones, the problem is not polynomial-time approximable
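The definition can be checked directly with a brute-force sketch. This is deliberately not the linear-time algorithm mentioned above (that one relies on PQ-trees and is much longer); it tries every column permutation, so it is only usable on tiny matrices. The helper `to_intervals` is hypothetical: it illustrates the assumption that, once the columns are ordered, each row can be stored compactly as the first and last positions of its 1s.

```python
from itertools import permutations

def ones_consecutive(row, order):
    """True if the 1s of `row`, read in column order `order`, form one contiguous run."""
    ones = [pos for pos, col in enumerate(order) if row[col]]
    return not ones or ones[-1] - ones[0] + 1 == len(ones)

def find_c1p_order(matrix):
    """Brute-force search (exponential, tiny matrices only) for a column order
    under which every row's 1s are consecutive. Returns None if none exists."""
    n_cols = len(matrix[0]) if matrix else 0
    for order in permutations(range(n_cols)):
        if all(ones_consecutive(row, order) for row in matrix):
            return list(order)
    return None

def to_intervals(matrix, order):
    """Hypothetical helper: rewrite each row as (first, last) positions of its 1s."""
    intervals = []
    for row in matrix:
        ones = [pos for pos, col in enumerate(order) if row[col]]
        intervals.append((ones[0], ones[-1]) if ones else None)
    return intervals

m = [[1, 0, 1],
     [0, 1, 1]]
order = find_c1p_order(m)
print(order)                   # [0, 2, 1]
print(to_intervals(m, order))  # [(0, 1), (1, 2)]

# Every pair of columns must be adjacent here, which no linear order allows.
print(find_c1p_order([[1, 1, 0], [0, 1, 1], [1, 0, 1]]))  # None
```

The second matrix shows why not every 0/1 matrix is C1P, which is what motivates adding extra columns in the next step.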

Add extra columns

Add extra columns • Open question: given a 0/1 matrix, is it NP-hard to add the minimum number of extra columns so that the resulting matrix is a C1P matrix?

Heuristic algorithm • Duplicate • Merge

Heuristic algorithm (cont.)

Heuristic Algorithm (cont.)

Selectivity Estimation (cont.)

Implementation

Thanks!
