Selectivity Estimation of XPath for Cyclic Graphs. Yun Peng.
Outline • Motivation • Problem definition • Prime number labeling • Selectivity estimation • Implementation
Motivation • To retrieve subgraphs from large graph databases efficiently, selectivity estimation is one of the most important query optimization techniques
An Example • The query q=//faculty[//RA][//TA] lists all faculty members that have both an RA and a TA • There are two plans for evaluating this query • Plan 1 • Find the faculty members that have an RA. The result set size is 3. • From these intermediate results, keep those that have a TA • Plan 2 • Find the faculty members that have a TA. The result set size is 2. • From these intermediate results, keep those that have an RA • Plan 2 starts from the smaller intermediate result (2 versus 3), which is why estimating result sizes in advance helps the optimizer choose a plan
Problem Definition • Selectivity estimation: given a query, estimate how many results it produces without performing the costly evaluation • Example: q=//faculty[//RA], Selectivity(q) = 3
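The choice between the two plans can be sketched as follows. This is a minimal illustration, assuming the optimizer already has an estimated result size for each predicate; `pick_first_predicate` and the `sizes` dictionary are hypothetical names, and the sizes 3 and 2 are the ones given in the example.

```python
def pick_first_predicate(estimated_sizes):
    """Return the predicate with the smallest estimated result size.

    Evaluating the most selective predicate first keeps the
    intermediate result small (hypothetical helper, not from the slides).
    """
    return min(estimated_sizes, key=estimated_sizes.get)


# Result set sizes from the example: 3 for the RA predicate, 2 for the TA predicate.
sizes = {"//faculty[//RA]": 3, "//faculty[//TA]": 2}
first = pick_first_predicate(sizes)
print(first)
```

With these sizes the sketch starts from the TA predicate, which matches the second plan in the example.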
Our methodology skeleton • Step 1: label the graph nodes (precomputed) • Step 2: estimate query selectivity from the precomputed labels (when a query arrives)
Prime number labeling • Label each graph node with an integer that is the product of some prime numbers
Prime number labeling (cont.) • Divisibility of labels implies an ancestor-descendant relationship • For example, 3*5*7*11 is divisible by 11, so node g is a descendant of node a
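The divisibility test can be sketched in Python. This is one possible labeling scheme, not necessarily the one used in the talk: it assumes an acyclic graph given in topological order, gives each node its own prime, and multiplies it by the least common multiple of the parents' labels. The names `first_primes`, `prime_labels`, and `is_descendant` are hypothetical.

```python
from math import lcm  # available in Python 3.9+


def first_primes(n):
    """Return the first n primes by trial division."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def prime_labels(parents, order):
    """Label the nodes of an acyclic graph (assumption: no cycles).

    parents: dict mapping node -> list of parent nodes
    order:   nodes listed so that parents come before children
    Each node gets its own prime times the lcm of its parents' labels,
    so a label is divisible by the label of every ancestor.
    """
    labels = {}
    for node, prime in zip(order, first_primes(len(order))):
        inherited = lcm(*(labels[p] for p in parents[node]))  # lcm() of nothing is 1
        labels[node] = prime * inherited
    return labels


def is_descendant(labels, d, a):
    """True if d is a proper descendant of a under this labeling."""
    return d != a and labels[d] % labels[a] == 0


# Small diamond-shaped graph: a -> b, a -> c, b -> d, c -> d.
parents = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
labels = prime_labels(parents, ["a", "b", "c", "d"])
print(labels, is_descendant(labels, "d", "a"), is_descendant(labels, "c", "b"))
```

Because every label here is a square-free product of the node's own prime and its ancestors' primes, one label divides another exactly when the first node is an ancestor of the second.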
Optimization • Replace integers by vectors
Optimization (cont.) implies node b is a descendant of node a
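One way to read "replace integers by vectors", assuming each label is a square-free product of primes, is to store a 0/1 vector with one position per prime. Divisibility then becomes a position-by-position comparison. The sketch below is an interpretation rather than the talk's definition; `to_bit_vector` and `covers` are hypothetical names, and the labels 11 and 3*5*7*11 are read off the earlier example for nodes a and g.

```python
def to_bit_vector(label, primes):
    """0/1 vector with a 1 for each prime that divides the label."""
    return [1 if label % p == 0 else 0 for p in primes]


def covers(descendant_vec, ancestor_vec):
    """True if every 1 in ancestor_vec is also set in descendant_vec."""
    return all(d >= a for d, a in zip(descendant_vec, ancestor_vec))


primes = [3, 5, 7, 11]
vec_a = to_bit_vector(11, primes)              # assumed label of node a
vec_g = to_bit_vector(3 * 5 * 7 * 11, primes)  # assumed label of node g
print(vec_a, vec_g, covers(vec_g, vec_a))
```

The vectors stay small even when the integer products would grow very large, which is the usual reason for this kind of replacement.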
Our methodology skeleton • Step 1: label the graph nodes (precomputed) • Step 2: estimate query selectivity from the precomputed labels (when a query arrives)
Selectivity Estimation • Two-dimensional histogram • Originally designed for selectivity estimation on trees [Jagadish 2004] • Label each tree node with an interval, e.g. (l, r) • Represent the interval as a point (l, r) in the xOy coordinate plane • Partition the xOy plane into grid cells that serve as buckets • Estimate result sizes using this histogram
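The grid-bucket idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the cited method: buckets are square cells of a fixed side length, points are assumed to be uniformly spread inside each bucket, and the query asks how many stored intervals lie inside a given interval [l, r]. The names `build_histogram` and `estimate_contained` are hypothetical.

```python
from collections import Counter


def build_histogram(intervals, cell):
    """Count interval points (l, r) per square grid cell of side `cell`."""
    hist = Counter()
    for l, r in intervals:
        hist[(l // cell, r // cell)] += 1
    return hist


def estimate_contained(hist, cell, l, r):
    """Estimate how many stored intervals lie inside [l, r].

    An interval (x, y) lies inside [l, r] when x >= l and y <= r.
    Each bucket contributes its count times the fraction of its area
    that falls in that region (uniform-distribution assumption).
    """
    total = 0.0
    for (i, j), count in hist.items():
        x0, y0 = i * cell, j * cell
        # Overlap of the bucket [x0, x0+cell) x [y0, y0+cell) with x >= l and y <= r.
        w = max(0.0, x0 + cell - max(x0, l))
        h = max(0.0, min(y0 + cell, r) - y0)
        total += count * (w / cell) * (h / cell)
    return total


# Made-up interval labels, used only to exercise the functions.
intervals = [(0, 9), (1, 4), (2, 3), (5, 8), (6, 7)]
hist = build_histogram(intervals, cell=5)
print(estimate_contained(hist, 5, 5, 10))
```

The estimate only reads bucket counts, so its cost depends on the number of buckets rather than on the number of nodes.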
Optimization • Replace integers by vectors (labels shown in the figure: [0,3], [0,1])
Consecutive Ones Property Matrix • Given a 0/1 matrix, if the columns can be ordered so that the 1s in every row are consecutive, the matrix is said to have the consecutive ones property (a C1P matrix) • Recognition takes linear time • Finding the largest C1P submatrix is NP-hard, and if each column contains more than 3 ones, the problem cannot be approximated in polynomial time
Add extra columns • Given a 0/1 matrix, is adding the minimum number of extra columns so that the resulting matrix is a C1P matrix NP-hard?
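The definition can be checked directly on tiny matrices. The sketch below tries every column order, so it is exponential and is not the linear-time recognition algorithm mentioned on the slide; it only illustrates what the property means. The names `rows_consecutive` and `find_c1p_order` are hypothetical.

```python
from itertools import permutations


def rows_consecutive(matrix, order):
    """True if, with columns in `order`, each row's 1s form one block."""
    for row in matrix:
        positions = [k for k, col in enumerate(order) if row[col] == 1]
        if positions and positions[-1] - positions[0] + 1 != len(positions):
            return False
    return True


def find_c1p_order(matrix):
    """Brute-force search for a column order witnessing the C1P.

    Exponential in the number of columns; only for tiny examples.
    Returns a tuple of column indices, or None if no order exists.
    """
    n_cols = len(matrix[0]) if matrix else 0
    for order in permutations(range(n_cols)):
        if rows_consecutive(matrix, order):
            return order
    return None


has_c1p = [[1, 0, 1], [0, 1, 1]]
no_c1p = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
print(find_c1p_order(has_c1p), find_c1p_order(no_c1p))
```

In the second matrix every pair of columns would have to be adjacent, which no linear order of three columns allows, so the search returns `None`.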
Heuristic algorithm • Duplicate • Merge