Learn about selectivity estimation in RDF data using examples and estimation approaches for various triple patterns. Understand how to calculate selectivity with histograms and unbound subjects, predicates, and objects.
Selectivity Estimation Example
Mohammad Farhan Husain
Example Data
• R1, R2, …, R8 are resources, i.e. URIs
• P1 and P2 are predicates, also URIs
• L1, L2, …, L5 are literals
• R = Total number of unique resources = 8
• T = Total number of triples = 8
• TP1 = Total number of triples having predicate P1 = 5
• TP2 = Total number of triples having predicate P2 = 3
For any query:
• Selectivity of a bound subject s = sel(s) = 1 / R = 1 / 8 = 0.125
• Selectivity of predicate P1 = sel(P1) = TP1 / T = 5 / 8 = 0.625
• Selectivity of predicate P2 = sel(P2) = TP2 / T = 3 / 8 = 0.375
• Selectivity of an unbound subject, predicate, or object = 1.0
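The base selectivities above amount to a few divisions. A minimal Python sketch, assuming only the counts from the example data; the function and variable names are illustrative and do not come from the original slides:

```python
# Counts from the example data (R, T, TP1, TP2).
NUM_RESOURCES = 8
NUM_TRIPLES = 8
TRIPLES_PER_PREDICATE = {"P1": 5, "P2": 3}

# An unbound subject, predicate, or object does not restrict the result.
SEL_UNBOUND = 1.0


def sel_bound_subject(num_resources: int = NUM_RESOURCES) -> float:
    """Selectivity of a bound subject: 1 / R."""
    return 1.0 / num_resources


def sel_predicate(predicate: str) -> float:
    """Selectivity of a bound predicate: TP / T."""
    return TRIPLES_PER_PREDICATE[predicate] / NUM_TRIPLES


if __name__ == "__main__":
    print(sel_bound_subject())   # 0.125
    print(sel_predicate("P1"))   # 0.625
    print(sel_predicate("P2"))   # 0.375
```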
Example Histogram for P1
Suppose there is a hash function which assigns the object values of triples having predicate P1 to two bins in the following manner:
• Bin 1 contains: L1, L2 and R2 (height 3)
• Bin 2 contains: R4 and L3 (height 2)
Example Histogram for P2
Suppose the same hash function assigns the object values of triples having predicate P2 to two bins in the following manner:
• Bin 1 contains: L5 (height 1)
• Bin 2 contains: L4 and R1 (height 2)
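The two histograms can be rebuilt from the bin assignments listed above. The sketch below is only an illustration: the slides do not specify the hash function, so an explicit lookup table (`BIN_OF`, a hypothetical name) stands in for it, and each listed object value is assumed to belong to exactly one triple, which agrees with TP1 = 5 and TP2 = 3.

```python
from collections import Counter

# Assumption: an explicit lookup table stands in for the unspecified hash
# function; it reproduces the bin assignments given in the example.
BIN_OF = {
    "L1": 1, "L2": 1, "R2": 1, "R4": 2, "L3": 2,  # objects of P1 triples
    "L5": 1, "L4": 2, "R1": 2,                    # objects of P2 triples
}

# Object values per predicate, one entry per triple (assumed).
OBJECTS = {
    "P1": ["L1", "L2", "R2", "R4", "L3"],
    "P2": ["L5", "L4", "R1"],
}


def build_histogram(objects):
    """Return {bin number: height}; height is the number of objects in the bin."""
    return dict(Counter(BIN_OF[obj] for obj in objects))


HISTOGRAMS = {pred: build_histogram(objs) for pred, objs in OBJECTS.items()}
```

With this data, the bin heights of each histogram sum to the number of triples for that predicate.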
Selectivity Estimation for Triple Pattern Example with Bound Predicate
• Triple Pattern: ?s P1 L2
• Estimated selectivity
  = sel(?s) x sel(P1) x sel(L2)
  = 1.0 x 0.625 x sel(P1, L2)
  = 1.0 x 0.625 x (h1(P1, L2) / TP1)
  = 1.0 x 0.625 x (Height of Bin 1 / TP1)
  = 1.0 x 0.625 x (3 / 5)
  = 0.375
• Here, h1(P1, L2) denotes the height of the bin of the histogram of predicate P1 into which the hash function puts L2. The subject ?s is unbound, so sel(?s) = 1.0.
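The bound-predicate estimate can be reproduced in a few lines of Python. This is a sketch under stated assumptions, not the author's implementation: the bin heights are hard-coded from the example histograms, `BIN_OF` is a hypothetical lookup table replacing the unspecified hash function, and the function name is illustrative.

```python
NUM_TRIPLES = 8
TRIPLES_PER_PREDICATE = {"P1": 5, "P2": 3}

# Bin heights taken from the example histograms.
BIN_HEIGHTS = {"P1": {1: 3, 2: 2}, "P2": {1: 1, 2: 2}}

# Assumption: lookup table standing in for the unspecified hash function.
BIN_OF = {"L1": 1, "L2": 1, "R2": 1, "R4": 2, "L3": 2,
          "L5": 1, "L4": 2, "R1": 2}


def estimate_bound_predicate(predicate: str, obj: str) -> float:
    """Estimate the selectivity of the pattern `?s predicate obj`."""
    sel_subject = 1.0  # the subject is unbound
    sel_predicate = TRIPLES_PER_PREDICATE[predicate] / NUM_TRIPLES
    # Height of the bin the object hashes to, divided by the predicate's triples.
    bin_height = BIN_HEIGHTS[predicate][BIN_OF[obj]]
    sel_object = bin_height / TRIPLES_PER_PREDICATE[predicate]
    return sel_subject * sel_predicate * sel_object


if __name__ == "__main__":
    # Approximately 0.375, matching the worked example for ?s P1 L2.
    print(round(estimate_bound_predicate("P1", "L2"), 3))
```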
Selectivity Estimation for Triple Pattern Example with Unbound Predicate
• Triple Pattern: ?s ?p L2
• Estimated selectivity
  = sel(?s) x sel(?p) x sel(L2)
  = 1.0 x 1.0 x {Σ over all predicates Pi ∈ P of sel(Pi, L2)}
  = 1.0 x 1.0 x {sel(P1, L2) + sel(P2, L2)}
  = 1.0 x 1.0 x {h1(P1, L2) / TP1 + h1(P2, L2) / TP2}
  = 1.0 x 1.0 x {Height of Bin 1 of P1 Histogram / TP1 + Height of Bin 1 of P2 Histogram / TP2}
  = 1.0 x 1.0 x {3 / 5 + 1 / 3}
  ≈ 0.933
• Note that the hash function always puts the value L2 into Bin 1. That is why we pick the height of Bin 1 of the histogram for P2, even though no triple with predicate P2 has L2 as its object.
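The sum over predicates can be sketched in Python as follows. As before, the bin heights are copied from the example histograms, `BIN_OF` is a hypothetical lookup table replacing the unspecified hash function, and the function name is illustrative.

```python
TRIPLES_PER_PREDICATE = {"P1": 5, "P2": 3}

# Bin heights taken from the example histograms.
BIN_HEIGHTS = {"P1": {1: 3, 2: 2}, "P2": {1: 1, 2: 2}}

# Assumption: lookup table standing in for the unspecified hash function.
# The same value always maps to the same bin, whichever predicate is used.
BIN_OF = {"L1": 1, "L2": 1, "R2": 1, "R4": 2, "L3": 2,
          "L5": 1, "L4": 2, "R1": 2}


def estimate_unbound_predicate(obj: str) -> float:
    """Estimate the selectivity of the pattern `?s ?p obj`."""
    sel_subject = 1.0    # unbound subject
    sel_predicate = 1.0  # unbound predicate
    bin_number = BIN_OF[obj]
    # Sum the per-predicate object selectivities over every predicate,
    # including predicates that never occur with this object value.
    sel_object = sum(
        BIN_HEIGHTS[pred].get(bin_number, 0) / count
        for pred, count in TRIPLES_PER_PREDICATE.items()
    )
    return sel_subject * sel_predicate * sel_object


if __name__ == "__main__":
    # Approximately 0.933, matching the worked example for ?s ?p L2.
    print(round(estimate_unbound_predicate("L2"), 3))
```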
Selectivity Estimation for Triple Pattern Example with Unbound Object
• Triple Pattern: ?s P1 ?o
• Estimated selectivity
  = sel(?s) x sel(P1) x sel(?o)
  = 1.0 x 0.625 x 1.0
  = 0.625