1 / 36

Dealing with Uniqueness Constraint in Query Optimization

Dealing with Uniqueness Constraint in Query Optimization. Class Presentation Shouzheng Yang. Introduction. Where does uniqueness constraint exist: DISTINCT Set operations such as What are the solutions? Sort Under what condition is DISTINCT clause unnecessary?

coy
Download Presentation

Dealing with Uniqueness Constraint in Query Optimization

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Dealing with Uniqueness Constraint in Query Optimization Class Presentation Shouzheng Yang

  2. Introduction Where does uniqueness constraint exist: DISTINCT Set operations such as What are the solutions? Sort Under what condition is DISTINCT clause unnecessary? And how do we identify such situations. What can we make use of to determine whether duplicate elimination is unnecessary?

  3. Exploiting Uniqueness in Query Optimization 1. Class of SQL queries considered With only algebraic operators selection, projection, and extended Cartesian product. And we assume WHERE clause only consists of two tables. WHERE clause can be

  4. Exploiting Uniqueness in Query Optimization 2. Constraint (Check, Unique Specification) Create Table SUPPLIER ( SNO …, SNAME…, SCITY…, BUDGET…, STATUS…, PRIMARY KEY(SNO), CHECK (SNO BETWEEN 1 AND 499), CHECK (SCITY IN (‘Chicago’, ‘New York’, ‘Toronto’)), CHECK(BUDGET<>0 OR STATUS=‘Inactive”))

  5. Some notations and expressions: the extended Cartesian product select all rows of R that satisfy condition C projection. d is either Dist or All A  b (corresponding attributes in A and b must either agree in value, or both be NULL.)

  6. Exploiting Uniqueness in Query Optimization Table for following examples: SUPPLIER(SNO, SNAME, SCITY,BUDGET,STATUS) PARTS(SNO, PNO, PNAME, OEM-PNO, COLOR) AGENTS(SNO, ANO, ANAME, ACITY). Tuples in PARTS reference the SUPPLIER who supply them. Tuples in AGENTS reference the SUPPLIER they represent. OEM-PNO is a candidate key

  7. Exploiting Uniqueness in Query Optimization Example: Select all S.SNO, SNAME, P.PNO, PNAME for a given supplier-no who supply that part. SELECT ALL S.SNO, SNAME, P.PNO, PNAME FROM SUPPLIER S, PARTS P WHERE P.SNO=:SUPPLIER-NO AND S.SNO=P.SNO By observation: We claim that PNO is a key of the derived table. Guess the theorem!?

  8. Exploiting Uniqueness in Query Optimization Intuitively: SELECT A FROM R, S WHERE The uniqueness condition will be met if: Either all the columns of Key (R x S) are in the project list, or A subset of key columns is present in the projection list, and the values of the other key columns are equated to constants or can be inferred through the selection predicate or table constraints.

  9. Uniqueness Condition Formally, theorem (Uniqueness Condition): SELECT A FROM R, S WHERE Two expressions Q and V are equivalent if and only if :

  10. Uniqueness Condition

  11. Proof: Sufficiency If theorem 1 holds but Q≠V There exist (at least) two tuples r0, r0’ ∈ (I(R) ×I(S)) such that r0[A]=r0’[A] (Two tuples are same after projection in Q) Back to base tables, r0 and r0’ are derived from the tuples r0 [α(R)], r0’[α(R)], r0 [α(S)], r0’[α(S)], either r0 [α(R)] ≠r0’[α(R)] or r0 [α(S)] ≠r0’[α(S)]. However, the violates theorem where, r0 [Key (R×S)]=r0’[Key (R×S)]

  12. Proof: Necessity Q contains no duplicates but theorem 1 does not hold. There exist two tuples r0, r0’ ∈Domain (R × S) such that following is false but table constraints, key dependencies and query predicates are true. There must exist at least one column β∈ Key (R) o Key (S) where r0[β] ≠ r0’ [β] . Suppose β∈ Key (R). Because r0[A]=r0’[A], V contains a single tuple. Back to base table, r0 [α(R)] ≠r0’[α(R)] , so the extended Cartesian product with r0 [α(S)] = r0’[α(S)] yields at least two tuples. Then Q ≠ V

  13. Example SELECT ALL S.SNO, SNAME, P.PNO, PNAME FROM SUPPLER S, PARTS P WHERE P.SNO=:SUPPLER-NO AND .SNO=P.SNO

  14. Example

  15. Example Finally if these can functionally determine this:

  16. Example In a simple word: Since :

  17. Algorithm

  18. Algorithm

  19. Example SELECT DISTINCT S.SNO, S.SNAME, P.PNO, P.PNAME FROM SUPPLIER S, PARTS P WHERE P.SNO=:SUPPLIER-NO AND S.SNO=P.SNO

  20. Algorithm Line 8: C  P.SNO=:SUPPLIER-NO ∧ S.SNO=P.SNO ∧ T Line 9-17: C is unchanged Line 18: C is not simply true; we proceed

  21. Algorithm Line 21: E_1  P.SNO=: SUPPLIER-NO ∧ S.SNO=P.SNO Line 24: V={S.SNO, SNAME, P.PNO. PNAME} Line 26: V={S.SNO, SNAME, P.PNO, PNAME, P.SNO} Line 29: V is unchanged.

  22. Corollary 1 2 3 • Q = V1 • Q = V2 iff Q = V1 AND πAll [AR] (σ [CR] (R)) contains no duplicate rows.

  23. Example for Corollary SELECT ALL S.SNO, S.SNAME FROM SUPPLIER S WHERE S.SNAME=:SUPPLIER-NAME AND EXISTS ( SELECT * FROM PARTS P WHERE S.SNO=P.SNO AND P.PNO=:PART-NO) SELECT ALL S.SNO, SNAME FROM SUPPLIER S, PARTS P WHERE S.SNAME=:SUPPLIER-NAME AND S.SNO=P.SNO AND P.PNO=:PART-NO SELECT DISTINCT S.SNO, SNAME FROM SUPPLIER S, PARTS P WHERE S.SNAME=:SUPPLIER-NAME AND S.SNO=P.SNO AND P.PNO=:PART-NO

  24. Reasoning about Duplicate Elimination with Description Logic PERSON < (Name: STRING) PERSON (Name Id) STUDENT < PERSON TAKES < (S: STUDENT), (C: COURSE), (MARK: INT), TAKES (S,C Id), TAKES (S, C.Time C) DEPT < (Name: STRING), (Head: PROF), DEPT (Name Id) DEPT (Head  Id) Tables: PROF < PERSON, (In: DEPT) COURSE < (Num: INT), (Room: INT), (Time: INT), (Inst: PROF), (In: DEPT), COURSE (Num, In Id), COURSE (Room, Time Id), COURSE (Inst.In In)

  25. Example for objective Find the unique names of students taking some course taught at the same time as some other course numbered: P2 with an instructor in the department named: P1. Select distinct S.SNAME as NAME From STUDENT as S, TAKES as T, COURSE as C Where S=T.S and C.Time=T.C.Time and C.Inst.In.Name =:P1 and C.Num=:P2 Select S.SNAME as NAME From STUDENT as S, TAKES as T, COURSE as C Where S=T.S and C.Time=T.C.Time and C.Inst.In.Name =:P1 and C.Num=:P2

  26. Lemma Normal Form Let Q be an arbitrary object relational conjunctive query. Then there is an equivalent query with the form. Select V From C1 as A1, … Cm as Am, (Elim Select W from Cm+1 as Am+1, … , Cn as An, R) Where R is a list of equalities of the form Ai.Pf1=Aj.Pf2 , basically, to represent the join part. V and W are sets of attribute names.

  27. Theorem (Duplicate Elimination Reduction) Q : select V from C1 as A1,…,Cmas Am, (elim select W from Cm+1 as Am+1 ,…,Cnas An , R) is semantically equivalent to Q’: select V from C1 as A1 ,…,Cm+1 as Am+1, (elim select W ∪ {Am+1} from Cm+2 as Am+2,…, Cnas An , R) if and only if A ∪ W  Am+1 where A = {A1 ,…,Am}.

  28. Proof Necessity: If Q=Q’, but A∪ W  Am+1 does not hold. There must exist v, w that Tuple Ai (v)=Tuple Ai (w) for Ai∈ { A ∪ W }, but FAm+1(v)≠ FAm+1(w). Then a single tuple FAi(v) for Q but At least the two tuples (FAi(v), Am+1 ): FAm+1(v) (FAi(w), Am+1 ): FAm+1(w) for Q’ This fact is preserved by the final duplicate-preserving projection and thus we have Q≠Q’ Sufficiency: Similar idea find two tuples differ on Am+1 to yield contradiction that A∪ W  Am+1 is false.

  29. Example First, the original query is transformed to this lemma normal form. select :P1, :P2, Name from elim ( select :P1, :P2, Name from STUDENT as S, TAKES as T, COURSE as C, S = T.S, C.Time = T.C.Time, C.Inst.In.Name = :P1, C.Num = :P2, Name = S.Name ) Note that parameters :P1 and :P2 are converted to “result” attributes on the normalized query.

  30. Example Now we use our rewriting rule from the Theorem to move the reference to COURSE out of the scope of the elim operator. The necessary condition is (A ∪ W  Am+1). That is (Name, :P1, :P2  C ). Select :P1, :P2, Name From COURSE as C, elim ( select C, :P1, :P2, Name from STUDENT as S, TAKES as T S = T.S, C.Time = T.C.Time, C.Inst.In.Name = :P1, C.Num = :P2, Name = S.Name )

  31. Example A sequence of two additional applications of the duplicate elimination rewrite (then check for (Name, :P1, :P2,CT) to move the TAKES as T and (Name, :P1, :P2, C, T  S) for STUDENT as S) results in the following query: select :P1, :P2, Name from COURSE as C, TAKES as T, STUDENT as S, elim ( select C, T, S, :P1, :P2, Name from S = T.S, C.Time = T.C.Time, C.Inst.In.Name = :P1, C.Num = :P2, Name = S.Name )

  32. Conclusion If we manage to move all class references out of the scope of the elim operator (as in Example), we can completely remove the duplicate elimination operation from the query. (Why?) Could be reduced to the idea from last paper. Note they are class references which will be implemented as tables.

  33. Example Objective Achieved Find the unique names of students taking some course taught at the same time as some other course numbered: P2 with an instructor in the department named: P1. Select distinct S.SNAME as NAME From STUDENT as S, TAKES as T, COURSE as C Where S=T.S and C.Time=T.C.Time and C.Inst.In.Name =:P1 and C.Num=:P2 Select S.SNAME as NAME From STUDENT as S, TAKES as T, COURSE as C Where S=T.S and C.Time=T.C.Time and C.Inst.In.Name =:P1 and C.Num=:P2

  34. Laws Involving Duplicate Elimination We can also move the δto either or both of the arguments of an intersection: But not across the operators

  35. Laws Involving Duplicate Elimination • δ(R) = R if R has no duplicates. Important cases of such a relation R include: • A stored relation with a declared primary key. • b) A relation that is the result of a grouping operation, since grouping creates a relation with no duplicates.

  36. Thank you!

More Related