1 / 108

Chapter 7 The Query Compiler

Chapter 7 The Query Compiler. Query Processor : Query Parser Tree Logical Query Plan Physical Query Plan Query Structure Relational Algebraic Expression Tree. 1 2 3.

Download Presentation

Chapter 7 The Query Compiler

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Chapter 7 The Query Compiler Query Processor: Query Parser Tree Logical Query Plan Physical Query Plan Query Structure Relational Algebraic Expression Tree 1 2 3

  2. The Stages of Query Compilation Query Parser § 7.1 Preprocessor Logical query plan generator § 7.3 Query rewriter Preferred logic query plan

  3. Parsing Convert a SQL statement to a parse tree which consists of the following nodes: 1. Atoms: lexical elements such as keywords, names of attributes or relations, constants, parentheses, operators and other schema elements 2. Syntactic categories: names for families of query subparts such <SFW>, <Condition>

  4. A Grammar of a Simple Subset of SQL 1. Query: <Query> ::= <SFW> <Query> ::= (<Query>) 2. Select-From-Where: <SFW>::= SELECT <SelList> FROM <FromList> WHERE <Condition>

  5. 3. Select-Lists: <SelList>::= <Attribute>,<SelList> <SelList>::= <Attribute> 4. From-Lists: <FromList>::= <Relation>,<FromList> <FromList>::= <Relation> 5. Conditions: <Condition>::= <Condition> AND <Condition> <Condition>::= <Tuple> IN <Query> <Condition>::= <Attribute> = <Attribute> <Condition>::= <Attribute> LIKE <Pattern> 6. <tuple>::=<attribute>

  6. An Example StarsIn( title, year, starName) MovieStar( name,address, gender, birthdate) Find the movies with stars born in 1960 SELECT title FROM StarsIn WHERE starName IN ( SELECT name FROM MovieStar WHERE birthdate LIKE ‘%1960’ );

  7. <Query> <SFW> SELECT <SelList> FROM <FromList> WHERE <Condition> <Attribute> <RelName> <Tuple> IN <Query> titleStarsIn <Attribute> ( <Query>) starName <SFW> SELECT <SelList> FROM <FromList> WHERE <Condition> <Attribute> <RelName> <Attribute> LIKE <Pattern> name MovieStar birthdate ‘%1960’

  8. <Query> SELECT title FROM StarsIn, MovieStar WHERE starName =name AND birthdate LIKE ‘%1960’ <SFW> SELECT <SelList> FROM <FromList> WHERE <Condition> <Attribute> <RelName> , <FromList> titleStarsIn <RelName> AND MovieStar <Condition> <Condition> <Attribute> = <Attribute> <Attribute> LIKE <Pattern> starName name birthdate‘%1960’

  9. Preprocessor • View Expansion • Semantic Checking • Check relation uses • Check and resolve attribute uses • Check types

  10. Algebraic Laws for Improving Query Plans ▪ Commutative and Associative Laws ▪ Laws Involving Selection ▪ Laws Involving Projection ▪ Laws About Joins and Products ▪ Laws Involving Duplicate Elimination ▪ Laws Involving Grouping and Aggregation

  11. Commutative and Associative Laws R×S=S×R R∞S=S∞R R∪S=S∪R R∩S=S∩R (R×S) ×T=R× (S×T) (R∞S) ∞T=R∞ (S∞T) (R∪S) ∪T=R∪ (S∪T) (R∩S) ∩T=R∩ (S∩T)

  12. Theta Join:R∞ S = S∞ R c c • Suppose R(a,b), S(b,c) and T(c,d). (R ∞ S) ∞ T R ∞ ( S ∞ T ) R.a>S.b a<d R.a>S.b a<d

  13. Laws Involving Selection ▪σC1 AND C2 (R)= σC1(σC2(R)) ▪σC1 OR C2 (R)= (σC1(R))∪s(σC2(R)) =σC2(σC1(R))

  14. Transformation Examples σ(a=1 OR a=3)AND b<c (R) σ(a=1 OR a=3)(σb<c(R)) σa=1(σb<c(R))∪σa=1(σb<c(R)) σ(a=1 OR a=3)AND b<c (R) σb<c(σa=1 OR a=3 (R)) σ b<c(σa=1(R) ∪σa=3(R))

  15. σLaw for Binary Operators 1. ∪:The selection must be pushed to both arguments. 2. ―: The selection must be pushed to the first argument and optionally may be pushed to the second. 3. Others:It is only required that the selection be pushed to one argument.

  16. σC(R∪S) = σC(R)∪σC(S) • σC(R―S) = σC(R)―S = σC(R)―σC(S) • σC(R×S) = σC(R)×S • σC(R∞S) = σC(R)∞S • σC(R ∞ S) =σC(R)∞ S D D • σC(R∩S) = σC(R)∩S For example, R(a,b) and S(b,c) σa=1 OR a=3(σb<c(R∞S) →σa=1 OR a=3(R∞σb<c(S)) Suppose the relation R has all the attributes mentioned in C

  17. Pushing Selections • Sometimes move a selection as far up the tree and then push the selections down all possible branches • E.g. , StarsIn (title, year, starName) Movie (title, year, length, studioName) View : CREATE VIEW MovieOf1996 AS SELECT * FROM Movie WHERE year=1996; Query: “Which stars worked for which studios in 1996?” SELECT starName, studioName FROM MovieOf1996 NATURAL JOIN StarsIn

  18. ПstarName, studioName ∞ σyear=1996 StarsIn Movie ∵σC(R ∞ S) =σC(R)∞ S ∴ σyear=1996(Movie) ∞ StarsIn =σyear=1996(Movie ∞ StarsIn) ∵ σC(R ∞ S) =σC(R) ∞ σC(S) ∴ σ year=1996(Movie ∞ StarsIn) =σyear=1996(Movie) ∞ σyear=1996(StarsIn) ПstarName, studioName ∞ σyear=1996 σyear=1996 Movie StarsIn

  19. Laws Involving Projection A projection may be introduced anywhere in an expression tree, as long as it eliminates only attributes that are never used by any of the operators above, and are not in the result of the entire expression.

  20. Basic Laws: ▪ ПL(R∞S)=ПL(ПM(R)∞ПN(S)) ▪ ПL(R∞S)=ПL(ПM(R)∞ПN(S)) C C ▪ ПL(R×S)=ПL(ПM(R) ×ПN(S)) where M,N are attributes of R and S respectively or input attributes in L

  21. Suppose there are relations R(a,b,c), S(c,d,e) Пa+e→x,b→y(R∞S) Пa+e→x,b→y(Пa,b,c(R)∞Пc,e(S)) Пa+e→x,b→y( R∞Пc,e(S)) • ПL(R∪B S)=ПL(R)∪B ПL(S) • Projections cannot be pushed below ∪S,―,∩. For example, R(a,b):{(1,2)}; S(a,b): {(1,0)} Пa(R∩S)=Φ, Пa(R)∩Пa(S)={(1)}

  22. Projection Involving Some Computation R(a,b,c), S(c,d,e) Пa+b→x,d+e→y(R∞S) =Пx,y(Пa+b→x,c(R)∞Пd+e→y,c(S)) If x or y is c, we need a temporary name. Пa+b→c,d+e→y(R∞S) =Пz→c,y(Пa+b→z,c(R)∞Пd+e→y,c(S))

  23. Pushing a projection below a selection ПL(σc(R))=ПL(σc(ПM(R))) (M: input attributes of L or mentioned in C) For example, from StarsIn( title, years, starName) to find stars that worked in 1996 SELECT starName FROM StarsIn WHERE year=1996; ПstarName σyear=1996 ПstarName,year StarsIn ПstarName σyear=1996 StarsIn Notice: If there is index on year, it may not improve the plan

  24. Laws About Joins and Products • R∞S=σc(R × S) c • R∞S=ПL(σc(R × S)) Usually use the rule from right to left ?

  25. Laws Involving Duplicate Elimination ▪ δ(R)=R if R has no duplicates [ R:1) A stored relation with a declared primary key 2) The result of aγ operation] ▪ δ(R ∪s S)=R ∪s S the same as ∩s, ―s

  26. Several laws that push δ ▪δ(R×S) =δ(R)×δ(S) ▪δ(R ∞ S) =δ(R) ∞δ(S) ▪δ(R ∞ S) =δ(R) ∞δ(S) c c ▪δ(σc(R))=σc(δ(R)) Notice δ cannot be moved across ∪B,―B or П

  27. For example, R has two copies of t tuple,S has one copy of t T(a,b): {(1,2),(1,3)}. δ(Пa(T))={(1)} Пa(δ(T))={(1) , (1)}

  28. Laws Involving Grouping and Aggregation • General Rules: ▪ δ(γL(R))=γL(R) ▪ γL(R)=γL(ПM(R)) (M: attributes of R mentioned in L) • Other Rules: ▪ MIN, MAX: Not affected by duplicates γL(R)= γL(δ(R)) ▪ SUM, COUNT, AVG: Affected by duplicates

  29. An Example Relations: MovieStar( name, addr, gender, birthdate) StarsIn( title, year, starName) Query:For each year, find the birthdate of the youngest star to appear in a movie that year SELECT year, MAX (birthdate) FROM MovieStar, StarsIn WHERE name=starName GROUP BY year; γyear, MAX (birthdate) σname=starName × MovieStar StarsIn

  30. Combine the selection and product into an equijoin • Generate a δ belowγ • Generate a Пbetween theγ and the introduced δ to project onto year and birthdate γyear, MAX (birthdate) Пyear,birthdate ∞ name=starName δδ Пbirthdate, name Пyear, starName MovieStar StarsIn γyear, MAX (birthdate) σname=starName δ ∞ name=starName MovieStar StarsIn

  31. From Parse Trees to Logical Query Plans Suppose <Query> is a <SFW> construct,<Condition> has no subqueries,convert <SFW> into a relational algebra expression from bottom to top as follows: 1. Product all relations from <FromList>; 2. σc, C is the <Condition> expression; 3. ПL, L is the list of attributes in the <SelList>

  32. Translation of A Parse Tree to an Algebraic Expression Tree <Query> <SFW> SELECT <SelList> FROM <FromList> WHERE <Condition> <Attribute> <RelName> , <FromList> title StarsIn <RelName> AND MovieStar <Condition> <Condition> <Attribute> = <Attribute> <Attribute> LIKE <Pattern> starName name birthdate ‘%1960’

  33. Пtitle σstarName=name AND birthdate LIKE‘%1960’ × StarsIn MovieStar

  34. Removing Subqueries From Conditions • Two-argument selection Node:σ Left Child: The Relation R Right Child: The Condition C

  35. <Query> <SFW> SELECT <SelList> FROM <FromList> WHERE <Condition> <Attribute> <RelName> <Tuple> IN <Query> title StarsIn <Attribute> ( <Query> ) starName <SFW> SELECT <SelList> FROM <FromList> WHERE <Condition> <Attribute> <RelName> <Attribute> LIKE <Pattern> name MovieStar birthdate ‘%1960’

  36. Пtitle σ StarsIn<Condition> <Tuple> IN Пname <Attribute> σbirthdate LIKE ‘1960’ starName MovieStar

  37. Replacement of Two-Argument Selection by a One-Argument Selection Uncorrelated Subquery: Two-Argument Selection with a left child for R and right child for t IN S : • Replace the <Condition> by the expression S • Replace the two-argument selection σc. 。 • Give σc an argument that is the product of R and S。

  38. Uncorrelated Subquery Пtitle σstarName=name × StarsIn Пname σbirthdate LIKE ‘1960’ MovieStar

  39. Correlated Subquery: δ Пm1.title,m1.year σ StarsIn m1 <Condition> ―≤γAvg(s.birthdate) m1.year 40 σm2.title=m1.title AND m2.year=m1.year ∞ m2.starName=s.name StarsIn m2 MovieStar s SELECT DISTINCT m1.title, m1.year FROM StarsIn m1 WHERE m1.year-40<=( SELECT AVG(birthdate) FROM StarsIn m2, MovieStar s WHERE m2.starName=s.name AND m1.title=m2.title AND m1.year=m2.year ) Find the movies where the average age of stars was at most 40 when the movie was made.

  40. δ Пm1.title,m1.year σm1.year-40≤abd ∞ m2.title=m1.title AND m2.year=m1.year StarsIn m1 γm2.title,m2.year,Avg(s.birthdate)→abd ∞ m2.starName=s.name StarsIn m2 MovieStar s

  41. δ Пm2.title,m2.year σm2.year-40≤abd γm2.title,m2.year,Avg(s.birthdate)→abd ∞ m2.starName=s.name StarsIn m2 MovieStar s

  42. Improving the Logical Query Plan • Pushing down selection. • Pushing down projection,or adding new projection. • Removing duplicate elimination, or moving to a more convenient position. • Turning selection and product into an equijoin.

  43. Пtitle ∞ starName=name StarsIn σbirthdate LIKE ‘1960’ MovieStar Пtitle σstarName=name AND birthdate LIKE ‘%1960’ × StarsIn MovieStar Пtitle σstarName=name × StarsIn σbirthdate LIKE ‘1960’ MovieStar Пtitle σstarName=name × StarsIn σbirthdate LIKE ‘1960’ MovieStar

  44. Grouping Associative/Commutative Operators • To group the nodes with the same associative/ commutative operators into a single node with many children • In some situation,natural join can be combined with theta-join: • Replace the natural joins with theta-join; • Add a projection; • The theta-join conditions must be associative ∞ ∪ U V W R S T ∞ ∞∞ ∪ U V W R ∪ S T

  45. Estimating the Cost of Operations When deriving physical plans from a logical plan, we need select • an order and grouping for associative-and-commutative operations; • an algorithm for each operator in the logical plan; • additional operators – scanning, sorting, and so on; • the way in which arguments are passed from one operator to the next

  46. Estimating Sizes of Intermediate Relations • Give accurate estimates • Are easy to compute • Are logically consistent

  47. Estimating the Size of a Projection Suppose R(a, b, c), a, b are integers with 4 bytes respectively,c is a string with 100 bytes. Each tuple header requires 12 bytes and each block header requires 24 bytes,Then each block can hold (1024-24)/120=8 tuples。 If T(R)=10,000, then B(R)=10,000/8=1250 For S=Пa+b,c(R),each tuple of S is 116 bytes and each block can only hold (1024-24)/116=8 tuples, B(S)=1250 For U= Пa,b(R),each tuple of U is 20 bytes. Each block can hold 1000/20=50 tuples. B(U)=10,000/50=200

  48. Estimating the Size of a Selection • For S=σA=c(R),T(S)=T(R)/V(R,A)。 • For S=σa<10(R),T(S)=T(R)/3。 • For S= σa≠10(R), T(S)= T(R) T(R)- T(R)/V(R,A)

  49. AND of Conditions Selectivity factor in equality:1/3 ≠ :1 A=c :1/V(R,A) For R(a,b,c), S=σa=10 AND a>20(R), T(R)=10,000,V(R,a)=50. 则T(S)=T(R)/(50*3)=67 If the condition is contradictory S=σa=10 AND a>10(R) then T(S) = 0

  50. OR of Conditions Suppose S=σC1 OR C2(R), • the sum of the number of tuples satisfying C1 and those satisfying C2. • T(S)=n(1-(1-m1/n)(1-m2/n)) If R has n tuples, m1 of which satisfy C1 and m2 of which satisfy C2. For example:R(a,b), T(R)=10,000. S=σa=10 OR b<20(R), V(R,a)=50. m=T(R)/V(R,a)=200. n=T(R)/3=3333,then T(S)=10,000(1-(1-200/10,000)(1-3333/10,000))=3466

More Related