1 / 132

CMPT 454

Query Optimization. CMPT 454. Query Optimization. Parsing Queries Relational algebra review Relational algebra equivalencies Estimating relation size Cost based plan selection Join order. Query Optimization.

dooley
Download Presentation

CMPT 454

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Query Optimization CMPT 454

  2. Query Optimization • Parsing Queries • Relational algebra review • Relational algebra equivalencies • Estimating relation size • Cost based plan selection • Join order

  3. Query Optimization • Generating equivalent logical plans and their physical plans is known as query optimization • Selecting a good query plan entails making decisions • Which equivalent plan leads to the most efficient query • Which algorithm should be used to implement an operation • How data from one operation should be passed to the next • These choices depend on database metadata • Size of relations • Number and frequency of attributes • Indexing and data file organization

  4. Query Processing query Query Compilation generating, improving and estimating query plans Query Execution algorithms to implement physical plan operators result

  5. Query Compilation query Parser Preprocessor Logical plan generator Rewriter Logical plan

  6. Query Compilation • Parsing • Construct a parse tree for a query • Generate equivalent logical query plans • Convert the parse tree to a query plan in relational algebra • Transform the plan into more efficient equivalents • Generate a physical plan • Select algorithms for each of the operators in the query • Including details about how tables are to be accessed or sorted

  7. Parsing • The parser takes an SQL query and converts it to a parse tree • A parse tree is a tree whose nodes are • Atoms – keywords, attribute names, relations, constants, operators or • Syntactic categories – families of query subparts such as a query or a condition • A node that is an atom has no children • If a node is a syntactic category it is described by one of the rules of the grammar

  8. A Simple Grammar – Queries • The syntactic category <Query> represents SQL queries • For this simple grammar there is just one rule for queries • The symbol ::= means “can be expressed as” • The query rule omits GROUP BY, HAVING and (many) other optional clauses • <Query> ::= SELECT <SelList> FROM <FromList> WHERE <Condition>

  9. Select Lists • A select list is a comma separated list of attributes • A single attribute, or • An attribute, a comma and a list of attributes • These rules do not provide for expressions, aggregations and aliases • <SelList> ::= <Attribute>, <SelList> • <SelList> ::= <Attribute>

  10. From Lists • A from list is a comma separated list of relations • These rules do not provide for joins, sub-queries and tuple variables • <FromList> ::= <Relation>, < FromList > • < FromList > ::= < Relation>

  11. Conditions • This extremely abbreviated set of rules does not include • OR, NOT and EXISTS • Comparisons not on equality or LIKE • ... • <Condition> ::= <Condition> AND <Condition> • <Condition> ::= <Attribute> IN <Query> • <Condition> ::= <Attribute> = <Attribute> • <Condition> ::= <Attribute> LIKE <Pattern>

  12. Base Syntactic Categories • There are three base syntactic categories • <Attribute>, <Relation>, and <Pattern> • These categories are not defined by rules but by which atoms they can contain • An <Attribute> can be any string of characters that identifies a legal attribute • A <Relation> can be any string of characters that identifies a legal relation

  13. Example • Consider two relations • Account = {accID, balance, ownerSIN} • Transaction = {transID, amount, date, owner_acc} • And a query ... • SELECT owner_acc, amount • FROM Transaction • WHERE owner_acc IN( • SELECT accID • FROM Account • WHERE balance = 1000000)

  14. Example Parse Tree <Query> SELECT <SelList> FROM <FromList> WHERE <Condition> <Attribute> , <SelList> <RelName> <Attribute> IN owner_acc <Attribute> Transaction owner_acc amount <Query> SELECT <SelList> FROM <FromList> WHERE <Condition> <Attribute> <RelName> <Attribute> = 1000000 accID Account balance

  15. Example • Consider the same two relations • Account = {accID, balance, ownerSIN} • Transaction = {transID, amount, date, ownerAcc} • And an equivalent query to the last one ... • SELECT ownerAcc, amount • FROM Transaction, Account • WHERE balance = 1000000 AND ownerAcc = accID

  16. Example Parse Tree <Query> SELECT <SelList> FROM <FromList> WHERE <Condition> <Attribute> , <SelList> <RelName> , <SelList> ownerAcc <Attribute> Transaction <RelName> AND amount Account <Condition> <Condition> <Attribute> = 1000000 <Attribute> = <Attribute> balance ownerAcc accID

  17. Preprocessor • The pre-processor is responsible for a number of functions • Virtual views • A relation in a query may be a virtual view • Such relations must be replaced by a parse tree that describes the view • Semantic checking • A syntactically valid query may violate rules on names

  18. Semantic Checking • Check each relation used in the FROM clause • Check and resolve attributes used • In a relation in a FROM clause of the query • All attributes must be in the correct scope • Check types • The types of attributes must be appropriate for their uses • Operators must have operands of appropriate and compatible types

  19. Parse Tree to Relational Algebra • The next stage is to transform the parse tree into a logical query plan • Convert the parse tree to relational algebra • Improve the plan • To make improvements to a plan we need to be able to compute equivalent queries

  20. Relational Algebra Review p...(s...(A) - s...(B))

  21. Relational Algebra Operators • Selection () • age > 60(Employee) returns all employees over 60 • Projection () • sin, salary(Employee) returns all SINs, and salaries • Set Operations • Union () • Intersection () • Set Difference () • Cartesian Product () • Division () • Joins ( )

  22. Union Compatibility Doctor  Patient Not union compatible

  23. Intersection sin,fName,lName(Doctor)  sin,fName,lName(Patient)

  24. Union sin,fName,lName(Doctor)  sin,fName,lName(Patient)

  25. Set Difference sin,fName,lName(Doctor)  sin,fName,lName(Patient)

  26. Cartesian Product Doctor  Patient

  27. Intermediate Relations fName,lName,description(Patient.msp = Operation.msp  age  50(Patient  Operation))

  28. Intermediate Relations fName,lName,description(Patient.msp = Operation.msp  age  50(Patient  Operation))

  29. Intermediate Relations fName,lName,description(Patient.msp = Operation.msp(age  50(Patient)  Operation))

  30. Relational Algebra Equivalencies

  31. Commutative and Associative Laws • If an operator is commutative the order of its arguments do not matter • e.g. + (x + y = y + x), but not – (x – y ≠ y – x) • If an operator is associative then two uses of it may be grouped from the left or the right • e.g. (x + y) + z = x + (y + z) • Operands of associative and commutative operators may be grouped and ordered in any way

  32. Bags and Sets • SQL queries result in bags, not sets • A bag may contain duplicates whereas sets do not • Some set-theoretic laws apply to sets but do not apply to bags • For example, the distributive law of intersection over union • A  (B  C)  (A  B)  (A  C) • Does not apply to bags

  33. Set Operations • Unions are both commutative and associative • R  S  S  R • R  (S  T)  (R  S)  T • Intersections are both commutative and associative • R  S  S  R • R  (S  T)  (R  S)  T • Set difference is neither commutative nor associative • R  S  S  R • R  (S  T)  (R  S)  T

  34. Cartesian Products and Joins • Cartesian product and joins are commutative • e.g. R S  S R • Cartesian products and joins are associative • e.g. R  (S  T)  (R  S)  T • Relations may therefore be joined in any order

  35. Join Definition • A selection and Cartesian product can be combined to form a join • c(R  S)  R c S • e.g. P.msp = O.msp(Patient  Operation)  Patient Operation • This may have an impact of the cost of a query • Some join algorithms are much more efficient than performing a Cartesian product

  36. Join Order • The order in which joins and Cartesian products are made affects the size of intermediate relations • Which, in turn, affects the time taken to process a query • Consider these three relations: • Customer = {sin, fn, ln, age} – 1,000 records • Account = {acc, type, balance} – 1,200 records • Owns = {sinfkCustomer, accfkAccount} – 1,400 records • Owns (Customer Account) • Intermediate relation – 1,000 * 1,200 = 1,200,000 records • (Owns Customer) Account • Intermediate relation – 1,400 records

  37. Selections • Selections may reduce considerably reduce the size of relations in a query • It is important to push selections as far down the query plan tree as possible • Without changing the query to one that is not equivalent

  38. Splitting Selections • Conjunctions can be split into a cascading selection • c1  c2  cn(R)  c1(c2((cn(R)))) • age>50  name=“Abe”(Patient)  age>50(name=“Abe”(Patient)) • Selections are commutative • c1(c2(R))  c2(c1(R)) • age>50(name=“Abe”(Patient))  name=“Abe” (age>50(Patient)) • Disjunctive selections can be replaced by unions • c1 c2(R)  c1(R)  c2(R) • This only works if R is a set (not a bag)

  39. Selections and Set Operations • A selection can be pushed through a union, and must be applied to both arguments • c(R  S)  c(S)  c(R) • A selection can be pushed through an intersection, and need only be applied to one argument • c(R  S)  c(S)  (R) • A selection can be pushed through a set difference, and must be applied to the first argument • c(R  S)  c(S)  (R)

  40. Selections and Cartesian Products and Joins • A selection can be pushed through a Cartesian product, and is only required in one argument • c(R  S)  c(R)  S • If the selection involves attributes of only one relation • This relationship can be stated more generally • Replace c with: cRS (with attributes of both R and S), cR(with attributes just of R) and cS (with attributes just of S): • c(R  S)  cRS(cR(R)  cS(S)) • age>50  P.msp=O.msp  desc="lobotomy"(Patient  Operation)  P.msp=O.msp(age>50(Patient)  desc="lobotomy"(Operation))

  41. Pushing Selections age>50  P.msp=O.msp  desc="lobotomy"(Patient  Operation) age>50  P.msp=O.msp  desc="lobotomy"  Patient Operation Pushing selections as far down as possible result age>50 description = “lobotomy" Patient Operation

  42. Projections • Only the final projection in a series of projections is required • a1 (R) a1(a2((an(R)))) • where ai  ai+1 • For example: • city(Patient) city(city,fName(city,fName,lName(Patient)))

  43. Projections and Set Operations • A projection can be pushed through a union, and must be applied to both arguments • a(R  S)  a(R)  a(S) • Projections can’t be pushed through an intersection or a set difference • a(R  S)  a(R)  (S) • a(R  S)  a(R)  (S)

  44. Projections and Cartesian Products • Projections can be pushed through Cartesian products • a(R  S)  aR(R)  aS(S) • Let the attribute list a be made up of aR (attributes of R), and aS (attributes of S) • e.g. P.msp,Name,lName,description,O.msp(R  S)  • msp,fName,lname(Patient)  description,msp(Operation) • In this example a selection could then be made to extract corresponding patient and operations records

  45. Projections and Joins • Projections can be pushed through joins • If the join condition attributes are all in the projection • e.g. msp,age,description(Patient Operation)  • msp,age(Patient) msp,description(Operation) • More generally • Let aR contains the attributes of R that appear in c or a, and aS contains the attributes of S that appear in c or a: • a(R c S)  a(aR(R) c aS(S)) • e.g. fName,lName,acc(AcccountC.sin = A.sin  balance > income Customer) •  • acc,balance,sin(Account) C.sin = A.sin  balance > income fName,lName,acc(sin,Name,lName,income(Customer))

  46. Selections and Projections • Selections and Projections are commutative if the selection only involves attributes of the projection • a(c(R))  c(a(R)) • e.g. msp,fName,lName(age > 50(Patient)) • is not equivalent to • age > 50(msp,fName,lName(Patient)) • In other words, don't project out attributes that are required for downstream selections!

  47. Duplicate Removal • Duplicate removal may be pushed through several operators • Selection, Cartesian product and joins • It can be moved to either or both the arguments of an intersection • It cannot be generally pushed through unions set difference or projections

  48. Grouping and Aggregation • There are a number of transformations that may be applied to queries with aggregates • Some of the transformations depend on the aggregation • The projection of attributes not included in the aggregation may be pushed down the tree • Duplicate removal may be pushed through MIN and MAX, but not SUM, or COUNT, or AVG

  49. Logical Query Plans

  50. Logical Query Plans • Once a parse tree has been constructed for a query it is turned into a logical query plan • A logical query plan consists of relational algebra operators (and relations) • Nodes and components of the parse tree are replaced by relational algebra operators • The relational algebra expression is modified • To an expression that is expected to result in an efficient physical query plan

More Related