Semantic query optimization techniques
Download
1 / 31

Semantic Query Optimization Techniques - PowerPoint PPT Presentation


  • 87 Views
  • Uploaded on

Semantic Query Optimization Techniques. November 16, 2005. By : Mladen Kovacevic. Background. 1980's, semantic information stored in dbs as integrity constraints could be used for query optimization semantic : “of or relating to meaning or the study of meaning”

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about ' Semantic Query Optimization Techniques' - elijah-donovan


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Semantic query optimization techniques

Semantic Query Optimization Techniques

November 16, 2005

By : Mladen Kovacevic


Background
Background

  • 1980's, semantic information stored in dbs as integrity constraints could be used for query optimization

  • semantic: “of or relating to meaning or the study of meaning”

    (http://wordnet.princeton.edu)

  • integrity : preserve data consistency when changes made in db.

  • no extensive implementation existing today (1999)


Introduction
Introduction

  • Key factor in relational database system’s improvement in query execution time, is query optimization.

  • Query execution can be improved by:

    • Analyzing integrity information, and rewriting queries exploiting this information (JE & PI)

    • Avoid expensive sorting costs (Order Optimization)

    • Exploiting uniqueness by knowing rows will be unique, thus, avoiding extra sorts. (EU)


Presentation overview
Presentation Overview

  • Semantic Query Optimization techniques

    • Join Elimination (JE)

    • Predicate Introduction (PI)

    • Order Optimization (OO)

    • Exploiting Uniqueness (EU)


Some motivation
Some Motivation

  • Describing two techniques in SQO, demonstrated in DB2 UDB.

    • Predicate Introduction

    • Join Elimination

  • Reasons:

    • rewriting queries by hand showed that these two provided consistent optimization.

    • practical to implement

    • extendible to other DBMS’s.

  • Data sets used : TPC-D and APB-1 OLAP benchmarks

    onlyREFERENTIAL INTEGRITY constraints and CHECK CONSTRAINTS used!


Semantic query optimization sqo techniques
Semantic Query Optimization (SQO) Techniques

  • Join Elimination: Some joins need NOT be evaluated since the result may be known apriori (more on this later)

  • Join Introduction: Adding a join can help if relation is small compared to original relations and highly selective.

  • Predicate Elimination : If predicate known to be always true, can be eliminated from query (DISTINCT clause on Primary Key – Uniqueness exploitation!)

  • Predicate Introduction: New predicates on indexed attributes can result in a much faster access plan.

  • Detecting the Empty Answer Set : If query predicates inconsistent with integrity constraints, the query does not have answer.


Why sqo implementations not used
Why SQO implementations not used?

  • Deductive Databases : Many cases SQO techniques were designed for deductive databases, thus not appearing to be useful in relational database context.

  • CPU & I/O Speeds similar : When being developed, CPU & I/O speeds were not as dramatically different

    • (savings in I/O not worth the CPU time added)

  • Lack of Integrity Constraints : Thought that many integrity constraints are needed for SQO to be useful


Two stage optimizer
Two-stage Optimizer

  • Examples of SQO techniques always designed for a two-stage optimizer

    • Stage 1 : logically equivalent queries created (DB2’s query rewrite optimization)

    • Stage 2 : generate plans of all these queries, choosing the one with lowest estimated cost. (DB2’s query plan optimization)

      • Join order, join methods, join site in a distributed database, method for accessing input table, etc.


Join elimination
Join Elimination

  • Simple : Eliminate relation where join is over tables related through referential integrity constraint, and primary key table referenced only in the join

VIEW DEFINITION

CREATE VIEW Supplier_Info (n, a, c) as

SELECT s_name, s_address, n_name

FROM tpcd.supplier, tpcd.nation

WHERE s_nationkey = n_nationkey

QUERY

SELECT s_n, s_a

FROM Supplier_Info


Join elimination con t
Join Elimination (con’t)

  • Query can be rewritten internally as:

SELECT s_n, s_a

FROM tpcd.supplier

  • Why do such a simple rewrite?

    • User may not have access to the supplier table,

    • and/or may only know about the view.

    • Sometimes GUI managers create these “dumb”

    • queries so need to optimize

    • Non-programmers write queries often, and may

  • not even think about this.

  • Algorithm for generic redundant join removal

  • provided in paper.


  • Example join elimination
    Example – Join Elimination

    SELECT p_name, p_retailprice, s_name, s_address

    FROM tpcd.lineitem, tpcd.partsupp,

    tpcd.part, tpcd.supplier

    WHERE p_partkey = ps_partkey and

    s_suppkey = ps_suppkey and

    ps_partkey = l_partkey and

    ps_suppkey = l_suppkey and

    l_shipdate between '1994-01-01' and

    '1996-06-30' and l_discount >= 0.1

    GROUP BY p_name, p_retailprice, s_name, s_address

    ORDER BY p_name, s_name

    PART

    PARTKEY

    PARTSUPP

    LINEITEM

    PARTKEY

    PARTKEY

    SUPPKEY

    SUPPKEY

    SUPPLIER

    SUPPKEY

    1 – many relationship


    Example join elimination1
    Example : Join Elimination

    • Any immediate improvements that can be seen here?

    p_partkey = ps_partkey and

    s_suppkey = ps_suppkey and

    ps_partkey = l_partkey and

    ps_suppkey = l_suppkey

    PS_PARTKEY = L_PARTKEY

    P_PARTKEY

    PS_PARTKEY

    L_PARTKEY

    PS_PARTKEY = L_PARTKEY

    P_PARTKEY = PS_PARTKEY

    S_SUPPKEY

    PS_SUPPKEY

    L_SUPPKEY

    PS_SUPPKEY = L_SUPPKEY

    S_SUPPKEY = PS_SUPPKEY

    S_SUPPKEY = L_SUPPKEY


    Results
    Results

    • 100 MB db size

    • Execution Time : 58.5 sec -> 38.25 sec (35 % improvement)

    • I/O Cost: 4631 -> 1498 page reads (67 % improvement)


    Results olap environment
    Results – OLAP Environment

    • In OLAP (online analytical processing) servers, using a star schema (one fact table, with several dimension tables) improvements ranged from 2% to 96 %.

      • In these cases, much improvement came from CPU cost instead of I/O, because dimension tables were small enough to fit into memory...


    Predicate introduction
    Predicate Introduction

    • Techniques discussed :

      • Index Introduction : add new predicate on attribute if index exists on that attribute.

        • Assumption : index retrieval is better than table scan, is this always good?

    • Scan Reduction : reduce number of tuples that qualify for a join.

      • Problem : Not very common; unlikely that there will be any check constraints or predicates with inequalities about join columns

  • Detecting empty query answer set (not shown as query execution time essentially 0)


  • Example predicate introduction
    Example - Predicate Introduction

    SELECT sum(l_extendedprice * l_discount) as revenue

    FROM tpcd.lineitem

    WHERE l_shipdate >= date(‘1994-01-01’) and

    l_shipdate < date(‘1994-01-01’)+ 1 year and

    l_discount between .06 – 0.01 and

    .06 + 0.01 and l_quantity < 24;

    Check Constraint : l_shipdate <= l_receiptdate

    Index : l_receiptdate

    • Maintaining semantics, we can add :

      • l_receiptdate >= date(‘1994-01-01’)


    Example predicate introduction1
    Example - Predicate Introduction

    SELECT sum(l_extendedprice * l_discount) as revenue

    FROM tpcd.lineitem

    WHERE l_shipdate >= date(‘1994-01-01’) and

    l_shipdate < date(‘1994-01-01’)+ 1 year and

    l_receiptdate >= date(‘1994-01-01’) and

    l_discount between .06 – 0.01 and

    .06 + 0.01 and l_quantity < 24;

    Check Constraint : l_shipdate <= l_receiptdate

    Index : l_receiptdate

    • Maintaining semantics, we can add :

      • l_receiptdate >= date(‘1994-01-01’)

    • Why would we want to do this? In order to have optimizer choose a plan using the index. Is this always good?

    • NO! What if most of the rows in the table need to be returned? We should use a tablescan instead.


    Predicate introduction algorithm
    Predicate Introduction - Algorithm

    • Input : set of all check constraints defined for a database and the set of all predicates in query

    • Output: set of all non-redundant formulas derivable from the input set. This answer set can then be added to the query, but only a few are potentially useful.

      • The goal in the paper was to choose additions that would guarantee improvement.

  • Conditions in paper: Conservative approach of introducing predicates that will have the plan optimizer use an index. Insist on only one index available with the query predicate.




  • Why longer execution for p3 p5
    Why Longer Execution for P3/P5?

    • P2 and P3 are the same except for the following

      P2 :

      SELECT ...

      FROM ...

      WHERE l_shipdate >= date ('1998-09-01') and

      l_shipdate < date ('1998-09-01') + 1 month

      P3 :

      SELECT ...

      FROM ...

      WHERE l_shipdate >= date ('1995-09-01') and

      l_shipdate < date ('1995-09-01') + 1 month

    • Difference in table shows that P2 has 2 % of the tuples falling in the range while P3 has 48 % of the tuples fall in the category : BOTH plans will choose index scan! P3 is so large that tablescan is better in this case.

    • Cost model underestimates cost of locking/unlocking index pages

    • Estimated number of tuples goes down because of the reduction factor problem (multiply in the new predicate added)


    Adjustments for reduction factor problem
    Adjustments for Reduction Factor Problem

    • Add new predicate only when it contains a major column of an index and a scan of that index is sufficient to answer the query (thus, no table scan necessary)

    • Original Index : <receiptdate, discount, quantity, extendedprice>

    • New Index : <receiptdate, discount, quantity, extendedprice, shipdate, partkey, suppkey, orderkey>


    Order optimization techniques
    Order Optimization Techniques

    • Access plan strategies exploit the physical orderings provided either by indexes or sorting

    • GOAL: optimize the sorting strategy

    • Techniques

      • Pushing down sorts in joins

      • Minimizing the number of sorting columns

      • Detecting when sorting can be avoided because of predicates, keys or indexes

    • Order Optimization : detecting when indexes provide an interesting order, so that sorting can be either avoided, and used as sparingly as possible.

    • Interesting Orders : when the side effect of a join produces rows in sorted order, which can be taken advantage of later (if another join needed, ORDER BY, GROUP BY, DISTINCT)


    Fundamental operators
    Fundamental Operators

    • Order optimization requires the following operations

      • Reduce Order

      • Test Order

      • Cover Order

      • Homogenize Order



    Exploiting uniqueness
    Exploiting Uniqueness

    • Checking to see if query contains unnecessary DISTINCT clauses

      • How does this make improvements?

        • Removing duplicates is performed by SORTING, a costly operation.

    • Example is removing DISTINCT keyword from query if it is applied onto the primary key itself (since primary keys are, by definition, distinct)


    How to exploit uniqueness
    How to exploit uniqueness?

    • Using knowledge about:

      • Keys

      • Table Constraints

      • Query Predicates

    • Cannot always be tested efficiently, so we look for a sufficient solution.


    Summary
    Summary

    • Important Outcome : experimental evidence showing SQO can provide effective enhancement to the traditional query optimization.

      • Join Elimination : geared towards OLAP environment (where very useful)

        • Independent on existence of complex integrity constraint – semantic reasoning used about referential integrity constraints

        • Easy to implement and execute

    • Predicate Introduction : guaranteeing improvements more difficult, needing rather severe restrictions imposed (limits the applicability of this approach)

    • Order Optimization : utilizing functional dependencies and table information, we use it in creating a “smart” access plan, avoiding or optimizing sort operations.

    • Exploiting Uniqueness : uniqueness is powerful when it reduces the number of expensive sorts. Discovering true ways of exploiting this technique are quite tricky and specific.


    References
    References

    • Qi Cheng, Jarek Gryz, Fred Koo, et al: Implementation of Two Semantic Query Optimization Techniques in DB2 Universal Database. Proceedings of the 25th VLDB Conference, Edinburg, Scotland,1999.

    • David E. Simmen, Eugene J. Shekita, Timothy Malkemus: FundamentalTechniques for Order Optimization. SIGMOD Conference 1996: 57-67

    • G. N. Paulley, Per-ke Larson: Exploiting Uniqueness in Query Optimization. ICDE 1994: 68-79



    ad