Foundations of database systems
Download
1 / 46

Foundations of Database Systems - PowerPoint PPT Presentation


  • 92 Views
  • Uploaded on

Foundations of Database Systems. Indexing Instructor: Zhijun Wang. DBMS and Applications. 1. Announcements. Quiz#2 will be given next week. DBMS and Applications. 2. Physical Database Design . DBMS and Applications. 3. File structures.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about ' Foundations of Database Systems' - shiri


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Foundations of database systems

Foundations of Database Systems

Indexing

Instructor: Zhijun Wang

DBMS and Applications

1


Announcements

Announcements

Quiz#2 will be given next week

DBMS and Applications

2


Physical database design
Physical Database Design

DBMS and Applications

3


File structures
File structures

Selecting among alternative file structures is one of the most important choices in physical database design.

In order to choose intelligently, you must understand characteristics of available file structures.

DBMS and Applications

4


Sequential files
Sequential Files

Simplest kind of file structure

Unordered: insertion order

Ordered: key order

Simple to maintain

Provide good performance for processing large numbers of records

DBMS and Applications

5


Unordered sequential file
Unordered Sequential File

DBMS and Applications

6


Ordered sequential file
Ordered Sequential File

DBMS and Applications

7


Indexes
Indexes

Indexes are special data structures used to improve database performance

SQL Server automatically creates an index on all primary and foreign keys

Additional indexes may be assigned on other columns that are:

Frequently used in WHERE clauses

Used for sorting data

DBMS and Applications

8


Indexes1
Indexes

SQL Server supports two kinds of indexes:

Clustered index: the data are stored in the bottom level of the index and in the same order as that index

Nonclustered index: the bottom level of an index contains pointers to the data

Clustered indexes are faster than nonclustered indexes for updating and retrieval

DBMS and Applications

9


Creating an index by gui in enterprise manager
Creating an Index: By GUI in Enterprise Manager

DBMS and Applications

10


Hash files
Hash Files

Support fast access unique key value

Converts a key value into a physical record address

Mod function: typical hash function

Divisor: large prime number close to the file capacity

Physical record number: hash function plus the starting physical record number

DBMS and Applications

11


Example hash function calculations for stdssn key
Example: Hash Function Calculations for StdSSN Key

Assume the file capacity is 100, 97 is the biggest prime number

less than to 100.

Physical

Record

DBMS and Applications

12


Hash file after insertions
Hash File after Insertions

DBMS and Applications

13



Multi way tree btrees files
Multi-Way Tree (Btrees) Files

A popular file structure supported by most DBMSs.

Btree provides good performance on both sequential search and key search.

DBMS and Applications

15


Properties of btrees
Properties of Btrees

Balanced: all leaf nodes reside on the same level

Bushy: the number of branches is large, 50-200

Block-oriented: each node in a tree is a block

Dynamic: the shape of a Btree is changed as logic records are inserted and deleted

Ubiquitous: the Btree is a widely implemented and used file strcture.

DBMS and Applications

16


Structure of a btree of height 3
Structure of a Btree of Height 3

DBMS and Applications

17


Btree node containing keys and pointers
Btree Node Containing Keys and Pointers

DBMS and Applications

18


Btree insertion examples
Btree Insertion Examples

DBMS and Applications

19


Btree deletion examples
Btree Deletion Examples

DBMS and Applications

20


B tree the most widely used index
B+ Tree: The Most Widely-used Index

Index Entries

(Direct search)

Data Entries

("Sequence set")

  • Supports equality and range-searches efficiently.

  • Minimum 50% occupancy (except for root).

    • Each node contains d <= m <= 2d entries.

    • The parameter d is called the order of the tree.

DBMS and Applications

21


Root

33

24

14

19

39*

22*

24*

27*

38*

3*

5*

19*

20*

29*

33*

34*

2*

7*

14*

16*

An Example of a B+ Tree

  • Search begins at root, and key comparisons direct it to a leaf.

  • Search for 5*, 15*, all data entries >= 24* ...

  • Based on the search for 15* in the appropriate leaf, we know it is not in the tree!

DBMS and Applications

22


Inserting a data entry into a b tree
Inserting a Data Entry into a B+ Tree

  • Find correct leaf L.

  • Put data entry onto L.

    • If L has enough space, done!

    • Else, must splitL (into L and a new node L2)

      • Redistribute entries evenly, copy upmiddle key.

      • Insert index entry (into parent of L) pointing to L2.

  • This can happen recursively

    • To split index node, redistribute entries evenly, but push upmiddle key. (Contrast with leaf splits.)

  • Splits “grow” tree; root split increases height.

    • Tree growth: gets wider or one level taller at top.

  • DBMS and Applications

    23


    Entry to be inserted in parent node.

    Entry to be inserted in parent node.

    (Note that 19 is pushed up and

    Only appears once in the index.

    Contrast this with a leaf split.)

    (Note that 5 is

    s copied up and

    5

    continues to appear in the leaf.)

    19

    5*

    3*

    7*

    2*

    8*

    24

    33

    5

    14

    Inserting 8* into Example B+ Tree

    • Observe how minimum occupancy is guaranteed in both leaf and index pg splits.

    • Note difference between copy-upand push-up; be sure you understand the reasons for this.

    DBMS and Applications

    24


    Root

    19

    24

    33

    5

    14

    39*

    2*

    3*

    19*

    20*

    22*

    24*

    27*

    38*

    5*

    7*

    8*

    29*

    33*

    34*

    14*

    16*

    Example B+ Tree After Inserting 8*

    • Notice that root was split, leading to increase in height.

    • In this example, we can avoid split by re-distributing entries;

      • however, this is usually not done in practice.

    DBMS and Applications

    25


    Deleting a Data Entry from a B+ Tree

    • Start at root, find leaf L where entry belongs.

    • Remove the entry.

      • If L is at least half-full, done!

      • If L has only d-1 entries,

        • Try to re-distribute, borrowing from sibling (adjacent node with same parent as L).

        • If re-distribution fails, mergeL and sibling.

    • If merge occurred, must delete entry (pointing to L or sibling) from parent of L.

    • Merge could propagate to root, decreasing height.

    DBMS and Applications

    26


    Root

    19

    27

    33

    5

    14

    39*

    2*

    3*

    19*

    24*

    27*

    29*

    38*

    5*

    7*

    8*

    33*

    34*

    14*

    16*

    Example Tree After (Inserting 8*, Then) Deleting 20* and 22* ...

    • Deleting 20* is easy.

    • Deleting 22* is done with re-distribution.

      • Notice how middle key (27) is copied up.

    DBMS and Applications

    27


    And then deleting 24
    ... And Then Deleting 24*

    Root

    5

    14

    19

    33

    33

    39*

    22*

    27*

    38*

    29*

    33*

    34*

    39*

    3*

    19*

    38*

    2*

    5*

    7*

    8*

    27*

    33*

    34*

    29*

    14*

    16*

    • Must merge leaf nodes.

    • Observe ‘toss’of index entry 27 (on right), and ‘pull down’of index entry 19 (below).

    DBMS and Applications

    28


    Example of non leaf re distribution
    Example of Non-leaf Re-distribution

    2*

    3*

    39*

    5*

    7*

    8*

    38*

    17*

    18*

    20*

    21*

    22*

    27*

    29*

    33*

    34*

    14*

    16*

    Root

    22

    33

    17

    20

    5

    14

    • Tree is shown below during deletion of 24*.

    • In contrast to previous example, can re-distribute entry from left child of root to right child.

    DBMS and Applications

    29


    After re distribution
    After Re-distribution

    Root

    17

    22

    33

    5

    13

    20

    2*

    3*

    39*

    5*

    7*

    8*

    38*

    17*

    18*

    20*

    21*

    22*

    27*

    29*

    33*

    34*

    14*

    16*

    • Intuitively, entries are re-distributed by ‘pushingthrough’the splitting entry in the parent node.

    • It suffices to re-distribute index entry with key 20; we’ve re-distributed 17 as well for illustration.

    DBMS and Applications

    30


    Cost of operations
    Cost of Operations

    The height of Btree dominates the number of physical record accesses operation.

    Logarithmic search cost

    Upper bound of height: log function’

    Log base: minimum number of keys in a node

    The cost to insert a key = [the cost to locate the nearest key] + [the cost to change nodes].

    DBMS and Applications

    31


    B tree
    B+Tree

    Provides improved performance on sequential and range searches.

    In a B+tree, all keys are redundantly stored in the leaf nodes.

    To ensure that physical records are not replaced, the B+tree variation is usually implemented.

    DBMS and Applications

    32


    Index matching
    Index Matching

    Determining usage of an index for a query

    Complexity of condition determines match.

    Single column indexes: =, <, >, <=, >=, IN <list of values>, BETWEEN, IS NULL, LIKE ‘Pattern’ (meta character not the first symbol)

    Composite indexes: more complex and restrictive rules

    DBMS and Applications

    33


    Bitmap index
    Bitmap Index

    Can be useful for stable columns with few values

    Bitmap:

    String of bits: 0 (no match) or 1 (match)

    One bit for each row

    Bitmap index record

    Column value

    Bitmap

    DBMS converts bit position into row identifier.

    DBMS and Applications

    34


    Bitmap index example
    Bitmap Index Example

    Bitmap Index on FacRank

    Faculty Table

    DBMS and Applications

    35


    Bitmap join index
    Bitmap Join Index

    Bitmap identifies rows of a related table.

    Represents a precomputed join

    Can define for a join column or a non-join column

    Typically used in query dominated environments such as data warehouses

    DBMS and Applications

    36


    Summary of file structures
    Summary of File Structures

    DBMS and Applications

    37


    Index selection
    Index Selection

    Most important decision

    Difficult decision

    Choice of clustered and nonclustered indexes

    DBMS and Applications

    38


    Clustering index example
    Clustering Index Example

    DBMS and Applications

    39


    Nonclustering index example
    Nonclustering Index Example

    DBMS and Applications

    40



    Trade offs in index selection
    Trade-offs in Index Selection

    Balance retrieval against update performance

    Nonclustering index usage:

    Few rows satisfy the condition in the query

    Join column usage if a small number of rows result in child table

    Clustering index usage:

    Larger number of rows satisfy a condition than for nonclustering index

    Use in sort merge join algorithm to avoid sorting

    More expensive to maintain

    DBMS and Applications

    42


    Difficulties of index selection
    Difficulties of Index Selection

    Application weights are difficult to specify.

    Distribution of parameter values needed

    Behavior of the query optimization component must be known.

    The number of choices is large.

    Index choices can be interrelated.

    DBMS and Applications

    43


    Selection rules
    Selection Rules

    Rule 1: A primary key is a good candidate for a clustering index.

    Rule 2: To support joins, consider indexes on foreign keys.

    Rule 3: A column with many values may be a good choice for a non-clustering index if it is used in equality conditions.

    Rule 4: A column used in highly selective range conditions is a good candidate for a non-clustering index.

    DBMS and Applications

    44


    Selection rules1
    Selection Rules

    Rule 5: A frequently updated column is not a good index candidate.

    Rule 6: Volatile tables (lots of insertions and deletions) should not have many indexes.

    Rule 7: Stable columns with few values are good candidates for bitmap indexes if the columns appear in WHERE conditions.

    Rule 8: Avoid indexes on combinations of columns. Most optimization components can use multiple indexes on the same table.

    DBMS and Applications

    45


    Index creation
    Index Creation

    To create the indexes, the CREATE INDEX statement can be used.

    The word following the INDEX keyword is the name of the index.

    CREATE INDEX is not part of SQL:1999.

    Example:

    DBMS and Applications

    46


    ad