foundations of database systems
Download
Skip this Video
Download Presentation
Foundations of Database Systems

Loading in 2 Seconds...

play fullscreen
1 / 46

Foundations of Database Systems - PowerPoint PPT Presentation


  • 92 Views
  • Uploaded on

Foundations of Database Systems. Indexing Instructor: Zhijun Wang. DBMS and Applications. 1. Announcements. Quiz#2 will be given next week. DBMS and Applications. 2. Physical Database Design . DBMS and Applications. 3. File structures.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about ' Foundations of Database Systems' - shiri


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
foundations of database systems

Foundations of Database Systems

Indexing

Instructor: Zhijun Wang

DBMS and Applications

1

announcements

Announcements

Quiz#2 will be given next week

DBMS and Applications

2

physical database design
Physical Database Design

DBMS and Applications

3

file structures
File structures

Selecting among alternative file structures is one of the most important choices in physical database design.

In order to choose intelligently, you must understand characteristics of available file structures.

DBMS and Applications

4

sequential files
Sequential Files

Simplest kind of file structure

Unordered: insertion order

Ordered: key order

Simple to maintain

Provide good performance for processing large numbers of records

DBMS and Applications

5

unordered sequential file
Unordered Sequential File

DBMS and Applications

6

ordered sequential file
Ordered Sequential File

DBMS and Applications

7

indexes
Indexes

Indexes are special data structures used to improve database performance

SQL Server automatically creates an index on all primary and foreign keys

Additional indexes may be assigned on other columns that are:

Frequently used in WHERE clauses

Used for sorting data

DBMS and Applications

8

indexes1
Indexes

SQL Server supports two kinds of indexes:

Clustered index: the data are stored in the bottom level of the index and in the same order as that index

Nonclustered index: the bottom level of an index contains pointers to the data

Clustered indexes are faster than nonclustered indexes for updating and retrieval

DBMS and Applications

9

hash files
Hash Files

Support fast access unique key value

Converts a key value into a physical record address

Mod function: typical hash function

Divisor: large prime number close to the file capacity

Physical record number: hash function plus the starting physical record number

DBMS and Applications

11

example hash function calculations for stdssn key
Example: Hash Function Calculations for StdSSN Key

Assume the file capacity is 100, 97 is the biggest prime number

less than to 100.

Physical

Record

DBMS and Applications

12

hash file after insertions
Hash File after Insertions

DBMS and Applications

13

multi way tree btrees files
Multi-Way Tree (Btrees) Files

A popular file structure supported by most DBMSs.

Btree provides good performance on both sequential search and key search.

DBMS and Applications

15

properties of btrees
Properties of Btrees

Balanced: all leaf nodes reside on the same level

Bushy: the number of branches is large, 50-200

Block-oriented: each node in a tree is a block

Dynamic: the shape of a Btree is changed as logic records are inserted and deleted

Ubiquitous: the Btree is a widely implemented and used file strcture.

DBMS and Applications

16

structure of a btree of height 3
Structure of a Btree of Height 3

DBMS and Applications

17

btree insertion examples
Btree Insertion Examples

DBMS and Applications

19

btree deletion examples
Btree Deletion Examples

DBMS and Applications

20

b tree the most widely used index
B+ Tree: The Most Widely-used Index

Index Entries

(Direct search)

Data Entries

("Sequence set")

  • Supports equality and range-searches efficiently.
  • Minimum 50% occupancy (except for root).
    • Each node contains d <= m <= 2d entries.
    • The parameter d is called the order of the tree.

DBMS and Applications

21

slide22

Root

33

24

14

19

39*

22*

24*

27*

38*

3*

5*

19*

20*

29*

33*

34*

2*

7*

14*

16*

An Example of a B+ Tree

  • Search begins at root, and key comparisons direct it to a leaf.
  • Search for 5*, 15*, all data entries >= 24* ...
  • Based on the search for 15* in the appropriate leaf, we know it is not in the tree!

DBMS and Applications

22

inserting a data entry into a b tree
Inserting a Data Entry into a B+ Tree
  • Find correct leaf L.
  • Put data entry onto L.
    • If L has enough space, done!
    • Else, must splitL (into L and a new node L2)
      • Redistribute entries evenly, copy upmiddle key.
      • Insert index entry (into parent of L) pointing to L2.
  • This can happen recursively
      • To split index node, redistribute entries evenly, but push upmiddle key. (Contrast with leaf splits.)
  • Splits “grow” tree; root split increases height.
    • Tree growth: gets wider or one level taller at top.

DBMS and Applications

23

slide24

Entry to be inserted in parent node.

Entry to be inserted in parent node.

(Note that 19 is pushed up and

Only appears once in the index.

Contrast this with a leaf split.)

(Note that 5 is

s copied up and

5

continues to appear in the leaf.)

19

5*

3*

7*

2*

8*

24

33

5

14

Inserting 8* into Example B+ Tree

  • Observe how minimum occupancy is guaranteed in both leaf and index pg splits.
  • Note difference between copy-upand push-up; be sure you understand the reasons for this.

DBMS and Applications

24

slide25

Root

19

24

33

5

14

39*

2*

3*

19*

20*

22*

24*

27*

38*

5*

7*

8*

29*

33*

34*

14*

16*

Example B+ Tree After Inserting 8*

  • Notice that root was split, leading to increase in height.
  • In this example, we can avoid split by re-distributing entries;
    • however, this is usually not done in practice.

DBMS and Applications

25

slide26

Deleting a Data Entry from a B+ Tree

  • Start at root, find leaf L where entry belongs.
  • Remove the entry.
    • If L is at least half-full, done!
    • If L has only d-1 entries,
      • Try to re-distribute, borrowing from sibling (adjacent node with same parent as L).
      • If re-distribution fails, mergeL and sibling.
  • If merge occurred, must delete entry (pointing to L or sibling) from parent of L.
  • Merge could propagate to root, decreasing height.

DBMS and Applications

26

slide27

Root

19

27

33

5

14

39*

2*

3*

19*

24*

27*

29*

38*

5*

7*

8*

33*

34*

14*

16*

Example Tree After (Inserting 8*, Then) Deleting 20* and 22* ...

  • Deleting 20* is easy.
  • Deleting 22* is done with re-distribution.
    • Notice how middle key (27) is copied up.

DBMS and Applications

27

and then deleting 24
... And Then Deleting 24*

Root

5

14

19

33

33

39*

22*

27*

38*

29*

33*

34*

39*

3*

19*

38*

2*

5*

7*

8*

27*

33*

34*

29*

14*

16*

  • Must merge leaf nodes.
  • Observe ‘toss’of index entry 27 (on right), and ‘pull down’of index entry 19 (below).

DBMS and Applications

28

example of non leaf re distribution
Example of Non-leaf Re-distribution

2*

3*

39*

5*

7*

8*

38*

17*

18*

20*

21*

22*

27*

29*

33*

34*

14*

16*

Root

22

33

17

20

5

14

  • Tree is shown below during deletion of 24*.
  • In contrast to previous example, can re-distribute entry from left child of root to right child.

DBMS and Applications

29

after re distribution
After Re-distribution

Root

17

22

33

5

13

20

2*

3*

39*

5*

7*

8*

38*

17*

18*

20*

21*

22*

27*

29*

33*

34*

14*

16*

  • Intuitively, entries are re-distributed by ‘pushingthrough’the splitting entry in the parent node.
  • It suffices to re-distribute index entry with key 20; we’ve re-distributed 17 as well for illustration.

DBMS and Applications

30

cost of operations
Cost of Operations

The height of Btree dominates the number of physical record accesses operation.

Logarithmic search cost

Upper bound of height: log function’

Log base: minimum number of keys in a node

The cost to insert a key = [the cost to locate the nearest key] + [the cost to change nodes].

DBMS and Applications

31

b tree
B+Tree

Provides improved performance on sequential and range searches.

In a B+tree, all keys are redundantly stored in the leaf nodes.

To ensure that physical records are not replaced, the B+tree variation is usually implemented.

DBMS and Applications

32

index matching
Index Matching

Determining usage of an index for a query

Complexity of condition determines match.

Single column indexes: =, <, >, <=, >=, IN <list of values>, BETWEEN, IS NULL, LIKE ‘Pattern’ (meta character not the first symbol)

Composite indexes: more complex and restrictive rules

DBMS and Applications

33

bitmap index
Bitmap Index

Can be useful for stable columns with few values

Bitmap:

String of bits: 0 (no match) or 1 (match)

One bit for each row

Bitmap index record

Column value

Bitmap

DBMS converts bit position into row identifier.

DBMS and Applications

34

bitmap index example
Bitmap Index Example

Bitmap Index on FacRank

Faculty Table

DBMS and Applications

35

bitmap join index
Bitmap Join Index

Bitmap identifies rows of a related table.

Represents a precomputed join

Can define for a join column or a non-join column

Typically used in query dominated environments such as data warehouses

DBMS and Applications

36

summary of file structures
Summary of File Structures

DBMS and Applications

37

index selection
Index Selection

Most important decision

Difficult decision

Choice of clustered and nonclustered indexes

DBMS and Applications

38

clustering index example
Clustering Index Example

DBMS and Applications

39

nonclustering index example
Nonclustering Index Example

DBMS and Applications

40

trade offs in index selection
Trade-offs in Index Selection

Balance retrieval against update performance

Nonclustering index usage:

Few rows satisfy the condition in the query

Join column usage if a small number of rows result in child table

Clustering index usage:

Larger number of rows satisfy a condition than for nonclustering index

Use in sort merge join algorithm to avoid sorting

More expensive to maintain

DBMS and Applications

42

difficulties of index selection
Difficulties of Index Selection

Application weights are difficult to specify.

Distribution of parameter values needed

Behavior of the query optimization component must be known.

The number of choices is large.

Index choices can be interrelated.

DBMS and Applications

43

selection rules
Selection Rules

Rule 1: A primary key is a good candidate for a clustering index.

Rule 2: To support joins, consider indexes on foreign keys.

Rule 3: A column with many values may be a good choice for a non-clustering index if it is used in equality conditions.

Rule 4: A column used in highly selective range conditions is a good candidate for a non-clustering index.

DBMS and Applications

44

selection rules1
Selection Rules

Rule 5: A frequently updated column is not a good index candidate.

Rule 6: Volatile tables (lots of insertions and deletions) should not have many indexes.

Rule 7: Stable columns with few values are good candidates for bitmap indexes if the columns appear in WHERE conditions.

Rule 8: Avoid indexes on combinations of columns. Most optimization components can use multiple indexes on the same table.

DBMS and Applications

45

index creation
Index Creation

To create the indexes, the CREATE INDEX statement can be used.

The word following the INDEX keyword is the name of the index.

CREATE INDEX is not part of SQL:1999.

Example:

DBMS and Applications

46

ad