- By
**xia** - Follow User

- 392 Views
- Uploaded on

Download Presentation
## PowerPoint Slideshow about 'Chapter 9 Multilevel Indexing and B-Trees' - xia

Download Now**An Image/Link below is provided (as is) to download presentation**

Download Now

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript

### Chapter 9Multilevel Indexing and B-Trees

CIS 402:File Management Techniques

9.1 Introduction : Invention of the B-Tree

Introduction: The Invention of the B-tree- 1972 Acta Infomatica : R. Bayer and E. McCreight (at Boeing Corporation) “Organization and Maintenance of Large Ordered Indexes”
- 1979 : ‘de facto’ standard for database index
- D.Comer “The Ubiquitous B-tree” ACM Computing Survey
- Why the name B-tree?
- Balanced, Bushy, Broad, Boeing, Bayer
- Retrieval, Insertion, Deletion time

= log KI ( I : no of indexes in file, K : no of indexes in a page)

- Excellent for dynamically changing random access files

Statement of the Problem

- Problems in an index on secondary storage
- Searching the index must be faster than binary searching
- In binary search:

15 items - 4 seeks, 1,000 items - 9.5 seeks

- Insertion and deletion must be as fast as search
- inserting a key may involve moving many other keys in some file structures

9.3 Indexing with Binary Search Trees

Binary Search Tree- Advantages
- Data may not be physically sorted
- Good performance on balanced tree
- Insert cost = search cost
- Disadvantages
- In out-of-balance binary tree, more seeks are required

9.3 Indexing with Binary Search Trees

At most 4 seeks/one record

KF

Binary search tree

representation

FB

SD

HN

WS

CL

PA

AX

DE

FT

JD

NR

RF

TK

YJ

Balanced Binary Search Tree- Sorted list of keys
- AX, CL, DE, FB, FT, HN, JD, KF, NR, PA, RF, SD, TK, YJ

9.3 Indexing with Binary Search Trees

KF

FB

SD

HN

CL

WS

PA

DE

FT

JD

AX

NR

RF

TK

LV

LA

NP

MB

ND

NK

Unbalanced Binary Tree- At most 9 seeks/one record

YJ

- Worst case : sequential search

9.3 Indexing with Binary Search Trees

Paged Binary Tree- Page
- A unit of disk I/O for handling seek and transfer of disk data
- Typically, 4k, 8k, 16k ...
- Paged Binary Tree
- Divide a binary tree into pages and then store each page in a block of contiguous locations on disk.
- If every page holds 7 keys, 511 nodes(keys) in only three seeks
- Performance : # of seeks
- A completely full balanced tree : log2 (N+1)
- A completely full paged tree : log(k+1) (N+1)
- (k : # of keys hold in a single page)

9.3 Indexing with Binary Search Trees

Paged Binary Tree

Each leaf below itself points to a page 64 more pages, 512 total keys

9.3 Indexing with Binary Search Trees

The Problem with Paged Trees- Only valid when we have the entire set of keys in hand before the tree is built
- Main Problem: Easily out of balance
- How to select a good separator
- How to group keys
- How to guarantee the maximum loading
- B-tree provides a solution for above problems!

9.3 Indexing with Binary Search Trees

random input sequence :C S D T A M P I B W N G U R K E H O L J Y Q Z F X V

D

C

S

A

M

U

I

T

W

P

B

G

K

N

R

V

Y

E

H

J

L

O

Q

X

Z

F

Paged Binary Tree (Out of balance)

9.4 Multilevel Indexing : A Better Approach to Tree Indexes

Multilevel Indexing- Approach as simple index record
- limited on the number of keys allowed
- Approach as multirecord index
- consists of a sequence of simple index records
- binary search is too expensive
- Approach as multilevel index
- reduced the number of records to be searched
- speed up the search

<example>

80Mbytes file of 8,000,000 records

10-byte keys

9.4 Multilevel Indexing : A Better Approach to Tree Indexes

Example of Multilevel Indexing4th level index

1

a single index record with 8 keys

1 2 8

3rd level index

1

2

:

:

8

1 2 . . . 100

:

:

801 800

8 index records to index the largest keys

in the 800 second-level records

2nd level index

1

2

:

:

9

:

:

:

800

1 2 . . . 100

:

:

901 1000

:

:

7901 8000

800 index records with 80,000 keys

choose one of the keys in each index record as the

key of that whole record

Overhead: 809 records to index 80000

Search Time: 3 disk reads

(keep top level in memory)

Lowest level index (1st level) is an index to data file and its

reference fields are record addresses in the data file

Multi-level Indexing

- How can we insert new keys into the multilevel index?
- The indexing records on some level can get full
- Several levels of indexes might need to be be rebuilt
- Overflow chain may be helpful, but still ugly
- In the example, the index in memory (level 4) has only 8 of 100 slots in use.
- In ANSI C++, the STL provides dynamic lists, solving many problems detailed in the text, which predates the STL
- Even with the STL, B-trees still provide a perfect data structure to hold the multi-level indexes

9.5 B-Trees:Working up from the bottom

B-Trees: Working up from the bottom- Bayer and McCreight, 1972, Acta Infomatica
- Build trees upward from the bottom instead of downward from the top
- Each node of a B-tree is an index record which consists of “key-reference” pairs
- Order: maximum number of key-reference pairs in a node
- Each node of a B-Tree represents a page of memory

Formal Definition of B-Tree Properties

In a B-Tree of order m,

- Every page has a maximum of m descendants
- Every page, except for the root and leaves, has at least m/2 descendants.
- The root has at least two descendants (unless it is a leaf).
- All the leaves appear on the same level.
- The leaf level will form a complete, ordered index of the associated data file.

How do B-Trees work?

- Each node of a B-Tree:
- is an Index Record.
- ideally is stored in a single page of memory
- has the same maximum number of key-ref. pairs (order)
- has a minimum number of key-ref pairs, typically order/2
- ordinarily has one more link pointer than data slots
- for indexing, in the text, each node in any level above the leaves has one pointer for each datum
- each datum on these levels represents a reference to another index and represents the largest value in that lower index
- We will cover regular B-Trees
- Discuss how to work with one less pointer
- Rightmost entry in rightmost node in each level is largest
- never take rightmost-child

B-Tree Animations

- Some interesting animations on the web:

http://sky.fit.qut.edu.au/~maire/baobab/baobab.html

http://www.cs.tcd.ie/Jeremy.Jones/vivio/trees/B-tree.htm

- Internet Explorer only
- has tutorial of sorts

http://www.geocities.com/SiliconValley/Program/2864/File/btree.html

3

2

1

2

Storing B-Trees in a File- To store a B-Tree, must place in file in a manner in which the tree can be reconstructed.
- Solution: Store each node in a list like this...

Links

Data

Node

Height

#keys

Node link points at

Brown

Mills

Actual B-Tree RepresentedPage 1

Page 3

Page 2

Davis

Note that it is ordered (of course) and of order 3

Why not just use a Binary Search Tree?

- Problems:
- Unbalanced tree requires too many looks
- Nodes can’t be made to match page size in any case
- Two pointers per node; can’t do multi-level indexing

9.8 B-Tree Methods Search, Insert, and Others

Algorithm for Search- Searching procedure
- iterative
- work in two stages

operating alternatively on entire pages (Class BTree)

and then within pages (Class BTreeNode)

- Step1: Loading a page into memeory
- Step 2: Searching through a page, looking for the key along the tree until it reaches the leaf level

B-Tree Insertion: Overview

- When inserting a new key into an index record that is not full, update that record
- When inserting a new key into an index record that is full, split the record in two, with half of the keys in each half.
- The largest key of the split record is promoted
- this may cause a new recursive split.
- With indexes, promotion is of a key, not, an index record
- the promoted key thus appears on at least 2 levels

9.6 Example of Creating a B-Tree

B-Tree Insertion:Splitting & Promoting- Splitting
- Creation of two nodes out of one because the original node becomes overfull
- Result in the need to promote a key to a higher-level node to provide an index separating the two new nodes
- Promotion of a key
- Movement of a key from one node into a higher-level node when split occurs
- Again, the key rises, but not the direct reference

B-tree Insertion : Basic Facts

- Major components of insertion
- Split the node
- Promote the middle key
- Increase the height of the B-tree
- bottom-up
- Insertion may touch no more than 2 nodes per level
- Insertion cost is strictly linear in the height of the tree

9.8 B-Tree Methods Search, Insert, and Others

Algorithm for Insertion: Prelim- Observations of Insertion, Splitting, and Promotion
- proceed all the way down to the leaf level
- after finding the insertion location at the leaf level, the work proceeds upward from the bottom
- Iterative procedure as having three phases
- Search to the leaf level, using FindLeaf method
- Insertion, overflow detection, and splitting on the upward path (recursive, working up)
- Creation of a new root node, if the current root was split

9.8 B-Tree Methods Search, Insert, and Others

Algorithm for Insertion- With no redistribution
- Locate node on bottom most level in which to insert record. Location is determined by key search.
- If vacant record slot is available, insert the record so that key sequencing is maintained. Then, update the pointer associated with the record (Pointer is null for level 0 records). Then Stop!
- If no vacant record slot exists, identify median record. All records and pointers to the left of the median records are stored in one node (the original) and those to the right are stored in another node(the new node).

9.8 B-Tree Methods Search, Insert, and Others

Algorithm for Insertion (con’t)- (Step 4) If the topmost node was split, create a new topmost node which contains the median record identified in Step 3, filled with pointers to the original and split nodes. Update the root node to point to the new topmost node. Then Stop!
- (Step 5) If topmost node was not split, prepare to insert median record identified in Step 3 and a pointer to the new node (created in Step 3). Then Goto Step 2.
- Step 4 increases height of B-tree by 1 level

9.8 B-Tree Methods Search, Insert, and Others

B-Tree for File Indexing Insertion ExampleInsert 1

Insert 3

Insert 19,4,20

4 is on two levels!

split

4

3

3

20

4

20

19

2

0

0

1

3

19

4

20

0

1

Insert 13,16

Insert 9

4

20

4

16

20

2

split

2

1

3

13

4

16

20

19

1

19

3

9

20

4

13

16

0

1

3

0

1

B-Tree Nomenclature

- Be aware that terms are not uniform in the literature
- Definitions are also quite different
- In fact, there are a number of B-tree variations
- The course text uses “B tree” to describe what is a B+ tree in other books
- The course text’s “B+ tree” is a B+ tree with a linked list of sorted data blocks

Our Book

B+-Tree

with

Linked List

B+-Tree

Root

C

G

I

A

B

C

E

F

G

H

I

Data Block

Data Block

Data Block

Data Block

Another aspect (node structures) Homogeneous Trees :B-Tree in other text

- Homogeneous trees -leaf nodes and interior nodes have same structures; Each contains both data pointers and tree pointers
- Average search length less for homogeneous trees, because some searches may conclude before reaching a leaf node

23 pointers to 23 records in data file (4 of 23 records shown)

37

64

45

53

85

91

8

23

1

7

14

20

27

36

70

80

88

95

38

40

50

52

60

Another Aspect (node structures) Heterogeneous Trees :B+-Tree in other text

- Heterogeneous trees - leaf nodes and interior nodes have different structures

23 pointers to 23 records in data file

37

64

45

53

85

91

14

23

1

7

8

14

20

23

27

36

64

70

91

95

80

85

88

37

38

40

45

50

52

53

60

Topic

B-Tree

B

-Tree

Algorithm Complexity

Rather complex

more simple

for insertion

Retrieval

less efficiency

more efficient

efficiency

(B-tree is tall &

B+-tree is short

spindly)

& bushy

Storage

slightly more

less efficient

efficiency

efficient

(more space)

(less space)

1-pass structure

rather complex

simple

creation algorithms

Comparison of B-Tree and B+-Tree in other text

9.10 Formal Definition of B-Tree Properties

Formal Definition of B-Tree Properties** Properties of a B-tree of order m (page==node)

- Every page has a maximum of m descendants
- Every page, except for the root and the leaves, has at least (m/2) descendants
- The root has at least two descendants (unless it is a leaf)
- All leaves appear on the same level
- The leaf level forms a complete, ordered index of the associated data file

Worst-case Search Depth

- Search depth : full depth of the tree
- Worst case occurs
- When every page of the tree has only the minimum # of descendants
- Maximal height with a minimum breadth

9.12 Deletion, Merging, and Redistribution

Deletion, Redistribution, and Concatenation- Must ensure that the B-tree properties are maintained after a deletion
- Algorithm (with redistribution and cocatenation)
- If the key to be deleted is not in a leaf,

swap it with its immediate successor, which is in a leaf

(might be redistributed or concatenated!)

- Delete the key

9.12 Deletion, Merging, and Redistribution

Deletion Algorithm(Cont’d)- If underflow occurs

(the leaf now contains one too few keys),

- If the left or right sibling has more than the minimum number of keys , redistribute
- Otherwise, concatenate the two leaves and the median key from the parent into one leaf
- Apply above step 3 to the parent as if it were deleted

9.12 Deletion, Merging, and Redistribution

Redistribution- Occurs when a sibling has more than the minimum # of keys
- Idea: Move keys between siblings to balance tree
- Results in change of the key in the parent page
- Does not propagate : strictly local effects
- How many keys should be moved?
- Not necessarily fixed
- Even distribution is desired

Redistribution (con’t)

- Must fix a shortage.
- The possibilities:
- Merge sibling nodes and move the center element up
- Move an element up from the left sibling into the root, and an element from the root on down
- Move an element up from the right sibling into the root, and an element from the root on down

9.12 Deletion, Merging, and Redistribution

Concatenation (merge)- Occurs in case of underflow (page less than ½-full)
- Combine the two partial pages and the correct key from the parent page make a single full page
- Reverses splitting
- Concatenation must involve demotion of keys :
- may cause underflow in the parent page
- The effects propagate upward
- The depth of the tree may be reduced by one.

9.12 Deletion, Merging, and Redistribution

Deletion Examples

Figure A

P

Z

I

P

G

I

M

X

Z

D

T

B

C

K

L

M

R

S

T

Z

A

D

J

Q

Y

E

G

I

O

P

V

W

X

F

H

N

U

9.12 Deletion, Merging, and Redistribution

Deletion Example 1Removal of key C from figureA:

Change occurs only in leaf node

P

Z

I

B

C

A

D

P

G

I

M

X

Z

D

T

B

D

K

L

M

R

S

T

Z

A

J

Q

Y

E

G

I

O

P

V

W

X

F

H

N

U

9.12 Deletion, Merging, and Redistribution

Deletion Example 2

Result of deleting P from figure A

: P changes to O in the second

level and the root

O

Z

I

O

F

I

M

X

Z

D

T

B

C

K

L

M

R

S

T

Z

A

D

J

Q

Y

E

G

I

O

V

W

X

F

H

N

U

9.12 Deletion, Merging, and Redistribution

Deletion Example 3

Result of deleting H from figure A :

Removal of H caused an underflow,

and two leaf nodes were merged

P

Z

I

P

I

M

X

Z

D

T

B

C

K

L

M

R

S

T

Z

A

D

J

Q

Y

E

G

I

O

P

V

W

X

F

N

U

Redistribution during Deletion

- A way to improve storage utilization
- need to do this when rightmost key is deleted
- rightmost key occurs in parent node, too!
- Tends to make an efficient B-tree in terms of space utilization
- Worst case : around 50%
- Average case : 67 ~ 69%

0

M

DELETE J

(No change)

1

Q U

D H

R S

A C

E F

I J K

N O P

V W X Y Z

DELETE M

(Swap with N)

0

M N

1

Q U

D H

MN O P

A C

E F

I K

V W X Y Z

O P

C D E F

NOW UNDERFLOW

PROPAGATES UPWARD!

0

N

1

Q W

H

O P

C D E F

S U V

I K

X Y Z

HEIGHT OF THE TREE

DECREASED

H N Q W

I K

X Y Z

S U V

What if Q W block was Q W Wa Wb?

B* Trees

- Knuth, 1973, Addison-Wesley
- Use redistribution operation during insertion
- Perform two-to-three split
- When split, the page has at least one sibling that is also full
- After split, the pages are about 2/3 full

9.15 Buffering of Pages:Virtual B-Trees

Buffering of B-tree pages: Virtual B-Trees- B-tree size >> main memory (in practice)
- Need buffering pages of B-tree
- Better to keep the root page in the main memory
- Need to find a way to minimize disk access
- root in memory: starting point
- can keep some other pages in memory, too!
- hope the next page we need is in memory
- Buffer replacement algorithm:
- When to fetch/purge/maintain?
- LRU + page height weighting factor
- Least recently used pages may be least likely to be accessed
- Higher level pages are more likely to be accessed

9.15 Buffering of Pages:Virtual B-Trees

Placement of Information associated with the Key- How to store associated information
- In a data and index mingled file
- Once the key is found, no more disk access required
- In a separate file
- Larger number of keys per a page

Higher order, shallower tree

9.16 Variable-Length Records and Keys

Variable Length Records and Keys- A B-tree with variable length keys
- No single, fixed order
- A different criterion for over/underflow condition
- Using max/min number of bytes
- (c.f. max/min number of keys)
- Key promotion mechanism
- Shortest variable-length keys are promoted in preference to longer ones
- Pages with the largest numbers of descendants up high in the tree

Download Presentation

Connecting to Server..