
# Data Structures

Lecture 5: B-Trees. Eran Halperin and Hanoch Levy, March 2014.


### Data Structures

Lecture 5

B-Trees

Eran Halperin and Hanoch Levy, March 2014

How does a binary tree compare with a k-ary tree?

• Binary worse: higher height
• Cost is log k
• Binary better: lower width
• Cost is k
• OVERALL: BINARY BETTER!
• SO WHY BOTHER WITH k-ary?
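
One way to read this trade-off, as a rough sketch: assume a k-ary search tree over n keys has height about $\log_k n$ and that each visited node is scanned linearly at a cost of about k (the linear scan is an assumption; the slide only says "cost is k"). The total in-memory work is then:

```latex
\[
\underbrace{k}_{\text{work per node}} \cdot \underbrace{\log_k n}_{\text{height}}
  \;=\; \frac{k}{\log_2 k}\,\log_2 n .
\]
```

The factor $k / \log_2 k$ grows with k, so under this count a tree with large k does more in-memory work than a binary tree. The I/O model introduced next is what changes the picture.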

Idealized computation model

[Figure: a CPU attached to RAM]

Each instruction takes one unit of time

Each memory access takes one unit of time

A more realistic model

[Figure: memory hierarchy with CPU, Cache, RAM and Disk]

Each level is much larger but much slower than the previous one

Information is moved in blocks

A simplified I/O model

[Figure: CPU and RAM, connected to a Disk]

Each block is of size m.

Count both operations and I/O operations

Data structures in the I/O model

Linked lists and search trees behave poorly in the I/O model.

Each pointer followed may cause a disk access

We need an alternative to binary search trees that is more suited to the I/O model

B-Trees!

A 4-node

A 4-node holds 3 keys (here 10, 25 and 42) and branches 4 ways:

• key < 10
• 10 < key < 25
• 25 < key < 42
• 42 < key

An r-node

Keys: $k_0, k_1, k_2, \ldots, k_{r-3}, k_{r-2}$

Children: $c_0, c_1, c_2, \ldots, c_{r-2}, c_{r-1}$

r−1 keys

r-way branch

B-Trees (with minimum degree d)

Each node holds between d−1 and 2d−1 keys

Each non-leaf node has between d and 2d children

The root is special: it has between 1 and 2d−1 keys

and between 2 and 2d children (if not a leaf)

All leaves are at the same depth

A 2-4 tree

B-Tree with minimum degree d=2

Root: 13
• Left child 4 6 10, with leaves (1 3), (5), (7), (11)
• Right child 15 28, with leaves (14), (16 17), (30 40 50)
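
The B-Tree rules can be checked mechanically. The following is a minimal sketch, not the lecture's code: the `Node` class and the `check` function are hypothetical names introduced here, and the check covers key counts, child counts and leaf depth only (it does not verify that keys are ordered across subtrees). It is applied to the 2-4 tree shown above.

```python
class Node:
    """Illustrative node: a sorted list of keys and a list of children."""
    def __init__(self, keys, children=()):
        self.keys = list(keys)
        self.children = list(children)   # empty list means the node is a leaf


def check(node, d, is_root=True):
    """Return the depth of the leaves below node; raise AssertionError on a violation."""
    lowest = 1 if is_root else d - 1     # the root may hold as few as 1 key
    assert lowest <= len(node.keys) <= 2 * d - 1, "key count out of range"
    assert node.keys == sorted(node.keys), "keys not sorted inside the node"
    if not node.children:
        return 0
    assert len(node.children) == len(node.keys) + 1, "need one more child than keys"
    depths = {check(c, d, is_root=False) for c in node.children}
    assert len(depths) == 1, "leaves are not all at the same depth"
    return depths.pop() + 1


# The 2-4 tree (d = 2) from the slide.
tree = Node([13], [
    Node([4, 6, 10], [Node([1, 3]), Node([5]), Node([7]), Node([11])]),
    Node([15, 28], [Node([14]), Node([16, 17]), Node([30, 40, 50])]),
])
print(check(tree, d=2))   # depth of the leaves, counted in edges from the root
```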

Node structure

[Figure: an r-node with keys $k_0, \ldots, k_{r-2}$ and children $c_0, \ldots, c_{r-1}$]

r – the degree

key[0], …, key[r−2] – the keys

item[0], …, item[r−2] – the associated items

child[0], …, child[r−1] – the children

leaf – is the node a leaf?

Possibly a different representation for leaves
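
As an illustration, the fields listed above map directly onto a small record type. This is a sketch under assumptions: the class name `BTreeNode` and the use of Python lists are choices made here, not something the slides prescribe.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class BTreeNode:
    key: List[Any] = field(default_factory=list)            # key[0..r-2]
    item: List[Any] = field(default_factory=list)           # item[0..r-2], one per key
    child: List["BTreeNode"] = field(default_factory=list)  # child[0..r-1], unused in leaves
    leaf: bool = True                                        # is the node a leaf?

    @property
    def r(self) -> int:
        """Degree of the node: one more than its number of keys."""
        return len(self.key) + 1


# The 4-node from the earlier slide, stored as a leaf with placeholder items.
node = BTreeNode(key=[10, 25, 42], item=["a", "b", "c"])
print(node.r)
```

A leaf never uses `child`, which is one reason an implementation might choose a different, more compact representation for leaves.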

The height of B-Trees
• At depth 1 we have at least 2 nodes
• At depth 2 we have at least 2d nodes
• At depth 3 we have at least $2d^2$ nodes
• At depth h we have at least $2d^{h-1}$ nodes
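
These counts give the usual height bound. As a sketch, assume the tree stores n keys, has height h with the root at depth 0, and recall that the root holds at least 1 key while every other node holds at least d−1 keys:

```latex
\begin{align*}
n &\ge 1 + (d-1)\sum_{i=1}^{h} 2d^{\,i-1}
   = 1 + 2(d-1)\cdot\frac{d^{h}-1}{d-1}
   = 2d^{h} - 1, \\
h &\le \log_d \frac{n+1}{2}.
\end{align*}
```
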

Red-Black Trees vs. B-Trees

$n = 2^{30} \approx 10^9$

30 ≤ height of Red-Black Tree ≤ 60

Up to 60 pages read from disk

Height of B-Tree with d=1000 is only 3

Each B-Tree node resides in a block/page

Only 3 (or 4) pages read from disk

Disk access ≈ 1 millisecond ($10^{-3}$ sec)

Memory access ≈ 100 nanoseconds ($10^{-7}$ sec)
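
The numbers on this slide can be reproduced with a few lines of arithmetic. This is a back-of-the-envelope sketch: it uses $\log_2 n$ and $2\log_2 n$ as the red-black height range and $\log_d n$ as the B-Tree estimate, and it takes the slide's figure of about 1 ms per disk access at face value.

```python
import math

n = 2 ** 30          # number of keys, about 10^9
d = 1000             # minimum degree of the B-Tree
DISK_MS = 1.0        # the slide's estimate: about 1 ms per disk access

rb_low = math.log2(n)        # best case for a red-black tree
rb_high = 2 * math.log2(n)   # worst case is about twice that
btree_levels = math.log(n, d)

print(f"red-black height: between {rb_low:.0f} and {rb_high:.0f}")
print(f"B-Tree levels:    about {btree_levels:.2f}")
print(f"worst-case disk time: red-black ~{rb_high * DISK_MS:.0f} ms, "
      f"B-Tree ~{math.ceil(btree_levels) * DISK_MS:.0f} ms")
```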

Search: look for k in node x; if it is not there, look for k in the appropriate subtree of node x

Number of I/Os – $\log_d n$

Number of operations – $O(d \log_d n)$

Number of ops with binary search – $O(\log_2 d \cdot \log_d n) = O(\log_2 n)$
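
The two steps of the search can be sketched as follows. The `Node` class and the `search` function are illustrative names, not the lecture's code; the standard-library `bisect_left` supplies the binary search inside a node, and in a disk-based tree each move to a child would cost one I/O.

```python
from bisect import bisect_left


class Node:
    def __init__(self, keys, children=()):
        self.keys = list(keys)
        self.children = list(children)   # empty for a leaf


def search(x, k):
    """Return (node, index) where k is stored in the subtree of x, or None."""
    while True:
        i = bisect_left(x.keys, k)               # look for k in node x
        if i < len(x.keys) and x.keys[i] == k:
            return x, i
        if not x.children:                       # reached a leaf without finding k
            return None
        x = x.children[i]                        # look for k in the subtree of x


tree = Node([13], [
    Node([4, 6, 10], [Node([1, 3]), Node([5]), Node([7]), Node([11])]),
    Node([15, 28], [Node([14]), Node([16, 17]), Node([30, 40, 50])]),
])
hit = search(tree, 16)
miss = search(tree, 12)
```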

B-Trees – What are they good for?
• Large degree B-trees are used to represent very large disk dictionaries. The minimum degree d is chosen according to the size of a disk block.
• Smaller degree B-trees are used for internal-memory dictionaries to overcome cache-miss penalties.
• B-trees with d=2, i.e., 2-4 trees, are very similar to Red-Black trees.

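Rotate right / Rotate left

[Figure: the positions of keys A and B before and after a rotation; rotate left undoes rotate right.]

Split (a full node) / Join

[Figure: a full node with 2d−1 keys is split around its middle key, which moves up into the parent; the two resulting nodes hold d−1 keys each. Join is the reverse operation.]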

Insert

Insert(T,2) into this 2-4 tree:

Root: 13
• Left child 5 10, with leaves (1 3), (6), (11)
• Right child 15 28, with leaves (14), (16 17), (30 40 50)

Key 2 goes into leaf (1 3), which becomes (1 2 3).

Insert

Insert(T,4): key 4 goes into leaf (1 2 3), which becomes (1 2 3 4). With d=2 a node may hold at most 3 keys, so this leaf overflows.

Split

Key 2 moves up into the parent 5 10, and the leaf is split into (1) and (3 4):

Root: 13
• Left child 2 5 10, with leaves (1), (3 4), (6), (11)
• Right child 15 28, with leaves (14), (16 17), (30 40 50)

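Splitting an overflowing node

[Figure: an overflowing node with 2d keys is split around one of its keys, which moves up into the parent; the two resulting nodes hold d and d−1 keys.]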

Another insert

Insert(T,7): key 7 goes into leaf (6), which becomes (6 7).

and another insert

Insert(T,8): key 8 goes into leaf (6 7), which becomes (6 7 8):

Root: 13
• Left child 2 5 10, with leaves (1), (3 4), (6 7 8), (11)
• Right child 15 28, with leaves (14), (16 17), (30 40 50)

and the last for today

Insert(T,9): key 9 goes into leaf (6 7 8), which becomes (6 7 8 9) and overflows.

Split

Key 7 moves up into the parent, and the leaf is split into (6) and (8 9). The parent becomes 2 5 7 10 and now overflows as well.

Split

Key 5 moves up into the root, and the parent is split into 2 and 7 10:

Root: 5 13
• Left child 2, with leaves (1), (3 4)
• Middle child 7 10, with leaves (6), (8 9), (11)
• Right child 15 28, with leaves (14), (16 17), (30 40 50)

Insert – Bottom up

• Find the insertion point by a downward search
• Insert the key in the appropriate place
• If the current node is overflowing, split it
• If its parent is now overflowing, split it, etc.
• Need both a downward scan and an upward scan
• Nodes are temporarily overflowing
• Need to keep parents on a stack
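
The steps above can be sketched as follows. This is an illustrative implementation, not the lecture's code: `Node`, `BTree`, `split_overflowing` and `insert_bottom_up` are names chosen here, keys are assumed distinct, and the split point (left part d−1 keys, right part d keys) is picked to match the worked example in the preceding slides.

```python
from bisect import bisect_left


class Node:
    def __init__(self, keys, children=()):
        self.keys = list(keys)
        self.children = list(children)   # empty for a leaf


class BTree:
    def __init__(self, d, root):
        self.d, self.root = d, root


def split_overflowing(parent, i, d):
    """Split parent.children[i], which temporarily holds 2d keys."""
    y = parent.children[i]
    z = Node(y.keys[d:], y.children[d:])          # right part: d keys
    parent.keys.insert(i, y.keys[d - 1])          # this key moves up
    parent.children.insert(i + 1, z)
    y.keys, y.children = y.keys[:d - 1], y.children[:d]   # left part: d-1 keys


def insert_bottom_up(T, k):
    path = []                                     # stack of (parent, child index)
    x = T.root
    while x.children:                             # downward scan
        i = bisect_left(x.keys, k)
        path.append((x, i))
        x = x.children[i]
    x.keys.insert(bisect_left(x.keys, k), k)      # the leaf may now overflow
    while len(x.keys) > 2 * T.d - 1:              # upward scan
        if path:
            parent, i = path.pop()
        else:                                     # the root overflowed: grow a new root
            parent, i = Node([], [x]), 0
            T.root = parent
        split_overflowing(parent, i, T.d)
        x = parent


# Replay the slides: start from the example tree and insert 2, 4, 7, 8, 9.
T = BTree(2, Node([13], [
    Node([5, 10], [Node([1, 3]), Node([6]), Node([11])]),
    Node([15, 28], [Node([14]), Node([16, 17]), Node([30, 40, 50])]),
]))
for k in (2, 4, 7, 8, 9):
    insert_bottom_up(T, k)
print(T.root.keys, [c.keys for c in T.root.children])
```

The explicit `path` stack is the price of the bottom-up approach: the upward scan needs the parents that the downward scan passed through.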

Split-Root(T)

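[Figure: the full root, with 2d−1 keys, is split around its middle key, which becomes the single key of a new root T.root; the two halves hold d−1 keys each.]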

Number of I/Os – O(1)

Number of operations – O(d)

Split-Child(x,i)

[Figure: the full child x.child[i] is split around its middle key, which moves up into x at position key[i]; the two halves hold d−1 keys each.]

Number of I/Os – O(1)

Number of operations – O(d)
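
A minimal sketch of Split-Child, assuming nodes are stored as Python lists of keys and children. The `Node` class and the signature `split_child(x, i, d)` are illustrative; the slides name the operation Split-Child(x,i) but do not show its code. The list slicing copies about d keys and children, which is where the O(d) operation count comes from.

```python
class Node:
    def __init__(self, keys, children=()):
        self.keys = list(keys)
        self.children = list(children)   # empty for a leaf


def split_child(x, i, d):
    """Split the full child x.children[i] (2d-1 keys) around its middle key."""
    y = x.children[i]
    assert len(y.keys) == 2 * d - 1, "only full nodes are split"
    z = Node(y.keys[d:], y.children[d:])   # new right sibling: last d-1 keys
    x.keys.insert(i, y.keys[d - 1])        # middle key moves up to x.keys[i]
    x.children.insert(i + 1, z)
    y.keys = y.keys[:d - 1]                # y keeps the first d-1 keys
    y.children = y.children[:d]            # and the first d children, if any


x = Node([5, 10], [Node([1, 2, 3]), Node([6]), Node([11])])
split_child(x, 0, d=2)
print(x.keys, [c.keys for c in x.children])
```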

Insert – Top down

• While conducting the search, split full children on the search path before descending to them!

Number of I/Os – $O(\log_d n)$

Number of operations – $O(d \log_d n)$

Amortized no. of splits – $O(1)$

• Argument:
• Each split increases # nodes by 1
• # nodes ≤ # values = # inserts
• ⇒ # splits ≤ # inserts
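
Top-down insertion can be sketched in a single downward pass. This is an illustrative implementation under assumptions: the names `Node`, `BTree`, `split_child`, `insert` and `inorder` are chosen here, keys are assumed distinct, and the `splits` counter is added only so the amortized claim can be checked on an example.

```python
from bisect import bisect_left


class Node:
    def __init__(self, keys, children=()):
        self.keys = list(keys)
        self.children = list(children)   # empty for a leaf


class BTree:
    def __init__(self, d):
        self.d = d
        self.root = Node([])
        self.splits = 0                  # bookkeeping for the amortized argument


def split_child(T, x, i):
    d, y = T.d, x.children[i]
    z = Node(y.keys[d:], y.children[d:])
    x.keys.insert(i, y.keys[d - 1])
    x.children.insert(i + 1, z)
    y.keys, y.children = y.keys[:d - 1], y.children[:d]
    T.splits += 1


def insert(T, k):
    full = 2 * T.d - 1
    if len(T.root.keys) == full:              # Split-Root: the tree grows by one level
        T.root = Node([], [T.root])
        split_child(T, T.root, 0)
    x = T.root
    while x.children:
        i = bisect_left(x.keys, k)
        if len(x.children[i].keys) == full:   # split before descending
            split_child(T, x, i)
            if k > x.keys[i]:
                i += 1
        x = x.children[i]
    x.keys.insert(bisect_left(x.keys, k), k)  # the leaf is guaranteed not to be full


def inorder(x):
    """Keys of the subtree of x in sorted order."""
    if not x.children:
        return list(x.keys)
    out = []
    for i, c in enumerate(x.children):
        out += inorder(c)
        if i < len(x.keys):
            out.append(x.keys[i])
    return out


T = BTree(d=2)
keys = [13, 5, 10, 15, 28, 1, 3, 30, 40, 50, 14, 6, 11, 16, 17, 2, 4, 7, 8, 9]
for k in keys:
    insert(T, k)
print(inorder(T.root) == sorted(keys), T.splits, len(keys))
```

Because every node on the path is split before it is entered, no parent stack and no upward scan are needed.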

Bottom-Up Deletions from B-Trees

As always, similar, but slightly more complicated than insertions

To delete an item in an internal node, replace it by its successor (or predecessor) and delete the successor (or predecessor)

To delete an item in a leaf, delete the relevant key, and if the leaf has too few keys, fix the tree using rotations and joins.

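Split (a full node) / Join

[Figure: a full node with 2d−1 keys is split around its middle key, which moves up into the parent; the two resulting nodes hold d−1 keys each. Join is the reverse operation.]

Rotate right / Rotate left

[Figure: the positions of keys A and B before and after a rotation; rotate left undoes rotate right.]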

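Delete

delete(T,26) from this 2-4 tree:

Root: 7 15
• Left child 3, with leaves (1 2), (4 6)
• Middle child 10 13, with leaves (8 9), (11 12), (14)
• Right child 22 28, with leaves (20), (24 26), (30 40 50)

Key 26 is removed from leaf (24 26), which becomes (24).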

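Delete

delete(T,13): key 13 sits in the internal node 10 13.

Delete (Replace with predecessor)

Key 13 is replaced by its predecessor 12, so the internal node becomes 10 12. Key 12 is then deleted from leaf (11 12), which becomes (11):

Root: 7 15
• Left child 3, with leaves (1 2), (4 6)
• Middle child 10 12, with leaves (8 9), (11), (14)
• Right child 22 28, with leaves (20), (24), (30 40 50)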

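Delete

delete(T,24): key 24 is removed from leaf (24), which is left with no keys.

Delete (steal from sibling)

The empty leaf steals from its sibling (30 40 50): key 28 moves down from the parent into the empty leaf, and key 30 moves up to take its place:

Root: 7 15
• Left child 3, with leaves (1 2), (4 6)
• Middle child 10 12, with leaves (8 9), (11), (14)
• Right child 22 30, with leaves (20), (28), (40 50)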

Rotate right / Rotate left

[Figure: the positions of keys A and B before and after a rotation; rotate left undoes rotate right.]

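Delete

delete(T,20): key 20 is removed from leaf (20), which is left with no keys. Its only adjacent sibling (28) holds a single key, so there is nothing to steal.

Delete (Join)

The empty leaf is joined with its sibling (28) and the separating key 22 from the parent:

Root: 7 15
• Left child 3, with leaves (1 2), (4 6)
• Middle child 10 12, with leaves (8 9), (11), (14)
• Right child 30, with leaves (22 28), (40 50)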

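Split (a full node) / Join

[Figure: a full node with 2d−1 keys is split around its middle key, which moves up into the parent; the two resulting nodes hold d−1 keys each. Join is the reverse operation.]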

Few more..

delete(T,22): key 22 is removed from leaf (22 28), which becomes (28). The right subtree is now 30 with leaves (28), (40 50).

Few more..

delete(T,28): key 28 is removed from leaf (28), which is left with no keys.

Stealing again

The empty leaf steals from its sibling (40 50): key 30 moves down from the parent and key 40 moves up. The right subtree is now 40 with leaves (30), (50).

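Another one

delete(T,30): key 30 is removed from leaf (30), which is left with no keys. Its sibling (50) holds a single key, so the two leaves are joined.

After Join

Key 40 moves down from the parent into the joined leaf (40 50). The parent is left with no keys and a single child:

Root: 7 15
• Left child 3, with leaves (1 2), (4 6)
• Middle child 10 12, with leaves (8 9), (11), (14)
• Right child (no keys), with leaf (40 50)

Now we can steal

The empty internal node steals from its sibling 10 12: key 15 moves down from the root, key 12 moves up into the root, and the leaf (14) moves across with it:

Root: 7 12
• Left child 3, with leaves (1 2), (4 6)
• Middle child 10, with leaves (8 9), (11)
• Right child 15, with leaves (14), (40 50)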

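More?

delete(T,40) on this tree:

Root: 7 12
• Left child 3, with leaves (1 2), (4 6)
• Middle child 10, with leaves (8 9), (11)
• Right child 15, with leaves (14), (40 50)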

Delete – Top down

• Assume, at first, that the item to be deleted is in a leaf
• While conducting the search, make sure that each child descended into contains at least d keys
• How?
• Use rotations or joins
• When the item is located, it resides in a leaf containing at least d keys, so it can be removed

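Delete – Top down

• While conducting the search, make sure that each child you descend to contains at least d keys
• If the child has d−1 keys and an adjacent sibling has ≥ d keys: Rotate! (Steal)
• If the child and its sibling both have d−1 keys: Join!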
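
The leaf case of top-down deletion can be sketched as follows. This is an illustration under assumptions, not the lecture's code: `Node`, `ensure_d_keys` and `delete_from_leaf` are names chosen here, the key is assumed to be present and stored in a leaf, and the internal-node case is deliberately left out.

```python
from bisect import bisect_left


class Node:
    def __init__(self, keys, children=()):
        self.keys = list(keys)
        self.children = list(children)   # empty for a leaf


def ensure_d_keys(x, i, d):
    """Make x.children[i] hold at least d keys; return the index of the child to enter."""
    c = x.children[i]
    if len(c.keys) >= d:
        return i
    if i > 0 and len(x.children[i - 1].keys) >= d:            # steal from left sibling
        s = x.children[i - 1]
        c.keys.insert(0, x.keys[i - 1])
        x.keys[i - 1] = s.keys.pop()
        if s.children:
            c.children.insert(0, s.children.pop())
        return i
    if i + 1 < len(x.children) and len(x.children[i + 1].keys) >= d:   # steal from right
        s = x.children[i + 1]
        c.keys.append(x.keys[i])
        x.keys[i] = s.keys.pop(0)
        if s.children:
            c.children.append(s.children.pop(0))
        return i
    if i == len(x.children) - 1:                              # join with a sibling
        i -= 1
    left, right = x.children[i], x.children[i + 1]
    left.keys += [x.keys.pop(i)] + right.keys
    left.children += right.children
    del x.children[i + 1]
    return i


def delete_from_leaf(root, k, d):
    """Delete k, assumed to be stored in a leaf; return the (possibly new) root."""
    x = root
    while x.children:
        i = bisect_left(x.keys, k)
        if i < len(x.keys) and x.keys[i] == k:
            raise NotImplementedError("k is in an internal node; not covered here")
        x = x.children[ensure_d_keys(x, i, d)]
    x.keys.remove(k)
    if not root.keys and root.children:   # a join emptied the root: drop one level
        root = root.children[0]
    return root


right = Node([22, 28], [Node([20]), Node([24, 26]), Node([30, 40, 50])])
root = Node([7, 15], [
    Node([3], [Node([1, 2]), Node([4, 6])]),
    Node([10, 13], [Node([8, 9]), Node([11, 12]), Node([14])]),
    right,
])
for k in (26, 24, 20):
    root = delete_from_leaf(root, k, d=2)
print(right.keys, [c.keys for c in right.children])
```

On these three deletions the sketch ends with the same right subtree as the bottom-up walk-through (30 with leaves (22 28) and (40 50)), although in general the two strategies may restructure the tree differently.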

Delete – Top down

• What if the item to be deleted is in an internal node?
• Descend as before from the root until the item to be deleted is located
• Keep a pointer to the node containing the item
• Carry on descending towards the successor, making sure that nodes contain at least d keys
• When the successor is found, delete it from its leaf and use it to replace the item to be deleted