


### Data Structures

Lecture 5

B-Trees

Haim Kaplan and Uri Zwick, November 2012

A 4-node

Keys: 10, 25, 42

Children: key < 10 | 10 < key < 25 | 25 < key < 42 | 42 < key

3 keys

4-way branch

An r-node

Keys: $k_0, k_1, k_2, \ldots, k_{r-3}, k_{r-2}$

Children: $c_0, c_1, c_2, \ldots, c_{r-2}, c_{r-1}$

r−1 keys

r-way branch

B-Trees (with minimum degree d)

Each non-root node holds between d−1 and 2d−1 keys

Each non-leaf node other than the root has between d and 2d children

The root is special: it has between 1 and 2d−1 keys

and between 2 and 2d children (if it is not a leaf)

All leaves are at the same depth

A 2-4 tree

B-Tree with minimum degree d = 2

Root: [13]

Depth 1: [4 6 10] [15 28]

Leaves: [1 3] [5] [7] [11] | [14] [16 17] [30 40 50]
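The rules above can be checked mechanically. Below is a minimal Python sketch, assuming a hypothetical `Node` with a sorted `keys` list and a `children` list that is empty in leaves (items are left out); `check_btree` and `leaf` are illustrative names, not part of the lecture. It rebuilds the 2-4 tree above and validates it with d = 2.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)  # empty list means "leaf"


def check_btree(root, d):
    """Raise AssertionError if the tree violates the B-tree rules for minimum degree d."""
    leaf_depths = set()

    def visit(node, depth, lo, hi, is_root):
        n = len(node.keys)
        low = 1 if is_root else d - 1               # the root may hold a single key
        assert low <= n <= 2 * d - 1, f"{n} keys at depth {depth}"
        assert all(a < b for a, b in zip(node.keys, node.keys[1:])), "keys not sorted"
        # Every key must lie strictly between the separating keys of its ancestors.
        assert all((lo is None or lo < k) and (hi is None or k < hi) for k in node.keys)
        if not node.children:
            leaf_depths.add(depth)
            return
        # A non-leaf node with n keys is an (n+1)-node: d..2d children (2..2d at the root).
        assert len(node.children) == n + 1, "a node with n keys needs n + 1 children"
        bounds = [lo] + node.keys + [hi]
        for i, child in enumerate(node.children):
            visit(child, depth + 1, bounds[i], bounds[i + 1], False)

    visit(root, 0, None, None, True)
    assert len(leaf_depths) == 1, f"leaves at depths {sorted(leaf_depths)}"
    return True


def leaf(*keys):
    return Node(list(keys))


# The 2-4 tree (d = 2) from the slide.
tree = Node([13], [
    Node([4, 6, 10], [leaf(1, 3), leaf(5), leaf(7), leaf(11)]),
    Node([15, 28], [leaf(14), leaf(16, 17), leaf(30, 40, 50)]),
])
print(check_btree(tree, d=2))  # True
```

Because the sketch relies on `assert`, the checks are skipped when Python runs with `-O`; a production checker would raise explicit exceptions instead.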

Node structure

Keys: $k_0, k_1, k_2, \ldots, k_{r-3}, k_{r-2}$

Children: $c_0, c_1, c_2, \ldots, c_{r-2}, c_{r-1}$

r – the degree

key[0], …, key[r−2] – the keys

item[0], …, item[r−2] – the associated items

child[0], …, child[r−1] – the children

leaf – is the node a leaf?

Possibly a different representation for leaves
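The fields listed above map directly onto a record type. Here is a minimal Python sketch, assuming Python lists for the key, item and child arrays; the class name `BTreeNode` and the `degree` property (a convenience for r) are hypothetical. The example rebuilds the 4-node with keys 10, 25, 42 from the earlier slide.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class BTreeNode:
    key: List[Any] = field(default_factory=list)              # key[0..r-2], sorted
    item: List[Any] = field(default_factory=list)             # item[j] belongs to key[j]
    child: List["BTreeNode"] = field(default_factory=list)    # child[0..r-1]; unused in a leaf
    leaf: bool = True                                         # is the node a leaf?

    @property
    def degree(self) -> int:
        """r: the number of branches, one more than the number of keys."""
        return len(self.key) + 1


# The 4-node from the earlier slide: 3 keys, 4-way branch.
four_node = BTreeNode(key=[10, 25, 42], item=["a", "b", "c"],
                      child=[BTreeNode(), BTreeNode(), BTreeNode(), BTreeNode()],
                      leaf=False)
print(four_node.degree)  # 4
```

In this sketch a leaf simply carries an empty `child` list; a separate, slimmer leaf type without the child array is one way to realize the different representation for leaves mentioned above.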

The height of B-Trees
• At depth 1 we have at least 2 nodes
• At depth 2 we have at least $2d$ nodes
• At depth 3 we have at least $2d^2$ nodes
• At depth $h$ we have at least $2d^{h-1}$ nodes
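The node counts turn into a height bound once keys are counted. Let h be the depth of the leaves (the root is at depth 0). The root holds at least one key and every other node at least d−1 keys, so a B-tree with n keys satisfies:

$$
n \;\ge\; 1 + (d-1)\sum_{i=1}^{h} 2d^{\,i-1}
  \;=\; 1 + 2(d-1)\,\frac{d^{h}-1}{d-1}
  \;=\; 2d^{h}-1
\qquad\Longrightarrow\qquad
h \;\le\; \log_d \frac{n+1}{2}.
$$

For example, with d = 2 and n = 15 keys, h ≤ log₂ 8 = 3. This is the source of the $O(\log_d n)$ bound on the number of nodes accessed during a search.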

Search: look for k in node x; if it is not there, look for k in the appropriate subtree of node x

Number of nodes accessed – $O(\log_d n)$

Number of operations – $O(d \log_d n)$

Number of operations with binary search – $O(\log_2 d \cdot \log_d n) = O(\log_2 n)$
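A minimal Python sketch of this search, assuming a hypothetical `Node` with sorted `keys`, a parallel `items` list, and a `children` list that is empty in leaves. `bisect_left` gives the binary-search variant inside each node, and the counter reports how many nodes were accessed. The tree is the 2-4 tree from the earlier slide, with made-up items `"item<k>"`.

```python
from bisect import bisect_left


class Node:
    def __init__(self, keys, items, children=None):
        self.keys = keys                  # sorted keys
        self.items = items                # items[j] is associated with keys[j]
        self.children = children or []    # empty list means "leaf"


def search(x, k):
    """Look for k in the subtree of x; return (item or None, number of nodes accessed)."""
    accessed = 0
    while True:
        accessed += 1                          # one node read = one I/O in the I/O model
        i = bisect_left(x.keys, k)             # binary search inside the node: O(log d)
        if i < len(x.keys) and x.keys[i] == k:
            return x.items[i], accessed        # found k in node x
        if not x.children:
            return None, accessed              # reached a leaf: k is not in the tree
        x = x.children[i]                      # child i holds the keys between keys[i-1] and keys[i]


def node(keys, children=None):
    return Node(keys, [f"item{k}" for k in keys], children)


root = node([13], [
    node([4, 6, 10], [node([1, 3]), node([5]), node([7]), node([11])]),
    node([15, 28], [node([14]), node([16, 17]), node([30, 40, 50])]),
])
print(search(root, 16))  # ('item16', 3)
print(search(root, 8))   # (None, 3)
```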

B-Trees vs binary search trees
• Wider and shallower
• Access fewer nodes during search
• But may take more operations

The hardware structure

CPU → Cache → RAM → Disk

Each memory level is much larger, but much slower, than the one before it

Information is moved in blocks

A simplified I/O model

CPU ↔ RAM ↔ Disk

Each block is of size m.

Count both operations and I/O operations

Data structures in the I/O model

Each node (struct) is allocated contiguously.

It is harder to control which disk blocks contain the different nodes.

Linked lists and search trees behave poorly in the I/O model.

Each pointer followed may cause a disk access.

Pick d such that a node fits in a block.

B-trees reduce the worst-case number of I/Os.
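To make "a node fits in a block" concrete, the sketch below computes the largest d for which a full node (2d−1 keys, 2d−1 items, 2d child pointers) fits in a block of `block_size` bytes. The field sizes (8-byte keys, item references and child pointers, plus a 16-byte header) and the function names are assumptions for illustration; real layouts add alignment, variable-length keys, and so on.

```python
def node_bytes(d, key_size, item_size, ptr_size, header):
    """Bytes used by a full node: 2d-1 keys, 2d-1 items, 2d child pointers, plus a header."""
    return header + (2 * d - 1) * (key_size + item_size) + 2 * d * ptr_size


def largest_degree(block_size, key_size=8, item_size=8, ptr_size=8, header=16):
    """Largest minimum degree d for which a full node still fits in one block."""
    # node_bytes(d) <= block_size  <=>  2d(key+item+ptr) <= block_size - header + key + item
    d = (block_size - header + key_size + item_size) // (2 * (key_size + item_size + ptr_size))
    if d < 2:
        raise ValueError("block too small for a B-tree node")
    return d


print(largest_degree(4096))    # 85
print(largest_degree(48000))   # 1000
```

Under these assumed sizes, a 4 KB block gives d = 85, while the d = 1000 used on the next slides would need blocks of about 48 KB.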

Search in the I/O model: each node accessed costs one I/O

Number of nodes accessed (I/Os) – $O(\log_d n)$

Number of operations – $O(d \log_d n)$

Number of operations with binary search – $O(\log_2 d \cdot \log_d n) = O(\log_2 n)$

Red-Black Trees vs. B-Trees

$n = 2^{30} \approx 10^9$

30 ≤ height of Red-Black Tree ≤ 60

Up to 60 pages read from disk

Height of B-Tree with d = 1000 is only 3

Each B-Tree node resides in a block/page

Only 3 (or 4) pages read from disk

Disk access ≈ 1 millisecond ($10^{-3}$ sec)

Memory access ≈ 100 nanoseconds ($10^{-7}$ sec)
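The numbers on this slide can be reproduced with a few lines of arithmetic. The sketch uses the bound n ≥ 2d^h − 1 on the number of keys of a B-tree whose leaves are at depth h, which follows from the node counts on the height slide. It takes the slide's 1 ms per disk access and, for the red-black tree, the worst case of about 2 log₂ n nodes on a root-to-leaf path; `max_leaf_depth` is an illustrative helper.

```python
import math

n = 2 ** 30
DISK_MS = 1  # one disk access is about 1 ms, as on the slide


def max_leaf_depth(n, d):
    """Largest h with 2*d**h - 1 <= n: the deepest possible leaf in a B-tree with n keys."""
    h = 0
    while 2 * d ** (h + 1) - 1 <= n:
        h += 1
    return h


rb_worst = 2 * int(math.log2(n))            # red-black height is at most about 2 log2 n = 60
bt_levels = max_leaf_depth(n, 1000) + 1     # nodes on a root-to-leaf path (depths 0..h)

print(rb_worst, bt_levels)                                      # 60 3
print(f"{rb_worst * DISK_MS} ms vs {bt_levels * DISK_MS} ms")   # 60 ms vs 3 ms
```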

B-Trees – What are they good for?
• Large-degree B-trees are used to represent very large disk dictionaries. The minimum degree d is chosen according to the size of a disk block.
• Smaller-degree B-trees are used for internal-memory dictionaries to overcome cache-miss penalties.
• B-trees with d=2, i.e., 2-4 trees, are very similar to Red-Black trees.

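Rotate/Steal right and Rotate/Steal left

[Figure: keys A and B moving between a node, its parent, and its adjacent sibling, for a rotation in each direction]

In a rotation, one key moves from a node up into the parent, and the parent's separating key moves down into the adjacent sibling; if the nodes are internal, the child pointer next to the moved key is transferred to the sibling as well.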

Number of operations – O(d)

Number of I/Os – O(1)
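A minimal Python sketch of the two rotations, assuming a hypothetical `Node` whose `children` list is empty in leaves (items are omitted). The names `steal_from_left` and `steal_from_right` are illustrative and name the sibling that gives up a key, which sidesteps which direction the slides call "right". Each call shifts O(d) list entries but touches only three nodes, in line with the O(d) operations and O(1) I/Os above. The example replays the delete(T,24) step shown later in the lecture.

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = list(children) if children else []   # empty list means "leaf"


def steal_from_left(parent, i):
    """Give parent.children[i] one more key, taken through the parent from its left sibling."""
    left, node = parent.children[i - 1], parent.children[i]
    node.keys.insert(0, parent.keys[i - 1])      # separator moves down
    parent.keys[i - 1] = left.keys.pop()         # sibling's largest key moves up
    if left.children:                            # internal nodes: the adjacent subtree moves too
        node.children.insert(0, left.children.pop())


def steal_from_right(parent, i):
    """Mirror image: take a key through the parent from the right sibling."""
    node, right = parent.children[i], parent.children[i + 1]
    node.keys.append(parent.keys[i])             # separator moves down
    parent.keys[i] = right.keys.pop(0)           # sibling's smallest key moves up
    if right.children:
        node.children.append(right.children.pop(0))


# delete(T,24) example: the emptied middle leaf steals from its right sibling.
parent = Node([22, 28], [Node([20]), Node([]), Node([30, 40, 50])])
steal_from_right(parent, 1)
print(parent.keys, [c.keys for c in parent.children])  # [22, 30] [[20], [28], [40, 50]]
```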

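Split and Join

[Figure: a node consisting of A (d−1 keys), a middle key B, and C (d−1 keys). Split moves B up into the parent and leaves A and C as two separate nodes; Join is the inverse, merging A, B, and C back into one node.]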

Number of operations – O(d)

Number of I/Os – O(1)
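A minimal Python sketch of split and join on the same kind of hypothetical `Node` (keys plus a `children` list that is empty in leaves); the function names are illustrative. Taking the middle key at index ⌊(number of keys − 1)/2⌋ lets one function handle both a full node (2d−1 keys become d−1 and d−1) and an overflowing node (2d keys become d−1 and d, as in the insertion example later in the lecture).

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = list(children) if children else []   # empty list means "leaf"


def split_child(parent, i):
    """Split parent.children[i] around its middle key B, which moves up into parent."""
    node = parent.children[i]
    mid = (len(node.keys) - 1) // 2          # 2d-1 keys -> d-1 | B | d-1;  2d keys -> d-1 | B | d
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    parent.keys.insert(i, node.keys[mid])
    parent.children.insert(i + 1, right)
    node.keys, node.children = node.keys[:mid], node.children[:mid + 1]


def join_children(parent, i):
    """Inverse of split: merge children i and i+1 around the separator parent.keys[i]."""
    left, right = parent.children[i], parent.children.pop(i + 1)
    left.keys += [parent.keys.pop(i)] + right.keys
    left.children += right.children


# Insert(T,4) example: the overflowing leaf [1 2 3 4] is split and 2 moves up.
p = Node([5, 10], [Node([1, 2, 3, 4]), Node([6]), Node([11])])
split_child(p, 0)
print(p.keys, [c.keys for c in p.children])  # [2, 5, 10] [[1], [3, 4], [6], [11]]
```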

Insert

Root: [13]

Depth 1: [5 10] [15 28]

Leaves: [1 3] [6] [11] | [14] [16 17] [30 40 50]

Insert(T,2)

Insert

Root: [13]

Depth 1: [5 10] [15 28]

Leaves: [1 2 3] [6] [11] | [14] [16 17] [30 40 50]

Insert(T,2)

Insert

Root: [13]

Depth 1: [5 10] [15 28]

Leaves: [1 2 3] [6] [11] | [14] [16 17] [30 40 50]

Insert(T,4)

Insert

Root: [13]

Depth 1: [5 10] [15 28]

Leaves: [1 2 3 4] (overflowing) [6] [11] | [14] [16 17] [30 40 50]

Insert(T,4)

Split

Root: [13]

Depth 1: [5 10] [15 28]

Leaves: [1 2 3 4] (overflowing) [6] [11] | [14] [16 17] [30 40 50]

Insert(T,4)

Split

Root: [13]

Depth 1: [5 10] [15 28]

Leaves: [1] [3 4] [6] [11] | [14] [16 17] [30 40 50]

Key 2 is on its way up into the parent [5 10]

Insert(T,4)

Split

Root: [13]

Depth 1: [2 5 10] [15 28]

Leaves: [1] [3 4] [6] [11] | [14] [16 17] [30 40 50]

Insert(T,4)

Splitting an overflowing node

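[Figure: an overflowing node with 2d keys, consisting of A (d−1 keys), a middle key B, and C (d keys). B moves up into the parent; A and C become two separate nodes with d−1 and d keys.]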

Another insert

Root: [13]

Depth 1: [2 5 10] [15 28]

Leaves: [1] [3 4] [6] [11] | [14] [16 17] [30 40 50]

Insert(T,7)

Another insert

Root: [13]

Depth 1: [2 5 10] [15 28]

Leaves: [1] [3 4] [6 7] [11] | [14] [16 17] [30 40 50]

Insert(T,7)

and another insert

Root: [13]

Depth 1: [2 5 10] [15 28]

Leaves: [1] [3 4] [6 7] [11] | [14] [16 17] [30 40 50]

Insert(T,8)

and another insert

Root: [13]

Depth 1: [2 5 10] [15 28]

Leaves: [1] [3 4] [6 7 8] [11] | [14] [16 17] [30 40 50]

Insert(T,8)

and the last for today

Root: [13]

Depth 1: [2 5 10] [15 28]

Leaves: [1] [3 4] [6 7 8 9] (overflowing) [11] | [14] [16 17] [30 40 50]

Insert(T,9)

Split

Root: [13]

Depth 1: [2 5 10] [15 28]

Leaves: [1] [3 4] [6] [8 9] [11] | [14] [16 17] [30 40 50]

Key 7 is on its way up into the parent [2 5 10]

Insert(T,9)

Split

Root: [13]

Depth 1: [2 5 7 10] (overflowing) [15 28]

Leaves: [1] [3 4] [6] [8 9] [11] | [14] [16 17] [30 40 50]

Insert(T,9)

Split

Root: [13]

Depth 1: [2] [7 10] [15 28]

Leaves: [1] [3 4] | [6] [8 9] [11] | [14] [16 17] [30 40 50]

Key 5 is on its way up into the root

Insert(T,9)

Split

Root: [5 13]

Depth 1: [2] [7 10] [15 28]

Leaves: [1] [3 4] | [6] [8 9] [11] | [14] [16 17] [30 40 50]

Insert(T,9)

Insert – Bottom up

• Find the insertion point by a downward search
• Insert the key in the appropriate place
• If the current node is overflowing, split it
• If its parent is now overflowing, split it, etc.
• Need both a downward scan and an upward scan
• Need to keep parents on a stack
• Nodes are temporarily overflowing
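The bottom-up procedure can be sketched in Python as follows, assuming distinct keys and a hypothetical `Node` with a `children` list that is empty in leaves; the class and helper names are illustrative. The stack of (parent, child index) pairs plays the role of keeping parents on a stack. Replaying the lecture's insertions of 2, 4, 7, 8 and 9 into the starting tree reproduces the final tree on the slides.

```python
from bisect import bisect_left


class Node:
    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = list(children) if children else []   # empty list means "leaf"


class BTree:
    def __init__(self, d, root=None):
        self.d = d
        self.root = root if root is not None else Node([])

    def insert(self, k):
        # Downward scan: find the leaf, keeping (parent, child index) pairs on a stack.
        stack, x = [], self.root
        while x.children:
            i = bisect_left(x.keys, k)
            stack.append((x, i))
            x = x.children[i]
        x.keys.insert(bisect_left(x.keys, k), k)
        # Upward scan: while the current node overflows (2d keys), split it.
        while len(x.keys) == 2 * self.d:
            mid = self.d - 1                                   # left keeps d-1 keys, right gets d
            up = x.keys[mid]
            right = Node(x.keys[mid + 1:], x.children[mid + 1:])
            x.keys, x.children = x.keys[:mid], x.children[:mid + 1]
            if not stack:                                      # the root overflowed: new root
                self.root = Node([up], [x, right])
                return
            parent, i = stack.pop()
            parent.keys.insert(i, up)
            parent.children.insert(i + 1, right)
            x = parent                                         # the parent may now overflow


def levels(t):
    """Keys of every node, level by level."""
    out, level = [], [t.root]
    while level:
        out.append([n.keys for n in level])
        level = [c for n in level for c in n.children]
    return out


N = Node
t = BTree(2, N([13], [N([5, 10], [N([1, 3]), N([6]), N([11])]),
                      N([15, 28], [N([14]), N([16, 17]), N([30, 40, 50])])]))
for k in (2, 4, 7, 8, 9):
    t.insert(k)
for row in levels(t):
    print(row)
# [[5, 13]]
# [[2], [7, 10], [15, 28]]
# [[1], [3, 4], [6], [8, 9], [11], [14], [16, 17], [30, 40, 50]]
```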

Insert – Top down

• While conducting the search, split full children on the search path before descending to them!
• When the appropriate leaf is reached, it is not full, so the new key may be added!

Split-Root(T)

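If the root is full (2d−1 keys), a new root T.root is created holding only the middle key of the old root; the two halves of the old root, with d−1 keys each, become its children. This is the only way the height of the tree grows.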

Split-Child(x,i)

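x.child[i] is full: it consists of A (d−1 keys), a middle key B, and C (d−1 keys)

After Split-Child(x,i), B has moved up into x as key[i] (the later keys of x shift one place to the right), A remains x.child[i], and C becomes the new child immediately to the right of B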

Insert – Top down

• While conducting the search, split full children on the search path before descending to them!

Number of I/Os – $O(\log_d n)$

Number of operations – $O(d \log_d n)$
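A minimal Python sketch of top-down insertion, assuming distinct keys and a hypothetical `Node` class. `_split_child` and the root split mirror Split-Child(x,i) and Split-Root(T), but the names and the list-based representation are illustrative. Because every full child is split before the search enters it, the leaf reached at the end always has room for the new key.

```python
from bisect import bisect_left


class Node:
    def __init__(self, keys=None, children=None):
        self.keys = list(keys) if keys else []
        self.children = list(children) if children else []   # empty list means "leaf"


class BTree:
    def __init__(self, d):
        self.d = d
        self.root = Node()

    def _full(self, x):
        return len(x.keys) == 2 * self.d - 1

    def _split_child(self, x, i):
        """Split-Child(x,i): split the full x.children[i] into two nodes of d-1 keys each."""
        d, y = self.d, x.children[i]
        z = Node(y.keys[d:], y.children[d:])
        x.keys.insert(i, y.keys[d - 1])          # the middle key becomes x.key[i]
        x.children.insert(i + 1, z)
        y.keys, y.children = y.keys[:d - 1], y.children[:d]

    def insert(self, k):
        if self._full(self.root):                # Split-Root(T): the only way the height grows
            self.root = Node([], [self.root])
            self._split_child(self.root, 0)
        x = self.root
        while x.children:
            i = bisect_left(x.keys, k)
            if self._full(x.children[i]):        # split full children before descending
                self._split_child(x, i)
                if k > x.keys[i]:                # k belongs to the new right half
                    i += 1
            x = x.children[i]
        x.keys.insert(bisect_left(x.keys, k), k)  # the leaf is not full, so k fits


def inorder(x):
    """Yield the keys of the subtree of x in sorted order."""
    if not x.children:
        yield from x.keys
        return
    for child, key in zip(x.children, x.keys):
        yield from inorder(child)
        yield key
    yield from inorder(x.children[-1])


t = BTree(2)
for k in [10, 20, 5, 6, 12, 30, 7, 17]:
    t.insert(k)
print(t.root.keys)           # [10, 20]
print(list(inorder(t.root))) # [5, 6, 7, 10, 12, 17, 20, 30]
```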

Deletions from B-Trees

Root: [7 15]

Depth 1: [3] [10 13] [22 28]

Leaves: [1 2] [4 6] | [8 9] [11 12] [14] | [20] [24 26] [30 40 50]

delete(T,26)

Delete

Root: [7 15]

Depth 1: [3] [10 13] [22 28]

Leaves: [1 2] [4 6] | [8 9] [11 12] [14] | [20] [24] [30 40 50]

delete(T,26)

Delete

Root: [7 15]

Depth 1: [3] [10 13] [22 28]

Leaves: [1 2] [4 6] | [8 9] [11 12] [14] | [20] [24] [30 40 50]

delete(T,13)

Delete (Replace with predecessor)

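Root: [7 15]

Depth 1: [3] [10 12] [22 28]

Leaves: [1 2] [4 6] | [8 9] [11 12] [14] | [20] [24] [30 40 50]

Key 13 has been replaced by its predecessor 12, which still has to be removed from its leaf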

delete(T,13)

Delete

Root: [7 15]

Depth 1: [3] [10 12] [22 28]

Leaves: [1 2] [4 6] | [8 9] [11] [14] | [20] [24] [30 40 50]

delete(T,13)

Delete

Root: [7 15]

Depth 1: [3] [10 12] [22 28]

Leaves: [1 2] [4 6] | [8 9] [11] [14] | [20] [24] [30 40 50]

delete(T,24)

Delete

Root: [7 15]

Depth 1: [3] [10 12] [22 28]

Leaves: [1 2] [4 6] | [8 9] [11] [14] | [20] [ ] [30 40 50]

The leaf that held 24 is now empty

delete(T,24)

Delete (steal from sibling)

Root: [7 15]

Depth 1: [3] [10 12] [22 30]

Leaves: [1 2] [4 6] | [8 9] [11] [14] | [20] [28] [40 50]

delete(T,24)


Delete

Root: [7 15]

Depth 1: [3] [10 12] [22 30]

Leaves: [1 2] [4 6] | [8 9] [11] [14] | [20] [28] [40 50]

delete(T,20)

Delete

Root: [7 15]

Depth 1: [3] [10 12] [22 30]

Leaves: [1 2] [4 6] | [8 9] [11] [14] | [ ] [28] [40 50]

delete(T,20)

Delete (Join)

Root: [7 15]

Depth 1: [3] [10 12] [30]

Leaves: [1 2] [4 6] | [8 9] [11] [14] | [22 28] [40 50]

delete(T,20)

A few more

Root: [7 15]

Depth 1: [3] [10 12] [30]

Leaves: [1 2] [4 6] | [8 9] [11] [14] | [22 28] [40 50]

delete(T,22)

A few more

Root: [7 15]

Depth 1: [3] [10 12] [30]

Leaves: [1 2] [4 6] | [8 9] [11] [14] | [28] [40 50]

delete(T,22)

A few more

Root: [7 15]

Depth 1: [3] [10 12] [30]

Leaves: [1 2] [4 6] | [8 9] [11] [14] | [28] [40 50]

delete(T,28)

A few more

Root: [7 15]

Depth 1: [3] [10 12] [30]

Leaves: [1 2] [4 6] | [8 9] [11] [14] | [ ] [40 50]

delete(T,28)

Stealing again

Root: [7 15]

Depth 1: [3] [10 12] [40]

Leaves: [1 2] [4 6] | [8 9] [11] [14] | [30] [50]

delete(T,28)

Another one

Root: [7 15]

Depth 1: [3] [10 12] [40]

Leaves: [1 2] [4 6] | [8 9] [11] [14] | [30] [50]

delete(T,30)

Another one

Root: [7 15]

Depth 1: [3] [10 12] [40]

Leaves: [1 2] [4 6] | [8 9] [11] [14] | [ ] [50]

delete(T,30)

After Join

Root: [7 15]

Depth 1: [3] [10 12] [ ] (empty, with a single child)

Leaves: [1 2] [4 6] | [8 9] [11] [14] | [40 50]

delete(T,30)

Now we can steal

Root: [7 15]

Depth 1: [3] [10 12] [ ] (empty, with a single child)

Leaves: [1 2] [4 6] | [8 9] [11] [14] | [40 50]

delete(T,30)

Now we can steal

Root: [7 12]

Depth 1: [3] [10] [15]

Leaves: [1 2] [4 6] | [8 9] [11] | [14] [40 50]

delete(T,30)

More?

Root: [7 12]

Depth 1: [3] [10] [15]

Leaves: [1 2] [4 6] | [8 9] [11] | [14] [40 50]

delete(T,40)

Delete – Top down

• Assume, at first, that the item to be deleted is in a leaf
• While conducting the search, make sure that each child descended into contains at least d keys
• How?
• Steal or join
• When the item is located, it resides in a leaf containing at least d keys, so it can be removed

Delete – Top down

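• While conducting the search, make sure that each child you descend to contains at least d keys

If the child has only d−1 keys and an adjacent sibling has at least d keys: Rotate! (Steal)

If the child and its adjacent sibling both have only d−1 keys: Join!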

Delete – Top down

• What if the item to be deleted is in an internal node?
• Descend as before from the root until the item to be deleted is located
• Keep a pointer to the node containing the item
• Carry on descending towards the successor, making sure that nodes contain at least d keys
• When the successor is found, delete it from its leaf and use it to replace the item to be deleted
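A Python sketch of top-down deletion, assuming distinct keys and a hypothetical `Node` class; all names are illustrative. It follows the common textbook case analysis rather than the exact recipe above: when k sits in an internal node, it is replaced by its predecessor if the left child has at least d keys, otherwise by its successor if the right child does, and otherwise the two children are joined around k. As in the slides, every child is given at least d keys (by stealing or joining) before the search descends into it. The example replays the first four deletions from the slides; later deletions can produce different shapes than the pictures, because this version fixes nodes on the way down.

```python
from bisect import bisect_left


class Node:
    def __init__(self, keys=None, children=None):
        self.keys = list(keys) if keys else []
        self.children = list(children) if children else []   # empty list means "leaf"


class BTree:
    def __init__(self, d, root=None):
        self.d = d
        self.root = root if root is not None else Node()

    def delete(self, k):
        self._delete(self.root, k)
        if not self.root.keys and self.root.children:    # a join emptied the root:
            self.root = self.root.children[0]              # the height shrinks by one

    def _delete(self, x, k):
        # Invariant: x is the root or holds at least d keys.
        d, i = self.d, bisect_left(x.keys, k)
        if i < len(x.keys) and x.keys[i] == k:
            if not x.children:                              # k is in a leaf: remove it
                x.keys.pop(i)
            elif len(x.children[i].keys) >= d:              # replace k with its predecessor
                x.keys[i] = self._edge_key(x.children[i], last=True)
                self._delete(x.children[i], x.keys[i])
            elif len(x.children[i + 1].keys) >= d:          # ... or with its successor
                x.keys[i] = self._edge_key(x.children[i + 1], last=False)
                self._delete(x.children[i + 1], x.keys[i])
            else:                                           # both have d-1 keys: join around k
                self._join(x, i)
                self._delete(x.children[i], k)
            return
        if not x.children:                                  # k is not in the tree
            return
        if len(x.children[i].keys) < d:                     # fix the child before descending
            i = self._fix_child(x, i)
        self._delete(x.children[i], k)

    def _fix_child(self, x, i):
        """Give x.children[i] at least d keys by stealing or joining; return its new index."""
        d, kids = self.d, x.children
        if i > 0 and len(kids[i - 1].keys) >= d:            # steal from the left sibling
            left, node = kids[i - 1], kids[i]
            node.keys.insert(0, x.keys[i - 1])
            x.keys[i - 1] = left.keys.pop()
            if left.children:
                node.children.insert(0, left.children.pop())
            return i
        if i + 1 < len(kids) and len(kids[i + 1].keys) >= d:  # steal from the right sibling
            node, right = kids[i], kids[i + 1]
            node.keys.append(x.keys[i])
            x.keys[i] = right.keys.pop(0)
            if right.children:
                node.children.append(right.children.pop(0))
            return i
        if i + 1 < len(kids):                               # join with the right sibling
            self._join(x, i)
            return i
        self._join(x, i - 1)                                # last child: join with the left one
        return i - 1

    def _join(self, x, i):
        left, right = x.children[i], x.children.pop(i + 1)
        left.keys += [x.keys.pop(i)] + right.keys
        left.children += right.children

    @staticmethod
    def _edge_key(x, last):
        while x.children:
            x = x.children[-1 if last else 0]
        return x.keys[-1 if last else 0]


def inorder(x):
    if not x.children:
        yield from x.keys
        return
    for child, key in zip(x.children, x.keys):
        yield from inorder(child)
        yield key
    yield from inorder(x.children[-1])


N = Node
t = BTree(2, N([7, 15], [N([3], [N([1, 2]), N([4, 6])]),
                         N([10, 13], [N([8, 9]), N([11, 12]), N([14])]),
                         N([22, 28], [N([20]), N([24, 26]), N([30, 40, 50])])]))
for k in (26, 13, 24, 20):          # the first deletions from the slides
    t.delete(k)
print([c.keys for c in t.root.children])  # [[3], [10, 12], [30]]
```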

Deletions from B-Trees

As always, similar to, but slightly more complicated than, insertions

(may need to replace with successor)

Deletion is slightly simpler for B+-Trees

B-Trees vs. B+-Trees
• In a B-tree each node contains items and keys
• In a B+-tree, leaves contain items and keys. Internal nodes contain keys to direct the search.
• Keys in internal nodes are either keys of existing items, or keys of items that were deleted.
• Internal nodes store no items, so they may contain more keys, and overall the number of items we can store increases
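To illustrate B+-tree search, here is a minimal Python sketch in which items live only in leaves and internal nodes hold routing keys. The class names and the convention that a key equal to a separator is found in the subtree to its right are assumptions; the slides do not fix them. The example keeps a separator (20) whose item has already been deleted, and it still directs the search correctly.

```python
from bisect import bisect_left, bisect_right


class Leaf:
    def __init__(self, keys, items):
        self.keys, self.items = keys, items          # items live only in the leaves


class Internal:
    def __init__(self, keys, children):
        self.keys, self.children = keys, children    # keys only direct the search


def bplus_search(x, k):
    """Return the item stored under k, or None. Assumed convention: keys equal to a
    separator are found in the subtree to its right."""
    while isinstance(x, Internal):
        x = x.children[bisect_right(x.keys, k)]
    i = bisect_left(x.keys, k)
    return x.items[i] if i < len(x.keys) and x.keys[i] == k else None


# The separator 20 is left over from an item that has been deleted from the leaves.
root = Internal([20], [Leaf([5, 12], ["e", "l"]), Leaf([25, 31], ["y", "z"])])
print(bplus_search(root, 25), bplus_search(root, 20))  # y None
```

Every search ends in a leaf, whether or not the key also appears in an internal node.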