
Group Project B-Tree Student: Yongsheng Ma

CS632 – Algorithm Professor: G. Gibson. Group Project B-Tree Student: Yongsheng Ma. B-Tree. Introduction Operations Complexities Applications Summary. B-Tree Properties. An m-way search tree. Root node may have as few as two children, or none if the tree is empty. Root may be a leaf.



  1. CS632 – Algorithm Professor: G. Gibson Group Project B-Tree Student: Yongsheng Ma

  2. B-Tree • Introduction • Operations • Complexities • Applications • Summary

  3. B-Tree Properties • An m-way search tree • Root node may have as few as two children, or none if the tree is empty • Root may be a leaf • Internal nodes have at least ceiling(m/2) and at most m non-null sub-trees

  4. B-Tree Properties • All leaf nodes are at the same level; that is, the tree is perfectly balanced. • A leaf node has at least ceiling(m/2)-1 entries (keys) and at most m-1 entries (keys).

  5. B-Tree Properties • The “branching factor” can be quite large. • Each node may have many children, from a handful to thousands. • The keys in each node are in non-decreasing order.
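The structural properties on slides 3 to 5 can be checked mechanically. The following is a minimal sketch, assuming a hypothetical `Node` class (a sorted key list plus a child list, empty for a leaf) and a hypothetical `check` helper; neither name comes from the slides. It checks only key counts, key order within a node, child counts and equal leaf depth, not the search-tree ordering between a node and its subtrees.

```python
from math import ceil

class Node:
    """Hypothetical B-tree node: sorted keys plus children (empty list for a leaf)."""
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []

def check(node, m, is_root=True, depth=0, leaf_depths=None):
    """Return True if the subtree satisfies the order-m structural properties."""
    if leaf_depths is None:
        leaf_depths = set()
    if is_root:
        # The root may be an empty leaf, or have as few as two children (one key).
        min_keys = 1 if node.children else 0
    else:
        min_keys = ceil(m / 2) - 1
    if not (min_keys <= len(node.keys) <= m - 1):
        return False
    if node.keys != sorted(node.keys):          # keys in non-decreasing order
        return False
    if not node.children:                       # leaf: all leaves share one depth
        leaf_depths.add(depth)
        return len(leaf_depths) == 1
    if len(node.children) != len(node.keys) + 1:
        return False
    return all(check(c, m, False, depth + 1, leaf_depths) for c in node.children)

# The example tree from slide 9, read here as a tree of order m = 4.
root = Node(["M"], [
    Node(["D", "H"], [Node(["B", "C"]), Node(["F", "G"]), Node(["J", "K", "L"])]),
    Node(["Q", "T", "X"], [Node(["N", "P"]), Node(["R", "S"]),
                           Node(["V", "W"]), Node(["Y", "Z"])]),
])
print(check(root, 4))
```

Reading the slide 9 tree as order 4 is an assumption made for the example: its nodes hold between one and three keys, which fits ceiling(4/2)-1 = 1 and m-1 = 3.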

  6. Operations • Searching a key • Inserting a key • Splitting a node • Deleting a key

  7. Searching a key • Much like searching a binary tree. • Make a multi-way branching decision at each node • The nodes encountered form a path downward from the root.

  8. Searching a key • The number of pages accessed is O(h) = O(log_t n), where h is the height and n is the number of keys. • CPU time is O(th) = O(t log_t n). • Note • t is the minimum degree of the B-tree. • So each node has at most 2t children and at most 2t-1 entries (keys).

  9. Searching a key (example tree) • Root: [M] • Internal nodes: [D H] [Q T X] • Leaves: [B C] [F G] [J K L] [N P] [R S] [V W] [Y Z]
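The multi-way branching search described on slides 7 to 9 can be sketched as follows. This is an illustrative version, assuming a hypothetical in-memory `Node` class and `search` function (names not taken from the slides); a disk-based tree would read a page at each step down.

```python
class Node:
    """Hypothetical B-tree node: sorted keys plus children (empty list for a leaf)."""
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []

def search(node, key):
    """Return (node, index) for the node holding key, or None if key is absent."""
    i = 0
    # Multi-way branching decision: find the first key that is >= the target.
    while i < len(node.keys) and key > node.keys[i]:
        i += 1
    if i < len(node.keys) and key == node.keys[i]:
        return node, i
    if not node.children:          # reached a leaf without finding the key
        return None
    return search(node.children[i], key)   # one step down the root-to-leaf path

def leaf(letters):
    return Node(list(letters))

# The example tree from slide 9.
root = Node(["M"], [
    Node(["D", "H"], [leaf("BC"), leaf("FG"), leaf("JKL")]),
    Node(["Q", "T", "X"], [leaf("NP"), leaf("RS"), leaf("VW"), leaf("YZ")]),
])

found = search(root, "R")
print(found[0].keys, found[1])
print(search(root, "A"))
```

Searching for R visits the root, the node [Q T X] and the leaf [R S], which is the downward path slide 7 refers to. The linear scan inside each node is what gives the O(t) CPU cost per node.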

  10. Creating an empty tree • We can assume there is no disk read. • Allocating one disk page to be used as the new root node takes O(1) time.

  11. Splitting a node • A fundamental operation used during insertion. • The median key moves up into its parent node, which must be non-full. • If the node has no parent (it is the root), the tree grows in height by one.

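  12. Splitting a node (t=4) • Before: parent [… N W …]; full child [P Q R S T U V] • After: parent [… N S W …]; children [P Q R] and [T U V]

  13. Splitting a node (t=4) • Before: root [A D F H L N P] • After: new root [H]; children [A D F] and [L N P]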
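The split on slides 11 to 13 can be sketched in a few lines. This is a minimal in-memory version, assuming a hypothetical `Node` class and `split_child` function (names not from the slides). The sibling leaves `[A B C]` and `[X Y Z]` in the first example are stand-ins for the parts the figure elides, not values from the slides.

```python
class Node:
    """Hypothetical B-tree node: sorted keys plus children (empty list for a leaf)."""
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []

def split_child(parent, i, t):
    """Split the full child parent.children[i] (2t-1 keys) around its median key."""
    full = parent.children[i]
    assert len(full.keys) == 2 * t - 1, "child must be full"
    assert len(parent.keys) < 2 * t - 1, "parent must be non-full"
    right = Node(full.keys[t:], full.children[t:])   # upper t-1 keys
    median = full.keys[t - 1]
    full.keys = full.keys[:t - 1]                    # lower t-1 keys stay in place
    full.children = full.children[:t]
    parent.keys.insert(i, median)                    # median moves up
    parent.children.insert(i + 1, right)

# Slide 12 (t=4): only the visible part of the parent is modelled.
parent = Node(["N", "W"], [Node(list("ABC")),        # stand-in sibling
                           Node(list("PQRSTUV")),
                           Node(list("XYZ"))])       # stand-in sibling
split_child(parent, 1, 4)
print(parent.keys, [c.keys for c in parent.children[1:3]])

# Slide 13 (t=4): splitting a full root by first placing it under a new empty root.
new_root = Node([], [Node(list("ADFHLNP"))])
split_child(new_root, 0, 4)
print(new_root.keys, [c.keys for c in new_root.children])
```

Putting the old root under a new, empty root before splitting is how the tree gains a level at the top, as slide 11 notes.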

  14. Inserting a key • Requires: • O(h) disk accesses. • CPU time O(th) = O(t log_t n).

  15. Inserting a key • Splitting the root is the only way to increase the height of a B-tree. • Unlike a binary tree, a B-tree increases in height at the top instead of the bottom.

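  16. Inserting a key (a) initial tree (t=3) • Root: [G M P X] • Leaves: [A C D E] [J K] [N O] [R S T U V] [Y Z]

  17. Inserting a key (b) B inserted (t=3) • Root: [G M P X] • Leaves: [A B C D E] [J K] [N O] [R S T U V] [Y Z]

  18. Inserting a key (c) Q inserted (t=3) • Root: [G M P T X] • Leaves: [A B C D E] [J K] [N O] [Q R S] [U V] [Y Z]

  19. Inserting a key (d) L inserted (t=3) • Root: [P] • Internal nodes: [G M] [T X] • Leaves: [A B C D E] [J K L] [N O] [Q R S] [U V] [Y Z]

  20. Inserting a key (e) F inserted (t=3) • Root: [P] • Internal nodes: [C G M] [T X] • Leaves: [A B] [D E F] [J K L] [N O] [Q R S] [U V] [Y Z]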
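The insertion sequence on slides 16 to 20 can be reproduced with a single-pass insertion that splits every full node it meets on the way down. The sketch below assumes hypothetical names (`Node`, `split_child`, `insert_nonfull`, `insert`, `levels`) and distinct keys; it is one common way to implement the operation, not necessarily the exact procedure behind the slides.

```python
class Node:
    """Hypothetical B-tree node: sorted keys plus children (empty list for a leaf)."""
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []

def split_child(parent, i, t):
    """Split the full child parent.children[i]; its median key moves up."""
    full = parent.children[i]
    right = Node(full.keys[t:], full.children[t:])
    median = full.keys[t - 1]
    full.keys, full.children = full.keys[:t - 1], full.children[:t]
    parent.keys.insert(i, median)
    parent.children.insert(i + 1, right)

def insert_nonfull(node, key, t):
    """Insert key into the subtree rooted at node, which is known to be non-full."""
    i = len(node.keys)
    while i > 0 and key < node.keys[i - 1]:
        i -= 1
    if not node.children:                      # leaf: place the key directly
        node.keys.insert(i, key)
        return
    if len(node.children[i].keys) == 2 * t - 1:
        split_child(node, i, t)                # split before descending
        if key > node.keys[i]:
            i += 1
    insert_nonfull(node.children[i], key, t)

def insert(root, key, t):
    """Insert key and return the (possibly new) root."""
    if len(root.keys) == 2 * t - 1:            # full root: the tree grows at the top
        root = Node([], [root])
        split_child(root, 0, t)
    insert_nonfull(root, key, t)
    return root

def levels(root):
    """Return the keys of every node, grouped level by level."""
    out, level = [], [root]
    while level:
        out.append(["".join(n.keys) for n in level])
        level = [c for n in level for c in n.children]
    return out

# Slide 16 (a): initial tree with t = 3.
root = Node(list("GMPX"),
            [Node(list(s)) for s in ("ACDE", "JK", "NO", "RSTUV", "YZ")])
for k in "BQLF":                               # slides 17 to 20
    root = insert(root, k, 3)
print(levels(root))
```

Inserting L is the step where the full root [G M P T X] is split, which is why the tree gains a level in (d).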

  21. Deleting a key • Deletion is analogous to insertion but a little more complicated. • There are several cases for deleting a key from a B-tree.

  22. Deleting a key • Which case applies depends on where the key sits and how full the affected nodes are. • In practice, deletion operations are most often used to delete keys from leaves.

  23. Deleting a key • When deleting a key from an internal node, however, the procedure makes a downward pass through the tree but may have to return to the node from which the key was deleted to replace the key with its predecessor or successor.

  24. Deleting a key • Although this procedure seems complicated, it involves only O(h) disk operations for a B-tree with height h. • The CPU time required is O(th) = O(t log_t n).

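  25. Deleting a key (a) Initial tree (t=3) • Root: [P] • Internal nodes: [C G M] [T X] • Leaves: [A B] [D E F] [J K L] [N O] [Q R S] [U V] [Y Z]

  26. Deleting a key (b) F deleted: case 1 (t=3) • Root: [P] • Internal nodes: [C G M] [T X] • Leaves: [A B] [D E] [J K L] [N O] [Q R S] [U V] [Y Z]

  27. Deleting a key (c) M deleted: case 2a (t=3) • Root: [P] • Internal nodes: [C G L] [T X] • Leaves: [A B] [D E] [J K] [N O] [Q R S] [U V] [Y Z]

  28. Deleting a key (d) G deleted: case 2c (t=3) • Root: [P] • Internal nodes: [C L] [T X] • Leaves: [A B] [D E J K] [N O] [Q R S] [U V] [Y Z]

  29. Deleting a key (e) D deleted: case 3b (t=3) • Merged node: [C L P T X] • Leaves: [A B] [E J K] [N O] [Q R S] [U V] [Y Z]

  30. Deleting a key (e’) tree shrinks in height (t=3) • Root: [C L P T X] • Leaves: [A B] [E J K] [N O] [Q R S] [U V] [Y Z]

  31. Deleting a key (f) B deleted: case 3a (t=3) • Root: [E L P T X] • Leaves: [A C] [J K] [N O] [Q R S] [U V] [Y Z]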
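The deletion cases named on slides 26 to 31 can be sketched as one downward pass. The code below is an illustrative in-memory version under stated assumptions: the names `Node`, `merge`, `delete`, `delete_key` and `levels` are hypothetical, keys are distinct, and the case numbering in the comments follows the labels used on the slides.

```python
class Node:
    """Hypothetical B-tree node: sorted keys plus children (empty list for a leaf)."""
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []

def merge(node, i):
    """Merge children i and i+1 of node around the separating key node.keys[i]."""
    left, right = node.children[i], node.children.pop(i + 1)
    left.keys += [node.keys.pop(i)] + right.keys
    left.children += right.children

def delete(node, key, t):
    """Delete key below node; node holds at least t keys unless it is the root."""
    i = 0
    while i < len(node.keys) and key > node.keys[i]:
        i += 1
    found = i < len(node.keys) and node.keys[i] == key
    if not node.children:                          # case 1: key is in a leaf
        if found:
            node.keys.pop(i)
        return
    if found:                                      # case 2: key is in an internal node
        left, right = node.children[i], node.children[i + 1]
        if len(left.keys) >= t:                    # 2a: replace with predecessor
            pred = left
            while pred.children:
                pred = pred.children[-1]
            node.keys[i] = pred.keys[-1]
            delete(left, node.keys[i], t)
        elif len(right.keys) >= t:                 # 2b: replace with successor
            succ = right
            while succ.children:
                succ = succ.children[0]
            node.keys[i] = succ.keys[0]
            delete(right, node.keys[i], t)
        else:                                      # 2c: merge, then delete below
            merge(node, i)
            delete(left, key, t)
        return
    child = node.children[i]                       # case 3: key is further down
    if len(child.keys) == t - 1:
        if i > 0 and len(node.children[i - 1].keys) >= t:        # 3a: borrow left
            sib = node.children[i - 1]
            child.keys.insert(0, node.keys[i - 1])
            node.keys[i - 1] = sib.keys.pop()
            if sib.children:
                child.children.insert(0, sib.children.pop())
        elif i < len(node.keys) and len(node.children[i + 1].keys) >= t:  # 3a: borrow right
            sib = node.children[i + 1]
            child.keys.append(node.keys[i])
            node.keys[i] = sib.keys.pop(0)
            if sib.children:
                child.children.append(sib.children.pop(0))
        else:                                      # 3b: merge with a sibling
            if i == len(node.keys):
                i -= 1
            merge(node, i)
            child = node.children[i]
    delete(child, key, t)

def delete_key(root, key, t):
    """Delete key and return the root; the tree shrinks if the root is emptied."""
    delete(root, key, t)
    return root.children[0] if not root.keys and root.children else root

def levels(root):
    """Return the keys of every node, grouped level by level."""
    out, level = [], [root]
    while level:
        out.append(["".join(n.keys) for n in level])
        level = [c for n in level for c in n.children]
    return out

# Slide 25 (a): initial tree with t = 3.
root = Node(["P"], [
    Node(list("CGM"), [Node(list(s)) for s in ("AB", "DEF", "JKL", "NO")]),
    Node(list("TX"), [Node(list(s)) for s in ("QRS", "UV", "YZ")]),
])
for k in "FMGDB":                                  # slides 26 to 31
    root = delete_key(root, k, 3)
print(levels(root))
```

Deleting D is the step that merges [C L], P and [T X] into one node and leaves the old root empty, so `delete_key` returns the merged node as the new root.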

  32. Complexities • A large branching factor reduces the number of disk accesses required to find a key. • When the root node resides in memory, a tree of height 2 requires at most 2 disk accesses to find any key in the tree; for a fixed height this is a constant number of accesses, O(1).

  33. Complexities • Running time consists of the number of disk accesses and the CPU time. • During a disk read or write, an entire page of information is accessed. • The number of disk accesses is measured in terms of pages that have to be read from or written to the disk.

  34. Complexities • The number of disk pages accessed is O(h) = O(log_t n). • The CPU time to scan the keys within each node is O(t). • The total CPU time is O(th) = O(t log_t n), which is O(log n) when t is treated as a constant. • The same bounds hold for every basic operation.

  35. Applications • Databases cannot typically be maintained entirely in memory. • Secondary storage is usually used. • A B-tree is often used to index the data and to provide fast access.

  36. Applications • Searching an un-indexed and unsorted database containing n key values has a worst-case running time of O(n). • Indexed with a B-tree, the same search operation runs in O(log n).

  37. Applications – an example • To search for a single key in a set of one million keys (1,000,000), a linear search requires at most 1,000,000 comparisons. • If the same data is indexed with a B-tree of minimum degree t = 10, the height is at most log_10((n+1)/2) ≈ 5.7, that is, 5, so a search visits at most 6 nodes and, scanning up to 2t-1 = 19 keys in each, needs no more than 6 × 19 = 114 comparisons in the worst case.
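Figures of this kind can be sanity-checked with a few lines of arithmetic. The sketch below rests on two assumptions that should be read as such: the standard height bound for a B-tree of minimum degree t holding n keys, h ≤ log_t((n+1)/2) (equivalently n ≥ 2t^h - 1), and a linear scan of up to 2t-1 keys in each visited node. The function names are hypothetical.

```python
def max_height(n, t):
    """Largest height h such that a B-tree of minimum degree t fits in n keys.

    Uses the bound n >= 2*t**h - 1, evaluated with integers to avoid
    floating-point rounding in the logarithm.
    """
    h = 0
    while 2 * t ** (h + 1) - 1 <= n:
        h += 1
    return h

def worst_case_comparisons(n, t):
    """Upper bound on key comparisons: h+1 nodes visited, at most 2t-1 keys each."""
    return (max_height(n, t) + 1) * (2 * t - 1)

n, t = 1_000_000, 10
print(max_height(n, t))               # height bound for one million keys
print(worst_case_comparisons(n, t))   # comparison bound with linear in-node scans
print(n)                              # worst case for a plain linear search
```

Under these assumptions the bound comes out at a height of 5 and 114 comparisons, against 1,000,000 for the linear search. A binary search inside each node would lower the comparison count further without changing the number of pages read.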

  38. Summary • A B-tree is a balanced, multi-way file organization. • Search, Insert, and Delete operations retain desirable logarithmic costs. • Every node other than the root is kept at least about half full, so storage utilization is at least roughly 50%.

  39. Extra • B-tree variants • B+ and B* tree • Branching factors are improved

  40. Extra • B+ tree • Combine features of ISAM and B tree • Contain Index pages and Data pages • Data pages always appear as leaf nodes • Root and intermediate nodes are index pages

  41. Extra • B+ tree • Saves more space (but who cares) • Non-leaf and leaf nodes hold different numbers of entries • Deletion is more complicated • Lookup is faster than in a B-tree because the height of the tree is smaller (because items are stored more compactly)
