Lecture 8 on Physical Database
This presentation is the property of its rightful owner.
Sponsored Links
1 / 37

Lecture 8 on Physical Database PowerPoint PPT Presentation


  • 79 Views
  • Uploaded on
  • Presentation posted in: General

Lecture 8 on Physical Database. DBMS has a view of the database as a collection of stored records, and that view is supported by the file manager which has a view of the database as a collection of pages, and that view is supported by the disk manager. The DBMS, file manager and disk manager.

Download Presentation

Lecture 8 on Physical Database

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Lecture 8 on physical database

Lecture 8 on Physical Database

DBMS has a view of the database as a collection of stored records, and that view is supported by the file manager which has a view of the database as a collection of pages, and that view is supported by the disk manager.


Lecture 8 on physical database

The DBMS, file manager and disk manager

DBMS

Request stored record

Store record returned

File manager

Request stored page

Stored page return

Disk Manager

Disk I/O operation

Data read from disk

Stored database


Lecture 8 on physical database

Physical database design

The physical database design is initiated to a certain extent in the “logical” design. The physical organization is determined largely by the need for operational efficiency, fast response times, and cost minimization. Most data storage devices record data as a stream of bits. The groups of bits which we can read with one machine instruction are called physical records. The physical records are stored at locations which are identified by a means of machine addresses. A program identifies a logical record or sequent by means of a key.


Sequential file

Sequential file

With the physical sequential access method, the physical records are stored in logical sequence. If the storage medium to be used is a tape, the programmer has to present the physical records in a logical sequence. If the storage medium is a direct access one, the system will interconnect the physical records so that they are in logical sequence, even if they were not presented in logical sequence. The records must be read in a fixed sequence from begin to end sequential.


Overflow area

Overflow area

With ISAM files the records are grouped so as to fit onto physical disk tracks, and one track on each cylinder contains an index to the records stored in that cylinder. When new records are inserted after the original sequential file has been set up these are stored in an overflow area. The index track contains pointers both to the prime data area and to the overflow area.


Indexed file b tree

Indexed File: B+-Tree

A common scheme for extremely large files is to induce a hierarchy of indices that follow the hierarchical nature of the secondary storages devices on which the file resides. We can view the hierarchy of indices as a tree. The benefit of index file is efficient space utilization in searching indexes.


Lecture 8 on physical database

Multilevel Index B+ Tree


B tree file

B+ Tree File


B tree parameters

B+-tree parameters

B+-trees are defined to use a particular insertion/deletion strategy that ensures no node, except the root, is less than half full.

In general, we define index block values as:

2d -1 ≥ Order where Order is the maximum pointer in each index block.

d = number of search values in index block

Similarly, we define leaf block values as:

2e -1 ≥ Order where Order is the number of key values in each leaf block.

e = number of minimum key values in leaf block

For example, if the order is 3, then the number of key values in an index block is d = 2 such that 2d-1>3.

Also, the number of key values in a leaf block is 3 such that 2e-1>3 where e=2.


Insert

Insert

To insert a record with key value v, apply the lookup procedure to find the block B in which this record belongs.

If there are room (< 2e-1 records) in B, insert the new record in B.

If there is no room (=2e-1 records) in block B, create a new block B1 and divide the records from B and the inserted record into two groups of e record each.

The effects of inserting a record into B can ripple up the tree for several levels up to the root.


Insert b tree in sorted sequence of 1 2 3 4 5 6 7 8 9 10

Insert B+ tree in sorted sequence of (1,2,3,4,5,6,7,8,9,10)


Initial b tree in unsorted sequence of 1 4 9 16 25 49 64 81 36 100 121 144 169 196 225 256

Initial B+-tree in unsorted sequence of (1,4,9,16,25,49, 64,81, 36,100,121,144,169,196,225,256)


B tree after insertion of 32

B+-tree after insertion of 32


Deletion

Deletion

If we delete the record with key value v, we use the lookup procedure to find this record.

If after deletion, block B has more than half (e) records,we are done.

If, after deletion, block B has less than half (e-1) records, we look for a neighbor block B1. If B1 has more than half ( e) records, we distribute records of B and B1;

otherwise combine B with B1, which will have exactly 2e-1 records, and in the parent of B, modify the record for B1 and delete record for B.


Deletion1

Deletion

If the deleted record was the first in block B, then we go to the parents of B to change the key value in the record for B.

If B is the first child of its parent, the parent has no key value for B, so we must go to the parent’s parent and so on, until we find an ancestor A1 of B such that A1 is not the first child of its parent A2.

Then the new lowest key value of B goes in the record of A2 that points to A1.


Lecture 8 on physical database

B+-tree after deleting 64


Lookup search b tree

Lookup (Search B+ tree)

Let us search for a record with key value v. We find a path from the root of B+-tree to some leaf, where the desired record will be found if it exists. Suppose we have reached node (block) B. If B is a leaf, then examine block B for a record with key value v. If B is not a leaf, it is an index block. Determine which key value in block B covers v. In the record of B that covers v is a pointer to another block. That block follows B in the path being constructed.


B tree block access operations

B+-tree block access operations

Given a B+-tree with n records and e values in the leaf, and d values in each branch.

The tree will have no more than n/e leaves, no more than n/(de) parents of parents of leaves and so on.

If there are i nodes on paths from the root to leaves, then n>di-1e. It follows that

i1+logd(n/e) where i = number of I/O costs to access leaf block

For example, if n=1000000, e=5 and d=50, the number read/writes of blocks in an operation is

i  1+log50(200000)  4.12  5 (round up)

i = 4 (round down)

Notice that log50(200000) = log (200000) / log(50)


Hash files

Hash files

Hashing is a form of address calculation technique which can convert an item’s key into a near-random number used to determine where the item is stored. The near-random number refer to the address where a record is stored. The number of logical records stored in this area is referred to as the bucket capacity. The benefit of hashing is fast response time but with poor space ultilzation.


Factors in hashing addressing

Factors in hashing addressing

  • The bucket size is a certain number of address spaces made available.

  • The packing density (number of buckets for a file of a given size)

  • The hashing key-to-address transaction.

    If the key is not numeric, convert it into numeric.

    The keys are converted into a spread of numbers of the order of magnitude of the address numbers required.

    The resulting numbers are multiplied by a constant which compresses them to the precise range of address.

  • The method of handling overflows. It is desirable to minimize the bucket-searching operation at the expense of more overflows.


Hashing algorithms

Hashing algorithms

  • Remainder method: divide key by a number and let the remainder as the physical address of the record.

  • Midsquare method: the key is multipled by itself and the middle few digits of the square are used as the index.

  • Folding method: breaks up a key into several segments that are added or exclusive ORed together to form a hash value.


Lecture 8 on physical database

Remainder hashing algorithm example

Suppose supplier number values are S100, S200, S300, S400, S500, and each stored supplier record requires an entire page to itself. By using hash function division/remainder. The page numbers for the five suppliers are then 9, 5, 1, 10, 6 and the divider is 13. For example, reminder of 100/13 is 9.

0

1

2

3

4

5

S200

S300

6

7

8

9

10

11

S500

S100

S400

12


Folding method hash algorithm example

Folding method hash algorithm example

Suppose that the internal bit string representation of a key is

010111001010110

and that 5 bits are allowed in the index. The three bit strings

01011, 10010 and 10110

are exclusive ORed to produce, i.e., 0 ORed 0 = 0

1 ORed 0 = 1

0 ORed 1 = 1

1 ORed 1 = 0

01111,

which is 15 as a binary integer.


Midsquare hashing algorithm example

MidSquare hashing algorithm example

This method treats the key as a single large number, square the number, and extract whatever number of digits is needed from the middle of the result.

Suppose you want to generate addresses between 0 and 99. If the key is the number 453, its square is 205,209. Extracting the middle two digits yields a number between 0 and 99, in this case 52.


Inverted list file

Inverted list file

Logical record order can be maintained using inverted list which is a table that cross references record addresses with some field value. The benefit of inverted file is to implement secondary index, that is, an alternative index besides prime index.


Multi list file

Multi-list file

There is one entry in the secondary key’s index for each value that the secondary key presently has in the data file.

The entry in the multi-list index for a key value has just one pointer to the first data record with that key value.

The data records contains a pointer to the next data record with that key value, and so forth.

There is a linked list of data records for each value of the secondary key.

Multi-list chains are bi-directional, and occasionally are circular to improve update performance of a database.

The benefit of multi-list file is to implement duplicate secondary indexes.


Lecture summary

Lecture Summary

Four kinds of file structures have been introduced to design physical database in order to implement logical database schema.

Firstly, B+ tree is good for both performance with indexing and space utilization with balanced blocks.

Secondly, hashing is good for fast response in searching key value.

Thirdly, inverted file is good for secondary indexing.

Fourthly, multi-linked lists is good for duplicate secondary indexing.


Review question

Review question

How to compare the efficiency of physical database storage by use of B+-tree file, hashing file, inverted list file and multi-list file?


Tutorial question

Tutorial Question

What is a B+-tree and what are its components.

Given number of search-key values that fit in one index node is 2 and in one leaf node is 3. Construct a B+-tree for the following set of key values (2, 3, 5, 7, 11, 17, 19, 23, 29, 31).

Show how to use this B-tree to find record with search-key value 11. Show B-tree after inserting search-value 9 and after deleting search value 17.


Reading assignment

Reading Assignment

Chapter 14 Indexing Structures for Files of “Fundamentals of Database Systems:, 5th edition, by Elmasri and Navathe, Pearson International Edition, 2007, pp.500-531.


  • Login