
Physical Database



  1. Physical Database

  2. Physical Design Activities • Minimize the number of disk accesses • Minimize the amount of space used to store the data • Provide for data integrity and protect the data from loss

  3. Disk Storage Devices • Disk units are referred to as direct access storage devices, or DASD (pronounced "daz-dee") • Offer high storage capacity and low cost • Data is stored as magnetized areas on magnetic disk surfaces. • A disk pack contains several magnetic disks connected to a rotating spindle. • For disk packs, the tracks with the same diameter on the various surfaces are called a cylinder (a set of tracks that can be accessed without moving the access arms)

  4. Cont’ • Disks are divided into concentric circular tracks on each disk surface. Track capacity varies, typically from 4 to 50 Kbytes (a surface has about 800 tracks). • Each track stores the same amount of information, so bits are packed more densely on the smaller-diameter tracks. • A track is divided into sectors or blocks (also called data pages). The division of a track into equal-sized blocks is set by the OS during disk formatting, so the block size B is fixed for each system. • A block is the unit of space read/written by the disk head. Typical block sizes range from 512 bytes to 4096 bytes. A whole block is transferred between disk and main memory for processing. (Typical Unix systems: 2 KB; mainframes: 4 KB; 1.44 MB floppy: 512 B.) A bigger block size means more data transferred per read/write; however, storage may also be wasted.

  5. Cont’ • In Oracle, the overhead to manage a block is approximately 90 bytes for a table and 161 bytes for an index (so roughly 80% of a block is available for data). • Blocks are separated by fixed-size interblock gaps. • A section of memory set aside to receive a block of data is called a buffer. Typically, a buffer is the same size as a block. Commercial systems use many buffers (typically 500-3000). • A read-write head moves to the track that contains the block to be transferred (seek time). Disk rotation then moves the block under the read-write head for reading or writing (rotational delay, or latency).

  6. Cont’ • Reading or writing a disk block is time-consuming because of the seek time s and rotational delay (latency) rd. • Thus, the access time (the average time required to access a single data record) is: Average_Seek_Time + Avg_Latency + Block_Transfer_Time, where Block_Transfer_Time = Block_Size / Transfer_Rate • Typical access times range from 10-60 milliseconds • Since seek time >> rotational delay >> block transfer time, it is common to transfer several consecutive blocks on the same track or cylinder (to avoid repeating the seek time and rotational delay for each block)
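The access-time formula above can be sketched in Python. The disk parameters below are illustrative assumptions, not values from the slides:

```python
# Access time = average seek + average rotational latency + block transfer time.
# All parameter values here are hypothetical, for illustration only.
avg_seek_ms = 8.0                      # average seek time, ms (assumed)
rpm = 7200                             # spindle speed (assumed)
avg_latency_ms = (60_000 / rpm) / 2    # half a rotation, in ms
block_size = 4096                      # bytes per block (assumed)
transfer_rate = 100 * 10**6            # bytes/second (assumed)

btt_ms = block_size / transfer_rate * 1000   # block transfer time, ms
access_ms = avg_seek_ms + avg_latency_ms + btt_ms
print(round(access_ms, 3))   # seek dominates latency, which dominates transfer
```

Note how the transfer term is tiny compared to the seek and latency terms, which is exactly why transferring several consecutive blocks per seek pays off.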

  7. Cont’ • Locating data on a disk is a major bottleneck in database applications. • A physical disk block address consists of a surface number, track number (within the surface), and block number (within the track).

  8. Cont’ • Mainframe systems call each disk unit a volume, and each volume has a volume table of contents (VTOC). • In MS-DOS, file allocation information is stored in a file allocation table (FAT). • Typically, relative addressing is used: each record has a relative record number, with the first record at 0 (or at a certain displacement), and so on. • To speed up processing, double buffering can be used: once one buffer is full, the CPU can start processing it; at the same time, the I/O processor can be reading and transferring the next block into a different buffer.

  9. Files of Records • A file is a sequence of records, where each record is a collection of data values (or data items). • A file descriptor (or file header) includes information that describes the file, such as field names and their data types, and the addresses of the file blocks on disk. • Records are stored in disk blocks. The blocking factor bfr for a file is the average number of records stored in a disk block.

  10. Cont’ • A file can have fixed-length records or variable-length records. • File records can be unspanned (no record can span two blocks) or spanned (a record can be stored in more than one block). • In a file of fixed-length records, all records have the same format. Usually, unspanned blocking is used with such files. • Files of variable-length records require additional information to be stored in each record, such as separator characters and field types. Usually, spanned blocking is used with such files. • The physical disk blocks allocated to hold the records of a file can be contiguous, linked, or indexed.

  11. Typical Operations on Files OPEN, FIND, FIND NEXT, READ, INSERT, DELETE, MODIFY, CLOSE, REORGANIZE, READ_ORDERED

  12. Primary File Organization • Unordered Files • Ordered Files • Hashed Files

  13. Unordered Files • Also called a heap or a pile file, new records are inserted at the end of the file. • To search for a record, a linear search through the file records is necessary. This requires reading and searching half the file blocks on the average, and is hence quite expensive. • Record insertion is quite efficient. • Reading the records in order of a particular field requires sorting the file records.
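The half-the-blocks-on-average cost of a heap-file search can be illustrated with a toy model, where a "file" is a list of in-memory blocks and each block read counts as one disk access (all names here are illustrative):

```python
# Minimal sketch of a linear search over a heap (unordered) file.
def linear_search(blocks, key):
    """Scan blocks in order; return (record, number_of_block_accesses)."""
    accesses = 0
    for block in blocks:
        accesses += 1                    # one disk access per block read
        for record in block:
            if record["id"] == key:
                return record, accesses
    return None, accesses                # not found: every block was read

# A toy file: 5 blocks of 4 records each (ids 0..19, in insertion order).
blocks = [[{"id": b * 4 + i} for i in range(4)] for b in range(5)]

rec, cost = linear_search(blocks, 9)     # record 9 lives in the third block
print(cost)                              # → 3 block accesses
```

A search for a missing key costs all 5 block reads, and averaged over all present keys the cost is about half the file's blocks, matching the b/2 estimate above.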

  14. Ordered Files • Also called a sequential (ordered) file. • File records are kept sorted by the values of an ordering field. • Insertion is expensive; records must be inserted in the correct order. It is common to keep a separate unordered overflow (or transaction) file for new records to improve insertion efficiency; this is periodically merged with the main ordered file. • A binary search can be used to search for a record on its ordering field value. This requires reading and searching CEIL(log2 b) blocks, an improvement over linear search. • Reading the records in order of the ordering field is efficient.
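The ordering-field search can be sketched as a binary search over block anchors (the first key of each block), here using Python's stdlib `bisect`; the block layout and key values are illustrative:

```python
from bisect import bisect_right

# Sketch: an ordered file as sorted blocks; locate the one block whose
# key range covers the target, then scan only that block.
blocks = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]   # keys, in order
anchors = [b[0] for b in blocks]                      # first key per block

def find(key):
    i = bisect_right(anchors, key) - 1   # binary search on the anchors
    return key in blocks[i] if i >= 0 else False

print(find(50), find(55))   # 50 is present; 55 falls in the same block but is absent
```

Only one candidate block has to be read once the anchors have been binary-searched, which is where the CEIL(log2 b) block-access bound comes from.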

  15. Hashed (Direct) Files Static External Hashing • Hashing refers to the process of converting attribute values directly into addresses. • Hashing for disk files is called external hashing • The file blocks are divided into M equal-sized buckets, numbered bucket 0, bucket 1, ..., bucket M-1. Typically, a bucket corresponds to one (or a fixed number of) disk blocks. • One of the file fields is designated to be the hash key of the file.

  16. Cont’ • The record with hash key value K is stored in bucket i, where i = h(K) and h is the hash function. • Collisions occur when a new record hashes to a bucket that is already full. An overflow file is kept for storing such records. Overflow records that hash to the same bucket can be linked together. • To reduce the number of overflow records, a hash file is typically kept 70-80% full. If we expect to store r records in the file, we should choose M bucket addresses such that the load factor r/M is between 0.7 and 0.9.

  17. Cont’ • Typically, M is a prime number, since K MOD M then distributes hash values more uniformly. • The hash function h should distribute the records uniformly among the buckets; otherwise, search time increases because many overflow records will exist. • Main advantage: search is efficient; in most cases a single block access retrieves the record (the fastest possible access, given the hash value)
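Static external hashing with `h(K) = K MOD M` and per-bucket overflow chaining can be sketched as follows (bucket capacity and key values are illustrative):

```python
# Sketch of static external hashing: M buckets (M prime), fixed bucket
# capacity, and an overflow chain per bucket for collisions.
M, CAPACITY = 7, 2                    # M prime; 2 records per bucket (assumed)
buckets = [[] for _ in range(M)]
overflow = [[] for _ in range(M)]     # per-bucket overflow chain

def insert(key):
    i = key % M                       # h(K) = K MOD M
    target = buckets[i] if len(buckets[i]) < CAPACITY else overflow[i]
    target.append(key)

def lookup(key):
    i = key % M                       # check the home bucket, then its chain
    return key in buckets[i] or key in overflow[i]

for k in (3, 10, 17, 24):             # all four hash to bucket 3
    insert(k)
print(lookup(17), lookup(24), len(overflow[3]))   # hits found via the chain
```

Keys 3 and 10 fit in bucket 3; 17 and 24 collide into its overflow chain, illustrating why a file kept only 70-80% full has far fewer such chains to follow.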

  18. Disadvantages • No ordering of records • No search on fields other than the hash field • A fixed number of buckets M is a problem if the number of records grows or shrinks

  19. Cont’ • Only OpenIngres and Oracle support hashing. • Oracle supports hashing in the form of table clustering (hash clusters)

  20. Index Structures for Files 1. Indexes as Access Paths 2. Types of Single-level Indexes: Primary Indexes, Clustering Indexes, Secondary Indexes 3. Multi-level Indexes 4. Using B-Trees and B+-Trees as Dynamic Multi-level Indexes 5. Indexes and Performance Optimization

  21. Indexes as Access Paths • A single-level index is an auxiliary file that makes it more efficient to search for a record in the data file. • The index is usually specified on one field of the file (it can be on multiple fields) - Oracle allows up to 16 attributes (or a max of 1000-2000 bytes) • One form of an index is a file of entries <field_value, pointer_to_record>, ordered by the field value so that a binary search can be used.

  22. Cont’ • The index is called an access path on the field. • Indexes speed up retrieval of data. - Most beneficial for read-intensive access to large files - For write/update-intensive (volatile) files, indexes add maintenance overhead

  23. Types of Indexes • Primary Index • Clustering Index • Secondary Index • Multi-Level Indexes

  24. Primary Index • Defined on an ordered data file; the index itself is a file of fixed-length records with 2 fields (an ordering key value + a pointer to a disk block) • The data file is physically ordered on a Primary Key or an ordering field • Includes one index entry (or index record) for each block in the data file (not every ordering field value appears as an index entry)

  25. Cont’ • The index entry has the ordering field value for the first record in the block, which is called the block anchor • A similar scheme can use the last record in a block

  26. Cont’ A primary index is an example of a nondense (sparse) index: • A nondense index has an entry for each disk block of the data file • A dense index has an entry for each record

  27. Cont’ • Major problems: insertion and deletion of records - Use either an unordered overflow file or a linked list of overflow records for each block in the data file.

  28. Example Suppose we have an ordered file EMP (Name, SSN, Address, Job, Sal, ...), record size R = 100 bytes (fixed and unspanned), block size B = 2048 bytes, and # records r = 30,000. Blocking factor Bfr = FLOOR(B/R) = FLOOR(2048/100) = 20 records/block FLOOR(X) truncates to the next lower integer, e.g., FLOOR(20.48) returns 20. CEIL(X) rounds up to the next higher integer, e.g., CEIL(5.6) returns 6 and CEIL(12.01) returns 13. The number of file blocks b = CEIL(r/Bfr) = CEIL(30000/20) = 1500 blocks
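The EMP example's numbers can be reproduced directly with Python's stdlib `math.floor` and `math.ceil`:

```python
from math import ceil, floor

# Recompute the slide's EMP example: R = 100 bytes, B = 2048 bytes, r = 30,000.
R, B, r = 100, 2048, 30_000
bfr = floor(B / R)     # records per block: only whole records fit (unspanned)
b = ceil(r / bfr)      # blocks needed: a partially filled last block still counts
print(bfr, b)          # → 20 1500
```

FLOOR applies to the blocking factor (a fraction of a record cannot be stored) while CEIL applies to the block count (a fraction of a block must still be allocated).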

  29. Cont’ (a) No index; use linear search: an average cost of b/2 = 1500/2 = 750 block accesses (b) No index; use binary search: a binary search on the data file would need CEIL(log2 b) = CEIL(log2 1500) = 11 block accesses

  30. Cont’ (c) Index; use binary search: Suppose Length(SSN) = 9 bytes and a block pointer P = 6 bytes Index entry size IE_Size = (9 + 6) = 15 bytes The Bfr for the index is FLOOR(B/IE_Size) = FLOOR(2048/15) = FLOOR(136.53) = 136 entries/block The total # of index entries = 1500 (one per data block) The number of index blocks = CEIL(1500/136) = 12 blocks A binary search on the index file would need CEIL(log2 12) = 4 block accesses Retrieving the record via the index then needs one additional block access, so a total of 4 + 1 = 5 block accesses is needed
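Case (c) can likewise be recomputed step by step:

```python
from math import ceil, floor, log2

# Recompute slide 30: a primary index of <SSN (9 bytes), block pointer (6 bytes)>
# entries over the 1500-block EMP file, searched by binary search.
B, entry_size, data_blocks = 2048, 9 + 6, 1500
index_bfr = floor(B / entry_size)              # index entries per index block
index_blocks = ceil(data_blocks / index_bfr)   # one entry per data block (sparse)
accesses = ceil(log2(index_blocks)) + 1        # index binary search + 1 data block
print(index_bfr, index_blocks, accesses)       # → 136 12 5
```

Because each 15-byte index entry is far smaller than a 100-byte record, the index spans only 12 blocks versus the file's 1500, which is the entire source of the 750-vs-11-vs-5 gap summarized on the next slide.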

  31. Cont’ • Comparison: (Linear_Search: No_Index_&_BS : Index_&_BS) = (750 : 11 : 5) • BS = Binary Search

  32. Clustering Index • Defined on a data file physically ordered on a non-key field that does not have a distinct value for each record. • A relation can have at most one clustering index. • The field is called the clustering field. • Includes one index entry (index record) for each distinct value of the field • The index entry points to the first data block that contains records with that indexed field value

  33. Secondary Index • Defined on an unordered data file • Can be defined on a key field or a non-key field • Includes one index entry for each record in the data file: dense index. • A block anchor cannot be used in the secondary index because the records of the data file are not physically ordered by values of the secondary key field.

  34. Cont’ • A secondary index on a key field is a dense index (one entry per record) • A secondary index on a non-key field can be dense or nondense • A secondary index provides a logical ordering on the records by the indexing field. • There can be many secondary indexes for a data file • A secondary index is also called an inverted index

  35. Multi-Level Indexes Because a single-level index is an ordered file, we can create a primary index on the index itself. • We call the original index the first-level index and the index on the index the second-level index • We can repeat this process, creating a 3rd, 4th, ..., top level, until all entries of the top level fit in one disk block

  36. Cont’ • A multi-level index can be created for any type of first-level index (primary, clustering, secondary) as long as the first-level index consists of more than one disk block • Such a multi-level index is a form of search tree; however, insertion and deletion of new index entries is a severe problem because every level of the index is an ordered file • Hence, most multi-level indexes use B-tree or B+ tree data structures, which leave space in each tree node (disk block) to allow for new index entries

  37. Example of Multi-level Tree Index a) A table with 1 M records; index value: 8 bytes; pointer to row ID: 6 bytes; so 14 bytes per index entry b) 1 block = 2 KB. Assume the block header overhead is 150 bytes and the fill factor is 0.80. Usable block space = (block size - block header overhead) * fill_factor = (2 KB - 150) * 0.80 = (2048 - 150) * 0.8 = 1518 bytes Index blocking factor BF = FLOOR(1518/14) = 108 index entries/block

  38. Cont’ c) # disk blocks necessary for storing the 1 M index entries: CEIL(1,000,000/108) = 9,260 blocks Therefore, we have 9,260 blocks at the leaf level, and 9,260 pointers to them at the directory level above the leaf level

  39. Cont’ d) The space for directory entries at a higher level: CEIL (9,260/108) = 86 blocks e) Now we can create a root of the index tree with a single block. Thus, the index tree for this example consists of 3 levels: Root: 1 block 2nd level: 86 blocks 3rd level: 9,260 blocks

  40. Cont’ f) Finding an index entry needs only 3 block accesses in this example, since log108(1,000,000) < 3 This is more efficient than a binary search on the data file: log2(1,000,000) ≈ 20 block accesses In general, for a tree of depth K with blocking factor BF, the total # of leaf-level entries = BF^K The max # of records for a 3-level index tree with BF = 108 is 108^3 = 1,259,712 records

  41. Cont’ Access to the data requires (# accesses to the index) + (1 additional disk access to the data file) = (3 disk accesses to the index) + (1 to the data file) = 4 disk accesses. This means over 1 M records can be searched with 4 disk accesses. If we assume 1 disk access = 0.025 sec, then 4 disk accesses = 0.1 sec, plus a few in-memory calculations ≈ 0.1 seconds. Note that we used a simplified method for estimating the access time
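The whole multi-level example (slides 37-41) can be recomputed by repeatedly dividing by the blocking factor until a single root block remains:

```python
from math import ceil, floor

# Recompute slides 37-41: levels of a multi-level index over 1 M entries.
n, block, header, fill, entry = 1_000_000, 2048, 150, 0.80, 14
bf = floor((block - header) * fill / entry)   # index blocking factor (108)

levels, blocks = [], ceil(n / bf)             # leaf level first
while True:
    levels.append(blocks)
    if blocks == 1:                           # a single block is the root
        break
    blocks = ceil(blocks / bf)                # each level indexes the one below

accesses = len(levels) + 1                    # one read per level + 1 data block
print(levels[::-1], accesses)                 # → [1, 86, 9260] 4
```

Each additional level multiplies the reachable entries by the blocking factor, which is why three index levels plus one data access cover more than a million records.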

  42. B Trees and B+ Trees as Dynamic Multi-level Indexes A tree data structure • A node in a search tree of order P: - Each node contains at most P-1 search values and P pointers - Two constraints: 1. Within each node, K1 < K2 < ... < Kq-1 2. For each subtree pointer, all search values in that subtree are larger than the search value to its left and smaller than the search value to its right - Example: a search tree of order P = 3

  43. Cont’ • A tree is balanced if all of its leaf nodes are at the same level - Ensures that no nodes will be at very high levels (many block accesses) - Deletion may leave some nodes nearly empty (storage waste) • B Tree and B+ Tree structures are variations of search trees that allow efficient insertion and deletion of new search values

  44. Cont’ • In B Tree and B+ Tree structures, each node corresponds to a disk block (or a disk block + offset) •  Each node is kept between half-full and completely full •   An insertion into a node that is not full is quite efficient; if a node is full, the insertion causes a split into two nodes

  45. Cont’ • Splitting may propagate to other tree levels • A deletion is quite efficient if a node does not become less than half full •  If a deletion causes a node to become less than half full, it must be merged with neighboring nodes

  46. B tree: • B trees and their variations were designed to reside on disk, partially memory resident (e.g., the root node) • A B tree may contain an entry for every record • In a B tree, pointers to data records exist at all levels of the tree • The B tree insertion and deletion algorithms guarantee that the tree always remains balanced (a performance vs. complexity trade-off).

  47. B+ tree: • In a B+ tree, pointers to data records exist only at the leaf-level nodes • The leaf nodes have an entry for every search value, with a pointer to the record • The leaf nodes of a B+ tree are usually linked together to provide ordered access on the search field • Some search-field values are repeated in the internal nodes, which causes duplicate key values and more maintenance cost • Faster sequential processing
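The linked leaf level that gives B+ trees their fast sequential processing can be sketched in miniature (this is only the leaf chain, not a full B+ tree with internal nodes or splitting; the keys are illustrative):

```python
# Minimal sketch: B+ tree leaf nodes hold sorted keys and are linked
# left-to-right, so a range scan walks the chain instead of re-descending.
class Leaf:
    def __init__(self, keys):
        self.keys = keys
        self.next = None          # link to the right sibling leaf

l1, l2, l3 = Leaf([5, 10]), Leaf([15, 20]), Leaf([25, 30])
l1.next, l2.next = l2, l3         # chain the leaves in key order

def range_scan(start_leaf, lo, hi):
    out, leaf = [], start_leaf
    while leaf:
        out += [k for k in leaf.keys if lo <= k <= hi]
        leaf = leaf.next          # follow the sibling pointer
    return out

print(range_scan(l1, 10, 25))     # ordered result, no tree re-traversal
```

In a real B+ tree the scan would first descend from the root to the leaf containing `lo`, then follow sibling pointers exactly as above until a key exceeds `hi`.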

  48. B* tree: • A B+ tree whose nodes are kept at least two-thirds full • Increases storage utilization and speeds up search • Increases update cost (full nodes occur more frequently) • Used in Oracle

  49. Discussion: • Most systems allow users to choose a fill factor between 0.5 and 1 • The PK constraint is enforced by creating an index on the primary key • A fully inverted file is a file with a secondary index on every field • The most popular index organization is B trees or B+ trees • An ordered file with a multi-level primary index on its ordering key field is called an Indexed Sequential File

  50. Cont’ • IBM's ISAM (Indexed Sequential Access Method) has a 2-level index: - Cylinder index (the key value of an anchor record for each cylinder of a disk pack and a pointer to the track index) - Track index (the key value of an anchor record for each track and a pointer to the track) • IBM's VSAM (Virtual Storage Access Method) is similar to B+ tree structure
