Loading in 2 Seconds...
Loading in 2 Seconds...
SQL Unit 19: Data Management: Databases and Organizations Richard Watson. Summary of Selections from Chapter 11 prepared by Kirk Scott. Outline of Topics. Relationship between O/S and dbms Indexing Hashing File organization and access Joining B+ trees.
Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.
Summary of Selections from Chapter 11 prepared by Kirk Scott
If the dbms and O/S are integrated, the db administrator may have the ability to specify physical storage characteristics of tables:
Indexes are one of the fundamental structures used to provide access to data
In simplest terms, the index can be visualized as a two column look-up table
Therefore, for any look-up key, there may be more than one corresponding record/address
In general, it is possible to have more than one index on a table, on different fields
In reality, an index is not typically implemented as a simple look-up table
The reality is that what you want is not a RRN or an address—what you want is the page that the desired record would be on
The term inter-file clustering refers to storing records from related tables in order
The classic hashing algorithm, which is relatively easy to illustrate, is division-remainder hashing
The idea is that for key values of integer form which are larger (or smaller) than p, you can do integer division by p
The table on the next overhead shows the results of hashing for a given set of values
The address space and the placement of records after the first six hashes in the table above is illustrated in the table on the following overhead
Note that in order to be practical, there have to be limitations on collision handling in this way
If such limitations didn’t exist, existence queries or incorrect key value input wouldn’t be practical
Notice that choosing hashed file organization is a specialized option that would be available to a database administrator
Mother 1, child a, child b, mother 2, child c, mother 3, child d, child e, child f, …
Even though progression through indexes is linear, the retrieval of records from the tables may not follow this pattern
A preliminary picture of main memory and secondary storage is given on the next overhead
This scheme only makes sense if the size of the tables is such that there would be significant I/O overhead
The term “bucket” refers to one collection of things (table records) that hash to the same value
Hashing now is being used to group together things that have something in common, not to map individual items into an address space
Under the earlier explanation, if you had m items, ideally they would hash to m contiguous values
4. The parameters that have to be tuned in order for this to work involve the size of the hash space (i.e., the number of different buckets) relative to the sizes of the tables to be joined.
During phase 2, all of the pages for corresponding buckets of tables A and B have to fit in memory at the same time
Use a memory resident algorithm to form the join of the bucket contents
Suppose bucket x of Table A consisted of p pages and bucket x of Table B consisted of q pages
Performance: Access to any record (page) is bounded by logn (number of records in file) where n = the number of records per tree (index) node
3. The balance of the tree is desirable because it places an upper bound on the number of pages that have to be read in order to get any value.
4. If + is included in the name of the data structure, this signifies that in addition to providing indexed access to file records, links are provided which allow the records to be accessed in sequential order without traversing the index tree.
In the discussion that follows, the examples will consist of a tree that only contains index values, not records
This field may not be a key field in the table, but in general, when indexing, the field that is being indexed on can be referred to as the key.
In the tree as given, which is pure index, the top two rows form the index set.
From the sequence set it is possible to point to the pages containing the actual table records containing those key values.
Observe that in this example, each index node can contain up to n = 4 pointers, and it can contain up to n – 1 = 3 key values.
That means that the total number of key values possible in the sequence set would be 4 * 4 = 16.
Another thing becomes apparent about B+ trees from looking at the example.
It is also apparent that the index set is sparse while the sequence set is dense.
It will become evident when looking at the rules for inserting values how this situation comes about.
If the index were on a non-unique field, the difference would show up only in the sequence set.
The kinds of test questions you should be able to answer about B+ trees would be like the assignment questions.
In order to maintain balance in a tree, it’s necessary to work from the leaves up.
Enforcing the requirements on the fullness of nodes leads to either splitting or merging.
3. If the destination leaf node is full, split it into 2 nodes and divide the key values evenly between them.
5. When a node is split, the two resulting nodes remain at the same level in the tree and become siblings.
7. In general, when a node is split, the leftmost value in the new right sibling is promoted to the parent.
When a value is inserted into a sequence set node and is promoted from there into the index set, what is promoted is a copy of that value.
9. When you split a child and promote, if the promotion causes a split in the parent, you end up with the following situation:
In other words, when the parent is split, 2 new pointers arise when the number of children only rises by one.
10. If the splitting and promoting process trickles all of the way back up to the root and the root is split, then a new root node is created.
This growth at the root explains why balance is maintained in the tree and no branches become longer than any others.
In developing rules of thumb for this there is another consideration with deletion that leads to more complication than with insertion.
If, however, the sibling to be merged with is over half full, merging alone would not result in the loss of a node.
Just as with splitting, merging can trickle all of the way back up to the root.
Finally, a simple deletion from the sequence set which does not even cause a merge can have an effect on the index set.
Here is one final note of explanation that is directly related to the examples given.
Take the case where an index set node contains 3 key values and 4 pointers for example
If a sequence set node falls to one key value, then it only has one pointer in it, the pointer to the record.
3. If the deletion causes the node to drop below half full, merge it with a sibling, taking the sibling immediately on the right if there is one.
4. If the total number of values merged together can fit into a single node, then leave them in a single node and adjust the values and the pointers in the parent accordingly.
5. If the total number of values merged together still have to be put into two nodes, then redistribute the values evenly between the two nodes and adjust the values and the pointers in the parent accordingly.
6. Now check the parent to see whether due to the adjustments it has fallen below half full.
7. If the parent is no longer half full, repeat the process described above, and merge at the parent level.
8. Deletions can be roughly grouped into four categories with corresponding concerns.
8.3. A deletion which causes a redistribution of values between nodes:
9. If the merging process trickles all of the way back up to the root and the children of the current root are merged into one node, then the current root is replaced with this new node.
12.4 For each B+-tree of Exercise 12.3, show the form of the tree after each of the following series of operations: