Data Management: Databases and Organizations, Richard Watson
Summary of Selections from Chapter 11 prepared by Kirk Scott

Outline of Topics:
- Relationship between the O/S and the DBMS
- Indexing
- Hashing
- File organization and access
- Joining
- B+ trees

Relationship between the O/S and the DBMS
If the DBMS and the O/S are integrated, the database administrator may have the ability to specify the physical storage characteristics of tables.
Indexes are one of the fundamental structures used to provide access to data
In the simplest terms, an index can be visualized as a two-column look-up table
Therefore, for any look-up key, there may be more than one corresponding record/address
In general, it is possible to have more than one index on a table, on different fields
In reality, an index is not typically implemented as a simple look-up table
In practice, what you want is not an RRN or an address; what you want is the page that the desired record would be on
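The look-up-table view described above can be sketched as follows. This is an illustrative sketch, not the book's implementation; the record layout, page size, and function names are assumptions. It maps each key value to the page numbers its records fall on, and (per the note above) a key may map to more than one page.

```python
# Illustrative sketch of an index as a two-column look-up table:
# key -> list of page numbers.  Names and layout are assumptions.
from collections import defaultdict

def build_index(records, key_field, records_per_page=4):
    """records: list of dicts in arrival order.  Returns a mapping from
    key value to the sorted list of pages the key's records fall on."""
    index = defaultdict(set)
    for rrn, record in enumerate(records):
        page = rrn // records_per_page   # the page this record would be on
        index[record[key_field]].add(page)
    return {k: sorted(pages) for k, pages in index.items()}
```

Because a key can occur in many records, the "address" column here is a list, which matches the observation that a look-up key may have more than one corresponding record.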
The term inter-file clustering refers to storing records from related tables interleaved in order, so that related records end up physically close together
The classic hashing algorithm, which is relatively easy to illustrate, is division-remainder hashing
The idea is that for key values in integer form, you do integer division by p; the remainder falls in the range 0 to p - 1 and serves as the storage address
The table on the next overhead shows the results of hashing for a given set of values
The address space and the placement of records after the first six hashes in the table above are illustrated in the table on the following overhead
Note that the possibility of collisions imposes a restriction on this scheme
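Division-remainder hashing and the collision problem it creates can be sketched as follows. The chapter's collision-resolution scheme is not reproduced here, so linear probing is used purely as an illustrative assumption.

```python
# Division-remainder hashing: address = key mod p, with p the size of
# the address space.  Collisions are resolved here by linear probing,
# which is one common choice and an assumption, not the text's scheme.

def hash_address(key, p):
    return key % p                     # remainder in 0 .. p - 1

def insert(table, key, p):
    """table: list of length p, with None marking an empty slot.
    Returns the slot where key was placed."""
    addr = hash_address(key, p)
    for step in range(p):
        slot = (addr + step) % p       # probe forward on collision
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("address space full")
```

For example, with p = 7 the keys 10 and 17 both hash to address 3; the second arrival collides and is placed in the next free slot.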
Notice that choosing hashed file organization is a specialized option that would be available to a database administrator
In theory, the cost of joining would be manageable under these conditions:
A nested-loop join would be the naïve option for files in arrival order with no indexes on the joining fields
If both tables are indexed on their joining fields, then merge join becomes possible on the indexes
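The two strategies can be sketched side by side. This is a minimal illustration, assuming each input is a list of (join key, payload) pairs, and, for the merge join, that both inputs arrive sorted on the join key (as they effectively would when read through indexes on the joining fields).

```python
# Illustrative sketches of the two join strategies; input format and
# the unique-key simplification in merge_join are assumptions.

def nested_loop_join(r, s):
    """Naive option: compare every pair of records."""
    return [(rk, rv, sv) for rk, rv in r for sk, sv in s if rk == sk]

def merge_join(r, s):
    """Assumes r and s are sorted on the join key, with unique keys
    in each input for brevity; advances two cursors in step."""
    out, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        if r[i][0] < s[j][0]:
            i += 1
        elif r[i][0] > s[j][0]:
            j += 1
        else:
            out.append((r[i][0], r[i][1], s[j][1]))
            i += 1
            j += 1
    return out
```

The nested loop makes |r| * |s| comparisons, while the merge join makes a single pass over each sorted input, which is why indexed access on the joining fields makes it attractive.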
This scheme only makes sense if the size of the tables is such that there would be significant I/O overhead
There is a significant difference in the use of hashing here compared to the earlier explanation
The term “bucket” refers to one collection of things (table records) that hash to the same value
What follows is a numbered list of assumptions needed in order to explain how hash join works:
4. The parameters that have to be tuned in order for this to work involve the size of the hash space (i.e., the number of different buckets) relative to the sizes of the tables to be joined.
For phase 1, this means that the number of buckets total can’t exceed the size of memory in pages allocated to the process
If all of the necessary conditions are met, then phase 1 of the hash join algorithm can be described as follows:
Observe that it doesn’t matter if the memory-resident algorithm is something inefficient like O(n²)
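The two phases can be sketched as follows; the bucket count and helper names are illustrative assumptions, not the chapter's notation. Phase 1 partitions both tables into buckets by hashing the joining field; phase 2 joins each matching pair of buckets in memory, since rows that agree on the join key necessarily land in the same bucket.

```python
# Sketch of a two-phase (partition) hash join.  Inputs are lists of
# (join key, payload) pairs; the bucket count is a tuning parameter,
# as noted above, and must be small enough to fit the allocated pages.

def partition(table, n_buckets):
    """Phase 1: hash each record's join key into a bucket."""
    buckets = [[] for _ in range(n_buckets)]
    for key, payload in table:
        buckets[hash(key) % n_buckets].append((key, payload))
    return buckets

def hash_join(r, s, n_buckets=8):
    r_parts, s_parts = partition(r, n_buckets), partition(s, n_buckets)
    out = []
    for rb, sb in zip(r_parts, s_parts):   # only matching buckets can join
        lookup = {}
        for key, payload in rb:            # phase 2: in-memory hash table
            lookup.setdefault(key, []).append(payload)
        for key, sv in sb:                 # probe with the other bucket
            for rv in lookup.get(key, []):
                out.append((key, rv, sv))
    return out
```

The in-memory step touches only one bucket pair at a time, which is why its asymptotic cost matters little: each bucket is small relative to the full tables.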
3. The balance of the tree is desirable because it places an upper bound on the number of pages that have to be read in order to get any value.
This field may not be a key field in the table, but in general, when indexing, the field that is being indexed on can be referred to as the key.
In the tree as given, which is a pure index, the top two rows form the index set.
From the sequence set it is possible to point to the pages containing the actual table records containing those key values.
Observe that in this example, each index node can contain up to n = 4 pointers, and it can contain up to n – 1 = 3 key values.
That means that the total number of key values possible in the sequence set would be 4 * 4 = 16.
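The arithmetic behind counts like this can be expressed generally. The figure itself is not reproduced here, so the helper below is a hedged sketch assuming each index node holds up to n pointers: with fanout n = 4 and an index set two rows deep, 4 * 4 = 16 positions are reachable below the index set.

```python
# Hedged sketch: maximum fan-out of a B+ tree index set, assuming each
# index node holds up to n pointers (and n - 1 key values).

def max_sequence_positions(n, index_levels):
    """Maximum number of positions reachable below index_levels rows
    of index nodes, each with fanout n."""
    return n ** index_levels
```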
Another thing becomes apparent about B+ trees from looking at the example.
It is also apparent that the index set is sparse while the sequence set is dense.
It will become evident when looking at the rules for inserting values how this situation comes about.
The kinds of test questions you should be able to answer about B+ trees would be like the assignment questions.
In order to maintain balance in a tree, it’s necessary to work from the leaves up.
Enforcing the requirements on the fullness of nodes leads to either splitting or merging.
4. Notice that in all of the examples the nodes hold an odd number of values.
5. When a node is split, the two resulting nodes remain at the same level in the tree and become siblings.
7. In general, when a node is split, the leftmost value in the new right sibling is promoted to the parent.
When a value is inserted into a sequence set node and is promoted from there into the index set, what is promoted is a copy of that value.
9. When you split and promote, if the promotion causes another split in the parent, you end up with the following situation:
In other words, when the parent is split, 2 new pointers arise when the number of children only rises by one.
10. If the splitting and promoting process trickles all of the way back up to the root and the root is split, then a new root node is created.
This growth at the root explains why balance is maintained in the tree and no branches become longer than any others.
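The split-and-promote rules above can be sketched in code. This is a minimal, illustrative B+ tree insertion, not the book's implementation, assuming order n = 4 (up to 4 pointers and 3 key values per node). It promotes a copy of the leftmost value when a sequence-set node splits, moves the middle key up when an index node splits, and grows a new root when the splitting trickles all the way up, which is how balance is maintained.

```python
# Minimal B+ tree insertion sketch: split-and-promote, growth at root.
# Structure and names are illustrative assumptions.
import bisect

ORDER = 4                     # max pointers per node; max ORDER - 1 keys

class Node:
    def __init__(self, leaf=True):
        self.leaf = leaf
        self.keys = []
        self.children = []    # child nodes (index nodes only)
        self.next = None      # sequence-set sibling link (leaves only)

class BPlusTree:
    def __init__(self):
        self.root = Node(leaf=True)

    def insert(self, key):
        root = self.root
        split = self._insert(root, key)
        if split:                         # root split: create a new root
            mid_key, right = split
            new_root = Node(leaf=False)
            new_root.keys = [mid_key]
            new_root.children = [root, right]
            self.root = new_root

    def _insert(self, node, key):
        if node.leaf:
            bisect.insort(node.keys, key)
        else:
            i = bisect.bisect_right(node.keys, key)
            split = self._insert(node.children[i], key)
            if split:                     # child split: absorb promoted key
                mid_key, right = split
                node.keys.insert(i, mid_key)
                node.children.insert(i + 1, right)
        if len(node.keys) <= ORDER - 1:   # node still fits
            return None
        mid = len(node.keys) // 2         # overfull: split this node
        right = Node(leaf=node.leaf)
        if node.leaf:                     # leaf: promote a COPY upward
            right.keys = node.keys[mid:]
            node.keys = node.keys[:mid]
            right.next, node.next = node.next, right
            return right.keys[0], right
        promoted = node.keys[mid]         # index node: middle key moves up
        right.keys = node.keys[mid + 1:]
        right.children = node.children[mid + 1:]
        node.keys = node.keys[:mid]
        node.children = node.children[:mid + 1]
        return promoted, right
```

Inserting 1 through 10 in order forces several leaf splits and one root split, after which all leaves sit at the same depth and the sequence set can be scanned left to right through the sibling links.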
In developing rules of thumb for this, there is another consideration with deletion that makes it more complicated than insertion.
If, however, the sibling to be merged with is over half full, merging alone would not result in the loss of a node.
Just as with splitting, merging can trickle all of the way back up to the root.
Finally, a simple deletion from the sequence set which does not even cause a merge can have an effect on the index set.
Here is one final note of explanation that is directly related to the examples given.
Take, for example, the case where an index-set node contains 3 key values and 4 pointers.
3. If the deletion causes the node to drop below half full, merge it with a sibling, taking the sibling immediately on the right if there is one.
4. If the total number of values merged together can fit into a single node, then leave them in a single node and adjust the values and the pointers in the parent accordingly.
5. If the total number of values merged together still have to be put into two nodes, then redistribute the values evenly between the two nodes and adjust the values and the pointers in the parent accordingly.
6. Now check the parent to see whether due to the adjustments it has fallen below half full.
7. If the parent has fallen below half full, repeat the process described above and merge at the parent level.
8. Deletions can be roughly grouped into four categories with corresponding concerns.
8.3. A deletion which causes a redistribution of values between nodes:
9. If the merging process trickles all of the way back up to the root and the children of the current root are merged into one node, then the current root is replaced with this new node.
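The merge-or-redistribute decision in steps 3 through 5 can be sketched for a pair of sibling sequence-set nodes. The capacity constants, the "half full" threshold, and the list representation are assumptions made purely for illustration.

```python
# Sketch of the merge-or-redistribute decision (steps 3-5 above) for
# two sibling sequence-set nodes represented as sorted lists of keys.
# MAX_KEYS is the node capacity (n - 1); the half-full threshold below
# is an assumption about the text's convention.

MAX_KEYS = 3
MIN_KEYS = (MAX_KEYS + 1) // 2        # fewer than this = below half full

def delete_and_rebalance(node, right_sibling, key):
    """Delete key from node; merge with or redistribute against the
    right sibling if node falls below half full.  Returns the pair
    (node, right_sibling), with right_sibling None after a merge."""
    node.remove(key)
    if len(node) >= MIN_KEYS:         # still at least half full: done
        return node, right_sibling
    combined = sorted(node + right_sibling)
    if len(combined) <= MAX_KEYS:     # step 4: everything fits in one node
        return combined, None         # sibling disappears; parent adjusts
    mid = len(combined) // 2          # step 5: redistribute evenly
    return combined[:mid], combined[mid:]
```

In the first case the sibling node is lost and the parent loses a pointer, which is what lets the merging trickle upward; in the second case only the parent's separating key value needs adjusting.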
12.4 For each B+-tree of Exercise 12.3, show the form of the tree after each of the following series of operations: