Chapter 8. Storage Management and Indexing Techniques

Presentation Transcript

  1. Chapter 8. Storage Management and Indexing Techniques • Seoul National University • Department of Computer Engineering • OOPSLA Lab.

  2. Table of Contents • Storage Techniques for Relational DBMS • Storage Techniques for Objects • Clustering Techniques • Indexing Techniques for OODBMS • Object Identifiers • Swizzling

  3. Storage Techniques for Relational DBMS • Disk Organization • Storing Records in RDBMS • Addressing Records with a Slot Vector

  4. Disk Organization • Disk → partitions → segments → pages/blocks • Disk header • # of partitions • the address and the size of each partition • log for recovery in case of a system crash • Page addresses for each segment are stored in tables • Page = page header + offsets of objects + objects

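  5. [Figure: disk layout. The disk holds a log and partitions 1 to n; each partition has a header and segments 1 to m; each segment holds pages; each page has a header, a total free space field, an adjacent free space field, and an array of offsets to the objects stored in it.]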

  6. Storing Records in RDBMS • Fixed length records • normally stored contiguously on the disk • all the records of a relation can be stored in a single file • Variable length records • stored directly on the disk with an ID • the structure of the ID strongly affects retrieval speed • Structure of ID in System R • high order bits for the segment and the page of the file • low order bits for a record within a page
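
As a rough illustration of an ID split into high-order and low-order bits, the sketch below packs a page number and a slot number into one integer. The 8-bit slot width and the names `pack_rid` and `unpack_rid` are assumptions made for this example; the slide does not give System R's actual bit layout, and the segment bits are left out for brevity.

```python
# Hypothetical width: the slide does not state how many bits System R uses.
SLOT_BITS = 8
SLOT_MASK = (1 << SLOT_BITS) - 1


def pack_rid(page_no, slot_no):
    """Put the page number in the high-order bits and the slot in the low-order bits."""
    if not 0 <= slot_no <= SLOT_MASK:
        raise ValueError("slot number does not fit in the low-order bits")
    if page_no < 0:
        raise ValueError("page number must be non-negative")
    return (page_no << SLOT_BITS) | slot_no


def unpack_rid(rid):
    """Split a record ID back into (page number, slot number)."""
    return rid >> SLOT_BITS, rid & SLOT_MASK


rid = pack_rid(5, 3)
print(rid, unpack_rid(rid))
```

Because the page number can be read straight out of the ID, locating the page needs no lookup table; only the position inside the page still has to be resolved.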

  7. Addressing Records with a Slot Vector [Figure labels: SLOT, RECORD] • Advantages • as fast as using the complete address of a record • the length of records can be changed • the records can be relocated • often faster than using the purely logical ID
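
The advantages listed above can be seen in a minimal in-memory sketch of a page with a slot vector. The class `SlottedPage` and its methods are hypothetical names, and the sketch ignores page size limits and free-space reclamation; it only shows that a record keeps its slot number when its length changes or when it is moved inside the page.

```python
class SlottedPage:
    """Toy page: a byte area plus a slot vector of (offset, length) entries."""

    def __init__(self):
        self.data = bytearray()   # record bytes
        self.slots = []           # slot number -> (offset, length)

    def insert(self, record):
        """Append the record and return its slot number (the stable part of its ID)."""
        self.slots.append((len(self.data), len(record)))
        self.data += record
        return len(self.slots) - 1

    def read(self, slot):
        offset, length = self.slots[slot]
        return bytes(self.data[offset:offset + length])

    def update(self, slot, record):
        """Relocate the record inside the page; only the slot entry changes.

        The old bytes are simply left behind; a real page would reclaim them.
        """
        self.slots[slot] = (len(self.data), len(record))
        self.data += record


page = SlottedPage()
first = page.insert(b"abc")
second = page.insert(b"de")
page.update(first, b"a longer record")   # same slot, new offset and length
print(first, page.read(first), page.read(second))
```

External references hold only the page and the slot number, so none of them has to be rewritten when `update` moves the bytes.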

  8. Storage Techniques for Objects • Structure of Objects • Access Patterns to Objects • Approaches to Storage Organization for Objects • Storage and Variable Length and Large Attributes • Storage and Inheritance Hierarchy

  9. Structure of objects • Storage/memory organization must support • objects with both atomic and complex attributes • objects with multi-valued attributes • objects with variant attributes • objects with long field attributes such as multimedia information, texts, images, voice, etc • Efficiency of storage organization depends on • structure of objects and their relations • access pattern which is the way in which the application programs access the objects

  10. Categories of Access Patterns • Access based on the whole object • for applications which execute complex manipulations of objects by means of specialized programs • the whole object is copied into the application's memory • direct model • Access based on the attributes of the object • appropriate when large objects need to be accessed • used to retrieve attributes of objects along the aggregation hierarchy • normalized model

  11. Direct Model of Storage Organization(1) • Objects are stored in the same way in which they are defined in the conceptual schema • storage unit = semantic unit • objects of the same class are stored in the same file • Advantages • simplest approach, the same as the one used in RDBMS • transferring a whole object is very efficient • Disadvantages • accesses to a set of attributes of an object can be very expensive

  12. Direct Model of Storage Organization(2) • Situations where direct model is inefficient • variable length attribute • new attributes • the majority of attributes have the null value

  13. Normalized Model of Storage Organization • Decompose an object into atomic components • Each component is stored in a different file • Relations between the components are maintained by OIDs

  14. Intermediate Approach • Complex objects are decomposed • Components are grouped together according to access patterns to be stored in the same file • Problem • efficiency depends on having prior knowledge of the exact access pattern for applications

  15. Variable-length and Large Attributes • Normalized method • Property list method • Stream (or demand-page) mechanism • portions of the object can be transferred in increments

  16. Property List(1) • Sequence of triples < identifier, size, value > • identifier : which attribute of the object is stored • size : # of bytes stored • value : the value (of varying size) of the attribute

  17. Property List(2) • Advantages • variable length attributes • different set of attributes • sparse attributes • attributes can be stored in different physical locations • Disadvantages • whole property list scanning to find the desired attribute • transformation of the property list to the proper format for the application programming language
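
The triple layout and the scanning cost noted under Disadvantages can be sketched as follows. The field widths (a 2-byte identifier and a 4-byte size, little-endian) and the names `encode_property_list` and `find_attribute` are assumptions for illustration; the slides do not fix a byte layout.

```python
import struct

# Hypothetical header layout: 2-byte attribute identifier, 4-byte value size.
HEADER = struct.Struct("<HI")


def encode_property_list(attrs):
    """Serialize {identifier: value bytes} as a sequence of <identifier, size, value>."""
    out = bytearray()
    for attr_id, value in attrs.items():
        out += HEADER.pack(attr_id, len(value))
        out += value
    return bytes(out)


def find_attribute(plist, wanted_id):
    """Scan the list from the start until the wanted identifier is found."""
    pos = 0
    while pos < len(plist):
        attr_id, size = HEADER.unpack_from(plist, pos)
        pos += HEADER.size
        if attr_id == wanted_id:
            return plist[pos:pos + size]
        pos += size          # skip over a value we are not interested in
    return None              # attribute absent (sparse attributes cost nothing)


plist = encode_property_list({1: b"Jones", 7: b"\x00\x01"})
print(find_attribute(plist, 7), find_attribute(plist, 3))
```

Absent attributes take no space at all, which is what makes the format attractive for sparse or varying attribute sets, while `find_attribute` shows why lookups are linear in the length of the list.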

  18. Storage and Inheritance Hierarchy • Attributes of the superclass should be stored • Single inheritance • storing the attributes of the superclass first, then those of the subclass • variable length attributes alongside the property list • Multiple inheritance • property lists • storing objects separately • each of the above contains the fields for the superclass, and they are linked to one another

  19. Clustering Techniques • Clustering in DBMS • Clustering in RDBMS • Clustering in OODBMS • Static Clustering • Dynamic Clustering • Clustering for Multiple Relations

  20. Clustering in DBMS • Focus • partitioning objects in the database • placing these partitions on disk • Aim • reduce the number of I/O operations on disk • Consideration • structure of the objects • access pattern of applications

  21. Clustering Techniques for RDBMS • Tuples of a relation in the same page segment • on the basis of the value of an attribute or of a combination of attributes in a relation • Tuples of more than one relation in the same segment • one or more attributes in common with the same values • efficient for processing queries with join operation

  22. Clustering Techniques for OODBMS • New considerations compared with RDBMS • complex objects • single or multiple inheritance • methods • Linear clustering sequence for complex object • all the descendant nodes of each node p in the hierarchy are stored immediately after p in depth-first order • efficient on retrieval of an object and all its descendants
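
The linear clustering sequence described above is a pre-order depth-first walk of the complex object. The sketch below assumes the hierarchy is a tree given as a hypothetical `children` mapping; shared sub-objects, which the later slides raise as a problem, are not handled.

```python
def clustering_sequence(root, children):
    """Return the nodes in depth-first order, each node followed by all its descendants."""
    order = []

    def visit(node):
        order.append(node)
        for child in children.get(node, []):
            visit(child)

    visit(root)
    return order


# Hypothetical complex object: p has components a and b, and a has component c.
children = {"p": ["a", "b"], "a": ["c"]}
print(clustering_sequence("p", children))
```

Storing objects on disk in this order keeps an object and all of its descendants in one contiguous run, so retrieving the whole subtree touches adjacent pages.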

  23. Basic Options for Clustering for OODBMS • Proposed by Won Kim in 1990 • both clustering techniques used in RDBMS • clustering all the instances of classes which belong to an aggregation hierarchy • clustering all the instances of classes which belong to the inheritance hierarchy • combination of the two previous strategies • The clustering strategies above are static

  24. Static Clustering • Unchangeable at run-time • Problems • no consideration of the dynamic evolution of objects • objects can be shared among several objects • clustering scheme based on a single access pattern

  25. Dynamic Clustering • The sequence of creation of objects would NOT be the same as the desired clustering sequence. • Reorganizing and recompacting pages in a cluster • Types of file reorganization • on-line : finding the optimal one is an NP-complete problem • off-line : when should the reorganization be done? • On-line reorganization technique by Chen, Hurson • chunks (sets of pages) as the unit of clustering • cost model • ratio between the read and write operations

  26. Clustering for Multiple Relations • Certain relationships can be used more frequently • Directed graph • nodes for objects • arcs for relationships • weights for ordering relationships • Clustering algorithm with levels by Chen, Hurson • arranges all the nodes of the graph in a linear sequence • nodes connected by heavier arcs are placed nearer to each other than others • access time is around half that for randomly placed objects

  27. Indexing Techniques for OODBMS • Indexing Techniques for Aggregation Hierarchy • Index Structures and Operations • Comparison of Index Organization • Indexing Techniques for Inheritance Hierarchy • Precomputing and Caching

  28. Preliminary Definitions • Path • a branch in an aggregation hierarchy • Path instantiation • a sequence of objects obtained by instantiating the path • Nested index • an index for a direct connection between the starting object and the ending object of the path instantiation • Path index • an index for storing instantiations of a path • same index key as the nested index

  29. Example of Aggregation Hierarchy [Figure: classes Project, Company, Division, Person]

  30. Definition of Path • Given an aggregation hierarchy H, a path P is defined as C1.A1.A2…..An (n ≥ 1) where • C1 is a class in H • A1 is an attribute of class C1 • Ai is an attribute of class Ci in H, such that Ci is the domain of the attribute Ai-1 of class Ci-1 (1 < i ≤ n) • length(P) : the length of the path • classes(P) : the set of classes along the path • dom(P) : the domain of attribute An of class Cn

  31. Examples of Path • • length( P1) = 4 • classes( P1) = { Project, Company, Division, Person } • dom( P1) = STRING • P2 : • length(P2) = 2 • classes(P2) = { Project, Division } • dom(P2) = STRING

  32. Definition of Complete Instantiation • Complete instantiation (CI) is a sequence of objects along a path • Given the path P = C1.A1.A2…..An , CI is denoted as O1.O2…..On+1 , where • O1 is an instance of class C1 • Oi is the value of the attribute Ai-1 of object Oi-1 • Oi = Oi-1.Ai-1 or Oi ∈ Oi-1.Ai-1 (1 < i ≤ n+1) • Examples of CI, where path is given as P1 • Project[i].Company[k].Division[k].Person[x].Jones • Project[j].Company[i].Division[h].Person[y].Smith

  33. Definition of Partial Instantiation • Partial instantiation (PI) is a part of a CI that ends at the last object of the CI • Given a path P = C1.A1.A2…..An, PI is denoted as O1.O2…..Oj (j < n+1), where • O1 is an instance of class Ck in classes(P) such that k+j-1 = n+1 • Oi is the value of attribute Ai-1 of an object Oi-1 • Examples of PI, where path is given as P1 • Division[k].Person[x].Jones • Division[h].Person[y].Smith

  34. Definition of Redundancy • Given a PI as O1.O2…..Oj, it is not redundant • if there is no CI or PI O'1.O'2…..O'k, where k > j and Oi = O'k-j+i (i = 1,...,j) • Examples of redundant PI • Division[k].Person[x].Jones is redundant with respect to Project[i].Company[k].Division[k].Person[x].Jones • Division[h].Person[y].Smith is redundant with respect to Project[j].Company[i].Division[h].Person[y].Smith
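
Read operationally, a PI is redundant when it is a proper suffix of some longer instantiation. The sketch below checks this with instantiations represented as tuples of object names; the function name `is_redundant` and the tuple representation are assumptions for this example.

```python
def is_redundant(pi, instantiations):
    """True if pi matches the last len(pi) objects of a strictly longer instantiation."""
    j = len(pi)
    pi = tuple(pi)
    return any(
        len(other) > j and tuple(other[-j:]) == pi
        for other in instantiations
    )


ci = ("Project[i]", "Company[k]", "Division[k]", "Person[x]", "Jones")
pi = ("Division[k]", "Person[x]", "Jones")
print(is_redundant(pi, [ci]))
```

A division that belongs to no company, and therefore to no project, would yield a PI that no longer instantiation covers; that PI is non-redundant and has to be kept.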

  35. Definition of Projection of Path • Projection of a path is the part of a CI or PI that begins at its first object • <m>(p) denotes a projection of p with a length m • P = C1.A1.A2…..An • as PI (or CI) of P, p = O1.O2.O3…..Oj (j ≤ n+1) • <m>(p) = O1.O2.…..Om (m < j) • Example • <2>(Project[i].Company[k].Division[k].Person[x].Jones) = Project[i].Company[k]

  36. Multi-index • Index to each of the classes constituting the path • Multi-index is a set of n simple indices I1, I2 ,…,In • given a path P = C1.A1.A2…..An • Ii is an index defined on Ci.Ai, 1 ≤ i ≤ n • Solving a nested predicate scans n indices • first scanning the last index In on the path • the results of the scan using Ii are used as keys for Ii-1 • Only for reverse traversal scanning strategies • Low updating cost

  37. Examples of Multi-index • First index I1 on Project.main_contracting_company • (Company[k], {Project[i]}) • (Company[i], {Project[j], Project[l]}) • Second index I2 on Company.divisions • (Division[h], {Company[i]}) • (Division[i], {Company[i]}) • (Division[k], {Company[k]}) • Third index I3 on • (Boston, {Division[h]}) • (New York, {Division[i]}) • (Los Angeles, {Division[k]})

  38. Example of Using Multi-index • Select all the projects with a main contracting company which has a division in Los Angeles • Scanning index I3 with the key-value = Los Angeles • {Division[k]} • Scanning index I2 with the key-value = Division[k] • {Company[k]} • Scanning index I1 with the key-value = Company[k] • {Project[i]} • Result: {Project[i]}
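
The three scans above can be sketched as a loop over the indices in reverse order, feeding each result set in as the keys for the next index. The dictionaries reproduce the index entries from the previous slide; the function name `solve_nested_predicate` is an assumption for this example.

```python
# Index entries as given on the multi-index example slide.
I1 = {"Company[k]": {"Project[i]"},
      "Company[i]": {"Project[j]", "Project[l]"}}
I2 = {"Division[h]": {"Company[i]"},
      "Division[i]": {"Company[i]"},
      "Division[k]": {"Company[k]"}}
I3 = {"Boston": {"Division[h]"},
      "New York": {"Division[i]"},
      "Los Angeles": {"Division[k]"}}


def solve_nested_predicate(indices, key):
    """Scan In first, then use each result set as the key set for the previous index."""
    keys = {key}
    for index in reversed(indices):
        found = set()
        for k in keys:
            found |= index.get(k, set())
        keys = found
    return keys


print(solve_nested_predicate([I1, I2, I3], "Los Angeles"))
```

Each index is an ordinary single-attribute index, which is why updates are cheap, but a query over a path of length n always pays for n scans.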

  39. Join Index • To perform joins in the relational model efficiently • Binary join index (BJI) for binary relation (r, s) • one index clustered on r • the other index clustered on s • BJI can be used in a multi-index organization • reverse traversal • forward traversal is faster when access to objects is expensive, since the objects themselves need not be fetched from the database • more suitable for complex queries

  40. Nested Index • Direct association between the ending object and the starting object in a path • Given a path P = C1.A1.A2…..An, a nested index on P is defined as a set of pairs (O,S) • S = {O' such that there is O1.O2…..On+1 as a CI where O' = O1 and O = On+1} • Examples • (Boston, {Project[j]}) • (New York, {Project[j], Project[k], Project[l]}) • (Los Angeles, {Project[i]})
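
Following the definition above, a nested index can be built by keeping only the two ends of every complete instantiation. The sketch uses the two CIs from the complete instantiation slide as input; `build_nested_index` and the tuple representation are assumptions for illustration.

```python
from collections import defaultdict


def build_nested_index(complete_instantiations):
    """Map the ending object O(n+1) of each CI to the set of starting objects O1."""
    index = defaultdict(set)
    for ci in complete_instantiations:
        index[ci[-1]].add(ci[0])
    return dict(index)


cis = [
    ("Project[i]", "Company[k]", "Division[k]", "Person[x]", "Jones"),
    ("Project[j]", "Company[i]", "Division[h]", "Person[y]", "Smith"),
]
print(build_nested_index(cis))
```

The intermediate objects are discarded, so a lookup is a single scan, but nothing in the index says which projects are affected when an intermediate object changes.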

  41. Properties of Nested Index • Retrieval is quite fast, since only one index is scanned • Problem on update operation • requires access to several objects • forward traversal to determine the value of the indexed attribute • reverse traversal to determine all instances at the beginning of the path ==> inverse references

  42. Path Index • Given a key, all the path instantiations are stored • Given a path P = C1.A1.A2…..An, a path index on P is defined as a set of pairs (O,S) where • S = { <j-1>(pi) such that pi = O1.O2…..Oj (1 ≤ j ≤ n+1) is a CI or non-redundant PI of P, and Oj = O } • Examples • (Boston, {Project[j].Company[i].Division[h]}) • (New York, {Project[j].Company[i].Division[i], Project[k].Company[m].Division[j], Project[l].Company[i].Division[i]})
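
A small sketch of the definition above: each instantiation is filed under its last object, and what is stored is its projection without that last object. The input tuples reproduce the slide's examples with the key appended; `projection` and `build_path_index` are hypothetical names, and the caller is assumed to pass only CIs and non-redundant PIs.

```python
from collections import defaultdict


def projection(p, m):
    """<m>(p): the first m objects of an instantiation."""
    return tuple(p[:m])


def build_path_index(instantiations):
    """Map the last object of each instantiation to the set of its <j-1> projections."""
    index = defaultdict(set)
    for p in instantiations:
        index[p[-1]].add(projection(p, len(p) - 1))
    return dict(index)


instantiations = [
    ("Project[j]", "Company[i]", "Division[h]", "Boston"),
    ("Project[j]", "Company[i]", "Division[i]", "New York"),
    ("Project[k]", "Company[m]", "Division[j]", "New York"),
    ("Project[l]", "Company[i]", "Division[i]", "New York"),
]
index = build_path_index(instantiations)
print(index["Boston"])
print(len(index["New York"]))
```

Compared with a nested index, every entry keeps the intermediate objects, which costs space but lets one index answer predicates for any class along the path.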

  43. Properties of Path Index • Can be used for nested predicates on all classes along the path • Updates of a path index • only forward traversals are required • Identical to the nested index when n = 1

  44. Access Relations • Similar to path indices • storing all instantiations along a path in a relation • Examples • <Project[i], Company[k], Division[k], Los Angeles> • <Project[j], Company[i], Division[h], Boston> • <Project[j], Company[i], Division[i], New York> • <Project[k], Company[m], Division[j], New York> • <Project[l], Company[i], Division[h], Boston> • <Project[l], Company[i], Division[i], New York> • Several subpaths to different relations

  45. Index Structures using B+tree • Structure of the internal node • n records of <key-length, key, pointer> • A record of a leaf node in a nested index • record-length • key-length, key-value • # of OIDs associated with the key • list of OIDs • A record of a leaf node in a path index • record-length • key-length, key-value • # of the path instantiations associated with the key • list of path instantiations
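
The leaf record of a nested index listed above can be sketched as a byte layout. The field widths (4-byte record length, 2-byte key length, 4-byte OID count, 8-byte OIDs, all little-endian) and the function names are assumptions; the slide names the fields but not their sizes.

```python
import struct


def pack_leaf_record(key, oids):
    """record-length | key-length | key-value | # of OIDs | list of OIDs."""
    body = struct.pack("<H", len(key)) + key
    body += struct.pack("<I", len(oids))
    body += struct.pack("<%dQ" % len(oids), *oids)
    # The record length counts its own 4 bytes as well.
    return struct.pack("<I", 4 + len(body)) + body


def unpack_leaf_record(buf):
    """Return (key, list of OIDs) from a record produced by pack_leaf_record."""
    (key_len,) = struct.unpack_from("<H", buf, 4)
    key = bytes(buf[6:6 + key_len])
    pos = 6 + key_len
    (count,) = struct.unpack_from("<I", buf, pos)
    oids = list(struct.unpack_from("<%dQ" % count, buf, pos + 4))
    return key, oids


record = pack_leaf_record(b"Boston", [17, 42])
print(len(record), unpack_leaf_record(record))
```

A path index leaf record would differ only in the last field, where each entry is a whole path instantiation (a sequence of OIDs) instead of a single OID.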

  46. Operations with Nested Index • To solve a predicate against a nested attribute An of class C1 • single index scan • same cost as solving the predicate on a simple attribute of C1 • For update operation • one forward traversal to find the old key value • a second forward traversal to find the new key value • one reverse traversal to find the OID of the associated object

  47. Operations with Path Index • To solve a predicate against the nested attribute An of class Ci (1 ≤ i ≤ n) • one index scan • determine the PI or CI associated with the key value • extract the OIDs occupying the i-th position of them • For update operation • one forward traversal to find the old path instantiation • a second forward traversal to find the new path instantiation
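
The retrieval steps above amount to one lookup followed by picking a position out of every stored instantiation. In the sketch below, stored instantiations are right-aligned so that a partial instantiation, which starts at a later class, is still read correctly; the function name `solve_on_class`, the parameter `n` (number of classes on the path), and the dictionary form of the index are assumptions.

```python
def solve_on_class(path_index, key, i, n):
    """Return the OIDs of class Ci (1-based) reachable from the given key value.

    Each stored instantiation ends with the object of class Cn, so the object
    of class Ci sits (n - i) places before the end. Shorter (partial)
    instantiations may simply not contain an object of class Ci.
    """
    result = set()
    for inst in path_index.get(key, ()):
        pos = len(inst) - 1 - (n - i)
        if pos >= 0:
            result.add(inst[pos])
    return result


# Entries from the path index example slide (path over Project, Company, Division).
path_index = {
    "Boston": {("Project[j]", "Company[i]", "Division[h]")},
    "New York": {
        ("Project[j]", "Company[i]", "Division[i]"),
        ("Project[k]", "Company[m]", "Division[j]"),
        ("Project[l]", "Company[i]", "Division[i]"),
    },
}
print(sorted(solve_on_class(path_index, "New York", 2, 3)))
```

The same index entry therefore answers the predicate for projects, companies, or divisions, which a nested index cannot do because it stores only the starting objects.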

  48. Comparisons of Index Organizations(1) • Degree of reference sharing • important in evaluating an index organization • reference is shared when two or more objects refer to the same object • Retrieval operation • nested index has the lowest cost • path index has a lower cost than the multi-index • nested index has better performance than the path index • path index allows predicates to be solved for all the classes along a path but not nested index

  49. Comparisons of Index Organizations(2) • Update operation • the multi-index has the lowest cost • for paths with a length 2 • nested index has slightly lower cost than the path index • for paths with a length greater than 2 • nested index has slightly lower cost than the path index if the updates are executed on the first two classes • In other cases • nested index involves a significantly higher cost

  50. Indexing Techniques for Inheritance Hierarchies • Scope of a query • only a given class C • the class C and the inheritance hierarchy rooted in C • Solution based on conventional indices • construct an index on an attribute for each of the classes of the subgraph • scan all these indices • perform the union of their results
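
The conventional solution above can be sketched with one dictionary per class standing in for each index. The class names (Vehicle, Car, Truck), the indexed attribute, and the function names are hypothetical and chosen only for illustration.

```python
def hierarchy_classes(root, subclasses):
    """Collect the root class and every class below it in the inheritance hierarchy."""
    found, stack = [], [root]
    while stack:
        cls = stack.pop()
        if cls not in found:
            found.append(cls)
            stack.extend(subclasses.get(cls, []))
    return found


def query(root, key, class_indices, subclasses, include_subclasses=True):
    """Scan one index per class in scope and return the union of the results."""
    classes = hierarchy_classes(root, subclasses) if include_subclasses else [root]
    result = set()
    for cls in classes:
        result |= class_indices.get(cls, {}).get(key, set())
    return result


# Hypothetical hierarchy and per-class indices on a "color" attribute.
subclasses = {"Vehicle": ["Car", "Truck"]}
class_indices = {
    "Vehicle": {"red": {"Vehicle[1]"}},
    "Car": {"red": {"Car[7]"}, "blue": {"Car[8]"}},
    "Truck": {"red": {"Truck[3]"}},
}
print(sorted(query("Vehicle", "red", class_indices, subclasses)))
print(sorted(query("Vehicle", "red", class_indices, subclasses, include_subclasses=False)))
```

The cost grows with the number of classes in the subgraph, since every class contributes one index scan even when it holds no matching instance.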