ORDB Implementation Discussion
PowerPoint slideshow about 'ORDB Implementation Discussion', uploaded by carrie
Presentation Transcript

From RDB to ORDB

Issues to address when adding OO extensions to a DBMS


Layout of Data

  • Deal with large data types : ADTs/blobs

    • special-purpose file space for such data, with special access methods

  • Large fields in one tuple :

    • One single tuple may not even fit on one disk page

    • Must break into sub-tuples and link via disk pointers

  • Flexible layout :

    • Constructed types may have flexible-sized sets, e.g., one attribute can be a set of strings.

    • Need to store metadata with each type describing the layout of its fields within the tuple

    • Insertion/deletion will cause problems when contiguous layout of ‘tuples’ is assumed
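
Here is a minimal Python sketch of the "break into sub-tuples and link via disk pointers" idea. The page size, the use of consecutive page ids, and the names `SubTuple`, `split_tuple` and `reassemble` are all hypothetical. A real storage manager would allocate whatever pages happen to be free and store the pointer inside the page itself.

```python
from dataclasses import dataclass
from typing import Optional

PAGE_SIZE = 16  # hypothetical page capacity in bytes, kept tiny for the example


@dataclass
class SubTuple:
    page_id: int               # disk page holding this fragment
    payload: bytes             # slice of the original tuple
    next_page: Optional[int]   # "disk pointer" to the next fragment; None at the end


def split_tuple(data: bytes, first_page_id: int) -> list:
    """Break one oversized tuple into page-sized sub-tuples chained by page ids."""
    chunks = [data[i:i + PAGE_SIZE] for i in range(0, len(data), PAGE_SIZE)] or [b""]
    fragments = []
    for k, chunk in enumerate(chunks):
        page_id = first_page_id + k
        has_next = k + 1 < len(chunks)
        fragments.append(SubTuple(page_id, chunk, page_id + 1 if has_next else None))
    return fragments


def reassemble(pages: dict, start_page: int) -> bytes:
    """Follow the chain of disk pointers and concatenate the fragments."""
    parts, current = [], start_page
    while current is not None:
        fragment = pages[current]
        parts.append(fragment.payload)
        current = fragment.next_page
    return b"".join(parts)
```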


Layout of Data

  • More layout design choices (clustering on disk):

    • Lay out complex objects nested and clustered on disk (if nested rather than pointer-based)

    • Where to store objects that are referenced (shared) by several other, possibly different, structures

    • Many design options for objects that are in a type hierarchy with inheritance

    • Constructed types such as arrays require novel methods, like array chunking into (4x4) subarrays for non-contiguous access
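
The sketch below makes array chunking concrete. It assumes a 2-D array stored as 4x4 chunks, with chunks numbered row-major and cells stored row-major inside each chunk. The names `chunk_address` and `chunks_touched` are illustrative, not an existing API.

```python
CHUNK = 4  # chunk edge length: 4x4 sub-arrays, as on the slide


def chunk_address(i: int, j: int, n_cols: int) -> tuple:
    """Map cell (i, j) of an array with n_cols columns to (chunk number, offset).

    Chunks are numbered row-major over the grid of 4x4 chunks, and cells are
    stored row-major inside each chunk, so one chunk is one contiguous unit on disk.
    """
    chunks_per_row = (n_cols + CHUNK - 1) // CHUNK   # ceiling division
    chunk_no = (i // CHUNK) * chunks_per_row + (j // CHUNK)
    offset = (i % CHUNK) * CHUNK + (j % CHUNK)
    return chunk_no, offset


def chunks_touched(rows: range, cols: range, n_cols: int) -> set:
    """Which chunks (i.e., disk units) a rectangular sub-array access must read."""
    return {chunk_address(i, j, n_cols)[0] for i in rows for j in cols}
```

Take an array with 8 columns. The 4x4 block at rows 4..7, columns 4..7 falls entirely in chunk 3. A plain row-major layout would scatter the same block over four separate row segments.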


Why (Object) Identifier ?

  • Distinguish objects regardless of content and location

  • Evolution of object over time

  • Sharing of objects without copying

  • Continuity of identity (persistence)

  • Versions of a single object


Objects/OIDs/Keys

  • Relational keys: RDB

    human meaningful name

    (mix data value with identity)

  • Variable name : PL

    give name to objects in program

    (mix addressability with identity)

  • Object identifier : ODB

    system-assigned globally unique name

    (location- and data-independent)


OIDs

  • System generated

  • Globally unique

  • Logical identifier (not physical representation; flexibility in relocation)

  • Remains valid for lifetime of object (persistent)


OID Support

  • OID generation :

    • uniqueness across time and system

  • Object handling :

    • Operations to test equality/identity

    • Operations to manipulate OIDs for object merging and copying.

    • Avoid dangling references


OID Implementation

  • By address (physical)

    • 32 bits; direct fast access like a pointer

  • By structured address

    • E.g., page and slot number

    • Both some physical and logical information

  • By surrogates

    • Purely logical oid

    • Use some algorithm to ensure uniqueness

  • By typed surrogates

    • Contains both type id and object id

    • Determine type of object without fetching it
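
Here is a sketch of typed surrogates. It assumes a hypothetical 64-bit OID split into a 16-bit type id and a 48-bit object id. The counter only guarantees uniqueness within one generator. Uniqueness across time and systems would also require persisting the counter and, for example, adding a site id; that part is not shown.

```python
import itertools

TYPE_BITS = 16   # hypothetical layout: 16-bit type id ...
OBJ_BITS = 48    # ... plus 48-bit object id, packed into one 64-bit OID


class SurrogateGenerator:
    """Issues purely logical object ids from a monotonically increasing counter."""

    def __init__(self, start: int = 1) -> None:
        self._counter = itertools.count(start)

    def new_typed_oid(self, type_id: int) -> int:
        if not 0 <= type_id < (1 << TYPE_BITS):
            raise ValueError("type id does not fit in TYPE_BITS")
        obj_id = next(self._counter)
        if obj_id >= (1 << OBJ_BITS):
            raise OverflowError("object id space exhausted")
        return (type_id << OBJ_BITS) | obj_id


def type_of(oid: int) -> int:
    """Recover the type id from the OID alone, without fetching the object."""
    return oid >> OBJ_BITS


def object_id_of(oid: int) -> int:
    """Recover the purely logical object id part of the OID."""
    return oid & ((1 << OBJ_BITS) - 1)
```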


ADTs

  • Type representation: size/storage

  • Type access : import/export

  • Type manipulation: special methods to serve as filter predicates and join predicates

  • Special-purpose index structures : efficiency


ADTs

  • Mechanism to add index support along with ADT:

    • External storage of index file outside DBMS

    • Provide an “access method interface” à la:

      • Open(), close(), search(x), retrieve-next()

      • Plus, statistics on external index

    • Or, generic ‘template’ index structure

      • Generalized Search Tree (GiST) – user-extensible

      • Concurrency/recovery provided
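
One way to picture the access method interface is as an abstract class that the ADT author implements and the DBMS calls. The method names mirror the slide (`retrieve-next()` becomes `retrieve_next`). The `statistics()` method and the toy sorted-list index are assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from bisect import bisect_left
from typing import Any, Optional


class AccessMethod(ABC):
    """Hypothetical interface the DBMS calls on an externally stored ADT index."""

    @abstractmethod
    def open(self) -> None: ...

    @abstractmethod
    def close(self) -> None: ...

    @abstractmethod
    def search(self, x: Any) -> None: ...

    @abstractmethod
    def retrieve_next(self) -> Optional[Any]: ...

    @abstractmethod
    def statistics(self) -> dict: ...


class SortedListIndex(AccessMethod):
    """Toy equality index: a sorted list of (key, record id) pairs."""

    def __init__(self, entries) -> None:
        self._entries = sorted(entries)
        self._pos: Optional[int] = None
        self._key: Any = None

    def open(self) -> None:
        self._pos = None

    def close(self) -> None:
        self._pos = None

    def search(self, x: Any) -> None:
        # (x,) sorts before every (x, rid), so this is the first entry with key >= x.
        self._key = x
        self._pos = bisect_left(self._entries, (x,))

    def retrieve_next(self) -> Optional[Any]:
        if self._pos is None or self._pos >= len(self._entries):
            return None
        key, rid = self._entries[self._pos]
        if key != self._key:          # ran past the matching keys
            return None
        self._pos += 1
        return rid

    def statistics(self) -> dict:
        keys = {key for key, _ in self._entries}
        return {"entries": len(self._entries), "distinct_keys": len(keys)}
```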


Query Processing

  • Query Parsing :

    • Type checking for methods

    • Subtyping/Overriding

  • Query Rewriting:

    • May translate path expressions into join operators

    • Deal with collection hierarchies (UNION?)

    • Indices or extraction out of collection hierarchy


Query Optimization Core

  • New algebra operators must be designed :

    • such as nest, unnest, array-ops, values/objects, etc.

  • Query optimizer must integrate them into optimization process :

    • New Rewrite rules

    • New Costing

    • New Heuristics


Query Optimization Revisited

  • Existing algebra operators revisited : SELECT

  • Where clause expressions can be expensive

  • So SELECT pushdown (evaluating selections as early as possible) may be a bad heuristic


Selection Condition Rewriting

  • EXAMPLE:

  • (tuple.attribute < 50)

    • Only CPU time (on the fly)

  • (tuple.location OVERLAPS lake-object)

    • Possibly complex CPU-heavy computations

    • May involve both I/O and CPU costs

  • State-of-art:

    • consider reduction factor only

  • Now, we must consider both factors:

    • Cost factor : dramatic variations

    • Reduction factor: unrelated to cost factor



Ordering of SELECT Operators

  • Cost factor: may now vary dramatically

  • Reduction factor: orthogonal to cost factor

  • We want maximal reduction and minimal cost:

    Rank(operator) = reduction * (1 / cost)

  • Order operators by decreasing rank (highest rank first)

  • High rank :

    • (good) -> low in cost, and large reduction

  • Low rank

    • (bad) -> high in cost, and small reduction
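
Here is a small sketch of rank-based ordering. It assumes `reduction` is the fraction of tuples a predicate eliminates and `cost` is a per-tuple estimate. The spatial predicate is a hypothetical stand-in (a precomputed `near_lake` flag) for an expensive OVERLAPS test.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Predicate:
    name: str
    test: Callable[[dict], bool]
    cost: float       # estimated per-tuple cost (CPU and/or I/O units), > 0
    reduction: float  # assumed: fraction of tuples the predicate eliminates, 0..1

    @property
    def rank(self) -> float:
        return self.reduction * (1 / self.cost)   # Rank = reduction * (1/cost)


def order_by_rank(preds: list) -> list:
    """Highest rank first: cheap, highly selective predicates run earliest."""
    return sorted(preds, key=lambda p: p.rank, reverse=True)


def select(preds: list, tuples: list) -> list:
    ordered = order_by_rank(preds)
    # all() short-circuits, so later (expensive) predicates only see survivors.
    return [t for t in tuples if all(p.test(t) for p in ordered)]
```

In the test, the cheap comparison has rank 0.5 / 1 = 0.5 and the spatial test has rank 0.9 / 1000 = 0.0009. The comparison therefore runs first, and the expensive test only sees tuples that survived it.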


Access Structures/Indices ( on what ?)

  • Indices that are ADT specific

  • Indices on navigation path

  • Indices on methods, not just on columns

  • Indices over collection hierarchies (trade-offs)

  • Indices for new WHERE-clause operators: not just =, <, >, but also “overlaps” and “similar”


Registering New Index (to Optimizer)

  • What WHERE conditions it supports

  • Estimated cost per matching tuple (I/O and CPU)

    • Given by index designer (user?)

    • Monitor statistics; even construct test plans

  • Estimation of reduction factors/join factors

    • Register auxiliary function to estimate factor

    • Provide simple defaults
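
One way to picture registration is as one catalog entry per index. Each entry records the supported operators, a designer-supplied cost per matching tuple, and an optional selectivity estimator with a simple default. Everything here is hypothetical: `IndexRegistration`, `OptimizerCatalog`, the 0.1 default and the cost formula.

```python
from dataclasses import dataclass
from typing import Callable, Optional

DEFAULT_SELECTIVITY = 0.1  # hypothetical "simple default" when no estimator is registered


@dataclass
class IndexRegistration:
    name: str
    operators: frozenset   # WHERE-clause operators the index supports
    cost_per_match: float  # designer-supplied I/O + CPU cost per matching tuple
    selectivity: Optional[Callable[[object], float]] = None  # auxiliary estimator


class OptimizerCatalog:
    """Hypothetical catalog the optimizer consults when planning a predicate."""

    def __init__(self) -> None:
        self._by_operator = {}

    def register(self, reg: IndexRegistration) -> None:
        for op in reg.operators:
            self._by_operator.setdefault(op, []).append(reg)

    def cheapest(self, op: str, arg: object, n_tuples: int):
        """Return (index name, estimated cost) of the best index for `op`, or None."""
        best = None
        for reg in self._by_operator.get(op, []):
            sel = reg.selectivity(arg) if reg.selectivity else DEFAULT_SELECTIVITY
            cost = n_tuples * sel * reg.cost_per_match
            if best is None or cost < best[1]:
                best = (reg.name, cost)
        return best
```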


Methods

  • Use ADT/methods in query specification

  • Achieves:

    • flexibility

    • extensibility


Methods

  • Extensibility : Dynamic linking of methods defined outside DB

  • Flexibility : Overriding methods within a type hierarchy

  • Semantics :

    • Use of “methods” with implied semantics?

    • Incorporation of methods into query process may cause side-effects?

    • Performance of methods may be unpredictable ?

    • Termination may not be guaranteed?


Methods

  • “Untrusted” methods :

    • corrupt server

    • modify DB content (side effects)

  • Handling of “untrusted” methods :

    • restrict the method language;

    • interpret vs. compile;

    • run methods in an address space separate from the DB server


Query Optimization with Methods

  • Estimation of “costs” of method predicates

    • See earlier discussion

  • Optimization of method execution:

    • Methods may be very expensive to execute

    • Idea: similar to handling correlated nested subqueries

      • Recognize repetition and rewrite physical plan.

      • Provide some level of precomputation and reuse


Strategies for Method Execution

  • 1. If called repeatedly on the same input, cache that one result

  • 2. If applied to a full column, presort the column first (as for GROUP BY)

  • 3. Or, in general use full precomputation:

    • Precompute results for all domain values (parameters)

    • Put the results in a hash table: val → fct(val)

    • During query processing, look up fct(val) in the hash table by val

    • Or, possibly even perform a join with this table
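
The three strategies can be sketched in a few lines of Python. The sketch assumes the method `fct` is deterministic and free of side effects; otherwise caching would change query results. The function names are illustrative.

```python
def evaluate_presorted(column, fct):
    """Strategies 1 and 2: sort the column so equal inputs are adjacent, and keep
    only the single most recent (input, result) pair as the cache."""
    results = []
    have_cached = False
    last_val = last_res = None
    for val in sorted(column):
        if not have_cached or val != last_val:
            last_val, last_res = val, fct(val)   # cache miss: call the method
            have_cached = True
        results.append((val, last_res))
    return results


def precompute(domain, fct):
    """Strategy 3: evaluate fct once per domain value; store val -> fct(val)."""
    return {val: fct(val) for val in domain}


def evaluate_with_table(column, table):
    """At query time, each method call becomes a hash-table lookup."""
    return [table[val] for val in column]
```

`evaluate_with_table` is essentially a hash join between the column and the precomputed table. Presorting returns values in sorted order; a real plan would carry the other tuple attributes along with each value.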


Query Processing

  • User-defined methods

  • User-defined aggregate functions:

    • E.g., “second largest” or “brightest picture”

  • Distributive aggregates:

    • incremental computation


Query Processing: Distributive Aggregates

  • For incremental computation of distributive aggregates:

  • Provide:

    • Initialize(): set up state space

    • Iterate(): per tuple update the state

    • Terminate(): compute final result based on state; and cleanup state

  • For example : “second largest”

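    • Initialize(): allocate 2 fields (largest and second-largest value so far)

    • Iterate(): per tuple, compare the value with both fields and update them

    • Terminate(): return the second-largest field; then remove the 2 fields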
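
Here is a sketch of the “second largest” aggregate using the Initialize/Iterate/Terminate protocol. It assumes multiset semantics, so duplicates count and [5, 5, 3] yields 5. It returns `None` when there are fewer than two inputs. The class and driver names are hypothetical.

```python
class SecondLargest:
    """User-defined aggregate written as Initialize / Iterate / Terminate."""

    def initialize(self) -> None:
        # The 2 state fields: largest and second-largest value seen so far.
        self.largest = None
        self.second = None

    def iterate(self, value) -> None:
        # Per tuple: compare the new value with both fields.
        if self.largest is None or value > self.largest:
            self.largest, self.second = value, self.largest
        elif self.second is None or value > self.second:
            self.second = value

    def terminate(self):
        # Compute the final result, then clean up the state.
        result = self.second
        del self.largest, self.second
        return result


def run_aggregate(agg, values):
    """Hypothetical executor loop driving the three calls."""
    agg.initialize()
    for v in values:
        agg.iterate(v)
    return agg.terminate()
```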


Following Disk Pointers?

  • Complex object structures with object pointers may exist (~ disk pointers)

  • Navigate complex objects following pointers

  • Long-running transactions, as in CAD design, may work with a complex object for a long time

  • Question : What to do about “pointers” between subobjects or related objects ?


Following Disk Pointers: Options

  • Swizzle :

    • Swizzle = Replace OID references with in-memory pointers

    • Unswizzle = Convert back to disk-pointers when flushing to disk.

  • Issues :

    • In-memory table of OIDs and their state

    • Mark the kind of each pointer (OID or in-memory) with a bit in the object

  • Different policies for swizzling:

    • never

    • on access

    • attached to object brought in
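
Here is a sketch of the “on access” swizzling policy. All names are hypothetical: the dict standing in for disk, the `Ref` and `Obj` classes, and the `ObjectCache` methods. The `swizzled` flag plays the role of the per-pointer bit, and `resident` plays the role of the in-memory table of OIDs.

```python
from dataclasses import dataclass, field


@dataclass
class Ref:
    """A reference field; the `swizzled` bit says how to interpret `target`."""
    swizzled: bool
    target: object  # an OID (int) if not swizzled, else the in-memory object


@dataclass
class Obj:
    oid: int
    data: str
    refs: list = field(default_factory=list)


class ObjectCache:
    def __init__(self, disk: dict) -> None:
        self.disk = disk      # stand-in for disk: oid -> (data, [referenced OIDs])
        self.resident = {}    # in-memory table: oid -> loaded Obj

    def fetch(self, oid: int) -> Obj:
        if oid not in self.resident:
            data, ref_oids = self.disk[oid]
            self.resident[oid] = Obj(oid, data, [Ref(False, r) for r in ref_oids])
        return self.resident[oid]

    def deref(self, ref: Ref) -> Obj:
        """Swizzle on access: the first traversal replaces the OID by a pointer."""
        if not ref.swizzled:
            ref.target = self.fetch(ref.target)
            ref.swizzled = True
        return ref.target

    def flush(self, obj: Obj) -> None:
        """Unswizzle on write-back: store OIDs, never in-memory pointers."""
        ref_oids = [r.target.oid if r.swizzled else r.target for r in obj.refs]
        self.disk[obj.oid] = (obj.data, ref_oids)
```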


Persistence?

  • We may want both persistent and transient data

  • Why ?

    • Programming language variables

    • Handle intermediate data

    • May want to apply queries to transient data


Properties for Persistence?

  • Orthogonal to types :

    • Data of any type can be persistent

  • Transparent to programmer :

    • Programmer can treat persistent and non-persistent objects the same way

  • Independent from mass storage:

    • No explicit read and write to persistent database


Models of Persistence

  • Persistence by type

  • Persistence by call

  • Persistence by reachability


Model of Persistence : by type

  • Parallel type systems:

    • Persistence by type, e.g., int and dbint

    • Programmer is responsible for making objects persistent

    • Programmer must make decision at object creation time

    • Allow for user control by “casting” types


Model of Persistence : by call

  • Persistence by explicit call

    • Explicit create/delete to persistent space

    • E.g., objects must be placed into “persistent containers” such as relations in order to be kept around

    • E.g., Insert object into Collection MyBooks;

    • Allows fairly dynamic control without casting

    • Relatively simple to implement by DBMS


Model of Persistence: by reachability

  • Persistence by reachability :

    • Global (or named) variables serve as roots to objects and structures

    • Objects referenced by other objects that are reachable by the application are also persistent, by transitivity

    • No explicit deletes; instead, garbage collection removes objects once they are no longer referenced

    • Garbage collection techniques :

      • mark&sweep : mark all objects reachable from persistent roots; then delete others

      • scavenging: copy all reachable objects from one space to the other; may suffer in a disk-based environment due to I/O overhead and destruction of clustering
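
Here is a mark&sweep sketch over a toy persistent store. It assumes the store is a dict from each OID to the OIDs it references, and that the roots are the named variables. Unlike reference counting, mark&sweep also reclaims unreachable cycles, such as the self-referencing object `e` in the test.

```python
def mark_and_sweep(heap: dict, roots: list) -> set:
    """heap maps OID -> list of referenced OIDs; roots are OIDs of named variables.
    Deletes every object not reachable from a root and returns their OIDs."""
    # Mark phase: depth-first traversal from the persistent roots.
    marked = set()
    stack = list(roots)
    while stack:
        oid = stack.pop()
        if oid in marked or oid not in heap:   # already visited, or dangling reference
            continue
        marked.add(oid)
        stack.extend(heap[oid])
    # Sweep phase: everything unmarked is garbage.
    garbage = set(heap) - marked
    for oid in garbage:
        del heap[oid]
    return garbage
```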



Summary

  • A lot of work to get to OO support :

    From physical database design/layout issues up to

    logical query optimizer extensions

  • ORDB:

    Reuses the existing implementation base and incrementally adds new features (but the relation remains the first-class citizen)

