ORDB Implementation Discussion

From RDB to ORDB

Issues to address when adding OO extensions to a DBMS

Layout of Data
  • Deal with large data types : ADTs/blobs
    • special-purpose file space for such data, with special access methods
  • Large fields in one tuple :
    • One single tuple may not even fit on one disk page
    • Must break into sub-tuples and link via disk pointers
  • Flexible layout :
    • constructed types may have variable-sized sets, e.g., one attribute can be a set of strings
    • Need to provide meta-data inside each type concerning layout of fields within the tuple
    • Insertion/deletion will cause problems when contiguous layout of ‘tuples’ is assumed
Layout of Data
  • More layout design choices (clustering on disk):
    • Lay out complex object nested and clustered on disk (if nested and not pointer based)
    • Where to store objects that are referenced (shared) by possibly several other and different structures
    • Many design options for objects that are in a type hierarchy with inheritance
    • Constructed types such as arrays require novel methods, like chunking an array into (4x4) subarrays for non-contiguous access
Why (Object) Identifier ?
  • Distinguish objects regardless of content and location
  • Evolution of object over time
  • Sharing of objects without copying
  • Continuity of identity (persistence)
  • Versions of a single object
Objects, OIDs, Keys
  • Relational key (RDB):
    • human-meaningful name
    • (mixes data value with identity)
  • Variable name (PL):
    • gives a name to objects in a program
    • (mixes addressability with identity)
  • Object identifier (ODB):
    • system-assigned, globally unique name
    • (location- and data-independent)
  • OID properties:
    • System generated
    • Globally unique
    • Logical identifier (not a physical representation; flexibility in relocation)
    • Remains valid for the lifetime of the object (persistent)
OID Support
  • OID generation :
    • uniqueness across time and system
  • Object handling :
    • Operations to test equality/identity
    • Operations to manipulate OIDs for object merging and copying.
    • Deal with avoiding dangling references
OID Implementation
  • By address (physical)
    • 32 bits; direct fast access like a pointer
  • By structured address
    • E.g., page and slot number
    • Combines some physical and some logical information
  • By surrogates
    • Purely logical oid
    • Use some algorithm to assure uniqueness
  • By typed surrogates
    • Contains both type id and object id
    • Determine type of object without fetching it
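The typed-surrogate scheme above can be sketched in a few lines; the function names are illustrative (not any specific DBMS), and a real system would also fold site and timestamp information into the surrogate to get uniqueness across time and systems:

```python
import itertools

# Illustrative sketch of typed-surrogate OIDs: a purely logical
# (type_id, surrogate) pair. The counter guarantees uniqueness within
# this process only; a real DBMS would also encode site/time info.
_counter = itertools.count(1)

def new_oid(type_id: int) -> tuple:
    """Mint a fresh (type_id, surrogate) OID."""
    return (type_id, next(_counter))

def type_of(oid: tuple) -> int:
    # The type is available from the OID itself -- no object fetch needed.
    return oid[0]
```

Testing two OIDs for identity then reduces to comparing the pairs, independent of object content or location.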
  • Type representation: size/storage
  • Type access : import/export
  • Type manipulation: special methods to serve as filter predicates and join predicates
  • Special-purpose index structures : efficiency
  • Mechanism to add index support along with ADT:
    • External storage of index file outside DBMS
    • Provide “access method interface” a la:
      • Open(), close(), search(x), retrieve-next()
      • Plus, statistics on external index
    • Or, generic ‘template’ index structure
      • Generalized Search Tree (GiST) – user-extensible
      • Concurrency/recovery provided
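The "access method interface" listed above (open/close/search/retrieve-next) can be sketched as a minimal cursor over sorted (key, rid) entries; the class and its internals are hypothetical, not GiST's actual API:

```python
# Toy external index exposing the open/close/search/retrieve-next
# interface named in the slides. Entries are (key, rid) pairs kept
# sorted; a real access method would use a tree or hash structure.
class ExternalIndex:
    def __init__(self, entries):
        self._entries = sorted(entries)
        self._cursor = None

    def open(self):
        self._cursor = 0                      # position at first entry

    def search(self, key):
        # Position the cursor at the first entry with a matching key.
        self._cursor = next(
            (i for i, (k, _) in enumerate(self._entries) if k == key),
            len(self._entries))

    def retrieve_next(self):
        if self._cursor is None or self._cursor >= len(self._entries):
            return None                       # exhausted or not open
        entry = self._entries[self._cursor]
        self._cursor += 1
        return entry

    def close(self):
        self._cursor = None
```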
Query Processing
  • Query Parsing :
    • Type checking for methods
    • Subtyping/Overriding
  • Query Rewriting:
    • May translate path expressions into join operators
    • Deal with collection hierarchies (UNION?)
    • Indices or extraction out of collection hierarchy
Query Optimization Core
  • New algebra operators must be designed :
    • such as nest, unnest, array-ops, values/objects, etc.
  • Query optimizer must integrate them into optimization process :
    • New Rewrite rules
    • New Costing
    • New Heuristics
Query Optimization Revisited
  • Existing algebra operators revisited: SELECT
  • WHERE clause expressions can be expensive
  • So SELECT pushdown may be a bad heuristic
Selection Condition Rewriting
  • (tuple.attribute < 50)
    • Only CPU time (on the fly)
  • (tuple.location OVERLAPS lake-object)
    • Possibly complex CPU-heavy computations
    • May involve both IO and CPU costs
  • State of the art:
    • considers the reduction factor only
  • Now, we must consider both factors:
    • Cost factor : dramatic variations
    • Reduction factor: unrelated to cost factor
Ordering of SELECT Operators
  • Cost factor : now could be dramatic variations
  • Reduction factor: orthogonal to cost factor
  • We want maximal reduction and minimal cost:

Rank(operator) = reduction × (1 / cost)

  • Order operators by decreasing ‘rank’ (apply highest-rank operators first)
  • High rank :
    • (good) -> low in cost, and large reduction
  • Low rank
    • (bad) -> high in cost, and small reduction
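A minimal sketch of this ranking, applying the highest-rank (cheap and selective) predicates first; the predicate names, reduction factors, and costs below are made up for illustration:

```python
# Rank(operator) = reduction * (1/cost): prefer predicates that
# remove many tuples per unit of cost. Figures are illustrative.
def rank(reduction: float, cost: float) -> float:
    return reduction * (1.0 / cost)

predicates = [
    ("tuple.attr < 50",         0.5,  1.0),   # cheap, CPU-only check
    ("tuple.loc OVERLAPS lake", 0.9, 50.0),   # expensive spatial method
]

# Apply highest-rank predicates first.
ordered = sorted(predicates, key=lambda p: rank(p[1], p[2]), reverse=True)
```

Even though the spatial predicate reduces more tuples, its cost is so high that the cheap comparison ranks first.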
Access Structures/Indices ( on what ?)
  • Indices that are ADT specific
  • Indices on navigation path
  • Indices on methods, not just on columns
  • Indices over collection hierarchies (trade-offs)
  • Indices for new WHERE clause expressions: not just =, <, >, but also “overlaps”, “similar”
Registering New Index (to Optimizer)
  • What WHERE conditions it supports
  • Estimated cost for “matching tuple” (IO/CPU)
    • Given by index designer (user?)
    • Monitor statistics; even construct test plans
  • Estimation of reduction factors/join factors
    • Register auxiliary function to estimate factor
    • Provide simple defaults
  • Use ADT/methods in query specification
  • Achieves:
    • flexibility
    • extensibility
  • Extensibility: dynamic linking of methods defined outside the DB
  • Flexibility: overriding methods in the type hierarchy
  • Semantics:
    • Use of “methods” with implied semantics?
    • Incorporation of methods into query processing may cause side effects?
    • Performance of methods may be unpredictable?
    • Termination may not be guaranteed?
  • “Untrusted” methods :
    • corrupt server
    • modify DB content (side effects)
  • Handling of “untrusted” methods :
    • restrict the language;
    • interpret vs. compile;
    • run methods in a separate address space from the DB server
Query Optimization with Methods
  • Estimation of “costs” of method predicates
    • See earlier discussion
  • Optimization of method execution:
    • Methods may be very expensive to execute
    • Idea: similar to handling correlated nested subqueries
      • Recognize repetition and rewrite physical plan.
      • Provide some level of pre- computation and reuse
Strategies for Method Execution
  • 1. If called on same input, cache that one result
  • 2. If on full column, presort column first (groupby)
  • 3. Or, in general use full precomputation:
    • Precompute results for all domain values (parameters)
    • Put results in a hash table: val → fct(val)
    • During query processing, look up fct(val) in the hash table
    • Or, possibly even perform a join with this table
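Strategy 3 (full precomputation) can be sketched as follows; `fct` stands in for an expensive user-defined method, and the domain is made up:

```python
# Full precomputation: evaluate fct(val) once per domain value into a
# hash table, so query processing becomes a lookup instead of a
# re-execution of the expensive method.
def fct(val):
    return val * val          # placeholder for a costly computation

def precompute(domain):
    return {val: fct(val) for val in domain}

table = precompute(range(100))

def lookup(val):
    # During query processing: hash-table lookup, no method call.
    return table[val]
```

Strategy 1 (caching the last result for repeated identical inputs) is the degenerate case of a one-entry table.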
Query Processing
  • User-defined methods
  • User-defined aggregate functions:
    • E.g., “second largest” or “brightest picture”
  • Distributive aggregates:
    • incremental computation
Query Processing: Distributive Aggregates
  • For incremental computation of distributive aggregates:
  • Provide:
    • Initialize(): set up state space
    • Iterate(): per tuple update the state
    • Terminate(): compute final result based on state; and cleanup state
  • For example : “second largest”
    • Initialize(): 2 fields
    • Iterate(): per tuple compare numbers
    • Terminate(): remove 2 fields
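The “second largest” example can be sketched with the three-call protocol above (the class and method names are illustrative):

```python
# Distributive aggregate computed incrementally via
# initialize / iterate / terminate, as in the slides.
class SecondLargest:
    def initialize(self):
        # 2-field state: largest and second-largest seen so far.
        self.largest = float("-inf")
        self.second = float("-inf")

    def iterate(self, value):
        # Per-tuple update of the state.
        if value > self.largest:
            self.largest, self.second = value, self.largest
        elif value > self.second:
            self.second = value

    def terminate(self):
        # Compute the final result, then clean up the state.
        result = self.second
        del self.largest, self.second
        return result
```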
Following Disk Pointers?
  • Complex object structures with object pointers may exist (~ disk pointers)
  • Navigate complex objects following pointers
  • Long-running transactions, as in CAD design, may work with a complex object for a long duration
  • Question : What to do about “pointers” between subobjects or related objects ?
Following Disk Pointers: Options
  • Swizzle :
    • Swizzle = replace OID references with in-memory pointers
    • Unswizzle = Convert back to disk-pointers when flushing to disk.
  • Issues :
    • In-memory table of OIDs and their state
    • Indicate each pointer’s type (swizzled or not) via a bit in the object
  • Different policies for swizzling:
    • never
    • on access
    • attached to object brought in
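A toy sketch of the “on access” policy, with a hypothetical Ref wrapper and an in-memory OID table standing in for the object manager:

```python
# On-access swizzling: a reference starts as a logical OID (disk
# pointer) and is replaced by an in-memory pointer the first time it
# is followed; unswizzling converts it back before flushing to disk.
class Ref:
    def __init__(self, oid):
        self.oid = oid            # disk pointer (logical OID)
        self.target = None        # in-memory pointer once swizzled

    def swizzle(self, object_table):
        if self.target is None:   # "on access" policy
            self.target = object_table[self.oid]
        return self.target

    def unswizzle(self):
        self.target = None        # back to a plain OID for flushing
```

The “never” and “attached to object brought in” policies differ only in when (or whether) `swizzle` is invoked.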
  • We may want both persistent and transient data
  • Why ?
    • Programming language variables
    • Handle intermediate data
    • May want to apply queries to transient data
Properties for Persistence?
  • Orthogonal to types :
    • Data of any type can be persistent
  • Transparent to programmer :
    • Programmer can treat persistent and non-persistent objects the same way
  • Independent from mass storage:
    • No explicit read and write to persistent database
Models of Persistence
  • Persistence by type
  • Persistence by call
  • Persistence by reachability
Model of Persistence: by type
  • Parallel type systems:
    • Persistence by type, e.g., int and dbint
    • Programmer is responsible for making objects persistent
    • Programmer must decide at object-creation time
    • Allow for user control by “casting” types
Model of Persistence: by call
  • Persistence by explicit call
    • Explicit create/delete to persistent space
    • E.g., objects must be placed into “persistent containers” such as relations in order to be kept around
    • E.g., insert object into Collection MyBooks;
    • Could be rather dynamic control without casting
    • Relatively simple to implement by DBMS
Model of Persistence: by reachability
  • Persistence by reachability :
    • Use global (or named) variables to objects and structures
    • Objects referenced by other reachable objects are also persistent, by transitivity
    • No explicit deletes; instead, garbage collection reclaims objects once they are no longer referenced
    • Garbage collection techniques :
      • mark & sweep: mark all objects reachable from persistent roots, then delete the others
      • scavenging: copy all reachable objects from one space to another; may suffer in a disk-based environment due to IO overhead and destruction of clustering
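The mark & sweep technique above can be sketched over an object graph given as an OID → referenced-OIDs map (the representation is illustrative):

```python
# Mark & sweep for persistence by reachability: objects reachable
# from the persistent roots survive; everything else is collected.
def mark_and_sweep(roots, references):
    """`references` maps each object id to the ids it points to."""
    marked = set()
    stack = list(roots)
    while stack:                              # mark phase
        obj = stack.pop()
        if obj not in marked:
            marked.add(obj)
            stack.extend(references.get(obj, ()))
    # Sweep phase: keep only the marked (reachable) objects.
    return {o: refs for o, refs in references.items() if o in marked}
```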
  • A lot of work to get to OO support:
    • From physical database design/layout issues up to logical query optimizer extensions
  • ORDB:
    • Reuses the existing implementation base and incrementally adds new features (but the relation remains the first-class citizen)