Context

Context Tailoring the DBMS • To support particular applications • Beyond alphanumeric data • Beyond simple retrieve-and-process • To support particular hardware • New storage devices • To incorporate novel techniques • New join implementations

Extensibility • Language extensions • Abstract data types (ADT) • User defined functions (UDF) • Data management extensions • New access methods • New storage methods • Query processing extensions • New join methods • New optimization techniques

Starburst Contributions • Revisited internal data structures • Query graph model • Query execution plan: low-level operators and STARs • Mechanisms for extensibility • Rules for query rewrite and plan optimization

Predator Contributions • Enhanced abstract data types • Encapsulation principle applied to storage, optimization and evaluation • Type-centric DBMS design

Outline • Introduction • Starburst • Language extensions • Data management extensions • Query processing extensions • Predator • E-ADT processing • Summary

Starburst - Language Extensions • User-defined functions (1) • Scalar functions • In: one or more field values from a single tuple • Out: a single value • Aggregate functions • In: one or more field values from several tuples • Out: a single value

Starburst - Language Extensions • User-defined functions (2) • Set predicate functions • In: a simple predicate and a subquery (which defines the range of the predicate) • Out: a boolean value • Table functions • In: one or several table expressions as well as field values • Out: a relation
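The four UDF kinds differ only in the shape of their inputs and outputs. Below is a minimal sketch in Python. It treats a table as a list of dicts, and every function name in it (`discounted_price`, `price_range`, `majority`, `top_n`) is hypothetical. It is not Starburst's actual UDF registration interface.

```python
from typing import Callable, Iterable

Row = dict          # assumption: a tuple is modelled as a dict of field values
Table = list[Row]   # assumption: a relation is modelled as a list of rows

# Scalar UDF (hypothetical): field values from ONE tuple -> one value
def discounted_price(price: float, discount: float) -> float:
    return price * (1.0 - discount)

# Aggregate UDF (hypothetical): field values from SEVERAL tuples -> one value
def price_range(prices: Iterable[float]) -> float:
    values = list(prices)
    return max(values) - min(values)

# Set predicate UDF (hypothetical): a simple predicate plus the result of a
# subquery that supplies the range of values it is tested against -> boolean
def majority(pred: Callable[[object], bool], subquery_result: Iterable[object]) -> bool:
    values = list(subquery_result)
    return sum(1 for v in values if pred(v)) * 2 > len(values)

# Table UDF (hypothetical): a table expression plus field values -> a relation
def top_n(table: Table, column: str, n: int) -> Table:
    return sorted(table, key=lambda r: r[column], reverse=True)[:n]

orders = [{"id": 1, "price": 10.0}, {"id": 2, "price": 30.0}, {"id": 3, "price": 20.0}]
print(top_n(orders, "price", 2))  # rows with id 2 and id 3
```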

Starburst – Language Extensions • Abstract data types • Considered useful for: • Type checking • Structuring of users' data • Add-on to the system design

Starburst – Data Management Extensions • Uniform record structure: • Header + offset directory + data area • Advantages: • Support for nested records • Handling of null values and variable-length fields • Drawbacks: • Per-record overhead due to the offset directory • Core system services • Logging, recovery manager, predicate evaluator, event queues, lock manager, interface to OS services, debugging, tracing, error reporting.
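The uniform record structure can be illustrated with a small encoder and decoder. This is a sketch under stated assumptions, not Starburst's on-disk format. The header is assumed to hold a field count and a null bitmap, the offset directory is assumed to hold 16-bit offsets, and `encode_record` and `decode_field` are hypothetical names. Nulls cost no data bytes, variable-length fields fall out of consecutive offsets, and a field may itself be an encoded record. The directory is also the per-record overhead listed as a drawback.

```python
import struct
from typing import Optional

HEADER = "<HI"  # hypothetical header: field count (uint16) + null bitmap (uint32)

def encode_record(fields: list[Optional[bytes]]) -> bytes:
    """Header | offset directory | data area (hypothetical layout)."""
    n = len(fields)
    if n > 32:
        raise ValueError("this sketch's 32-bit null bitmap supports at most 32 fields")
    null_bitmap = 0
    data = bytearray()
    offsets = []
    for i, value in enumerate(fields):
        offsets.append(len(data))      # where field i starts in the data area
        if value is None:
            null_bitmap |= 1 << i      # null: flagged in the header, no data bytes
        else:
            data += value              # variable-length value, stored as-is
    offsets.append(len(data))          # end sentinel, so field i spans offsets[i]..offsets[i+1]
    header = struct.pack(HEADER, n, null_bitmap)
    directory = struct.pack(f"<{n + 1}H", *offsets)  # 16-bit offsets: data area < 64 KiB
    return header + directory + bytes(data)

def decode_field(record: bytes, i: int) -> Optional[bytes]:
    n, null_bitmap = struct.unpack_from(HEADER, record, 0)
    if null_bitmap & (1 << i):
        return None
    header_size = struct.calcsize(HEADER)
    offsets = struct.unpack_from(f"<{n + 1}H", record, header_size)
    data_start = header_size + 2 * (n + 1)
    return record[data_start + offsets[i]: data_start + offsets[i + 1]]

inner = encode_record([b"x", b"yz"])             # a nested record...
rec = encode_record([b"alice", None, inner])     # ...stored as an ordinary field
print(decode_field(decode_field(rec, 2), 1))     # b'yz'
```

In this layout the directory adds 2 * (n + 1) bytes to every record. That fixed cost is traded for constant-time access to any field.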

Starburst – Data Management Extensions • Storage methods [associated with a relation] • Run-time methods for accessing relations: scan, fetch, insert, update, delete, destroy • Implementation: the run-time methods are registered in vector lists • Compile-time cost estimates • Attachments [associated with a relation] • Access methods, integrity constraints and trigger extensions
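The "vector list" dispatch can be sketched as a list of operation tables. A relation records which slot its storage method occupies, and a generic `invoke` routine dispatches through that slot. Attachments run after the storage method, in list order. All names here (`register_storage_method`, `Relation`, `invoke`, `HEAP`) are hypothetical, and the sketch only shows run-time methods, not compile-time cost estimates.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

StorageMethod = dict[str, Callable[..., Any]]   # op name -> run-time routine
STORAGE_METHODS: list[StorageMethod] = []       # the vector list (hypothetical)

def register_storage_method(ops: StorageMethod) -> int:
    STORAGE_METHODS.append(ops)
    return len(STORAGE_METHODS) - 1             # the method's id is its slot

@dataclass
class Relation:
    storage_id: int
    state: Any
    attachments: list[Callable[[str, tuple], Any]] = field(default_factory=list)

def invoke(rel: Relation, op: str, *args: Any) -> Any:
    # Generic dispatch: this code does not change when new methods are registered
    result = STORAGE_METHODS[rel.storage_id][op](rel.state, *args)
    if op in ("insert", "update", "delete"):
        for attachment in rel.attachments:      # called after the storage method,
            attachment(op, args)                # always in the same fixed order
    return result

# A hypothetical heap storage method: records appended to a Python list
def _heap_insert(state: list, record: dict) -> int:
    state.append(record)
    return len(state) - 1

def _heap_scan(state: list):
    return iter(state)

HEAP = register_storage_method({"insert": _heap_insert, "scan": _heap_scan})

# A hypothetical attachment acting as a simple index on field "k"
index: dict = {}
rel = Relation(HEAP, [], attachments=[lambda op, args: index.setdefault(args[0]["k"], args[0])])
invoke(rel, "insert", {"k": 1, "name": "a"})
```

Adding another storage method only appends a new entry to the vector, which mirrors the advantage listed on the next slide. The fixed iteration over `attachments` mirrors its limitation.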

Starburst – Data Management Extensions • Advantages • New storage methods and attachments can be added without modifying existing code • Limitations • Attachments are only called after storage methods • Attachments are called in a fixed order

Starburst – Query Processing Extensions • Internal representation of queries • Query graph model • Beyond parse trees for the low-level plan operators • Used for query rewrite • Query execution plan • Operator-based representation • Strategy alternative rules (STARs) to represent execution plans • Used for query plan generation

Query Graph Model • Boxes • Stored relations • Derived relations • Vertices • Setformer iterators: produce tuples for a derived relation • Quantifier iterators: restrict tuples for a derived relation • Edges • Range edges connecting a vertex and a box: access to a stored or a derived relation • Qualifier edges connecting one or more vertices: conjunction of predicates
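The boxes, vertices and edges above map naturally onto a few linked records. This is a sketch assuming Python dataclasses and predicates kept as plain strings. The class and field names are hypothetical, not Starburst's internal structures. The example encodes `SELECT e.name FROM emp e WHERE e.dept IN (SELECT d.id FROM dept d WHERE d.floor = 1)`.

```python
from dataclasses import dataclass, field

@dataclass
class Box:
    name: str
    stored: bool                                   # stored vs derived relation
    vertices: list["Vertex"] = field(default_factory=list)
    qualifiers: list["QualifierEdge"] = field(default_factory=list)

@dataclass
class Vertex:
    name: str
    kind: str              # "setformer" (produces tuples) or "quantifier" (restricts tuples)
    ranges_over: Box       # range edge: the stored or derived relation iterated over

@dataclass
class QualifierEdge:
    vertices: list[Vertex]  # one vertex: local predicate; several: join predicate
    predicate: str          # assumption: predicates kept as text in this sketch

def referenced_boxes(box: Box) -> list[str]:
    """Follow the range edges of every vertex in a box."""
    return [v.ranges_over.name for v in box.vertices]

emp = Box("emp", stored=True)
dept = Box("dept", stored=True)

# Derived relation for the subquery
d = Vertex("d", "setformer", dept)
sub = Box("sub", stored=False, vertices=[d],
          qualifiers=[QualifierEdge([d], "d.floor = 1")])

# Top-level derived relation: e produces tuples, s restricts them
e = Vertex("e", "setformer", emp)
s = Vertex("s", "quantifier", sub)
query = Box("query", stored=False, vertices=[e, s],
            qualifiers=[QualifierEdge([e, s], "e.dept = s.id")])

print(referenced_boxes(query))  # ['emp', 'sub']
```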

Query Rewrite • Objectives: • Equivalent representation for alternative phrasings of a query • Only the DBMS can rewrite queries involving views • Example rules: • Views may be merged • Redundant joins may be eliminated • Selections may be pushed down

Query Rewrite Rules • A rule transforms a QGM into another QGM • Condition/action (IF-THEN) rules • Rule engine • Forward chaining • Various control strategies for rule application • Search strategy • Top-down (depth-first or breadth-first) or bottom-up
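A forward-chaining rule engine can be sketched in a few lines. Each rule is a condition/action pair. The engine makes top-down, depth-first passes and fires the first applicable rule at each node, repeating until no rule fires. For brevity, nested tuples stand in for the QGM, and the two rules (view merging and selection pushdown) are hypothetical encodings of the examples on the previous slide, not Starburst's rule syntax.

```python
from typing import Callable

# Hypothetical stand-in for a QGM: nested tuples (op, arg, *children)
#   ("scan", table) | ("view", name, body) | ("select", pred, child) | ("join", None, left, right)
# A predicate is (frozenset of referenced tables, text).
Node = tuple
Rule = tuple[str, Callable[[Node], bool], Callable[[Node], Node]]

def tables(node: Node) -> frozenset:
    if node[0] == "scan":
        return frozenset({node[1]})
    return frozenset().union(*(tables(c) for c in node[2:]))

def can_push(n: Node) -> bool:
    # Condition: a selection sits on a join and only references one side of it
    if n[0] != "select" or n[2][0] != "join":
        return False
    refs, join = n[1][0], n[2]
    return refs <= tables(join[2]) or refs <= tables(join[3])

def push_select(n: Node) -> Node:
    # Action: move the selection below the join, onto the side it references
    pred, (_, _, left, right) = n[1], n[2]
    if pred[0] <= tables(left):
        return ("join", None, ("select", pred, left), right)
    return ("join", None, left, ("select", pred, right))

RULES: list[Rule] = [
    ("merge_view", lambda n: n[0] == "view", lambda n: n[2]),   # inline the view body
    ("push_select", can_push, push_select),
]

def rewrite_once(node: Node, rules: list[Rule]) -> tuple[Node, bool]:
    for _name, condition, action in rules:       # top-down: try the node first
        if condition(node):
            return action(node), True
    changed = False
    children = []
    for child in node[2:]:                       # then descend depth-first
        new_child, child_changed = rewrite_once(child, rules)
        children.append(new_child)
        changed = changed or child_changed
    return node[:2] + tuple(children), changed

def rewrite(node: Node, rules: list[Rule], max_passes: int = 100) -> Node:
    for _ in range(max_passes):                  # forward chaining up to a fixpoint
        node, changed = rewrite_once(node, rules)
        if not changed:
            break
    return node

pred = (frozenset({"emp"}), "emp.salary > 100")
plan = ("select", pred, ("join", None, ("view", "rich_depts", ("scan", "dept")), ("scan", "emp")))
print(rewrite(plan, RULES))
```

Listing the rules in a different order, or using breadth-first passes, would be one of the "various control strategies" the slide mentions.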

How to Choose Between Alternative Rules? • Cost based decision • Problem: cost estimates are only known at the query execution plan level • Approach: several alternatives are kept in the QGM – CHOOSE operation
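The CHOOSE idea is that the rewrite phase keeps both alternatives and defers the decision until plan costs exist. A toy sketch follows. The alternative labels and both cost formulas are hypothetical and chosen only to show that the winner depends on cardinalities known at the plan level.

```python
from dataclasses import dataclass

@dataclass
class Choose:
    """A QGM node that keeps alternative rewrites instead of committing to one."""
    alternatives: list[str]

def toy_cost(alternative: str, outer_rows: int, inner_rows: int) -> int:
    # Hypothetical cost formulas, for illustration only
    if alternative == "nested_subquery":
        return outer_rows * inner_rows     # subquery re-evaluated per outer tuple
    if alternative == "join":
        return outer_rows + inner_rows     # e.g. one pass of a hash join
    raise ValueError(f"unknown alternative: {alternative}")

def resolve(node: Choose, outer_rows: int, inner_rows: int) -> str:
    # The decision is made only once plan-level estimates are available
    return min(node.alternatives, key=lambda a: toy_cost(a, outer_rows, inner_rows))

node = Choose(["nested_subquery", "join"])
print(resolve(node, 1000, 1000))  # join
print(resolve(node, 1, 10))       # nested_subquery
```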

Query Execution Plan • Execution plan represented using production rules: • Terminals: low-level plan operators • In: 0 or more streams of tuples • Out: 0 or more streams of tuples • Each stream of tuples is tagged with properties • Relational: schema information • Operational: order, location • Estimated: • Non-terminals: STARs • Name • Alternative definitions in terms of low-level plan operators or other STARs

Query Execution Plan • A query execution plan is a tree of low-level plan operators • STAR production rules are used for generating query execution plans • General-purpose STAR evaluator • Search strategy to choose the next STAR to apply • Vector list of STARs
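STARs behave like grammar non-terminals with alternative definitions. A general-purpose evaluator expands every alternative of a named STAR into trees of low-level operators, and each tree carries operational and estimated properties. The sketch below is hedged throughout. The operator names (`SCAN`, `ISCAN`, `SORT`, `NLJOIN`, `MGJOIN`), the statistics in `ROWS` and `INDEXES`, and the cost formulas are all hypothetical. The evaluator also enumerates exhaustively rather than using Starburst's search strategy.

```python
import math
from dataclasses import dataclass
from typing import Callable, Optional

ROWS = {"emp": 1000, "dept": 10}      # hypothetical catalog statistics
INDEXES = {("dept", "dept_id")}       # hypothetical index on dept.dept_id

@dataclass(frozen=True)
class Plan:
    op: str                 # low-level plan operator (a terminal)
    inputs: tuple           # input streams: sub-plans, or a table name
    order: Optional[str]    # operational property: output sort order
    rows: int               # estimated property: cardinality
    cost: float             # estimated property: cost

STARS: dict[str, list[Callable[..., list[Plan]]]] = {}   # vector list of STARs

def star(name: str):
    """Register a function as one alternative definition of a STAR."""
    def register(alternative):
        STARS.setdefault(name, []).append(alternative)
        return alternative
    return register

def evaluate(name: str, *args) -> list[Plan]:
    """General-purpose evaluator: expand every alternative definition."""
    return [plan for alt in STARS[name] for plan in alt(*args)]

def require_order(plan: Plan, order: Optional[str]) -> Plan:
    # Add a SORT when a required order property is missing
    if order is None or plan.order == order:
        return plan
    sort_cost = plan.rows * math.log2(max(plan.rows, 2))
    return Plan("SORT", (plan,), order, plan.rows, plan.cost + sort_cost)

@star("Access")
def table_scan(table, order=None):
    return [require_order(Plan("SCAN", (table,), None, ROWS[table], ROWS[table]), order)]

@star("Access")
def index_scan(table, order=None):
    if (table, order) not in INDEXES:
        return []
    return [Plan("ISCAN", (table,), order, ROWS[table], 2.0 * ROWS[table])]

@star("Join")
def nested_loop(outer, inner, key):
    return [Plan("NLJOIN", (o, i), o.order, o.rows, o.cost + o.rows * i.cost)
            for o in evaluate("Access", outer) for i in evaluate("Access", inner)]

@star("Join")
def merge_join(outer, inner, key):
    return [Plan("MGJOIN", (o, i), key, o.rows, o.cost + i.cost)
            for o in evaluate("Access", outer, key) for i in evaluate("Access", inner, key)]

candidates = evaluate("Join", "emp", "dept", "dept_id")
best = min(candidates, key=lambda p: p.cost)
print(best.op, [p.op for p in best.inputs])  # MGJOIN ['SORT', 'ISCAN']
```

Adding a new join method would mean registering one more `@star("Join")` alternative. The evaluator itself would stay unchanged.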

Starburst Contributions • Revisited internal data structures • Query graph model • Query execution plan: low-level operators and STARs • Mechanisms for extensibility • Rules for query rewrite and plan optimization

Outline • Introduction • Starburst • Language extensions • Data management extensions • Query processing extensions • Predator • E-ADT processing • Summary

Basic Techniques for ADTs • Vector list of ADTs • Each ADT implements: • A common internal interface for access to ADT values • Functions for storage and indexed retrieval • Methods associated with the ADT • ADT methods can be composed • The DBMS understands minimal semantics of each method • “Black box” ADT approach
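The "black box" approach can be sketched as a registry of ADT objects that share one interface. The DBMS can store, retrieve and index values through that interface and call methods by name. It cannot reason about what those methods do, so it cannot reorder or combine them. The `ADT` interface, the `Point` type and `call_method` are hypothetical names for this sketch.

```python
import math
import struct
from abc import ABC, abstractmethod
from typing import Any, Callable

class ADT(ABC):
    """Common internal interface every registered ADT implements (hypothetical)."""
    name: str
    methods: dict[str, Callable[..., Any]]

    @abstractmethod
    def serialize(self, value: Any) -> bytes: ...       # storage

    @abstractmethod
    def deserialize(self, data: bytes) -> Any: ...      # retrieval

    @abstractmethod
    def index_key(self, value: Any) -> Any: ...         # indexed retrieval

class Point(ADT):
    name = "point"
    methods = {
        "translate": lambda p, dx, dy: (p[0] + dx, p[1] + dy),
        "norm": lambda p: math.hypot(p[0], p[1]),
    }

    def serialize(self, value):
        return struct.pack("<dd", *value)

    def deserialize(self, data):
        return struct.unpack("<dd", data)

    def index_key(self, value):
        return tuple(value)            # lexicographic (x, y) key

ADT_VECTOR: list[ADT] = [Point()]      # vector list: type id = position

def call_method(type_id: int, method: str, value: Any, *args: Any) -> Any:
    # Black box: the method is looked up by name; its semantics are opaque
    return ADT_VECTOR[type_id].methods[method](value, *args)

point = ADT_VECTOR[0]
stored = point.serialize((1.0, 2.0))
value = point.deserialize(stored)
print(call_method(0, "norm", call_method(0, "translate", value, 2.0, 2.0)))  # 5.0
```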

Motivation for E-ADTs • Basic observation: • ADT methods can be expensive! • Need to identify optimizations on ADT methods • Need to define a framework for applying these optimizations systematically

Possible Optimizations • Algorithmic: • Using different algorithms for each method depending on data characteristics • Transformational: • Changing the order of methods • Constraint: • Pushing physical constraints through a method • Pipelining: • Avoiding materialization of intermediate results
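Two of these categories are easy to show on a hypothetical integer-sequence type. The algorithmic case picks binary search or a linear scan depending on whether the data is known to be sorted. The pipelining case chains generator-based methods so that no intermediate sequence is materialized. The method names `contains`, `scale` and `shift` are assumptions for illustration.

```python
from bisect import bisect_left
from itertools import islice
from typing import Iterable, Iterator

# Algorithmic: choose the algorithm from a data characteristic
def contains(values: list[int], target: int, is_sorted: bool) -> bool:
    if is_sorted:
        i = bisect_left(values, target)             # O(log n) on sorted data
        return i < len(values) and values[i] == target
    return target in values                         # O(n) fallback

# Pipelining: each method consumes and yields one element at a time
def scale(xs: Iterable[int], k: int) -> Iterator[int]:
    for x in xs:
        yield x * k

def shift(xs: Iterable[int], d: int) -> Iterator[int]:
    for x in xs:
        yield x + d

# Only three elements are ever computed; no billion-element list is built
first_three = list(islice(shift(scale(range(10**9), 2), 1), 3))
print(first_three)  # [1, 3, 5]
```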

Architectural Framework Each E-ADT supports some of the following enhancements: • Optimization: transforms a method expression into a query execution plan expression • Evaluation: routines to execute the query execution plan expression • Catalog management: routines to store schema information and maintain statistics • Storage management: physical representation of values of its type
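A single E-ADT can own all four enhancements at once. The sketch below assumes a hypothetical `SequenceEADT` for integer sequences. It stores values in its own packed format, records a per-column statistic ("all values sorted"), uses that statistic to turn the method expression `contains(x)` into a plan expression, and evaluates the plan. None of these names come from Predator's API.

```python
import struct
from bisect import bisect_left
from dataclasses import dataclass, field

@dataclass
class SequenceEADT:
    """Hypothetical E-ADT for integer sequences owning all four enhancements."""
    stats: dict[str, dict] = field(default_factory=dict)   # catalog: per-column statistics

    # Storage management: physical representation of a value
    def store(self, values: list[int]) -> bytes:
        return struct.pack(f"<I{len(values)}q", len(values), *values)

    def load(self, data: bytes) -> list[int]:
        (n,) = struct.unpack_from("<I", data)
        return list(struct.unpack_from(f"<{n}q", data, 4))

    # Catalog management: maintain statistics for a stored column
    def analyze(self, column: str, stored: list[bytes]) -> None:
        seqs = [self.load(d) for d in stored]
        self.stats[column] = {"all_sorted": all(s == sorted(s) for s in seqs)}

    # Optimization: method expression -> query execution plan expression
    def optimize(self, column: str, method: str, arg: int) -> tuple:
        if method != "contains":
            raise NotImplementedError(method)
        sorted_ok = self.stats.get(column, {}).get("all_sorted", False)
        return ("binary_search" if sorted_ok else "linear_scan", arg)

    # Evaluation: execute the plan expression on one stored value
    def evaluate(self, plan: tuple, data: bytes) -> bool:
        op, arg = plan
        values = self.load(data)
        if op == "binary_search":
            i = bisect_left(values, arg)
            return i < len(values) and values[i] == arg
        return arg in values

eadt = SequenceEADT()
column = [eadt.store([1, 3, 5]), eadt.store([2, 4])]
eadt.analyze("readings", column)
plan = eadt.optimize("readings", "contains", 4)
print(plan, [eadt.evaluate(plan, d) for d in column])  # ('binary_search', 4) [False, True]
```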

E-ADT Rewrite Rules • Some of the optimizations for ADT methods can be applied on a logical representation of queries using rewrite rules
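Transformational and constraint-pushing optimizations can be written as rewrite rules over method expressions. This is a sketch assuming hypothetical sequence methods `shift`, `scale` and `first_n`, with expressions as nested tuples. One rule merges consecutive shifts. The other pushes the `first_n` constraint below an element-wise method, which is safe because element-wise methods commute with taking a prefix. A small evaluator checks that both forms give the same result.

```python
Expr = tuple   # ("col", name) | (method, argument_expr, parameter)
ELEMENTWISE = {"shift", "scale"}

def rewrite_node(e: Expr) -> Expr:
    if e[0] == "shift" and e[1][0] == "shift":        # shift(shift(s, a), b) -> shift(s, a + b)
        return ("shift", e[1][1], e[1][2] + e[2])
    if e[0] == "first_n" and e[1][0] in ELEMENTWISE:  # first_n(f(s), n) -> f(first_n(s, n))
        op, inner, param = e[1]
        return (op, ("first_n", inner, e[2]), param)
    return e

def rewrite(e: Expr) -> Expr:
    if e[0] == "col":
        return e
    e = (e[0], rewrite(e[1]), e[2])      # bottom-up: rewrite the argument first
    new = rewrite_node(e)
    return e if new == e else rewrite(new)

def evaluate(e: Expr, env: dict) -> list[int]:
    op = e[0]
    if op == "col":
        return env[e[1]]
    xs = evaluate(e[1], env)
    if op == "shift":
        return [x + e[2] for x in xs]
    if op == "scale":
        return [x * e[2] for x in xs]
    if op == "first_n":
        return xs[: e[2]]
    raise ValueError(op)

original = ("first_n", ("shift", ("shift", ("col", "s"), 1), 2), 3)
optimized = rewrite(original)
print(optimized)  # ('shift', ('first_n', ('col', 's'), 3), 3)
```

After rewriting, the shift touches 3 elements instead of the whole sequence, and it runs once instead of twice.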

Predator Contributions • Enhanced abstract data types • Encapsulation principle applied to storage, optimization and evaluation • Type-centric DBMS design
