Explore the latest database management system (DBMS) advancements tailored to support diverse applications and extend beyond typical data processing. Learn about new storage and access methods, query processing techniques, and language extensions. Discover how extensions like abstract data types, user-defined functions, and novel optimization methods can enhance DBMS capabilities and system performance.
Context Tailoring the DBMS • To support particular applications • Beyond alphanumerical data • Beyond retrieve + process • To support particular hardware • New storage devices • To incorporate novel techniques • New join implementations
Extensibility • Language extensions • Abstract data types (ADT) • User defined functions (UDF) • Data management extensions • New access methods • New storage methods • Query processing extensions • New join methods • New optimization techniques
Starburst Contributions • Revisited internal data structures • Query graph model • Query execution plan: low-level operators and stars • Mechanisms for extensibility • Rules for query rewrite and plan optimization
Predator Contributions • Enhanced abstract data types • Encapsulation principle applied to storage, optimization and evaluation • Type centric DBMS design
Outline • Introduction • Starburst • Language extensions • Data management extensions • Query processing extensions • Predator • E-ADT processing • Summary
Starburst - Language Extensions • User defined functions (1) • Scalar functions • In: one or more field values from a single tuple • Out: a single value • Aggregate functions • In: one or more field values from several tuples • Out: a single value
Starburst - Language Extensions • User defined functions (2) • Set predicate functions • In: a simple predicate and a subquery (defines the range for the predicate) • Out: a boolean value • Table functions • In: one or several table expressions as well as field values • Out: a relation
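The four kinds of user-defined function differ mainly in the shape of their inputs and outputs. The following is a minimal Python sketch of those shapes, not Starburst's actual interface: the function names, the dict-based tuples, and the sample data are all assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, float]  # assumed tuple representation: field name -> value


def scalar_margin(row: Row) -> float:
    """Scalar function: field values from a single tuple -> a single value."""
    return row["price"] - row["cost"]


def aggregate_total(rows: Iterable[Row], field: str) -> float:
    """Aggregate function: field values from several tuples -> a single value."""
    return sum(r[field] for r in rows)


def set_predicate_majority(pred: Callable[[Row], bool],
                           subquery: Iterable[Row]) -> bool:
    """Set predicate function: a simple predicate plus a subquery that
    defines its range -> a boolean (here: true for more than half the rows)."""
    rows = list(subquery)
    return 2 * sum(1 for r in rows if pred(r)) > len(rows)


def table_top_n(table: Iterable[Row], field: str, n: int) -> List[Row]:
    """Table function: table expression(s) plus field values -> a relation."""
    return sorted(table, key=lambda r: r[field], reverse=True)[:n]


items = [
    {"price": 10.0, "cost": 4.0},
    {"price": 8.0, "cost": 7.0},
    {"price": 5.0, "cost": 1.0},
]
```

Built-in quantifiers such as ANY and ALL have the same input and output shape as the set predicate function; `set_predicate_majority` is a hypothetical user-defined addition of that kind.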
Starburst – Language Extensions • Abstract data types • Considered useful for: • Type checking • Structuring of users' data • Add-on to the system design
Starburst – Data Management Extensions • Uniform record structure: • Header + offset directory + data area • Advantages: • Support for nested records • Treatment of null values and variable-length fields • Disadvantages: • Overhead per record due to the offset directory • Core system services • Logging, recovery manager, predicate evaluator, event queues, lock manager, interface to OS services, debugging, tracing, error reporting
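The header + offset directory + data area layout can be sketched in a few lines of Python. The concrete encoding below (a 2-byte field count as header, 2-byte offsets, and `0xFFFF` as the null marker) is an assumption chosen for illustration; the slides do not specify Starburst's actual byte layout.

```python
import struct
from typing import List, Optional

NULL = 0xFFFF  # assumed sentinel offset marking a null field


def encode_record(fields: List[Optional[bytes]]) -> bytes:
    """Pack fields as: header (field count) + offset directory + data area."""
    offsets, data = [], b""
    for value in fields:
        if value is None:
            offsets.append(NULL)       # null: directory entry only, no data
        else:
            offsets.append(len(data))  # where this field starts in the data area
            data += value
    offsets.append(len(data))          # end marker, gives the last field's length
    header = struct.pack("<H", len(fields))
    directory = struct.pack(f"<{len(offsets)}H", *offsets)
    return header + directory + data


def decode_record(record: bytes) -> List[Optional[bytes]]:
    """Inverse of encode_record."""
    (count,) = struct.unpack_from("<H", record, 0)
    offsets = struct.unpack_from(f"<{count + 1}H", record, 2)
    data = record[2 + 2 * (count + 1):]
    fields: List[Optional[bytes]] = []
    for i in range(count):
        if offsets[i] == NULL:
            fields.append(None)
            continue
        # A field ends where the next non-null entry (or the end marker) starts.
        end = next(o for o in offsets[i + 1:] if o != NULL)
        fields.append(data[offsets[i]:end])
    return fields


record = encode_record([b"ann", None, b"", b"42"])
```

In this sketch the four-field record carries 10 bytes of offset directory for 5 bytes of data, which is the per-record overhead the slide lists as a disadvantage. In exchange, null values and variable-length fields need no special cases in the data area.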
Starburst – Data Management Extensions • Storage methods [associated with a relation] • Run-time methods for accessing relations: scan, fetch, insert, update, delete, destroy • Implementation: the run-time methods are registered in vector lists • Compile-time cost estimates • Attachments [associated with a relation] • Access methods, integrity constraints and trigger extensions
Starburst – Data Management Extensions • Advantages • New storage methods and attachments can be added without modifying existing code • Limitations • Attachments are only called after storage methods • Attachments are called in a fixed order
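A rough Python sketch of the vector-list dispatch, under stated assumptions: the `Relation` class, the three routines, and the use of list positions as identifiers are hypothetical, and only the insert operation is shown. It is meant to show why new methods can be added without touching existing code, and where the two limitations come from.

```python
from typing import Callable, Dict, List


class Relation:
    """Hypothetical relation descriptor: it names its storage method and
    attachments by their position in the global vectors below."""

    def __init__(self, storage_id: int, attachment_ids: List[int]):
        self.storage_id = storage_id
        self.attachment_ids = attachment_ids
        self.rows: List[dict] = []
        self.index: Dict[object, dict] = {}


def heap_insert(rel: Relation, row: dict) -> None:
    rel.rows.append(row)               # storage method: append to a heap


def key_index_insert(rel: Relation, row: dict) -> None:
    rel.index[row["id"]] = row         # attachment: maintain an access method


def not_null_check(rel: Relation, row: dict) -> None:
    if row.get("name") is None:        # attachment: integrity constraint
        raise ValueError("name must not be null")


# Vector lists: adding a method means appending an entry, not editing callers.
STORAGE_INSERT: List[Callable[[Relation, dict], None]] = [heap_insert]
ATTACHMENT_INSERT: List[Callable[[Relation, dict], None]] = [
    key_index_insert,
    not_null_check,
]


def insert(rel: Relation, row: dict) -> None:
    STORAGE_INSERT[rel.storage_id](rel, row)   # storage method always runs first
    for attachment_id in rel.attachment_ids:   # then attachments, in fixed order
        ATTACHMENT_INSERT[attachment_id](rel, row)


emp = Relation(storage_id=0, attachment_ids=[0, 1])
insert(emp, {"id": 1, "name": "ann"})
```

Because the constraint check runs after `heap_insert`, a violating row is already stored when the check fails. This sketch does not undo it; a real system would have to roll the insert back through its logging and recovery services.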
Starburst – Query Processing Extensions Internal representation of queries • Query graph model • Beyond parse trees for the low-level plan operators • Used for query rewrite • Query execution plan • Operator based representation • Strategy alternative rules (stars) to represent execution plan • Used for query plan generation
Query Graph Model • Boxes • Stored relations • Derived relations • Vertices • Setformer iterators: produce tuples for a derived relation • Quantifier iterators: restrict tuples for a derived relation • Edges • Range edges connecting a vertex and a box: access to a stored or a derived relation • Qualifier edges connecting one or more vertices: conjunction of predicates
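The boxes, vertices and edges listed above map naturally onto a small set of record types. The sketch below is one possible in-memory encoding in Python, assuming hypothetical class and field names; it is not the data structure used inside Starburst.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Box:
    """A stored relation, or a derived relation defined by vertices and edges."""
    name: str
    stored: bool
    vertices: List["Vertex"] = field(default_factory=list)
    qualifiers: List["QualifierEdge"] = field(default_factory=list)


@dataclass
class Vertex:
    """An iterator inside a derived box; ranges_over is its range edge."""
    name: str
    kind: str          # "setformer" or "quantifier"
    ranges_over: Box


@dataclass
class QualifierEdge:
    """A predicate connecting one or more vertices of the same box."""
    vertices: Tuple[str, ...]
    predicate: str


def stored_inputs(box: Box) -> Set[str]:
    """Follow range edges down to the stored relations a box depends on."""
    if box.stored:
        return {box.name}
    names: Set[str] = set()
    for vertex in box.vertices:
        names |= stored_inputs(vertex.ranges_over)
    return names


# SELECT ... FROM emp e, dept d WHERE e.dno = d.dno AND e.sal > 100
emp = Box("emp", stored=True)
dept = Box("dept", stored=True)
query = Box("q", stored=False)
query.vertices = [Vertex("e", "setformer", emp), Vertex("d", "setformer", dept)]
query.qualifiers = [
    QualifierEdge(("e", "d"), "e.dno = d.dno"),
    QualifierEdge(("e",), "e.sal > 100"),
]
```

A view would appear as a further derived box that one of the vertices ranges over, which is what makes view merging expressible as a graph transformation.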
Query Rewrite • Objectives: • Equivalent representation for alternative phrasings of a query • Only the DBMS can rewrite queries involving views • Example rules: • Views may be merged • Redundant joins may be eliminated • Selections may be pushed down
Query Rewrite Rules • A rule transforms a QGM into another QGM • Condition / action: IF-THEN rules • Rule engine • Forward chaining • Various control strategies for rule application • Search strategy • Top down (depth first / breadth first) / bottom up
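Condition/action rules with forward chaining can be illustrated with a very small engine. The sketch below is an assumption-laden simplification: the query is reduced to a hypothetical `Select` node rather than a full QGM, the two rules are toy versions of view merging and redundancy elimination, and the control strategy is simply "first applicable rule wins".

```python
from typing import Callable, List, NamedTuple, Tuple, Union


class Select(NamedTuple):
    """Toy stand-in for a derived box: a source plus a conjunction of predicates."""
    source: Union["Select", str]   # stored relation name or another derived box
    predicates: Tuple[str, ...]


class Rule(NamedTuple):
    name: str
    condition: Callable[[Select], bool]   # IF part
    action: Callable[[Select], Select]    # THEN part


def merge_view(q: Select) -> Select:
    """Merge a select over a view into one select over the view's source."""
    return Select(q.source.source, q.source.predicates + q.predicates)


def drop_duplicates(q: Select) -> Select:
    """Remove repeated predicates, keeping first occurrences in order."""
    return q._replace(predicates=tuple(dict.fromkeys(q.predicates)))


RULES = [
    Rule("merge_view", lambda q: isinstance(q.source, Select), merge_view),
    Rule("drop_duplicate_predicates",
         lambda q: len(set(q.predicates)) < len(q.predicates),
         drop_duplicates),
]


def rewrite(q: Select, rules: List[Rule] = RULES) -> Tuple[Select, List[str]]:
    """Forward chaining: fire applicable rules until none applies."""
    fired: List[str] = []
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if rule.condition(q):
                q = rule.action(q)
                fired.append(rule.name)
                changed = True
                break              # restart from the first rule
    return q, fired


view = Select("emp", ("sal > 100",))
query = Select(view, ("sal > 100", "dno = 7"))
result, fired = rewrite(query)
```

Here merging the view exposes a duplicate predicate, so the second rule only becomes applicable after the first has fired. That dependency between rules is the reason the engine iterates to a fixpoint instead of applying each rule once.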
How to Choose Between Alternative Rules? • Cost based decision • Problem: cost estimates are only known at the query execution plan level • Approach: several alternatives are kept in the QGM – CHOOSE operation
Query Execution Plan Execution plan represented using production rules: • Terminals: low-level plan operators • In: 0 or more streams of tuples • Out: 0 or more streams of tuples • Each stream of tuples is tagged with properties • Relational: schema information • Operational: order, location • Estimated: • Non terminals: STAR • Name • Alternative definitions in terms of low-level plan operators or other STARs
Query Execution Plan • A query execution plan is a tree of low-level plan operators • STAR production rules are used for generating query execution plans • General purpose STAR evaluator • Search strategy to choose next STAR to apply • Vector list of stars
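Treating STARs as non-terminals and low-level plan operators as terminals makes plan generation look like expanding a grammar. The Python sketch below assumes a hypothetical set of STARs, operator names and per-operator costs. It enumerates every plan exhaustively and ignores stream properties such as order, which a real STAR evaluator would check before accepting an alternative.

```python
from itertools import product
from typing import Dict, List, Tuple

Plan = Tuple  # nested tuples: (operator, child_plan, ...)

# Each STAR maps to alternative definitions: an operator followed by the
# STARs that produce its inputs. All names here are illustrative assumptions.
STARS: Dict[str, List[Tuple[str, ...]]] = {
    "AccessPlan": [("SCAN",), ("INDEX_SCAN",)],
    "SortedAccess": [("SORT", "AccessPlan"), ("INDEX_SCAN",)],
    "JoinPlan": [
        ("NL_JOIN", "AccessPlan", "AccessPlan"),
        ("MERGE_JOIN", "SortedAccess", "SortedAccess"),
    ],
}

# Assumed fixed cost per operator, purely for the example.
COST = {"SCAN": 10, "INDEX_SCAN": 4, "SORT": 8, "NL_JOIN": 50, "MERGE_JOIN": 5}


def expand(star: str) -> List[Plan]:
    """Return every operator tree derivable from the given STAR."""
    plans: List[Plan] = []
    for operator, *inputs in STARS[star]:
        # Combine each choice of plan for each input STAR.
        for children in product(*(expand(s) for s in inputs)):
            plans.append((operator, *children))
    return plans


def cost(plan: Plan) -> int:
    """Sum the operator costs over the whole plan tree."""
    operator, *children = plan
    return COST[operator] + sum(cost(child) for child in children)


candidates = expand("JoinPlan")
best = min(candidates, key=cost)
```

Exhaustive expansion grows quickly with the number of alternatives, which is one reason the search strategy for choosing the next STAR to apply matters.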
Starburst Contributions • Revisited internal data structures • Query graph model • Query execution plan: low-level operators and STARs • Mechanisms for extensibility • Rules for query rewrite and plan optimization
Outline • Introduction • Starburst • Language extensions • Data management extensions • Query processing extensions • Predator • E-ADT processing • Summary
Basic Techniques for ADTs • Vector list of ADTs • Each ADT implements: • Common internal interface for access to ADT values • Functions for storage and indexed retrieval • Methods associated with the ADT • ADT methods can be composed • DBMS understands minimal semantics about each method • "Black box" ADT approach
Motivation for E-ADTs • Basic observation: • ADT Methods can be expensive! • Need to identify optimizations on ADT methods • Need to define a framework for applying these optimizations systematically
Possible Optimizations • Algorithmic: • Using different algorithms for each method depending on data characteristics • Transformational: • Changing the order of methods • Constraint: • Pushing physical constraints through a method • Pipelining: • Avoiding materialization of intermediate results
Architectural Framework Each E-ADT supports some of the following enhancements: • Optimization: transforms a method expression into a query execution plan expression • Evaluation: routines to execute the query execution plan expression • Catalog management: routines to store schema information and maintain statistics • Storage management: physical representation of values of its type
E-ADT Rewrite Rules • Some of the optimizations for ADT methods can be applied on a logical representation of queries using rewrite rules
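A transformational rewrite on a method expression can be sketched as follows. Everything in this Python example is an assumption made for illustration: the `ImageEADT` class, the 1-D list standing in for an image, and the `brighten` and `clip` methods. `brighten` is deliberately pointwise so that swapping it with `clip` provably gives the same result; a neighborhood filter would need more care at the clip boundary.

```python
from typing import List, Tuple

Call = Tuple[str, tuple]  # (method name, arguments)


class ImageEADT:
    """Hypothetical E-ADT for 1-D 'images' stored as lists of pixel values."""

    def __init__(self) -> None:
        self.pixels_touched = 0  # crude work counter for the comparison below

    def optimize(self, calls: List[Call]) -> List[Call]:
        """Transformational rule, single pass: run clip before a pointwise
        method so the expensive method sees fewer pixels."""
        calls = list(calls)
        for i in range(len(calls) - 1):
            if calls[i][0] == "brighten" and calls[i + 1][0] == "clip":
                calls[i], calls[i + 1] = calls[i + 1], calls[i]
        return calls

    def evaluate(self, image: List[int], calls: List[Call]) -> List[int]:
        """Execute the method expression left to right."""
        for name, args in calls:
            self.pixels_touched += len(image)
            if name == "brighten":
                image = [p + args[0] for p in image]
            elif name == "clip":
                image = image[args[0]:args[1]]
            else:
                raise ValueError(f"unknown method {name!r}")
        return image


image = list(range(100))
expression = [("brighten", (10,)), ("clip", (0, 5))]

black_box = ImageEADT()
plain_result = black_box.evaluate(image, expression)

enhanced = ImageEADT()
optimized_result = enhanced.evaluate(image, enhanced.optimize(expression))
```

A black-box ADT cannot make this swap because the DBMS does not know that the two methods commute. In the E-ADT approach that knowledge lives inside the type, next to its evaluation routines.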
Predator Contributions • Enhanced abstract data types • Encapsulation principle applied to storage, optimization and evaluation • Type centric DBMS design