Inclusion of New Types in Relational Database Systems

Inclusion of New Types in Relational Database Systems Michael Stonebraker

Why ORDBMS? • Allows the addition of complex data and the use of a query language to access this data (e.g. insurance) • Conveniently supports new applications (e.g. Internet, photography)

Complete Extended Type System • Allow the definition of user-defined data types (e.g. 2-D boxes) • Allow the definition of new operators for these data types (e.g. overlaps, contained in) • Allow the implementation of new access methods for data types (e.g. R-trees) • Allow optimized query processing for commands containing new data types and operators

Motivating Example Consider a relation consisting of data on two dimensional boxes. It can be represented by an identifier and the coordinates of two corner points as follows: create box (id = i4, x1 = f8, x2 = f8, y1 = f8, y2 = f8) Now consider the query to find all the boxes that overlap the unit square, i.e. the box with coordinates (0, 1, 0, 1). A representation of this request in QUEL is as follows: retrieve (box.all) where not (box.x2 <= 0 or box.x1 >= 1 or box.y2 <= 0 or box.y1 >= 1)

Problems • The command is too hard to understand. • The command is too slow because the query planner will not be able to optimize something this complex. • The command is too slow because there are too many clauses to check.

Solution Support a box data type so that the box relation and the resulting user query can be defined as follows: create box (id = i4, desc = box) retrieve (box.all) where box.desc !! “0, 1, 0, 1” Here “!!” is an overlaps operator with two operands of data type box which returns a boolean.

Intuition • New user-defined type and operator make resulting query more readable • New user-defined access methods can allow query planner to optimize query

Consequences • Need ability to define operators for user defined types • Require support for fast access paths for queries - Extend current access methods (e.g. B-trees for boxes using ascending area) - Define new access methods (e.g. R-trees for boxes using contained in) • Require support for the query optimizer to construct an efficient plan

Overview • What will we discuss: • ADTs • New Access Methods • Query Processing and Access Path Selection

Examples of Operators for Box Data Type

Definition of New Types The new type can be implemented as follows: define type-name length = value, input = file-name, output = file-name

Definition of New Operators Zero or more operators can be implemented for the new type as follows: define operator token = value,left-operand = type-name,right-operand = type-name,result = type-name,precedence-level like operator-2,file = file-name

Example define operator token = !!, left-operand = box, right-operand = box, result = boolean, precedence like *, file = /usr/foobar

Comments on the Prototype • Problem: ADT routines are a security loophole • Possible solutions: - Run in separate address space - Interpret ADT procedure - Use hardware support for protected procedure calls • Author’s solution: Provide two environments for ADT procedures - Protected environment for debugging - Unprotected one for performance

Registering New Access Methods • Basic idea: - Access methods contain a small number of procedures that define its characteristics - Replace these by others which operate on a different data type

Example • Consider a B-tree and the following generic query: retrieve (target-list) where relation.key OPR value - Supports fast access if OPR is in {=, <, <=, >=, >} - Includes procedure calls to support these operators for a particular data type • We just have to write procedures for our new operators which must have properties P1, …, P7 and the B-tree will function correctly!

Example (cont’d) • Appropriate information recorded on two access method templates: - TEMPLATE-1 describes conditions which must be true for the operators provided by the access method (only used by humans) - TEMPLATE-2 provides necessary information on the data types of operators • In AM relation, designer can implement one or more collections of operators which satisfy the template

F1 = (value – low-key) / (high-key – low-key) F2 = (high-key – value) / (high-key – low-key)

Example (cont’d) • User can modify relations to B-tree using any class of operators defined in AM relation as follows: modify box to B-tree on desc using area-op • Secondary index can also be constructed as follows index on box is box-index (desc) using area-op

Implementing New Access Methods • Collection of procedure calls that retrieve and update records. • Need to construct open, close, get-first, get-next, get-unique, insert, delete, replace, and build • Open and close are usually universally usable and designer only needs to construct the remaining procedures • Replace and delete do not require modification if the same physical page layout as some existing access method is used

Implementation Problems • Interface to transaction management code • Concurrency control subsystem issues • Interface to buffer manager • Only briefly discussed in the paper

Query Processing And Access Path Selection • Require four pieces of information when defining an operator to allow optimization: - Selectivity factor, Stups, estimates the expected number of record satisfying the clause: where rel-name.field-name OPR value - Selectivity factor, S, is the expected number of records which satisfy the clause: where relname-1.field-1 OPR relname-2.field-2 - Feasibility of merge-sort - Feasibility of hash-join

Example define operator token = AE, left-operand = box, right-operand = box, result = boolean, precedence like *, file = /usr/foobar, Stups = 1, S = min (N1, N2), merge-sort with AL, hash-join

Generating Query Processing Plan • Assumptions: - Relations stored keyed on one field in a single file - Secondary indexes can exist for other fields • Queries involving a single relation can be processed as follows: - Scan of the relation - Scan of a portion of the primary index - Scan of a portion of a secondary index

Generating Query Processing Plan (cont’d) • Joins can be processed as follows: - Iterative substitution - Merge-sort - Hash-join • Modify standard query planner to compute best plan using appropriate rules to generate legal plans and the selectivities provided

Summary • Main contributions of paper: - Shows how to adapt existing access methods for new data types - Explains how to code new access methods - Demonstrates how to support automatic generation of optimized query plans

The Postgres Next-Generation Database Management System Michael Stonebraker Greg Kemnitz

Applications of DBMS • Data management (traditional) • Object management (new) • Knowledge management (new) • An example which requires services in all three dimensions is an application that stores and manipulates text and graphics to facilitate the layout of newspaper copy.

Postgres Data Model and Query Language • Orientation toward database access from a query language - Emphasis on query language, optimizer, and run-time system • Orientation toward multilingual access - No programming language-specific tight integration • Small number of concepts - Classes, inheritance, types, and functions

Classes (Constructed Types, Relations) • Named collection of instances (records, tuples) of objects - Each instance has same collection of named attributes - Each attribute is a specific type - Each instance has a unique (never-changing) identifier (OID) • Can be created as follows: create EMP (name = c12, salary = float, age = int) • Can inherit data elements from other classes: create SALESMAN (quota = float) inherits EMP • POSTGRES allows a class to inherit from an arbitrary collection of other parent classes (multiple inheritance)

Classes (cont’d) • Three kinds of classes in POSTGRES: real classes, derived classes, and versions. - A real (or base) classes’ instances are stored in the database. - A derived (or view or virtual class) classes’ instances are not physically stored but are materialized only when necessary. - A version of another class is stored as a differential relative to its parent class.

Types • Three kinds of types in POSTGRES: base types, arrays of base types, and composite types. • Base types include hard-wired types (e.g. integers, floats, character strings) and constructed ADTs - Can assign values to attributes of base types in POSTQUEL by either specifying a constant or a function which returns the correct type • Arrays of base types are supported using standard bracket notation and we could define a class as follows: create EMP (name = c12, salary = float[12], age = int) retrieve (EMP.name) where EMP.salary[4] = 4000

Types (cont’d) • Composite types allow a user to construct complex objects, that is, attributes which contain other instances as part or all of their value. - Complex objects have a hierarchical internal structure - Zero or more instances of any class is automatically a composite type. For example: create EMP (name = c12, salary = float[12], age = int, manager = EMP, coworkers = EMP) • Note, each time a class is constructed, a type is automatically available to hold a collection of instances of the class. • POSTGRES also supports a final constructed type, set, whose value is a collection of instances from all classes. For example, hobbies information can be added to the EMP class as follows: add to EMP (hobbies = set)

Types (cont’d) • Path expressions: - Elements of an attribute that are a composite type can be hierarchically addressed by nested dot notation. For example, one could write: retrieve (EMP.manager.age) where EMP.name = “Joe” • Composite types can have a value that is a function which returns the correct type. For example: replace EMP (hobbies = compute-hobbies(“Jones”)) where EMP.name = “Jones”

Functions • Three different kinds of functions in POSTGRES: - C functions - Operators - POSTQUEL functions

C Functions • To be able to perform complex calculations on objects, POSTGRES supports C functions. • Can define an arbitrary number of C functions whose arguments are base or composite types • Can have an argument which is a class name. For example: retrieve (EMP.name) where overpaid (EMP) - Inherited down the class hierarchy in the standard way - Can be considered as a new attribute for the class whose type is the return type of the function. For example: retrieve (EMP.name) where EMP.overpaid • Queries with C functions in the qualification cannot be optimized by the POSTGRES query optimizer. For example, the preceding query will result in a sequential scan of all instances of the class.

Operators • To be able to use indexes in processing queries, POSTGRES supports operators. • Operators are functions with one or two operands which use the standard operator notation in the query language. For example: retrieve (DEPT.dname) where DEPT.floorspace AGT “(0,0), (1,1), (0,2)” • Only available for operands which are base types - Access methods support fast access to specific fields in records - Unclear what an access method should do for a constructed type

Operators (cont’d) • To assist the query optimizer, hints such as the negator of an operator can be included in the definition of an operator. • For example, the following query cannot be optimized, but it can be written as the previous query which can be: retrieve (DEPT.dname) where not DEPT.floorspace ALE “(0,0), (1,1), (0,2)” • Information on available access paths is stored in the POSTGRES system catalogs.

POSTQUEL Functions • Any collection of commands in the POSTQUEL query language can be packaged together and defined as a function. For example: define function high-pay returns EMP as retrieve (EMP.all) where EMP.salary > 50000 • POSTQUEL functions can also have parameters. For example: define function sal-lookup (c12) returns float as retrieve (EMP.salary) where EMP.name = $1 • Can be placed in a query or directly executed using the fast path facility

POSTQUEL Functions (cont’d) • Attributes of a composite type automatically have values which are functions that return the correct type. For example, consider the following function and command: define function mgr-lookup (c12) returns EMP as retrieve (EMP.all) where EMP.name = DEPT.manager and DEPT.name = $1 append to EMP (name = “Sam”, salary = 1000, age = 40, manager = mgr-lookup(“shoe”)) • Like C functions, POSTQUEL functions can have a specific class as an argument and can either be thought of as functions or as new attributes.

POSTGRES Query Language • We already saw: User-defined functions and operators, arrays, path expressions • Support for nested queries • Transitive closure • Support for inheritance • Support for time travel

Nested Queries • POSTQUEL allows queries to be nested and has operators that have sets of instances as operands. For example: retrieve (DEPT.dname)where DEPT.floor NOT-IN {D.floor from D in DEPT where D.dname != DEPT.dname}

Transitive Closure • Allows a user to explode an ancestor hierarchy. For example, consider the class parent (older, younger) and the following query: retrieve* into answer (parent.older) from a in answer where parent.younger = “John” or parent.younger = a.older - * after retrieve indicates that associated query should be run until the answer fails to grow - * can also be used to indicate that a query should be run over a class and all classes under it in the inheritance hierarchy. For example: retrieve (E.name) from E in EMP* where E.age > 40

Time Travel • Allows a user to run historical queries. For example (T is a time): retrieve (EMP.salary) from EMP [T] where EMP.name = “Sam” - POSTGRES will find the version of Sam’s record valid at the correct time and get the appropriate salary

Fast Path • Reason for fast path: Application may require direct access to user-defined or internal POSTGRES function. • POSTQUEL has been extended with: function-name (param-list) • User can execute any function known to POSTGRES. (e.g. parser, optimizer, executor, access methods, buffer manager, utility routines) • Validity of parameters not checked • Allows user program to call a function in another address space rather than its own

Rule System • Reasons for rule system: Users require support for views, triggers, integrity constraints, referential integrity, protection, and version control. • POSTGRES rule system is a general-purpose rules system that can perform all of these functions.

Rule System (cont’d) • Rules have the form: ON event (TO) object WHERE POSTQUEL-qualification THEN DO [instead] POSTQUEL-command(s) - events: retrieve, replace, delete, append, new (replace or append), or old (delete or replace) - objects: name of a class or class.column - POSTQUEL-commands: set of POSTQUEL commands with the following two changes: - new or current can appear instead of the name of a class in front of any attribute - refuse (target-list) is added as a new POSTQUEL command

Versions • Innovative application of rule system • Goal of versions: Create a hypothetical version of a class with the following properties: - Initially, the hypothetical class has all the instances of the base class - The hypothetical class can be freely updated to diverge from the base class - Updates to the hypothetical class do not cause physical modifications to the base class - Updates to the base class are visible in the hypothetical class, unless the instance updated has been deleted or modified in the hypothetical class

Inclusion of New Types in Relational Database Systems

Inclusion of New Types in Relational Database Systems

Presentation Transcript

Introduction to Relational Database Systems

Comparison of Relational database management systems

Object / Object-Relational Database Management Systems

Relational Database

Database Systems I Relational Algebra

Database Systems I Relational Algebra

Relational Database

Relational Database

Inclusion of new media types in INIS/ETDE databases

05027240 Database Systems Relational Normalization Theory

Relational Database

Understand Relational Database Management Systems

Integrating XQuery and Relational Database Systems

Relational Database Systems

MD807: Relational Database Management Systems

Relational Database Systems

Basic Definitions Relational Database Concepts Relational Database Management Systems

Relational Database Systems

Object-relational database systems

Database Systems The Relational Database Model