Developing Object-Relational Database Applications

Developing Object-Relational Database Applications Paul BrownChief Plumber INFORMIX Software

What We Will Talk About • Plenty of Focus on Features & Functionality • Question is, “How to use it all?” Object-Relational technology makes a fresh look at the process of database analysis and design useful. • What is different, what is the same? • Many things are the same . . . • . . . some things are new. Developers using an ORDBMS benefit by taking a more holistic approach to development than was common with SQL-92 DBMS products.

Overview of Tutorial • Design Objectives of Databases • How are OR Database Similar/Different? • What about the theory? • Methodology Description • Steps of a development procedure • Analysis and design • Implementation advice • Examples as we go through • Architecture Advice • Back-end and middle-ware • Component-centric development

Design Objectives Meet the Needs of End Users • Complete • The Database needs to model everything that is relevant to the problem domain (PD). • Correct • The Database needs to be an accurate model of the PD. • Consistent • The database model should not contain redundant data. • Flexible • The database should adapt to changing requirements. • Efficient • Fast, scaleable and simple to administer.

Object-Relational Data Model • Based on Relational Model of Codd [1971] • Relation as storage abstraction and organizing principle • Declarative programming as interface model • An Object-Relational DBMS Provides • Physical abstraction (no byte level details for OR-SQL) • Formal modeling techniques (normalization) 20 years of R&D have addressed most performance, reliability and scalability issues.

Theoretic Overview ( 1/2 ) • Domains ( kinds or classes of data objects) • { a, b, c . . . } • Attributes(named columns of a domain) • { A, B, C . . . } • Dom ( A )  a • Relation Schema (set of attributes) • { . . . X, Y, Z } where X  {A, B, C . . } • Relation (set of n-ary tuples over a relation schema) • { . . P, Q, R . . } For P over X, P   Dom ( Xi ) • We also say t: X  Dom ( X )

Theoretic Overview ( 2/ 2 ) Operations to Manipulate Relations • Projection (select sub-set of columns from relation) • Given P over X, and Y X,  < P, Y >  Q over Y s.t.  t  P,  t’  Q where t ( A ) = t’ ( A )  A  Y. • Restriction (select sub-set of tuples from relation ) • Given P over X, and  ( X ),  < P,  >  Q over X s.t.  t  Q,  (t)  true. • Join ( cross-product of two relations ) • Given P over X, and Q over Y, < P, Q >  R over ( X  Y ) s.t.  t  R,  t’  P and t’’  Q where t [ X ] = t’, and t [ Y ] = t’’. • Plus (Aggregate, Intersection, Division, Union . .)

Enough Greek! What does it look like in practice? • Declarative language programming • Dynamic, run-time interpretation CREATE TABLE Employees ( Id Employee_Number PRIMARY KEY, Name PersonName NOT NULL, Date_of_Birth DATE NOT NULL, Address MailAddress NOT NULL, LivesAt st_point NOT NULL, Resume Document NOT NULL, Voice_Key Audio_Recording NOT NULL, Holidays SET ( Period NOT NULL ) ); How do you get to here, starting with a bunch of users, a small herd of programmers, and a budget.

Development Road-map 1. Analyze the Problem Domain • Document the high level structure (E-ER) • Describe each kind of data (UML Class Diagrams) 2. Design and Implementation • Use User-Defined Types (UDT) and User-Defined Functions (UDF) for Objects • Combine Objects into Schema Tables/Views • Queries to map to user views 3. Verify Each Step • Prototype UDT/UDFs • Normalize • Test performance against production scale loads

Outline of Procedure: 3 Tier Model . CREATE OPAQUE TYPE PersonName ( internallength = variable, maxlen = 64 ); CREATE FUNCTION getFamilyName ( . . CREATE TABLE Employees ( Id Employee_Id PRIMARY KEY, Dept Dept_Id REFERENCES Dept ( Id ), Name PersonName NOT NULL, DOB DATE NOT NULL . ); . . 3. Physical 1. Conceptual 2. Logical

Phase 1: Conceptual Modeling • Describe User Views of Problem Domain • Pictures instead of words • Multiple Semantic Model Possibilities • Extended Entity-Relationship (E-ER) Modeling • Object-Role Modeling (ORM or NIAM) • Universal Modeling Language (UML) • Use E-ER in this Tutorial • Familiar to most developers and analysts • Concepts introduced here apply everywhere

E-ER Diagrams for Dummies Entity: Principle data objects in the problem domain. Typically identified with a noun term; Branch, Employee, etc Branch has_a Relationship: Association between Entities. Typically identified with a verb term; Manufacturers, WorksFor, etc. has_a relationships can have additional attributes. Branch Product Employees Branch Full_Time is_a Relationship: Inheritance relationship between Entities. Production Sales

1.1 Describe High-Level E-ER worked for Branch Employees manufactures When Contractors Full_Time Customer Production Product Sales sold Qty

Employee Products Customer Name Kind Name Kind Name Kind Id Employ_Num Id Cust_Num Id UPC_Code Name Person_Name Contact Person_Name Name LabelString Address Mail_Addr Address Deliver_Addr Dimensions Physical_Size Resume Document Bill_Address Mail_Addr Capacity Mass Lives_At Geo_Point Available_In SET{ Color } Date_of_Birth DATE Price Currency 1.2 Describe Each Entity For each entity, name its attributes, and decide what kind of data is in each of them.

E-ER Modeling Review • This is Really Nothing New • Better modeling of is_a relationships (inheritance) • What Other Tasks? • Note Logical Keys in Entities • Record arity of has_a relationships (1:N, N:M, etc) • Record mandatory and optional relationships • What is New? Resist the temptation to analyze down to SQL-92 types. Within E-ER modeling, maintain a degree of abstraction.

Handling Multiple Conceptual Views • Multiple Semantic Model Possibilities • Extended Entity-Relationship Modeling • Object-Role Modeling • Universal Modeling Language • What about their Object Model? User Object Models often contain elements of interest to OR database developers. A UDT defined in the database need not be used in a table.

The ‘Object’ Concept “The most brutalized term in computer science” • What is an ‘Object’? • Atomic unit of meaning encapsulating both state and behavior. • Objects mostly map to domains in Relational Theory • Why is the Concept Useful? • Intuitive way of describing ‘things’ in the data model • These intuitions can drive user-defined type and user-defined function design.

1.3 Catalog the Kinds of Data Create a list of all of the different kinds of data identified in the schema and in the object models. Some of these will be in more than one entity.

Mail_Address Delivery_Address + Address_Line_One String + Address_Line_One String + Address_Line_Two String + Address_Line_Two String + City String + City String + State String + State String + ZipCode Zip_Code + ZipCode Zip_Code + Country + Country String String + Delivery_Notes Document + Mail_Address(String, String, String, Zip_Code) + Delivery_Address(String, String, String, Zip_Code, Document) + approximateMatches () + Label () + approximateMatches () + Equal () + Label () - Compare() + Equal () - Compare() 1.4 Use UML Class Diagrams

Operators Extract Data Modifiers Constructors Class Diagram Details Kind of Data in Element State Elements (Data Structure) GeoPoint - Longitude FLOAT Interface Methods FLOAT - Latitude - Ellipsoid Ellispoid_Enum + GeoPoint ( String ) -> GeoPoint, + GeoPoint( FLOAT, FLOAT) -> GeoPoint, + Latitude ( GeoPoint ) -> FLOAT, + Longitude ( GeoPoint ) -> FLOAT + Distance ( GeoPoint, GeoPoint ) -> FLOAT + Quadrant ( GeoPoint ) -> CHAR(2) + SetLatitude( FLOAT ) -> GeoPoint + SetLongitude ( FLOAT ) -> GeoPoint [ Spatial Operators, R-Tree Support ] [ Compare(), Equal, NotEqual ] Why These?

Object Classes are used in OR-SQL queries SELECT DISTINCT, UNION are SQL operations Merge-Join and Hash-Join for internal efficiencies Implementation Tip # 1 Always create a Compare() for any object that has an Equal(). The resulting order might be meaningless, but within the context of the query processor it can make sense.

1.5 Minimize Set of Object Classes • Use Detailed Specifications to Identify • synonyms and • antonyms • Identify ‘like’ Object Classes • Inheritance allows re-use • { Mail_Address, Delivery_Address } • Describe Algorithms for all Behaviors • What does it mean to have ‘Equal’ UPC? • Space efficient internal representation

Document Common Queries and Processes Helps to flesh out Object Class specifications Useful for identifying additional kinds of data Useful for later functionality/scalability testing Birthday INTEGER + Month INTEGER + Day - From_Leap_Year boolean + Birthday( DATE ) -> Birthday , + Print( Birthday ) -> String, + Equal( Birthday,Birthday)-> boolean . etc . Workload and Common Queries SELECT E.Name FROM Employees E WHERE Birthday(E.Date_of_Birth) = Birthday(TODAY); Note: Types don’t have to be used to define tables. They can be used to extend SQL.

By this Point you Out to Have: • An E-ER Model Representing High Level View • List of entities, and their structure • Understanding of relationships among entities • List of rules-- keys, constraints etc • Class Diagrams of each Distinct Kind of Data • Described in terms of Structure, and Behavior • Analysis of algorithms in methods Next Step: Transform this conceptual model into a sound OR database design and implementation.

Phase 2: Database Implementation • UDTs and UDFs for each Domain • Use DataBlade extension libraries • Develop business objects from scratch • Build Relational Schema • Naïve translation of E-ER conceptual model • Normalize the initial model • Some special considerations • Performance and Scalability Testing • Create volume data quickly • Workload test harness

2.1 Implement Domains/Kinds of Data • Several Mechanisms to Choose From • Do I buy a DataBlade™ product? • Do I build objects from scratch? • What about the client side? • If Build, Which Technique? • Built-in type, ROW TYPE, DISTINCT TYPE, OPAQUE TYPE • Each mechanism has different properties • How to Implement the UDFs? • SPL, Java or ‘C’?

Implementation Options

DISTINCT TYPES Re-Use Pre-Existing Types Create EXPLICIT Cast between Parent and DISTINCT Overload that CAST to Enforce Data Integrity Implementation Tip # 2 CREATE DISTINCT TYPE Quantity AS INTEGER; -- CREATE FUNCTION INTEGER2Quantity ( Arg1 INTEGER ) RETURNS Quantity IF ( Arg1 < 0 ) THEN RAISE EXCEPTION –746,0, “Error: Quantity must be non-negative”; END IF; RETURN Arg1::Quantity; END FUNCTION; -- DROP CAST ( INTEGER AS Quantity ); CREATE IMPLICIT CAST ( INTEGER AS Quantity WITH INTEGER2Quantity );

UDF Implementation Performance • Choice of Three Languages • Stored Procedure Language, Java and ‘C’ • Experiment: Pop( INTEGER ) and IsPrime ( INTEGER )

Avoid SQL Callbacks Inside User-Defined Functions UDFs with SQL are variant and can’t be parallelized SQL inside a UDF kind of violates the philosophy Implementation Tip # 3 CREATE OPAQUE TYPE Mass ( internallength = 16, alignment = 8 ); -- -- The code implementing this external function includes all -- of the conversion rates compiled into it. This permits it -- to perform conversion operations hyper-fast. -- CREATE FUNCTION Compare (Mass ,Mass) RETURNS INTEGER WITH ( NOT VARIANT, PARALLELIZABLE ) EXTERNAL NAME ”$INFORMIXDIR/extend/Massbin/Mass.bld(MassCompare)" LANGUAGE C;

2.2 Create the Schema Tables • Combine the Types into Schema Tables • Simply CREATE TABLE if it is not in hierarchy • Create Hierarchy of ROW TYPE instances if it is • For simple has_a relationships, extend the dependent table with additional column for the first table’s primary key column. • For multi-variant has_a relationships, create a table with one column per key, and extra columns Similar procedure to the one used transforming ER to RDBMS, except in the handling of Inheritance.

A Note on Keys • The Relational Key Concept is Important • Identifies correctness constraints within schema • Drives normalization, which helps ensure consistency • Formerly, a Key is defined as: A column where no two values are equal.

Example Schema (1/3) -- CREATE TABLE Products ( Id Product_Num NOT NULL PRIMARY KEY, Name LabelString NOT NULL, Dimensions Physical_Size NOT NULL, Capacity Mass NOT NULL, Available_In SET{Color NOT NULL}, Price Currency NOT NULL ); -- CREATE TABLE Customers ( Id Customer_Num NOT NULL PRIMARY KEY, Contact Person_Name NOT NULL, Address Mail_Address NOT NULL, Delivery Delivery_Address );

Example Schema (2/3) -- CREATE ROW TYPE Sales_Employee_Type ( Base_Salary Currency NOT NULL, Sales_YTD Currency NOT NULL ) UNDER Full_Time_Employee_Type; -- CREATE TABLE Sales_Reps USING Sales_Employee_Type UNDER Full_Timers; -- CREATE TABLE Product_Sales_to_Customers ( Customer Customer_Num NOT NULL REFERENCES Customers ( Id ), Product Product_Num NOT NULL REFERENCES Products ( Id ), Sales_Rep Employee_Num NOT NULL REFERENCES Sales_Reps ( Id ), Qty_Sold Quantity NOT NULL );

Mass -Unit { Ounce, LB, Gram, KG . .} -Quantity float + Mass( String ) -> Mass ,+ Print( Mass ) -> String, + Equal( Mass, Mass ) -> boolean, + LessThan( Mass, Mass ) -> boolean, +LessThanOrEqual( Mass, Mass ) ->boolean, +GreaterThan( Mass, Mass ) ->boolean, + GreaterThanOrEqual( Mass, Mass ) -> boolean, +NotEqual( Mass, Mass ) -> boolean, - Compare( Mass, Mass ) -> INTEGER, +Plus( Mass, Mass ) ->Mass, +Minus( Mass, Mass ) ->Mass, +Divide ( Mass, INTEGER ) ->Mass, +Times ( Mass, INTEGER ) -> Mass Example Schema (3/3) -- -- Query against the Product.Capacity column showing -- behavior of the Mass object class invoked. -- SELECT P.Id, P.Capacity FROM Products P WHERE P.Capacity > ’3 KG’ ORDER BY P.Capacity ASC;

2.3 Normalization • Normalization Concept & Procedure Same • Key, Whole Key, Nothing but the Key • Keys can be User-defined Types (of course) • New “corner cases”: spatial key (topography), temporal key • What about COLLECTION Attributes? • Good for domains of limited cardinality • { Work Days, Sizes_Available, Other_Phone_Nums } • Bigger sets, bigger domains, less useful • No Indexing! • Nested Tables are Right Out. COLLECTION works well for UDF varargs.

2.4 Load and Test (and test) • Can Use ORDBMS to Generate Data INSERT INTO Products ( Name, Dimensions, Capacity, Available_In, Price ) SELECT MakeString( Random(1000), 24 ), Random_Dimensions(), Random_Mass(), Random_Color_Set(), Random_Currency() FROM TABLE(SET{0,1,2,3,4,5,6,7,8,9}) N1 ( Num ), TABLE(SET{0,1,2,3,4,5,6,7,8,9}) N2 ( Num ), TABLE(SET{0,1,2,3,4,5,6,7,8,9}) N3 ( Num ); • Physical Tuning • IDS.2000 Little Different from 7.X • Remember to partition SBLOB data, as well as table data (text indexing particularly!)

Organizing Development Teams • Developers Writing Extensions Get own Instance • Writing UDFs means debugging, which will halt the IDS instance. • Testing with the DBMS • Development Staging Process • Individual Instances, to Test Environment • only released to Production when happy. • Looks like Mainframe development. • Prototyping an Annealing • SPL pass 1, Java or ‘C’ for what’s slow

JDBC, ODBC, ESQL/C, RMI, Foo-ML, etc Java, C, etc JDBC, ODBC, ESQL/C, RMI Java or SPL UDF ‘C’, SPL, Java Application Process Application Process SQL SQL Types in Schema Built-in, DISTINCT, or OPAQUE UDF ‘C’, SPL, Java Architectural View JDBC (if Java)

Err..What was that? The Following Principles Seem to Work: • No SQL Inside a UDT Method • Objects are orthogonal to schema structure • Do not put SQL inside ‘C’ User-defined Routines • No performance advantage • Makes development hard • UI calls App Logic calls SQL Still Applies • Impedance mis-match between SQL DBMS and languages like Java are minimized.

User-defined Functions Work in SPL Any ‘C’ or Java (or SPL) UDF can be called as subroutine Most useful with iterator functions Implementation Tip # 4 CREATE FUNCTION GetFirstName ( Arg1 LVARCHAR ) RETURNING lvarchar; DEFINE lIntVals LIST(lvarchar NOT NULL); DEFINE lvRetVal lvarchar; LET lIntVals = Split(Arg1, ' ' ); FOREACH cursor1 FOR SELECT * INTO lvRetVal FROM TABLE(lIntVals) EXIT FOREACH; END FOREACH; RETURN lvRetVal; END FUNCTION; -- EXECUTE FUNCTION GetFirstName (“John Fitzgerald Kennedy”) John

On the Client Side • 4GL is Perfectly Reasonable for ORDBMS • OR-SQL queries can take & return strings & numbers, and can invoke stored procedures. • ‘C’, C++ • ESQL/C, ODBC, CLI are workable • DataBlade™ products ship with client libraries and header files • Java • JDBC 2.0 is majestic mojo • Type <-> Class mapping can make the interaction seamless

‘Organic’ Applications Another way to look at Database Development • What is an ORDBMS? • An ORDBMS is a Software Back-Plane . . . • supporting a declarative/query-centric interface. • How do you use it? • Combine objects to solve problems. • No more system-wide upgrades. • Component-centric development.

Summary and Conclusions • ORDBMS Development Is • Best approached in a ‘holistic’ fashion. • Consider the design of at high-level (relational) • and low-level (objects and components) • More Implementation Tips http://www.informix.com/idn http://www.iiug.org/software http://examples.informix.com/

References and Further Reading INFORMIX Tech Notes: Q1, Q2 of 2000 contains two long papers describing this in more detail. Roy, Jacques Server-Side Programming in C: INFORMIX Dynamic Server 2000 INFORMIX Press. 1999 And lastly! Brown, Paul G. Developing Object-Relational Database ApplicationsINFORMIX Press. Dec, 2000.

Developing Object-Relational Database Applications

Developing Object-Relational Database Applications

Presentation Transcript

Developing MySQL Database Applications

Object Oriented vs Relational Database

Object / Object-Relational Database Management Systems

Object-Relational Databases

Relational Database

Object relational database managmement systems (ORDBMS)

Relational Database

Object Relational Databases

Relational Database

Relational Database

Database Design for Object-Relational Systems

Object Oriented vs Relational Database

Object Relational Database Comparison

Object Relational Databases

Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

Pertemuan 25 Object Relational Database Management System

Object-relational database systems

Object-relational databases

Advanced Database Systems Module 6 Object Oriented and Object Relational

Object-Relational Database Extensions and JDBC