Database Design &Implementation

Database Design &Implementation • First priority of database management: Never lose the content of an operational database. • This ability to automatically preserve data off-line is called persistence. • Hybrid Object-Relational DataBases (ORDB’s) are one way to solve the persistence problem when implementing object-oriented applications with persistent data content. • Understanding ORDBs requires more details about database architecture (next slide).

Hybrid Object-Relational DataBases (ORDB’s) • ORDB’s can preserve network-structured data content by what is called ‘pointer swizzling’: Every address pointer in a memory-resident structured data object is mapped to/from an object key or unique identifier in external RDB tables. This must identify the table/class type and the row/object number within the table. • The COOL framework supports ORDB management with separate components for data and method access: • GEN generates C/C++ code for persistent I/O and navigation through data structures in the ORDB. This code depends only on the database model or schema, not on the application code. • LCP captures event-mediated flow of control information in a state model database. Application programmers write the State Actions

LCP (Life Cycle Prototype) Interpreter • LCP controls application behavior by capturing event-mediated flow-of-control information in a state model database. It dispatches events to object instances by ‘delegating’ execution to a prototype ‘Active Class’. This is one way of implementing inheritance. • LCP supports method access inside prototype object instances by (compile-time) mapping of named state actions to function pointers for each Active Class. Developers write the application-specific code for the State Actions.

LCP Interpreter Performance LCP performance depends on two kinds of overhead: • Control-flow interpretation from the State Model database: this overhead increases as state actions become more fine-grained. • Event-dispatching overhead: this is inherent in distributed and multi-processor (non-shared-memory) applications, but not in shared-memory multi-threaded ones.

Relational Databases • An RBD ‘instance’ (a snapshot of its value at time t) is a set of ‘tuples’; each tuple contains a sequence of scalar attribute values for its coordinates or fields. • One tuple is stored externally as a record in a ‘flat’ file, and internally as a sequence of data members for a single object. • Any homogeneous set of tuples can be viewed conceptually as a table with rows and columns, with one column per field or data member and one row per tuple or object instance. • ‘Tuples generalize Vectors in the sense that coordinates of the tuple space can be different types, while vectors have homogeneous type for their coordinates. • A table of tuples has an abstract geometrical interpretation as a set of points that ranges over a multi-typed coordinate space.

Relational Databases • An RBD is a set of relations, each of which is a set of ‘tuples’. • Each tuple represents an object with scalar attributes. • Each relation may be stored externally as a set of similarly structured records in a ‘flat’ file. • A single relation or file can be viewed conceptually as a table of rows. • The tuples of one relation define a geometrical set of points in a multi-typed coordinate space. Tuples generalize Vectors in the sense that coordinates of the tuple space can be different types, while vectors have one homogeneous type for all of their coordinates.

Database Normalzation • Complex structured data types (and object instances) are decomposed or ‘normalized’ into simple parts of ‘Second Normal Form’ (2NF): (no composite=structured attributes or repeating groups of values are allowed). • For maintenance and reliability reasons, the design is further normalized (3NF): (There are no redundant or indirectly computable field values and all properties are stored in only one place.) • [Refs: Sanders Ch. 3 & Appndx A; Kroenke Chapter 2 ]

Composite pkeys in RDB’s • Every tuple must have a unique field (or set of fields) called its primary key (pkey) which uniquely identifies it. • A composite pkey for a child or component tuple is often built by concatenating multiple key fields from a chain of ancestors. This complicates pkey-to-fkey matching. • Example: Dept--->Course--->Section (ERD on next slide) • CS has Dept# = 91 and OOAD has course# = 91.523. • Many courses have the same section # 201, so 201 is only a unique identifier within sections of a particular course, just as 523 is only a unique course# within a particular Department. • In my syllabus the course id 05f523 assumes only one Dept (91) and one Section (291), but it combines a semester id (05f) for the term=Fall 2005 with the course number 523 so I can partition directories for this course over multiple terms.

Composite Pkeys (2 Examples) Examples: SWEngI: (Schedule id vs. my directory id A unique pkey for my section of SWEngI is a composite of Dept, Course and Section number: My directory id for SWEngI 91.523 in fall 2005 associates 05f with 523: RJL Course pkey: 523 Term pkey: 05f CS Dept. pkey: 91 SWEngCourse pkey: 91+523 Directory (one) Pkey: 01f523 (These are ’instance diagrams’, not ERD’s. They show field values for a single table row; an ERD only shows entity and field types. Section (one of many) pkey: 91+523+291

Surrogate Keys in RDB’s • The composite key for my section of OOAD in the Student Information System (SIS) Database is a composite of Dept, Course and Section number: 91.523.291. • For IBM’s RDB, EFCodd advocated a hidden ‘surrogate’ pkey to replace the user-defined composite keys. This improves code quality and performance (by expediting the fundamental RDB operation ‘join’: match pkeys to fkeys). • [Ref: EFCodd: Extending the RDB Model…, ACM TODS Dec’79] Example: Entity with old and new key name and value: • Entity: alternate (old pkey): surrogate (name = value): • DEpt deptNo = 91 DEid = DE0000xx • COurse courseNo = 91+523 COid = CO000yyy • SEction sectNo= 91+523+291 SEid = SE000zzz

RDB with Surrogate pfkeys: • A GEN Example: CS Dept View of SIS Database ERD: • Entity: alternate (old pkey): surrogate (name = value): • Dept deptNo = 91 DEid = DE000001 • Course courseNo = 91+523 COid = CO000220 • Section sectNo= 91+523+291 SEid = SE002601* (Note that the fkey only references the immediate ancestor or container of an object or tuple.) Department DE DE000001 91 Course CO CO000220 DE000001 523 (*Sections of ALL Courses are in table SE) Section SE* SE002601 CO000220 291 A Persistence Requirement(WHY?): Each table has a mnemonic abbreviation (DE,CO,SE) encoded into the pkey value of its objects.

Surrogate Keys in COOL/GEN • GEN uses surrogate pkeys and matching fkeys, but does not hide them. (OK for CAD/CASE tools with hi-tech users.)* • Pkeys can never be re-used for new objects, as long as fkeys exist that can reference their former object (in old but still-in-use database versions). ____________ * CJDate: The RDB Model (A-W 2001) faults EFCodd for recommending that users should not see surrogate pfkey values.

Persistent Object Identifiers • C++ and Java runtime objects have an object-id (oid), typically represented by its virtual memory address on the heap. This oid corresponds at least conceptually to the pkey of an RDB tuple. This type of oid is not visible and not persistent, because it disappears when the program terminates. • Large RDBMS’s can avoid loss of information and achieve persistence by taking over or duplicating OS virtual memory management functions: moving large segments of virtual memory to/from mass storage in a fail-safe manner, saving relative addresses on hard disk. • Another way to achieve persistence is to convert pkey/fkey relationships to/from object references during database import/export. This is done by COOL/GEN.

Persistent Databases • Persistence means that pkeys and fkeys are preserved during export to mass storage or remote sites and re-import by the same or another DataBase Management System (DBMS) • A relational database (RBD) supports inter-object relationships by foreign key (fkey) fields. These are both user-visible and persistent: they get saved in mass storage if the program terminates. • The process of mapping RDB pkey-fkey associations to and from C++ pointers is called ‘pointer swizzling’. import Database in Main Memory Database in Mass Storage export

Referential Integrity • The principle of ’Referential Integrity’: • To maintain valid database content, all fkey values must match the unique primary key or object identifier of another tuple, or else have the reserved ‘null’ (unknown or undefined) value. • N-ary relations (N-way associations) can be implemented by a new associative entity, whose tuples contain exactly N fkeys (plus optional non-key attributes). • Most relations are binary (N = 2). Note that fkeys may refer to the same or different types. • See Example 2, Composite Pkeys (slide 9) and next slide 16

N-ary Relation (ERD Styles) • N-ary relations are many-to-many associations among N object instances (of the same or different types). • N-way associations can be implemented by introducing a new associative entity, whose tuples contain exactly N fkeys (plus optional non-key attributes). (Most relations are binary: N = 2). • The diamond indicates a ternary relation among types AA, BB and CC. [It is superfluous if N=2, if the relation is one to many, or if an associative entity replaces it.] Example for N=3: BB AA BB CC AA CC (3 fkeys inside) AABBCC Optional attributes New Entity AABBCC gives these attributes a decent home, and replaces the diamond.

Extended ER Diagrams • When an RDB implements an Extended ERD (EERD), a tuple’s fkeys or inter-object cross-references can identify either a super-class object or an associated parent or container object (instance of a class). • These two fkey types have different semantic interpretations. • The principle of ‘Referential Integrity’ constrains fkey values to the pkey value of some existing parent (AA or BB) instance. • To improve readability, EERD’s use different link layouts for inheritance and for instance-level associations: In this example, CC both inherits from AA and is a component of the composite entity BB. It contains two fkeys, (say) AAid and BBid. AA BB 1..1 0..* CC

Multiple Inheritance on EERD’s • Multiple inheritance requires an fkey to each superclass object whose properties (attributes or methods) are inherited. • In a prototype or delegation implementation of multiple inheritance, superclass object[s] actually exist apart from their corresponding subclass object[s]. Each sub-object has one fkey to each of its direct ancestor object types. • For a C or C++, an object with multiple inheritance paths to the same superclass type must be constrained to inherit data from the same instance as well. (Path divergence is risky!) • In an ORDB, fkeys also support dynamic method delegation. The COOL/LCP interpreter implements such a dynamic map (from a concrete object to its generic or prototype Active Instance, from object class to generic Active Class).

ORDB via Prototype Delegation • An Extended ERD (EERD) can be implemented as either a relational RDB, object-oriented OODB, or object-relational ORDB. An OODB is supported by its own class-based data representations.* • An ORDB can be class-based or prototype-based with delegation. (GEN is prototype-based.)** • Prototype delegation does not rely on Class membership for method inheritance - it creates object-level relationships to support method delegation: ANY client object can ‘delegate’ any of its behavior to another server object via the oid equivalent of an fkey. * CJDate insists that OODBs are fully supported by the relational model without any extensions. (I agree :-) [Ibid] ** Breugg:OOSE (10.4.4) calls this a vertical mapping.

GEN Database: Persistence Our GEN tool imports an external RDB to a memory-resident object-relational database (ORDB): • Its external persistent RDB format is a union of records representing tuples of different types. • During import, fkeys are augmented or replaced by parent and first-child and next-sibling object reference pointers, which follow strict GEN naming conventions. • During export, pointers are discarded; fkeys are preserved or restored for persistent storage in external RDB tuples.

GEN Database: Schema Constraints • The external RDB schema (or EER Diagram) is first converted to Third Normal Form. • Other attributes that would normally comprise a user-defined (and typically composite) primary key can be removed during schema or EERD conversion to Third Normal Form. • This eliminates redundant attributes that functionally depend on some fkey instead of the pkey attribute. • The corresponding EERD network tends to have links which are all 1 to many relations with mandatory parents and optional children (1..1 to 0..* on UML class diagrams). • Inheritance links are 1..1 to 0..1 on class diagrams, because at most one subclass instance can inherit from any given ancestor class instance.

GEN Database: External Format • Our GEN tool imports an external RDB to a memory-resident object-relational database (ORDB): • Its external RDB format is a union of records representing tuples of different types: • Every tuple record has an integral and immutable ‘surrogate’ primary key attribute (and object id). • Different tuple types have pairwise disjoint pkey ranges. • All foreign keys (fkeys) use this surrogate pkey value to refer to their parent (container or superclass) record type. • Chgen and gencpp are external-database-compatible.

GEN Database: Internal Format • During import, fkeys are augmented or replaced by direct parent object pointers plus first-child and next-sibling object reference pointers. These are constructed from fkey names following strict GEN naming conventions. • This results in an internal ORDB format which is a set of multiply-threaded linked lists of parent-to-children and super-to-subclass object (tuple instance) reference pointers. • Parent-pointers support direct access to parent table attributes, replacing pair-wise join queries in an RDB. • For each 1-to-many parent-child relationship, chgen provides a child_loop macro while gencpp provides a for-each iterator. • Gencpp has two optional modes: chgen or C++/STL.

GEN Database: Import/Export GEN creates two schema-based import/export utilities: • pr_load parses tuples and imports an external RDB into a memory-resident object-relational database (ORDB); • pr_dump exports the modified ORDB back to the persistent external RDB. • During import, fkeys are augmented or replaced by direct parent pointers plus first-child and next-sibling object reference pointers. These are constructed from fkey names following strict GEN naming conventions, Super- and sub-class objects are also connected in the same way. • This results in an internal ORDB format which is a set of multiply-threaded linked lists from each parent through each of its child-sets, that supports parent-child JOINs. [Ref: Jiri Soukup: Taming C++, A-W 1994; see also $PH/JiriSoukup]

Importing RDB’s to C++/Java • If the RDB is imported to an object-relational database implemented in C++ or Java, then during import the fkey fields of RDB tuple types should be converted to corresponding C++/Java object reference types. • Caveat/pre-condition: All fkeys implied by links on the RDB’s data model or EERD must conform to inheritance and type constraints of the language (C++ or Java). • Fkeys in an RDB can also support non-exhaustive or over-lapping subclasses (going beyond C++ constraints). • Fkeys and object references can also support dynamic migration (of an object among the subclasses of its class). • Example: An object may make transitions among OLC states (states become subclasses of the object’s class).

Object-Relational Databases - Prototypes and Delegation The last few slides were inspired by Shlaer-Mellor-User Group email related to Divergent Inheritance (parallel hierarchies). This motivates the use of prototypes and delegation to explain the static information architecture that is supported by COOL’s chGEN/GENcpp code generator, and illustrates concurrent sub-state machine models for dynamic behavior. • To: shlaer-mellor-users@projtech.com • Subject: Re: (SMU) Polymorphic events and other paranormal activity • Message 10/734 From lschneid@eng.delcoelect.com • Sep 04, 01 08:45:33 AM • responding to Fontana: . . .

Divergent Hierarchies • responding to Fontana: • > I think Jay was driving at divergent hierarchies, not multiple inheritance, eg: • > relationship S1 - supertype Dog, subtypes BigDog and SmallDog • > relationship S2 - supertype Dog, subtypes BlackDog and WhiteDog DOG CLASS Relationship S2: (BIG xor SMALL) (Mutex and exhaustive): Relationship S1: (BLACK xor WHITE) (Mutex and exhaustive): Black Dog Small Dog Big Dog White Dog Divergent Hierarchies Example: > relationship S1 - supertype Dog, subtypes BigDog xor SmallDog > relationship S2 - supertype Dog, subtypes BlackDog xor WhiteDog

OLC’s with Concurrent Sub-states (Illustrates direct product composition of concurrent STDs) • > Assume each of the 4 subclasses has its own ‘object lifecycle’ (OLC): • > BigDog: Woofing <--> Sleeping • > SmallDog: Yipping <--> Skittering • > BlackDog: Panting <--> Drooling • > WhiteDog: Shedding <--> Scratching • > Now create one instance of Dog - let's say it is a big black dog, with a • > dogId = 13. It must be in one of the BigDog states (Woofing or Sleep-> ing), • > AND in one of the BlackDog states (Panting or Drooling). Big: Dog #13 (Big and Black): Woofing Black: Panting Drooling Sleeping

Merging OLC Behaviors of Concurrent Subclasses: • Each of the 4 subclasses has its own ‘object lifecycle’ (OLC); • E.g. every Big&Black Dog must be in one of the BigDog states (Woofing or Sleeping), AND in one of the BlackDog states (Panting or Drooling). • Dog #13 (Big and Black) has the behavior/activity of both BigDogs and BlackDogs: Black Dog OLC: Panting Drooling Big Dog OLC: Woofing Woof& Pant Woof& Drool Sleep &Pant Sleep& Drool Sleeping

Divergent Hierarchies - revisited (1) DOG CLASS Partition S1: (BLACK xor WHITE) (Mutex and exhaustive): Partition S2: (BIG xor SMALL) (Mutex and exhaustive): • C++ does not support divergent class hierarchies. • One alternate is prototype objects with delegation. • RDB’s can support prototypes and delegation: • In our example, each dog object belongs to one subclass for color, and simultaneously to another subclass for size. • That is, a ‘real’ dog object simultaneously belongs to, and inherits from, exactly one of the subclasses in each inheritance tree above. • The next slide shows (by its messiness) that multiple inheritance is best avoided. Black Dog Small Dog Big Dog White Dog

Divergent Hierarchies - revisited (2) DOG CLASS Partition S2: (BIG xor SMALL) (Mutex and exhaustive): Partition S1: (BLACK xor WHITE) (Mutex and exhaustive): • Level 3 models concrete ‘leaf’ objects or ‘real’ dogs, which simultaneously belong to a distinct pair of subclasses at level 2 of the inheritance tree (compositional inheritance of properties). • So there are really 4 leaf classes at level 3, below level 2 above. • Each leaf class instance at level 3 has exactly two paths up to level 1; both paths must end up at the same root object (Dog instance). Black Dog Small Dog Big Dog White Dog Big Black Dog Big White Dog Small Black Dog Small White Dog

Composition or Implementation Inheritance • With compositional inheritance, dogs will inherit from two ‘component’ classes: Color and Size. • This is ‘impure’ multiple inheritance in C++ ( impure because the two ancestor classes have nothing in common with animals, which may not behave well as clients of Color or Size ancestor methods). • Java does not have multiple inheritance - but any class may ‘implement’ the interfaces Color’ and ‘Size’ instead. • Dogs must then be eligible to inherit (C++) or implement (Java) all the methods of the Color and Size classes - an undesirable compromise. Over-riding only hides the mis-match between class Dog and Color or Size classes.

References • Frank & Ulrich: ”Delegation: An Important Concept for the Appropriate Design of Object Models”, JOOP June 2000 (p.13-17,44) • Eliens: Principles of OO Software Dev. 2ed., AWL 2000 (Sect. 5.4: Prototypes - delegation vs. inheritance) • Kilov/Ross: Information Models, PH 1994 (Not about delegation, but covers multiple/concurrent/overlapping subclass membership.) • Lee &Tepfenhart: UML and C++: A Practical Guide to OO Dev, 2ed, PH 2001(pp206-210) (Multiple Inheritance examples Fig. 12-4,12-5) • Sanders: Data Modeling, Boyd-Fraser/ITP 1995 (Ch. 3 and Appendix) • Jiri Soukup: Taming C++, A-W 1994; see also $PH/JiriSoukup • CJDate: The RDB Model, A-W 2001 • CJDate: Database Systems, 7ed! A-W 2000 • EFCodd: Extending the RDB Model…, ACM TODS Dec’79 • Kroenke: Database Concepts, 2nd Ed, P-H 2005 Chapter 2 • Breugge & Dutoit: OO Software Engineering, A-W 2004.

Database Design &Implementation