Information Management CSC824 Part 2 Nick Rossiter b.n.rossiter@ncl.ac.uk
Interoperability 1 • Interoperability: the ability to request and receive services between various systems and use their functionality. • More than data exchange. • Implies a close integration
Interoperability 2 • Features: • exchange of messages and requests • use of each other’s functionality • client-server abilities • distribution • operate multiple systems as single unit • communication despite incompatibilities • extensibility and evolution
Motivations 1 • Diversity of modelling techniques • Distributed businesses may exercise local autonomy in platforms • Data warehousing requires heterogeneous systems to be connected • Data mining enables new rules to be derived from heterogeneous collections
Motivations 2 • Pervasive Computing: networks supporting many diverse nodes, driven by users specifying policy and function. • Policy: statements governing how a solution will be achieved; the statements are derived from requirements. • Function: mechanism for achieving objectives. • Mobile Computing: wireless networks
Motivations 3 • E-Science • Distribution of functionality transparently across many different platforms • Grid • Layered components available in network: • Computational, information and knowledge layers and probably more.
Basic Definitions 1 • Distribution: information bases are stored on multiple computer systems interconnected by a communication medium. • Homogeneous system: one that adheres to the same software at all sites. • Heterogeneous system: one that does not adhere to the same software at all sites.
Basic Definitions 2 • Autonomy: the ability of a site to control its own activities with respect to one or more of: • design • communication • execution • association
Basic Definitions 3 • Model: a representation of policies in a structured form according to some perceived view of reality e.g. • Relational model – world is tabular • Hierarchical model – world is tree-like • Security model – world is task-based • Object model – world is based on o-o paradigm
Basic Definitions 4 • Mechanism – how a particular model or policy is to be realised (implemented). The design of a system. • Implementation – the coding and compilation of a system. • ‘Instantiation’: • populating a system with data. • executing a program.
Semantic Problems in Interoperability 1 • People call different properties by different names • People classify properties differently: • Different contexts e.g. colour is a property when describing a car but a table in its own right for the paint shop. • Different normalization priorities e.g. many tables optimised for updates versus few tables optimised for searching.
Semantic Problems in Interoperability 2 • People make use of facilities in different ways. For instance: • In SQL-92, uniqueness in tables can be achieved by: • Defining keys • Modifying the table storage method on various properties • Defining a unique index • So many legacy problems
Further Legacy Problems • May ostensibly have systems with relational model, but may vary between: • SQL-89, SQL-92, SQL-1999. • Foreign key -- Primary Key for association: • 1st class definition only in SQL-92, SQL-1999 • Inheritance -- UDT: • 1st class definition only in SQL-1999
Constraints and Types • May differ between systems: • e.g. student ids may be held as: • integers (leading zeros removed) 65275 • integers (padded out with so many leading zeros) 0065275 • strings (fixed length) ‘0065275’ • Ids may have checksum function or not
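The three id representations above can be reconciled by converting each to one canonical form before comparing. The following is a minimal sketch only: the function names and the fixed width of 7 (taken from the example value '0065275') are assumptions, and checksum handling is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// ASSUMPTION: the canonical form is a zero-padded string of width 7,
// matching the example value '0065275' on the slide.
constexpr std::size_t kIdWidth = 7;

// Id held as an integer (leading zeros lost) -> canonical padded string.
std::string id_from_int(unsigned long id) {
    const std::string digits = std::to_string(id);
    assert(digits.size() <= kIdWidth && "id does not fit the assumed width");
    return std::string(kIdWidth - digits.size(), '0') + digits;
}

// Id held as a string (padded or not) -> canonical padded string.
// std::stoul parses base 10 by default, so leading zeros are ignored.
std::string id_from_string(const std::string& id) {
    return id_from_int(std::stoul(id));
}
```

With this, the integer 65275 and the strings '65275' and '0065275' all map to the same value, so ids from different systems can be compared directly.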
Semantic Problems in Interoperability 3 • Structural problems are bad enough. But also: • Functionality can be applied in many different ways: • Procedures or functions; • different module layout. • Rules can be in: • Model structures, model coding, procedures or application programs.
Relational Model Definitions
Relational Table Definitions
Format of command (upper-case entered literally, lower-case to be substituted by user, [..] indicates optional) is:
    CREATE TABLE r (
        a1 type [nn],
        a2 type [nn],
        ...,
        an type [nn],
        PRIMARY KEY (ak, al, ...),
        {[FOREIGN KEY (af, ag, ...) REFERENCES rx, ...]}
    )
where
    r is table (relation) name
    a1 ... an are attribute names
    type is one of {INT, REAL, MONEY, DATE, CHAR(p)} (plus a few others in some systems)
    n is degree of table
    p is length (fixed) of character field
    nn = 'NOT NULL'
KEYS give uniqueness and reference points.
Relational Definition
    CREATE TABLE EXAM (
        Module_no  char(6),
        Student_id char(10),
        Date_Exam  date,
        Mark       int,
        PRIMARY KEY (module_no, student_id, date_exam),
        FOREIGN KEY (module_no) REFERENCES Modules,
        FOREIGN KEY (student_id) REFERENCES Students
    )
Abstractions • Attribute is a property (classification abstraction) • Table name is aggregation (of properties) • Foreign Key is an association (relationship)
Typing • Of attributes by simple means (integer, float, …) • Of primary key attributes by uniqueness (can only be picked once from domain) • Of foreign key attributes by occurrence in another table as cross-reference
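The two key-based typing rules can be illustrated outside SQL. The sketch below is an assumption-laden model of the EXAM table, not how any DBMS implements keys: the struct and member names are hypothetical, dates are kept as strings, and the Mark attribute is omitted for brevity.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <tuple>

// Primary key of EXAM: (module_no, student_id, date_exam).
using ExamKey = std::tuple<std::string, std::string, std::string>;

struct ExamTable {
    const std::set<std::string>& modules;   // stands in for the Modules table
    const std::set<std::string>& students;  // stands in for the Students table
    std::set<ExamKey> keys;                 // primary key values already used

    // Returns true only if the row satisfies both typing rules and is stored.
    bool insert(const std::string& module_no, const std::string& student_id,
                const std::string& date_exam) {
        // Foreign keys: each value must occur in the referenced table.
        if (modules.count(module_no) == 0 || students.count(student_id) == 0) {
            return false;
        }
        // Primary key: a value can be picked only once from its domain.
        return keys.emplace(module_no, student_id, date_exam).second;
    }
};
```

A second insert with the same key triple is rejected, as is any row whose module or student does not appear in the referenced set.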
Simple Problem in Interoperability 1 • Two schemas in SQL-1999
    Schema A                      Schema B
    author  char(50)              author_surname  char(50)
                                  author_initials char(10)
    title   varchar(300)          title           varchar(200)
    keyword set(char(30))         keywd           array(8)(char(30))
Note: homogeneous model -- both SQL-1999 -- but difficulties.
Different Standards • For example -- Names: • Person(surname, first_name, ..) • or Person(first_name, surname, …) • or Person(name, …) • First two may easily be made equivalent but convention in third needs to be understood. • Note also possibilities of A.N.Other, AN Other, A N Other.
Possible Solutions • In schema B define function which amalgamates the two parts of author into one value. • Will need to look manually at format of author in schema A. • If format inconsistent, need some pre-processing. • Other inconsistencies require decisions: • variable set versus array dimension 8. • Different name for keyword attribute • different size for title fields (presumably adopt higher). • In heterogeneous environment, need also to relate schema constructions. Is class same as table?
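The amalgamating function suggested for schema B might look like the sketch below. The "Surname, Initials" output format is purely an assumption (the slides leave the real format of author in schema A to manual inspection), and the struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Schema B holds the author in two parts.
struct AuthorB {
    std::string surname;   // char(50) in schema B
    std::string initials;  // char(10) in schema B
};

// Amalgamate the two parts into one value for comparison with schema A.
// ASSUMPTION: schema A stores authors as "Surname, Initials".
std::string amalgamate_author(const AuthorB& a) {
    if (a.initials.empty()) {
        return a.surname;
    }
    return a.surname + ", " + a.initials;
}

// Schema A declares author as char(50); an amalgamated value built from
// 50 + 10 characters plus a separator may not fit, so check before loading.
bool fits_schema_a(const std::string& author) {
    constexpr std::size_t kAuthorWidthA = 50;
    return author.size() <= kAuthorWidthA;
}
```

The length check makes one of the decisions on the slide explicit: field sizes differ between the schemas, so a merged value needs to be validated against the target definition.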
Simple Problem in Interoperability 2 • Homogeneous Models • the same information may be held as attribute name, relation name or a value in different databases • e.g. fines in library; • could be held in a dedicated relation Fine(amount, borrowed_id) • or as an attribute Loan(id, isbn, date_out, fine) • or as a value Charge(1.25, ‘fine’)
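To see why this matters, consider asking "what is the total of all fines?" against each of the three designs. The sketch below uses hypothetical row structs that mirror the three relations above; amounts are held as double only to match the 1.25 example, which a real system would likely avoid for money.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct FineRow   { double amount; int borrowed_id; };                 // fine as a relation name
struct LoanRow   { int id; std::string isbn; std::string date_out;
                   double fine; };                                    // fine as an attribute name
struct ChargeRow { double amount; std::string kind; };                // fine as a value

// Relation Fine: every row is a fine.
double total_fines(const std::vector<FineRow>& rows) {
    double total = 0.0;
    for (const FineRow& r : rows) total += r.amount;
    return total;
}

// Relation Loan: the fine is one attribute among several.
double total_fines(const std::vector<LoanRow>& rows) {
    double total = 0.0;
    for (const LoanRow& r : rows) total += r.fine;
    return total;
}

// Relation Charge: only rows whose value is 'fine' count.
double total_fines(const std::vector<ChargeRow>& rows) {
    double total = 0.0;
    for (const ChargeRow& r : rows) {
        if (r.kind == "fine") total += r.amount;
    }
    return total;
}
```

The same question needs three different access paths, which is the mapping an interoperable system has to supply.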
Object-oriented Databases Modelling and Abstractions
O-O DB Starting Point • Persistent Programming Languages with: • programming paradigm • complex abstractions • manipulations of general data structures • theoretical basis less obvious • complex user interface • functional completeness
Relational Starting Point • Relational Data Model with: • data models • relational structuring • manipulations of relations • strong set-based theoretical basis • simple user interface • limited functionality
Evolutionary Pressures • Same user pressures in data handling requirements apply to both so resulting enhancements/softening lead to a number of similarities in end-products: • Users want: • Complex abstractions • Complete functionality • Ease of use • Reliability (hence provability -- hence theory)
Thrust • So thrust is to provide database systems with: • Underlying complex structures • Powerful manipulation mechanisms AND • Declarative manipulation languages
Ideal OODBMS Properties • Main drive came in 1980s. • Ideal properties of OODBMS: • object-oriented (programming) system • persistence for (some) objects • fast retrieval of persistent objects • concurrency (transactions) • high-level (declarative) query language
Alternative Approaches 1 • Adapt imperative: take an imperative programming language and add library extensions through embedded techniques. All structures and database functionality are defined in this way. So more extensive add-ons for database functionality than in embedded SQL. • (Example O2 -- extensions to C).
Alternative Approaches 2 • Adapt o-o: take an o-o language and add library facilities for additional classes to provide persistence, aggregation, .. (Examples: Ontos, Versant, ObjectStore). Note -- not quite like embedded SQL as that defines only extra functionality not structures as well.
Alternative Approaches 3 • Evolve: take an o-o language and add properties 2-5 of the ideal OODBMS list (persistence, fast retrieval, concurrency, declarative query language) directly into the language; that is, extend the language with database 'extras' as first-class facilities. • Example: GemStone which extends Smalltalk, Java, C++
Alternative Approaches 4 • Revolutionise: start from scratch and develop an o-o database system with required facilities, independent of existing programming languages. Based on object and semantic models. • Example SIM -- Semantic Information Manager
Alternative Approaches 5 • Adapt Relational • e.g. Object-relational model • SQL-1999 • Start with SQL-92 and introduce: • User-defined types (UDT) • Inheritance (sub-types) • Complex objects, references
An Example Object-oriented Database System Objectivity/C++
Overview • From Objectivity, Inc. • Available for Unix, VMS, Windows • Supports C++, Java, Smalltalk • Classification: object-oriented database system derived from C++ by making objects persistent; SQL-like language provided for declarative interface. Pre-compiler. • Approach 2 (adapt o-o). • Newcastle University (UCS) had this system (Unix) on trial on Aidan. Used in CSC313 in 1999.
Object Lifetimes by Class • Each class is one of: • Persistent-capable: • whose objects may have a lifetime greater than that of the programs which create them. • Non-persistent-capable: • whose objects cannot be made persistent directly but can be made persistent in the federated database as, for instance, data member-types. • Transient: • whose objects have a lifetime no greater than that of the programs which create them.
Federated Database • The federated database is the basic unit, holding potentially many databases defined by many schemas. • Object identifiers are unique within the federated database.
Objectivity File Structures for Persistent Data
    Federated DB
        1:N Database (D)
            Container (C)
                Basic Objects, held in slot addresses (S) on pages (P)
Addressing - Federated Database • Schema for federation: • Catalogue of databases in federation + their export schemas • Database: • held in 1+ containers – complete schemas • Container: • physical layout in terms of pages allocated • Pages: • Unit of storage for disk fetches and stores • Objects: • persistent objects with object identifiers (OIDs)
Object Identifiers • OIDs: addresses within a page (slot number) • OIDs: addresses D-C-P-S (database-container-page-slot) • Total 64 bits (16 bits per level) • e.g. • 03-05-26-32 • addresses object in slot 32 of page 26 in container 05 in database 03.
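The D-C-P-S layout can be sketched as four 16-bit fields packed into one 64-bit value. This is an illustration only: the ordering (database in the highest bits, slot in the lowest) and the reading of 03-05-26-32 as decimal numbers are assumptions, and `Oid`, `pack` and `unpack` are hypothetical names, not part of the Objectivity API.

```cpp
#include <cassert>
#include <cstdint>

// One 16-bit field per addressing level.
struct Oid {
    std::uint16_t database;   // D
    std::uint16_t container;  // C
    std::uint16_t page;       // P
    std::uint16_t slot;       // S
};

// Pack the four levels into 64 bits.
// ASSUMPTION: D occupies the most significant 16 bits, S the least.
constexpr std::uint64_t pack(const Oid& o) {
    return (std::uint64_t{o.database} << 48) |
           (std::uint64_t{o.container} << 32) |
           (std::uint64_t{o.page} << 16) |
           std::uint64_t{o.slot};
}

// Recover the four levels; each cast keeps only the low 16 bits.
constexpr Oid unpack(std::uint64_t v) {
    return Oid{static_cast<std::uint16_t>(v >> 48),
               static_cast<std::uint16_t>(v >> 32),
               static_cast<std::uint16_t>(v >> 16),
               static_cast<std::uint16_t>(v)};
}
```

Under these assumptions the example 03-05-26-32 packs to 0x00030005001A0020, and each level can address at most 65536 entries.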
Object-oriented DBMS Objectivity (continued)
Federated Database -- Addressing • Schema for federation • Catalogue of databases in federation + their schemas • Database: held in 1+ containers (D) • Container: determines physical layout (C) • Objects -- persistent held on pages (P) • OIDs: addresses D-C-P-S (S is slot number) • Total 64 bits (16 bits per level)
Persistent Capable Classes • Define a class and inherit persistent properties from a predefined Objectivity/C++ class ooObj • Example:
    class employee : public ooObj { ... };
    // inherits persistence from ooObj
    class manager : public employee { ... };
    // inherits properties, functions and persistence from employee
Persistent-capable classes • Create a data definition file (.ddl) for each such class. • If the classes already exist in non-database environment as .h files, then simply change extension to .ddl for use in Objectivity/C++.
DDL Processor • Takes as input a .ddl file and outputs: • A header file (.h) -- the original file with added Objectivity member functions for storing, retrieving and modifying objects. • A secondary header file (_ref.h) -- ooRef for object reference declarations -- included also as part of (.h) -- may be needed explicitly for forward declarations (boot-strap problem). • A C++ implementation file -- (_ddl.c) for Unix -- implements in C++ the Objectivity member functions declared in the header file; result is to be later compiled and linked with the C++ application.
Setting up a Database: Unix DDL Processor
• ls
    employee.ddl
• oonewfd -fdfilepath company.FDB -lockserverhost machine95 company
• ls
    company employee.ddl company.FDB
• ooddlx employee.ddl company
• ls
    company company.FDB employee.ddl employee.h employee_ref.h employee_ddl.c
Notes on DDL processor • oonewfd -- tool -- sets up boot file and registers database • ooddlx -- DDL processor • machine95 -- lock server machine • company is boot (start-up) file • company.FDB is federated database file • employee.ddl is original (input) class definition • employee.h is output (enhanced) class definition • employee_ref.h is output reference class • employee_ddl.c is C++ implementation file
Similarity to Embedded SQL • There are some similarities to Ingres/ESQL, but with Objectivity the database is held in your own disk space.