This document explores the concept of interoperability in information systems, emphasizing its critical role in enabling diverse systems to interact efficiently. It delves into essential features of interoperability, motivations for its necessity, and fundamental definitions relevant to distributed and heterogeneous systems. Semantic problems that arise due to different classification and naming conventions are addressed, along with structural and implementation challenges. The discussion extends to practical applications within relational models, highlighting the complexities faced in achieving seamless integration across various platforms.
Information Management CSC824 Part 2 Nick Rossiter b.n.rossiter@ncl.ac.uk
Interoperability 1
• Interoperability: the ability to request and receive services between various systems and use their functionality.
• More than data exchange.
• Implies a close integration.
Interoperability 2
• Features:
  • exchange of messages and requests
  • use of each other’s functionality
  • client-server abilities
  • distribution
  • operate multiple systems as single unit
  • communication despite incompatibilities
  • extensibility and evolution
Motivations 1
• Diversity of modelling techniques
• Distributed businesses may exercise local autonomy in platforms
• Data warehousing requires heterogeneous systems to be connected
• Data mining enables new rules to be derived from heterogeneous collections
Motivations 2
• Pervasive computing: networks supporting many diverse nodes, driven by users specifying policy and function.
  • Policy: statements governing how a solution will be achieved. Statements are derived from requirements.
  • Function: mechanism for achieving objectives.
• Mobile computing: wireless networks
Motivations 3
• E-Science
  • Distribution of functionality transparently across many different platforms
• Grid
  • Layered components available in network:
  • Computational, information and knowledge layers and probably more.
Basic Definitions 1
• Distribution: information bases are stored on multiple computer systems interconnected by a communication medium.
• Homogeneous system: one that adheres to the same software at all sites.
• Heterogeneous system: one that does not adhere to the same software at all sites.
Basic Definitions 2
• Autonomy: the ability of a site to control its own activities with respect to one or more of:
  • design
  • communication
  • execution
  • association
Basic Definitions 3
• Model: a representation of policies in a structured form according to some perceived view of reality, e.g.
  • Relational model: world is tabular
  • Hierarchical model: world is tree-like
  • Security model: world is task-based
  • Object model: world is based on o-o paradigm
Basic Definitions 4
• Mechanism: how a particular model or policy is to be realised (implemented). The design of a system.
• Implementation: the coding and compilation of a system.
• ‘Instantiation’:
  • populating a system with data.
  • executing a program.
Semantic Problems in Interoperability 1
• People call different properties by different names
• People classify properties differently:
  • Different contexts, e.g. colour is a property when describing a car but a table to the paint shop.
  • Different normalization priorities, e.g. many tables optimised for updates versus few tables optimised for searching.
Semantic Problems in Interoperability 2
• People make use of facilities in different ways. For instance:
• In SQL-92, uniqueness in tables can be achieved by:
  • Defining keys
  • Modifying the table storage method on various properties
  • Defining a unique index
• So many legacy problems
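The point about uniqueness can be made concrete with a small sketch. It uses Python's standard sqlite3 module as a stand-in for an SQL-92 system, which is an assumption made purely for illustration; the table names student_a and student_b are hypothetical. Both tables refuse duplicate ids, but only one of them records a key in its catalogue, so a tool that inspects declared keys would treat the two differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Route 1: uniqueness declared as a key in the table definition.
conn.execute(
    "CREATE TABLE student_a (student_id CHAR(10) PRIMARY KEY, name CHAR(50))"
)

# Route 2: no key in the table; uniqueness added as a separate unique index.
conn.execute("CREATE TABLE student_b (student_id CHAR(10), name CHAR(50))")
conn.execute("CREATE UNIQUE INDEX student_b_id ON student_b (student_id)")


def rejects_duplicate(table: str) -> bool:
    """Insert the same id twice and report whether the second insert is refused."""
    insert = f"INSERT INTO {table} VALUES ('0065275', 'A N Other')"
    conn.execute(f"DELETE FROM {table}")  # start from an empty table
    conn.execute(insert)
    try:
        conn.execute(insert)
    except sqlite3.IntegrityError:
        return True
    return False


def declared_keys(table: str) -> list:
    """Column names that the catalogue records as primary-key columns."""
    rows = conn.execute(f"PRAGMA table_info({table})")
    return [row[1] for row in rows if row[5]]  # row[5] is the pk position flag
```

Calling rejects_duplicate on either table gives the same answer, while declared_keys differs: the same constraint is visible in one schema and hidden in an index in the other.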
Further Legacy Problems
• May ostensibly have systems with relational model, but may vary between:
  • SQL-89, SQL-92, SQL-1999.
• Foreign key -- primary key for association:
  • 1st class definition only in SQL-92, SQL-1999
• Inheritance -- UDT:
  • 1st class definition only in SQL-1999
Constraints and Types
• May differ between systems:
• e.g. student ids may be held as:
  • integers (leading zeros removed): 65275
  • integers (padded out with leading zeros): 0065275
  • strings (fixed length): ‘0065275’
• Ids may have a checksum function or not
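One way to reconcile the three id representations is to map each to a single canonical form before comparing. The sketch below is only an illustration: the function name canonical_student_id and the width of 7 digits (taken from the example 0065275) are assumptions, and checksum handling is left out.

```python
def canonical_student_id(raw, width: int = 7) -> str:
    """Map an integer or string id to a fixed-width, zero-padded string.

    width=7 is an assumption based on the example id 0065275.
    """
    text = str(raw).strip()
    if not text.isdigit():
        raise ValueError(f"not a numeric student id: {raw!r}")
    if len(text) > width:
        raise ValueError(f"id {raw!r} is longer than {width} digits")
    return text.zfill(width)


# The integer form and the fixed-length string form now compare equal.
same = canonical_student_id(65275) == canonical_student_id("0065275")
```

Choosing the string form as canonical keeps the leading zeros, which the integer representation cannot store.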
Semantic Problems in Interoperability 3
• Structural problems are bad enough. But also:
• Functionality can be applied in many different ways:
  • Procedures or functions;
  • different module layout.
• Rules can be in:
  • Model structures, model coding, procedures or application programs.
Relational Model Definitions
Relational Table Definitions
Format of command (upper-case entered literally, lower-case to be substituted by user, [..] indicates optional) is:
  CREATE TABLE r (
    a1 type [nn],
    a2 type [nn],
    ...,
    an type [nn],
    PRIMARY KEY (ak, al, ...),
    {[FOREIGN KEY (af, ag, ...) REFERENCES rx, ...]}
  )
where
  r is the table (relation) name
  a1 ... an are attribute names
  type is one of {INT, REAL, MONEY, DATE, CHAR(p)} (plus a few others in some systems)
  n is the degree of the table
  p is the length (fixed) of a character field
  nn = 'NOT NULL'
KEYS give uniqueness and reference points.
Relational Definition
  CREATE TABLE EXAM (
    Module_no char(6),
    Student_id char(10),
    Date_Exam date,
    Mark int,
    PRIMARY KEY (module_no, student_id, date_exam),
    FOREIGN KEY (module_no) REFERENCES Modules,
    FOREIGN KEY (student_id) REFERENCES Students
  )
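To see the keys of the EXAM table at work, the definition can be run against a small in-memory database. This is a sketch under stated assumptions: SQLite (via Python's sqlite3 module) stands in for the relational system, the parent tables Modules and Students are given hypothetical one-column definitions because the slides only name them, and all rows are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only checks foreign keys when asked

# Hypothetical parent tables: named on the slide but not defined there.
conn.execute("CREATE TABLE Modules (module_no CHAR(6) PRIMARY KEY)")
conn.execute("CREATE TABLE Students (student_id CHAR(10) PRIMARY KEY)")

conn.execute(
    """
    CREATE TABLE Exam (
        module_no  CHAR(6),
        student_id CHAR(10),
        date_exam  DATE,
        mark       INT,
        PRIMARY KEY (module_no, student_id, date_exam),
        FOREIGN KEY (module_no)  REFERENCES Modules,
        FOREIGN KEY (student_id) REFERENCES Students
    )
    """
)

# Invented sample rows for the parent tables.
conn.execute("INSERT INTO Modules VALUES ('CSC824')")
conn.execute("INSERT INTO Students VALUES ('0065275')")


def accepted(row: tuple) -> bool:
    """Try to insert one Exam row; False means a key constraint refused it."""
    try:
        conn.execute("INSERT INTO Exam VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False
```

A first call such as accepted(("CSC824", "0065275", "2004-01-20", 65)) succeeds; repeating the same module, student and date is refused by the primary key, and a row naming an unknown module or student is refused by the foreign keys.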
Abstractions
• Attribute is a property (classification abstraction)
• Table name is aggregation (of properties)
• Foreign Key is an association (relationship)
Typing
• Of attributes by simple means (integer, float, …)
• Of primary key attributes by uniqueness (can only be picked once from domain)
• Of foreign key attributes by occurrence in another table as cross-reference
Simple Problem in Interoperability 1
• Two schemas in SQL-1999:
  Schema A                       Schema B
  author   char(50)              author_surname   char(50)
                                 author_initials  char(10)
  title    varchar(300)          title            varchar(200)
  keyword  set(char(30))         keywd            array(8)(char(30))
Note: homogeneous model -- both SQL-1999 -- but difficulties.
Different Standards
• For example -- Names:
  • Person(surname, first_name, ..)
  • or Person(first_name, surname, …)
  • or Person(name, …)
• First two may easily be made equivalent but convention in third needs to be understood.
• Note also possibilities of A.N.Other, AN Other, A N Other.
Possible Solutions
• In schema B define a function which amalgamates the two parts of author into one value.
• Will need to look manually at format of author in schema A.
• If format is inconsistent, need some pre-processing.
• Other inconsistencies require decisions:
  • variable set versus array dimension 8.
  • different name for keyword attribute
  • different size for title fields (presumably adopt higher).
• In heterogeneous environment, need also to relate schema constructions. Is class same as table?
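The amalgamation function and the set-versus-array decision might be sketched as follows. Everything here is illustrative: the class names RecordA and RecordB, the field name author_initials, and the "Surname, Initials" convention for schema A's author are assumptions, since the slides say the actual format in schema A has to be inspected manually.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordA:
    author: str         # char(50), assumed format "Surname, Initials"
    title: str          # varchar(300)
    keyword: frozenset  # set(char(30))


@dataclass(frozen=True)
class RecordB:
    author_surname: str   # char(50)
    author_initials: str  # char(10)
    title: str            # varchar(200)
    keywd: tuple          # array(8)(char(30)); unused slots hold None


def b_to_a(rec: RecordB) -> RecordA:
    """Translate a schema B record into the shape of schema A."""
    author = f"{rec.author_surname}, {rec.author_initials}"
    if len(author) > 50:
        raise ValueError("amalgamated author does not fit char(50)")
    # The fixed array of 8 becomes a variable set: drop empty slots and duplicates.
    keywords = frozenset(k for k in rec.keywd if k)
    # A 200-character title always fits the larger 300-character field.
    return RecordA(author=author, title=rec.title, keyword=keywords)


# Invented sample record.
sample_b = RecordB(
    "Other", "A.N.", "Interoperability",
    ("semantics", "schema", None, None, None, None, None, None),
)
sample_a = b_to_a(sample_b)
```

Translation in the other direction is harder: a set with more than eight keywords has no lossless image in the array, and a title longer than 200 characters would have to be truncated.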
Simple Problem in Interoperability 2
• Homogeneous Models
• the same information may be held as attribute name, relation name or a value in different databases
• e.g. fines in library:
  • could be held in a dedicated relation Fine(amount, borrowed_id)
  • or as an attribute Loan(id, isbn, date_out, fine)
  • or as a value Charge(1.25, ‘fine’)
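The three placements of a fine can be brought to one common view by writing a small extractor per representation. In this sketch the rows are plain tuples with invented values, and the function names are hypothetical; only the relation shapes Fine, Loan and Charge come from the slide.

```python
def fines_from_relation(fine_rows):
    """Fine(amount, borrowed_id): the relation name itself says 'fine'."""
    return [amount for amount, _borrowed_id in fine_rows]


def fines_from_attribute(loan_rows):
    """Loan(id, isbn, date_out, fine): the attribute name says 'fine'."""
    return [fine for _id, _isbn, _date_out, fine in loan_rows if fine]


def fines_from_value(charge_rows):
    """Charge(amount, kind): a data value says 'fine'."""
    return [amount for amount, kind in charge_rows if kind == "fine"]


# Invented sample data, one database per representation.
fine_db = [(1.25, "B042")]
loan_db = [(1, "0-13-000000-0", "2004-01-20", 1.25),
           (2, "0-13-000000-1", "2004-01-22", None)]
charge_db = [(1.25, "fine"), (0.5, "reservation")]
```

Each extractor needs knowledge that sits at a different level (relation name, attribute name, data value), which is why a purely structural comparison of the three schemas does not find the correspondence.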
Object-oriented Databases Modelling and Abstractions
O-O DB Starting Point
• Persistent Programming Languages with:
  • programming paradigm
  • complex abstractions
  • manipulations of general data structures
  • theoretical basis less obvious
  • complex user interface
  • functional completeness
Relational Starting Point
• Relational Data Model with:
  • data models
  • relational structuring
  • manipulations of relations
  • strong set-based theoretical basis
  • simple user interface
  • limited functionality
Evolutionary Pressures
• Same user pressures in data handling requirements apply to both, so resulting enhancements/softening lead to a number of similarities in end-products:
• Users want:
  • Complex abstractions
  • Complete functionality
  • Ease of use
  • Reliability (hence provability -- hence theory)
Thrust
• So thrust is to provide database systems with:
  • Underlying complex structures
  • Powerful manipulation mechanisms AND
  • Declarative manipulation languages
Ideal OODBMS Properties
• Main drive came in 1980s.
• Ideal properties of OODBMS:
  • object-oriented (programming) system
  • persistence for (some) objects
  • fast retrieval of persistent objects
  • concurrency (transactions)
  • high-level (declarative) query language
Alternative Approaches 1
• Adapt imperative: take an imperative programming language and add library extensions through embedded techniques. All structures and database functionality are defined in this way. So more extensive add-ons for database functionality than in embedded SQL.
• (Example: O2 -- extensions to C).
Alternative Approaches 2
• Adapt o-o: take an o-o language and add library facilities for additional classes to provide persistence, aggregation, .. (Examples: Ontos, Versant, ObjectStore).
• Note -- not quite like embedded SQL, as that defines only extra functionality, not structures as well.
Alternative Approaches 3
• Evolve: take an o-o language and add features 2-5 above directly into the language; that is, extend the language with database 'extras' as first-class facilities.
• Example: GemStone, which extends Smalltalk, Java, C++
Alternative Approaches 4
• Revolutionise: start from scratch and develop an o-o database system with required facilities, independent of existing programming languages. Based on object and semantic models.
• Example: SIM -- Semantic Information Manager
Alternative Approaches 5
• Adapt Relational
• e.g. Object-relational model
• SQL-1999
• Start with SQL-92 and introduce:
  • User-defined types (UDT)
  • Inheritance (sub-types)
  • Complex objects, references
An Example Object-oriented Database System Objectivity/C++
Overview
• From Objectivity, Inc.
• Available for Unix, VMS, Windows
• Supports C++, Java, Smalltalk
• Classification: object-oriented database system derived from C++ by making objects persistent; SQL-like language provided for declarative interface. Pre-compiler.
• Approach 2 (adapt o-o).
• Newcastle University (UCS) had this system (Unix) on trial on Aidan. Used in CSC313 in 1999.
Object Lifetimes by Class
Either:
• Persistent-capable:
  • whose objects may have a lifetime greater than that of the programs which create them.
• Non-persistent capable:
  • whose objects cannot be made persistent directly but can be made persistent in the federated database as, for instance, data member-types.
• Transient:
  • whose objects have a lifetime no greater than that of the programs which create them.
Federated Database
• The federated database is the basic unit, holding potentially many databases defined by many schemas.
• Object identifiers are unique within the federated database.
Objectivity File Structures for Persistent Data
[Diagram: Federated DB, 1:N, Database (D); Container (C); Basic Objects, held in slot addresses (S) on pages (P)]
Addressing - Federated Database
• Schema for federation:
  • Catalogue of databases in federation + their export schemas
• Database:
  • held in 1+ containers; complete schemas
• Container:
  • physical layout in terms of pages allocated
• Pages:
  • Unit of storage for disk fetches and stores
• Objects:
  • persistent objects with object identifiers (OIDs)
Object Identifiers
• OIDs: addresses within a page (slot number)
• OIDs: addresses D-C-P-S (database-container-page-slot)
• Total 64 bits (16 bits per level)
• e.g.
  • 03-05-26-32
  • addresses object in slot 32 of page 26 in container 05 in database 03.
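The four-level, 16-bits-per-level address can be sketched as simple bit packing. This is not Objectivity's actual internal layout, which the slides do not give: placing the database number in the most significant bits and reading the example 03-05-26-32 as decimal fields are both assumptions, and the function names are hypothetical.

```python
FIELD_BITS = 16
FIELD_MASK = (1 << FIELD_BITS) - 1  # 0xFFFF, the largest value one level can hold


def pack_oid(database: int, container: int, page: int, slot: int) -> int:
    """Combine the four levels into one 64-bit integer (database highest)."""
    value = 0
    for part in (database, container, page, slot):
        if not 0 <= part <= FIELD_MASK:
            raise ValueError(f"{part} does not fit in {FIELD_BITS} bits")
        value = (value << FIELD_BITS) | part
    return value


def unpack_oid(value: int) -> tuple:
    """Split a 64-bit OID back into (database, container, page, slot)."""
    return tuple((value >> shift) & FIELD_MASK for shift in (48, 32, 16, 0))


def format_oid(value: int) -> str:
    """Render an OID in the D-C-P-S notation used on the slide."""
    return "-".join(f"{part:02d}" for part in unpack_oid(value))


oid = pack_oid(3, 5, 26, 32)
```

With 16 bits per level, each level can distinguish 65,536 values, so a federation addressed this way is bounded at 65,536 databases, each with 65,536 containers, and so on down to slots.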
Object-oriented DBMS Objectivity (continued)
Federated Database -- Addressing
• Schema for federation
  • Catalogue of databases in federation + their schemas
• Database: held in 1+ containers (D)
• Container: determines physical layout (C)
• Objects -- persistent, held on pages (P)
• OIDs: addresses D-C-P-S (S is slot number)
• Total 64 bits (16 bits per level)
Persistent Capable Classes
• Define a class and inherit persistent properties from a predefined Objectivity/C++ class ooObj
• Example:
    class employee : public ooObj
    // inherits persistence from ooObj
    class manager : public employee
    // inherits properties, functions and persistence from employee
Persistent-capable classes
• Create a data definition file (.ddl) for each such class.
• If the classes already exist in a non-database environment as .h files, then simply change the extension to .ddl for use in Objectivity/C++.
DDL Processor
• Takes as input a .ddl file and outputs:
  • A header file (.h) -- the original file with added Objectivity member functions for storing, retrieving and modifying objects.
  • A secondary header file (_ref.h) -- ooRef for object reference declarations -- included also as part of (.h) -- may be needed explicitly for forward declarations (boot-strap problem).
  • A C++ implementation file -- (_ddl.c) for Unix -- implements in C++ the Objectivity member functions declared in the header file; the result is later compiled and linked with the C++ application.
Setting up a Database: Unix DDL Processor
• ls
    employee.ddl
• oonewfd -fdfilepath company.FDB -lockserverhost machine95 company
• ls
    company employee.ddl company.FDB
• ooddlx employee.ddl company
• ls
    company company.FDB employee.ddl employee.h employee_ref.h employee_ddl.c
Notes on DDL processor
• oonewfd -- tool -- sets up boot file and registers database
• ooddlx -- DDL processor
• machine95 -- lock server machine
• company is boot (start-up) file
• company.FDB is federated database file
• employee.ddl is original (input) class definition
• employee.h is output (enhanced) class definition
• employee_ref.h is output reference class
• employee_ddl.c is C++ implementation file
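The naming pattern of the generated files can be written down as a small helper. This only restates the convention visible in the listing above (employee.ddl giving employee.h, employee_ref.h and employee_ddl.c on Unix); the function name ddl_outputs is hypothetical and the code does not invoke ooddlx.

```python
from pathlib import PurePosixPath


def ddl_outputs(ddl_file: str) -> dict:
    """Names of the files the listing shows next to a given .ddl input."""
    path = PurePosixPath(ddl_file)
    if path.suffix != ".ddl":
        raise ValueError(f"expected a .ddl file, got {ddl_file!r}")
    stem = path.stem
    return {
        "header": str(path.with_name(f"{stem}.h")),
        "reference_header": str(path.with_name(f"{stem}_ref.h")),
        "implementation": str(path.with_name(f"{stem}_ddl.c")),
    }


outputs = ddl_outputs("employee.ddl")
```

A build script could use such a mapping to decide which generated files to compile and link with the application.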
Similarity to Embedded SQL • Some similarities to Ingres/ESQL but database is held in your disk space with Objectivity.