Understanding Distributed Databases: Architecture & Data Dictionary Usage

Lecture 7 Term 2 17/2/14 Distributed Databases and Constraints

A distributed database • Appears to a user as a single database but is, in fact, a set of databases stored on multiple computers. • The data on several computers can be simultaneously accessed and modified using a network. • Each database server in the distributed database is controlled by its local DBMS, and each cooperates to maintain the consistency of the global database.

Control in Distributed DBs • Different portions of the overall database reside at different locations • these portions are controlled by different processors running sometimes different DBMSs • common schema means queries can involve any portion of the DB residing at any location

Continued… • A database server is the software managing a database, and a client is an application that requests information from a server. Each computer in a system is a node. • A node in a distributed database system can be a client, a server, or both. • Oracle supports heterogeneous client/server environments where clients and servers use different character sets.

Data Dictionary • A read-only set of tables that provides information about the database. A data dictionary contains: • The definitions of all schema objects in the database (tables, views, indexes, clusters, synonyms, sequences, procedures, functions, packages, triggers, and so on) • How much space has been allocated for, and is currently used by, the schema objects • Default values for columns • Integrity constraint information • The names of Oracle users • Privileges and roles each user has been granted • Auditing information, such as who has accessed or updated various schema objects • Other general database information

Use of a Data Dictionary • The data dictionary has three primary uses: • Oracle accesses the data dictionary to find information about users, schema objects, and storage structures. • Oracle modifies the data dictionary every time that a data definition language (DDL) statement is issued. • Any Oracle user can use the data dictionary as a read-only reference for information about the database

Continued.. • The Oracle Database user SYS owns all base tables and user-accessible views of the data dictionary. • Data in the base tables of the data dictionary is necessary for Oracle Database to function. • Therefore, only Oracle Database should write or change data dictionary information. • No Oracle Database user should ever alter rows or schema objects contained in the SYS schema because such activity can compromise data integrity. The security administrator must keep strict control of this central account.

Site Autonomy • Each server participating in a distributed database is administered independently (for security and backup operations) from the other databases, as though each database was a non-distributed database. • Although all the databases can work together, they are distinct, separate repositories of data and are administered individually.

Database Schema • A schema is the set of metadata (data dictionary) used by the database, typically generated using DDL. A schema defines attributes of the database, such as tables, columns, and properties. A database schema is a description of the data in a database. • SCOTT is a schema that includes the EMP, DEPT and BONUS tables with various grants, and other stuff. • SYS is a schema that includes tons of tables, views, grants, etcetc etc. • SYSTEM is a schema.....

Continued

Schema Objects and Naming in a Distributed Database • A schema object (for example, a table) is accessible from all nodes that form a distributed database. • Therefore, just as a non-distributed local DBMS architecture must provide an unambiguous naming scheme to distinctly reference objects within the local database, a distributed DBMS must use a naming scheme that ensures that objects throughout the distributed database can be uniquely identified and referenced. • To resolve references to objects (a process called name resolution) within a single database, the DBMS usually forms object names using a hierarchical approach.

Continued.. • For example, within a single database, a DBMS guarantees that each schema has a unique name, and that within a schema, each object has a unique name. • Because uniqueness is enforced at each level of the hierarchical structure, an object's local name is guaranteed to be unique within the database and references to the object's local name can be easily resolved. • DDBMs simply extend the hierarchical naming model by enforcing unique database names within a network.

Database Links • To facilitate connections between the individual databases of a distributed database, Oracle uses database links. • A database link defines a "path" to a remote database. • These are transparent to users of a distributed database, because the name of a database link is the same as the global name of the database to which the link points.

Remote and Distributed Statements • A remote query is a query that selects information from one or more remote tables, all of which reside at the same remote node. • A remote update is an update that modifies data in one or more tables, all of which are located at the same remote node.

Continued.. • A distributed query retrieves information from two or more nodes. • A distributed update modifies data on two or more nodes. • A distributed update is possible using a program unit, such as a procedure or trigger, that includes two or more remote updates that access data on different nodes. • Statements in the program unit are sent to the remote nodes, and the execution of the program succeeds or fails as a unit

Remote and Distributed Transactions • A remote transaction is a transaction that contains one or more remote statements, all of which reference the same remote node. • A distributed transaction is any transaction that includes one or more statements that, individually or as a group, update data on two or more distinct nodes of a distributed database.

Two-Phase Commit Mechanism • A DBMS must guarantee that all statements in a transaction, distributed or non-distributed, are either committed or rolled back as a unit, so that if the transaction is designed properly, the data in the logical database can be kept consistent. • The effects of a transaction should be either visible or invisible to all other transactions at all nodes; this should be true for transactions that include any type of operation, including queries, updates, or remote procedure calls (RPCs).

Transparency Continued.. • A distributed DBMS architecture should also provide facilities to transparently replicate data among the nodes of the system. • Maintaining copies of a table across the databases in a distributed database is often desired so that tables that have high query and low update activity can be accessed faster by local user sessions because no network communication is necessary. • If a database that contains a critical table experiences a prolonged failure, replicates of the table in other databases can still be accessed. • A DBMS that manages a distributed database should make table replication transparent to users working with the replicated tables. • Finally, the functional transparencies explained above are not sufficient alone. The distributed database must also perform with acceptable speed.

Transparency in a Distributed Database System • The functionality of a distributed database system must be provided in such a manner that the complexities of the distributed database are transparent to both the database users and the database administrators. • Location transparency exists if a user can refer to the same table the same way, regardless of the node to which the user connects. Location transparency is beneficial for the following reasons: • Access to remote data is simplified, because the database users do not need to know the location of objects. • Objects can be moved with no impact on end-users or database applications.

Continued.. • A distributed database system should also provide query, update, and transaction transparency. For example, standard SQL commands, such as SELECT, INSERT, UPDATE, and DELETE, should allow users to access remote data without the requirement for any programming. • Transaction transparency occurs when the DBMS provides the functionality described below using standard SQL COMMIT, SAVEPOINT, and ROLLBACK commands, without requiring complex programming or other special operations to provide distributed transaction control. • The statements in a single transaction can reference any number of local or remote tables.

Distributed Databases Homogenous Environment Heterogonous Environment • In a non-heterogeneous environment, the client and server character sets should be either the same as or subsets of the main server character set In a heterogeneous environment, the NLS settings of the client, the transparent gateway and the non-Oracle data source, should be either the same or a subset of the Oracle server NLS character set

Data Replication • store a separate copy of the full tables in each location • if a copy is stored at every site: Full Replication • Advantages: • reliability • fast response • Disadvantages • storage requirements • complexity and cost of updating

Database Replication • Replication is the process of copying and maintaining database objects in multiple databases that make up a distributed database system. • Changes applied at one site are captured and stored locally before being forwarded and applied at each of the remote locations. • Replication provides user with fast, local access to shared data, and protects availability of applications because alternate data access options exist. • Even if one site becomes unavailable, users can continue to query or even update the remaining locations.

Continued • In a multimaster replication environment, the replication groups are called master groups. Corresponding master groups at different sites must contain the same set of replication objects • Master group "SCOTT_MG" contains an exact replica of the replicated objects at each master site.

Master Group

Snapshot site "Group A" at the snapshot site maintains only a partial replica of master group "Group A" at the master site, while the "Group B" snapshot and master groups maintain a complete replica

Replication Sites • A master site maintains a complete copy of all objects in a replication group. • All master sites in a multimaster replication environment communicate directly with one another to propagate data and schema changes in the replication group. • A replication group at a master site is more specifically referred to as a master group. • A replication group's master definition site is a master site serving as the control point for managing the replication group and objects in the group. • A snapshot site supports read-only and updateable snapshots of the table data at an associated master site. • A snapshot site's table snapshots can contain all or a subset of the table data within a replication group. • May contain snapshots for only selected tables in a replication group. • And a particular snapshot might be just a selected portion of a certain replicated table.

Replication Conflicts • Asynchronous multimaster and updateable snapshot replication environments must address the possibility of replication conflicts that may occur when, for example, two transactions originating from different sites update the same row at nearly the same time. • When data conflicts do occur, you need a mechanism to ensure that the conflict will be resolved in accordance with your business rules and that the data converges correctly at all sites. • In addition to logging any conflicts that may occur in your replicated environment, Oracle replication offers a variety of conflict resolution methods that will allow you to define a conflict resolution system for your database that will resolve conflicts in accordance with your business rules. • If you have a unique situation that Oracle's pre-built conflict resolution methods cannot resolve, you have the option of building and using your own conflict routines.

Horizontal partitioning • some of the rows of the tables are stored in one location; others are stored at other locations • eg: customers banking out of a particular branch • Advantages: • efficiency • local optimisation • security • Disadvantages: • inconsistent speed access • backup vulnerability

Vertical partitioning • some columns are projected into base relationship at different sites • all relations share a common domain so the full table can be reconstructed • Advantages: • tailor-made support for functional areas • same as horizontal partitioning • Disadvantages: • some queries might be very slow • users must understand some design issues

Combinations of the three methods • most of the time, companies will use different methods • each method is efficient in certain situations + some other security requirements • eg: local customers, information originating at a certain site, shared processes that require the same data at all sites • it is a design issue to try to identify the optimal distribution - data at the sites where it is used most

Examples of Industries using DDBs • Manufacturing, especially multi-plant manufacturing • Military command and control • Airlines • Hotel chains • Any organization which has a decentralized organization structure

Promises of DDBs • Higher reliability • Replication of components • No single points of failure e.g., a broken communication link or processing element does not bring down the entire system • Distributed transaction processing guarantees the consistency of the database and concurrency Improved performance • Proximity of data to its points of use – Reduces remote access delays – Requires some support for fragmentation and replication • Parallelism in execution – Inter-query parallelism – Intra-query parallelism • Update and read-only queries influence the design of DDBSs substantially – If mostly read-only access is required, as much as possible of the data should be replicated – Writing becomes more complicated with replicated data

Continued.. Easier system expansion • Issue is database scaling • Emergence of microprocessor and workstation technologies • Network of workstations much cheaper than a single mainframe computer • Data communication cost versus telecommunication cost • Increasing database size

Continued… Transparency of distributed and replicated data • Refers to the separation of the higher-level semantics of the system from the lower-level implementation issues • A transparent system “hides” the implementation details from the users. • A fully transparent DBMS provides high-level support for the development of complex applications. User reduced complexity Programmer – physical implementation (many databases)

Creating Constraints Constraints are basically rules to stop a user doing something with a table that you don’t want to happen. • Constraints are used by Oracle to enforce rules whenever a row is added, changed, or removed. • There are two types of constraints : Table constraints; and Column Constraints. Constraints (cont) • The following are the different constraints. NOT NULL DEFAULT UNIQUE PRIMARY KEY FOREIGN KEY CHECK

Constraints continued.. • The following are the different constraints. • NOT NULL • DEFAULT • UNIQUE • PRIMARY KEY • FOREIGN KEY • CHECK

Not Null Columns, by default, allow NULL values. By specify NOT NULL we are saying that NULLs are not allowing in a particular column. Create table addresses ( name varchar2(30) not null, address varchar2(50) not null, phone number(10) ); • Note that the constraint can also be given a name. Create table phones (name varchar2(30) constraint namenull not null, phone number(9));

Default If a value isn’t supplied for a column, then a default can be supplied. If we don’t use the default constraint, a NULL value would be entered. Create table addresses (name varchar2(30) not null, address varchar2(50) default 'None Given', phone number(10) ); • In this example, if no value is specified for address, “None Given” is inserted in the column. • The default constraint cannot be given a name

Unique • Unique constraints ensure that the contents of a column (or columns) are unique. There can be no duplicate entries in a column. An index is automatically created to handle the rule. We will discuss indexes in detail later. • Unique constraints can be applied at the column level or at the table level. At the column level we simply add the constraint after the column definition. For table level, we are putting a constraint on two or more columns, so we add the constraint after all the column definitions.

Unique (cont) • The following is an example of a column level unique constraint. create table phone (name char(20) constraint onename unique, phonenumnumber(10)); • Nulls are allowed in a column with a unique constraint. • The following is an example of a table level constraint create table phones (firstname char(10), secondname char(20), course char(15), startdate date, constraint onename unique(firstname,secondname));

Primary Key There can only be one primary key per table and it is used to ensure that each row is distinctive. Each row is identified uniquely by its primary key. • A primary key, like a unique constraint, can be made up of one or more columns. The difference is that NULLs are allowed in single column unique constraints, there can be no nulls in a primary key.

Primary Key (cont) To create a column level constraint, we run (for example) create table phone (name char(20) constraint keyone primary key, address char(50), phone number(10)); • An example of a table level constraint is create table phone (name char(20), address char(50), phone number(10), constraint keyone primary key(name));

Foreign Key • A foreign key is used to link the information in two tables

Foreign Key (cont) The command to create the phone table is create table phone (name char(8) primary key, areanamechar(8), phone number(8), constraint fkeyarea foreign key (areaname) references area(areaname)); • This is a table level constraint, the equivalent command for a column level constraint is create table phone (name char(8) primary key, areaname char(8) constraint forkey references area(areaname),phone number(8));

Composite Foreign Keys • The idea of composite foreign keys is simply referencing more than one column. • To create composite foreign key constraints (which can only be table level) for the example above, we run Create table( … … , constraint fkeyarea foreign key (division,section) references departments(division,section) ); We can combine up to 16 columns in a composite foreign key constraint.

Referential Integrity • This means that the foreign key ensure that references between tables (links) can not “get lost”

Referential Integrity (cont.) The original constraint we used in creating the phone table was Create table( … … , constraint fkeyarea foreign key (areaname) references area(areaname) ); To allow “cascading deletions” (deleting in one table cascades into tables referencing it) we change this command to Create table( … … , constraint fkeyarea foreign key (areaname) references area(areaname) on delete cascade );

Check Constraints • Check constraints offer the most flexibility. Rather than Oracle defining how a constraint works, we can impose our own rules. For example create table money (name char(10), sal number(7,2) constraint mycheckcon check (sal > 4.8) );

Understanding Distributed Databases: Architecture & Data Dictionary Usage

Understanding Distributed Databases: Architecture & Data Dictionary Usage

Presentation Transcript

Year 5 Term 2 Unit 7

Topics in Financial Economics Term 2, Lecture 2

Mid-term Review Chapters 2-7

Lecture 8 Term 2

Lecture 9 Term 2

Lecture 9 Term 2

Lecture 4 Term 2

Year 7 Chinese Term 2, 2012

Lecture 5 Term 2

IS6172 Lecture 12 Term 2

Lecture 4 Term 2

Lecture 7 Group Life Insurance: Term Coverage

Lecture 10 Term 2

Lecture 5 Term 2

Lecture 2 Outline (Ch. 7)

Lecture 3.5: Mid-Term 2 Review

Eco 100 Lecture 7-2

Lecture 7: Features Part 2

September 7, 2006 Lecture 2

Lecture 5 Term 2

Lecture 2 Term 2

Week 7 Lecture 1+2