Basis for distributed database technology

Basis for Distributed Database Technology

  • Database System Technology (DST)

    • controlled access to structured data

    • aims towards centralized (single site) computing

  • Computer Networking Technology (CNT)

    • facilitates distributed computing

    • goes against centralized computing

  • Distributed Database Technology = DST + CNT

    • aims to achieve integration without centralization


What is distributed?

  • Processing Logic

  • Function

  • Data

  • Control

All the above modes of distribution are necessary and important for distributed database technology.


Distributed database system

A distributed database is a collection of multiple, logically interrelated databases distributed over a computer network.

A distributed database management system (DDBMS) is a software system that permits the management of the distributed databases and makes the distribution transparent to the users.


What is not a DDBMS?

A DDBMS is not a “collection of files” that can be stored at each node of a computer network.

A DBMS based on a multiprocessor system (a parallel database system) is not a DDBMS.

A DDBMS is not a system wherein data resides only at one node.


Aims of Distributed DBMS - Transparent Management of Distributed & Replicated Data

Transparency refers to the separation of the higher-level semantics of a system from lower-level implementation details.

The concept extends data independence in centralized DBMSs to fragmentation transparency in DDBMSs.

Who should provide transparency? - DDBMS!


Aims of Distributed DBMS - Reliability through Distributed Transactions

A distributed DBMS can use replicated components to eliminate single points of failure.

Users can still access parts of the distributed database, with “proper care”, even when some of the data is unreachable.

Distributed transactions facilitate maintenance of consistent database state even when failures occur.
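The slides do not name a specific commit protocol. This sketch uses two-phase commit (2PC), the standard protocol for keeping a distributed transaction atomic. A coordinator collects a vote from every site and commits only if all sites vote to commit. It is a minimal in-memory sketch that assumes no real network, timeouts or crash recovery. `Participant` and `two_phase_commit` are hypothetical names.

```python
from dataclasses import dataclass, field
from enum import Enum


class Vote(Enum):
    COMMIT = "commit"
    ABORT = "abort"


@dataclass
class Participant:
    """A hypothetical site taking part in one distributed transaction."""
    name: str
    can_commit: bool = True  # False simulates a local failure or conflict
    state: str = "active"
    log: list = field(default_factory=list)

    def prepare(self) -> Vote:
        # Phase 1: record the vote in the local log, then answer the coordinator.
        vote = Vote.COMMIT if self.can_commit else Vote.ABORT
        self.log.append(f"vote-{vote.value}")
        self.state = "ready" if vote is Vote.COMMIT else "aborted"
        return vote

    def finish(self, decision: Vote) -> None:
        # Phase 2: apply the coordinator's global decision.
        self.log.append(f"global-{decision.value}")
        self.state = "committed" if decision is Vote.COMMIT else "aborted"


def two_phase_commit(participants: list) -> Vote:
    """Commit only if every participant votes COMMIT; otherwise abort at all sites."""
    votes = [p.prepare() for p in participants]
    decision = Vote.COMMIT if all(v is Vote.COMMIT for v in votes) else Vote.ABORT
    for p in participants:
        p.finish(decision)
    return decision


sites = [Participant("site1"), Participant("site2", can_commit=False)]
print(two_phase_commit(sites), [p.state for p in sites])  # Vote.ABORT ['aborted', 'aborted']
```

In real 2PC, each site forces its log records to stable storage before sending each message, and this is what lets sites recover after a failure. The list-based `log` here only mimics that ordering.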


Aims of Distributed DBMS - Improved Performance

Since each site handles only a portion of the database, contention for CPU and I/O resources is less severe than in a centralized system. Data localization reduces communication overheads.

Inherent parallelism of distributed systems may be exploited for inter-query and intra-query parallelism.

Performance models are not sufficiently developed.
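This is a minimal sketch of intra-query parallelism over horizontal fragments. It assumes a hypothetical EMP(name, dept, salary) relation split across three sites, with threads standing in for the sites. Each site computes a partial (sum, count) locally, so only two numbers per site would cross the network.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical horizontal fragments of EMP(name, dept, salary), one per site.
FRAGMENTS = {
    "site1": [("Ann", "R&D", 5000), ("Bob", "R&D", 4200)],
    "site2": [("Cid", "Sales", 3900)],
    "site3": [("Dee", "Sales", 4100), ("Eve", "HR", 3600)],
}


def local_partial(rows):
    """Evaluated 'at the site': (sum of salaries, number of tuples)."""
    return sum(salary for _, _, salary in rows), len(rows)


def global_average_salary(fragments):
    # Each site works on its own fragment at the same time (threads stand in
    # for sites); only the small partial results are combined centrally.
    with ThreadPoolExecutor(max_workers=len(fragments)) as pool:
        partials = list(pool.map(local_partial, fragments.values()))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


print(global_average_salary(FRAGMENTS))  # 4160.0
```

The sites ship (sum, count) rather than per-site averages. This keeps the global average correct even when fragments have different sizes.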


Aims of Distributed DBMS - Easier System Expansion

Ability to add new sites, data, and users over time without major restructuring.

Huge centralized database systems (mainframes) are history (almost!).

The PC revolution (Compaq buying Digital, 1998) will create naturally distributed processing environments.

New applications (such as supply chain management) are naturally distributed; centralized systems will simply not work for them.


Complicating Factors

Data may be replicated in a distributed environment. Therefore, the DDBMS is responsible for (i) choosing one of the stored copies of the requested data, and (ii) making sure that the effect of an update is reflected on every copy of that data item.

Maintaining consistency of distributed/replicated data.

Since no site can have instantaneous information about the actions currently carried out at other sites, synchronizing transactions across multiple sites is harder than in a centralized system.

Other factors: complexity, cost, distribution of control, security, ...
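Read-one/write-all (ROWA) is one simple way to meet duties (i) and (ii). It is a standard replica control policy, though the slide does not name it. This is a minimal sketch that assumes a single data item and a hypothetical `ReplicatedItem` class with an explicit set of reachable sites.

```python
class ReplicatedItem:
    """One data item with a copy at each of several hypothetical sites."""

    def __init__(self, sites, value):
        self.copies = {site: value for site in sites}
        self.up = set(sites)  # sites that are currently reachable

    def read(self, preferred=None):
        # (i) Choose one stored copy: the preferred site if reachable,
        # otherwise any reachable site.
        if preferred in self.up:
            return self.copies[preferred]
        if not self.up:
            raise RuntimeError("no copy is reachable")
        return self.copies[min(self.up)]

    def write(self, value):
        # (ii) Reflect the update on every copy. Strict read-one/write-all
        # refuses the write if any copy is unreachable.
        if self.up != set(self.copies):
            raise RuntimeError("cannot write all copies: some sites are down")
        for site in self.copies:
            self.copies[site] = value


item = ReplicatedItem(["site1", "site2", "site3"], value=10)
item.write(20)
item.up.discard("site2")   # site2 becomes unreachable
print(item.read("site2"))  # served by another copy: 20
```

Strict ROWA keeps every copy identical, but writes become unavailable as soon as one site is down. Quorum-based and primary-copy schemes trade some of that simplicity for availability.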


Problem Areas

Distributed Database Design

Distributed Query Processing

Distributed Directory Management

Distributed Concurrency Control

Distributed Deadlock Management

Reliability of Distributed Databases

Operating Systems Support

Heterogeneous Databases


Relationship among Problems

Figure: relationships among the problem areas Directory Management, Query Processing, Distributed DB Design, Reliability, Concurrency Control and Deadlock Management.


Transparency and Architecture issues in DDBMSs


Top-Down DDBMS Architecture - Classical

Figure: site-independent schemas (Global Schema, Fragmentation Schema, Allocation Schema) sit above the sites; each site (Site 1, Site 2, other sites) has its own Local Mapping Schema, DBMS and Local Database (Local Database 1, Local Database 2, ...).


Top-Down DDBMS Architecture - Classical

Global Schema: a set of global relations, defined as if the database were not distributed at all.

Fragmentation Schema: each global relation is split into “non-overlapping” (logical) fragments; a 1:n mapping from relation R to fragments Ri.

Allocation Schema: a 1:1 or 1:n (redundant) mapping from fragments to sites. All fragments corresponding to the same relation R at site j constitute the physical image Rj. The copy of fragment Ri stored at site j is denoted by Rji.

Local Mapping Schema: a mapping from physical images to physical objects, which are manipulated by local DBMSs.
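The schemas above can be modelled as plain mappings. This minimal sketch assumes one global relation R with three fragments and a hypothetical allocation over three sites, chosen to give six stored copies. The local mapping to each site's DBMS is left out.

```python
# Fragmentation schema: global relation -> its logical fragments (1:n).
FRAGMENTATION = {"R": ["R1", "R2", "R3"]}

# Allocation schema: fragment -> sites holding a copy (hypothetical).
# More than one site per fragment means the allocation is redundant.
ALLOCATION = {"R1": [1, 2], "R2": [1, 2, 3], "R3": [3]}


def physical_image(relation, site):
    """Fragments of `relation` stored at `site`: the physical image Rj."""
    return [f for f in FRAGMENTATION[relation] if site in ALLOCATION[f]]


def stored_copies(relation):
    """Every stored copy Rji as a (site j, fragment Ri) pair."""
    return sorted(
        (site, fragment)
        for fragment in FRAGMENTATION[relation]
        for site in ALLOCATION[fragment]
    )


def is_redundant(relation):
    return any(len(ALLOCATION[f]) > 1 for f in FRAGMENTATION[relation])


print(physical_image("R", 1), physical_image("R", 3))  # ['R1', 'R2'] ['R2', 'R3']
print(len(stored_copies("R")), is_redundant("R"))      # 6 True
```

Keeping fragmentation and allocation as separate mappings is what lets each one change without touching the other.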


Global Relations, Fragments and Physical Images

Figure: global relation R is split into fragments R1, R2, R3, whose stored copies form the physical images R1 (Site 1: R11, R12), R2 (Site 2: R21, R22) and R3 (Site 3: R32, R33).

  • Separating concepts of fragmentation and allocation

  • Explicit control of redundancy

  • Independence from local databases

  • Allows for:

    • Fragmentation Transparency

    • Location Transparency

    • Local Mapping Transparency


Rules for Data Fragmentation

Completeness: All the data of the global relation must be mapped into fragments.

Reconstruction: It must always be possible to reconstruct each global relation from its fragments.

Disjointness: It is convenient for fragments to be disjoint, so that the replication of data can be controlled explicitly.
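For a horizontal fragmentation, with tuples modelled as Python sets, the three rules reduce to set comparisons. This minimal sketch uses a hypothetical EMP relation. The checks differ for vertical fragments, because every vertical fragment repeats the key attributes and reconstruction is by join.

```python
def check_horizontal_fragmentation(relation, fragments):
    """Check the three fragmentation rules for a horizontal fragmentation.

    relation:  set of tuples of the global relation R
    fragments: list of sets of tuples, one set per fragment Ri
    """
    union = set().union(*fragments)
    return {
        # every tuple of R appears in at least one fragment
        "completeness": relation <= union,
        # the union of the fragments gives back exactly R
        "reconstruction": union == relation,
        # no tuple appears in two fragments (sizes add up only if disjoint)
        "disjointness": sum(len(f) for f in fragments) == len(union),
    }


EMP = {("Ann", "R&D"), ("Bob", "Sales"), ("Cid", "HR")}
by_dept = [{t for t in EMP if t[1] == d} for d in ("R&D", "Sales", "HR")]
overlapping = [{("Ann", "R&D"), ("Bob", "Sales")}, {("Bob", "Sales"), ("Cid", "HR")}]

print(check_horizontal_fragmentation(EMP, by_dept))
# {'completeness': True, 'reconstruction': True, 'disjointness': True}
print(check_horizontal_fragmentation(EMP, overlapping))
# {'completeness': True, 'reconstruction': True, 'disjointness': False}
```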


Types of Data Fragmentation

  • Vertical Fragmentation

    • Projection on a relation (subset of attributes)

    • Reconstruction by join

    • Updates require no tuple migration

  • Horizontal Fragmentation

    • Selection on a relation (subset of tuples)

    • Reconstruction by union

    • Updates may require tuple migration

  • Mixed Fragmentation

    • A fragment is a Select-Project query on a relation.

Figure: examples of Vertical Fragmentation and Horizontal Fragmentation.
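This minimal sketch shows both kinds of fragmentation on a hypothetical EMP(eno, name, dept, salary) relation, stored as a list of dicts. The vertical fragments are rebuilt by a join on the key `eno`, and the horizontal fragments by union. Changing `dept` then shows a tuple migrating between horizontal fragments. A mixed fragment would simply apply `select` and then `project`.

```python
# Hypothetical global relation EMP(eno, name, dept, salary); eno is the key.
EMP = [
    {"eno": 1, "name": "Ann", "dept": "R&D", "salary": 5000},
    {"eno": 2, "name": "Bob", "dept": "Sales", "salary": 4200},
]


def project(rows, attrs):
    return [{a: r[a] for a in attrs} for r in rows]


def select(rows, pred):
    return [dict(r) for r in rows if pred(r)]


# Vertical fragmentation: both projections keep the key so they can be joined.
EMP_V1 = project(EMP, ["eno", "name", "dept"])
EMP_V2 = project(EMP, ["eno", "salary"])


def join_on_key(left, right, key="eno"):
    """Reconstruct a vertically fragmented relation by joining on the key."""
    by_key = {r[key]: r for r in right}
    return [{**row, **by_key[row[key]]} for row in left if row[key] in by_key]


# Horizontal fragmentation: selections on dept, reconstructed by union.
EMP_RD = select(EMP, lambda r: r["dept"] == "R&D")
EMP_OTHER = select(EMP, lambda r: r["dept"] != "R&D")


def update_dept(eno, new_dept):
    """Changing the selection attribute can force a tuple to migrate."""
    for fragment in (EMP_RD, EMP_OTHER):
        for row in fragment:
            if row["eno"] == eno:
                fragment.remove(row)
                row["dept"] = new_dept
                target = EMP_RD if new_dept == "R&D" else EMP_OTHER
                target.append(row)
                return  # stop at once: the fragment was just modified


assert join_on_key(EMP_V1, EMP_V2) == EMP
assert sorted(EMP_RD + EMP_OTHER, key=lambda r: r["eno"]) == EMP
update_dept(2, "R&D")  # Bob's tuple moves from EMP_OTHER to EMP_RD
print([r["name"] for r in EMP_RD], EMP_OTHER)  # ['Ann', 'Bob'] []
```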


Levels of Distribution Transparency

Fragmentation Transparency: Applications use global relations exactly as if the database were not distributed.

Location Transparency: Applications need to know the fragmentation schema, but not where fragments are located. Applications access fragments without specifying the sites where they are stored.

Local Mapping Transparency: Applications need to know both the fragmentation and allocation schemas, but not which local DBMSs lie underneath. Applications access fragments, explicitly specifying where they are located.

No Transparency: Applications need to know the local DBMS query languages and are written using the functionality provided by each local DBMS.
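The first three levels can be sketched as three read functions over a hypothetical catalog, each built on the one below it. Dictionary lookups stand in for the local DBMSs. All names are illustrative assumptions, not a real DDBMS API.

```python
# Hypothetical fragmentation and allocation schemas for a global relation EMP.
FRAGMENTATION = {"EMP": ["EMP1", "EMP2"]}  # horizontal fragments
ALLOCATION = {"EMP1": "site1", "EMP2": "site2"}

# Stand-in for what each site's local DBMS stores.
SITE_DATA = {
    "site1": {"EMP1": [("Ann", "R&D")]},
    "site2": {"EMP2": [("Bob", "Sales")]},
}


def read_at(site, fragment):
    """Local mapping transparency: the caller names the fragment AND its site,
    but not the local DBMS or its query language."""
    return list(SITE_DATA[site][fragment])


def read_fragment(fragment):
    """Location transparency: the caller names the fragment; the allocation
    schema supplies the site."""
    return read_at(ALLOCATION[fragment], fragment)


def read_global(relation):
    """Fragmentation transparency: the caller names only the global relation."""
    rows = []
    for fragment in FRAGMENTATION[relation]:
        rows.extend(read_fragment(fragment))  # union of horizontal fragments
    return rows


print(read_global("EMP"))  # [('Ann', 'R&D'), ('Bob', 'Sales')]
```

At each lower level, the logic of the functions above it moves out of the DDBMS and into the application.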


Why is support for transparency difficult?

There are tough problems in query optimization and transaction management that need to be tackled (in terms of system support and implementation) before fragmentation transparency can be supported.

The less distribution transparency the system provides, the more the end-application developer needs to know about fragmentation and allocation schemes and about how to maintain database consistency.

Higher levels of distribution transparency require appropriate DDBMS support, but make the end-application developer's work easier.


Some Aspects of top-down architecture

Distributed database technology is an “add-on” technology: most users already have populated centralized DBMSs, whereas top-down design assumes that a new DDBMS is implemented from scratch.

In the case of OODBMSs, the top-down architecture makes sense because most OODBMSs are going to be built from scratch.

In many application environments, such as semi-structured databases and continuous multimedia data, the notion of a fragment is difficult to define.

Current relational DBMS products provide for some form of location transparency (such as, by using nicknames).


Bottom up Architecture - Present & Future

Possible ways in which multiple databases may be put together for sharing by multiple DBMSs.

The DBMSs are characterized according to

  • Autonomy - degree to which individual DBMSs can operate independently: tightly coupled - integrated (A0), semiautonomous - federated (A1), total isolation - multidatabase systems (A2)

  • Distribution - no distribution - single site (D0), client-server - distribution of DBMS functionality (D1), full distribution - peer-to-peer distributed architecture (D2)

  • Heterogeneity - homogeneous (H0) or heterogeneous (H1)


Distributed DBMS Implementation Alternatives

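Figure: the three dimensions Distribution, Autonomy and Heterogeneity, with the points (A0,D2,H0), the classical distributed DBMS, and (A2,D2,H1), distributed heterogeneous multidatabase systems, marked.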


Architectural Alternatives

(A0,D0,H0): multiple DBMSs that are logically integrated at a single site - composite systems.

(A0,D0,H1): multiple database managers that are heterogeneous but provide integrated view to the user.

(A0,D1,H0): client-server based DBMS.

(A0,D2,H0): Classical distributed database system architecture.

(A1,D0,H0): Single site, homogeneous, federated database systems - not realistic.

(A1,D0,H1): heterogeneous federated DBMS, having common interface over disparate cooperating specialized database systems.


Architectural Alternatives

(A1,D1,H1): heterogeneous federated database systems with components of the systems placed at different sites.

(A2,D0,H0): homogeneous multidatabase systems at a single site.

(A2,D0,H1): heterogeneous multidatabase systems at a single site.

(A2,D1,H1) & (A2,D2,H1): distributed heterogeneous multidatabase systems. In client-server environments, this creates a three-layer architecture. Interoperability is the major issue.

Autonomy, distribution, heterogeneity are orthogonal issues.
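Because the three dimensions are orthogonal, the design space is their Cartesian product: 3 × 3 × 2 = 18 points. This small sketch enumerates them and labels the combinations named on the slides above. The labels are short paraphrases.

```python
from itertools import product

AUTONOMY = ("A0", "A1", "A2")      # integrated, federated, multidatabase
DISTRIBUTION = ("D0", "D1", "D2")  # single site, client-server, peer to peer
HETEROGENEITY = ("H0", "H1")       # homogeneous, heterogeneous

# Every combination of the three orthogonal dimensions.
ALTERNATIVES = list(product(AUTONOMY, DISTRIBUTION, HETEROGENEITY))

# Combinations the slides name (short paraphrases).
NAMED = {
    ("A0", "D0", "H0"): "composite system",
    ("A0", "D0", "H1"): "integrated heterogeneous database managers",
    ("A0", "D1", "H0"): "client-server DBMS",
    ("A0", "D2", "H0"): "classical distributed DBMS",
    ("A1", "D0", "H0"): "single-site homogeneous federated DBMS (not realistic)",
    ("A1", "D0", "H1"): "single-site heterogeneous federated DBMS",
    ("A1", "D1", "H1"): "distributed heterogeneous federated DBMS",
    ("A2", "D0", "H0"): "single-site homogeneous multidatabase system",
    ("A2", "D0", "H1"): "single-site heterogeneous multidatabase system",
    ("A2", "D1", "H1"): "distributed heterogeneous multidatabase system",
    ("A2", "D2", "H1"): "distributed heterogeneous multidatabase system",
}


def describe(point):
    return NAMED.get(point, "not named on the slides")


print(len(ALTERNATIVES), describe(("A0", "D2", "H0")))  # 18 classical distributed DBMS
unnamed = [p for p in ALTERNATIVES if p not in NAMED]
print(len(unnamed))  # 7
```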


Client/Server Database Systems

Distinguish and divide the functionality to be provided into two classes, server functions and client functions, giving a two-level architecture. Made popular by relational DBMS implementations.

DBMS client: user interface, application, consistency checking of queries, and caching and managing locks on cached data.

DBMS Server: handles query optimization, data access and transaction management.

Typical scenarios: multiple clients/single server; multiple clients/multiple servers (dedicated home server or any server)
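This minimal sketch of the client/server split uses Python's built-in `sqlite3` in-memory database as a stand-in for the server DBMS. A hypothetical `Client` class caches query results. Lock management on cached data is replaced by a much cruder rule, stated in the comments.

```python
import sqlite3


class Server:
    """Server side: executes SQL and returns the result relation.

    An in-memory sqlite3 database stands in for the server DBMS, which would
    handle query optimization, data access and transaction management.
    """

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")

    def execute(self, sql, params=()):
        with self.conn:  # commit on success, roll back on an exception
            return self.conn.execute(sql, params).fetchall()


class Client:
    """Client side: keeps a hypothetical cache of query results."""

    def __init__(self, server):
        self.server = server
        self.cache = {}

    def query(self, sql, params=()):
        key = (sql, tuple(params))
        if key not in self.cache:  # only go to the server on a cache miss
            self.cache[key] = self.server.execute(sql, params)
        return self.cache[key]

    def update(self, sql, params=()):
        # Crude rule: drop all cached results whenever this client writes.
        # Writes by other clients are NOT seen; real systems use locks.
        self.cache.clear()
        return self.server.execute(sql, params)


client = Client(Server())
client.update("CREATE TABLE emp (name TEXT, dept TEXT)")
client.update("INSERT INTO emp VALUES (?, ?)", ("Ann", "R&D"))
print(client.query("SELECT name, dept FROM emp"))  # [('Ann', 'R&D')]
```

The invalidate-on-own-write rule is only safe with a single client. This is why real client DBMSs manage locks on cached data, as the slide describes.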


Client/Server Reference Architecture

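Figure: the client runs the User Interface and Application Program over a Client DBMS, Communication software and the Operating System, and sends SQL Queries to the server, which returns a Result Relation. The server stack comprises Communication software, Semantic Data Controller, Query Optimizer, Transaction Manager, Recovery Manager and Runtime Support Processor over the Operating System and the Database.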


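Distributed Database Reference Architecture

Figure: external schemas ES1, ES2, ..., ESn are defined over the global conceptual schema (GCS), which maps onto local conceptual schemas LCS1, LCS2, ..., LCSn, each with its local internal schema LIS1, LIS2, ..., LISn.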


Components of Distributed DBMS

Figure: user requests enter the User Processor (User Interface Handler, Semantic Data Controller, Global Query Optimizer, Global Execution Monitor), which uses the External Schema, the Global Conceptual Schema and the global directory/dictionary (GD/D). The Data Processor (Local Query Processor, Local Recovery Manager, Runtime Support Processor) uses the Local Conceptual Schema, the System Log and the Local Internal Schema to access the Database, and system responses flow back to the user.


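MDBS Architecture With Global Schema

Figure: global external schemas GES1, GES2, GES3 are defined over a global conceptual schema (GCS), which integrates local conceptual schemas LCS1, ..., LCSn. Each local database has its local internal schema (LIS1, ..., LISn) and its own local external schemas (LES11, LES12, LES13, ..., LESn1, LESn2, LESn3).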


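MDBS Architecture without Global Schema

Figure: in the multidatabase layer, external schemas ES1, ES2, ..., ESn are defined directly over the local conceptual schemas LCS1, LCS2, ..., LCSn of the local database system layer, each with its local internal schema LIS1, LIS2, ..., LISn.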


Components of MDBS

Figure: user requests pass through a Multi-DBMS Layer to several local DBMSs, each with its own Query Processor, Transaction Manager, Scheduler, Recovery Manager and Runtime Support Processor over its Database; system responses return to the user.


Global Directory Issues

The directory is itself a database that contains meta-data about the actual data stored in the database. It provides the support for fragmentation transparency in the classical DDBMS architecture.

Directory can be local or distributed.

Directory can be replicated and/or partitioned.

Directory issues are very important for large multi-database applications, such as digital libraries.
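One way to partition and replicate a directory is to hash each relation name to a "home" directory site and keep copies on the next few sites. This minimal sketch assumes four hypothetical sites and two replicas per entry. The hashing scheme is an illustrative choice; the slides do not prescribe it.

```python
import hashlib

DIRECTORY_SITES = ["site1", "site2", "site3", "site4"]


def directory_sites(relation, replicas=2, sites=DIRECTORY_SITES):
    """Sites that hold the directory entry (meta-data) for `relation`.

    The directory is partitioned by hashing the relation name, and each
    partition is replicated on `replicas` consecutive sites.
    """
    if not 1 <= replicas <= len(sites):
        raise ValueError("replicas must be between 1 and the number of sites")
    digest = hashlib.sha256(relation.encode("utf-8")).digest()
    start = int.from_bytes(digest[:4], "big") % len(sites)
    return [sites[(start + k) % len(sites)] for k in range(replicas)]


print(directory_sites("EMP"), directory_sites("PROJ"))
```

The sketch uses `hashlib.sha256` instead of Python's built-in `hash()`. String hashing in Python is randomized per process, so different sites would otherwise disagree on where an entry lives.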


Impact of new technologies

  • Internet and WWW

    • Semi-structured data, multimedia data

    • Keyword based search - browsing versus querying

    • What does integration mean?

  • Applied technologies

    • Workflow systems

    • Data warehousing & Data mining

    • What is the role of distributed database technology?


Research Issues - DDBMS Technology

Evaluation of state-of-the-art data replication strategies.

On-line distributed relational database redesign.

Distributed object-oriented database systems - design (fragmentation, allocation), query processing (method execution, transformation), transaction processing.

WWW and Internet - transparency issues, implementation strategies (architecture, scalability), on-line transaction processing, on-line analytical processing (data warehousing, data mining), query processing (STRUDEL, WebSQL), commit protocols.


Research Issues - Applications

Workflow systems - high-throughput, short, sweet and robust processing (supply chain, Amazon, ...) versus ad hoc problem solving (office automation).

Electronic commerce - reliable high throughput, distributed transactions.

Distributed multimedia - QoS, real-time delivery, design and data allocation, MPEG-4 aspects.

