Distributed Database Systems


A Distributed Database on a Geographically Dispersed Network


A Distributed Database on a Local Network


A Multi-Processor System


Types of Accesses to a Distributed Database


Distributed Access Plan

    1) At site 1

    Send sites 2 and 3 the supplier number SN.

    2) At sites 2 and 3

    Execute in parallel, upon receipt of the supplier number, the following program:

    Find all PARTS records having SUP# = SN;

    Send result to site 1.

    3) At site 1

    Merge results from sites 2 and 3;

    Output the result.
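
The three steps of the access plan can be sketched in Python. This is only an illustration under stated assumptions: the sites are simulated by an in-memory dictionary, a thread pool stands in for parallel execution at sites 2 and 3, and the sample PARTS records and the names `PARTS_AT_SITE`, `find_parts`, and `access_plan` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical PARTS fragments held at sites 2 and 3 (sample data, not from the slides).
PARTS_AT_SITE = {
    2: [{"part": "P1", "sup": "S1"}, {"part": "P2", "sup": "S2"}],
    3: [{"part": "P3", "sup": "S1"}, {"part": "P4", "sup": "S3"}],
}


def find_parts(site, sn):
    """Step 2, run at a remote site: select PARTS records having SUP# = SN."""
    return [rec for rec in PARTS_AT_SITE[site] if rec["sup"] == sn]


def access_plan(sn, remote_sites=(2, 3)):
    """Steps 1 and 3, run at site 1."""
    # Step 1: send SN to the remote sites; step 2 then runs there in parallel.
    with ThreadPoolExecutor(max_workers=len(remote_sites)) as pool:
        partial_results = list(pool.map(lambda site: find_parts(site, sn), remote_sites))
    # Step 3: merge the partial results from the remote sites and return them.
    return [rec for part in partial_results for rec in part]


print(access_plan("S1"))
```

In a real DDBMS the calls to `find_parts` would be network messages; the thread pool only mimics the fact that both sites work at the same time.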


Components of a Commercial DDBMS


Data Distribution

Problem:

Choose a unit of the logical database to use for assignment to data modules.

Possibilities:

Relations: distribution issues will influence logical database design.

Columns: distribution issues will influence logical database design.

Rows: too many; directories become too large.

Data Items: too many; directories become too large.


Data Distribution

Fragments: logically defined rectangular subsets of relations.

[Figure: Relation 1 divided into Fragments 1, 2, and 3; Relation 2 divided into Fragments 1 and 2.]


Data Distribution

Logical definition of fragments:

Name    Age   $     Job-Title   Supervisor   Dept.
Jones   35    32K   Salesman    Black        A

[Figure: the relation is divided into Fragments 1, 2, and 3; the diagram labels include the predicates $ > 30K and $ < 30K.]
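
A fragment defined logically, as a rectangular subset of a relation, can be sketched as a row predicate plus a column list. The sketch below is an assumption-laden illustration: only the Jones row comes from the slide, the Smith row is invented sample data, and `make_fragment` and the choice of columns are hypothetical.

```python
# Hypothetical relation; only the Jones row appears on the slide.
EMPLOYEES = [
    {"name": "Jones", "age": 35, "salary": 32_000, "job_title": "Salesman",
     "supervisor": "Black", "dept": "A"},
    {"name": "Smith", "age": 41, "salary": 28_000, "job_title": "Clerk",
     "supervisor": "Black", "dept": "A"},
]


def make_fragment(relation, predicate, columns):
    """Return the rows that satisfy the predicate, restricted to the given columns."""
    return [{c: row[c] for c in columns} for row in relation if predicate(row)]


ALL_COLUMNS = list(EMPLOYEES[0])

# Fragment definitions are logical: a predicate on rows and a list of columns.
fragment_1 = make_fragment(EMPLOYEES, lambda r: r["salary"] > 30_000, ALL_COLUMNS)
fragment_2 = make_fragment(EMPLOYEES, lambda r: r["salary"] < 30_000,
                           ["name", "salary", "dept"])

print([r["name"] for r in fragment_1])
print(fragment_2)
```

Because the directory only has to record the predicate and column list for each fragment, it stays small even when the relation has many rows.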


Data Distribution

[Figure: Assignment of Fragments to Datamodules. Fragments F1, F2, and F3 of one relation and F1 and F2 of another (the relations are labeled Personnel and Inventory) are assigned to datamodules DM1, DM2, and DM3.]


Data Distribution

Advantages of fragments as units of distribution:

  • Very flexible in size and definition.

  • Distribution choices are largely independent of logical design.


System Considerations

  • Reliable Network

  • Pipelining

  • Logical Data Items

  • Database Operations: Read, Write

  • Transactions: Read Set, Write Set

  • Atomic: “All or Nothing” Effect


System Considerations (cont’d)

Each site in the DDBMS has one or both of the following software modules:

  • Transaction Manager (TM)

  • Data Manager (DM)

TMs:

  • Read, parse, and optimize user queries

  • Handle all interface with the user

DMs:

  • Maintain the physical database

  • Perform actual reads and writes


System Considerations (cont’d)

[Figure: transactions, TMs, DMs, and the data stored by the DMs.]

TMs communicate only with DMs.

DMs communicate only with TMs.


Transaction Execution

Transaction    TM’s Action

Begin          Set up temporary workspace.

Read (X)       Select a DM which stores X,
               send a message to this DM requesting X,
               place X in workspace.

Read (X)       No action necessary;
               X is already in workspace.

Write (X)      Change the value of X.

Read (X)       No action necessary.

End            Send a pre-commit to each DM that stores a copy of X,
               await acknowledgements,
               send commit message.
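
The TM actions listed above can be sketched as a small Python simulation. It is a minimal sketch under assumptions: the classes `DataManager` and `TransactionManager`, their method names, and the `messages` counter are hypothetical, messages are plain method calls, and failure handling (a DM that never acknowledges) is left out.

```python
class DataManager:
    """Stores copies of logical data items at one site."""

    def __init__(self, items):
        self.items = dict(items)
        self.pending = {}

    def read(self, name):
        return self.items[name]

    def pre_commit(self, updates):
        # Stage the new values and acknowledge.
        self.pending = dict(updates)
        return True

    def commit(self):
        # Make the staged values permanent.
        self.items.update(self.pending)
        self.pending = {}


class TransactionManager:
    def __init__(self, dms):
        self.dms = dms
        self.workspace = None
        self.written = set()
        self.messages = 0  # number of read requests sent to DMs

    def begin(self):
        self.workspace = {}  # temporary workspace
        self.written = set()

    def read(self, name):
        if name not in self.workspace:
            # Select a DM that stores the item and request it.
            dm = next(d for d in self.dms if name in d.items)
            self.messages += 1
            self.workspace[name] = dm.read(name)
        return self.workspace[name]  # later reads need no message

    def write(self, name, value):
        self.workspace[name] = value  # change the value in the workspace only
        self.written.add(name)

    def end(self):
        # Pre-commit to every DM that stores a copy of a written item.
        targets = []
        for dm in self.dms:
            updates = {n: self.workspace[n] for n in self.written if n in dm.items}
            if updates:
                targets.append((dm, updates))
        acks = [dm.pre_commit(updates) for dm, updates in targets]
        if all(acks):  # await acknowledgements, then send commit
            for dm, _ in targets:
                dm.commit()
        self.workspace = None


dm1 = DataManager({"X": 10})
dm2 = DataManager({"X": 10, "Y": 5})
tm = TransactionManager([dm1, dm2])
tm.begin()
tm.read("X")
tm.read("X")                      # served from the workspace
tm.write("X", tm.read("X") + 1)
tm.end()
print(dm1.items, dm2.items, tm.messages)
```

Both copies of X change together at `end`, which is the "all or nothing" effect the slides ask of a transaction.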


Optimal File Allocation In A Distributed Database System

  • Given a number of computers that process common information files, how can we:

    • allocate the files optimally so that the allocation yields minimum overall operating costs (storage and communication)?

    • meet access time requirements for each file?

    • not exceed the storage capacity of each computer?

      Note: A File may be viewed as a segment.


System Parameters

  • n Computers

  • m Files

    • Size of each file

    • Usage distribution for each file at each computer

    • Frequency of modification of each file at each computer during usage

    • Access time requirement for each file at each computer

  • Storage capacity of each computer

  • Cost of storage per unit file length per computer

  • Cost of transmission per unit file length per second per pair of computers


Model

COSTS

Total Cost = Storage Costs + Transmission Costs

TC = CS + CT

Transmission Costs = Costs for Retrievals + Costs for Updates

CT = CTR + CTU

CONSTRAINTS

  • Each file must be stored in at least one computer.

  • The storage capacity of each computer must not be exceeded.

  • The probability of exceeding the required access time for each file must be less than a specified bound.
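
The cost model and the first two constraints can be illustrated with a brute-force search over allocations. The slides do not give the individual cost terms, so everything below is an assumption made for illustration: retrievals are served by the cheapest site holding a copy, each update is sent to every copy, the access-time constraint is omitted, and all function and parameter names are hypothetical.

```python
from itertools import product


def total_cost(alloc, size, usage, mod_freq, store_cost, trans_cost):
    """TC = CS + CTR + CTU; alloc[i][j] is 1 if computer i stores file j."""
    n, m = len(alloc), len(size)
    cs = sum(store_cost[i] * size[j] for i in range(n) for j in range(m) if alloc[i][j])
    ctr = ctu = 0.0
    for j in range(m):
        holders = [k for k in range(n) if alloc[k][j]]
        for i in range(n):
            # Retrieval: fetch from the cheapest site holding a copy (cost 0 if local).
            ctr += usage[i][j] * size[j] * min(trans_cost[i][k] for k in holders)
            # Update: send the modification to every site holding a copy.
            ctu += usage[i][j] * mod_freq[i][j] * size[j] * sum(trans_cost[i][k] for k in holders)
    return cs + ctr + ctu


def feasible(alloc, size, capacity):
    n, m = len(alloc), len(size)
    every_file_stored = all(any(alloc[i][j] for i in range(n)) for j in range(m))
    within_capacity = all(
        sum(size[j] for j in range(m) if alloc[i][j]) <= capacity[i] for i in range(n)
    )
    return every_file_stored and within_capacity


def best_allocation(n, size, usage, mod_freq, store_cost, trans_cost, capacity):
    """Enumerate all 0/1 allocations; practical only for very small n and m."""
    m = len(size)
    best = None
    for bits in product((0, 1), repeat=n * m):
        alloc = [list(bits[i * m:(i + 1) * m]) for i in range(n)]
        if not feasible(alloc, size, capacity):
            continue
        cost = total_cost(alloc, size, usage, mod_freq, store_cost, trans_cost)
        if best is None or cost < best[0]:
            best = (cost, alloc)
    return best


# Two computers, one file; computer 0 uses the file ten times as often as computer 1.
args = dict(n=2, size=[1], usage=[[10], [1]], store_cost=[1, 1],
            trans_cost=[[0, 2], [2, 0]], capacity=[1, 1])
read_only = best_allocation(mod_freq=[[0], [0]], **args)
update_heavy = best_allocation(mod_freq=[[1.0], [1.0]], **args)
print(read_only, update_heavy)
```

With these made-up numbers, a read-only file is cheapest when replicated at both computers, while a frequently updated file is cheapest as a single copy at its heaviest user.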


Mathematical Representation Model


Transmission Paths Between Each Pair of Computers


Reliability Constraint

Assuming processors and channels each have identical reliability,

a_p = availability of the processor

a_c = availability of the channel

r_j = number of redundant copies of the jth file

A_j = availability of the jth file

A_j = a_p [1 - (1 - a_c a_p)^{r_j}]

For example, if a_p = 0.98 and a_c = 0.99, then

A_j = 0.951 for r_j = 1

A_j = 0.979 for r_j = 2
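
The worked example can be checked numerically. The sketch below reads the availability formula as A_j = a_p [1 - (1 - a_c a_p)^{r_j}], a reading that reproduces both values quoted on the slide; the function name `file_availability` is hypothetical.

```python
def file_availability(a_p, a_c, r_j):
    """Availability of a file with r_j redundant copies.

    a_p: availability of each processor, a_c: availability of each channel.
    """
    return a_p * (1 - (1 - a_c * a_p) ** r_j)


for r_j in (1, 2):
    print(r_j, round(file_availability(0.98, 0.99, r_j), 3))
```

Under this formula, adding copies can never raise A_j above a_p, so the second copy already captures most of the achievable gain in this example.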


File Directory for Distributed Databases


Overview of the Directory Manager

[Figure: Overview of the Directory Manager. Components shown: User Transaction, DDBMS, Transaction Manager, Directory Manager, Database Manager, Directory Fragment, Database, and a link to other nodes. Legend: High-Level Request, Standard Database Call, Physical Access Call, Non-Local Request.]


Content of Directory

  • Global description

  • Fragmentation description

  • Allocation description

  • Mappings to local names

  • Access method description

  • Statistics on the database

  • Consistency information


Content of a Directory System

  • Security

    • (File, User, C), where C = Read/Write, Read Only, or Write Only

  • Operation

    • Compression ratio (Logical Operation Query Data Value)

    • Query Access Optimizer

    • Statistical Data Gathering

  • Protocols

  • Logical (Dynamic)

    • File Status (R, W)

    • Number of Backlog Jobs

    • Site Availability

    • Resource Requirement

    • Processing Cost

    • Communication Cost

    • Translation Cost

  • Physical (Static)

    • Location (Site, Copy #, Disk, Page)

    • Creator

    • Creation Date

    • Version of the File Size

    • Code Format

    • Date of Last Update


The Functional Objectives of Integrated Dictionary/Directory

  • To support the control of data resources

    • Maintaining data independence, security, and integrity

  • To support applications development

    • Offering standardized data definitions and usage characteristics

    • Established program entities, DDL

  • To provide independence of directory data elements

    • Different hardware and software environments

    • Changes in these environments


Possible Data Types In IDD

  • Data names, definitions, formats and sizes.

  • Integrity constraints, authorization tables, and usage statistics for transaction management.

  • Schemas and sub-schemas.

  • Description of standardized transactions and reports.

  • Characteristics of hardware, such as processors, lines, and terminals.

  • Description of users.

  • The IDD must support the maintenance of relationships between various entities, such as associations between:

    • Authorization tables and data

    • Users and transactions

    • Reports

  • The IDD supplies version control


[Figure 1: diagram of two Entities connected by a Relationship, with six Attribute labels attached.]


[Figure 2: example with the entities Payroll Record and Social Security Number and the relationship Contains. Attribute labels shown: Comments; Entity Created 820114; Entity Created 820519; Relationship Created 820708; Maximum Length 400 Characters; Length 9 Characters.]


Table 1

Schema Model Level    Schema Level                  Dictionary Level
(typical              (typical entity-types,        (typical entities, relationships,
meta-entity-types)    relationship-types, and       and attributes)
                      attribute-types)

Entity-Type           Element                       Social-Security-Number, Agency-Name
                      Record                        Employee Record, Payroll Record
                      Document                      Form 1040, FIPS Guideline

Relationship-Type     Record-Contains-Element       Payroll-Record-Contains-Employee-Name

Attribute-Type        Length                        9 Characters
                      Creator                       ADP Division


Classes of Directory

  • Centralized Directory

    • Single Master Directory

    • Extended Centralized Directory

    • Multiple Master Directory

  • Local Directory

  • Distributed Directory


Causes For Directory Update

  • Changing the description or structure of the user database.

  • Moving user database entities from one node to another.

  • Changing the description of a user or node.

  • Changing a user view.

  • Changing a network node’s status.


Specific Drawbacks with Globally Replicated Directories

  • Additional remote activity to maintain directory coherence.

  • Difficulty of posting directory changes to a down site.

  • Difficulty of integrating a new site.

  • Storage of directory entries where they are not referenced.

  • Blurred responsibility for maintaining the directory.


Performance Measure

  • Operating Cost/Unit Time = Communication Cost (Query + Update) + Storage Cost + Code Translation Cost (Query + Update)

  • Response Time


Operating Cost for the Centralized Directory System


Cost Trade-offs of Directory Systems

Assume

  • Communication cost is much greater than storage cost

  • No translation cost

  • All computers have the same directory update rate

Then the cost trade-off point between two directory types occurs at directory update rate P, where C, EC, D, and L appear to denote the centralized, extended centralized, distributed, and local directories:

    P(C,EC) = 2/(N - 1)b

    P(C,D) = 2/(N - 1)

    P(L,D) = 1


Directory Design Alternatives

Type: Centralized
Description: Single master directory
Advantages: Simplicity; ease of update
Disadvantages: Transmission costs and delays

Type: Extended Centralized
Description: Variation of the centralized case in which the directory information is permanently appended in the local node once it is obtained from the master directory
Advantages: Reduces transmission costs and delays
Disadvantages: Coordinating updates of local directories; knowledge of appended directories

Type: Multiple Master
Description: Variation of the centralized case in which redundant copies of the master directory exist
Advantages: Reduces transmission costs and delays; fall-soft characteristics
Disadvantages: Storage requirements; coordinating update of redundant copies

Type: Distributed Master
Description: Master at every node
Advantages: Fast response
Disadvantages: Storage costs; transmission costs for updates to the directory

Type: Localized
Description: Local directory at each node without replication
Advantages: Simple update procedure
Disadvantages: Transmission costs for non-local queries
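
The extended centralized alternative, in which a node permanently appends directory information once it has been obtained from the master, can be sketched as follows. This is only an illustration: the classes `MasterDirectory` and `Node`, the `remote_lookups` counter, and the sample entry are hypothetical.

```python
class MasterDirectory:
    """Single master directory: maps an object name to the site that stores it."""

    def __init__(self, entries):
        self.entries = dict(entries)
        self.remote_lookups = 0  # each lookup stands for one network round trip

    def lookup(self, name):
        self.remote_lookups += 1
        return self.entries[name]


class Node:
    """A node that appends directory entries locally once fetched from the master."""

    def __init__(self, master):
        self.master = master
        self.local = {}

    def locate(self, name):
        if name not in self.local:
            self.local[name] = self.master.lookup(name)
        return self.local[name]


master = MasterDirectory({"PARTS": "site 2"})
node = Node(master)
first = node.locate("PARTS")
second = node.locate("PARTS")        # answered locally, no transmission
master.entries["PARTS"] = "site 3"   # the master is updated...
stale = node.locate("PARTS")         # ...but the appended local copy is not
print(first, second, stale, master.remote_lookups)
```

The stale answer at the end shows why coordinating updates of the local directories is the price of the reduced transmission cost.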


The Distributed Ingres Dictionary/Directory Contains Four Types of Data:

  • Relation name and location

  • Information for parsing queries

    (domain names, formats, etc.)

  • Performance information

    (number of tuples, storage structures, etc.)

  • Consistency information

    (protection, integrity constraints, etc. Does not include control data for concurrency control and synchronization)


SDD-1 Dictionary/Directory

The directory itself is defined and maintained like any other user data. It can be logically fragmented, distributed, and replicated across the distributed DBMS’s.

A directory locator (a small highly static file of directory fragment locations) is kept at every site and is used by the TMs and DMs to plan and control transactions and to help ensure DB integrity and consistency across concurrent accesses of data elements.

The transaction modules are capable of caching remotely accessed directory data for subsequent usage. This facility is provided on the presumption that DB operations will exhibit the locality-of-reference characteristic.


[Figure 17: Pictorial diagram showing usefulness of keys. Virtual entity Vpatient : PatientClass (name, SSN, age, patID, {report}); real collections PatientDB1 (name, SSN, age), PatientDB2 (name, SSN, patID), and PatReportDB2 (patID, report).]

Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.
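
One way to read Figure 17 is that shared keys let records from different real collections be combined into one virtual entity. The sketch below assumes SSN links PatientDB1 to PatientDB2 and patID links PatientDB2 to PatReportDB2; the sample rows and the function `build_vpatient` are hypothetical.

```python
# Hypothetical sample rows for the three real collections in Figure 17.
patient_db1 = [{"name": "Ann", "SSN": "111", "age": 40}]
patient_db2 = [{"name": "Ann", "SSN": "111", "patID": "p7"}]
pat_report_db2 = [
    {"patID": "p7", "report": "x-ray"},
    {"patID": "p7", "report": "blood test"},
]


def build_vpatient(db1, db2, reports):
    """Derive Vpatient rows (name, SSN, age, patID, {report}) by joining on keys."""
    by_ssn = {row["SSN"]: row for row in db2}
    virtual = []
    for row in db1:
        match = by_ssn.get(row["SSN"])           # SSN links DB1 and DB2
        pat_id = match["patID"] if match else None
        virtual.append({
            "name": row["name"],
            "SSN": row["SSN"],
            "age": row["age"],
            "patID": pat_id,
            # patID links DB2 to the report collection; the result is set-valued
            "report": [r["report"] for r in reports if r["patID"] == pat_id],
        })
    return virtual


vpatient = build_vpatient(patient_db1, patient_db2, pat_report_db2)
print(vpatient)
```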


[Figure 15: Pictorial diagram showing correspondence between virtual and real attributes. Virtual entity Vperson : PersonClass (name, sex, age, ssn, job), belonging to the virtual collection People; real collections personDB1 (name, sex, age, ssn) and personDB2 (name, gender, ssn, job). Conversion functions shown on the links: Character_to_String (twice) and LargePositiveInteger_to_String.]


[Figure 18: Pictorial diagram for aggregation. Virtual entities Vretiree : retireClass (name, income) and Vincome : incomeClass (stockAmount, pension); real collections financeDB1 (name, stockAmount) and financeDB2 (name, pension).]


[Figure 19: Pictorial diagram of computed attribute. Virtual entity Vname : nameClass (first, middle, last); real collection personDB1 (name); functions getfirst, getmiddle, and getlast shown on the links.]


[Figure 20: Pictorial diagram of computed attribute. Virtual entity Vretiree : retireClass (name, income); real collections financeDB1 (name, stockAmount) and financeDB2 (name, pension); the links are numbered 1 and 2.]


[Figure 21: Pictorial diagram showing grouping. Virtual entity Vinsurance : insuranceClass (name, {insuranceAmounts}); real collections carInsuranceDB1 (carOwner, amount) and houseInsuranceDB2 (houseOwner, amount).]
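
The grouping in Figure 21 can be sketched as collecting, per owner, the amounts found in both real collections into one set-valued attribute. The sample rows and the function `build_vinsurance` are hypothetical, and matching owners by name is an assumption.

```python
from collections import defaultdict

# Hypothetical sample rows for the two real collections in Figure 21.
car_insurance_db1 = [{"carOwner": "Ann", "amount": 500}]
house_insurance_db2 = [
    {"houseOwner": "Ann", "amount": 900},
    {"houseOwner": "Bob", "amount": 700},
]


def build_vinsurance(car_rows, house_rows):
    """Derive Vinsurance rows (name, {insuranceAmounts}) by grouping on owner name."""
    amounts = defaultdict(list)
    for row in car_rows:
        amounts[row["carOwner"]].append(row["amount"])
    for row in house_rows:
        amounts[row["houseOwner"]].append(row["amount"])
    return [{"name": name, "insuranceAmounts": vals} for name, vals in amounts.items()]


vinsurance = build_vinsurance(car_insurance_db1, house_insurance_db2)
print(vinsurance)
```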


[Figure 22: Pictorial diagram showing relationship. Virtual entities Vpatient : patientClass (name, {doctors}) and Vdoctors : doctorClass (name, docID, salary). Real collections labeled patientDB1 (name, docID; name, salary) and patientDB2 (name, physician). The links are labeled (key), (pointer), and relationship.]


[Figure 23: Pictorial diagram showing a named relationship. Virtual entity VtreatedBy : treatedByClass (patient, doctor, amountOwed), linked to Vpatient : PatientClass and Vdoctor : DoctorClass; real collection patientDB1 (name, docID, amountOwed), with name and docID marked as keys.]


[Figure 24: Pictorial diagram showing relationship. Virtual entities VpersonPatient : personClass (name), Vpatient : patientClass (patID, amount), VpersonDoctor : personClass (name), and Vdoctor : DoctorClass (docID, salary); real collections patientDB1 (name, SSN, payment) and doctorDB2 (name, docID, salary). A class hierarchy places patient and doctor under person, with the virtual collections VpersonPatient, Vpatient, Vdoctor, and VpersonDoctor.]


[Figure 30: Derivation of Virtual Entity Vconcept. Boxes shown: ConceptSemType (conceptID, semTypeID); Concept (conceptID, termID, stringType, stringID, stringVal); Vconcept (conceptID as key, semType, {termSet}); Vterm (termID, {stringSet}); Vstring (stringName, stringID, stringType).]

Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.


[Figure 31: Derivation of Virtual Entity VsemType. Boxes shown: SemTypeDef (ID, name, definition); DsemType (ID, name, definition, {relatedTo}); SemTypeRel (name1, rel, name2, status); DsemRelate (relName, semName, status).]

Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.

