Distributed dbmss concepts and design
Download
1 / 40

Distributed DBMSs – Concepts and Design - PowerPoint PPT Presentation


  • 160 Views
  • Uploaded on

Distributed DBMSs – Concepts and Design. Chapter 24 in Textbook. Overview. Concepts. What is a distributed DBMS? Distributed Processing. Homogeneous vs. Heterogeneous. Functions of a DDBMS. Components of a DDBMS. Advantages and Disadvantages. DDBMS Design. Fragmentation. Replication.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about ' Distributed DBMSs – Concepts and Design' - cedric-grimes


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

Overview
Overview

  • Concepts.

    • What is a distributed DBMS?

    • Distributed Processing.

    • Homogeneous vs. Heterogeneous.

  • Functions of a DDBMS.

  • Components of a DDBMS.

  • Advantages and Disadvantages.

  • DDBMS Design.

    • Fragmentation.

    • Replication.

    • Allocation.

  • DDBMS Transparencies.

  • Date’s 12 Rules for a DDBMS.


Concepts
Concepts

  • Centralized DBMS systems with a single logical database located at one site under the control of a single DBMS.

  • Distributed DBslogically interrelated collection of shared data physically distributed over a computer network.

  • Applications can be classified into:

    • Local applications.

    • Global applications.


Distributed dbms
Distributed DBMS

  • Distributed DBMS The software system that:

    • manages the distributed DBs.

    • makes distribution transparent to users.

    • allows users to access data on their own site as well as remote sites.

  • Transparent distribution is the fundamental principle of DDBMS.


Characteristics of ddbms
Characteristics of DDBMS

  • A collection of logically related shared data.

  • The data is split into a number of fragments.

  • Fragments may be replicated.

  • Fragments/replicas are allocated to sites.

  • The sites are linked by a communications networks.

  • The data at each site is under the control of a DBMS.

  • The DBMS at each site can handle local applications.

  • Each DBMS participates in at least one global application.


Distributed dbms topology
Distributed DBMS Topology

Site 1

Site 4

Site 2

Computer

Network

Site 3

Data itself is distributed and access to it can be local or remote.


Distributed processing
Distributed Processing

Site 1

Site 4

Site 2

Computer

Network

Site 3

Data itself is centralized but access to it can be local or remote.


Homogeneous vs heterogeneous ddbms
Homogeneous vs. Heterogeneous DDBMS

  • Homogenous system: all sites use the same DBMS product.

  • Heterogeneous system: sites may run different DBMS products & data model.

    • Possible differences between data in different DBS:

      • Data type difference.

      • Value difference.

      • Semantic difference.


Functions of a ddbms
Functions of a DDBMS

  • Provide access to remote sites and allow transfer of queries & data among the network’s site.

  • Store data distribution details.

  • Distributed data processing.

  • Security control.

  • Concurrency control.

  • Recovery services.


Components of a ddbms
Components of a DDBMS

Global system catalog

Site 1

DDBMS

DC LDBMS

GSC

DDBMS

DC

Computer

Network

GSC

DB

Data communication component

Site 3


Advantages of ddbms
Advantages of DDBMS

  • Reflects organizational structure.

  • Improve sharability & local autonomy.

  • Improved availability.

  • Improved reliability.

  • Improved performance.


Disadvantages of ddbms
Disadvantages of DDBMS

  • Complexity.

  • Cost.

  • Security.

  • Integrity control more difficult.

  • Lack of standards.

  • Lack of experience.

  • DB design more complex.


Distributed relational db design
Distributed Relational DB Design

  • We have a group of tables and we want to distribute them between a group of sites.

  • Consists of 3 major steps:

  • Fragmentation divide a relation into a number of sub-relations (fragments). (Horizontal & vertical).

  • Replication make a copy of a fragment.

  • Allocation decide where (which site) each of the fragments and replicas are to be stored.


Distributed relational db design1
Distributed Relational DB Design

  • When we fragment, replicate and allocate, we try to achieve:

    • Locality of reference.

    • Improved reliability and availability.

    • Good performance.

    • Balanced storage capacities and costs.

    • Minimal communication costs.


Rules of fragmentation
Rules of Fragmentation

  • Completeness: Nothing (rows or columns) gets lost while we fragment.

  • Reconstruction: We can get back the original table after we fragmented it.

  • Dis-jointness: No row or column appears in 2 fragments (there is 1 exception).


Types of fragmentation
Types of Fragmentation

Horizontal

fragmentation

Vertical

fragmentation

Mixed

fragmentation



Horizontal Fragmentation

Based on type of property.

P1: Type=‘House’ (PropertyForRent)

P2: Type=‘Flat’ (PropertyForRent)

Fragment P1

Fragment P2

BranchNo



Vertical Fragmentation

S1: staffno,Position,sex,DOB, Salary(STAFF)

S2: staffno,fname,lname,BranchNo(STAFF)

Fragment S2

Fragment S1


Mixed Fragmentation – Vertical then Horizontal

Fragment S2.1

S1: staffno,Position,sex,DOB, Salary(STAFF)

S2: staffoo,fname,lname,BranchNo(STAFF)

StaffNo

LName

BranchNo

FName

S2.1:  BranchNo=‘B005’ (S2)

S2.2:  BranchNo=‘B003’ (S2)

S2.3:  BranchNo=‘B007’ (S2)

Fragment S2.2

StaffNo

LName

BranchNo

Fragment S1

Fragment S2.3

LName

BranchNo

FName


Derived horizontal fragmentation
Derived Horizontal Fragmentation

  • Derived Horizontal Fragmentation is the horizontal fragmentation of a table (child), T1, because we horizontally fragmented another related table (parent), T2.

  • It is not explicitly specified in design but implied from fragmentation of T2.

  • T1 (child) has a foreign key that belongs to T2 (parent).

  • Relationship between T1 and T2 either 1-to-1 or Many-to-1.

  • Use Semi-join operation: 


Derived horizontal fragmentation1
Derived Horizontal Fragmentation

  • You were required by the design to horizontally fragment Staff table.

    • S1:  BranchNo=‘B003’ (Staff)

    • S2:  BranchNo=‘B005’ (Staff)

    • S3:  BranchNo=‘B007’ (Staff)


Derived horizontal fragmentation2
Derived Horizontal Fragmentation

Fragment S1

Fragment S2

Fragment S3


Derived horizontal fragmentation3
Derived Horizontal Fragmentation

  • After we fragmented Staff, we found out that there is a table related to it, PropertyForRent.

  • Because Staff is now fragmented, it makes sense to fragment PropertyForRent too.

Staff

PropertyForRent

handles

1

N

S1:  BranchNo=‘B003’ (Staff)

S2:  BranchNo=‘B005’ (Staff) Pi: PropertyForRent staffNo Si

S3:  BranchNo=‘B007’ (Staff)



Derived horizontal fragmentation4
Derived Horizontal Fragmentation

Fragment P1

Fragment P2

Fragment P3


Transparencies in a ddbms
Transparencies in a DDBMS

  • 4 main transparencies:

    • Distribution Transparency.

      • Fragmnetation.

      • Location.

      • Replication.

      • Local Mapping.

      • Naming.

    • Transaction Transparency.

    • Performance Transparency.

    • DBMS Transparency.


1 distribution transparency
1. Distribution Transparency

  • Allows the user to perceive the DB as a single, logical entity.

  • Types:

    • Fragmentation:the user does not need to know the data is fragmented.

    • Location: the user does not need to know the location of fragments.

    • Replication: the user does not need to know the fragments are replicated.

    • Local Mapping: the user specifies the fragment and its location.

    • Naming:DDBMS makes sure every item name is unique.

  • Consider the distribution of the STAFF relation:

    • S1: staffno,Position,sex,DOB, Salary(STAFF)

    • S2: staffno,fname,lname,BranchNo(STAFF)

    • S21:  BranchNo=‘B003’ (S2)

    • S22:  BranchNo=‘B005’ (S2)

    • S22:  BranchNo=‘B007’ (S2)


A fragmentation transparency
a. Fragmentation Transparency

  • Highest level of distribution transparency.

  • The user does not need to know that the data is fragmented.

  • User treats DDB like a centralized DB.

  • The database access are based on the global schema.

  • Fragmentation of the data can be changed without impacting the user.

  • Example:

    SELECTFname, Lname

    FROM Staff

    WHERE position = ‘Manager’;


B location transparency
b. Location Transparency

  • The middle level of distribution transparency.

  • The user must know that the data is fragmented but still does not need to know the location of the data.

  • Data location can be changed without impact on the user.

  • Example:

    SELECTFname, LnameFROMS21

    WHEREstaffNo IN (SELECT staffNo FROM S1 WHERE position=‘Manager’)

    UNION

    SELECTFname, LnameFROMS22

    WHEREstaffNo IN (SELECT staffNo FROM S1WHERE position=‘Manager’)

    UNION

    SELECTFname, LnameFROMS23

    WHEREstaffNo IN (SELECT staffNo FROM S1WHERE position=‘Manager’)


C replication transparency
c. Replication Transparency

  • User unaware of replication and location but knows that data is fragmented.

  • On the same level with location transparency.


D local mapping transparency
d. Local Mapping Transparency

  • The lowest level of distribution transparency.

  • The user knows that the data is fragmented and the location of the data.

    Example:

    SELECTFname, LnameFROMS21AT SITE 3

    WHEREstaffNo IN

    (SELECT staffNo FROM S1AT SITE 5 WHERE position=‘Manager’)

    UNION

    SELECTFname, LnameFROMS22AT SITE 5

    WHEREstaffNo IN

    (SELECT staffNo FROM S1AT SITE 5 WHERE position=‘Manager’)

    UNION

    SELECTFname, LnameFROMS23AT SITE 7

    WHEREstaffNo IN

    (SELECT staffNo FROM S1AT SITE 5 WHERE position=‘Manager’)


E naming transparency
e. Naming Transparency

  • Each item in distributed database must have a unique name.

  • DDBMS must ensure that no two sites violate that.

  • Solutions

    • Create a central name server.

      • Bottleneck.

      • against local autonomy.

    • Prefix an object with the identifier of the site.

      • loss of distribution transparency.


2 transaction transparency
2. Transaction Transparency

  • All transactions must ensure the consistency and integrity of the DDB.

  • Each transaction that needs to access data in multiple sites is divided into multiple sub-transactions.

  • Even if transaction is split, atomicity has to be maintained.


3 performance transparency
3. Performance Transparency

  • DDBMS performs as if it were a centralized DBMS.

  • Should not suffer because it is distributed (network communication cost).

  • When a site issues a query, the system must figure out the fastest way of executing it.

  • Distributed Query Processor (DQP) must figure out:

    • Which fragment to access.

    • Which copy of fragment to access (if replication is used).

    • Where are the fragments.


3 performance transparency1
3. Performance Transparency

  • Consider the following distributed DB:

    • Property(PropertyNo, city) 10,000 records in London

    • Client(ClientNo, maxPrice) 100,000 records in Glasgow

    • Viewing(PropertNo, ClientNo) 1,000,000 records in London

  • London site wants to list properties in Aberdeen that have been viewed by clients who have a maximum price limit greater than 200,000.

    SELECTp.propertyNo

    FROM Property P INNER JOIN

    (Client c INNER JOIN Viewing v ONc.clientNo = v.clientNo)

    ONp.propertyNo = v.propertyNo

    WHEREp.city = ‘Aberdeen’ AND

    c.maxprice > 200000;


3 performance transparency2
3. Performance Transparency

  • After the query is issued, DDBMS must determine the most cost-effective strategy to execute the query.

  • Strategies:

  • Move Client table to London and process query there.

  • Move Property and Viewing relation to Glasgow and process query there then return result.

  • Join Property and Viewing at London, project only property number and client number and move result to Glasgow to join with clients with maxPrice > 200,000 then return results.

  • Select clients at Glasgow with maxPrice> 200000, move them to London and join with viewing and Aberdeen property.


4 dbms transparency
4. DBMS Transparency

  • Hides the fact that different sites have different local DBMSs.

  • Heterogeneous DDBMSs.


Date s 12 rules for a ddbms
Date’s 12 Rules for a DDBMS

  • Local autonomy.

  • No reliance on a central site.

  • Continuous operation.

  • Location independence.

  • Fragmentation independence.

  • Replication independence.

  • Distributed query processing.

  • Distributed transaction processing.

  • Hardware independence.

  • Operating system independence.

  • Network independence.

  • Database independence.


ad