Distributed Database Systems

mjarrell

Presentation Transcript


  1. Distributed Database Systems

  2. A Distributed Database on a Geographically Dispersed Network

  3. A Distributed Database on a Local Network

  4. A Multi-Processor System

  5. Types of Accesses to a Distributed Database

  6. Distributed Access Plan
     1) At site 1: Send sites 2 and 3 the supplier number SN.
     2) At sites 2 and 3: Execute in parallel, upon receipt of the supplier number, the following program: Find all PARTS records having SUP # = SN; send result to site 1.
     3) At site 1: Merge results from sites 2 and 3; output the result.
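
     The three steps of the access plan can be sketched in Python as follows. This is only an illustrative sketch: the `SITE_PARTS` data, the record layout, and the function names are hypothetical, and a thread pool stands in for the network messages between sites.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical PARTS records stored at sites 2 and 3 (illustrative data only).
SITE_PARTS = {
    2: [{"part": "P1", "sup": "S1"}, {"part": "P2", "sup": "S2"}],
    3: [{"part": "P3", "sup": "S1"}, {"part": "P4", "sup": "S3"}],
}


def find_parts_at_site(site, sn):
    """Step 2 (runs at a remote site): find all PARTS records with SUP # = SN."""
    return [rec for rec in SITE_PARTS[site] if rec["sup"] == sn]


def run_access_plan(sn):
    """Steps 1 and 3 (run at site 1)."""
    sites = sorted(SITE_PARTS)
    with ThreadPoolExecutor(max_workers=len(sites)) as pool:
        # Step 1: send SN to sites 2 and 3; each site executes step 2 in parallel.
        partial_results = list(pool.map(lambda s: find_parts_at_site(s, sn), sites))
    # Step 3: merge the results received from both sites.
    return [rec for part in partial_results for rec in part]


if __name__ == "__main__":
    print(run_access_plan("S1"))
```

     Because `pool.map` returns results in the order of `sites`, the merged output lists the records from site 2 before those from site 3.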

  7. Components of a Commercial DDBMS

  8. Data Distribution
     Problem: Choose a unit of the logical database to use for assignment to data modules.
     Possibilities:
     • Relations – Distribution issues will influence logical database design.
     • Columns – Distribution issues will influence logical database design.
     • Rows – Too many; directories become too large.
     • Data Items – Too many; directories become too large.

  9. Data Distribution
     Fragments – Logically defined rectangular subsets of relations.
     [Figure: Relation 1 shown with Fragments 1, 2, and 3; Relation 2 shown with Fragments 1 and 2]

  10. Data Distribution
     Logical definition of fragments
     Name  | Age | $   | Job-Title | Supervisor | Dept.
     Jones | 35  | 32K | Salesman  | Black      | A
     [Figure labels: $ > 30K, Fragment 1; $ < 30K, Fragment 2; Fragment 3]
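
     A logical fragment definition of this kind can be sketched as a predicate over rows. In the minimal Python sketch below, only the Jones row and the two salary predicates ($ > 30K, $ < 30K) come from the slide; the other rows, the field names, and the `fragment` helper are hypothetical. The slide does not say where a salary of exactly 30K belongs, so such rows are reported as unassigned here.

```python
# Rows of a personnel relation. Only the Jones row is from the slide;
# the others are hypothetical illustrations.
EMPLOYEES = [
    {"name": "Jones", "age": 35, "salary": 32_000, "job_title": "Salesman",
     "supervisor": "Black", "dept": "A"},
    {"name": "Lee", "age": 28, "salary": 25_000, "job_title": "Clerk",
     "supervisor": "Black", "dept": "A"},
    {"name": "Diaz", "age": 41, "salary": 30_000, "job_title": "Buyer",
     "supervisor": "White", "dept": "B"},
]

# Fragment definitions as logical predicates on the salary column.
FRAGMENT_PREDICATES = {
    "fragment_1": lambda row: row["salary"] > 30_000,
    "fragment_2": lambda row: row["salary"] < 30_000,
}


def fragment(rows, predicates):
    """Assign each row to the first fragment whose predicate it satisfies."""
    result = {name: [] for name in predicates}
    result["unassigned"] = []
    for row in rows:
        for name, pred in predicates.items():
            if pred(row):
                result[name].append(row)
                break
        else:
            # No predicate matched (e.g. salary exactly 30K).
            result["unassigned"].append(row)
    return result


if __name__ == "__main__":
    for name, rows in fragment(EMPLOYEES, FRAGMENT_PREDICATES).items():
        print(name, [r["name"] for r in rows])
```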

  11. Data Distribution
     Assignment of Fragments to Datamodules
     [Figure: Personnel fragments F1, F2, F3 and Inventory fragments F1, F2 assigned to datamodules DM1, DM2, DM3]

  12. Data Distribution
     Advantages of fragments as units of distribution:
     • Very flexible in size and definition.
     • Distribution choices are largely independent of logical design.

  13. System Considerations
     • Reliable Network
     • Pipelining
     Logical Data Items
     Database Operations: Read, Write
     Transactions: Read Set, Write Set; Atomic – “All or Nothing” Effect

  14. System Considerations (cont’d)
     Each site in the DDBMS has one or both of the following software modules:
     • Transaction Manager (TM)
     • Data Manager (DM)
     TMs:
     • Read, parse, and optimize user queries
     • Handle all interface with the user
     DMs:
     • Maintain the physical database
     • Perform actual reads and writes

  15. System Considerations (cont’d)
     [Figure: Transactions, TMs, DMs, and Data]
     • TMs communicate only with DMs.
     • DMs communicate only with TMs.

  16. Transaction Execution
     Transaction | TM’s Action
     Begin       | Set up temporary workspace.
     Read (X)    | Select a DM which stores X; send a message to this DM requesting X; place X in workspace.
     Read (X)    | No action necessary; X is already in workspace.
     Write (X)   | Change the value of X.
     Read (X)    | No action necessary.
     End         | Send a pre-commit to each DM that stores a copy of X; await acknowledgements; send commit message.
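
     The TM actions listed on this slide can be sketched as a small in-memory simulation. The class and method names below are hypothetical, and method calls stand in for messages between a TM and the DMs. Two points are assumptions the slide does not state: Write (X) changes X in the TM's workspace, and a missing acknowledgement raises an error.

```python
class DataManager:
    """Hypothetical DM: holds data items and applies writes only on commit."""

    def __init__(self, name, items):
        self.name = name
        self.items = dict(items)
        self.pending = {}        # updates received with a pre-commit
        self.reads_served = 0    # number of read requests from TMs

    def read(self, key):
        self.reads_served += 1
        return self.items[key]

    def pre_commit(self, updates):
        self.pending = dict(updates)
        return True              # acknowledgement

    def commit(self):
        self.items.update(self.pending)
        self.pending = {}


class TransactionManager:
    """Hypothetical TM following the Begin/Read/Write/End actions."""

    def __init__(self, dms):
        self.dms = list(dms)
        self.workspace = {}
        self.written = set()

    def begin(self):
        # Begin: set up a temporary workspace.
        self.workspace = {}
        self.written = set()

    def read(self, key):
        # First read: pick a DM storing the item and request it.
        # Later reads: no action, the item is already in the workspace.
        if key not in self.workspace:
            dm = next(d for d in self.dms if key in d.items)
            self.workspace[key] = dm.read(key)
        return self.workspace[key]

    def write(self, key, value):
        # Write: change the value in the workspace only (assumption).
        self.workspace[key] = value
        self.written.add(key)

    def end(self):
        # End: pre-commit to every DM storing a copy, await acks, then commit.
        participants = []
        for dm in self.dms:
            updates = {k: self.workspace[k] for k in self.written if k in dm.items}
            if updates:
                participants.append((dm, updates))
        acks = [dm.pre_commit(updates) for dm, updates in participants]
        if not all(acks):
            raise RuntimeError("pre-commit was not acknowledged by every DM")
        for dm, _ in participants:
            dm.commit()


if __name__ == "__main__":
    dm1 = DataManager("DM1", {"X": 10})
    dm2 = DataManager("DM2", {"X": 10, "Y": 5})
    tm = TransactionManager([dm1, dm2])
    tm.begin()
    tm.read("X")
    tm.read("X")
    tm.write("X", 11)
    tm.read("X")
    tm.end()
    print(dm1.items, dm2.items)
```

     In this run only the first Read (X) reaches a DM, and both copies of X are updated at End.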

  17. Optimal File Allocation in a Distributed Database System
     Given a number of computers that process common information files, how can we:
     • allocate the files optimally so that the allocation yields minimum overall operating costs (storage and communication)?
     • meet access time requirements for each file?
     • not exceed the storage capacity of each computer?
     Note: A file may be viewed as a segment.

  18. System Parameters
     • n computers
     • m files
     • Size of each file
     • Usage distribution for each file at each computer
     • Frequency of modification of each file at each computer during usage
     • Access time requirement for each file at each computer
     • Storage capacity of each computer
     • Cost of storage per unit file length per computer
     • Cost of transmission per unit file length per second per pair of computers

  19. Model
     COSTS
     Total Cost = Storage Costs + Transmission Costs: TC = CS + CT
     Transmission Costs = Costs for Retrievals + Costs for Updates: CT = CTR + CTU
     CONSTRAINTS
     • Each file must be stored in at least one computer.
     • The storage capacity of each computer must not be exceeded.
     • The probability of exceeding the required access time for each file must be less than a specified bound.
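
     To make the cost decomposition TC = CS + CTR + CTU concrete, the sketch below searches all allocations of a tiny example by brute force. The transcript does not include the deck's mathematical formulation, so the cost formulas here are assumptions made for illustration: a retrieval fetches the file from the cheapest site holding a copy, an update is sent to every copy, and the access-time constraint is omitted. All names and numbers are hypothetical.

```python
from itertools import product


def total_cost(alloc, size, queries, updates, store_cost, trans_cost):
    """TC = CS + CTR + CTU for one allocation (assumed cost formulas).

    alloc[j] is the set of computers holding a copy of file j;
    queries[i][j] and updates[i][j] are request rates of computer i on file j.
    """
    n = len(store_cost)
    cs = sum(size[j] * store_cost[i]
             for j, sites in enumerate(alloc) for i in sites)
    # Retrievals: fetch from the cheapest copy (cost 0 if the copy is local).
    ctr = sum(queries[i][j] * size[j] * min(trans_cost[i][k] for k in sites)
              for j, sites in enumerate(alloc) for i in range(n))
    # Updates: propagate to every copy.
    ctu = sum(updates[i][j] * size[j] * sum(trans_cost[i][k] for k in sites)
              for j, sites in enumerate(alloc) for i in range(n))
    return cs + ctr + ctu


def feasible(alloc, size, capacity):
    """Constraints 1 and 2: every file stored somewhere, no capacity exceeded."""
    if any(len(sites) == 0 for sites in alloc):
        return False
    for i, cap in enumerate(capacity):
        used = sum(size[j] for j, sites in enumerate(alloc) if i in sites)
        if used > cap:
            return False
    return True


def best_allocation(size, queries, updates, store_cost, trans_cost, capacity):
    """Exhaustive search; only practical for very small n and m."""
    n, m = len(capacity), len(size)
    subsets = [frozenset(i for i in range(n) if (mask >> i) & 1)
               for mask in range(1, 2 ** n)]
    best = None
    for alloc in product(subsets, repeat=m):
        if not feasible(alloc, size, capacity):
            continue
        cost = total_cost(alloc, size, queries, updates, store_cost, trans_cost)
        if best is None or cost < best[0]:
            best = (cost, alloc)
    return best


if __name__ == "__main__":
    # Two computers, one file; computer 0 issues most of the queries.
    print(best_allocation(
        size=[1],
        queries=[[10], [1]],
        updates=[[1], [0]],
        store_cost=[2, 2],
        trans_cost=[[0, 1], [1, 0]],
        capacity=[1, 1],
    ))
```

     In this toy instance, storing the single copy at computer 0 is cheapest: replicating it saves one remote retrieval but adds a storage cost and an update transmission.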

  20. Mathematical Representation Model

  21. Transmission Paths Between Each Pair of Computers

  22. Reliability Constraint
     Assuming processors and channels each have identical reliability:
     a_p = availability of the processor
     a_c = availability of the channel
     r_j = number of redundant copies of the jth file
     A_j = availability of the jth file
     $A_j = a_p \left[ 1 - (1 - a_c a_p)^{r_j} \right]$
     For example, if a_p = 0.98 and a_c = 0.99, then A_j = 0.951 for r_j = 1 and A_j = 0.979 for r_j = 2.
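
     The example values can be checked numerically. The short sketch below reads the slide's formula as A_j = a_p [1 - (1 - a_c a_p)^r_j], with r_j as an exponent; that reading reproduces both figures quoted on the slide. The function name is hypothetical.

```python
def file_availability(a_p, a_c, r_j):
    """Availability of a file with r_j redundant copies.

    a_p: processor availability, a_c: channel availability.
    """
    return a_p * (1 - (1 - a_c * a_p) ** r_j)


if __name__ == "__main__":
    for copies in (1, 2):
        print(copies, round(file_availability(0.98, 0.99, copies), 3))
```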

  23. File Directory for Distributed Databases

  24. Overview of the Directory Manager
     [Figure. Components: User Transaction, DDBMS Transaction Manager, Directory Manager, Database Manager, Directory Fragment, Database, To Other Nodes. Legend: High-Level Request, Standard Database Call, Physical Access Call, Non-Local Request.]

  25. Content of Directory • Global description • Fragmentation description • Allocation description • Mappings to local names • Access method description • Statistics on the database • Consistency information

  26. Content of a Directory
     System:
     • Security (File, User, C); C = Read/Write; Read Only; Write Only
     • Operation Compression ratio (Logical Operation Query Data Value)
     • Query Access Optimizer
     • Statistical Data Gathering
     • Protocols
     Logical (Dynamic):
     • File Status (R, W)
     • Number of Backlog Jobs
     • Site Availability
     • Resource Requirement
     • Processing Cost
     • Communication Cost
     • Translation Cost
     Physical (Static):
     • Location (Site, Copy #, Disk, Page)
     • Creator
     • Creation Date
     • Version of the File
     • Size
     • Code Format
     • Date of Last Update

  27. The Functional Objectives of Integrated Dictionary/Directory
     • To support the control of data resources
       • Maintaining data independence, security, and integrity
     • To support applications development
       • Offering standardized data definitions and usage characteristics
       • Established program entities, DDL
     • To provide independence of directory data elements
       • Different hardware and software environments
       • Changes in these environments

  28. Possible Data Types in IDD
     • Data names, definitions, formats, and sizes.
     • Integrity constraints, authorization tables, and usage statistics for transaction management.
     • Schemas and sub-schemas.
     • Description of standardized transactions and reports.
     • Characteristics of hardware, such as processors, lines, and terminals.
     • Description of users.
     • The IDD must support the maintenance of relationships between various entities, such as associations between:
       • Authorization tables and data
       • Users and transactions
       • Reports
     • The IDD supplies version control.

  29. Figure 1 [Figure: two Entity boxes, six Attribute labels, and one Relationship]

  30. Figure 2 [Figure; surviving labels, in extraction order: Comments; Entity; Created 820114; Social Security Number; Entity; Created 820519; Maximum Length 400 Characters; Relationship; Created 820708; Payroll Record Contains; Length 9 Characters]

  31. Table 1
     Dictionary Level (Typical Entities, Relationships, and Attributes): Social-Security-Number, Agency-Name; Employee Record, Payroll Record; Form 1040, FIPS Guideline; Payroll-Record-Contains-Employee-Name; 9 Characters; ADP Division
     Schema Level (Typical Entity-Types, Relationship-Types, and Attribute-Types): Element; Record; Document; Record-Contains-Element; Length; Creator
     Schema Model Level (Typical Meta-Entity-Types): Entity-Type; Relationship-Type; Attribute-Type

  32. Classes of Directory • Centralized Directory • Single Master Directory • Extended Centralized Directory • Multiple Master Directory • Local Directory • Distributed Directory

  33. Causes For Directory Update • Changing the description or structure of the user database. • Moving user database entities from one node to another. • Changing the description of a user or node. • Changing a user view. • Changing a network node’s status.

  34. Specific Drawbacks with Globally Replicated Directories • Additional remote activity to maintain directory coherence. • Difficulty of posting directory changes to a down site. • Difficulty of integrating a new site. • Storage of directory entries where they are not referenced. • Blurred responsibility for maintaining the directory.

  35. Performance Measure
     • Operating Cost/Unit Time = Communication Cost (Query + Update) + Storage Cost + Code Translation Cost (Query + Update)
     • Response Time

  36. Operating Cost for the Centralized Directory System
