
Chapter 10




Presentation Transcript

  1. Chapter 10 Distributed Database Management Systems

  2. In this chapter, you will learn: • What a distributed database management system (DDBMS) is and what its components are • How database implementation is affected by different levels of data and process distribution • How transactions are managed in a distributed database environment • How database design is affected by the distributed database environment

  3. The Evolution of Distributed Database Management Systems • Centralized database required that corporate data be stored in a single central site • Centralized Database Management Problems: • Performance Degradation • High Costs • Reliability Problems • Dynamic business environment and centralized database’s shortcomings spawned a demand for applications based on data access from different sources at multiple locations • Distributed database management system (DDBMS) • Governs storage and processing of logically related data over interconnected computer systems in which both data and processing functions are distributed among several sites

  4. DDBMS Advantages • Data are located near “greatest demand” site • Faster data access • Faster data processing • Growth facilitation • Improved communications • Reduced operating costs • User-friendly interface • Less danger of a single-point failure • Processor independence

  5. DDBMS Disadvantages • Complexity of management and control • Security • Lack of standards • Increased storage requirements • Greater difficulty in managing the data environment • Increased training cost

  6. Distributed Processing Environment

  7. Distributed Database Environment

  8. Characteristics of Distributed Database Management Systems • Application/end user interface • Validation • Transformation • Query optimization • Mapping • I/O interface • Formatting • Security • Backup and recovery • DB administration • Concurrency control • Transaction management

  9. Characteristics of Distributed Database Management Systems (continued) • Must perform all the functions of a centralized DBMS • Must handle all necessary functions imposed by the distribution of data and processing • Must perform these additional functions transparently to the end user

  10. A Fully Distributed Database Management System

  11. DDBMS Components • Must include (at least) the following components: • Computer workstations • Network hardware and software • Communications media • Transaction processor (TP), also known as the application processor or transaction manager • Software component found in each computer that requests data • Data processor (DP), also known as the data manager • Software component residing on each computer that stores and retrieves data located at the site • May be a centralized DBMS
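The TP/DP split above can be sketched in a few lines of code. This is a hedged, minimal illustration (all class names, fragment names, and the dictionary-based catalog are hypothetical, not from any real DDBMS): the TP consults a catalog mapping fragments to sites and forwards each request to the DP that holds the data, so the end user never names a site.

```python
# Hypothetical sketch: a transaction processor (TP) routing requests
# to the data processor (DP) that stores each fragment.

class DataProcessor:
    """DP: stores and retrieves the data fragments located at its site."""
    def __init__(self, site):
        self.site = site
        self.fragments = {}          # fragment name -> list of rows

    def store(self, fragment, rows):
        self.fragments[fragment] = rows

    def fetch(self, fragment):
        return self.fragments.get(fragment, [])


class TransactionProcessor:
    """TP: receives a user's data request and forwards it to the right DP."""
    def __init__(self, catalog):
        self.catalog = catalog       # fragment name -> DP (catalog role)

    def request(self, fragment):
        dp = self.catalog[fragment]  # locate the owning site transparently
        return dp.fetch(fragment)


dp1, dp2 = DataProcessor("site1"), DataProcessor("site2")
dp1.store("CUSTOMER_EAST", [{"id": 1}])
dp2.store("CUSTOMER_WEST", [{"id": 2}])
tp = TransactionProcessor({"CUSTOMER_EAST": dp1, "CUSTOMER_WEST": dp2})
print(tp.request("CUSTOMER_WEST"))   # prints [{'id': 2}]
```

Note that the catalog dictionary here plays the role a distributed data catalog plays in a real DDBMS: it is what lets the TP hide data location from the user.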

  12. Database Systems: Levels of Data and Process Distribution

  13. Single-Site Processing, Single-Site Data (SPSD) • All processing is done on single CPU or host computer (mainframe, midrange, or PC) • All data are stored on host computer’s local disk • Processing cannot be done on end user’s side of the system • Typical of most mainframe and midrange computer DBMSs • DBMS is located on the host computer, which is accessed by dumb terminals connected to it • Also typical of the first generation of single-user microcomputer databases

  14. Single-Site Processing, Single-Site Data (Centralized)

  15. Multiple-Site Processing, Single-Site Data (MPSD) • Multiple processes run on different computers sharing a single data repository • MPSD scenario requires a network file server running conventional applications that are accessed through a LAN • Many multi-user accounting applications, running under a personal computer network, fit such a description

  16. Multiple-Site Processing, Single-Site Data

  17. Multiple-Site Processing, Multiple-Site Data (MPMD) • Fully distributed database management system with support for multiple data processors and transaction processors at multiple sites • Classified as either homogeneous or heterogeneous • Homogeneous DDBMSs • Integrate only one type of centralized DBMS over a network • Heterogeneous DDBMSs • Integrate different types of centralized DBMSs over a network • Fully heterogeneous DDBMS • Support different DBMSs that may even support different data models (relational, hierarchical, or network) running under different computer systems, such as mainframes and microcomputers

  18. Heterogeneous Distributed Database Scenario

  19. Distributed Database Transparency Features • Allow end user to feel like database’s only user • Hides complexities of distributed database • Features include: • Distribution transparency • Transaction transparency • Failure transparency • Performance transparency • Heterogeneity transparency

  20. Distribution Transparency • Allows management of a physically dispersed database as though it were a centralized database • Three levels of distribution transparency are recognized: • Fragmentation transparency • Location transparency • Local mapping transparency

  21. Distribution Transparency (continued) • Distribution transparency is supported by a distributed data dictionary (DDD) or a distributed data catalog (DDC) • The DDC contains the description of the entire database as seen by the database administrator • The database description, known as the distributed global schema, is the common database schema used by local TPs to translate user requests into subqueries

  22. Transaction Transparency • Ensures database transactions will maintain distributed database’s integrity and consistency • Completed only if all involved database sites complete their part of the transaction • Management mechanisms • Remote request • Remote transaction • Distributed transaction • Distributed request

  23. A Remote Request

  24. A Remote Transaction

  25. A Distributed Transaction

  26. A Distributed Request

  27. Another Distributed Request

  28. Distributed Concurrency Control • Multisite, multiple-process operations are more likely to create data inconsistencies and deadlocked transactions • Problem: the effect of a premature COMMIT • Transaction is committed by one local DP • Another DP cannot commit the transaction's result • Yields an inconsistent database

  29. Two-Phase Commit Protocol • The two-phase commit protocol guarantees that, if a portion of a transaction operation cannot be committed, all changes made at the other sites participating in the transaction will be undone to maintain a consistent database state • Each DP maintains its own transaction log; the two-phase commit protocol requires that each individual DP's transaction log entry be written before the database fragment is actually updated • The two-phase commit protocol requires a DO-UNDO-REDO protocol and a write-ahead protocol

  30. Two-Phase Commit Protocol (continued) • DO-UNDO-REDO protocol • Write-ahead protocol • Two kinds of nodes: • Coordinator • Subordinates • Phases: • Preparation: the coordinator sends a message to all subordinates and confirms that all are ready to commit or abort • Final commit: ensures that all subordinates have committed or aborted
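The coordinator/subordinate exchange above can be sketched as follows. This is an illustrative toy, not a production protocol (the class names, the `can_commit` flag, and the in-memory log list are all assumptions for the example): the coordinator collects prepare votes, and only a unanimous "ready" leads to the final COMMIT; any "no" vote rolls every site back, which is the DO-UNDO-REDO behavior the slide names.

```python
# Toy sketch of the two-phase commit idea (hypothetical names throughout).

class Subordinate:
    def __init__(self, name, can_commit=True):
        self.name, self.can_commit = name, can_commit
        self.state = "active"

    def prepare(self):
        # In a real DP, the write-ahead rule forces the log entry to disk
        # here, before the database fragment itself is updated.
        self.state = "ready" if self.can_commit else "abort-voted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def undo(self):
        self.state = "rolled-back"   # DO-UNDO-REDO: undo local changes


def two_phase_commit(coordinator_log, subordinates):
    # Phase 1: preparation - poll every subordinate
    votes = [s.prepare() for s in subordinates]
    if all(votes):
        # Phase 2: final commit - all sites voted "ready"
        coordinator_log.append("COMMIT")
        for s in subordinates:
            s.commit()
        return "committed"
    # Any "no" vote aborts the whole distributed transaction
    coordinator_log.append("ABORT")
    for s in subordinates:
        s.undo()
    return "aborted"


log = []
subs = [Subordinate("dp1"), Subordinate("dp2", can_commit=False)]
print(two_phase_commit(log, subs))   # prints "aborted": one site voted no
```

The key property to notice: the site that *could* have committed is rolled back too, which is exactly how the protocol maintains a consistent database state.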

  31. Performance Transparency and Query Optimization • Objective of query optimization routine is to minimize total cost associated with the execution of a request • Costs associated with a request are a function of the: • Access time (I/O) cost • Communication cost • CPU time cost
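The three cost factors above combine into a single number the optimizer minimizes. A minimal illustration (the candidate plans and all cost figures are invented for the example, not real optimizer output): the optimizer estimates I/O, communication, and CPU cost for each candidate execution plan and picks the cheapest, which here favors filtering rows at the remote site before shipping them.

```python
# Toy cost-based plan choice with made-up cost estimates.

def total_cost(plan):
    # Total cost of a request = access (I/O) + communication + CPU time
    return plan["io"] + plan["comm"] + plan["cpu"]

plans = [
    {"name": "ship whole table",   "io": 10, "comm": 500, "cpu": 5},
    {"name": "ship filtered rows", "io": 12, "comm": 40,  "cpu": 8},
]

best = min(plans, key=total_cost)
print(best["name"], total_cost(best))   # prints: ship filtered rows 60
```

Communication cost usually dominates in a distributed setting, which is why reducing the data shipped between sites matters more than a small increase in local I/O or CPU work.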

  32. Performance Transparency and Query Optimization (continued) • Must provide distribution transparency as well as replica transparency • Replica transparency: • DDBMS’s ability to hide the existence of multiple copies of data from the user • Query optimization techniques: • Manual or automatic • Static or dynamic • Statistically based or rule-based algorithms

  33. Distributed Database Design • Data fragmentation: • How to partition the database into fragments • Data replication: • Which fragments to replicate • Data allocation: • Where to locate those fragments and replicas

  34. Data Fragmentation Strategies • Horizontal fragmentation: • Division of a relation into subsets (fragments) of tuples (rows) • Vertical fragmentation: • Division of a relation into attribute (column) subsets • Mixed fragmentation: • Combination of horizontal and vertical strategies
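The three strategies above can be demonstrated on a small in-memory table. This is a sketch only (the CUSTOMER rows, column names, and fragment names are hypothetical): horizontal fragmentation selects row subsets, vertical fragmentation projects column subsets (keeping the key so the relation can be rebuilt by a join), and mixed fragmentation applies both.

```python
# Illustrative fragmentation of a tiny CUSTOMER table (made-up data).

customers = [
    {"id": 1, "name": "Ann",  "state": "TN", "balance": 120.0},
    {"id": 2, "name": "Bob",  "state": "FL", "balance":  75.5},
    {"id": 3, "name": "Carl", "state": "TN", "balance":  10.0},
]

# Horizontal fragmentation: a subset of tuples (rows), e.g. by state
frag_tn = [row for row in customers if row["state"] == "TN"]

# Vertical fragmentation: a subset of attributes (columns);
# the key ("id") is kept in every fragment for reconstruction
frag_billing = [{"id": r["id"], "balance": r["balance"]} for r in customers]

# Mixed fragmentation: horizontal first, then vertical on the result
frag_tn_billing = [{"id": r["id"], "balance": r["balance"]} for r in frag_tn]

print(len(frag_tn), frag_tn_billing)   # prints: 2 [{'id': 1, ...}, {'id': 3, ...}]
```

In a real design, each fragment would then be allocated to the site where it is most heavily used, which is the allocation question taken up a few slides later.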

  35. Data Replication • Storage of data copies at multiple sites served by a computer network • Fragment copies can be stored at several sites to serve specific information requirements • Can enhance data availability and response time • Can help to reduce communication and total query costs • Mutual Consistency Rule • Requires that all copies of data fragments be identical. • DDBMS must ensure that a database update is performed at all sites where replicas exist. • Data replication imposes additional DDBMS processing overhead.

  36. Replication Scenarios • Fully replicated database: • Stores multiple copies of each database fragment at multiple sites • Can be impractical due to amount of overhead • Partially replicated database: • Stores multiple copies of some database fragments at multiple sites • Most DDBMSs are able to handle the partially replicated database well • Unreplicated database: • Stores each database fragment at a single site • No duplicate database fragments • Factors for Data Replication Decision • Database Size • Usage Frequency • Costs

  37. Types of Data Replication • Synchronous • Asynchronous • Snapshot Replication • Changes are periodically sent to a master site, which sends an updated snapshot out to the other sites • Near Real-Time Replication • Broadcasts update orders without requiring confirmation • Pull Replication • Each site controls when it wants updates
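The pull style above can be sketched with a versioned master copy. This is a toy model under stated assumptions (the `Master`/`ReplicaSite` classes, the version counter, and whole-snapshot transfer are all invented for the example): the master accumulates updates, and each replica site decides for itself when to pull the latest snapshot, accepting staleness in between.

```python
# Hypothetical sketch of pull replication with a versioned snapshot.

class Master:
    def __init__(self):
        self.version = 0
        self.data = {}

    def update(self, key, value):
        self.data[key] = value
        self.version += 1

    def snapshot(self):
        # Hand out the current version plus a copy of the data
        return self.version, dict(self.data)


class ReplicaSite:
    def __init__(self):
        self.version = -1
        self.data = {}

    def pull(self, master):
        # The site controls when it wants updates
        self.version, self.data = master.snapshot()


master, site = Master(), ReplicaSite()
master.update("X", 10)
site.pull(master)
master.update("X", 20)          # site is now stale until its next pull
stale = site.data["X"]
site.pull(master)
print(stale, site.data["X"])    # prints: 10 20
```

The window where the site reads 10 while the master holds 20 is precisely the mutual-consistency-rule violation that synchronous replication pays extra overhead to avoid.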

  38. Data Replication: Advantages and Disadvantages • Advantages • Reliability • Fast response • May avoid complicated distributed transaction integrity routines (if replicated data are refreshed at scheduled intervals) • Decouples nodes (transactions proceed even if some nodes are down) • Disadvantages • Additional storage space • Additional time for update operations • Complexity and cost of updating • Integrity exposure: risk of getting incorrect data if replicated data are not updated simultaneously • Data volatility matters: replication suits non-volatile data better than frequently updated (volatile) data

  39. Data Allocation • Deciding where to locate data • Allocation strategies: • Centralized data allocation • Entire database is stored at one site • Partitioned data allocation • Database is divided into several disjointed parts (fragments) and stored at several sites • Replicated data allocation • Copies of one or more database fragments are stored at several sites • Data distribution over a computer network is achieved through data partition, data replication, or a combination of both

  40. Client/Server vs. DDBMS • Way in which computers interact to form a system • Features a user of resources, or a client, and a provider of resources, or a server • Can be used to implement a DBMS in which the client is the TP and the server is the DP

  41. Client/Server Computing • Key to client/server is where request processing takes place • Classification • 2-Tier • 3-Tier • 4-Tier • N-Tier • Extent of processing sharing • Thin client • Thin server • Fat client • Fat server

  42. Client/Server Advantages • Less expensive than alternate minicomputer or mainframe solutions • Allow end user to use microcomputer’s GUI, thereby improving functionality and simplicity • More people with PC skills than with mainframe skills in the job market • PC is well established in the workplace • Numerous data analysis and query tools exist to facilitate interaction with DBMSs available in the PC market • Considerable cost advantage to offloading applications development from the mainframe to powerful PCs

  43. Client/Server Disadvantages • Creates a more complex environment in which different platforms (LANs, operating systems, and so on) are often difficult to manage • An increase in the number of users and processing sites often paves the way for security problems • Spreading data access to a much wider circle of users increases the demand for people with a broad knowledge of computers and software, increases the burden of training, and raises the cost of maintaining the environment

  44. C. J. Date’s Twelve Commandments for Distributed Databases • Local site independence • Central site independence • Failure independence • Location transparency • Fragmentation transparency • Replication transparency • Distributed query processing • Distributed transaction processing • Hardware independence • Operating system independence • Network independence • Database independence
