
Distributed Databases


avedis

Presentation Transcript


  1. Distributed Databases Chapter 13

  2. Objectives • Define key terms in the distributed database area • Distributed vs. Decentralized Database • Homogeneous vs. Heterogeneous Distributed Database • Location transparency vs. local autonomy • Asynchronous vs. Synchronous distributed databases • Horizontal vs. Vertical partitioning • Full refresh vs. differential refresh • Push replication vs. Pull replication • Local transaction vs. Global Transaction • Explain business conditions driving distributed databases

  3. Objectives • Describe salient characteristics of distributed database environments • Explain advantages and risks of distributed databases • Explain strategies and options for distributed database design • Discuss synchronous and asynchronous data replication and partitioning • Discuss optimized query processing in distributed databases

  4. Distributed vs. Decentralized Database Both are stored on computers in multiple locations • Distributed Database • Geographical distribution of a SINGLE database • Decentralized Database • A collection of independent databases on non-networked computers • Users at various sites cannot share data

  5. Distributed Database • Requires multiple DBMSs running at remote sites • There are different types of distributed database environments, which differ in: • The degree to which these DBMSs cooperate • Whether a master site coordinates requests involving data from multiple sites

  6. Reasons for Distributed Database • Distribution and Autonomy of Business Units • Departments/Facilities are geographically distributed • Each has the authority to create and control its own data • Business mergers create this environment • Data sharing • Consolidate data across local databases on demand • Data communication costs and reliability • Economical and reliable to locate data where needed • High cost for remote transactions / large data volumes • Dependence on data communications can be risky

  7. Reasons for Distributed Database • Multiple application vendor environment • Each unit may have different vendor applications • A distributed DBMS can provide functionality that cuts across separate applications • Database recovery • Replicating data on separate computers may ensure that a damaged database can be quickly recovered

  8. Homogeneous vs. Heterogeneous Distributed Database • Homogeneous Distributed Database • The same DBMS is used at each node • Difficult for most organizations to force a homogeneous environment • Heterogeneous Distributed Database • Potentially different DBMSs are used at each node • Much more difficult to manage

  9. Typical Homogeneous Environment • Data distributed across all the nodes • Same DBMS at each node • A central DBMS coordinates database access and update across the nodes • No exclusively local data • All access is through one global schema • The global schema is the union of all the local schemas

  10. Figure 13-2 – Homogeneous distributed database (identical DBMSs; everyone is a GLOBAL user). Source: adapted from Bell and Grimson, 1992.

  11. Typical Heterogeneous Environment • Data distributed across all the nodes. • Different DBMSs may be used at each node. • Local access is done using the local DBMS and schema. • Remote access is done using the global schema.

  12. Figure 13-3 – Typical heterogeneous environment (non-identical DBMSs; a local user accesses his own data). Source: adapted from Bell and Grimson, 1992.

  13. Major Objectives of Distributed Database Allow users to share data yet be able to operate independently when a network link fails. • Location Transparency • User does not have to know the location of the data • Data requests are automatically forwarded to the appropriate sites • Local Autonomy • Local site can operate with its database when network connections fail • Each site controls its own data, security, logging, recovery

  14. Trade-Offs in Distributed Database When do you update data across the database? • Synchronous Distributed Database • All copies of the same data are always identical • Updates apply immediately to all copies throughout the network • Good for data integrity • High overhead → slow response times • Asynchronous Distributed Database • Some data inconsistency is tolerated • Data update propagation is delayed • Lower data integrity • Less overhead → faster response time NOTE: all this assumes replicated data (to be discussed later)
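A minimal sketch of the two update policies for replicated data. The class `ReplicatedItem` and its methods are hypothetical illustrations, not part of any DBMS: a synchronous update changes every copy before it returns, while an asynchronous update changes only the originating copy and queues the others for later propagation.

```python
from collections import deque


class ReplicatedItem:
    """Hypothetical model of one data item copied to several sites."""

    def __init__(self, sites, value):
        self.copies = {site: value for site in sites}
        self.pending = deque()  # queued (site, value) updates for async mode

    def update_sync(self, value):
        # Synchronous: every copy is changed before the update completes,
        # so all copies stay identical (at the cost of touching every site).
        for site in self.copies:
            self.copies[site] = value

    def update_async(self, origin, value):
        # Asynchronous: only the originating copy changes now; the other
        # copies are queued and may be stale until propagate() runs.
        self.copies[origin] = value
        for site in self.copies:
            if site != origin:
                self.pending.append((site, value))

    def propagate(self):
        # Apply the delayed updates, e.g. during off-peak hours.
        while self.pending:
            site, value = self.pending.popleft()
            self.copies[site] = value

    def consistent(self):
        # True when every site holds the same value.
        return len(set(self.copies.values())) == 1
```

After `update_async("NY", 90)` the item is temporarily inconsistent (other sites still hold the old value); calling `propagate()` brings every copy back in line, which is the "some data inconsistency is tolerated" trade-off in miniature.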

  15. Advantages of Distributed Database • Increased reliability and availability • Even when a component fails the database may continue to function albeit at a reduced level • Allow Local control over data. • Local control promotes data integrity and administration • Modular growth • Easy to add a connection to a new location • Less chance of disrupting existing users during expansion • Lower communication costs. • Faster response for certain queries. • Query local data • Parallel queries

  16. Disadvantages of Distributed Database • Software cost and complexity. • Processing overhead. • Data integrity exposure. • Slower response for certain queries. • If data are not distributed properly, according to their usage, or if queries are not formulated correctly, queries can be extremely slow

  17. Options for Distributing a Database • Data Replication • Horizontal Partitioning • Vertical Partitioning • Combinations of the above

  18. Data Replication • Advantages • Reliability – if one node fails, you can find data at another node • Fast response at sites that have a full copy • May avoid complicated distributed transaction integrity routines (if replicated data is refreshed at scheduled intervals) • Decouples nodes: transactions proceed even if some nodes are down • Reduced network traffic at prime time, if updates can be delayed to off-peak hours

  19. Data Replication • Disadvantages • Storage requirements • Complexity and cost of updating • Integrity exposure of getting incorrect data if replicated data is not updated simultaneously

  20. Data Replication • Best for non-volatile/static, non-collaborative data • Catalogs • Telephone directories • Train Schedules • Not good for on-line applications • Airline reservations • ATM transactions

  21. Types of Data Replication • Push Replication • Updating site sends changes to other sites • Pull Replication • Receiving sites control when update messages will be processed

  22. Types of Push Replication • Snapshot Replication • Changes periodically sent to master site • Master collects updates in log • Near Real-Time Replication • Broadcast update orders without requiring confirmation • Update messages stored in message queue until processed by receiving site
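A minimal sketch of snapshot replication driven by a change log, contrasting the full refresh and differential refresh named in the objectives. `MasterSite` and its methods are hypothetical; the sketch assumes a single replica (clearing the log after one refresh would lose changes if several replicas existed) and does not handle deletions.

```python
class MasterSite:
    """Hypothetical master site that logs changes for snapshot replication."""

    def __init__(self, table):
        self.table = dict(table)  # key -> row value
        self.log = []             # (key, value) changes since the last refresh

    def write(self, key, value):
        # Every change is applied to the master table and recorded in the log.
        self.table[key] = value
        self.log.append((key, value))

    def full_refresh(self, replica):
        # Full refresh: replace the replica with the entire table,
        # whether or not rows changed. Returns the number of rows sent.
        replica.clear()
        replica.update(self.table)
        self.log.clear()
        return len(self.table)

    def differential_refresh(self, replica):
        # Differential refresh: ship only the logged changes.
        sent = len(self.log)
        for key, value in self.log:
            replica[key] = value
        self.log.clear()
        return sent
```

With a three-row table and one changed row, a differential refresh sends 1 row while a full refresh would send all 3; the gap grows with table size, which is why complete refreshes place heavy demands on the network.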

  23. Issues in Data Replication Use • Data timeliness – high tolerance for out-of-date data may be required • DBMS capabilities – if DBMS cannot support multi-node queries, replication may be necessary • Performance implications – refreshing may cause performance problems for busy nodes • Network heterogeneity – complicates replication • Network communication capabilities – complete refreshes place heavy demand on telecommunications

  24. Horizontal Partitioning • Different rows of a table at different sites • Advantages • Data stored close to where it is used → efficiency • Local access optimization → better performance • Only relevant data is available → security • Unions across partitions → ease of query • Disadvantages • Accessing data across partitions → inconsistent access speed • If no data replication → backup vulnerability
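A minimal sketch of horizontal partitioning: whole rows are assigned to sites by a routing rule, and the logical table is recovered as the union of the fragments. The helper names, the `Customer` rows, and the region-based rule are hypothetical examples.

```python
def partition_horizontally(rows, site_of):
    """Assign each whole row to the site returned by site_of(row)."""
    partitions = {}
    for row in rows:
        partitions.setdefault(site_of(row), []).append(row)
    return partitions


def union_partitions(partitions):
    """Rebuild the logical table as the union of all horizontal fragments."""
    return [row for fragment in partitions.values() for row in fragment]


# Hypothetical Customer rows, partitioned by sales region.
customers = [
    {"CustomerID": 1, "Name": "Acme", "Region": "East"},
    {"CustomerID": 2, "Name": "Bolt", "Region": "West"},
    {"CustomerID": 3, "Name": "Crane", "Region": "East"},
]
by_region = partition_horizontally(customers, lambda row: row["Region"])
```

Because each row lands in exactly one fragment, a plain union is enough to reassemble the table; the East site can answer queries about customers 1 and 3 without contacting the West site.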

  25. Vertical Partitioning • Different columns of a table at different sites • Advantages and disadvantages are the same as for horizontal partitioning except that combining data across partitions is more difficult because it requires joins (instead of unions)
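A minimal sketch of vertical partitioning, showing why recombining fragments needs a join rather than a union: each fragment holds different columns, so the primary key must be repeated in every fragment to match the pieces back up. The helper names and the `Employee` columns are hypothetical.

```python
def partition_vertically(rows, key, column_groups):
    """Split each row's columns across sites; every fragment keeps the key."""
    return [
        {row[key]: {col: row[col] for col in cols} for row in rows}
        for cols in column_groups
    ]


def join_fragments(fragments, key):
    """Rebuild full rows by joining the fragments on the shared key."""
    # Inner join: only keys present in every fragment produce a row.
    common_keys = set(fragments[0])
    for fragment in fragments[1:]:
        common_keys &= set(fragment)
    rebuilt = []
    for k in sorted(common_keys):
        row = {key: k}
        for fragment in fragments:
            row.update(fragment[k])
        rebuilt.append(row)
    return rebuilt


# Hypothetical Employee rows: HR site stores Name/Dept, payroll site stores Salary.
employees = [
    {"EmpID": 10, "Name": "Lee", "Dept": "Sales", "Salary": 50000},
    {"EmpID": 11, "Name": "Kim", "Dept": "IT", "Salary": 62000},
]
hr, payroll = partition_vertically(employees, "EmpID", [["Name", "Dept"], ["Salary"]])
```

Any query that needs both `Name` and `Salary` must ship data between the HR and payroll sites and match rows on `EmpID`, which is the extra difficulty the slide refers to.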

  26. Five Distributed Database Organizations Based on the prior sections, a distributed database can be organized in five distinct ways: • Totally centralized at one location, accessed from many geographical sites • Replication with periodic snapshot update • Replication with near real-time synchronization of updates • Partitioned, one logical database • Partitioned, independent, nonintegrated segments

  27. Factors in Choice of Distributed Strategy No approach to data distribution is ALWAYS best • Choice depends on • Funding, autonomy, security. • Site data referencing patterns. • Growth and expansion needs. • Technological capabilities. • Costs of managing complex technologies. • Need for reliable service.

  28. Table 13-1: Distributed Design Strategies

  29. Distributed DBMS • Distributed database requires distributed DBMS • Functions of a distributed DBMS: • Locate data with a distributed data dictionary • Determine location from which to retrieve data and process query components • DBMS translation between nodes with different local DBMSs (handle heterogeneous DBMS translation using middleware) • Data consistency (via multiphase commit protocols) • Global primary key control • Scalability • Security, concurrency, query optimization, failure recovery

  30. Distributed DBMS Data Reference • Local Transaction - references local data. • Global Transaction - references non-local data.
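A minimal sketch of how a distributed data dictionary lets the DBMS classify a transaction as local or global and decide which sites to contact. The dictionary contents, site names, and `route` function are hypothetical, and the sketch assumes each table is stored at exactly one site (no replication).

```python
# Hypothetical distributed data dictionary: table name -> site that stores it.
DATA_DICTIONARY = {
    "Customer": "Chicago",
    "Order": "Chicago",
    "Product": "Dallas",
}


def route(tables, local_site):
    """Return ("local" | "global", sorted list of sites the transaction needs)."""
    sites = {DATA_DICTIONARY[table] for table in tables}
    # Local transaction: every referenced table lives at the user's own site.
    kind = "local" if sites == {local_site} else "global"
    return kind, sorted(sites)
```

A Chicago user touching `Customer` and `Order` runs a local transaction; adding `Product` makes it global and requires Dallas. The user never names a site, which is location transparency at work.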

  31. Distributed DBMS Architecture

  32. Distributed DBMS Transparency Objectives • Location Transparency • User/application does not need to know where data resides • Replication Transparency • User/application does not need to know about duplication • Failure Transparency • Either all of the actions of a transaction are committed or else none of them is committed • If a transaction fails at one site, it must not commit at other sites • A system should detect a failure (broken communication link, erroneous data, disk head crash), reconfigure the system and recover • Each site has a transaction manager • Logs transactions and before and after images • Requires special commit protocol

  33. Failure Transparency: Two-Phase Commit • Commit Protocol: Ensures that a global transaction is either successfully completed at each site or else aborted • Two-Phase Commit • Prepare Phase: Check whether the operation is OK at all participating sites • Commit Phase: Only if all participating sites agree is the commit issued
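A minimal sketch of the two-phase commit decision rule. `Participant` and `two_phase_commit` are hypothetical names; the sketch omits everything a real coordinator needs for crash safety (durable logging of votes and decisions, timeouts, recovery of in-doubt sites) and only shows the all-or-nothing outcome.

```python
class Participant:
    """Hypothetical site taking part in a global transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # simulates whether local work succeeded
        self.state = "active"

    def prepare(self):
        # Phase 1: vote yes only if the local part of the transaction can commit.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1 (prepare): ask every site for its vote.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit everywhere on a unanimous yes, otherwise abort everywhere.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"
```

A single "no" vote from any site aborts the transaction at every site, including sites that had already prepared, so no site is left with a partially committed global transaction.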

  34. Distributed DBMS Transparency Objectives • Concurrency Transparency • Allow multiple users to run transactions concurrently, with each transaction appearing as if it is the only activity in the system • Timestamping • Ensure that even if two events occur simultaneously at different sites, each will have a unique timestamp • Alternative to locks in distributed databases
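A minimal sketch of globally unique timestamps, using the common technique of pairing a site's local clock reading with its site ID so that simultaneous events at different sites never collide. `SiteClock` is hypothetical, and the sketch assumes each site's local clock never runs backwards.

```python
class SiteClock:
    """Hypothetical per-site generator of globally unique timestamps."""

    def __init__(self, site_id):
        self.site_id = site_id
        self.last_time = None
        self.seq = 0

    def timestamp(self, local_time):
        # A per-site sequence number separates events at the same site that
        # read the same clock value (assumes local_time never decreases).
        if local_time == self.last_time:
            self.seq += 1
        else:
            self.last_time, self.seq = local_time, 0
        # The site ID breaks ties between different sites, so two sites
        # reading the same clock value still produce distinct timestamps.
        return (local_time, self.seq, self.site_id)
```

Python compares these tuples element by element, so every pair of timestamps is distinct and totally ordered; a timestamp-ordering scheduler could then use that order to decide which of two conflicting transactions goes first.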

  35. Query Optimization • In a query involving a multi-site join and, possibly, a distributed database with replicated files, the distributed DBMS must decide where to access the data and how to proceed with the join. Three-step process: • Query decomposition - rewritten and simplified • Data localization - query fragmented so that fragments reference data at only one site • Global optimization - • Order in which to execute query fragments • Data movement between sites • Where parts of the query will be executed

  36. Query Optimization List the supplier numbers for Cleveland suppliers of red parts: SELECT Supplier.SupplierNum FROM Supplier, Shipment, Part WHERE Supplier.City = 'Cleveland' AND Shipment.PartNum = Part.PartNum AND Supplier.SupplierNum = Shipment.SupplierNum AND Part.Color = 'Red'
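A minimal sketch of one hand-chosen execution plan for the Cleveland/red-parts query: apply each selection at the site holding that table first, then join using only the small qualifying key sets. The sample rows are hypothetical, and this fixed plan stands in for illustration only; a real distributed optimizer compares many plans using statistics and network costs.

```python
# Hypothetical sample rows; in a real system each table could live at a different site.
supplier = [
    {"SupplierNum": "S1", "City": "Cleveland"},
    {"SupplierNum": "S2", "City": "Detroit"},
    {"SupplierNum": "S3", "City": "Cleveland"},
]
part = [
    {"PartNum": "P1", "Color": "Red"},
    {"PartNum": "P2", "Color": "Blue"},
]
shipment = [
    {"SupplierNum": "S1", "PartNum": "P1"},
    {"SupplierNum": "S2", "PartNum": "P1"},
    {"SupplierNum": "S3", "PartNum": "P2"},
]


def cleveland_red_suppliers(supplier, shipment, part):
    # Local selections: each site filters its own table, so only the
    # qualifying key sets would need to cross the network.
    cleveland = {s["SupplierNum"] for s in supplier if s["City"] == "Cleveland"}
    red = {p["PartNum"] for p in part if p["Color"] == "Red"}
    # Join step at the Shipment site: keep shipments matching both key sets.
    return sorted({
        sh["SupplierNum"]
        for sh in shipment
        if sh["SupplierNum"] in cleveland and sh["PartNum"] in red
    })
```

On these rows the result is `["S1"]`: S2 ships a red part but is not in Cleveland, and S3 is in Cleveland but ships only a blue part. Filtering before joining is what keeps the data moved between sites small.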

  37. Query Optimization

  38. Evolution of Distributed DBMS • "Unit of Work" • All of a transaction's instructions • Remote Unit of Work • SQL statements originating at one location can be executed as a single unit of work on a single remote DBMS

  39. Evolution of Distributed DBMS • Distributed Unit of Work • Different statements in a unit of work may refer to different remote sites. • All databases in a single SQL statement must be at a single site. • Distributed Request • A single SQL statement may refer to tables in more than one remote site. • Supports location transparency

  40. Distributed DBMS Vendors • Oracle • Microsoft • Informix • Sybase • IBM • Computer Associates • Ingres • Others
