
Distributed databases


Presentation Transcript


  1. Distributed databases 3

  2. Outline • generalities • objectives • problems

  3. Part 1: generalities

  4. Introduction • [Figure: applications and servers linked by a communication network; label: “DBMS in its own right”]

  5. Introduction • distributed database = collection of connected sites • each site is a DB in its own right (1) • has its own DBMS and its own users • operations can be performed locally as if the DB were not distributed • the sites collaborate (transparently from the user’s point of view) • the union of all DBs = the DB of the whole organisation (institution) • (as opposed to (1)) • physical or logical distribution • strict homogeneity (assumption)

  6. Motivation • advantages • matches the structure of the organisation • example • efficiency of processing • data is stored close to where it is used • increased accessibility • remote DBs can be accessed • disadvantage • complexity

  7. Implementations (systems) • commercial • ORACLE (Oracle Corporation) • INGRES/STAR (Ask Group Inc. Ingres Division) • DB2 (IBM) • all of them provide some features for distributed databases

  8. Fundamental principle • a distributed DB system should look to the user exactly as a non-distributed DB system

  9. Part 2: objectives

  10. Objectives • local autonomy • no reliance on central site • location independence • fragmentation independence • replication independence • distributed query processing • distributed transaction management

  11. Objectives are: • not independent of each other • not exhaustive • sometimes contradictory • of differing importance (for the user)

  12. Local autonomy • all operations at a certain site are fully controlled by that site • not achievable (why?) • therefore, autonomy should be achieved to the maximum extent possible • local data is locally owned and managed • local data belongs to the local server even if it is accessible from other servers • security, integrity, ..., are the responsibility of the local server

  13. No reliance on a central site • reasons • bottleneck • vulnerability • conclusion • all sites must be equal

  14. Location independence • users should not have to know where data is physically stored • why do you think this is needed? • think of application programs • what does this objective look like?

  15. Data fragmentation • data fragmentation • a relation can be divided into “fragments” for storage purposes • motivation: performance - data is stored where it is most used • definition • fragment = any subrelation derivable via restriction or projection

  16. Data fragmentation - example
      FRAGMENT Emp INTO
        Lo_Emp AT SITE ‘London’ WHERE Dept_id = ‘Sales’,
        Le_Emp AT SITE ‘Leeds’ WHERE Dept_id = ‘Dev’ ;
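The FRAGMENT statement above can be mimicked in a few lines of Python. This is only a sketch: the sample rows, the column names other than the department column, and the helper `restrict` are assumptions made for illustration and do not belong to any DBMS.

```python
# Hypothetical Emp relation as a list of rows (sample data invented for illustration).
emp = [
    {"emp_id": 1, "dept_id": "Sales", "name": "Ann"},
    {"emp_id": 2, "dept_id": "Dev", "name": "Bob"},
    {"emp_id": 3, "dept_id": "Sales", "name": "Cy"},
]


def restrict(relation, predicate):
    """Return the subrelation of rows satisfying the predicate (a restriction)."""
    return [row for row in relation if predicate(row)]


# Horizontal fragments, one per site, mirroring the FRAGMENT example.
fragments = {
    "London": restrict(emp, lambda r: r["dept_id"] == "Sales"),
    "Leeds": restrict(emp, lambda r: r["dept_id"] == "Dev"),
}

# The original relation is recoverable as the union of its fragments.
rebuilt = sorted(fragments["London"] + fragments["Leeds"], key=lambda r: r["emp_id"])
```

Because the two restrictions are disjoint and together cover every row, the union loses nothing and duplicates nothing; a fragment defined by projection would be rebuilt with a join instead.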

  17. Fragmentation independence / transparency • users should perceive data as if it were not fragmented • why? • it is the optimiser’s responsibility to determine which fragments need to be physically accessed • similar to views • retrieving • updating (JOIN and UNION views)
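One way to picture the optimiser's job is as a lookup against the fragment definitions. The sketch below assumes a tiny in-memory catalogue (`FRAGMENTS`) and a helper `fragments_needed`, both hypothetical; the fragment and site names follow the FRAGMENT example.

```python
# Hypothetical catalogue entry: each fragment of Emp records its site and the
# Dept_id value that defines it.
FRAGMENTS = {
    "Lo_Emp": {"site": "London", "dept_id": "Sales"},
    "Le_Emp": {"site": "Leeds", "dept_id": "Dev"},
}


def fragments_needed(dept_id=None):
    """Return the names of the fragments a query on Emp must touch.

    dept_id=None means the query does not restrict Dept_id,
    so every fragment has to be accessed.
    """
    return sorted(
        name
        for name, info in FRAGMENTS.items()
        if dept_id is None or info["dept_id"] == dept_id
    )
```

A user query that asks only for ‘Dev’ employees would be routed to Le_Emp alone (`fragments_needed("Dev")`), while the user still writes the query against Emp as a whole.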

  18. Data replication • copies of the same fragment can exist at different sites • reasons • better availability • better performance • disadvantage • update propagation

  19. Replication independence / transparency • users should not have to be aware of data replication • it is the optimiser’s responsibility to choose which replica to use • commercial systems • replication independence is not fully supported (because of update problems) - a primary copy scheme is used instead

  20. Distributed query processing • the system must have set-level operators • one record at a time - too many messages (traffic) • a relational system is therefore indicated • optimisation • particularly relevant! • find the best way to move data across the network

  21. Part 3: problems

  22. Problems • aim • minimise network utilisation • occur • due to network utilisation • query processing • catalogue management • update propagation • recovery control • concurrency control

  23. Query processing • in a distributed environment • query execution is distributed • query optimisation is distributed • global optimisation • local optimisation • example • query on relation R issued at site X • part of R, say Ry, stored at Y • part of R, say Rz, stored at Z • where is the query going to be executed?
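The question of where to execute the query can be made concrete by comparing how much data each strategy moves. The sketch below uses the sites X, Y and Z from the example; the row size, row counts and number of matching rows are invented figures, and the cost measure (bytes shipped, ignoring message counts and local processing) is a deliberate simplification.

```python
def bytes_shipped(rows, row_size):
    """Data volume moved across the network for `rows` rows."""
    return rows * row_size


# Assumed figures, invented for illustration only.
ROW_SIZE = 100                      # bytes per row
ROWS = {"Y": 10_000, "Z": 40_000}   # rows in Ry and Rz
MATCHING = {"Y": 50, "Z": 200}      # rows that satisfy the query's restriction

# Strategy A: move Ry and Rz to X and evaluate the query there.
ship_everything = sum(bytes_shipped(n, ROW_SIZE) for n in ROWS.values())

# Strategy B: evaluate the restriction at Y and Z, move only the results to X.
ship_results = sum(bytes_shipped(n, ROW_SIZE) for n in MATCHING.values())

# Pick the strategy that ships the least data.
best = min(
    ("ship everything to X", ship_everything),
    ("restrict locally, ship results", ship_results),
    key=lambda pair: pair[1],
)
```

With these numbers the second strategy ships far less data, which is the kind of decision global optimisation has to make; local optimisation then decides how each site evaluates its own part.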

  24. Catalogue management • what ‘other’ data does the catalogue include? • fragmentation, replication ... • where should the catalogue be stored? • centralised • fully replicated • loss of autonomy - update propagation! • partitioned • non-local operations - very expensive! • combination of the first and third options (centralised and partitioned)

  25. Central Catalogue • all updates, including local updates, have to be recorded in the central catalogue • disadvantages: • bottleneck • conflicts with the “no reliance on a central site” objective

  26. Fully Replicated Catalogue • the entire database catalogue (not only the local one) is stored at each site • every time an update is made, it has to be recorded at each site • disadvantages • loss of local autonomy • updates consume time and network traffic

  27. Update propagation • problems because of replication • data might become less available • primary copy scheme • one copy is designated primary copy (unique) • primary copies exist at different sites (distributed) • an update is logically complete if the primary copy has been updated • the site holding the primary copy would have to propagate the updates • violation of local autonomy
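The primary copy scheme can be sketched as a small in-memory model. The class names, the `pending` queue and the explicit `propagate` step are assumptions chosen for illustration; a real system would also deal with failures, ordering across items and network delivery.

```python
class Replica:
    """One stored copy of a data item at a given site."""

    def __init__(self, site):
        self.site = site
        self.value = None


class ReplicatedItem:
    """A data item with one designated primary copy and several secondary copies."""

    def __init__(self, primary_site, other_sites):
        self.primary = Replica(primary_site)
        self.secondaries = [Replica(site) for site in other_sites]
        self.pending = []  # updates applied to the primary but not yet propagated

    def update(self, value):
        # The update is logically complete once the primary copy is updated.
        self.primary.value = value
        self.pending.append(value)

    def propagate(self):
        # The site holding the primary copy pushes pending updates, in order,
        # to every secondary copy.
        for value in self.pending:
            for replica in self.secondaries:
                replica.value = value
        self.pending.clear()
```

For example, after `item = ReplicatedItem("London", ["Leeds"])` and `item.update(42)`, the London copy already holds 42 while the Leeds copy is still stale until `item.propagate()` runs. That dependence of Leeds on London is the autonomy violation noted on the slide.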

  28. Concurrency control • locking • overhead - increased number of messages • primary copy strategy • locking only the primary copy • the primary copy’s site will propagate the update • severe loss of autonomy • global deadlock • two interlocked sites (each waiting for the other) • cannot be detected using a local wait-for graph alone - therefore, communication overhead
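A small example shows why a global deadlock escapes local detection. The sketch assumes two hypothetical transactions T1 and T2 and two sites X and Y; each site holds only its own wait-for graph, and the helper names `has_cycle` and `merge` are illustrative.

```python
def has_cycle(graph):
    """Detect a cycle in a directed graph given as {node: set of successors}."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {}

    def visit(node):
        colour[node] = GREY  # node is on the current search path
        for nxt in graph.get(node, ()):
            state = colour.get(nxt, WHITE)
            if state == GREY or (state == WHITE and visit(nxt)):
                return True  # reached a node already on the path: cycle
        colour[node] = BLACK  # fully explored, no cycle through this node
        return False

    return any(colour.get(n, WHITE) == WHITE and visit(n) for n in list(graph))


def merge(*graphs):
    """Union several wait-for graphs into one."""
    merged = {}
    for graph in graphs:
        for node, successors in graph.items():
            merged.setdefault(node, set()).update(successors)
    return merged


# At site X, T1 waits for T2; at site Y, T2 waits for T1.
site_x = {"T1": {"T2"}}
site_y = {"T2": {"T1"}}

global_graph = merge(site_x, site_y)
```

Neither `site_x` nor `site_y` contains a cycle on its own, but `global_graph` does. Building that merged view requires the sites to exchange their wait-for information, which is the communication overhead mentioned on the slide.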

  29. Conclusion • generalities • objectives – in brief • problems – in brief
