1 / 22

Distributed Database Management System (DDBS)

Distributed Database Management System (DDBS). Motivation : Data is used at multiple distributed sites (e.g. Branch offices). Communication between sites is ----- costly ----- potentially unreliable Solution :

juana
Download Presentation

Distributed Database Management System (DDBS)

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Distributed Database Management System (DDBS) Motivation: Data is used at multiple distributed sites (e.g. Branch offices). Communication between sites is ----- costly ----- potentially unreliable Solution: ----- Allow sites to store/maintain the data they use most often/specialize in ----- Sharing with other sites/HQs if combinations of data necessary

  2. A A B B C F C E F D E D A A B F C B E D E C D F A B C F E D Network Topology Partially connected network Fully connected network Star network Tree structured network Cost/reliability #of hops Ring network

  3. Tradeoffs between • keeping data in centralized headquarters: • simpler maintenance • simpler consistency enforcement • possibly more efficient if many updates,aggregate computations 2. or distributed across branch offices: • lower communication cost • reliability • parallelism can be implemented locally

  4. Advantages of DDBS (heterogeneous) • Interconnectivity of pre-existing DBs • Expandability (don’t need to replace whole system to grow) • Cost (many small engines on PC’s cheaper than mainframes) issue:communication costs vs. hardware computation costs. • Performance(place data near where used) • Availabilityand reliability

  5. Complicating factors • Maintaining data consistency (in face of replication and sharing) • Distributed directory management (who controls mapping of data to sites) • Security • Heterogeneous Databases different database architectures

  6. Distributed Database Design Issues Options for storing a relation R across multiple sites: • Replication (maintain copies/replicas of R on multiple sites) • Fragmentation (Relation store in fragments/ pieces on multiple sites) • combination of both

  7. site2 site1 site3 site4 R1 Copy of R1 Copy of R1 REPLICATION Copy of R2 Copy of R2 R2 1 2 3 1/3 of R1 (Horizontal) 4 5 6 1/3 of R1 FRAGMENTATION 7 8 9 1/3 of R1 C/D A/B ½ Of R2 ½ Of R2 (Vertical)

  8. Replication Non redundant allocation • Issues: (whole database replication v.s. no replication) - what to replicate?(all relations or only frequently user shared data) - where to replicate? (function of communication costs,usage needs,resources) - which relations to replicate? -” primary copy” of relation (simplifies consistency enforcement, but where located?)

  9. Replication (cont) • Advantages: • Improved availability (multiple sources for a relation if a site is down) • Increased parallelism(sites can process (primarily) read- only operations in parallel,minimizing data transfer) • (well suited for read-only, majority read-only data access)

  10. Disadvantages: problems/overhead for writes/updates costs of consistency enforcement - updates propagated to all sites (communication costs) - costs of synchronization/locking for consistency enforcement on update greater than in single source models. Complicates concurrency and recover Replication inefficient in databases with frequent updates Replication (cont)

  11. FRAGMENTATION • Vertical • Horizontal • mixel Issues: - completeness: Every tuple/attribute in some fragment - reconstruction:easy way of reconstructing full relation - transparency

  12. A1 A2 ………. An T1 T2 T3 . .T60 A1 A2 ………. An T1 T2 T3 . .T60 T61 . . Tn 1 1 1 2 2 3 3 3 Site 1 A1 A2 ………. An T61 . . Tn HORIZONTAL FRAGMENTATION Original relation Site 2 • Fragments contain subsets of complete tuples (all attributes at all sites) • How to reconstruct • R=Rs1 Rs2 ……. Rsn

  13. Horizontal Fragmentation • Example Usefulness: - Each branch office maintains complete attribute set of its employees (salary,benefits,address/phone,departments,projects,etc.) - Site of Fragment easily determined by a key attribute value -e.g. Branch_office*

  14. How to Reconstruct: R=Rs1 Rs2 Rsn TID –Tuple ID Hidden Attribute to ensure account and simple join reconstruction TID TID 1 2 n 1 2 n RS1.TID=RS2.TID Join condition VERTICAL FRAGMENTATION A1 A2 A3 A4 Original Relation (R) t1 t2 tn A1 A2 A3 A4 RS2 RS1 t1 t2 tn t1 t2 tn SITE1 SITE2

  15. VERTICAL FRAGMENTATION Example usefulness: Salary Office Benefits Office Directory (Name|address|phone|fax) Dependents Management Office each control their own appropriate attribute for all corporate branch offices VERTICAL–Attribute-centered management (keep all instances of an attribute in one place) HORIZONTAL – tuple/individual-centered management (keep all values of a tuple in one place)

  16. MIXED FRAGMENTATION Rs1 A4 A5 A1 A2 A3 Rs3 usa R A1 A2 A3 A4 A5 Rs2 A1 A2 A3 Europe A4 A5 (Salary Attributes) (Benefit Attributes) Rs4

  17. REPLICATION and FRAGMENTATION Partition of Attributes/tuples need not be disjoint A1 A2 A3 A4 A5 A1 A2 A3 A4 A2 A3 A4 A5 Overlap (replication of attributes)

  18. Unique name TRANSPARENCY Fragmentation Transparency -User doesn’t need to know mapping between relations and fragmented subrelations Replication Transparency -User doesn’t need to know about existence or location of other copies (treat as if single copy of DB) Location and Naming Transparency -Use shouldn’t need to know about location and full names of data on the server Salary(ssn=so(Employee)) Site27, Employee. Fragment3. Replica7 Name Server Proper site, Fragment, replica for this data access

  19. =Emp1 U Emp2 QUERY PROCESSING IN DDMS Horizontal fragmentations Issues1: Parallel Processing across Fragments LName(salary>40,000(Employee)) LName(salary>40,000(Emp1))ULName(salary>40,000(Emp2)) 2 Fragments Site 1 Site 2 Execution in Parallel on fragments and union results together

  20. Site1 Site2 Site3 50K 1K 3K (A B) C A (B C) Joins- symmetric and associative Parallel Processing (xx(A)) (B C) 1K 3K 0.5K 50K 0.5K 0.5K

  21. QUERY PROCESSING IN DDBS Join Strategies R=Fnames, Cnames, Dnames(Employee Department) Strategies: 1)Ship both relations to the result site and join there 2)Ship employee to 2, join at 2, results to 3 3)Ship Department to 1, join at 1, results to 3  minimize total communication cost of data transfer Site 3 100 records, 2000 bytes Site 1 10,000 records, 1,000,000 bytes Mg rssn to ssn Site 2 100 records, 3000 bytes 1,003,000 bytes transfered 1,002,000 bytes transfered 5,000 bytes transfered

  22. RECOVERY IN DDBS -transaction managers / coordinators -log managers Problems: -failure of site -failure of link -loss of messages if server is down, elect new server what about network partitioning? Difficult to know which had occurred Original Server Newly elected Server Server’s link

More Related