1 / 44

Distributed Database Management Systems

Distributed Database Management Systems. Database Management System (DBMS). Collection of interrelated data and set of programs to access the data Convenient and efficient processing of data Database Application Software. Abstraction. View level : different perspectives

clayv
Download Presentation

Distributed Database Management Systems

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Distributed Database Management Systems

  2. Database Management System (DBMS) • Collection of • interrelated data and • set of programs to access the data • Convenient and efficient processing of data • Database Application Software CSCE 824

  3. Abstraction • View level: different perspectives • application programs hide irrelevant data • Logical level: data models • Logical representation of data • Different approaches: relational, hierarchical, network, object oriented, semi-structured, etc. • Data independence principle • Physical level: how data is stored CSCE 824

  4. Motivation for DBMS • Integrate related data • Provide centralized and controlled access to data • Guarantee database properties • Reduce database development efforts CSCE 824

  5. Data Models • A collection of tools for describing • Data • Relationships among data items • Semantics of stored data • Database constraints • Entity-Relational Model • UML • Etc. CSCE 824

  6. Database Management Systems • Smaller and smaller systems • Past: large and expensive DBMS • Present: DBMS in most personal computers • More and more data stored • Past: few MB • Present: terabyte (1012 bytes), petabyte (1015 bytes) • Functionality: from physical to view level • Optimization CSCE 824

  7. Data Definition Language (DDL) • Defines the database schema and constraints • DDL compiler data dictionary • Metadata – data about data CSCE 824

  8. Modeling data semantics CSCE 824

  9. Example E/R Diagram Breed Name Name License # Dog Boards Kennel Age Address Owns Phone Weight Pays Owner Name Phone CSCE 824

  10. Converting ER Model into Relations Dog CSCE 824

  11. Data Manipulation Language (DML) • Accessing and manipulating the data • Query Languages • Procedural – user specifies what data is required and how to get those data • Nonprocedural – user specifies what data is required without specifying how to get those data CSCE 824

  12. Query Languages • Relational Algebra • Set operations • SQL • Bag operations CSCE 824

  13. Structured Query LanguageSQL • Typical SQL query form:SELECT A1, A2, ..., AnFROMr1, r2, ..., rmWHERE C • Ais represent attributes to be returned • ris represent relations • C is a condition CSCE 824

  14. Computer Network • Distributed processing: • Number of autonomous processing elements that are interconnected by computer network • Cooperate to perform their assigned tasks CSCE 824

  15. Why to distribute? • Intuition • Reliability • Performance CSCE 824

  16. What to distribute? • Processing logic/element • Functions • Data • Control of execution CSCE 824

  17. Distributed Database Systems • Distributed database: • Collection of multiple, logically interrelated databases that are distributed over a computer network • Distributed DBMS: software system that • Permits the management of the distributed database and • Makes the distribution transparent to the user CSCE 824

  18. DDBMS Services • Transparent data management • Distributed, replicated data • Transparency: network, replica, fragmentation • Reliable access to data • Distributed transactions • Failure atomicity • Improved performance • Flexible expansion CSCE 824

  19. Difficulties • Everything that is present in traditional DBs • Fragmentation and replica control • Data retrieval • Data update • Dealing with failures • Synchronization CSCE 824

  20. Distributed Database Design CSCE 824

  21. Design Issues • Placing of data and programs (DBMS and application) • Network issues CSCE 824

  22. Level of Sharing • No sharing • Data sharing • Data and program sharing Heterogeneous environment! CSCE 824

  23. Replicated Databases

  24. Why Replication? • System availability • Performance • Scalability • Application requirements CSCE 824 - Spring 2011

  25. Risk of Replication • Worse performance: updates must be applied to all replicas and synchronized • Worse availability: some algorithms require multiple replicas to be operational for any of them to be used CSCE 824 - Spring 2011

  26. Replicated Databases • Multiple copies of the same data items (databases) • Consistency: • Local consistency • Mutual consistency CSCE 824 - Spring 2011

  27. Transaction Correctness • 2-Phase Locking – serializability • 2-Phase Commit – reliability • Replica control – mutual consistency • Database design: local vs. global transactions • Database consistency: strong consistency vs. weak consistency • Location of updates: master vs. distributed • Update propagation: eager vs. lazy • Degree of transparency: limited vs. full CSCE 824 - Spring 2011

  28. Mutual Consistency vs. Transaction Consistency • Transaction consistency: global serializability • Mutual consistency: replicas having the same values • Strong: all replicas have the same value at the end of the execution of an update transaction • Quorum: a quorum of replicas have the same value • Weak: eventually the values of all replicas become identical CSCE 824 - Spring 2011

  29. Replica Control • Hides replication from transaction • Knows location of all replicas • Translates transaction’s request to access an item into request to access particular replica(s) • Maintains some form of mutual consistency CSCE 824 - Spring 2011 29

  30. One-Copy Serializability (1SR) • Extension of the serializability theory • Effects of transactions on replicated data items should be the same as if they had been performed one at-a-time on a single set of date items CSCE 824 - Spring 2011

  31. x1 Transaction x2 x3 Example Replication Issues • May reduce performance (complex operations) • Too expensive • Can’t control when replicas are updated CSCE 824 - Spring 2011 31 7/22/99

  32. Replica Control • Pessimistic replica control: at most one group can make an update – mutual consistency at all times • Optimistic replica control: system must be available at all times. Correct if there is any violation of mutual consistency CSCE 824 - Spring 2011

  33. Read One / Write All Replica Control • Pessimistic approach • Read the nearest replica • Write all replicas • Synchronous : before transaction commits • Asynchronous case: eventually • Advantage: • Mutual consistency • Performance benefits: reads transactions • Disadvantage: availability is not always guaranteed • E.g., Primary site approach CSCE 824 - Spring 2011 33

  34. Primary Site – static • Primary site: most recent copy • What happens if the network is partitioned? 2 DB0 1 Primary DB3 DB1 DB2 DB6 DB5 DB4 CSCE 824 - Spring 2011

  35. Majority Approach • The group that contains the majority of the sites can process an update DB0 1 DB3 DB1 DB2 DB6 DB5 DB4 CSCE 824 - Spring 2011

  36. Majority Approach • The group that contains the majority of the sites can process an update 2 DB0 (N+1)/2 1 DB3 DB1 DB2 DB6 DB5 DB4 Farkas CSCE 824 - Spring 2011 CSCE 824 - Spring 2011 36

  37. Majority Approach • Advantages: more flexible than primary site • Disadvantages: zero availability may still happen • Who has the most recent copy? • Version number: • Each site assigns a version number to the copy (initially VN=0) • After an update, the VN is incremented by 1 CSCE 824 - Spring 2011

  38. Quorum Consensus • Each sites are not equal • Special case of majority approach W=5 DB0 W=3 W=2 DB3 DB1 W=1 W=1 DB2 DB6 DB5 DB4 W=1 W=15 CSCE 824 - Spring 2011

  39. Other Approaches • Dynamic Linear: order sites linearly to calculate majority • Token-based primary site (moving token): change the location of the primary site CSCE 824 - Spring 2011

  40. Pessimistic Replica Control • Advantages: • Mutual consistency at all times • Know the latest version ( between two consecutive updates, there is a site in common) • Disadvantage: • May result in zero availability CSCE 824 - Spring 2011

  41. Optimistic Replica Control • Goal: availability at all time • Issues: consistency may not be guaranteed • Need an algorithm to detect whether an inconsistency occurred • Take actions to fix any inconsistencies CSCE 824 - Spring 2011

  42. Example Optimistic Alg. • Two partitions P1, P2 • Assumption: separately, P1 and P2 produces serializable histories • Need: after P1 and P2 joins again: Detect which transactions violate global serializability CSCE 824 - Spring 2011

  43. Summary • Correctness: If the transactions are ACID, local execution in serializable, distributed transactions are reliable, and update replication is synchronous then distributed transactions are globally atomic & serializable • Performance: • Applications: transactions are not always serializable (e.g., WS-transactions) • Replication: update propagation is not always asynchronous • Compensating transactions CSCE 824 - Spring 2011

  44. Next Class Hadoop Architecture A read-only replication CSCE 824 - Spring 2011

More Related