ics 214b transaction processing and distributed data management n.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
ICS 214B: Transaction Processing and Distributed Data Management PowerPoint Presentation
Download Presentation
ICS 214B: Transaction Processing and Distributed Data Management

Loading in 2 Seconds...

play fullscreen
1 / 56

ICS 214B: Transaction Processing and Distributed Data Management - PowerPoint PPT Presentation


  • 135 Views
  • Uploaded on

ICS 214B: Transaction Processing and Distributed Data Management. Lecture 8: Distributed Database Systems Professor Chen Li. Application SQL Front End Query Processor Transaction Proc. File Access. So far: Centralized DB systems. P. Software:. M. Simplifications:

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

ICS 214B: Transaction Processing and Distributed Data Management


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
    Presentation Transcript
    1. ICS 214B: Transaction Processing and Distributed Data Management Lecture 8: Distributed Database Systems Professor Chen Li

    2. Application SQL Front End Query Processor Transaction Proc. File Access So far: Centralized DB systems P Software: M ... • Simplifications: • single front end • one place to keep locks • if processor fails, system fails, ... Notes 08

    3. Next: distributed database systems • Multiple processors ( + memories) • Heterogeneity and autonomy of “components” Notes 08

    4. Why do we need Distributed Databases? • Example: Big Corp. has offices in London, New York, and Hong Kong. • Employee data: • EMP(ENO, NAME, TITLE, SALARY, …) • Where should the employee data table reside? Notes 08

    5. Big Corp. Data Access Pattern • Mostly, employee data is managed at the office where the employee works • E.g., payroll, benefits, hire and fire • Periodically, Big Corp needs consolidated access to employee data • E.g., Big Corp. changes benefit plans and that affects all employees. • E.g., Annual bonus depends on global net profit. Notes 08

    6. Internet New York Payroll app London Payroll app EMP London New York Hong Kong Payroll app Problem: NY and HK payroll apps run very slowly! Hong Kong Notes 08

    7. Internet New York Payroll app London Payroll app London Emp NY Emp London New York Hong Kong Payroll app Much better!! Hong Kong HK Emp Notes 08

    8. Internet New York Payroll app London Payroll app Annual Bonus app London Emp NY Emp London New York Hong Kong Payroll app Distribution provides opportunities for parallel execution Hong Kong HK Emp Notes 08

    9. Internet New York Payroll app London Payroll app Annual Bonus app London Emp NY Emp London New York Hong Kong Payroll app Hong Kong HK Emp Notes 08

    10. Internet New York Payroll app London Payroll app Annual Bonus app Lon, NY Emp NY, HK Emp London New York Hong Kong Payroll app Replication improves availability Hong Kong HK, Lon Emp Notes 08

    11. Heterogeneity and Autonomy Application Stock ticker tape Files RDBMS Portfolio History of dividends, ratios,... Notes 08

    12. We will study data management with multiple processors and possible autonomy, heterogeneity • Impact on: • Data organization • Query processing • Access structures • Concurrency control • Recovery Notes 08

    13. We also study transaction monitors • Coordinate transaction execution • Multiple DBMSs • High performance • Have workflow facilities • Manage communications with client “terminals” Notes 08

    14. DB architectures (1) Shared memory P P P ... M Notes 08

    15. P P P M DB architectures (2) Shared disk ... M M ... Notes 08

    16. P P P M M M DB architectures (3) Shared nothing ... Notes 08

    17. ... ... P P P P P P M M DB architectures (4) Hybrid example – Hierarchical or Clustered Notes 08

    18. Issues for selecting architecture • Reliability • Scalability • Geographic distribution of data • Data “clusters” • Performance • Cost Notes 08

    19. Parallel or distributed DB system? • More similarities than differences! Notes 08

    20. Typically, parallel DBs: • Fast interconnect • Homogeneous software • High performance is goal • Transparency is goal Notes 08

    21. Typically, distributed DBs: • Geographically distributed • Data sharing is goal (may run into heterogeneity, autonomy) • Disconnected operation possible Notes 08

    22. Distributed Database Challenges • Distributed Database Design • Deciding what data goes where • Depends on data access patterns of major applications • Two subproblems: • Fragmentation: partition tables into fragments • Allocation: allocate fragments to nodes Notes 08

    23. Distributed Database Challenges • Distributed Query Processing • Centralized query plan goal: minimize number of disk I/Os • Additional factors in distributed scenario: • Communication costs • Opportunity for parallelism • Space of possible query plans is much larger! Notes 08

    24. Distributed Database Challenges • Distributed Concurrency Control • Transactions span nodes • Must be globally serializable • Two main approaches: • Locking • Timestamps • Distributed Deadlock Management • Multiple data copies – need to be kept in sync when updates occur Notes 08

    25. Distributed Database Challenges • Reliability of Distributed Databases • Centralized database failure model: • processor fails • Distributed database failure model: • One or more processors may fail • Network may fail • Network may be partitioned • Data must be kept in sync Notes 08

    26.  To illustrate synchronization problems: “Two Generals” Problem Notes 08

    27. The one general problem (Trivial!)  Battlefield G Troops Notes 08

    28. The two general problem: <-------------------------------> Blue army Red army Enemy Blue G Red G messengers Notes 08

    29. Rules: • Blue and red army must attack at same time • Blue and red generals synchronize through messengers • Messengers can be lost Notes 08

    30. Distributed Database Challenges • Heterogeneity Application Stock ticker tape Files RDBMS Portfolio History of dividends, ratios,... Notes 08

    31. Distributed Database Challenges • Autonomy Example: unable to get statistics for query optimization Example: blue general may have mind of his (or her) own! Notes 08

    32. Next Topic • Distributed DB Design Notes 08

    33. Distributed DB Design Top-down approach: - have DB… - how to split and allocate the sites Bottom-up approach: - multi-database (possibly heterogeneous, autonomous) - no design issues! Notes 08

    34. Two issues in DDB design: • Fragmentation • Allocation Note: issues not independent, but will cover separately Notes 08

    35. Motivation: Two sites: Sa, Sb Qa  Qb Sa Sb Employee relation E (#,name,loc,sal,…) 40% of queries: 40% of queries: Qa: select * Qb: select * from E from E where loc=Sa where loc=Sb and… and ... Notes 08

    36. 5 Joe Sa 10 # NM Loc Sal E 7 Sally Sb 25 8 Tom Sa 15 .. .. F # NM Loc Sal # NM Loc Sal 5 Joe Sa 10 7 Sally Sb 25 .. 8 Tom Sa 15 .. At Sb At Sa Notes 08

    37. F = { F1, F2 } F1 = loc=Sa(E) F2 = loc=Sb(E)  called primary horizontal fragmentation Notes 08

    38. Fragmentation • Horizontal Primary depends on local attributes R Derived depends on foreign relation • Vertical R Notes 08

    39. Used mostly in parallel dbs Used in parallel dbs and distributed dbs Three common horizontal fragmentation techniques • Round robin • Hash partitioning • Range partitioning Notes 08

    40. Round robin R D0 D1 D2 t1 t1 t2 t2 t3 t3 t4 t4 ... t5 • Evenly distributes data • Good for scanning full relation • Not good for point or range queries • Not suitable for databases distributed over WAN Notes 08

    41. Hash partitioning R D0 D1 D2 t1h(k1)=2 t1 t2h(k2)=0 t2 t3h(k3)=0 t3 t4h(k4)=1 t4 ... • Good for point queries on key; also for joins on key • Not good for range queries; point queries not on key • If hash function good, even distribution • Not suitable for databases distributed over a WAN Notes 08

    42. Range partitioning partitioning vector R D0 D1 D2 t1: A=5 t1 t2: A=8 t2 t3: A=2 t3 t4: A=3 t4 ... 4 7 V0 V1 • Good for point queries on A; also for joins on A • Good for some range queries on A • Need to select good vector: else unbalanced • data skew, execution skew Notes 08

    43. Which are good fragmentations? Example: F = { F1, F2 } F1 = sal<10 E F2 = sal>20 E  Problem: Some tuples lost! Notes 08

    44. Which are good fragmentations? Second example: F = { F3, F4 } F3 = sal<10 E F4 = sal>5 E  Tuples with 5 < sal < 10 are duplicated... Notes 08

    45. Better design Example: F = { F5, F6, F7 } F5 = sal  5 E F6 = 5<sal<10 EF7 = sal  10 E  Then replicate F6 if convenient (part of allocation problem) Notes 08

    46. Desired properties for fragmentation R F = {F1, F2, …, Fn} • Completeness • For every data item x  R,  FiFsuch that xFi • Disjointness • xFi,  Fj such that xFj, i  j • Reconstruction • There is function g such that R = g(F1, F2, …, Fn) Notes 08

    47. Fi FiF Desired properties for horizontal fragmentation R F = {F1, F2, …, Fn} • Completeness • For every tuple tR,  FiFsuch that tFi • Disjointness • tFi,  Fj such that tFj, i  j • Reconstruction – can safely ignore • Completeness  R = Notes 08

    48. How do we get completeness and disjointness? (1) Check it “manually”! e.g., F1 = sal<10 E ; F2 = sal10 E Notes 08

    49. How do we get completeness and disjointness? (2) “Automatically” generate fragments with these properties • Horizontal fragments are defined by selection predicates • Generate a set of selection predicates with the desired properties Notes 08

    50. Example of generation • Say queries use predicates: A<10, A>5, Loc = SA, Loc = SB • Next: - generate “minterm” predicates - eliminate useless ones • Given simple predicates Pr= { p1, p2,.. pn } • minterm predicates are of the form • p1*  p2*  …  pn* • where pk* is pk or is ¬pk Notes 08