
High-Availability LH* Schemes with Mirroring


Presentation Transcript


  1. High-Availability LH* Schemes with Mirroring W. Litwin, M.-A. Neimat U. Paris 9 & HPL Palo-Alto litwin@cid5.etud.dauphine.fr

  2. LH* with mirroring • A Scalable Distributed Data Structure • Data are in the distributed RAM of the server nodes of a multicomputer • Uses mirroring to survive: • any single-node failure • most multiple-node failures • Moderate performance deterioration with respect to basic LH*

  3. Plan • Introduction • multicomputers & SDDSs • need for high availability • Principles of LH* with mirroring • Design issues • Performance • Conclusion

  4. Multicomputers • A collection of loosely coupled computers • common and/or preexisting hardware • shared-nothing architecture • message passing through a high-speed net • Network multicomputers • use general-purpose nets • LANs: Ethernet, Token Ring, Fast Ethernet, SCI, FDDI... • WANs: ATM... • Switched multicomputers • use a bus • e.g., Transputer & Parsytec

  5. Network multicomputer (figure: servers and clients connected by the network)

  6. Why multicomputers ? • Potentially unbeatable price-performance ratio • Much cheaper and more powerful than supercomputers • 1500 WSs at HPL with 500+ GB of RAM & TBs of disks • Potential computing power • file size • access and processing time • throughput • For more pros & cons: • NOW project (UC Berkeley) • Tanenbaum: "Distributed Operating Systems", Prentice Hall, 1995

  7. Why SDDSs • Multicomputers need data structures and file systems • Trivial extensions of traditional structures are not best • hot-spots • scalability • parallel queries • distributed and autonomous clients

  8. What is an SDDS • A scalable data structure where: • Data are on servers • always available for access • Queries come from autonomous clients • available for access only on their own initiative • There is no centralized directory • Clients sometimes make addressing errors • Clients have a more or less adequate image of the actual file structure • Servers are able to forward the queries to the correct address • perhaps in several messages • Servers send Image Adjustment Messages (IAMs) • Clients do not make the same error twice

  9. An SDDS (figure: servers holding the file buckets)

  10. An SDDS: growth through splits under inserts (figure: servers)

  11. An SDDS: growth through splits under inserts (figure: servers)

  12. An SDDS (figure: servers and clients)

  13. An SDDS (figure: clients)

  14. An SDDS (figure: clients)

  15. An SDDS (figure: clients; an IAM is sent)

  16. An SDDS (figure: clients)

  17. An SDDS (figure: clients)

  18. Known SDDSs (figure: taxonomy of data structures) • Classics: hashing, 1-d trees, k-d trees • SDDSs (1993): LH* schemes, DDH, Breitbart & al., RP* schemes, k-RP* schemes (Kroll & Widmayer)

  19. Known SDDSs (same taxonomy, with a "You are here" marker on the LH* schemes)

  20. LH* (A classic) • Allows for key-based hash files • generalizes the LH addressing scheme • Load factor 70 - 90 % • At most 2 forwarding messages • regardless of the size of the file • In practice, 1 message/insert and 2 messages/search on average • 4 messages in the worst case • Search time of about a ms (10 Mb/s net) and of µs (Gb/s net)
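
The addressing rule behind slide 20 can be sketched in a few lines. This is a minimal illustration rather than the authors' code, assuming the usual linear-hashing family h_i(key) = key mod 2^i and a client image made of a presumed level i' and split pointer n'; all names are hypothetical.

```python
# Minimal sketch of LH*-style client addressing (hypothetical names),
# assuming h_i(key) = key mod 2**i and a client image (i', n').

def h(i: int, key: int) -> int:
    """Linear-hashing function h_i: maps a key to a bucket in [0, 2**i)."""
    return key % (2 ** i)

def client_address(key: int, i_prime: int, n_prime: int) -> int:
    """Bucket number the client sends its query to, computed from its image."""
    a = h(i_prime, key)
    if a < n_prime:              # the image says this bucket has already split
        a = h(i_prime + 1, key)
    return a

# Example: with image (i'=3, n'=2), key 10 maps to h_3(10) = 2,
# which is not below n', so the query goes to bucket 2.
```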

  21. 10,000 inserts (figure: global cost vs. client's cost)

  22. High-availability LH* schemes • In a large multicomputer, it is unlikely that all servers are up • Assume the probability that a bucket is up is 99 % • the bucket is then unavailable about 3 days per year • Every key is stored in 1 bucket only • the case of typical SDDSs, LH* included • The probability that an n-bucket file is entirely up is • 37 % for n = 100 • 0 % for n = 1000

  23. High-availability LH* schemes • Using 2 buckets to store a key, one may expect: • 99 % for n = 100 • 91 % for n = 1000 • High-availability SDDSs • make sense • are the only way to make large SDDS files reliable
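
A quick back-of-the-envelope check of the figures on slides 22-23, under the stated assumption that every bucket is up independently with probability p = 0.99:

```python
# Availability of an n-bucket file, each bucket up independently with p = 0.99.
p = 0.99
for n in (100, 1000):
    single   = p ** n                    # every key has exactly one copy
    mirrored = (1 - (1 - p) ** 2) ** n   # a key survives if either copy is up
    print(f"n={n}: one copy {single:.0%}, two copies {mirrored:.1%}")

# Roughly: n=100  -> 37% vs 99.0%
#          n=1000 ->  0% vs 90.5% (the slide rounds this to 91%)
```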

  24. High-availability LH* schemes • High-availability LH* schemes keep data available despite server failures • any single server failure • most two-server failures • some catastrophic failures • Three types of schemes are currently known • with mirroring • with striping • with grouping

  25. LH* with Mirroring • There are two files called mirrors • Every insert propagates to both • the propagation is done by the servers • Splits are autonomous • Every search is directed towards one of the mirrors • the primary mirror for the corresponding client • If a bucket failure is detected, the spare is produced instantly at some site • the storage for the failed bucket is reclaimed • it can be allocated to another bucket
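
As a rough sketch of the insert path on slide 25 (hypothetical names, not the paper's algorithm): the client inserts into its primary mirror only, and the receiving server propagates the record to the corresponding bucket of the other mirror.

```python
# Hypothetical server-side propagation of inserts between the two mirrors.
class MirroredBucket:
    def __init__(self, bucket_no):
        self.bucket_no = bucket_no
        self.records = {}           # key -> record, kept in distributed RAM
        self.mirror_peer = None     # bucket holding the same keys in the other file

    def insert(self, key, record, propagate=True):
        self.records[key] = record
        if propagate and self.mirror_peer is not None:
            # the servers, not the client, send the one extra message (cf. slide 37)
            self.mirror_peer.insert(key, record, propagate=False)

# Wiring two SA-mirrored buckets together:
b1, b2 = MirroredBucket(3), MirroredBucket(3)
b1.mirror_peer, b2.mirror_peer = b2, b1
b1.insert(42, "some record")        # the record lands in both b1 and b2
```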

  26. Basic configuration • Protection against a catastrophic failure (figure: the two mirrors)

  27. High-availability LH* schemes • Two types of LH* schemes with mirroring exist • Structurally-alike (SA) mirrors • same file parameters • keys are presumably in the same buckets • Structurally-dissimilar (SD) mirrors • keys are presumably in different buckets • loosely coupled = same LH-functions hi • minimally coupled = different LH-functions hi
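
A small illustration of the coupling notions on slide 27, with hypothetical hash functions: SA-mirrors and loosely-coupled SD-mirrors use the same LH-functions h_i, so a key presumably gets the same bucket number in both files, while minimally-coupled SD-mirrors use different functions, so the bucket numbers usually differ.

```python
# Same vs. different LH-functions for the two mirrors (illustrative only).
def h(i, key):            # shared family: SA-mirrors and loosely coupled SD-mirrors
    return key % (2 ** i)

def h_alt(i, key):        # a different family, for minimally coupled SD-mirrors
    return (key ^ 0x9E3779B9) % (2 ** i)

key, i = 12345, 4
print(h(i, key), h(i, key))      # same bucket number in both mirrors
print(h(i, key), h_alt(i, key))  # usually different bucket numbers
```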

  28. SA-Mirrors (figure)

  29. SA-Mirrors: new forwarding paths (figure)

  30. Failure management • A bucket failure can be discovered • by the client • by the forwarding or mirroring server • by the LH* split coordinator • The failure discovery triggers the instant creation of a spare bucket • a copy of the failed bucket constructed from the mirror file • from one or more buckets

  31. Spare creation • The spare creation process is managed by the coordinator • choice of the node for the spare • transfer of the records from the mirror file • the algorithm is in the paper • propagation of the spare node address to the node of the failed bucket • when the node recovers, it contacts the coordinator
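
A sketch of the spare-creation steps listed on slide 31 (the actual algorithm is in the paper; every name below is hypothetical), for the SA-mirror case where the spare can be rebuilt from the mirror bucket with the same number:

```python
# Hypothetical, simplified spare creation: each file is a dict mapping
# bucket number -> {key: record}, and 'spares' records where spares live.
def create_spare(mirror_file, failed_bucket_no, spare_site, spares):
    # 1. choice of the node for the spare (here: the caller names a site)
    # 2. transfer of the records from the mirror bucket with the same number
    spares[failed_bucket_no] = (spare_site, dict(mirror_file[failed_bucket_no]))
    # 3. the failed bucket's node learns the spare's address from the
    #    coordinator when it recovers (represented here by the 'spares' table)
    return spares[failed_bucket_no]

# Example: bucket 3 fails; rebuild it on "node-17" from the mirror.
mirror = {3: {42: "record-42", 57: "record-57"}}
spares = {}
create_spare(mirror, 3, spare_site="node-17", spares=spares)
```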

  32. And the client ? • The client can be unaware of the failure • it may then send the message to the failed node • which perhaps has recovered and now holds another bucket n' • Problem • bucket n' should recognize the addressing error • and forward the query to the spare • a case that did not exist for basic LH*

  33. Solution • Every client sends with the query Q the address n of the bucket Q should reach • if n <> n', then bucket n' resends the query to bucket n • which must be the spare • Bucket n sends an IAM to the client to adjust its allocation table • a new kind of IAM
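
A sketch of the slide-33 rule with hypothetical names: the query carries the bucket number n the client expects; a node that now holds some other bucket n' re-sends the query to the spare, and bucket n answers together with the new kind of IAM so the client can fix its allocation table.

```python
# Hypothetical handling of a query that carries the expected bucket number n.
class BucketServer:
    def __init__(self, bucket_no, spares):
        self.bucket_no = bucket_no    # n': the bucket this node currently holds
        self.records = {}
        self.spares = spares          # bucket number -> server holding the spare

    def handle(self, key, expected_n, forwarded=False):
        if expected_n != self.bucket_no:        # n <> n': addressing error
            spare = self.spares[expected_n]     # that server must be the spare
            return spare.handle(key, expected_n, forwarded=True)
        iam = self if forwarded else None       # new kind of IAM: tells the client
        return self.records.get(key), iam       # where bucket n really lives now

# Example: the client still addresses the recovered node, which now holds n' = 5.
spare_for_3 = BucketServer(3, spares={})
recovered = BucketServer(5, spares={3: spare_for_3})
spare_for_3.records[42] = "record-42"
print(recovered.handle(42, expected_n=3))   # -> ('record-42', <the spare server>)
```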

  34. SA / SD mirrors (figure: SA-mirrors vs. SD-mirrors)

  35. Comparison • SA-mirrors • most efficient for access and spare production • but max loss in the case of two-bucket failure • Loosely-coupled SD-mirrors • less efficient for access and spare production • lesser loss of data for a two-bucket failure • Minimally-coupled SD-mirrors • least efficient for access and spare production • min. loss for a two-bucket failure

  36. Conclusion • LH* with mirroring is the first SDDS designed for high availability • for large multicomputer files • for high-availability DBs • avoids creating fragment replicas • Variants adapted to the importance of different kinds of failures • How important is a multiple-bucket failure ?

  37. Price to pay • Moderate access performance deterioration as compared to basic LH* • an additional message to the mirror per insert • a few messages when failures occur • Double storage for the file • can be a drawback

  38. Future directions • Implementation • Performance analysis • in presence of failures • Concurrency & transaction management • Other high-availability schemes • RAID-like

  39. The End. Thank you for your attention. W. Litwin litwin@cid5.etud.dauphine.fr
