
  1. Scalable Distributed Data Structures & High-Performance Computing. Witold Litwin, Fethi Bennour. CERIA, University Paris 9 Dauphine. http://ceria.dauphine.fr/

  2. Plan • Multicomputers for HPC • What are SDDSs ? • Overview of LH* • Implementation under SDDS-2000 • Conclusion

  3. Multicomputers • A collection of loosely coupled computers • Mass-produced and/or preexisting hardware • share-nothing architecture • Best for HPC because of scalability • message passing through a high-speed net (100 Mb/s or more) • Network multicomputers • use general-purpose nets & PCs • LANs: Fast Ethernet, Token Ring, SCI, FDDI, Myrinet, ATM… • NCSA cluster: 1024 NTs on Myrinet by the end of 1999 • Switched multicomputers • use a bus or a switch • IBM-SP2, Parsytec...

  4. Why Multicomputers ? • Unbeatable price-performance ratio for HPC. • Cheaper and more powerful than supercomputers. • especially the network multicomputers. • Available everywhere. • Computing power. • file size, access and processing times, throughput... • For more pros & cons: • IBM SP2 and GPFS literature. • Tanenbaum: "Distributed Operating Systems", Prentice Hall, 1995. • NOW project (UC Berkeley). • Bill Gates at Microsoft Scalability Day, May 1997. • www.microsoft.com White Papers from the Business Systems Div. • Report to the President, President's Inf. Techn. Adv. Comm., Aug 98.

  5. Typical Network Multicomputer (diagram: client and server machines interconnected by a network)

  6. Why SDDSs • Multicomputers need data structures and file systems • Trivial extensions of traditional structures are not the best • hot-spots • scalability • parallel queries • distributed and autonomous clients • distributed RAM & distance to data • For a CPU, data on a disk are as far away as the Moon is for a human (J. Gray, ACM Turing Award 1999)

  7. What is an SDDS ? • Data are structured • records with keys / objects with OIDs • more semantics than in the Unix flat-file model • the abstraction most popular with applications • parallel scans & function shipping • Data are on servers • waiting for access • Overflowing servers split into new servers • appended to the file without informing the clients • Queries come from multiple autonomous clients • Access initiators • No synchronous updates of the client images • No centralized directory for access computations

  8. What is an SDDS ? • Clients can make addressing errors • Clients have a more or less adequate image of the actual file structure • Servers are able to forward the queries to the correct address • perhaps in several messages • Servers may send Image Adjustment Messages (IAMs) • Clients do not make the same error twice • Servers support parallel scans • sent out by multicast or unicast • with deterministic or probabilistic termination • See the SDDS talk & papers for more • ceria.dauphine.fr/witold.html • or the LH* ACM-TODS paper (Dec. 96)

  9. High-Availability SDDS • A server can be unavailable for access without service interruption • Data are reconstructed from other servers • Data and parity servers • Up to k ≥ 1 servers can fail • At a parity overhead cost of about 1/k • The factor k can itself scale with the file • Scalable-availability SDDSs

  10.-13. An SDDS: growth through splits under inserts (animated diagram over four slides: clients insert records into server buckets; an overflowing bucket splits and a new server bucket is appended to the file)

  14.-18. An SDDS: client access (animated diagram over five slides: a client addresses a server using its image of the file; the request may be forwarded to the correct server, which returns an IAM to the client)

  19.-24. Known SDDSs (taxonomy diagram built up over six slides; the complete version on slide 24):
  • DS: Classics vs. SDDS (1993)
  • SDDS / Hash: LH*, DDH, Breitbart & al
    - LH* variants: H-Avail. (LH*m, LH*g), s-availability (LH*SA, LH*RS), Security (LH*s), Disk (SDLSA)
  • SDDS / 1-d tree: RP* (Kroll & Widmayer, Breitbart & Vingralek)
  • SDDS / m-d trees: k-RP*, dPi-tree, Nardelli-tree
  • Bibliography: http://192.134.119.81/SDDS-bibliograhie.html

  25. LH* (A classic) • Scalable distributed hash partitioning • generalizes the LH addressing scheme • variants used in Netscape products, LH-Server, Unify, Frontpage, IIS, MsExchange... • Typical load factor 70 - 90 % • In practice, at most 2 forwarding messages • regardless of the size of the file • In general, 1 message per insert and 2 messages per search on average • 4 messages in the worst case

  26. LH* bucket servers • For every record c, its correct address a results from the LH addressing rule: a ← hi(c) ; if n = 0 then exit else if a < n then a ← hi+1(c) ; end • (i, n) = the file state, known only to the LH*-coordinator • Each server a keeps track only of the function hj used to access it: j = i or j = i+1
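
A minimal sketch of this addressing rule, assuming hi(c) = c mod 2^i (a file that started with a single bucket); FileState and lhAddress are illustrative names, not SDDS-2000 identifiers.

```cpp
// LH addressing rule: a <- h_i(c); if a < n then a <- h_{i+1}(c).
#include <cstdint>

using Key = uint64_t;

// h_i(c) = c mod 2^i (assumption: one initial bucket)
inline uint64_t h(int i, Key c) { return c % (1ULL << i); }

// File state (i, n), known only to the LH*-coordinator.
struct FileState { int i = 0; uint64_t n = 0; };

// Correct bucket address a for key c under the file state.
uint64_t lhAddress(const FileState& s, Key c) {
    uint64_t a = h(s.i, c);
    if (s.n == 0) return a;            // no bucket has split in this round yet
    if (a < s.n) a = h(s.i + 1, c);    // bucket a already split: rehash with h_{i+1}
    return a;
}
```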

  27. LH* clients • Each client uses the LH rule for address computation, but with the client image (i', n') of the file state. • Initially, for a new client, (i', n') = (0, 0).
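
The client applies the same rule to its image; a sketch continuing the one above (h and Key as defined there):

```cpp
// Client image (i', n') of the file state; a new client starts with (0, 0).
struct ClientImage { int i = 0; uint64_t n = 0; };

// Client-side address computation: the LH rule applied to the image, so the
// result may be outdated; the contacted server forwards the request if needed.
uint64_t clientAddress(const ClientImage& img, Key c) {
    uint64_t a = h(img.i, c);
    if (a < img.n) a = h(img.i + 1, c);
    return a;
}
```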

  28. LH* Server Address Verification and Forwarding • Server a getting key c (a = m in particular) computes: a' := hj(c) ; if a' = a then accept c ; else a'' := hj-1(c) ; if a'' > a and a'' < a' then a' := a'' ; send c to bucket a' ;
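
A sketch of this verification step, with h as in the earlier sketch; the return value is the bucket the key goes to next (equal to a when the server accepts it). This illustrates the published algorithm, not the SDDS-2000 code.

```cpp
// Server a (which keeps only its own level j) verifies the address of key c.
uint64_t verifyAndForward(uint64_t a, int j, Key c) {
    uint64_t a1 = h(j, c);             // a' := h_j(c)
    if (a1 == a) return a;             // correct address: accept c here
    uint64_t a2 = h(j - 1, c);         // a'' := h_{j-1}(c)
    if (a2 > a && a2 < a1) a1 = a2;    // keeps the forwarding chain short
    return a1;                         // send c to bucket a'
}
```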

  29. Client Image Adjustment • The IAM consists of the address a where the client sent c and of j(a): if j > i' then i' := j - 1, n' := a + 1 ; if n' ≥ 2^i' then n' := 0, i' := i' + 1 ; • The rule guarantees that the client image stays within the file • provided there are no file contractions (merges)
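
A sketch of this adjustment applied to the ClientImage defined above; adjustImage is an illustrative name, and the fields (a, j) are taken from the IAM.

```cpp
// Apply the IAM (a, j) received from a server to the client image (i', n').
void adjustImage(ClientImage& img, uint64_t a, int j) {
    if (j > img.i) { img.i = j - 1; img.n = a + 1; }
    if (img.n >= (1ULL << img.i)) { img.n = 0; img.i += 1; }   // n' >= 2^i'
}
```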

  30.-31. LH* : file structure (diagram: server buckets 0, 1, 2, 7, 8, 9 with levels j = 4, 4, 3, 3, 4, 4; the coordinator holds the file state n = 2, i = 3; two clients hold the images (i' = 0, n' = 0) and (i' = 2, n' = 3))

  32.-34. LH* : split (diagram: the coordinator triggers a split; a new bucket 10 with j = 4 is appended, bucket 2 moves to j = 4, and the file state becomes n = 3, i = 3; the client images are unchanged)

  35.-37. LH* : addressing (diagram: a client with image (i' = 0, n' = 0) sends key 15 to bucket 0; the request is forwarded to the correct bucket 7; an IAM (a = 7, j = 3) adjusts the client image to (i' = 3, n' = 0))

  38.-41. LH* : addressing (diagram: a client sends key 9; the request is forwarded to the correct bucket 9; the IAM (a = 9, j = 4) leaves the sending client with the image (i' = 3, n' = 1))

  42. Result • The distributed file can grow even to the whole Internet, so that: • every insert and search is done in at most four messages (IAM included) • in general, an insert is done in one message and a search in two messages

  43. SDDS-2000: Prototype Implementation of LH* and of RP* on a Wintel multicomputer • Client/Server architecture • TCP/IP communication (UDP and TCP) with Windows Sockets • Multiple-thread control • Process synchronization (mutex, critical section, event, time-out, etc.) • Queuing system • Optional flow control for UDP messaging

  44. SDDS-2000: Client Architecture • Send Request • Receive Response • Return Response • Client Image process (diagram: the application passes requests (Id_Req, Id_App, ...) through the Applications - SDDS interface; the client keeps a queuing system, a client image (i, n) with a server address file, and a network socket to send requests to and receive responses from the servers)
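
A sketch of the request path described above, tying together the earlier addressing and image-adjustment sketches; Response, the send and receive placeholders, and clientRequest are assumptions for illustration, not the SDDS-2000 interface.

```cpp
// Hypothetical response layout: the result plus an optional IAM (a, j).
struct Response { bool hasIAM = false; uint64_t a = 0; int j = 0; /* result ... */ };

// One client request: compute the address from the image, send it over UDP,
// wait for the response, and adjust the image if an IAM came back.
void clientRequest(ClientImage& img, Key c) {
    uint64_t addr = clientAddress(img, c);      // possibly outdated address
    // sendTo(addr, c);                         // UDP send      (placeholder)
    Response r;                                 // r = receive(); (placeholder)
    if (r.hasIAM) adjustImage(img, r.a, r.j);   // never repeat the same error
}
```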

  45. SDDS-2000: Server Architecture • Listen Thread • Queuing system • Work Thread • Local process • Forward • Response (diagram: the listen thread receives client requests from the network socket and queues them; work threads W.Thread 1 ... W.Thread 4 analyse each request, perform the local operation on the SDDS bucket (insertion, search, update, delete) or forward it, and return the response)
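
A sketch of this listen-thread / work-thread queuing scheme, using portable C++ threads rather than the Win32 primitives of SDDS-2000; all names are illustrative.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

struct Request { int idReq; std::string payload; };

std::queue<Request> pending;            // queuing system shared by the threads
std::mutex mtx;
std::condition_variable notEmpty;

// Listen thread: receives requests from the network socket (stubbed here)
// and places them in the queue.
void listenThread() {
    for (int id = 0; ; ++id) {          // in SDDS-2000 this is a UDP receive loop
        { std::lock_guard<std::mutex> lk(mtx); pending.push({id, "..."}); }
        notEmpty.notify_one();
    }
}

// Work thread: takes a request, analyses it, performs the local bucket
// operation (insert / search / update / delete) or forwards it, then responds.
void workThread() {
    for (;;) {
        std::unique_lock<std::mutex> lk(mtx);
        notEmpty.wait(lk, [] { return !pending.empty(); });
        Request r = pending.front(); pending.pop();
        lk.unlock();
        // process r locally or forward it, then return the response to the client
    }
}

int main() {
    std::thread listener(listenThread);
    std::vector<std::thread> workers;
    for (int k = 0; k < 4; ++k) workers.emplace_back(workThread);   // W.Thread 1..4
    listener.join();
    for (auto& w : workers) w.join();   // a real server runs until shut down
}
```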

  46. LH*LH: RAM buckets (diagram: inside an LH* bucket, a dynamic array indexed 0 ... 9 heads LH buckets of records kept in RAM; each record carries a link to the next record, -1 marking the end of a chain)
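
A sketch of one way to hold such a RAM bucket, reading the diagram as a dynamic array of chain heads over records kept in main memory; RamBucket and its members are illustrative names, not the LH*LH implementation.

```cpp
// LH* bucket kept in RAM as a dynamic array whose entries head chains of
// records (the -1 link of the diagram corresponds to a null pointer here).
#include <cstdint>
#include <string>
#include <vector>

struct Record { uint64_t key; std::string data; Record* next = nullptr; };

class RamBucket {
    std::vector<Record*> heads;                    // dynamic array, e.g. slots 0..9
public:
    explicit RamBucket(std::size_t slots = 10) : heads(slots, nullptr) {}

    void insert(uint64_t key, std::string data) {  // local, LH-style hashing
        std::size_t s = key % heads.size();
        heads[s] = new Record{key, std::move(data), heads[s]};
    }

    const Record* search(uint64_t key) const {
        for (const Record* r = heads[key % heads.size()]; r; r = r->next)
            if (r->key == key) return r;
        return nullptr;
    }
};
```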

  47. Measuring conditions • LAN of 4 computers interconnected by 100 Mb/s Ethernet • F.S: Fast Server: Pentium II 350 MHz & 128 MB RAM • F.C: Fast Client: Pentium II 350 MHz & 128 MB RAM • S.C: Slow Client: Pentium I 90 MHz & 48 MB RAM • S.S: Slow Server: Pentium I 90 MHz & 48 MB RAM • The measurements result from 10,000 records & more. • UDP protocol for insertions and searches • TCP protocol for splitting

  48. Best performance of a F.S (diagram: three slow clients S.C (1), S.C (2), S.C (3) access one fast server F.S holding bucket 0 with j = 0, over 100 Mb/s Ethernet, using UDP communication)

  49. Fast Server: Average Insert time • Inserts without ack • 3 clients create lost messages • Best time: 0.44 ms

  50. Fast Server: Average Search time • The measured time includes the search process + the response return • With more than 3 clients, many messages are lost • Whatever the bucket capacity (1,000, 5,000, ..., 20,000 records), 0.66 ms is the best time
