
Madeleine

Madeleine. Olivier Aumage, Runtime Project, INRIA – LaBRI, Bordeaux, France.


Presentation Transcript


  1. Madeleine. Olivier Aumage, Runtime Project, INRIA – LaBRI, Bordeaux, France

  2. Objective: Rational task assignment in high-performance communication stacks [Diagram labels: Application, Model, Programming environment, Abstraction, Middle-level interface, Software stack, Hardware control, Low-level interface, Network]

  3. Madeleine: A communication support for clusters and multi-clusters

  4. Features Abstract interface • Programming by contract • Specification of constraints • Freedom for optimization • Active software support • Dynamic optimization • Adaptivity • Transparency

  5. Interface Definitions • Connection • Uni-directional point-to-point link • FIFO ordering • Channel • Graph of connections • Multiplexing unit • Network virtualization [Diagram labels: Process, Connection, Channel]

  6. Communication model Characteristics • Model • Message passing • Incremental message building • Expressiveness • Control of data blocks by flags • Contract between the programmer and the interface [Diagram label: Express]

  7. Primitives Main commands • Send • mad_begin_packing • mad_pack • … • mad_pack • mad_end_packing • Receive • mad_begin_unpacking • mad_unpack • … • mad_unpack • mad_end_unpacking
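The send and receive sequences listed above can be sketched with a toy, in-memory model. This is only an illustration in Python, not Madeleine's C API: the class `ToyConnection` and its method signatures are hypothetical, and only the command names (begin/pack/end on the send side, begin/unpack/end on the receive side) come from the slide.

```python
from collections import deque


class ToyConnection:
    """Hypothetical in-memory stand-in for a uni-directional FIFO connection."""

    def __init__(self):
        self._wire = deque()    # completed messages, oldest first
        self._outgoing = None   # blocks of the message being built
        self._incoming = None   # blocks of the message being read

    # Send side: a message is built incrementally, one pack per block.
    def begin_packing(self):
        self._outgoing = []

    def pack(self, data: bytes):
        self._outgoing.append(bytes(data))

    def end_packing(self):
        self._wire.append(self._outgoing)
        self._outgoing = None

    # Receive side: blocks are extracted in the order they were packed.
    def begin_unpacking(self):
        self._incoming = deque(self._wire.popleft())

    def unpack(self, length: int) -> bytes:
        block = self._incoming.popleft()
        if len(block) != length:
            raise ValueError("unpack length does not match the packed block")
        return block

    def end_unpacking(self):
        if self._incoming:
            raise ValueError("message has blocks that were never unpacked")
        self._incoming = None


cnx = ToyConnection()

cnx.begin_packing()
cnx.pack(b"head")
cnx.pack(b"payload")
cnx.end_packing()

cnx.begin_unpacking()
header = cnx.unpack(4)
body = cnx.unpack(7)
cnx.end_unpacking()
```

The receiver issues one unpack per pack, in the same order, which is the symmetry the interface relies on.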

  8. Message building • Commands • mad_pack(cnx, buffer, len, pack_mode, unpack_mode) • mad_unpack(cnx, buffer, len, pack_mode, unpack_mode) • Send contract options (send modes) • Send_CHEAPER • Send_SAFER • Send_LATER • Receive contract options (receive modes) • Receive_CHEAPER • Receive_EXPRESS • Constraints • Strictly symmetrical pack/unpack sequences • Triplets (len, pack_mode, unpack_mode) identical for send and for receive • Data consistency
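The symmetry constraint can be made concrete with a small check that compares the (len, pack_mode, unpack_mode) triplets issued on each side. The helper `first_mismatch` and the two enums below are hypothetical names introduced for illustration; Madeleine itself is not shown to provide such a checker.

```python
from enum import Enum


class SendMode(Enum):
    SAFER = "Send_SAFER"
    LATER = "Send_LATER"
    CHEAPER = "Send_CHEAPER"


class ReceiveMode(Enum):
    EXPRESS = "Receive_EXPRESS"
    CHEAPER = "Receive_CHEAPER"


def first_mismatch(packs, unpacks):
    """Return the index of the first differing triplet, or None if symmetric.

    Each element is a (length, send_mode, receive_mode) triplet.
    """
    for index, (packed, unpacked) in enumerate(zip(packs, unpacks)):
        if packed != unpacked:
            return index
    if len(packs) != len(unpacks):
        # One side issued more calls than the other.
        return min(len(packs), len(unpacks))
    return None


sender = [
    (4, SendMode.CHEAPER, ReceiveMode.EXPRESS),
    (1024, SendMode.CHEAPER, ReceiveMode.CHEAPER),
]
receiver_ok = list(sender)
receiver_bad = [sender[0], (512, SendMode.CHEAPER, ReceiveMode.CHEAPER)]

ok_result = first_mismatch(sender, receiver_ok)
bad_result = first_mismatch(sender, receiver_bad)
short_result = first_mismatch(sender, sender[:1])
```

Here `receiver_bad` unpacks 512 bytes where 1024 were packed, so the second triplet (index 1) is flagged.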

  9. Send [Diagram labels: Send_SAFER, Send_LATER, Send_CHEAPER; Pack, Modification?, End_packing]

  10. Contract between the programmer and the interface Send_SAFER / Send_LATER / Send_CHEAPER • Control of data transfer • Optimization amount • Promises of the programmer • Data consistency • Special services • Delayed send • Buffer reuse • Specification at the semantic level • Independence: request / implementation
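One way to read the three send modes is in terms of when the library is allowed to read the user buffer. The sketch below assumes the following interpretation, which the slides only hint at: Send_SAFER lets the programmer reuse the buffer as soon as pack returns, Send_LATER means the value present at end_packing is the one sent, and Send_CHEAPER is a promise not to touch the buffer until end_packing. `ToySender` is a hypothetical Python model, not Madeleine code.

```python
class ToySender:
    """Hypothetical model of when each send mode reads the user buffer."""

    def __init__(self):
        self._blocks = []
        self.sent = None

    def begin_packing(self):
        self._blocks = []

    def pack(self, buf: bytearray, mode: str):
        if mode == "Send_SAFER":
            # Snapshot now: the caller may modify or reuse buf right away.
            self._blocks.append(bytes(buf))
        elif mode in ("Send_LATER", "Send_CHEAPER"):
            # Keep a reference: this toy reads the data at end_packing.
            # Under Send_CHEAPER a real library could read it at any time,
            # which is why the caller promises not to modify it.
            self._blocks.append(buf)
        else:
            raise ValueError(f"unknown send mode: {mode}")

    def end_packing(self):
        self.sent = [bytes(block) for block in self._blocks]


sender = ToySender()
safer_buf = bytearray(b"old")
later_buf = bytearray(b"tmp")

sender.begin_packing()
sender.pack(safer_buf, "Send_SAFER")
sender.pack(later_buf, "Send_LATER")
safer_buf[:] = b"xxx"   # allowed: the block was already captured
later_buf[:] = b"new"   # intended: the final value is the one sent
sender.end_packing()
```

In this model the message carries `b"old"` for the Send_SAFER block and `b"new"` for the Send_LATER block.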

  11. Receive [Diagram labels: Receive_EXPRESS, Receive_CHEAPER; Unpack, After Unpack, End_unpacking; Data available, Availability?]

  12. Message structuring Receive_CHEAPER / Receive_EXPRESS • Receive_EXPRESS • Mandatory immediate receive • Interpretation/extraction of message • Receive_CHEAPER • Free reception of block • Message contents Express
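A common use of the two receive modes is an express header that tells the receiver how to extract the rest of the message. The sketch below is a hypothetical Python model (`ToyReceiver` is not part of Madeleine): Receive_EXPRESS data is usable as soon as unpack returns, while Receive_CHEAPER data is only guaranteed after end_unpacking. The toy always delays CHEAPER blocks to make the difference visible; a real implementation would be free to deliver them earlier.

```python
class ToyReceiver:
    """Hypothetical model of data availability for the two receive modes."""

    def __init__(self, blocks):
        self._blocks = list(blocks)   # blocks of one incoming message, in order
        self._deferred = []

    def unpack(self, buf: bytearray, mode: str):
        data = self._blocks.pop(0)
        if len(data) != len(buf):
            raise ValueError("unpack length does not match the packed block")
        if mode == "Receive_EXPRESS":
            buf[:] = data                       # available on return
        elif mode == "Receive_CHEAPER":
            self._deferred.append((buf, data))  # filled at end_unpacking
        else:
            raise ValueError(f"unknown receive mode: {mode}")

    def end_unpacking(self):
        for buf, data in self._deferred:
            buf[:] = data
        self._deferred.clear()


payload = b"hello world"
message = [len(payload).to_bytes(4, "big"), payload]
rx = ToyReceiver(message)

size_buf = bytearray(4)
rx.unpack(size_buf, "Receive_EXPRESS")
size = int.from_bytes(size_buf, "big")   # safe to read immediately

body = bytearray(size)                   # sized from the express header
rx.unpack(body, "Receive_CHEAPER")
before = bytes(body)                     # not yet guaranteed to hold data
rx.end_unpacking()
after = bytes(body)
```

The header has to be express because its value is needed to allocate the buffer for the next unpack; the body can be cheaper because nothing depends on it before the message ends.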

  13. Organization Two-layered model Buffer management Data processing code reuse Hardware abstraction Modular approach Buffer management modules Drivers Transmission modules [Diagram labels: Interface; BMM, BMM (Buffer management); Driver, Driver, TM, TM, TM (Network management); Network]

  14. Drivers Network management layer • Data transfers • Send, receive • Group transfers • Transfer method selection • Choice function

  15. Transmission modules • Depends on the network • One module per transfer method • GM driver: 2 TM • BIP driver: 2 TM • SCI driver: 3 TM • VIA driver: 3 TM • Associated with a buffer management module
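A driver's choice function can be pictured as a mapping from the pack parameters to one of its transmission modules. The sketch below is purely illustrative: the two modules, the 4096-byte threshold and the selection policy are assumptions made for the example, not the behaviour of any actual Madeleine driver.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransmissionModule:
    name: str
    buffer_management: str   # the buffer management module it is paired with


# Hypothetical driver with two transfer methods.
SMALL = TransmissionModule("small-message", "static buffers, aggregation")
LARGE = TransmissionModule("large-message", "dynamic buffers, no copy")

THRESHOLD = 4096  # bytes; made-up switch point


def choose_tm(length: int, send_mode: str) -> TransmissionModule:
    """Hypothetical choice function: select a transfer method for one block."""
    if send_mode == "Send_SAFER":
        # The data has to be captured immediately, so copy it into a
        # static buffer regardless of its size (assumed policy).
        return SMALL
    if length <= THRESHOLD:
        return SMALL
    return LARGE


tm_short = choose_tm(64, "Send_CHEAPER")
tm_long = choose_tm(1 << 20, "Send_CHEAPER")
tm_safer = choose_tm(1 << 20, "Send_SAFER")
```

Because each transmission module is tied to a buffer management module, choosing the module also fixes how the block is buffered.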

  16. Transmission modules [Diagram labels: Pack, Madeleine, Interface, BMM, TM, BMM, TM, Thread, Network, Process]

  17. Buffers Generic management layer • Virtual buffers • Static • Dynamic • Groups • Aggregations • Splitting

  18. Buffer management modules • Buffer type • Static/dynamic • Aggregation mode • Without • Sequential aggregation • Half-sequential aggregation • Aggregation shape • Symmetrical/non-symmetrical

  19. Implementation Status • Network drivers: Quadrics, MX, GM, SISCI, MPI, TCP, VRP, VIA, UDP, SBP, BIP • Distribution: GPL licence • Availability: Linux (IA32, IA64, x86-64, Alpha, Sparc, PowerPC), MacOS X (G4), Solaris (IA32, Sparc), AIX (PowerPC), Windows NT (IA32)

  20. Tests – current platform Test environment • Cluster of dual-Pentium IV HT 2.66 GHz PCs, 1 GB • Gigabit Ethernet • SISCI/SCI • MX & GM / Myrinet • Quadrics Elan4 Testing procedure • Test: 1000 x (send + receive) • Result: ½ x average of 5 tests
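The testing procedure can be read as a ping-pong measurement: each test times 1000 round trips, the five tests are averaged, and the result is halved to obtain a one-way figure. The sketch below assumes that reading; the function name and the sample durations are invented for illustration and are not measurements from the presentation.

```python
def one_way_latency_us(test_durations_s, iterations=1000):
    """Half of the mean round-trip time, in microseconds.

    test_durations_s: wall-clock duration of each test, in seconds,
    where one test is `iterations` x (send + receive).
    """
    round_trips = [duration / iterations for duration in test_durations_s]
    mean_round_trip = sum(round_trips) / len(round_trips)
    return 0.5 * mean_round_trip * 1e6


# Made-up example: five tests of 10 ms each, i.e. 10 µs per round trip,
# which gives a one-way latency of about 5 µs.
latency = one_way_latency_us([0.010, 0.010, 0.010, 0.010, 0.010])
```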

  21. Latency [Plot: latency (µs) vs. packet size (bytes)]

  22. Bandwidth [Plot: bandwidth (MB/s) vs. packet size (bytes)]

  23. Tests – older platform Testing environment • Cluster of dual-Pentium II 450 MHz PCs, 128 MB • Fast Ethernet • SISCI/SCI • BIP/Myrinet Testing procedure • Test: 1000 x (send + receive) • Result: ½ x average of 5 tests

  24. SISCI/SCI – latency [Plot: latency (µs) vs. packet size (bytes)]

  25. SISCI/SCI – bandwidth [Plot: bandwidth (MB/s) vs. packet size (bytes)]

  26. SISCI/SCI – latency, packs/messages [Plot: latency (µs) vs. packet size (bytes)]

  27. SISCI/SCI – bandwidth, packs/messages [Plot: bandwidth (MB/s) vs. packet size (bytes)]

  28. Users – MPICH/Madeleine [Diagram labels: MPI API; Generic interface: point-to-point communication, collective communication, group building; Abstract Device Interface (ADI); Generic interface: data type management, request queue management; CH_MAD, SMP_PLUG, CH_SELF; Communication, Local communication, Local loops; Polling loops; Internal MPICH protocols; Madeleine; Communication: TCP, UDP, BIP, SISCI, GM, MX, QSNET; Multi-protocol support]

  29. MPICH/Mad/SCI – latency [Plot: latency (µs) vs. packet size (bytes)]

  30. MPICH/Mad/SCI – bandwidth [Plot: bandwidth (MB/s) vs. packet size (bytes)]

  31. Users – Padico [Diagram labels: Application; MPI, ORB, JVM; Circuit, VSock; Padico Net Access; Thread; Padico manager; micro-kernel; Padico Core; Padico Task Manager; Madeleine, Marcel; Communication: TCP, UDP, BIP, SISCI, GM, MX, QSNET; Multi-protocol support]

  32. Padico – latency [Plot: latency (µs) vs. packet size (bytes)]

  33. Padico – bandwidth [Plot: bandwidth (MB/s) vs. packet size (bytes)]

  34. Conclusion Unified communication support • Abstract interface • Contract-based programming • Modular/adaptive architecture • Dynamic optimization • Transparent multi-cluster support

  35. Ongoing/future work Programming interface • Message structuring • Near-future information exploitation • Pathological case reduction • Fault tolerance Communication sequence processing • Code specialization, compilation Session management • Deployment • Dynamicity • Fault tolerance • Scaling

  36. ? Madeleine I Madeleine II Madeleine III Madeleine IV

  37. Some limitations of Madeleine (version III) Objectives for a new Madeleine • Some optimizations are out of reach for Madeleine • The optimization range is too narrow • Need information about what is coming in the near future • Need to be more liberal in allowing permutations in the packet flow • Optimization strategies involve too much work from the driver programmer • Need to share more of the strategic code • Need to easily evaluate and even mix various strategies • Optimization sequences are synchronous with the application program • Need to synchronize optimization sequences with the NIC

  38. Proposal: Madeleine IV [Diagram labels: Constraints, Tracks, Tactics, Strategies, Optimizer thread, Sender thread, Hardware-specific parameters, Driver, Network, Optimizer thread]

  39. Concepts Definitions • Tracks • Hardware multiplexing units mapping (tags) • Main track • Control packets, small packets, … • Optional auxiliary tracks • Other traffic (large messages, …) • Tactics • Basic optimization operations • Permutation, aggregation, piggybacking, association, splitting, track change • Strategies • Set of tactics towards one optimization goal • Constraints • Tactics compatibility • Send/receive modes
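The relation between tactics and strategies can be sketched by treating a tactic as a function that rewrites a list of pending packets and a strategy as an ordered set of such functions. Everything below is a hypothetical Python illustration: the function names, the 8-byte limit and the plain concatenation are assumptions, and real aggregation would also need packet headers so that the receiver can separate the blocks again.

```python
from functools import partial


def split(packets, max_size):
    """Tactic: cut packets larger than max_size into consecutive pieces."""
    pieces = []
    for packet in packets:
        # max(..., 1) keeps an empty packet as a single empty piece.
        for start in range(0, max(len(packet), 1), max_size):
            pieces.append(packet[start:start + max_size])
    return pieces


def aggregate(packets, max_size):
    """Tactic: merge consecutive packets while the result fits in max_size."""
    merged = []
    for packet in packets:
        if merged and len(merged[-1]) + len(packet) <= max_size:
            merged[-1] = merged[-1] + packet
        else:
            merged.append(packet)
    return merged


def apply_strategy(packets, tactics):
    """Strategy: apply a sequence of tactics towards one optimization goal."""
    for tactic in tactics:
        packets = tactic(packets)
    return packets


# Hypothetical goal: fill 8-byte frames as fully as possible, in order.
fill_frames = [partial(split, max_size=8), partial(aggregate, max_size=8)]

pending = [b"ab", b"cd", b"0123456789", b"e"]
frames = apply_strategy(pending, fill_frames)
```

Four pending packets become three frames here, and the byte stream is unchanged because neither tactic permutes packets. A permutation tactic would only be legal where the constraints (send/receive modes, tactic compatibility) allow it.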

  40. Proposal: Madeleine IV [Diagram labels: Constraints, Tracks, Tactics, Strategies, Optimizer thread, Sender thread, Hardware-specific parameters, Driver, Network, Optimizer thread]

  41. Packet headers Giving up a little bit of raw efficiency to get much more flexibility • Opportunistic packet aggregation/permutation • Inside a single packet flow • Across multiple packet flows • Side effects • Control packets • Rendezvous • ACKs • Piggybacking • Multiplexing

  42. Concurrent communication progression Communication scheduling • The NIC is responsible for requesting work • Packets are built when the NIC is ready • The optimizer gets more time to gather up-to-date optimization clues
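The effect of letting the NIC request work can be illustrated with a small discrete-time simulation: packets are only built when the NIC becomes idle, so everything submitted in the meantime is a candidate for aggregation. The function `simulate`, its parameters and the timings are hypothetical and only model the idea on the slide, not Madeleine IV's implementation.

```python
def simulate(submissions, nic_busy_time, max_size):
    """Build packets only when the NIC is ready.

    submissions: list of (time, payload) pairs sorted by time.
    Returns a list of (send_time, packet) pairs put on the wire.
    """
    pending = list(submissions)
    wire = []
    now = 0.0
    while pending:
        # The NIC is idle; if nothing has been submitted yet, wait for it.
        now = max(now, pending[0][0])
        # Always take the oldest payload, then add whatever else has been
        # submitted by now and still fits in the packet.
        packet = pending.pop(0)[1]
        while (pending and pending[0][0] <= now
               and len(packet) + len(pending[0][1]) <= max_size):
            packet += pending.pop(0)[1]
        wire.append((now, packet))
        now += nic_busy_time   # the NIC is busy sending this packet
    return wire


submissions = [(0, b"a"), (1, b"b"), (2, b"c"), (3, b"d")]

slow_nic = [packet for _, packet in simulate(submissions, 2.5, 16)]
fast_nic = [packet for _, packet in simulate(submissions, 0.5, 16)]
```

With the slower NIC, `b"b"` and `b"c"` are both waiting when the NIC asks for work again and leave as one packet; with the faster NIC there is never more than one payload waiting, so nothing is aggregated.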

  43. Tests Test environment • Cluster of dual-Pentium IV HT 2.66 GHz PCs, 1 GB • MX / Myrinet Testing procedure • Test: 1000 x (send + receive) • Result: ½ x average of 5 tests

  44. Test – Latency [Plot: latency (µs) vs. packet size (bytes)]

  45. Test – Bandwidth [Plot: bandwidth (MB/s) vs. packet size (bytes)]

  46. Test – Latency when aggregating short packets [Plot: latency (µs) vs. packet size (bytes)]

  47. Opportunistic aggregation on rendezvous (RDV) Aggregating a short packet with an RDV request for a long packet • No gain with MX/Myrinet • Madeleine III • Latency: 310 µs • Bandwidth: 201 MB/s • Madeleine IV • Latency: 314 µs • Bandwidth: 200 MB/s • MX flow control gets in the way

  48. Conclusion • A new architecture for optimizing communication • Wider optimization spectrum • Better interactions between software and hardware • A platform for experimenting with optimizations • Optimization tactics • A prototype implemented on top of MX/Myrinet • Proof of concept

  49. On-going and future work • Optimization • Tactic combinations • Automatic strategy selection • External strategies (plug-ins) • Interface expressiveness • Extended packs • One-sided communication • Load-balancing, multi-rail • Benefit from all available links

  50. Proposal: Madeleine IV [Diagram labels: Constraints, Tracks, Tactics, Strategies, Optimizer thread, Sender thread, Hardware-specific parameters, Driver, Network, Optimizer thread]
