Using HPS Switch on Bassi Jonathan Carter User Services Group Lead jtcarter@lbl

Using HPS Switch on Bassi. Jonathan Carter, User Services Group Lead, jtcarter@lbl.gov. NERSC User Group Meeting, June 12, 2006.


Presentation Transcript


  1. Using HPS Switch on Bassi
     • Jonathan Carter, User Services Group Lead, jtcarter@lbl.gov
     • NERSC User Group Meeting, June 12, 2006

  2. IBM Switch Evolution

  3. IBM Switch Evolution

  4. HPS Switch Configuration

  5. Bassi Switch Configuration

  6. IBM Software
     • Parallel Environment (PE 4.2.2), which contains poe and MPI, remains unchanged
     • Parallel System Support Package (PSSP 3.5.0), which contains LAPI, has been absorbed into the Reliable Scalable Clustering Technology (RSCT 2.4.2) software stack

  7. IBM Software
     • MPI 4.2.2
       • Uses LAPI as reliable transport layer
       • Uses threads, not signals, for asynchronous activities
       • Binary compatible
       • New performance characteristics
         • Eager
         • Bulk transfer
         • Collectives

  8. IBM Software Stack
     [Diagram; labels: Application, ESSL, PESSL, GPFS, Sockets, VSD, TCP, UDP, MPI, LAPI, IP, HAL, IF_LS, SMA3+ Adapter, HPS]

  9. Communication Modes
     • FIFO mode
       • Chopped into 2KB chunks on host, copied by CPU
     • Remote Direct Memory Access (RDMA)
       • CPU offload
       • One I/O bus crossing
     [Diagram; labels: CPU, User Buffer, RDMA, FIFO, DMA, Adapter]
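As a rough illustration of the FIFO-mode bullet on slide 9, the sketch below splits a message into 2KB chunks, as the host would before the CPU copies them to the adapter. The helper name `fifo_chunks` and the handling of the final short chunk are assumptions made for illustration; the slide states only the 2KB chunk size.

```python
FIFO_CHUNK_BYTES = 2 * 1024  # chunk size stated on the slide


def fifo_chunks(message_bytes, chunk_bytes=FIFO_CHUNK_BYTES):
    """Return (offset, length) pairs for each chunk of a message.

    Hypothetical helper: it models only the chopping step, not the
    actual adapter protocol or any per-packet header overhead.
    """
    if message_bytes < 0:
        raise ValueError("message_bytes must be non-negative")
    return [
        (offset, min(chunk_bytes, message_bytes - offset))
        for offset in range(0, message_bytes, chunk_bytes)
    ]


chunks = fifo_chunks(5000)
print(len(chunks), chunks[-1])  # prints: 3 (4096, 904)
```

Every chunk here costs a CPU copy, which is the work that the RDMA path is described as offloading.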

  10. RDMA (Bulk transfer)
      • Overlap of communication and computation possible
        • Asynchronous-messaging applications
        • One-sided communications
      • Reduce CPU work
        • Offload fragmentation and reassembly
        • Minimize packet arrival interrupts
      • Reduce memory subsystem load
        • Zero copy transport
      • Striping across adapters

  11. RDMA vs. Packet

  12. MPI Transfer Protocols
      [Diagram; labels: P0, P1, data, req, ack]
      • Eager: send data immediately; store in remote buffer
        • No synchronization
        • Only one message sent
        • Uses memory for buffering (less for application)
      • Rendezvous: send message header; wait for recv to be posted; send data
        • No data copy may be required
        • No memory required for buffering (more for application)
        • More messages required
        • Synchronization (standard send blocks until recv posted)
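The trade-off on slide 12 can be sketched as a simple size-based choice between the two protocols. This is an illustrative model, not IBM's implementation: the function name, the `TRADEOFFS` table, and the assumption that messages at or below the eager limit (see `MP_EAGER_LIMIT` on slide 22) go eagerly are all introduced here for the example.

```python
# Properties of each protocol as listed on the slide.
TRADEOFFS = {
    "eager": {"sender_waits_for_recv": False, "uses_buffer_memory": True},
    "rendezvous": {"sender_waits_for_recv": True, "uses_buffer_memory": False},
}


def choose_protocol(message_bytes, eager_limit_bytes):
    """Pick a protocol for a standard send (hypothetical model).

    Assumption: messages no larger than the eager limit are sent
    eagerly; anything larger uses rendezvous.
    """
    if message_bytes <= eager_limit_bytes:
        return "eager"
    return "rendezvous"


eager_limit = 32 * 1024  # upper end of the range quoted on slide 22
for size in (1024, 32 * 1024, 1_000_000):
    protocol = choose_protocol(size, eager_limit)
    print(size, protocol, TRADEOFFS[protocol])
```

Under this model, raising the eager limit moves more sends to the non-blocking path at the cost of more buffer memory on the receiver.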

  13. Eager vs. Rendezvous

  14. Latency

  15. Internode Comparison

  16. Internode Comparison

  17. Intranode Comparison

  18. Intranode Comparison

  19. Packed-node Comparison

  20. Packed-node Comparison

  21. POE environment variables
      • MP_SINGLE_THREAD
        • Set to Yes for a slight latency decrease; set to No for MPI I/O, OpenMP, etc.
      • MP_USE_BULK_XFER
        • Defaults to Yes
      • MP_BULK_MIN_MSG_SIZE
        • Defaults to ~150KB
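One way to keep the slide 21 settings together is to build them as a small environment dictionary before launching a job. The sketch below is a minimal example under stated assumptions: the helper `poe_environment` is hypothetical, and the lowercase `yes`/`no` spelling and the plain byte count for `MP_BULK_MIN_MSG_SIZE` are assumptions that should be checked against the Parallel Environment Operation and Use manual.

```python
def poe_environment(single_thread=False, use_bulk_xfer=True,
                    bulk_min_msg_bytes=None):
    """Build POE settings as a dict of strings (hypothetical helper).

    single_thread=True trades MPI I/O and OpenMP support for slightly
    lower latency, per the slide. Leaving bulk_min_msg_bytes as None
    keeps the system default (about 150KB according to the slide).
    """
    env = {
        "MP_SINGLE_THREAD": "yes" if single_thread else "no",
        "MP_USE_BULK_XFER": "yes" if use_bulk_xfer else "no",
    }
    if bulk_min_msg_bytes is not None:
        # Assumed format: an integer number of bytes.
        env["MP_BULK_MIN_MSG_SIZE"] = str(int(bulk_min_msg_bytes))
    return env


env = poe_environment(single_thread=True, bulk_min_msg_bytes=64 * 1024)
for name, value in sorted(env.items()):
    print(f"export {name}={value}")
```

The printed `export` lines can be pasted into a batch script, or the dictionary can be merged into `os.environ` before starting the job from Python.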

  22. POE environment variables
      • MP_BUFFER_MEM
        • Default is 64MB
      • MP_EAGER_LIMIT
        • Varies from 32KB to 1KB depending on job size; can be increased in conjunction with MP_BUFFER_MEM
      • LAPI parameters for applications with many blocking sends of small messages:
        • MP_REXMIT_BUF_SIZE
          • Default is 128 bytes
        • MP_REXMIT_BUF_CNT
          • Default is 128 buffers
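The link between `MP_EAGER_LIMIT`, `MP_BUFFER_MEM`, and job size on slide 22 can be made concrete with a back-of-the-envelope estimate. The model below is a deliberate simplification and not IBM's actual flow-control formula: it assumes every other task sends one eager message of the maximum size to a single receiver at the same time. The function names and the `msgs_per_peer` parameter are hypothetical.

```python
KB = 1024
MB = 1024 * KB


def worst_case_eager_bytes(ntasks, eager_limit_bytes, msgs_per_peer=1):
    """Buffer space one task might need if all peers send eagerly at once.

    Simplified assumption, not the MPI library's real accounting.
    """
    return (ntasks - 1) * msgs_per_peer * eager_limit_bytes


def fits_in_buffer(ntasks, eager_limit_bytes, buffer_mem_bytes=64 * MB):
    """Check the estimate against MP_BUFFER_MEM (64MB default per the slide)."""
    return worst_case_eager_bytes(ntasks, eager_limit_bytes) <= buffer_mem_bytes


for ntasks, limit in ((1024, 32 * KB), (4096, 32 * KB), (4096, 1 * KB)):
    needed_mb = worst_case_eager_bytes(ntasks, limit) / MB
    print(ntasks, limit, round(needed_mb, 1), fits_in_buffer(ntasks, limit))
```

Even this crude estimate shows why the eager limit shrinks as the task count grows, and why raising it for a large job goes together with raising `MP_BUFFER_MEM`.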

  23. IBM Documentation
      • RSCT for AIX 5L LAPI Programming Guide (SA22-7936-03)
        • LAPI programming
      • Parallel Environment for AIX 5L V4.2.2 Operation and Use, Vol 1 (SA22-7948-04)
        • Running jobs
      • Parallel Environment for AIX 5L V4.2.2 Operation and Use, Vol 2 (SA22-7949-04)
        • Performance tools
      • Parallel Environment for AIX 5L V4.2.2 MPI Programming Guide (SA22-7945-04)
        • IBM MPI implementation
