New Progress in Open MPI p2p communication: Elan and Sicortex

Presentation Transcript

  1. New Progress in Open MPI p2p communication: Elan and Sicortex. Teng Ma, George Bosilca. 2008 ICL retreat.

  2. P2p communication in Open MPI (layer stack, top to bottom): MPI application; MPI level; PML (p2p management layer): OB1 or DR; BML (BTL management layer); BTLs: MX, Elan, UDAPL, SM, OFUD, SCTP, GM, Openib, TCP, …… Coming soon: Sicortex BTL, Xensocket BTL.

  3. Recap of the first Elan BTL version
     • Uses Elan Tport to implement the BTL's send interface and Elan RDMA to implement the BTL's put and get interfaces.
     • Provides bandwidth comparable to the vendor's Quadrics MPI, but still has a latency problem.
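The mapping on this slide (send over Tport, put and get over RDMA) can be sketched as a small table of function pointers. This is only an illustration: `toy_btl_t`, `transport_t`, `make_toy_elan_btl` and the stub functions are hypothetical names, not the Open MPI or Quadrics API, and the stubs only report which transport a real implementation would hand the request to.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels for the transport each entry point would use. */
typedef enum { XPORT_NONE, XPORT_TPORT, XPORT_RDMA } transport_t;

/* Hypothetical, simplified stand-in for a BTL module's function table. */
typedef struct {
    transport_t (*send)(const void *buf, size_t len);
    transport_t (*put)(const void *local, size_t len);
    transport_t (*get)(void *local, size_t len);
} toy_btl_t;

/* Stub: a real send would pass the buffer to the Elan Tport layer. */
transport_t toy_elan_send(const void *buf, size_t len)
{
    (void)buf; (void)len;
    return XPORT_TPORT;
}

/* Stub: a real put would start an Elan RDMA write. */
transport_t toy_elan_put(const void *local, size_t len)
{
    (void)local; (void)len;
    return XPORT_RDMA;
}

/* Stub: a real get would start an Elan RDMA read. */
transport_t toy_elan_get(void *local, size_t len)
{
    (void)local; (void)len;
    return XPORT_RDMA;
}

/* Build the table the way the slide describes the first Elan BTL. */
toy_btl_t make_toy_elan_btl(void)
{
    toy_btl_t btl = { toy_elan_send, toy_elan_put, toy_elan_get };
    return btl;
}
```

An upper layer would call `btl.send`, `btl.put` or `btl.get` without knowing which Elan mechanism sits behind each entry; that indirection is what lets one management layer drive many interconnects.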

  4. Latency Problem

  5. Memory copy issue (diagram). Open MPI elan btl: Elan system buffer, Btl buffer, User buffer. Quadrics MPI: Elan system buffer, User buffer. The diagram shows three "Copy" arrows in total.
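A toy model can make the copy overhead concrete. It assumes the diagram means that the Elan BTL path stages data in an intermediate BTL buffer before it reaches the user buffer, while the Quadrics MPI path copies straight from the system buffer; that reading, and every name below (`recv_via_btl_buffer`, `recv_direct`, `counted_copy`), is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MSG_MAX 256

/* Counts memcpy calls so the two receive paths can be compared. */
static int copy_count = 0;

static void counted_copy(void *dst, const void *src, size_t len)
{
    memcpy(dst, src, len);
    copy_count++;
}

/* Assumed BTL-style path: system buffer -> BTL buffer -> user buffer. */
int recv_via_btl_buffer(char *user, const char *system_buf, size_t len)
{
    char btl_buf[MSG_MAX];
    assert(len <= MSG_MAX);
    copy_count = 0;
    counted_copy(btl_buf, system_buf, len);
    counted_copy(user, btl_buf, len);
    return copy_count; /* number of copies this path performed */
}

/* Assumed direct path: system buffer -> user buffer. */
int recv_direct(char *user, const char *system_buf, size_t len)
{
    copy_count = 0;
    counted_copy(user, system_buf, len);
    return copy_count; /* number of copies this path performed */
}
```

Under this assumption the staged path performs two copies per message and the direct path one, which matters most for small messages where copy time is a large share of total latency.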

  6. Elan queue send/recv
     • It does not need pre-registered buffers to receive. The message is stored in the Elan system buffer (in the Elan queue).
     • Elan queue performs better than Elan tport for message sizes <= 2KB. 2KB is the size of one Elan queue slot.
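The size rule above suggests a simple dispatch: messages that fit in one queue slot go through the queue, larger ones through tport. The sketch below assumes exactly that policy; `choose_send_path`, `send_path_t` and `QUEUE_SLOT_SIZE` are hypothetical names, and the real BTL may apply additional conditions.

```c
#include <assert.h>
#include <stddef.h>

/* One Elan queue slot holds 2 KB, as stated on the slide. */
#define QUEUE_SLOT_SIZE 2048u

/* Hypothetical labels for the two Elan send paths. */
typedef enum { SEND_PATH_QUEUE, SEND_PATH_TPORT } send_path_t;

/* Pick the queue path when the message fits in one slot, else tport. */
send_path_t choose_send_path(size_t msg_len)
{
    return (msg_len <= QUEUE_SLOT_SIZE) ? SEND_PATH_QUEUE : SEND_PATH_TPORT;
}
```

For example, a 64-byte message and a 2048-byte message both select the queue path, while a 2049-byte message falls back to tport.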

  7. Queue and tport

  8. The latest elan btl

  9. Elan BTL's status now…
     • Fixed the backward rank initialization bug and the finalization bug (no bugs now).
     • Supports multi-rail on a single node.
     • Uses Elan's queue, tport and RDMA to implement the Open MPI send and put protocols.
     • Small-message latency has improved considerably.
     • Provides multi-thread support.

  10. Elan btl’s roadmap

  11. Sicortex machine

  12. p2p Performance provided by Sicortex

  13. Programming environment
     • MPI library (libscmpi.a)
     • Slurm
     • DMA library (libscdma.a)

  14. An example of doing a “get” with the Sicortex DMA engine

      recvbuf = (char *) (((uintptr_t) &bigbuf[65536]) & (~65535ULL)); // 64KB alignment
      ret = scdma_map_bds(ctx, 3, recvbuf, rs->bd_count); // map into dma buffer
      void *cmd = (void *) scdma_cq_head_spinwait(ctx); // find a cmd header
      uint64_t segmentComplete = 0;
      scdma_build_s_bf_bf_cmdend_put(cmd,
          peers[client->serverRank].route_handles[0],
          peers[client->serverRank].ports[0],
          client->returnRank,
          rs->bd_base + i, 0, // source
          3 + i, 0, // destination
          sysconf(_SC_PAGESIZE), // size of transfer
          0,
          (uintptr_t) &segmentComplete);
      __asm__ volatile("sync"); /* force those out to memory */
      scdma_cq_post(ctx); // issue the command to dma engine
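The first line of the example relies on a common pointer-alignment trick: step 64KB into an oversized buffer, then clear the low 16 bits of the address. The helper below isolates that arithmetic on plain integers so it can be checked without the Sicortex library; `align_like_slide` and `ALIGN_64K` are names invented for this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define ALIGN_64K 65536u

/* Returns the largest 64 KB-aligned address that is <= base + 64 KB,
   mirroring ((uintptr_t) &bigbuf[65536]) & ~65535 from the slide.
   The result always lies in the range (base, base + 64 KB]. */
uintptr_t align_like_slide(uintptr_t base)
{
    return (base + ALIGN_64K) & ~(uintptr_t)(ALIGN_64K - 1);
}
```

Because the result can sit up to 64KB past the start of the buffer, the backing array has to be at least 64KB larger than the region actually handed to the DMA engine. When the base address is already aligned, the expression still skips forward by a full 64KB.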

  15. Future work
     • Improve the Elan BTL's latency when using tport to send.
     • Finish the development of the Sicortex BTL.