About some OS features related to Cluster Computing

Presentation Transcript


  1. About some OS features related to Cluster Computing. Loïc Prylli, LIP (RESO team), ENS-Lyon/INRIA/CNRS, France

  2. Outline
     • Introduction
     • Revisiting OS-bypass
     • Asynchronous-IO APIs (TCP/IP, disk)
     • Application: remote file access
     • Conclusion

  3. Node hardware view (Myrinet). Diagram labels: PROC, memory, disks, PCI bus, SRAM, LANai, PROC, Myrinet network card.

  4. Software view. Diagram labels: IP stack, BIP driver, socket interface, Internet applications, PVM, MPI-BIP, Madeleine, BIP, BIP « firmware ». Layers: kernel, libraries, embedded software.

  5. What is OS-bypass (request side)? Diagram labels: application and libraries (user space), OS-bypass, OS services (kernel space), request queue and network embedded firmware (device space).

  6. OS-bypass for data movement via « memory registration ». Diagram labels: application and libraries (user space, virtual memory), OS, DMA, network embedded firmware.

  7. Memory registration
     • Problem: maintaining coherence between:
        • the OS view of the virtual address space
        • the NIC view of the virtual address space
     • Particularly across fork/mmap/munmap, e.g.:
        • send operation -> implicit registration
        • munmap/mmap -> changes the address space
        • send operation -> reuses the obsolete registration
     • Strong dependency on OS internals
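
The stale-registration sequence on this slide can be illustrated with a toy model. This is only a sketch under simplifying assumptions: `AddressSpace`, `Nic`, and their methods are hypothetical names invented for the illustration, pages and frames are plain integers, and nothing here corresponds to a real driver or to the BIP/GM interfaces.

```python
"""Toy model of a stale NIC registration cache (hypothetical, not a driver API)."""


class AddressSpace:
    """OS view: maps virtual page numbers to physical frame numbers."""

    def __init__(self):
        self.page_table = {}
        self.next_frame = 0

    def mmap(self, vpage):
        # Every new mapping gets a fresh physical frame in this model.
        self.page_table[vpage] = self.next_frame
        self.next_frame += 1

    def munmap(self, vpage):
        del self.page_table[vpage]


class Nic:
    """NIC view: caches the translation seen at registration time."""

    def __init__(self, address_space):
        self.address_space = address_space
        self.registrations = {}

    def send(self, vpage):
        # Implicit registration on first use; later sends reuse the cache.
        if vpage not in self.registrations:
            self.registrations[vpage] = self.address_space.page_table[vpage]
        return self.registrations[vpage]  # frame the DMA engine would read

    def invalidate(self, vpage):
        # What the OS would have to trigger on munmap to keep both views coherent.
        self.registrations.pop(vpage, None)


def stale_registration_demo():
    space = AddressSpace()
    nic = Nic(space)
    space.mmap(7)
    first = nic.send(7)   # implicit registration of virtual page 7
    space.munmap(7)
    space.mmap(7)         # same virtual page, different physical frame
    stale = nic.send(7)   # reuses the obsolete registration
    return first, stale, space.page_table[7]
```

In this model the second send still targets the frame recorded before the remapping. Calling `nic.invalidate(7)` between `munmap` and the next send forces a fresh registration, which is the kind of hook that ties a registration cache to OS internals.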

  8. OS-bypass: when is it useful? Diagram comparing two stacks. Without OS-bypass: communication library, system call, parameter validation, access control, protocol processing, network interface. With OS-bypass: communication library, parameter validation, access control, protocol processing, network interface.

  9. Syscall overhead for Linux

  10. Syscall overhead for Linux

  11. Example in the architecture of BIP
     • Security level:
        • No network protection: OS-bypass is best
        • New development with network protection: parameter validation, checking, and fairness are done in the kernel for message sending

  12. Sometimes clusters are not in a closed « MPI » environment: Internet, Grids, Storage

  13. Mixing cluster communications and other I/O activities
     • 10000 connections problem:
        • how to deal efficiently with 10K connections (generally TCP connections, but also disk I/O and internal cluster communications)
     • Typical application loop:
        • Wait for some event (any source)
        • Treat the request
     • Problem:
        • poll/select/MPI_WaitAny are not scalable
        • Mixing in cluster communications makes it worse
        • Independent threads are problematic when modifying the set of connections
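
The typical application loop on this slide can be sketched with Python's standard `selectors` module. This is a minimal illustration, not the author's code: `serve_once` and `event_loop_demo` are hypothetical names, and `SelectSelector` is chosen deliberately so the sketch follows the select() design, in which the full set of descriptors is handed over again on every wait.

```python
import selectors
import socket


def serve_once(connections, handle):
    """Wait for an event from any source, then treat each ready request."""
    sel = selectors.SelectSelector()
    # The whole interest set is rebuilt for each wait, as with select().
    for conn in connections:
        sel.register(conn, selectors.EVENT_READ)
    handled = 0
    try:
        for key, _mask in sel.select(timeout=1.0):
            data = key.fileobj.recv(4096)
            handle(key.fileobj, data)
            handled += 1
    finally:
        sel.close()
    return handled


def event_loop_demo():
    # Three idle connections, one of which has a pending request.
    pairs = [socket.socketpair() for _ in range(3)]
    servers = [server for server, _client in pairs]
    pairs[1][1].sendall(b"ping")
    seen = []
    handled = serve_once(servers, lambda sock, data: seen.append(data))
    for server, client in pairs:
        server.close()
        client.close()
    return handled, seen
```

Only one of the three connections is ready, yet all three are registered and examined for that single wait. With thousands of mostly idle connections this per-wait cost grows with the size of the set, which is the scalability concern raised on the slide.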

  14. API to manage disk or TCP/IP efficiently
     • Problem: old POSIX I/O is limited
        • No concurrency or pipelining is possible without threads.
     • Solution: use kernel-managed completion queues (as provided by the Linux AIO project)
        • Functionality similar to NT queues or FreeBSD kqueue
     • Interface:
        • io_submit_req() => (read, write, send, recv requests)
        • io_getevents()
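
The submit/get-events pattern on this slide can be modelled in a few lines. This is a toy sketch and not the Linux AIO interface: `CompletionQueue`, `submit`, `get_events`, and `read_at` are hypothetical names, and background threads stand in for the kernel here, whereas the point of a kernel-managed queue is that the application needs no threads of its own.

```python
import os
import queue
import tempfile
import threading


class CompletionQueue:
    """Toy stand-in for a kernel-managed completion queue (hypothetical API)."""

    def __init__(self):
        self._events = queue.Queue()

    def submit(self, tag, operation, *args):
        """Start `operation` in the background and return immediately."""
        def run():
            try:
                self._events.put((tag, operation(*args), None))
            except OSError as exc:
                self._events.put((tag, None, exc))
        threading.Thread(target=run, daemon=True).start()

    def get_events(self, min_events, timeout=5.0):
        """Block until `min_events` completions have been collected."""
        return [self._events.get(timeout=timeout) for _ in range(min_events)]


def read_at(path, offset, length):
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def completion_queue_demo():
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"abcdefgh")
        path = f.name
    try:
        cq = CompletionQueue()
        # Two requests are in flight at once; neither call blocks.
        cq.submit("head", read_at, path, 0, 4)
        cq.submit("tail", read_at, path, 4, 4)
        events = cq.get_events(min_events=2)
        return {tag: result for tag, result, _err in events}
    finally:
        os.unlink(path)
```

Completions may arrive in any order, so each event carries the tag given at submission. The application waits on one queue regardless of how many requests, or which kinds of request, are outstanding.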

  15. AIO subsystem structure: overcoming the MPI_WaitAny or select()/poll() design. Diagram labels: application/libraries, event queue, requests, OS, interrupts, requests, hardware.

  16. Application: NFS replacement for clusters
     • Shared library implementation on top of either GM, BIP, or TCP/IP.
     • Uses Linux-AIO for TCP/IP and disk I/O.
     • Server-side export:
        • either an in-memory filesystem,
        • or some local filesystem.
     • Conceptually similar to DAFS:
        • adds transparent use
        • Point-to-point design

  17. Usual NFS architecture. Diagram labels (client and server sides): application, NFS server (kernel), Virtual File System, TCP/RPC, Virtual File System, Ext2, VFAT, NFS client, Ext2, VFAT, buffer cache, buffer cache, IDE, SCSI, local disks.

  18. Our simple remote file access protocol architecture. Diagram labels: application, client, VIA/BIP/GM, server, Virtual File System, Ext2, FAT32, buffer cache, IDE, SCSI, local disks.

  19. Results on Myrinet • Micro-benchmark, 100 MByte copy:

  20. Conclusion
     • OS-bypass is not necessarily a performance optimisation; it is an architecture choice.
     • Similarity in the evolution of APIs and subsystems for cluster network communications and for disk/TCP I/O:
        • Strongly asynchronous design
        • Completion queues (a missing feature in MPI)
