1 / 24

OFED for Linux: Status and Next Steps

OFED for Linux: Status and Next Steps. Betsy Zeller (Qlogic), Tziporet Koren (Mellanox) 3/16/2010. Agenda. Linux OFED components Releases completed in 2009: OFED 1.4.1 & 1.4.2 OFED 1.5 Releases planned for 2010: OFED 1.5.1 OFED 1.5.2/3 Future releases: OFED 1.6 and beyond

taliesin
Download Presentation

OFED for Linux: Status and Next Steps

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. OFED for Linux: Status and Next Steps Betsy Zeller (Qlogic), Tziporet Koren (Mellanox) 3/16/2010 www.openfabrics.org

  2. Agenda • Linux OFED components • Releases completed in 2009: • OFED 1.4.1 & 1.4.2 • OFED 1.5 • Releases planned for 2010: • OFED 1.5.1 • OFED 1.5.2/3 • Future releases: • OFED 1.6 and beyond • How can you contribute? • Questions and discussion • Planning for scalability and discussion on MPI inclusion www.openfabrics.org

  3. Linux OFED Components Add on OFA Development • HCA/NIC Drivers • IB: IBM, Mellanox, QLogic • iWARP: Chelsio, Intel • RoCEE: Mellanox • Core: Verbs, mad, SMA, CM, CMA • IPoIB • SDP • SRP and SRP Target • iSER and iSER Target • RDS • NFS-RDMA • Qlogic_VNIC • uDAPL • OSM • Diagnostic tools • Bonding module • Open iSCSI • MPI Components • MVAPICH • Open MPI • MVAPICH2 • Benchmark tests • Proprietary MPIs: Intel, Platform MPI • Proprietary SMs: Sun, Voltaire, Qlogic, Mellanox Tested with

  4. OFED 1.4.1 • Released on June 3, 2009 • Distro integration: • Integrated into RHEL 5.4 • SLES 11 • Newfeatures • Added support for RHEL 5.3 and SLES11 • NFS/RDMA: In beta quality with support for RHEL 5.2, 5.3 and SLES 10 SP2 • Updated MPI packages: MVAPICH 1.1.0-3355, Open MPI 1.3.2 • Updated bonding package: ib-bonding-0.9.0-40 • Updated DAPL: compat-dapl-1.2.14 and dapl-2.0.19 • Updated OpenSM version to include critical bug fixes • Fixed RDS iWARP support • Low level drivers updated: ehca, mlx4, cxgb3, nes, ipath, mthca • Added a module parameter to control number of MTTs per segment in Mellanox HCAs • mstflint update • Enhanced OpenSM and management tools, user interface, HA, routing enhancements, and more • Used in Intel ® Cluster Ready Solutions

  5. OFED 1.4.2 • Released on Aug 6, 2009 • Critical bug fixes, including: • Fixes to NES (Intel iWarp) driver • Fixes to support running with Lustre installed • NFS/RDMA critical bug fixes (still technology preview) • Distro integration: • RHEL 5.5 1.4.2+ • SLES 11 SP1 1.4.2 (kernel from 2.6.32) • Passed Oracle 11g certification with RDS www.openfabrics.org

  6. OFED 1.5 • Released on December 30, 2009 • Kernel base: 2.6.30 • To be included in RHEL 5.6 • Used in Intel ® Cluster Ready Solutions • List of Supported Kernels for OFED 1.5 • RHEL4: up6, up7, up8 • RHEL5: up2, up3, up4 • SLES10: SP2, SP3 • SLES 11 • Fedora Core 11* • OpenSuSE 11* • Kernel.org: 2.6.18-2.6.32* * minimal QA for these versions. www.openfabrics.org

  7. OFED 1.5 – New features New OSes: RedHat EL5.4, EL4.8 and SLES 10 SP 3 Kernel Base: 2.6.30 New supported kernels: 2.6.29-2.6.32 Forward port to support kernels above 2.6.30 Hardware driver (qib) for new Qlogic QDR HCA Added ConnectX2 support to mlx4 driver User space packages as tar balls for easier distro integration All libraries available under: http://www.openfabrics.org/downloads/ SDP: Zero Copy (beta); more performance improvements Bonding: Included only for older distros that does not include the IB support RHEL 5.3 / SLES 10 SP2 or older version

  8. 1.5 – New features – Cont. • uDAPL: • Scalability enhancements • New UCM provider • Common code base with WinOF 2.1. • MPI updates: • OSU MVAPICH 1.2.0 • Open MPI 1.4 • OSU MVAPICH2 1.4 • MPI tests 3.2 • NFS-RDMA in beta • Many bug fixes www.openfabrics.org

  9. 1.5 – Open SM new features QoS improvements • SL2VL setup optimization • QoS/LASH co-exist Major bug fix • MCG join/leave fixes • Clean delayed MCG deletion Scalability & performance • Optimized SL2VL setup • Parallel LFTs setup • Parallel MFTs setup Routing & multicast • FTree improvements • Routing engine reloading • Mesh switch reordering optimizations • MGID to MLID compression 9

  10. OFED 1.5.1 • To be released within the next week • Main new features: • Added RoCEE support • Updated Open MPI to rev 1.4.2 • Updated MVAPICH2 to rev 1.4-2 • Updated DAPL to rev 2.0.26 • Added extended atomic operations to ConnectX (kernel) • Improved NFS/RDMA stability • Need more testing • Added IPv6 support to RDMACM • Bug fixes www.openfabrics.org

  11. OFED 2010 Plans • Proposal: • Provide additional dot releases throughout the year • 1.5.2 – Jun • 1.5.3 – Sep • Delay 1.6, and changing the kernel base, till Q1 2011 • Pros: • Improve the stability of OFED 1.5.x, faster bug fix turnaround • Integrate support for new distros as they become available • Logo program: Vendors will be able to fix issues found • Cons: • Many patches in the fixes directory • Opinions? … www.openfabrics.org

  12. OFED 1.5.2 New Features • Add new OSes: RHEL 5.5, SLES11 SP1 • Update the management package • Add-on packages that does not touch the core • New low level drivers for new HW • Critical bug fixes • Schedule: June/July • Decide in EWG meetings * OFED 1.5.3 – will decide if needed after 1.5.2 www.openfabrics.org

  13. OFED 1.6 New Features • kernel base 2.6.35 or 36 • Virtualization and SRIOV support • Mellanox Vnic/vHBA for BridgeX • New HW from vendors (if any) • New scalability features (if any) • Soft-RoCEE & Soft-iWarp drivers • If ready can go into 1.5.x before • New management features • 3D torus support by OpenSM • MMU notifier for MPI • Solution that will be accepted by the kernel • Improve connection management scheme? • More features – Based on input from customers www.openfabrics.org

  14. OFED 1.6 Schedule & OS support • OS support reduction • Stop supporting old OSes (e.g. RHEL 4.x, SLES 10) • Schedule suggestion: • Start to work during Q4 2010 • Release in Q1 2011 • Opinions? www.openfabrics.org

  15. Reminder: How do you contribute? • Develop new code and features • Bug fixes • Performance tuning • Contribute backports for new OSes and forward ports for new kernels • QA and testing • Send patches and comments to the mailing lists: • ewg@lists.openfabrics.org – OFED for Linux specific only • linux-rdma@vger.kernel.org – General Linux development • Open bugs in Bugzilla (https://bugs.openfabrics.org/) • Choose OpenFabricsLinux when opening a new bug • Test old bugswith new releases and update on bugzilla • Participate in EWG bi-weekly meetings 15

  16. Open Discussion • Should we have a scalability roadmap for OFED? How do we plan for scalability? • Should we continue to include MPIs in the OFED releases? • What’s the next step on MMU notifier kernel support for MPI? • Questions and feedback from the community

  17. Scalability Challenges • Scale out to 10K-20K or more nodes • Performance • Reliability • Subnet management • Focus additional attention/resources on issues • Should we have a BOF on this? www.openfabrics.org

  18. Infrastructure Scalability/Features • Improve multicore affinity/awareness • Multicast • Flow control for SRQ • Fault tolerance • IB monitoring • Performance counters, throughput, hotspots, degraded links • Adaptive routing • HCA out-of-order delivery • Switch logic for state info & adaptive algorithm, etc. • Path Record Support • RDMA CM

  19. MPI Distribution in OFED: Rationale Open source MPI’s initially included to “bootstrap” the OFED project MPI was the main user for OFED, so this seemed like a natural pairing Made it (significantly) easier for customers to get their MPI jobs running on InfiniBand However, distros don’t like it – they package their own QA testing of MPI + OFED is still extremely valuable This is not a discussion of removing MPI + OFED QA How many use MPI from OFED, vs compiling own?

  20. MPI Distribution in OFED: Pros MPI is still the most common OFED “customer” HPC customers get network stack + MPI in one package Helps rapid MPI deployment on new clusters (out-of-box) MPI-selector function allows to select MPI stack of choice during the installation Customers get QA assurance of specific MPI + OFED version tuples Helps to test multiple functionalities of the OFED stack and IB/iWARP fabric with comprehensive suite of MPI-level benchmarks

  21. MPI Distribution in OFED: Cons MPI’s have their own QA cycles MPI+OFED QA testing is more for OFED, not MPI Bundling induces project scheduling difficulties between OFED and various MPI packages RedHat and SuSE both say “Don’t do this!” They both already include the open source MPI’s Makes it more difficult for them to take OFED drops Many users will download the latest-n-greatest MPIs anyway – not the ones included in OFED

  22. User Level MMU Notification • Userlevel registration caches need to be notified when registered memory is freed • Problem for middleware that hides memory registration • For example: MPI • Roland prototyped “ummunotify” • Jeff Sq. adapted Open MPI to use ummunotify • …but ummunotify was rejected by the kernel maintainers • KM’s want same functionality on performance counters • MPI’s still need this functionality • Roland not available; Jeff Sq. will do the MPI work • Who can do the kernel side? www.openfabrics.org

  23. Questions and feedback from the community www.openfabrics.org

  24. Thank You! www.openfabrics.org

More Related