
Heterogeneous Compute Platforms: Data management


Presentation Transcript


  1. Heterogeneous Compute Platforms: Data management
  Dan Tsafrir
  May 2013, ICRI-CI Retreat

  2. Data sharing – the problem
  • Sharing data between heterogeneous devices
    • Oftentimes cumbersome & device-specific
    • In the OS, in apps, or in both
  • Programmers need to address questions like:
    • Can the device work directly on app memory? Or must it have its own copy of the data?
    • Can the device deal with app virtual addresses? Or must the memory be mapped in some other way?
    • Should the memory be pinned before passing it to the device (see the sketch below)? Or can the device withstand I/O page faults, thereby allowing memory overcommitment?
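The pinning question has a concrete, everyday flavor. Below is a minimal userspace sketch (plain Linux C, not from the slides) of the "pin before passing it to the device" answer: mlock() keeps the buffer's pages resident while a hypothetical hand_to_device() call lets the device access them. Real drivers usually pin on the kernel side (e.g., with get_user_pages()) because DMA also needs stable physical addresses, so take this only as an illustration of the programmer-visible step.

    /* Minimal sketch: pin an app buffer before handing it to a device.
     * hand_to_device() is a hypothetical placeholder for a device API. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>

    #define BUF_SIZE (1 << 20)                /* 1 MiB buffer */

    int main(void)
    {
        void *buf = malloc(BUF_SIZE);
        if (!buf)
            return 1;

        if (mlock(buf, BUF_SIZE) != 0) {      /* pin: keep pages resident */
            perror("mlock");
            return 1;
        }

        /* hand_to_device(buf, BUF_SIZE);        hypothetical device call */

        munlock(buf, BUF_SIZE);               /* unpin once the device is done */
        free(buf);
        return 0;
    }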

  3. Data sharing – goal
  • Big goal
    • Data sharing between heterogeneous PEs should "just work"
    • HW/SW interfaces should keep app programmers mostly ignorant of the details
    • Need to develop interfaces & a runtime layer that
      • Abstract away the details of each device
      • Present to apps a simplified, efficient programming model (a hypothetical sketch follows)
  • Concrete goal
    • Focusing on the MMU and IOMMU
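To make the "simplified programming model" goal more tangible, here is a purely hypothetical sketch of what such a runtime interface could look like from the application's side; none of these names exist in any real library, and the slides do not prescribe this API.

    /* Hypothetical interface sketch: the app registers an ordinary
     * virtual-memory buffer once; the runtime decides per device whether
     * to map it through the IOMMU, copy it, pin it, etc. Illustrative only. */
    #include <stddef.h>

    typedef struct shared_buf shared_buf_t;                  /* opaque runtime handle  */

    shared_buf_t *share_register(void *vaddr, size_t len);   /* register app buffer    */
    int  share_attach(shared_buf_t *buf, int device_id);     /* make usable by device  */
    int  share_detach(shared_buf_t *buf, int device_id);     /* release device mapping */
    void share_unregister(shared_buf_t *buf);                /* drop the registration  */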

  4. Unifying MMU and IOMMU spaces
  Ilya Lesokhin, Muli Ben-Yehuda, Assaf Schuster, Dan Tsafrir

  5. IOMMU in a nutshell
  • IOMMU vs. MMU
    • The IOMMU serves I/O devices that perform DMAs (see the mapping sketch below)
    • Like the MMU serves processes that access virtual memory
  • But
    • No I/O page faults (IOPFs)
    • If the memory isn't there => crash
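For context (not part of the slides): on Linux, a driver creates the translation the device will use via the DMA API, roughly as in the fragment below; with an IOMMU present, dma_map_single() installs an I/O page-table entry for the buffer, and a device access to an IOVA that was never mapped has no recoverable fault path the way a CPU access does. The function name and surrounding variables are illustrative.

    /* Kernel-side fragment (Linux DMA API): obtain a DMA address (an IOVA
     * when an IOMMU is enabled) for 'buf' before programming the device. */
    #include <linux/dma-mapping.h>

    static dma_addr_t map_for_device(struct device *dev, void *buf, size_t len)
    {
        dma_addr_t iova = dma_map_single(dev, buf, len, DMA_TO_DEVICE);

        if (dma_mapping_error(dev, iova))
            return 0;                     /* mapping failed; caller handles */

        /* ...program the device with 'iova', let it DMA, and afterwards:
         *    dma_unmap_single(dev, iova, len, DMA_TO_DEVICE);             */
        return iova;
    }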

  6. No IOPFs – consequences
  • IOMMU management is crippled compared to MMU management
    • Virtual memory must be pre-allocated & pinned to physical memory
    • Can't do memory overcommitment
      • Consider a set of uncooperative VMs with assigned NICs (SR-IOV)
      • Must pin their entire memory images! (see the sketch below)
  • Kernel's MMU & IOMMU management subsystems
    • Developed separately & used differently
    • Causes numerous headaches and performance penalties
      • E.g., can't use an app's virtual memory space to do I/O
  • Thus, to be able to unify them (and get rid of the above drawbacks)
    • We must have IOPFs
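The SR-IOV point can be seen directly in the VFIO interface used for device assignment. The userspace sketch below (assuming a Linux type-1 IOMMU; the group path /dev/vfio/26 and the 16 MiB size are arbitrary examples, and the usual status/version checks are omitted) maps a "guest memory" region for the device. That single VFIO_IOMMU_MAP_DMA call pins every backing page for as long as the mapping exists, which is why the entire memory image of a VM with an assigned NIC becomes unswappable.

    /* Userspace sketch: map (and thereby pin) memory for an assigned device. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <linux/vfio.h>

    int main(void)
    {
        int container = open("/dev/vfio/vfio", O_RDWR);
        int group = open("/dev/vfio/26", O_RDWR);   /* example IOMMU group */
        struct vfio_iommu_type1_dma_map map = { .argsz = sizeof(map) };
        void *mem;

        if (container < 0 || group < 0)
            return 1;
        if (ioctl(group, VFIO_GROUP_SET_CONTAINER, &container) ||
            ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU))
            return 1;

        /* "Guest memory": an anonymous 16 MiB region */
        mem = mmap(NULL, 16 << 20, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (mem == MAP_FAILED)
            return 1;

        map.vaddr = (unsigned long)mem;   /* process virtual address */
        map.iova  = 0;                    /* device-visible address  */
        map.size  = 16 << 20;
        map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;

        /* Pins all 16 MiB until the mapping is removed or the fds close */
        if (ioctl(container, VFIO_IOMMU_MAP_DMA, &map))
            perror("VFIO_IOMMU_MAP_DMA");
        return 0;
    }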

  7. IOPF support – current state of affairs
  • Recently defined industry spec for supporting IOPFs:
    • "PRI" (Page Request Interface)
    • Part of the PCI-SIG ATS (Address Translation Services) specification (enabling sketch below)
  • Bleeding-edge I/O devices do (experimentally) support IOPFs
    • We are working on such experimental NICs
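On the kernel side, Linux already exposes helpers for the ATS/PRI capabilities named above; a driver (or IOMMU driver) that wants a device to take recoverable I/O page faults would enable them roughly as sketched below. This is a fragment under stated assumptions: error handling is trimmed, the request-credit value 32 is arbitrary, and additional IOMMU/PASID setup is needed in practice.

    /* Kernel-side fragment: enable the PCIe ATS and PRI capabilities. */
    #include <linux/pci.h>
    #include <linux/pci-ats.h>

    static int enable_iopf_capabilities(struct pci_dev *pdev)
    {
        int ret = pci_enable_ats(pdev, PAGE_SHIFT); /* device-side translation cache */
        if (ret)
            return ret;

        ret = pci_enable_pri(pdev, 32);             /* outstanding page requests */
        if (ret)
            pci_disable_ats(pdev);
        return ret;
    }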

  8. Research
  • Status
    • Have a working environment
    • Handling send-IOPFs (currently the NIC drops receive-IOPFs)
    • Measured IOPF handling (breakdown into HW and SW components)
  • Next steps
    • Attempt to reduce the overhead
    • Develop a strategy to handle receive-IOPFs (10 Gb/sec => 1.25 MB/ms; see the calculation below)
    • Characterize IOPFs: how often? Performance penalty? Dropped packets?
    • Show that I/O memory-space overcommitment is possible & advantageous
  • Longer term
    • Unify process & I/O address spaces
      • Processes use their VA buffers; the I/O subsystem works directly on them
    • Does the PRI spec make sense? Is it optimal? Could it be improved? How?
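A quick sanity check of the 1.25 MB/ms figure, and of what it implies for receive-IOPF handling, takes only a few lines of arithmetic (the 100 µs fault-service latency below is an assumed, illustrative number, not a measurement from the slides):

    /* Back-of-the-envelope: data arriving while a receive IOPF is serviced. */
    #include <stdio.h>

    int main(void)
    {
        double rate_bps         = 10e9;         /* 10 Gb/s line rate           */
        double bytes_per_ms     = rate_bps / 8 / 1000;
        double fault_latency_ms = 0.1;          /* assumed 100 us IOPF service */

        printf("arriving per ms : %.2f MB\n", bytes_per_ms / 1e6);      /* 1.25 */
        printf("during one IOPF : %.0f KB\n",
               bytes_per_ms * fault_latency_ms / 1e3);                  /* 125  */
        return 0;
    }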

  9. Rethink the IOMMU
  Moshe Malka, Nadav Amit, Dan Tsafrir

  10. IOMMU architected similarly to MMU
  [Figure: multi-level page-table walk of a virtual address, rooted at CR3]
  • Has an IOTLB
  • Upon an IOTLB miss => HW walks the page table (the index arithmetic is sketched below)
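For reference, the walk on an IOTLB (or TLB) miss splits the address into per-level indices and follows one pointer per level from the root register (CR3 on the CPU side; the IOMMU's root/context entry plays the analogous role). The standalone snippet below assumes the common x86-64 4-level, 4 KiB-page layout and only shows the index arithmetic; it is an illustration, not the slides' figure.

    /* Toy illustration: split a 48-bit address into 4-level table indices. */
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t va = 0x00007f8a12345678ULL;    /* example 48-bit address */

        unsigned idx4 = (va >> 39) & 0x1ff;     /* level-4 (root) index   */
        unsigned idx3 = (va >> 30) & 0x1ff;
        unsigned idx2 = (va >> 21) & 0x1ff;
        unsigned idx1 = (va >> 12) & 0x1ff;     /* level-1 (leaf) index   */
        unsigned off  = (unsigned)(va & 0xfff); /* 4 KiB page offset      */

        /* A miss costs up to four dependent memory reads before the DMA
         * (or load/store) can proceed. */
        printf("indices: %u %u %u %u, offset 0x%x\n",
               idx4, idx3, idx2, idx1, off);
        return 0;
    }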

  11. Does this make sense?
  • We submit that it does not…
  • Specifically, it seems that
    • Since NICs work with rings, IOTLB accesses are completely predictable (more important than for the TLB, because I/O page tables are un-cached)
    • Since NICs map each DMA descriptor just before using it, and unmap it just after (see the sketch below), there is no need for a page-table hierarchy
    • Performance can be greatly improved by redesigning the IOMMU to take advantage of the above
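The map-just-before / unmap-just-after pattern is the standard transmit path in Linux NIC drivers. The fragment below is an illustrative (not real-driver) sketch of it, showing both the short-lived mapping of the buffer referenced by each descriptor and the strictly sequential ring index that makes the resulting IOTLB accesses predictable.

    /* Illustrative TX-ring fragment: per-buffer map before posting, unmap
     * on completion; ring slots are consumed strictly in order. */
    #include <linux/dma-mapping.h>
    #include <linux/errno.h>
    #include <linux/types.h>

    struct tx_desc { dma_addr_t addr; u32 len; };   /* simplified descriptor */

    struct tx_ring {
        struct device  *dev;
        struct tx_desc *desc;      /* descriptor array the NIC reads */
        unsigned int    size;      /* number of entries (power of 2) */
        unsigned int    next;      /* producer index                 */
    };

    static int xmit_one(struct tx_ring *r, void *data, size_t len)
    {
        unsigned int i = r->next & (r->size - 1);   /* sequential ring slot */
        dma_addr_t iova = dma_map_single(r->dev, data, len, DMA_TO_DEVICE);

        if (dma_mapping_error(r->dev, iova))
            return -ENOMEM;

        r->desc[i].addr = iova;       /* NIC will DMA-read from this IOVA */
        r->desc[i].len  = len;
        r->next++;
        /* ...ring the doorbell; on TX completion the driver calls:
         *    dma_unmap_single(r->dev, iova, len, DMA_TO_DEVICE);      */
        return 0;
    }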

  12. Research
  • Status
    • Working hard toward proving all the claims from the previous slide
    • Environment: a KVM/QEMU setup (10 Gb/s NICs) that logs all IOMMU accesses
  • Future
    • Not just NICs (we have reason to believe other I/O devices behave similarly)
    • Reducing overheads for virtualization (vIOMMU)
    • What would be the impact of unifying I/O and process address spaces? (previous project)
