implementing pci i o virtualization standards l.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
Implementing PCI I/O Virtualization Standards PowerPoint Presentation
Download Presentation
Implementing PCI I/O Virtualization Standards

Loading in 2 Seconds...

play fullscreen
1 / 46

Implementing PCI I/O Virtualization Standards - PowerPoint PPT Presentation


  • 392 Views
  • Uploaded on

Implementing PCI I/O Virtualization Standards. Mike Krause and Renato Recio PCI SIG IOV Work Group Co-chairs. Today’s Approach To IOV. Virtualization Intermediaries (VI) and hypervisors are used to safely share IO Under this approach 1 or more System Images share the PCI device through a VI

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Implementing PCI I/O Virtualization Standards' - derora


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
implementing pci i o virtualization standards

Implementing PCI I/O Virtualization Standards

Mike Krause and RenatoRecio

PCI SIG IOV Work Group Co-chairs

today s approach to iov
Today’s Approach To IOV
  • Virtualization Intermediaries (VI) and hypervisors are used to safely share IO
  • Under this approach
    • 1 or more System Images share the PCI device through a VI
    • Virtualization enablers are not needed in the either the Root Complex (RC) or PCIe Device
    • The VI is involved in all IO transactions and performs all IO Virtualization Functions

VI Based PCI Device Sharing Example

SI

VI

SI

System Images share the adapter through a VI.MMIO and DMA operations go through the VI

Hypervisor

PCIe RC

Today’s PCIe Device with one or more Functions.The Device may not be cognizant at all that it is being shared

PCIe Port

PCIe Device

F

performance of today s iov
Performance Of Today’s IOV

Through-put*

Latency*

Native IOV

VI Based IOV

  • VI based IOV adds path length on every IO operation
  • Native IOV significantly improves performance
    • For the example above Native IOVdoubles throughput and reduces latency by up to half
  • Several factors are increasing the virtualization use (e.g., more cores per socket, customer simplification requirements, …)
    • Making Native IOV even more important in the future

Native IOV

VI Based IOV

*Source: Self-Virtualized I/O: High Performance, Scalable I/O Virtualization in Multi-core Systems; R. Himanshu, I. Ganev, K. Schwan - Georgia Tech and J. Xenidis - IBM

pci sig iov overview
PCI SIG IOV Overview

PCIe Single-Root IOV

PCIe Multi-Root IOV

  • PCI SIG is standardizing mechanisms that enable PCIe Devices to be directly shared, with no run-time overheads
    • Single-Root IOV – Direct sharing between SIs on a single system
    • Multi-Root IOV – Direct sharing between SIs on multiple systems
  • PCI-SIG IOV Specification covers “north-side” of the Device
    • “PCI IOV Usage Models and Implementations” session will cover examples of how PCI IOV specifications can be used tovirtualizePCIe Devices

SI

VI

SI

SI

VI

SI

SI

SI

VI

Hypervisor

Hypervisor

Hypervisor

PCIe RC

Blade

Blade

PCIe RC

PCIe RC

PCIeTopology

PCIe Port

PCIe IOV Capable Device

PCIe MR-IOV

Capable Enet Device

PCIe MR-IOV Capable FC Device

FC SAN

FC SAN

Ethernet LAN

terminology
Terminology

SI

VI

SI

SR-PCIM

VI

  • System Image (SI)
    • SW, e.g., a guest OS, to which virtual and physical devices can be assigned
  • Virtual Intermediary (VI)
    • Performs resource allocation, isolation, management and event handling
  • PCIM – PCI Manager
    • Controls configuration, management and error handling of PFs and VFs
    • May be in SW and/or Firmware.
    • May be integrated into a VI
  • Translation Agent (TA )
    • Uses ATPT to translates PCI Bus Addresses into platform addresses
  • Address Translation and Protection Table (ATPT)
    • Validates access rights of incoming PCI memory transactions.
    • Translates PCI Address into platform physical addresses

Hypervisor

Processor

Memory

ATPT

TA

PCIeRootComplex

PCIe Port

PCIe Port

PCIe

Switch

PCIe Port

PCIe Port

PCIe

Device

PCIe

Device

F

F

terminology continued
Terminology… Continued

PCIe Single-Root IOV

PCIe Multi-Root IOV

  • Single-Root IOV (SR-IOV) - A PCIe hierarchy with one or more components that support the SR-IOV Capability
  • Multi-Root Aware (MRA) - A PCIe component that supports the MR-IOV capability
  • Multi-Root IOV (MR-IOV) - A PCIe Topology containing one or more PCIe Virtual Hierarchies
    • Virtual Hierarchy (VH) - A portion of an MR Topology assigned to a PCIe hierarchy, where each VH has its own PCI Memory, IO, and Configuration Space

SI

VI

SI

SI

VI

SI

SI

SI

VI

Hypervisor

Hypervisor

Hypervisor

PCIe RC

Blade

Blade

PCIe RC

PCIe RC

PCIeFabric

PCIe Port

PCIe IOV Capable Device

PCIe MRADevice

PCIeDevice

PCIe SR-IOV

MRA Device

terminology continued7

PF0

PF0

VF1

VF1

VF2

VF2

:

:

VFN

VFN

Terminology… Continued

PCIe Port

PF0

  • Address Translation Cache (ATC)
    • Cache of recent translations
  • Physical Function (PF)
    • Function that supports SR-IOV
    • Used to manage VFs
  • Virtual Function (VF)
    • Function that supports SR-IOV and shares resources with the PF it associated with
  • Base Function (BF)
    • Function that supports MR-IOV
    • Used to manage PFs and VFs on MRA Devices

VF1

ATC1

Internal Routing

Resources1

VF2

ATC2

Resources2

:

VFN

  • PCIe SR-IOV Capable Device

ATCN

ResourcesN

PCIe Port

VH0

BF0

VH1

VHN

Internal Routing

….

  • PCIe MR-IOV Capable Device
pci iov related mechanisms

PCI IOV Related Mechanisms

Function Level Reset (FLR)

Alternative Routing ID Interpretation(ARI)

Address Translation Services

Single-Root IO Virtualization

Multi-Root IO Virtualization

flr and ari

PCIe Topology

FLR And ARI

VI

SI

SI

SI

FLRs

  • FLR - Provides Function level granularity on resets
    • All software readable state must be cleared by an FLR
    • All outstanding transactions associated with the Function referenced by the FLR must be completed when the FLR is returned as completed
  • ARI - Extends Function number field from 3 to 8 bits
    • Allows up to 256 Functions or VFsper PCIe Device

Hypervisor

PCIe Root Complex

ATPT

PCIe Port

PCIe Port

PCIe DeviceFC HBA

PCIe

Device

Function1

Function2

Routing

Routing

ConfigMgt

VF1

PF0

VF2

VF3

Function3

FC Port

FC Port

FC SAN

FC SAN

address translation services

PCIe Topology

Address Translation Services

VI

SI

SI

SI

  • ATS is used to cache PCIe Memory Address translations in a PCI Device.
  • Consists of three new PCIe transactions
    • Request Translation Transaction – Used by a PCIe Device to request a translation of an untranslated address
    • Translated DMA Transaction – Used to perform a DMA that references a translated address
    • Invalidate Translation Transaction – Used to invalidatea previously exposed translated address

PCIe Device with Address Translation Services

Hypervisor

PCIe Root Complex

ATPT

PCIe Port

PCIe Device

Routing

PF0

VF1

VF2

VF3

ATC2

ATC3

ATC1

DownstreamPort

today s pci device
Today’s PCI Device

PCIeSR-IOVCapable

Device

PCIe Port

  • Function 0 is required
  • Overview of Function Attributes
    • Each Function has a its own configuration and PCIe memory address space
    • Up to 8 PCI Functions with unique configuration space / BAR / etc.
      • ARI Capability enables up to 256 Functions to be supported
    • Support INTx, MSI, MSI-X or combination of MSI and MSI-X
    • Function dependencies through vendor specific mechanisms
    • Cannot be directly shared by SIs
    • Vendor specific mechanisms to associateFunctions to “South-side” resources

F0

DownstreamPort

F1

ATC1

Internal Routing

Internal Routing

Resources1

F2

ATC2

Resources2

:

FN

ATCN

ResourcesN

sr iov device overview

Function 0 is required

Overview of VF attributes

Each VF has a its own configuration and PCIe memory address space

VFs share a contiguous PCIe memory address space

Up to ~216 Virtual Functions

ARI enables up to 256

IOV enables additional Bus Numbers to be associated

Function dependencies defined through standard mechanism

Support MSI and MSI-X

Can be directly shared by SIs

Vendor specific mechanisms to associate Functions to “South-side” resources

SR-IOV Device Overview

PCIeSR-IOVCapable

Device

PCIe Port

PF0

DownstreamPort

VF1

ATC1

Internal Routing

Internal Routing

Resources1

VF2

ATC2

Resources2

:

VFN

ATCN

ResourcesN

sr iov device discovery

Each PF must have an SR-IOV Extended Capability structure

A Device may have a mix of Functions and PFs

Each Function and PF consumes one Routing Identifier (RID)

For a Device that

isn’t ARI capable, Functions and PFs must be in the first 8 functions;

is ARI capable, number of Functions and PFs supported is as defined in the base specification

SR-IOV Device Discovery

SRPCIM

SI

A

SIB

PCI Root

PCIe Topology

PF0

PF1

F2

PF3

vf discovery part 1
VF Discovery - Part 1

SRPCIM

SI

A

SIB

  • The TotalVFs field is used to discover the number of Active VFs that a PF could have.
  • The InitialVFs field is used to discover the number of Active VFs that a PF initially has.
  • Note for a Device that isn’t MR capable*: 1 ≤ InitialVFs = TotalVFs

PCI Root

PCIe Topology

PF0

PF1

F2

PF3

bar discovery
BAR Discovery

SRPCIM

SI

A

SIB

  • Each PF has its own independent set of BARs in its standard configuration space and an MSE bit for those BARs
  • The VFs share a BAR set and have an MSE bit that controls the memory space of all the VFs
    • The BAR set that is shared by all the VFs resides in the PF’s SR-IOV capabilities

PCI Root

PCIe Topology

PF0

PF1

F2

PF3

supported page sizes
Supported Page Sizes

SRPCIM

SI

A

SIB

  • Supported Page Sizes is used to discover the page sizes supported by the VFs associated with the PF
    • When this field is read, the PF must return the page sizes it can support
    • This field will be used during the IOV configuration phase to align VF BAR apertures on system page boundaries (more later)

PCI Root

PCIe Topology

PF0

PF1

F2

PF3

configuring vfs
Configuring VFs

SRPCIM

SI

A

SIB

  • VFs for a PF are enabled by writing NumVFs and then Setting the “VF Enable” bit
    • When VF’s are enabled, the PCIe Device must associate NumVFsworth of VFs with the PF
      • If VF Migration Capable and VF Migration Enabled set, then NumVFs must be 1 ≤ NumVFs ≤ TotalVFs
      • Otherwise NumVFs must be 1 ≤ NumVFs ≤ InitialVFs

PCI Root

PCIe Topology

PF0

PF1

F2

PF3

VF Migration Capable

VF Migration Enable

VF Enable

configuring the vfs bars
Configuring The VFs’ BARs

SRPCIM

SI

A

SIB

  • System Page Size defines the page size the system will use
    • Value of System Page Size must be one of the Supported Page Sizes
    • System Page Size is used by the PF to align the MMIO aperture defined by each BAR to a system page boundary
  • VF BARs behave as in PCI 3.0 Spec’s PCI BARs, except that a VF BAR describes the aperture for multiple VFs (see next page)

PCI Root

PCIe Topology

PF0

PF1

F2

PF3

pf and vf bar semantics
PF And VF BAR Semantics
  • The memory aperture required for each VF BAR can be determined by writing all “1”s and then reading the VF BAR
    • Behaves as in the base PCI Spec
  • The address written into each VF BAR is used by the Device to set the starting address for that BAR on the first VF
  • The differences between VF BARs and PCI 3.0 Spec BARs are
    • For each VF BAR, the memory space associated with the 2nd and higher VFs is derived from the starting address of the first VF and the memory space aperture
    • The VF BAR’s MMIO space is not enabled until VF Enable and VF MSE have been set

BAR1 VF N SA = BAR1 VF 1 SA + N x (BAR1 VF MA) - 1

where BAR1 VF 1 SA is the address written into VF BAR1and MA is the memory aperture of VF BAR1.

vf discovery part 2
VF Discovery – Part 2

SRPCIM

SI

A

SIB

  • The values in NumVFs and VF ARI Enable affect the values for First VF Offset and VF Stride*
    • First VF Offset and VF Stride are defined by the Device
    • Following is the equation for determining the RID of VF N:

VF N = [PF RID + VF Offset + (N-1) * VF Stride] Modulo 216

where all arithmetic used is 16 bit unsigned dropping all carries

PCI Root

PCIe Topology

PF0

PF1

F2

*Note: After setting NumVFs and VF ARI Enable, SR-PCIM can read these two fields to determine how many busses will be consumed bt the PF’sVFs

PF3

pf and vf bar semantics22
PF And VF BAR Semantics

PCIe RID Number

  • PF and VF RIDs must not overlap given any valid NumVFs setting across all PFs of a Device
  • As in base, an SR-IOV Device captures Bus # from any Type 0 configrequest
    • VFs may reside on a different bus number as associated PF
  • If the switch above the Device
    • supports ARI Forwarding bit, RIDs are interpreted as 8 bit Bus # and 8 bit Device #
    • doesn’t supports ARI, RIDs are interpreted as BDF#
  • Bus # can be used to assign VFs,but PFs must be on 1stBus assigned to Device

PF 0 RID = 0200

PF 1 RID = 0201

reset mechanisms
Reset Mechanisms
  • Three reset mechanisms are supported
    • Conventional Reset – Resets all PF and VF state
    • FLR that targets a VF – Resets a single VF
    • FLR that targets a PF – Resets a PF and its associated VFs
conventional reset
Conventional Reset
  • A Fundamental or Hot Reset to an SR-IOV Device shall cause all Functions, PFs, and VFs context to be reset to their original, power-on state
  • If a PF has its VFs enabled and a Fundamental or Hot Reset is issued to the Device, the Device must reset all PF and VF state, eg:
    • The PF must disable its SR-IOV capabilities andreverts back to being a PCI Function
    • Settable SR-IOV capabilities (e.g., NumVFs) are reset to default values
    • MSE and BME are both off
    • All BAR values used by the PF and VFs are indeterminate.
    • All interrupts are disabled
flrs to vfs and pfs
FLRs To VFs And PFs
  • An FLR that targets a VF must be supported
    • Software may use FLR to reset a VF
    • An FLR that targets a VF must
      • Not affect the VFs existence (e.g. it still consumes a RID)
      • Not affect any address assigned to it. That is, the VF’s BAR registers and MSE are unaffected by FLR
  • An FLR that targets a PF must be supported
    • Software may use an FLR to reset a PF
    • An FLR that targets a PF must
      • Reset the PF’s SR-IOV Extended Capabilities
      • The VFs are no longer allocated or enabled
today s pcie topology
Today’s PCIe Topology
  • Strict Tree
    • Root Ports Connect to Switches
    • Switches Connect to Devices
    • Roots Ports can also connect to Devices
    • Switches can also connect to more Switches
  • Single Software Management Entity
    • Runs above the Root
  • Implicit Message Routing
    • Route to Root
    • Broadcast from Root
single switch mr topology

FC SAN

FC SAN

Ethernet LAN

Ethernet LAN

Single Switch MR Topology

Root A

Root B

Root C

Root D

SI

MR Switch

MR Device

MR Device

PCIe Device

MR Device

two switch mr topology

FC SAN

FC SAN

Ethernet LAN

Ethernet LAN

Two Switch MR Topology

Root A

Root B

Root C

Root D

Left

Right

MR Device

MR Device

PCIe Device

MR Device

tlp labeling

LCRC

STP

Sequence #

TLP Prefix

ECRC

PCIe TLP Header

TLP Data (optional)

ECRC (optional)

LCRC

END

TLP Labeling
  • TLP Prefix on MR Links
    • After Seq #, before TLP Hdr
    • Sent / Resent with TLP
  • Prefix Contains
    • VH Number (link local)
    • VL Number (flow control)
    • Global Key (error checking)
  • Added at MR Ingress
    • First MR Component seen
  • Dropped at MR Egress
    • Last MR Component seen
sr iov device overview31
SR-IOV Device Overview

MR-IOVCapable

Device

PCIe Port

PF0

PF0VF1

:

DownstreamPort

Internal Routing

Internal Routing

PF0VFN

mr iov device overview
MR-IOV Device Overview
  • Each Virtual Hierarchy (VH) has a full PCIe address space
    • Configuration, Memory and I/O Space
  • Up to 28 Virtual Hierarchies
  • VH0 is used to Manage MR-IOV features of the Device
    • BF in VH0 manages associated PFs and VFs
    • Other Functions optional in VH0
  • VH1 to VHmax have same PFs at the same Function #s
    • Each PF in each VH has it’s own values for InitialVFs, TotalVFs, NumVFs, VF Stride and VF Offset

MR-IOVCapable

Device

PCIe Port

VH1 PF0

VH1 PF0VF1

:

DownstreamPort

Internal Routing

Internal Routing

VH1 PF0 VFN

VHK PF0

VH0 BF0

VHK PF0 VF1

:

VHK PF0 VFM

mr iov device configuration
MR-IOV Device Configuration
  • Initially, only VH0 is enabled
  • Within VH0, MR-PCIM enumerates Config Space
    • Locate all BFs by looking for MR-IOV Capability

For each Device located…

  • In BF0
    • Determine MaxVH, set NumVH
    • Enable additional VLs (if available)
    • Configure VL arbitration (optional)
    • Configure per VH mapping of VCID to VL
    • Configure per VH Global Keys
  • In every BF
    • In each VH, provision VC Resources (optional)
    • If hardware present, configure Initial VF mapping
schedules
Schedules
  • ATS Specification released 3/8/2007
  • SR-IOV Specification
    • Following PCI SIG Specification Process
    • Draft 0.7 Completed
    • Draft 0.9 May, 2007
    • Version 1.0 early 3Q/2007
  • MR-IOV Specification
    • Following PCI SIG Specification Process.
    • Draft 0.5 Completed
    • Draft 0.7 2Q/2007
    • Draft 0.9 late 2Q/2007
    • Version 1.0 late 3Q/2007
call to action
Call To Action
  • Please participate in the PCI-SIG Specification Development Process
  • For more information please go to www.pcisig.com
  • Thank you for attending
workgroup members
Workgroup Members
  • AMD
  • Broadcom
  • Emulex
  • HP
  • IBM
  • IDT
  • Intel
  • LSI Logic
  • Microsoft
  • NextIO
  • Neterion
  • Nvidia
  • PLX
  • Qlogic
  • Stargen
  • Sun
  • VMWare
mr topics

MR Topics

Back-ups

switch tlp routing
Switch TLP Routing
  • Three step routing decision:
    • MR Input Map
      • { Input Port, Input VH# }  { VS, VS Input Port }
      • MR-PCIM Manages
    • VS PCIe Map
      • { VS Input Port, TLP Hdr }  { VS Output Port }
      • VH Software Manages using PCIe rules
    • MR Output Map
      • { VS, VS Output Port }  { Output Port, Output VH# }
      • MR-PCIM Manages
hot reset
Hot Reset
  • PCIe Hot Reset is broadcast downstream
    • Switch Upstream port  all Downstream ports
    • Uses Phy Training Sequence
    • Handshake to detect link partner received
  • MR Reset is per VH
    • VS Upstream port  all VS Downstream ports
    • Uses Reset DLLP
    • Handshake to detect link partner received
    • Link stays up, other VHs unaffected
flow control
Flow Control
  • PCIe uses Virtual Channels (VC) for QoS, traffic isolation, etc.
  • MR Extends this to Virtual Links (VL)
  • MR adds per VH, VL flow control
  • Traffic Gate
    • PCIe: sufficient VC Credits to send TLP
    • MR: sufficient VL Credits to send TLPand sufficient VH, VL Credits to send TLP
management authorization
Management / Authorization
  • Devices are managed using VH0
  • Switches are managed using the Upstream port of any Authorized VS
    • Multiple VS may be simultaneously authorized
      • Supports MR-PCIM failover
    • At failover, new MR-PCIM remaps VH0 so it can mange MR Devices
virtual hot plug
Virtual Hot Plug
  • Virtual Hot Plug is implemented in each Downstream Port of each VS
    • Software in VH sees PCIe Slot Control Regs
  • MR-PCIM controls virtual “buttons & lights”
    • e.g. pushing the virtual Attention Button, detecting virtual slot power state, …
  • Physical Hot-Plug is implemented in each Physical Port
    • Managed by MR-PCIM
    • Optional (same as PCIe)
power management
Power Management
  • Each PF has a power state
    • if all PFs in low power state, link can go to low power state (like PCIe multi-function rules)
  • PM_PME Messages sometimes trigger WAKE#
    • e.g., some Root powered itself off but a shared device below it was only virtually powered off (Device still being used by other VHs)
slide46

© 2007 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.

The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.