EMC ISILON vSphere 5 Best Practices James Walkenhorst Solutions Architect Alliances and Solutions Marketing EMC Isilon Division Cormac Hogan Technical Marketing Manager - Storage Cloud Infrastructure VMware
Agenda • “Best Practices” FAQ • General Datastore Guidelines and Best Practices • Overview of NFS Datastores on Isilon • Isilon Best Practices for NFS Datastores • VMware Best Practices for NFS Datastores • Overview of iSCSI Datastores on Isilon • Isilon Best Practices for iSCSI Datastores • VMware Best Practices for iSCSI Datastores • VMware Best Practices for Optimal Network and Storage Access • Resources and Links • Q&A
EMC-Isilon Storage Best-Practices FAQ • Q: What is a Best Practice? • A: An approach, validated through specific testing scenarios or observed frequently in customers’ environments, that produces an optimal outcome to a particular technical challenge. • Q: Are Best Practices the same as Standards? • A: No. They simply represent an approach that is widely understood to produce the best possible outcome. • Q: What are the benefits of observing Best Practices? • A: Reduced risk, faster root-cause analysis of issues, faster response times from support organizations, and the best chance of achieving optimal performance.
General Best Practices – For All Isilon-Based Datastores For optimal datastore performance and availability: • Network segmentation (e.g. VLANs) to separate VM network traffic from VMkernel storage traffic • Best practice for optimal performance • For optimal security, use an isolated (or trusted) network for all storage traffic • Test Jumbo frame (MTU=9000) performance in your environment • Fully supported by both VMware and EMC • Overall performance results depend on multiple variables • Use whichever configuration produces best overall performance • Use 10Gb/s Ethernet connections if possible for best performance • Use vSphere Storage I/O Control to manage VM storage utilization • Use Network I/O Control to manage network bandwidth for storage traffic under heavy workloads
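The jumbo-frame test above can be sketched with ESXi 5.x esxcli commands. This is a hedged configuration sketch, not a runnable script: the vSwitch name (vSwitch1), VMkernel interface (vmk1), and target IP are placeholder assumptions for a dedicated storage network.

```shell
# Assumed names: vSwitch1 carries storage traffic through VMkernel port vmk1.
# Set the MTU on the vSwitch first, then on the VMkernel interface.
esxcli network vswitch standard set --vswitch-name=vSwitch1 --mtu=9000
esxcli network ip interface set --interface-name=vmk1 --mtu=9000

# Verify end-to-end jumbo support by pinging the Isilon storage IP with a
# large payload and the don't-fragment bit set (8972 = 9000 minus headers).
vmkping -d -s 8972 10.16.156.25
```

If the vmkping fails while a normal-sized ping succeeds, some device in the path (switch port, NIC, or Isilon interface) is not configured for jumbo frames, which is why testing in your own environment is the stated best practice.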
General Best Practices – For All Isilon-Based Datastores For optimal datastore performance and availability (continued): • Size your storage cluster first for performance, and then for capacity • Minimize the number of network hops between vSphere hosts and Isilon storage: • EMC recommends using the same subnet • Use the same switch, if possible • Ensure redundant network links exist between vSphere hosts and Isilon nodes for all datastores • HA path configuration and administration differs for each datastore type • Different workloads may require different storage configuration settings • Higher data protection levels vs. higher performance requirements • Analyze workload patterns for each application, if possible
Jumbo Frames • Jumbo frames were not supported for NAS and iSCSI traffic on ESX 3.x; they were limited to data networking only (Virtual Machines and the VMotion network). • Jumbo frames are fully supported for NFS and iSCSI traffic on ESX 4 & 5. • What do they do? • Jumbo frames allow multiple PDUs to be combined into a single frame to improve throughput. (Diagram: several iSCSI PDUs, each a BHS/CDB header plus Data segment, packed into one jumbo frame.)
NFS Datastore Overview • VM data stored on file-based NFS mount, accessed using standard NFS v3.0 protocol • Datastore is automatically thin-provisioned • Advantages of NFS datastores: • Rapid, simple storage provisioning • No individual LUNs to manage • Datastore is immediately available upon creation • Multiple exports to the same mount point, using multiple interfaces, increase throughput and performance
NFS Datastore Overview • VM data stored on file-based NFS mount, accessed using standard NFS v3.0 protocol • Datastore is automatically thin-provisioned • Advantages of NFS datastores: • Rapid, simple storage provisioning • Higher storage utilization rates • File system space not restricted to limitations of a single LUN • Larger storage pool for VMDK files to share
NFS Datastore Overview • VM data stored on file-based NFS mount, accessed using standard NFS v3.0 protocol • Datastore is automatically thin-provisioned • Advantages of NFS datastores: • Rapid, simple storage provisioning • Higher storage utilization rates • Simplified management • No need to balance space usage across LUNs • VMs can be balanced across datastores based solely on bandwidth usage
NFS Best Practices – Optimal Configuration for High Availability • Network Redundancy Options • Static Link Aggregation using 802.3ad LAG • Requires compatible switch and NIC hardware • Protects against NIC/path failures • Does not increase performance • SmartConnect Dynamic IP Address Pools • Automatically assigns IP addresses to member interfaces on each node • Interface or node failure causes SmartConnect to reassign IP address(es) to remaining nodes in the cluster • Datastore mapping can be IP-address-based, or use DNS round-robin
NFS Best Practices – Optimal Configuration for Performance • Throughput limits of a single datastore • Two TCP connections per datastore • One connection for NFS data flow • One connection for NFS control information • Datastore throughput is dependent on the available bandwidth
NFS Best Practices – Optimal Configuration for Performance (continued) • Creating multiple datastores increases throughput • Best design uses mesh topology • Every vSphere host connects to every datastore • VMs can be created on any datastore to balance the I/O workload between hosts and cluster nodes
NFS Configuration Gotcha #1 • ESXi supports NFS, but more specifically: • NFS version 3 only, no support for v2 or v4. • Over TCP only, no support for UDP. • The UI and ESXi logs will inform you if you attempt to use a version or protocol other than version 3 over TCP: NasVsi: 107: Command: (mount) Server: (madpat) IP: (10.16.156.25) Path: (/cormac) Label: (demo) Options: (None) WARNING: NFS: 1007: Server (10.16.156.25) does not support NFS Program (100003) Version (3) Protocol (TCP)
NFS Configuration Gotcha #2 • Ensure that the admin who is mounting the NFS datastore on the ESXi host has appropriate permissions to do so. • If an admin attempts to mount a datastore without the correct permissions, the mount may be successful, but the first attempt to deploy a VM will fail as follows:
Increasing Maximum Number of NFS Mounts • Default configuration only allows for 8 NFS mounts per ESXi server. • To enable more, start the vSphere Client, select the host from the inventory, and click Advanced Settings on the Configuration tab. • In the Advanced Settings dialog box, Net.TcpipHeapSize needs to be adjusted if NFS.MaxVolumes is increased, or you may deplete the heap. Symptoms from running out of heap are documented here: http://kb.vmware.com/kb/1007332
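The same adjustment can be made from the command line. A minimal sketch for ESXi 5.x, assuming a target of 32 mounts; the exact heap values to pair with a given NFS.MaxVolumes setting should be taken from KB 1007332 rather than from this example.

```shell
# Raise the NFS mount limit (assumed target: 32 datastores).
esxcli system settings advanced set -o /NFS/MaxVolumes -i 32

# Raise the TCP/IP heap to match, so the extra mounts don't exhaust it.
esxcli system settings advanced set -o /Net/TcpipHeapSize -i 32
esxcli system settings advanced set -o /Net/TcpipHeapMax -i 128

# The heap settings require a host reboot to take effect.
```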
NIC Teaming – Failover, not Load Balancing • There is only one active connection between the ESXi server and a single storage target (mount point). • This means that although there may be alternate connections available for failover, the bandwidth for a single datastore and the underlying storage is limited to what a single connection can provide. • To leverage more available bandwidth, there must be multiple connections from the ESXi server to the storage targets. • One would need to configure multiple datastores, with each datastore using separate connections between the server and the storage, i.e. NFS shares presented on different IP addresses.
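Mounting the cluster through multiple target IPs, as described above, can be sketched as follows. The IP addresses, export paths, and datastore labels are placeholder assumptions; on Isilon these would typically be addresses from a SmartConnect IP pool.

```shell
# Each datastore is mounted through a different cluster IP, so each gets
# its own TCP connection (and potentially a different physical uplink).
esxcli storage nfs add --host=10.16.156.21 --share=/ifs/vmware/ds01 --volume-name=isilon-ds01
esxcli storage nfs add --host=10.16.156.22 --share=/ifs/vmware/ds02 --volume-name=isilon-ds02

# Confirm both mounts are present.
esxcli storage nfs list
```

Spreading VMs across these datastores is what actually multiplies usable bandwidth; a NIC team alone would still funnel each datastore through one active link.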
VLANs for Isolation & Security of NFS Traffic • Storage traffic is transmitted as clear text across the LAN. • Since ESXi 5.0 continues to use NFS v3, there is no built-in encryption mechanism for the traffic. • A best practice is to use trusted networks for NFS. • This may entail using separate physical switches or leveraging a private VLAN.
iSCSI Datastore Overview • iSCSI LUNs are constructed and treated as files within OneFS • Mounted over Ethernet network using iSCSI Initiators • EMC supports both thin and thick provisioning • Advantages of iSCSI datastores: • Raw-device mapping supported for VMs that require it • May provide better throughput performance for some workload types • iSCSI LUNs can be cloned for certain VM management scenarios
iSCSI Best Practices – Optimal Configuration for High Availability • Network Redundancy Options • Leverage vSphere multipath plug-ins instead of LAG or SmartConnect Advanced • Use dedicated IP pool for iSCSI target IP management and connectivity • Enables segmentation of traffic between iSCSI and NFS workloads across the cluster
iSCSI Best Practices – Optimal Configuration for High Availability (continued) • Network Redundancy Options • Create multiple VMkernel ports on vSphere hosts, with a single active network interface, then use port binding to associate those groups with the iSCSI initiator* • Set Path Selection Policy (PSP) to Fixed, and configure all hosts to use the same preferred path for each datastore • *Requires storage nodes and vSphere hosts to be on the same subnet
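The port-binding and Fixed-PSP steps above can be sketched with ESXi 5.x esxcli. The adapter name (vmhba33), VMkernel ports (vmk2, vmk3), device ID, and runtime path name are all placeholders; substitute the values from your own host.

```shell
# Bind two VMkernel ports (each configured with a single active uplink)
# to the software iSCSI adapter.
esxcli iscsi networkportal add --adapter=vmhba33 --nic=vmk2
esxcli iscsi networkportal add --adapter=vmhba33 --nic=vmk3

# Placeholder device ID for an Isilon iSCSI LUN.
DEVICE=naa.600144f0000000000000000000000001

# Set the Fixed path selection policy, then pin the same preferred path
# on every host that shares the datastore.
esxcli storage nmp device set --device=$DEVICE --psp=VMW_PSP_FIXED
esxcli storage nmp psp fixed deviceconfig set --device=$DEVICE --path=vmhba33:C0:T1:L0
```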
iSCSI Best Practices – Optimal Configuration for Performance • Highly randomized workloads within a large LUN may benefit from setting the LUN’s access pattern to Streaming within OneFS • Consider using 2x mirroring protection to minimize parity calculation overhead on iSCSI write operations • If multiple storage pools are used, create iSCSI LUNs on each pool and assign VM data to tiered pools based on each VM’s performance requirements
iSCSI Gotcha #1 - Routing • iSCSI traffic can be routed between an initiator and target only when iSCSI binding is not implemented. • If iSCSI binding is implemented, then you cannot route between an initiator and target; they must be on the same subnet. • This has been the cause of many Service Requests.
iSCSI Gotcha #2 – Slow Boot on ESXi 5.0 • An issue was uncovered soon after ESXi 5.0 was released. • Slow boot times were observed on hosts that used iSCSI where some initiators could not see all configured targets. • This was due to all initiators trying to log in to every target, and retrying multiple times on failure. • The symptoms of the slow boot, and the patch that resolves it, are described in this KB article: http://kb.vmware.com/kb/2007108
Multipathing - Overview • Pluggable Storage Architecture - PSA • Native Multipathing Plugin - NMP • Storage Array Type Plugin - SATP • Path Selection Policy - PSP (follow vendor recommendations) • Some vendors provide their own plugin to the PSA, e.g. EMC’s PowerPath, or a third-party PSP.
iSCSI Multipathing – Best Practices • iSCSI binding ties a VMkernel port to a physical adapter and allows the PSA to implement multipathing on VMware’s Software iSCSI Adapter. • If you use iSCSI binding, then you cannot route between initiator and target; they must co-exist on the same subnet. • Do not create a NIC team when implementing iSCSI binding. • We want customers to consider storage resiliency based on multiple paths to the storage, rather than basing it on the number of networks available to a single storage path.
Gotcha – Improper Device Removal • Improper removal of a physical device containing a VMFS volume or RDM could result in an APD (All Paths Down) state. • Improvements have been made to ESX 4.x & 5.0. • Follow the steps outlined in http://kb.vmware.com/kb/1015084 for ESX 4.x & http://kb.vmware.com/kb/2004605 for ESXi 5.0 when removing a datastore.
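On ESXi 5.0 the KB 2004605 procedure is command-line driven; a sketch follows, with a placeholder datastore label and device ID. This assumes the preparatory steps (VMs migrated or powered off, Storage DRS and SIOC disabled for the datastore) are already done.

```shell
# Identify the datastore and its backing device.
esxcli storage filesystem list

# Unmount the VMFS volume (placeholder label), then detach the device
# (placeholder ID) so its removal does not trigger an APD condition.
esxcli storage filesystem unmount --volume-label=isilon-ds01
esxcli storage core device set --state=off --device=naa.600144f0000000000000000000000001

# After the LUN is unpresented from the array, rescan to clean up.
esxcli storage core adapter rescan --all
```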
Storage I/O Control - Overview • Storage I/O Control is supported with both block & NFS datastores • Monitors I/O latency to both block (iSCSI, FC, FCoE) datastores & NFS datastores at each ESXi host sharing a physical device. • When the average normalized latency exceeds a set threshold (30ms by default), the datastore is considered to be congested. • If congested, SIOC distributes available storage resources to virtual machines in proportion to their configured shares. • Used to determine migration needs with Storage DRS in ESXi 5.0 • Troubleshooting Storage I/O Control: http://kb.vmware.com/kb/1022091
Storage I/O Control Usage Scenario (Diagram: two panels, “What you see” vs. “What you want to see”, each showing Microsoft Exchange, online store, and data mining VMs sharing an NFS / VMFS datastore.)
Network I/O Control - Overview • With converged networks, network traffic with different patterns and needs will merge together on the same network. • This may directly impact performance and predictability due to lack of isolation, scheduling & arbitration. • Network I/O Control can be used to prioritize different network traffic on the same pipe. • Network I/O Control also introduces the concept of user-defined network groups. • This is quite useful for Virtual Machines, whereby an administrator can select which VM or group of VMs has the higher priority on the network. • Network I/O Control can be used on both NFS & iSCSI.
Network I/O Control – Configuration UI NFS traffic can be given a higher priority than other traffic if contention arises
Resources, Links and Contact Information • Everything VMware at EMC Community: http://emc.com/vmwarecommunity • VMware Storage blog: http://blogs.vmware.com/vSphere/Storage/ • EMC’s Best Practices for vSphere 5 on Isilon storage white paper: http://www.isilon.com/file-handler/1729/library-emc-isilon-storage-and-vmware-vsphere-5.pdf • Additional questions: email@example.com Thank you for attending!