
Effective Strategies for SAN Performance Monitoring

Effective Strategies for SAN Performance Monitoring with PerformanceVSN. NTSMF User’s Group, CMG. David Signori, Product Marketing Manager, Software Solutions, INRANGE Technologies Corporation, 12/9/02.


Presentation Transcript


  1. Effective Strategies for SAN Performance Monitoring with PerformanceVSN
     NTSMF User’s Group, CMG
     David Signori, Product Marketing Manager, Software Solutions, INRANGE Technologies Corporation
     12/9/02

  2. Current Challenges in Storage Networking Administration
     • Planning network requirements for Business Continuance applications.
     • Planning network requirements for the ever-increasing size and complexity of the storage environment.
     • Lowering management cost while increasing storage networking performance.
     • Implementing a Service Provider model consisting of chargeback, reporting, and service level agreements to end users.
     • Eliminating finger pointing among Server, Network, and Database administration groups.
     • Managing heterogeneous environments.
     • Decreasing or eliminating downtime.
     Ultimately, how do I increase and guarantee performance while lowering cost?

  3. Storage Networking Performance Monitoring Solution Requirements
     • Session Layer Traffic Flow Monitoring
     • External to the Storage Networking Equipment
     • Standards-Based Management, Collection, and Reporting Interfaces
     • Simple Plug-and-Play Configuration and Operation
     • Persistence: Permanent Records of Traffic Behavior
     • Flexible Reporting Capabilities
     • Policy Monitoring and Alerting
     • Enhanced Storage Network Security
     • Scalability
     A comprehensive storage networking performance monitoring solution will increase performance and lower cost.

  4. What Is PerformanceVSN? Product Overview
     • Definition
         • INRANGE Storage Networking Performance Monitoring Solution for Capacity Planning and Service Level Management.
     • Components
         • PerformanceVSN Server (Appliance)
         • PerformanceVSN Server Software
         • Optional PerformanceVSN Probe
     • Base Functionality
         • PerformanceVSN Server + Server Software
         • Port-level statistics collection, both real-time and historical
         • Statistics gathered from INRANGE Directors and switches
     • Advanced Functionality
         • PerformanceVSN Server + Server Software + Probe(s)
         • Session-level statistics collection, both real-time and historical
         • Statistics gathered from INRANGE Directors and switches + Probe(s)
     [Figure: PerformanceVSN Server and PerformanceVSN Probe]

  5. Performance Monitoring Requirements: Session Layer Traffic Flow Monitoring
     [Figure: Port vs. Session Layer Statistics. Hosts Server_A through Server_G and storage devices RAID_A, RAID_B, and RAID_C (each with LUNs 1..n), connected across an ISL.]
     Session statistics:
     • Total ISL utilization: 60%
     • Server_A to RAID_B util: 35%
         • Server_A to RAID_B / LUN 3 util: 10%
         • Server_A to RAID_B / LUN 9 util: 15%
         • Server_A to RAID_B / LUN 5 util: 10%
     • Server_B to RAID_C util: 25%
         • Server_B to RAID_C / LUN 2 util: 22%
         • Server_B to RAID_C / LUN 7 util: 3%
     Port statistics:
     • ISL at 60% utilization
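The numbers on this slide are additive: the three LUN sessions from Server_A sum to its 35% session, the two from Server_B sum to 25%, and the two sessions together account for the 60% that a port counter reports for the ISL. A minimal Python sketch of that rollup follows, assuming every listed session crosses the single ISL and that utilization percentages simply add. The names LUN_SESSIONS, rollup_sessions, and isl_utilization are hypothetical and are not part of PerformanceVSN.

```python
from collections import defaultdict

# LUN-level session utilization copied from the slide, as percent of ISL bandwidth.
# Assumption: every listed session crosses the single ISL and values are additive.
LUN_SESSIONS = {
    ("Server_A", "RAID_B", 3): 10.0,
    ("Server_A", "RAID_B", 9): 15.0,
    ("Server_A", "RAID_B", 5): 10.0,
    ("Server_B", "RAID_C", 2): 22.0,
    ("Server_B", "RAID_C", 7): 3.0,
}


def rollup_sessions(lun_sessions):
    """Sum LUN-level utilization into host-to-storage session totals."""
    sessions = defaultdict(float)
    for (host, storage, _lun), util in lun_sessions.items():
        sessions[(host, storage)] += util
    return dict(sessions)


def isl_utilization(sessions):
    """A port-level counter sees only this sum, not the per-session breakdown."""
    return sum(sessions.values())


if __name__ == "__main__":
    sessions = rollup_sessions(LUN_SESSIONS)
    for (host, storage), util in sorted(sessions.items()):
        print(f"{host} to {storage} util: {util:.0f}%")
    print(f"Total ISL utilization: {isl_utilization(sessions):.0f}%")
```

Run as a script, it reproduces the slide's 35%, 25%, and 60% figures. The point of the exercise is the direction of information flow: the 60% total can always be computed from the session records, but the session records can never be recovered from the 60% total.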

  6. Performance Monitoring Requirements: FICON Layer 2 Session Layer Traffic Flow Monitoring
     [Figure: Server_A, Server_B, and Server_C (channels Channel_A1 through Channel_C2) connected to FICON_Storage_A (CU_A1, CU_A2) and FICON_Storage_B (CU_B1, CU_B2).]
     Session statistics:
     • Total CU_B2 utilization: 60%
     • Channel_A1 to CU_B2 util: 35%
     • Channel_B2 to CU_B2 util: 20%
     • Channel_C1 to CU_B2 util: 5%
     Port statistics:
     • CU_B2 at 60% utilization

  7. FICON Cascading: High Integrity Fabric
     [Figure: Server_A through Server_D (channels Channel_A1 through Channel_D2) connected across an ISL to FICON_Storage_A (CU_A1, CU_A2), FICON_Storage_B (CU_B1, CU_B2), and FICON_Storage_C (CU_C1, CU_C2).]
     Session statistics:
     • Total ISL utilization: 60%
     • Channel_D1 to CU_B2 util: 35%
     • Channel_A2 to CU_C1 util: 20%
     • Channel_C1 to CU_C2 util: 5%
     Port statistics:
     • ISL at 60% utilization

  8. Performance Monitoring: FICON ULP Session Layer Traffic Flow Monitoring
     [Figure: Server_A, Server_B, and Server_C connected to FICON_Storage_A (CU_A1, CU_A2) and FICON_Storage_B (CU_B1, CU_B2). CU_B2 is divided into logical control units CUADD_B2A (Device_B2A1, Device_B2A2) and CUADD_B2B (Device_B2B1, Device_B2B2, Device_B2B3). LPARs LPAR_A1A and LPAR_A1B are also shown.]
     Session statistics:
     • Total CU_B2 utilization: 60%
     • Channel_A1 to CU_B2 util: 35%
         • Channel_A1 to CUADD_B2B util: 20%
             • Channel_A1 to Device_B2B1 util: 15%
             • Channel_A1 to Device_B2B3 util: 5%
         • Channel_A1 to CUADD_B2A util: 15%
             • Channel_A1 to Device_B2A1 util: 10%
             • Channel_A1 to Device_B2A2 util: 5%
     • Channel_B2 to CU_B2 util: 20%
     • Channel_C1 to CU_B2 util: 5%
     Port statistics:
     • CU_B2 at 60% utilization
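At the FICON ULP level the same additive idea applies one level deeper: device sessions roll up to logical control units (CUADD), CUADDs roll up to the channel session, and channel sessions roll up to the 60% seen on the CU_B2 port. The sketch below models that hierarchy as a nested dictionary and recomputes every level from the leaves. It assumes the slide's percentages are additive and treats Device_B2B2, which appears in the diagram but not in the statistics, as carrying no traffic. CU_B2_TREE, total, and report are hypothetical names for illustration.

```python
# Nested dict: an inner dict is a breakdown, a number is a leaf measurement (percent).
# Values are copied from the slide; Device_B2B2 is omitted because no figure is given.
CU_B2_TREE = {
    "Channel_A1": {
        "CUADD_B2B": {"Device_B2B1": 15.0, "Device_B2B3": 5.0},
        "CUADD_B2A": {"Device_B2A1": 10.0, "Device_B2A2": 5.0},
    },
    "Channel_B2": 20.0,
    "Channel_C1": 5.0,
}


def total(node):
    """Recursively sum leaf utilization under a node."""
    if isinstance(node, dict):
        return sum(total(child) for child in node.values())
    return float(node)


def report(node, name, depth=0, lines=None):
    """Return indented 'name: N%' lines, one per level of the hierarchy."""
    if lines is None:
        lines = []
    lines.append(f"{'  ' * depth}{name}: {total(node):.0f}%")
    if isinstance(node, dict):
        for child_name, child in node.items():
            report(child, child_name, depth + 1, lines)
    return lines


if __name__ == "__main__":
    print("\n".join(report(CU_B2_TREE, "CU_B2")))
```

Because each total is derived from the leaves, a report built this way cannot show a channel figure that disagrees with its CUADD and device figures.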

  9. Session Layer Reporting Examples
     • Real-time summary of the selected LUNs in SCSI Read Mbytes/Sec currently being accessed by all hosts.
     • Note that this is a system-wide report across all servers on the network.

  10. Session Layer Reporting Examples
     • Real-time summary of the Top 5 LUNs in Total Mbytes/Sec currently being accessed by host “Server_A”.
     • Note LUNs 9, 5, 7, 6, and 8 on storage device “RAID_A”.
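A Top-N report like this one is a derived view: collect one metric per LUN, then keep the N largest. The sketch below uses Python's heapq.nlargest for that step. The per-LUN values in SAMPLE_MBYTES_PER_SEC are invented solely to reproduce the slide's ordering (LUNs 9, 5, 7, 6, 8); they are not measurements, and top_n is a hypothetical helper, not a PerformanceVSN API.

```python
import heapq

# Hypothetical Total Mbytes/Sec per RAID_A LUN for host Server_A.
# Invented values chosen only to match the slide's ordering; not real data.
SAMPLE_MBYTES_PER_SEC = {1: 2.1, 3: 5.5, 5: 31.5, 6: 12.0, 7: 18.2, 8: 9.7, 9: 42.0, 10: 0.4}


def top_n(metric_by_lun, n=5):
    """Return the n (LUN, value) pairs with the largest values, highest first."""
    return heapq.nlargest(n, metric_by_lun.items(), key=lambda item: item[1])


if __name__ == "__main__":
    for lun, mbps in top_n(SAMPLE_MBYTES_PER_SEC):
        print(f"RAID_A LUN {lun}: {mbps:.1f} Mbytes/Sec")
```

heapq.nlargest avoids sorting the full LUN list, which matters little for a handful of LUNs but scales better when a report ranks thousands of sessions.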

  11. Session Layer Reporting Examples
     • Real-time summary of the Top 5 LUNs in Read Duration for host “Server_A”.
     • Note that Read Duration is a measure of latency; the report shows the 5 LUNs with the highest latency on the network.

  12. Session Layer Reporting Examples
     • Trend of SCSI Exchanges/Sec between host “Server_A” and storage device “RAID_A” over the past 2 hours.

  13. Performance Monitoring Requirements: External to Storage Networking Devices
     • Resources in network devices should be dedicated to distributing and handling incoming and outgoing data streams.
     • Many potential problems at the framing and upper layers are not reported.
     • Although external, the probe should be non-intrusive.
     [Figure: Storage performance monitoring probe placed between servers and remote storage on metro and WAN disk-mirroring links.]

  14. Performance Monitoring Requirements: Standards-Based
     [Figure: Standards-based reporting, management, and collection layers.]
     • Reporting and management: SAN Management, Data Management, Virtualization, SRM, Enterprise Management; Java GUI, spreadsheets, SAS, home-grown tools
     • Reporting interfaces: SNMP, CIM/XML, CSV, SQL, HTTP
     • Performance Monitoring Platform
     • Collection interfaces: TCP/IP, SNMP, Fibre Alliance MIB
     • 3rd-party devices: routers/channel extension, switches/directors, probes
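CSV is listed as one of the reporting interfaces, which is what lets spreadsheets, SAS, and home-grown tools consume the data without a vendor-specific client. The sketch below serializes per-port samples with Python's standard csv module. The column names in FIELDS and the helper samples_to_csv are hypothetical; the slide names CSV as an interface but does not define a schema.

```python
import csv
import io

# Hypothetical column names; the slide names CSV as an interface but not a schema.
FIELDS = ["timestamp", "port", "mbytes_in_per_sec", "mbytes_out_per_sec"]


def samples_to_csv(samples):
    """Serialize a list of per-port sample dicts to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(samples)
    return buf.getvalue()


if __name__ == "__main__":
    rows = [
        {"timestamp": "2002-12-09T18:30:00", "port": "Port_1",
         "mbytes_in_per_sec": 12.5, "mbytes_out_per_sec": 3.0},
    ]
    print(samples_to_csv(rows), end="")
```

csv.DictWriter raises ValueError if a row contains a key not in FIELDS, which catches schema drift at export time rather than in the consuming spreadsheet.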

  15. Performance Monitoring Requirements: Standards-Based
     • Should Support Heterogeneous Environments:
         • Multi-Vendor Equipment
         • FICON, FCP, IP, and VI
         • Fibre Channel and WAN
     • Should Support Standalone Deployment or Deployment as a Plug-In to the Chosen SAN Management Application
         • Adds value to chosen storage management applications
     • Should Function as a Plug-In to the Chosen Enterprise Management System
     • Should Leverage Performance Monitoring Capabilities in Existing Equipment: Metrics and Access
     • Service Provider-Type Reporting

  16. Performance Monitoring Requirements: Simple Plug-and-Play Configuration and Operation
     • Should Support Topology Rollup and Automatic Discovery of Ports, Devices, and LUNs
     • Session and SCSI layer monitoring should be reported using human-readable logical port and device names
     • Permanent statistics logging should start automatically and have easily configurable sampling periods
     • Should Support a Dashboard for Quick Health Assessment
     • Should Support Open Systems Management for Remote and Desktop Access

  17. Performance Monitoring Requirements: Persistence (Permanent Records of Traffic Behavior)
     • Should support user-configurable historic sampling intervals
     • Should support user-configurable rollup periods and retention times for efficient database usage
     • Should support archival and export of the database for long-term capacity planning
     • Persistent statistical storage enables capacity planning and troubleshooting of problems that occurred in the past
     • Should support historical trend reports for capacity planning and performance tuning
     • Should support historical summaries for Service Provider-Type Reporting
     • Should support bookmarks and pre-configured time durations for frequently viewed reports and Service Provider-Type Reporting
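Rollup periods and retention times are how a persistent store stays small: fine-grained samples are averaged into coarser buckets, and rows older than the retention window are discarded. The sketch below shows one plausible way to do both with the standard datetime module, assuming rollups are simple averages over fixed-length buckets aligned to the Unix epoch. roll_up, apply_retention, and bucket_start are hypothetical helpers; the slide does not describe PerformanceVSN's actual rollup rule.

```python
from collections import defaultdict
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # bucket boundaries are aligned to this instant (assumption)


def bucket_start(ts, period):
    """Start of the fixed-length bucket that contains timestamp ts."""
    return ts - (ts - EPOCH) % period


def roll_up(samples, period):
    """Average (timestamp, value) samples into buckets of length `period`."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, value in samples:
        start = bucket_start(ts, period)
        sums[start] += value
        counts[start] += 1
    return sorted((start, sums[start] / counts[start]) for start in sums)


def apply_retention(rows, now, retention):
    """Keep only rows whose timestamp falls within the retention window."""
    cutoff = now - retention
    return [(ts, value) for ts, value in rows if ts >= cutoff]
```

A typical arrangement, offered only as an example, keeps raw samples for a few days, hourly rollups for months, and daily rollups for long-term capacity planning, so that each report reads from the coarsest table that still answers its question.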

  18. Performance Monitoring Requirements: Persistence (Permanent Records of Traffic Behavior) Examples
     • Trend of Total Mbytes/Sec In and Out for a selected port over the past 2 hours.
     • Note that the report was requested at 18:30 and displays stored historical data; it is not a trace that began at 16:30.

  19. Performance Monitoring Requirements: Persistence (Permanent Records of Traffic Behavior) Examples
     • Trend of Total Mbytes/Sec In and Out for a selected port over the past 8 hours.
     • Note that, in addition to customized time periods, pre-configured time periods such as Today, Yesterday, Current Week, and Last Month should be available.
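Pre-configured periods such as Today, Yesterday, Current Week, and Last Month are just named rules for turning "now" into a query window against the stored history. The sketch below computes those windows as half-open (start, end) pairs, assuming weeks start on Monday and that the report runs in local time. preset_range is a hypothetical helper, not a product API.

```python
from datetime import datetime, timedelta


def preset_range(name, now):
    """Return (start, end) for a named period; start is inclusive, end is exclusive.

    Assumption: weeks start on Monday (datetime.weekday() == 0).
    """
    today = datetime(now.year, now.month, now.day)
    if name == "Today":
        return today, today + timedelta(days=1)
    if name == "Yesterday":
        return today - timedelta(days=1), today
    if name == "Current Week":
        start = today - timedelta(days=today.weekday())
        return start, start + timedelta(days=7)
    if name == "Last Month":
        first_of_this_month = today.replace(day=1)
        last_day_of_prev = first_of_this_month - timedelta(days=1)
        return last_day_of_prev.replace(day=1), first_of_this_month
    raise ValueError(f"unknown period: {name}")
```

Half-open windows let consecutive periods abut without double-counting a sample that lands exactly on midnight. Run in June 2002, "Last Month" yields the May 2002 window used in the monthly Top 5 LUN summary.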

  20. Performance Monitoring Requirements: Persistence (Permanent Records of Traffic Behavior) Examples
     • Trend of SCSI Exchanges/Sec between host “Server_A” and storage device “RAID_A” over the past 2 hours.

  21. Performance Monitoring Requirements: Persistence (Permanent Records of Traffic Behavior) Examples
     • Summary of the Top 5 LUNs in Total Mbytes/Sec accessed by host “Server_A” for the month of May 2002.
     • Note LUNs 9, 5, 7, 6, and 8 on storage device “RAID_A”.

  22. Performance Monitoring Requirements: Flexible Reporting Capabilities
     • Should Support Real-Time Monitoring
     • Should Support Collection of Hundreds of Metrics, Including Diagnostics
     • Should Include Value-Added Derived Reports such as TopN, Rates, and Multiple Devices and Statistics in a Single Report
     • Should Support Configurable Sampling Intervals
     • Should Support Bookmarks to Easily Return to Frequently Viewed Reports
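Rates are a typical derived report because devices usually expose cumulative counters (frames, octets, errors) rather than per-second values. A rate is the counter delta divided by the sampling interval, and SNMP Counter32 objects wrap back to zero after 2^32 - 1, so the delta must be taken modulo 2^32. The sketch below assumes at most one wrap per sampling interval; counter_rate is a hypothetical helper.

```python
COUNTER32_MODULUS = 2**32  # SNMP Counter32 values wrap to 0 after 2**32 - 1


def counter_rate(prev, curr, interval_s, modulus=COUNTER32_MODULUS):
    """Per-second rate between two samples of a monotonically increasing counter.

    The modular difference gives the correct delta even if the counter wrapped,
    provided it wrapped at most once during the interval (assumption).
    """
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    delta = (curr - prev) % modulus
    return delta / interval_s
```

For example, a frame counter that moves from 1,000 to 61,000 over a 60-second sampling interval yields 1,000 frames/sec. Shorter sampling intervals reduce the risk of multiple wraps on fast links, which is one reason configurable intervals matter.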

  23. Performance Monitoring Requirements: Flexible Reporting Capabilities
     Hundreds of metrics, for example:
     • Utilization:
         • Frames (In/Out)
         • FC-2 MB/Sec (In, Out)
         • FC-4 MB/Sec (In, Out by ULP: SCSI, IP, VI, FICON, and others)
         • Errors MB/Sec (In, Out)
         • SCSI IO/Sec (Read, Write, Other)
         • SCSI Read (avg, min, max, read percentage)
         • SCSI Write (avg, min, max, write percentage)
         • SCSI Other (other percentage)
         • SCSI Read/Write Payload Size Ranges (percentage)
     • Throughput Errors:
         • Busy Frames
         • Rejected Frames
         • Link Failures
         • Aborts
         • Primitive Seq Protocol Errors
         • Invalid Tx Words
         • Delimiter Errors
         • Discarded Frames
         • BSYs and RJTs (Port, Fabric)
         • CRC Errors
     • Availability:
         • Link Resets (In/Out)
         • OLS (In/Out)
         • LOGIs (Port, Fabric)
         • %Available
     • Link Integrity:
         • Sync Loss
         • Sig Loss
     • Capacity:
         • %capacity for all frames
         • FC-4 %capacity (SCSI, IP, VI, FICON, other)
         • %capacity link control
         • %capacity link services
     • Latency:
         • SCSI Read/Write Duration (ms)
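The Capacity metrics express measured throughput as a share of what the link can carry, either for all frames or broken out per FC-4 protocol. The sketch below assumes round nominal per-direction data rates of 100 MB/s for 1 Gbit/s Fibre Channel and 200 MB/s for 2 Gbit/s; these figures and the helper names percent_capacity and ulp_capacity are illustrative assumptions, not the product's definitions.

```python
# Assumed nominal per-direction data rates in MB/s; adjust to the link in question.
NOMINAL_MBYTES_PER_SEC = {"1GFC": 100.0, "2GFC": 200.0}


def percent_capacity(mbytes_per_sec, link_speed):
    """Throughput as a percentage of the link's assumed nominal data rate."""
    return 100.0 * mbytes_per_sec / NOMINAL_MBYTES_PER_SEC[link_speed]


def ulp_capacity(mbytes_by_ulp, link_speed):
    """FC-4 %capacity: the same calculation broken out per upper-layer protocol."""
    return {ulp: percent_capacity(mb, link_speed) for ulp, mb in mbytes_by_ulp.items()}
```

With these assumptions, 60 MB/s on a 1 Gbit/s link is 60% of capacity, while the same 60 MB/s on a 2 Gbit/s link is only 30%, which is why raw MB/sec alone does not show how close a port is to saturation.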

  24. Performance Monitoring Requirements: Flexible Reporting Capabilities Examples
     • Real-time summary of Total Mbytes/Sec for 24 selected ports.
     • Note that multiple ports across multiple switches can be added to a single report.
     • Note that the report is accessed using a bookmark.

  25. Performance Monitoring Requirements: Flexible Reporting Capabilities Examples
     • Real-time summary of percent read exchange size to storage device “RAID_A” from all hosts on the network.
     • The real-time sampling interval can be modified.
     • The report can be toggled to a trend view by selecting a toolbar button.
     • Multiple metrics appear in a single report.

  26. Performance Monitoring Requirements: Policy Monitoring and Alerting
     • Should support proactive troubleshooting to eliminate or decrease downtime
     • Should support open real-time alerting (e.g., SNMP, email)
     • Should support multiple levels of thresholds
     • Should support pre-defined threshold definitions for quick and easy configuration
     • Thresholds should be supported on all collected metrics, including errors, type of traffic, and size of traffic, and on all objects, including ports, devices, and logical units
     • Ideal for a Service Provider model, since the administrator learns of potential problems before the end user does
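Multiple threshold levels mean each metric is classified into a severity rather than just pass or fail, and a pre-defined set gives a working policy before any tuning. The sketch below evaluates samples keyed by (object, metric) against two-level thresholds and returns the alerts that would be sent. The metric names and limits in PREDEFINED are hypothetical, and delivery via SNMP trap or email is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    warning: float
    critical: float


# Hypothetical pre-defined threshold set; metric names and limits are illustrative only.
PREDEFINED = {
    "util_pct": Threshold(warning=70.0, critical=90.0),
    "crc_errors_per_sec": Threshold(warning=1.0, critical=10.0),
}


def level(threshold, value):
    """Classify a value against a two-level threshold."""
    if value >= threshold.critical:
        return "critical"
    if value >= threshold.warning:
        return "warning"
    return "ok"


def check(sample, thresholds=PREDEFINED):
    """sample maps (object, metric) -> value; return alerts above the 'ok' level."""
    alerts = []
    for (obj, metric), value in sample.items():
        threshold = thresholds.get(metric)
        if threshold is None:
            continue  # no policy defined for this metric
        severity = level(threshold, value)
        if severity != "ok":
            alerts.append((severity, obj, metric, value))
    return alerts
```

Keying thresholds by metric and samples by (object, metric) lets the same policy apply to ports, devices, and logical units alike, in line with the requirement that thresholds cover all objects.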

  27. Performance Monitoring Requirements: Enhanced Security Policies
     • Role-Based Security
     • Event Logging
     • Security Policy Monitoring: Alerting on Unauthorized Host-to-LUN Access
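Because session layer monitoring already records which host talks to which storage device and LUN, alerting on unauthorized access reduces to comparing the observed sessions with an allowed map. The sketch below assumes the policy is expressed as a per-host set of (storage device, LUN) pairs; ALLOWED and unauthorized_sessions are hypothetical, and how such a policy would be loaded from zoning or LUN-masking configuration is not covered.

```python
# Hypothetical authorization policy: host -> set of (storage device, LUN) it may access.
ALLOWED = {
    "Server_A": {("RAID_A", 5), ("RAID_A", 9)},
    "Server_B": {("RAID_C", 2)},
}


def unauthorized_sessions(observed, allowed=ALLOWED):
    """Return observed (host, storage, lun) sessions not permitted by the policy."""
    violations = []
    for host, storage, lun in observed:
        if (storage, lun) not in allowed.get(host, set()):
            violations.append((host, storage, lun))
    return violations
```

Hosts missing from the policy are treated as allowed nothing, so a newly attached, unknown server is flagged on its first session rather than silently permitted.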

  28. Performance Monitoring Requirements: Scalability
     • Should Support a Combination of Software and Hardware to Suit Your Needs
     • Should Support an Inexpensive Entry Point That Is Easily Expandable as Your Network Grows
     • Should Support a Roadmap Around Future Storage Networking Requirements (e.g., 10G, FC-IP, iSCSI, InfiniBand)
     • Should Be Data Center Ready (e.g., multiple interfaces in a single enclosure, rack-mountable)

  29. Performance Monitoring Life-Cycle: Putting It All Together
     • Performance Profiling:
         • Record and monitor current network performance levels.
     • Performance Thresholding:
         • Set thresholds based on profiles for real-time alerting to throughput and availability problems.
     • Performance Tuning:
         • Adjust traffic flows based on profiles for better network performance without spending on more resources.
     • Capacity Planning:
         • Know exactly when and how many more resources are needed, without overspending.
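"Set thresholds based on profiles" links the first two life-cycle steps: the recorded profile tells you what normal looks like, and the thresholds mark departures from it. One simple way to do that, chosen here as an assumption rather than taken from the slides, is to use high percentiles of the profiled history as warning and critical limits. The sketch uses statistics.quantiles (Python 3.8 or later, at least two samples); thresholds_from_profile is a hypothetical helper.

```python
import statistics


def thresholds_from_profile(history, warn_pct=90, crit_pct=99):
    """Derive (warning, critical) limits from a recorded utilization profile.

    Assumption: limits are the warn_pct-th and crit_pct-th percentiles of history.
    statistics.quantiles(n=100) returns 99 cut points; cut k-1 is the k-th percentile.
    """
    if not 1 <= warn_pct <= crit_pct <= 99:
        raise ValueError("need 1 <= warn_pct <= crit_pct <= 99")
    cuts = statistics.quantiles(history, n=100, method="inclusive")
    return cuts[warn_pct - 1], cuts[crit_pct - 1]
```

Percentiles adapt to each link's own baseline, so a mirroring link that normally runs hot and a lightly used ISL each get limits that reflect their history instead of one fixed number.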

  30. Case Study and ROI: Large Financial Brokerage, Metro Area Disk Mirroring
     [Figure: Performance monitoring platform deployed on a metro disk-mirroring network in which servers, local storage, and remote storage are connected by FICON and FCP links over DWDM.]

  31. Case Study and ROI: Performance Profiling
     • MAN extender usage across a selected week. Note the spikes in traffic.

  32. Case Study and ROI: Performance Profiling
     • Drilling into MAN extender usage for a specific day. Note the spike in traffic between noon and 1 PM.

  33. Case Study and ROI: Performance Tuning
     • Drilling into storage port usage identifies the offending storage device.

  34. Case Study and ROI: Large Financial Brokerage, Metro Area Disk Mirroring
     • Given:
         • A DWDM channel costs $16k/month.
         • The customer was considering going to 4 channels per fabric but determined that, for the time being, 3 per fabric were adequate.
     • Result:
         • ROI was achieved in less than 2 months for this particular solution.
     • Additional Benefits:
         • Capacity Planning: visibility into utilization trends determines exactly when additional channels will be needed.
         • Performance Tuning: visibility into the offending storage device provides load-balancing feedback for re-mapping devices to less-utilized links, thus optimizing channels.
         • Standards-Based: provides seamless visibility into the FICON portion of the fabric as well.
         • Real-Time Monitoring: reports on errors for troubleshooting and diagnostics.
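The ROI claim follows from simple arithmetic: staying at 3 channels instead of 4 avoids one $16k/month DWDM channel per fabric, and payback time is the solution's cost divided by that monthly saving. The slide gives neither the number of fabrics nor the solution's price, so both are parameters here; the function names are hypothetical. All that "less than 2 months" implies is that the solution cost less than twice the monthly saving.

```python
DWDM_CHANNEL_COST_PER_MONTH = 16_000  # from the slide: $16k per DWDM channel per month


def monthly_savings(channels_avoided_per_fabric, fabrics,
                    cost_per_channel=DWDM_CHANNEL_COST_PER_MONTH):
    """Recurring channel charges avoided by not adding DWDM channels."""
    return channels_avoided_per_fabric * fabrics * cost_per_channel


def payback_months(solution_cost, savings_per_month):
    """Months until avoided channel charges cover the monitoring solution's cost."""
    return solution_cost / savings_per_month


def max_cost_for_payback(months, savings_per_month):
    """Largest solution cost that still pays back within `months`."""
    return months * savings_per_month
```

With a hypothetical two fabrics, one avoided channel per fabric saves $32,000 per month, so a payback under 2 months bounds the solution cost below $64,000. With one fabric the bound would be $32,000.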

  35. Performance Monitoring Solutions to Current Challenges
     • Planning network requirements for Business Continuance applications:
         • Answers the question of how many MAN extender links you need.
         • Answers the question of how much WAN extender bandwidth you need.
         • Traces spikes on MAN/WAN extender links back to the device and volume that caused them.
         • Enables you to know when you will need more bandwidth.
         • Reports on latency.
     • Planning network requirements for the ever-increasing size and complexity of the storage environment:
         • Answers the question of how many ISLs you need.
         • Answers the question of what the optimum server-to-storage ratio is.
         • Enables you to know when you will need more ports.
         • Traces spikes on ISLs and storage ports back to the device and volume (LUNs) that caused them.

  36. Performance Monitoring Solutions to Current Challenges
     Challenges:
     • Lowering management cost while increasing storage networking performance.
     • Implementing a Service Provider model consisting of chargeback, reporting, and service level agreements to end users.
     • Eliminating finger pointing among Server, Network, and Database administration groups.
     Solutions:
     • Reports, both real-time and historical, are only a mouse click away, with no need for tedious spreadsheet crunching.
     • Command-line launch and open APIs allow seamless integration with 3rd-party storage management applications.
     • Because session layer monitoring correlates usage and errors to the individual server, storage device, and volume (LUN), accountability can be maintained at the department level.
     • Session layer response time metrics allow you to distinguish among network, server, and storage device latency.

  37. Performance Monitoring Solutions to Current Challenges
     Challenges:
     • Managing heterogeneous environments.
     • Decreasing or eliminating downtime with proactive policy-based monitoring.
     Solutions:
     • Because the solution is external to networking devices and uses standard collection interfaces, it is independent of fabric vendor and ULP and can extend to the WAN.
     • Real-time and SNMP alerts on user-defined thresholds: you profile the network and define expected behavior, and the solution provides real-time notification of policy violations.
     • Combines the best of both worlds:
         • A level of visibility on par with expensive diagnostic tools.
         • The ease of use and capacity planning of an enterprise service level management application.

  38. Advanced Performance Monitoring Solutions
     • Capacity Planning/Modeling: Planning for network usage by resources not yet added. For example, a new department with 10 clients will access application X on Server A, which already has 100 clients. Throughput from Server A to which disks will increase by 10%? ROI potential: if you under-use ISLs, you are over-spending.
     • Service Duplication/Modeling: Planning for WAN usage by an application not yet added. For example, the WAN will support disk mirroring. How much bandwidth is needed to adequately support write I/O to particular disks or volumes? ROI potential: if you under-use WAN links, you are over-spending.
     • Performance Tuning: An application/server consolidation example. Applications needing access to much of the same data are candidates to run on the same server or in the same cluster. ROI potential: if you under-use servers, you are over-spending.
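The capacity-planning example rests on one modeling assumption: if each client generates roughly the same load, adding 10 clients to 100 raises Server A's throughput by 10%, and every path that carries Server A's traffic grows by the same factor. The sketch below applies that linear model to a measured utilization and flags paths that would cross a planning limit. The 80% limit and the function names are hypothetical choices for illustration.

```python
def projected_utilization(current_util_pct, current_clients, added_clients):
    """Scale utilization linearly with client count.

    Assumption: every client adds the same load, so growth is
    (current + added) / current. Multiplying first keeps round numbers exact.
    """
    if current_clients <= 0:
        raise ValueError("current_clients must be positive")
    return current_util_pct * (current_clients + added_clients) / current_clients


def needs_more_capacity(projected_pct, limit_pct=80.0):
    """Flag a path whose projected utilization exceeds a (hypothetical) planning limit."""
    return projected_pct > limit_pct
```

Applied to session data, a path from Server A measured at 60% would be projected at 66% and stay under the example limit, while one at 75% would reach 82.5% and be flagged. The session layer breakdown is what tells you which disks, and therefore which ISLs, carry Server A's traffic in the first place.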

  39. Advanced Performance Monitoring Solutions
     • Performance Tuning: Save cost by separating the types of transactions on the network. For instance, separating transaction-intensive (I/O) and data-intensive operations allows more transactions ($) and deeper data mining.
     • Add value to storage management applications: for example, a performance monitoring application feeds a data backup/replication application so that the backup time period is automatically selected and optimized.
     • Performance Management: Automate actions based on detected conditions. Example: a feedback loop to switching devices for intelligent routing decisions.
     • Life-Cycle Data Monitoring: Based on the level of access over the network, determine the appropriate storage type for particular data or applications. Provides feedback for HSM.

  40. Questions? Or, for a copy of the presentation: David.Signori@Inrange.com, 703-442-3284
