Caché Performance Troubleshooting Part II The System

Vik Nagjee, Product Manager, Kernel Technologies

System Performance: Limiting Factors
• CPU
• Memory
• I/O (Disk, Network)

System: system-wide metrics
• I/O (Disk, Network): Latency and/or Queuing
  • Unix: avwait + avserv / queue
  • Windows: seconds/Read, Current Disk Queue
  • OpenVMS: [MON.DISK] RespT, QLen
• CPU: CPU Utilization
  • Unix: %USER + %SYS
  • Windows: %PROCESSOR
  • OpenVMS: [MON.SYS] CPU Busy
• Memory: Available Memory
  • Unix: Freemem, pi/po
  • Windows: Available Memory, Page Reads
  • OpenVMS: [MON.PAGE]

Caché: system-wide metrics
• Resources: CPU, Memory, Non-DB I/O (Files, Network), DB I/O (CACHE.DAT, WIJ, Journals)
• Metrics: Physical Block Reads, Routine Commands, Global References, Physical Block Writes, Journal Writes
• Time-to-run or other Application Specific Metrics

Significance: Caché system-wide metrics
• What are your users experiencing?
  • Time-to-run and other Application Specific Metrics
• CPU: How busy is your database?
  • Routine Commands
  • Global References
• Memory: How well is your application using database cache?
  • Cache Efficiency (Physical Block Reads, Global References)
• DB I/O (CACHE.DAT, WIJ, Journals): How well is your disk system responding?
  • Physical Block Reads
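The slide pairs Cache Efficiency with Global References and Physical Block Reads without giving a formula. A minimal sketch, assuming the figure is the ratio of global references to physical block reads over the same sampling interval (the function name and the sample counts are illustrative, not taken from the presentation):

```python
def cache_efficiency(global_refs: float, phys_block_reads: float) -> float:
    """Global references per physical block read over one interval.

    Assumption: efficiency = global references / physical block reads.
    A higher value means more references were satisfied from the
    database cache rather than from disk.
    """
    if phys_block_reads <= 0:
        # No physical reads in the interval: everything came from cache.
        return float("inf")
    return global_refs / phys_block_reads


# Hypothetical interval: 1,200,000 global references, 3,000 block reads.
print(cache_efficiency(1_200_000, 3_000))
```

The absolute number matters less than how it moves against your own baseline for a comparable load.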

Collecting: system-level metrics
• System-wide network management software: OpenView, Tivoli, BMC, OpenNMS, Nagios, PRTG Traffic Monitor, etc.
• Unix: sar | glance | nmon, iostat | vmstat, top | topas
• OpenVMS: PERFDAT, T4, MONITOR
• Windows: Resource and Performance Monitor, logman, Process Explorer

Collecting: system-wide Caché metrics
• CPU: Routine Commands, Global References
• DB I/O (CACHE.DAT, WIJ, Journals): Physical Block Reads
• Memory: Cache Efficiency
• Time-to-run and other Application Specific Metrics

Collecting Caché metrics: GLOSTAT • %SYS>DO ^GLOSTAT

Collecting Caché metrics: ^pButtons
• %SYS>DO ^pButtons
• Installed in %SYS since 2008.2, but the latest version (currently 1.15c) is available at ftp://ftp.intersystems.com/pub/performance/
• Can be automated via TASKMGR
• Low overhead – logging data that’s already available.
• Documented in the Caché Monitoring Guide

The performance “button” report (^pButtons)

Notes on using ^pButtons
• Profiles are configurable:
  • Create custom duration and interval combinations
  • Add or delete from the OS level metric collection
• Collect the logs into one easy-to-use .html file:
  %SYS>DO Collect^pButtons
• Preview a currently running profile’s data:
  %SYS>DO Preview^pButtons(runid)
  • Available at any point while profile is running.
  • May result in some truncated data.

Collecting Caché metrics: Monitors
• Caché History Monitor – SYS.History
  • Collect Caché metrics and user-defined metrics over time
  • Stored in your Caché database
  • Query or export the data using a variety of methods
• Caché System Monitor – %Monitor.Health
  • Monitor the system health of your database
  • Alerts on abnormal metrics based on configurable criteria
  • Alerts from the System Monitor appear in cconsole.log:
    04/01/13-13:55:55:847 (13897) 1 [SYSTEM MONITOR] CPUusage Warning: CPUusage = 82 ( Warnvalue is 75)....(repeated 1 times)
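If you want to feed System Monitor alerts into your own tooling, the cconsole.log line shown on this slide can be picked apart with a regular expression. This is a rough sketch built only from that one sample line; the field names (`pid`, `severity`, `sensor`, and so on) are labels chosen here for illustration, and other alert formats may not match the pattern.

```python
import re

# Pattern derived from the single sample line on the slide (an assumption:
# other System Monitor messages may be laid out differently).
ALERT_RE = re.compile(
    r"^(?P<timestamp>\S+)\s+\((?P<pid>\d+)\)\s+(?P<severity>\d)\s+"
    r"\[SYSTEM MONITOR\]\s+(?P<sensor>\w+)\s+(?P<level>\w+):\s+"
    r"\w+\s*=\s*(?P<value>[\d.]+)\s*\(\s*\w+\s+is\s+(?P<threshold>[\d.]+)\s*\)"
)


def parse_alert(line: str):
    """Return a dict of alert fields, or None if the line does not match."""
    match = ALERT_RE.search(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["pid"] = int(fields["pid"])
    fields["severity"] = int(fields["severity"])
    fields["value"] = float(fields["value"])
    fields["threshold"] = float(fields["threshold"])
    return fields


sample = (
    "04/01/13-13:55:55:847 (13897) 1 [SYSTEM MONITOR] CPUusage Warning: "
    "CPUusage = 82 ( Warnvalue is 75)....(repeated 1 times)"
)
alert = parse_alert(sample)
print(alert["sensor"], alert["level"], alert["value"], alert["threshold"])
```

Lines that are not System Monitor alerts return `None`, so the function can be applied to every line of the log.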

Collecting Caché metrics: SNMP/WMI
• SNMP, WMI, WSMON
• Documented in the Caché Monitoring Guide
• Caché metrics are exposed via SNMP, WMI, or Web services
  • NOTE: Future focus is on SNMP
• Add CUSTOM application-specific metrics to be exposed
• Use your EXISTING network management infrastructure to collect and alert on Caché metrics, your application metrics, and operating system metrics

System-level clues to performance issues
• CPU
  • Lack of processing cycles (0% CPU Idle)
  • Blocked processes (run queue or device queuing)
• Disk
  • Abnormal disk I/O rate
  • Queuing on devices
  • Higher than normal latency on busy disk
• Memory
  • Lack of free memory
  • Hard page faults

Caché-level clues to performance issues
• GloRefs and/or RouCmds
  • Higher than normal?
    • Your app will be using more CPU…
    • Are there extraneous processes or more users?
  • Lower than normal?
    • Your app may be struggling with another problem (slow disk)
    • Concurrency issues
    • Blocked users upstream on the network

Caché-level clues to performance issues
• PhysBlkRds
  • Higher than normal?
    • Cache size doesn’t match current load
    • Use of CACHETEMP is forcing more disk reads for other data
  • Lower than normal?
    • Maybe that’s OK
    • App is struggling elsewhere, such as lack of CPU cycles
    • If coupled with abnormally low GloRefs, maybe a disk latency issue
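The "higher or lower than normal" questions on the two slides above can be sketched as a comparison against a baseline you have already captured. The 20% tolerance, the function names, and the wording of the hints are assumptions made for illustration; the hints themselves paraphrase the slides and are starting points for investigation, not diagnoses.

```python
def compare(current: float, baseline: float, tolerance: float = 0.2) -> str:
    """Classify a metric as 'high', 'low', or 'normal' versus its baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    ratio = current / baseline
    if ratio > 1 + tolerance:
        return "high"
    if ratio < 1 - tolerance:
        return "low"
    return "normal"


def cache_clues(glorefs, phys_blk_rds, base_glorefs, base_phys_blk_rds,
                tolerance=0.2):
    """Return the slide's hints that apply to the current sample."""
    g = compare(glorefs, base_glorefs, tolerance)
    p = compare(phys_blk_rds, base_phys_blk_rds, tolerance)
    hints = []
    if g == "high":
        hints.append("GloRefs high: expect more CPU use; "
                     "check for extraneous processes or more users")
    elif g == "low":
        hints.append("GloRefs low: app may be held up elsewhere "
                     "(slow disk, concurrency, blocked users upstream)")
    if p == "high":
        hints.append("PhysBlkRds high: cache size may not match load, "
                     "or CACHETEMP use is forcing extra disk reads")
    elif p == "low" and g == "low":
        hints.append("PhysBlkRds and GloRefs both low: "
                     "possible disk latency issue")
    elif p == "low":
        hints.append("PhysBlkRds low: may be fine, "
                     "or the app is short of CPU cycles")
    return hints


# Hypothetical sample: both metrics at half of their baseline.
for hint in cache_clues(500, 50, 1000, 100):
    print(hint)
```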

Application clues!
• All the above coupled with application-level clues lead to solutions:
  • Are users complaining?
  • Is the rate of application activity the same?
  • Are batch jobs, print jobs, and screen refreshes completing in a timely manner?
  • Are your interfaces queuing?

Comparing metrics – Load measure

Comparing metrics – add App Metric
Chart annotations: 0.7, 0.8, 0.8, 0.9, and 0.8 per minute per user
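The per-minute, per-user figures on this slide normalize an application metric by load so that intervals with different user counts can be compared. A minimal sketch of that normalization, assuming you count some application event (the event counts and user numbers below are invented for illustration):

```python
def rate_per_minute_per_user(events: int, minutes: float, users: int) -> float:
    """Application events per minute per active user for one interval."""
    if minutes <= 0 or users <= 0:
        raise ValueError("minutes and users must be positive")
    return events / (minutes * users)


# Hypothetical intervals: (events, minutes, active users).
intervals = [(420, 10, 60), (640, 10, 80), (900, 10, 100)]
for events, minutes, users in intervals:
    rate = rate_per_minute_per_user(events, minutes, users)
    print(f"{users} users: {rate:.1f}/min/user")
```

A rate that stays roughly flat while the user count grows suggests the system is keeping up; a falling rate under the same load is a cue to look at the system and Caché metrics for the same interval.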

Comparing metrics – add Caché metric

Key points
• Many important metrics available for capture
• Capture the metrics at all times
• Many tools/methods for capturing metrics
• Include application-level metrics in your capture
• Analysis for capacity or troubleshooting begins with understanding your application’s effects on the system.

You can reach me at: vik@intersystems.com Thanks for attending! Q&A
