
Caché Performance Troubleshooting, Part II: The System

Vik Nagjee, Product Manager, Kernel Technologies


Presentation Transcript


  1. Caché Performance Troubleshooting, Part II: The System. Vik Nagjee, Product Manager, Kernel Technologies

  2. System Performance: Limiting Factors • CPU • Memory • I/O: Disk, Network

  3. System: system-wide metrics • CPU (CPU Utilization): Unix: %USER + %SYS; Windows: %PROCESSOR; OpenVMS: [MON.SYS] CPU Busy • Memory (Available Memory): Unix: Freemem, pi/po; Windows: Available Memory, Page Reads; OpenVMS: [MON.PAGE] • I/O, Disk and Network (Latency and/or Queuing): Unix: avwait + avserv, queue; Windows: seconds/Read, Current Disk Queue; OpenVMS: [MON.DISK] RespT, QLen

  4. Caché system-wide metrics • Resources: CPU; Memory; Non-DB I/O (Files, Network); DB I/O (CACHE.DAT, WIJ, Journals) • Metrics: Physical Block Reads; Routine Commands; Global References; Physical Block Writes; Journal Writes; Time-to-run or other Application Specific Metrics

  5. Significance: Caché system-wide metrics • What are your users experiencing? Time-to-run and other Application Specific Metrics • CPU: How busy is your database? Routine Commands, Global References • Memory: How well is your application using database cache? Cache Efficiency (Physical Block Reads, Global References) • DB I/O (CACHE.DAT, WIJ, Journals): How well is your disk system responding? Physical Block Reads
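
The slide lists Cache Efficiency alongside Physical Block Reads and Global References. A minimal sketch, assuming Cache Efficiency is the ratio of global references to physical block reads over the same interval; the function name and sample numbers are hypothetical, and a given tool's exact definition may differ (for example, by also counting physical writes).

```python
def cache_efficiency(global_refs: float, phys_block_reads: float) -> float:
    """Global references per physical block read over one sampling interval.

    Assumed definition (not confirmed by the slides): a higher value means
    more references were satisfied from the database cache without disk reads.
    """
    if phys_block_reads <= 0:
        # No physical reads in the interval: every reference hit the cache.
        return float("inf")
    return global_refs / phys_block_reads


# Hypothetical interval: 2,500,000 global references, 12,500 physical reads.
print(cache_efficiency(2_500_000, 12_500))  # 200.0
```

Tracking this ratio over time, rather than reading one value in isolation, fits the deck's emphasis on comparing against what is normal for your application.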

  6. Collecting: system-level metrics, system-wide • Network management software: OpenView, Tivoli, BMC, OpenNMS, Nagios, PRTG Traffic Monitor, etc. • Unix: sar | glance | nmon, iostat | vmstat, top | topas • OpenVMS: PERFDAT, T4, MONITOR • Windows: Resource and Performance Monitor, logman, Process Explorer

  7. Collecting: system-wide Caché metrics • CPU • Memory • DB I/O (CACHE.DAT, WIJ, Journals) • Routine Commands • Global References • Physical Block Reads • Cache Efficiency • Time-to-run and other Application Specific Metrics

  8. Collecting Caché metrics: GLOSTAT • %SYS>DO ^GLOSTAT

  9. Collecting Caché metrics: ^pButtons • %SYS>DO ^pButtons • Installed in %SYS since 2008.2, but the latest version (currently 1.15c) is available at ftp://ftp.intersystems.com/pub/performance/ • Can be automated via TASKMGR • Low overhead: it logs data that’s already available • Documented in the Caché Monitoring Guide

  10. The performance “button” report (^pButtons)

  11. Notes on using ^pButtons • Profiles are configurable: • Create custom duration and interval combinations • Add or delete from the OS level metric collection • Collect the logs into one easy-to-use .html file: %SYS>DO Collect^pButtons • Preview a currently running profile’s data: %SYS>DO Preview^pButtons(runid) • Available at any point while profile is running. • May result in some truncated data.

  12. Collecting Caché metrics: Monitors • Caché History Monitor – SYS.History • Collect Caché metrics and User-defined metrics over time • Stored in your Caché database • Query or export the data using a variety of methods • Caché System Monitor – %Monitor.Health • Monitor the system health of your database • Alerts on abnormal metrics based on configurable criteria • Alerts from the System Monitor in cconsole.log: • 04/01/13-13:55:55:847 (13897) 1 [SYSTEM MONITOR] CPUusage Warning: CPUusage = 82 ( Warnvalue is 75)....(repeated 1 times)
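
The sample alert above can be pulled apart for trending or forwarding to other tools. This is a rough sketch whose pattern is derived only from the single cconsole.log line quoted on the slide; the field names (`pid`, `severity`, `sensor`, and so on) are assumptions about what each position means, and other alert lines may not match.

```python
import re
from typing import Optional

# Pattern inferred from the one example line on the slide (an assumption,
# not a documented log format).
ALERT_RE = re.compile(
    r"^(?P<stamp>\S+) \((?P<pid>\d+)\) (?P<severity>\d) \[SYSTEM MONITOR\] "
    r"(?P<sensor>\w+) (?P<level>\w+): \w+ = (?P<value>\d+(?:\.\d+)?) "
    r"\(\s*\w+ is (?P<threshold>\d+(?:\.\d+)?)\)"
)


def parse_monitor_alert(line: str) -> Optional[dict]:
    """Return the alert fields as a dict, or None if the line does not match."""
    m = ALERT_RE.match(line)
    if m is None:
        return None
    return {
        "timestamp": m.group("stamp"),
        "pid": int(m.group("pid")),            # presumed process ID
        "severity": int(m.group("severity")),  # digit after the process ID
        "sensor": m.group("sensor"),
        "level": m.group("level"),
        "value": float(m.group("value")),
        "threshold": float(m.group("threshold")),
    }


line = ("04/01/13-13:55:55:847 (13897) 1 [SYSTEM MONITOR] CPUusage Warning: "
        "CPUusage = 82 ( Warnvalue is 75)....(repeated 1 times)")
print(parse_monitor_alert(line))
```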

  13. Collecting Caché metrics: SNMP/WMI • SNMP, WMI, WSMON • Documented in the Caché Monitoring Guide • Caché metrics are exposed via SNMP, WMI, or Web services • NOTE: Future focus is on SNMP • Add CUSTOM application-specific metrics to be exposed • Use your EXISTING network management infrastructure to collect and alert on Caché metrics, your application metrics, and operating system metrics

  14. System-level clues to performance issues • CPU • Lack of processing cycles (0% CPU Idle) • Blocked processes (run queue or device queuing) • Disk • Abnormal disk IO rate • Queuing on devices • Higher than normal latency on busy disk • Memory • Lack of free memory • Hard page faults
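
The clues above can be turned into a simple checklist over one sample of OS metrics. This is only a sketch: the `SystemSample` fields, the function name, and every threshold (queue length above 2, latency above twice normal, free memory under 256 MB) are hypothetical placeholders that would need tuning against your own baseline.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SystemSample:
    cpu_idle_pct: float       # percent of CPU time idle
    run_queue: int            # runnable processes waiting for a CPU
    cpu_count: int            # number of logical CPUs
    disk_queue_len: float     # requests queued on the busiest device
    disk_latency_ms: float    # average latency on the busiest device
    free_mem_mb: float        # free physical memory
    hard_faults_per_s: float  # page faults that required disk I/O


def system_clues(s: SystemSample, normal_latency_ms: float) -> List[str]:
    """Return the system-level clues this sample exhibits (illustrative thresholds)."""
    clues = []
    if s.cpu_idle_pct < 1.0:
        clues.append("CPU: no idle cycles")
    if s.run_queue > s.cpu_count:
        clues.append("CPU: blocked processes in run queue")
    if s.disk_queue_len > 2.0:
        clues.append("Disk: queuing on device")
    if s.disk_latency_ms > 2.0 * normal_latency_ms:
        clues.append("Disk: latency above normal")
    if s.free_mem_mb < 256.0:
        clues.append("Memory: little free memory")
    if s.hard_faults_per_s > 0.0:
        clues.append("Memory: hard page faults")
    return clues


# Hypothetical sample: CPU saturated, disk and memory healthy.
sample = SystemSample(0.0, 12, 8, 0.5, 4.0, 4096.0, 0.0)
print(system_clues(sample, normal_latency_ms=5.0))
```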

  15. Caché-level clues to performance issues • GloRefs and/or RouCmds • Higher than normal? • Your app will be using more CPU… • Are there extraneous processes or more users? • Lower than normal? • Your app may be struggling with another problem (slow disk) • Concurrency issues • Blocked users upstream on the network

  16. Caché-level clues to performance issues • PhysBlkRds • Higher than normal? • Cache size doesn’t match current load • Use of CACHETEMP is forcing more disk reads for other data • Lower than normal? • Maybe that’s ok • App is struggling elsewhere such as lack of CPU cycles • If coupled with abnormally low GloRefs maybe disk latency issue
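
Slides 15 and 16 both hinge on knowing whether GloRefs, RouCmds, or PhysBlkRds are higher or lower than normal. One way to make "normal" concrete, sketched here under the assumption that you keep a history of per-interval values for a comparable period, is to flag readings more than `k` standard deviations from the baseline mean. The function name, the choice of `k = 2`, and the sample numbers are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence


def compare_to_baseline(baseline: Sequence[float], current: float, k: float = 2.0) -> str:
    """Classify `current` as 'higher', 'lower', or 'normal' relative to the baseline.

    The band is mean +/- k sample standard deviations; the baseline needs at
    least two values for the standard deviation to be defined.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    if current > mu + k * sigma:
        return "higher"
    if current < mu - k * sigma:
        return "lower"
    return "normal"


# Hypothetical GloRefs-per-second values from the same hour on previous days.
glorefs_baseline = [1000, 1100, 900, 1050, 950]
print(compare_to_baseline(glorefs_baseline, 1600))  # higher
print(compare_to_baseline(glorefs_baseline, 400))   # lower
print(compare_to_baseline(glorefs_baseline, 1020))  # normal
```

A "lower" result for GloRefs together with a "lower" result for PhysBlkRds is the combination slide 16 associates with a possible disk latency issue.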

  17. Application clues! • All the above coupled with application-level clues lead to solutions: • Are users complaining? • Is the rate of application activity the same? • Are batch-jobs/print jobs/screen refreshes completing in a timely manner? • Are your interfaces queuing?

  18. Comparing metrics – Load measure

  19. Comparing metrics – add App Metric 0.7/min/user 0.8/min/user 0.8/min/user 0.9/min/user 0.8/min/user
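
The per-user, per-minute figures on this slide normalize application activity by load so that periods with different user counts can be compared. A minimal sketch of that normalization follows; the function name and the event, user, and minute counts are invented for illustration, and only the "/min/user" unit comes from the slide.

```python
def rate_per_user_minute(events: int, users: int, minutes: float) -> float:
    """Application events per minute per active user over one period."""
    if users <= 0 or minutes <= 0:
        raise ValueError("users and minutes must be positive")
    return events / (users * minutes)


# Hypothetical hour: 2,400 application events from 50 active users.
print(round(rate_per_user_minute(2400, 50, 60), 1))  # 0.8
```

If this rate holds steady while users complain, the deck's other clues (system-level and Caché-level metrics) are the next place to look.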

  20. Comparing metrics – add Caché metric

  21. Key points • Many important metrics available for capture • Capture the metrics at all times • Many tools/methods for capturing metrics • Include application-level metrics in your capture • Analysis for capacity or troubleshooting begins with understanding your application’s effects on the system.

  22. You can reach me at: vik@intersystems.com Thanks for attending! Q&A
