
Understanding CPU Queue Length Calibration Choices in PATROL/Predict




Presentation Transcript


  1. Understanding CPU Queue Length Calibration Choices in PATROL/Predict Debbie Sheetz Sr. Staff Consultant BMC Software/Waltham 15 December 2002

  2. Understanding CPU Queue Length Calibration Choices
  • CPU Queue Length
    • Definition
    • Measured vs. Modeled
    • Causes for Calibration Exceptions
  • CPU Queue Length Calibration
    • Process
    • Effect on Results
      • Too high vs. too low
  • Reporting vs. Modeling
    • Reporting
      • Analyze vs. Predict
    • Modeling
      • Explicit workload calibration
      • Auto-calibration

  3. CPU Queue Length – Definition
  • Number of requests being serviced or waiting for service at the CPU
  • Native measurements do not always include requests being serviced
    • HP, Solaris, OSF, and NT do not include requests being serviced
    • Total = Measured Queue + CPU Utilization / 100
  • Analyze/Predict uses a consistent definition
    • Investigate does not!
  • A request is whatever actually presents itself to the CPU, e.g. a thread or a process
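The Total formula above can be sketched as a one-line helper (the function and variable names are illustrative, not from the product):

```python
def total_cpu_queue(measured_queue: float, cpu_util_pct: float) -> float:
    """Add the average number of requests in service (utilization / 100)
    to a native measurement that counts only waiting requests."""
    return measured_queue + cpu_util_pct / 100.0

# A host reporting a run queue of 1.5 while 80% CPU busy:
total_cpu_queue(1.5, 80.0)  # ≈ 2.3 requests at the CPU in total
```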

  4. CPU Queue Length – Measured vs. Modeled
  • Measurements report what actually occurred
  • Modeling (Predict) computes CPU queue length
    • Assumes a workload arrival pattern, e.g. that transactions arrive randomly
  • The actual transaction arrival pattern may differ from random
    • Transactions come from fewer sources and/or are more sequential
    • Transactions come from many sources and/or are “bunched”
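The transcript does not give Predict's internal queueing formula; as an illustrative stand-in, the classic M/M/1 result for random (Poisson) arrivals and exponential service shows how a modeled queue length follows from utilization alone (the formula choice here is an assumption, not Predict's actual algorithm):

```python
def mm1_queue_length(utilization: float) -> float:
    """Mean number in system (in service + waiting) for an M/M/1 queue,
    i.e. random arrivals and exponential service: Q = U / (1 - U)."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return utilization / (1.0 - utilization)

# If actual arrivals are burstier than random, the measured queue will
# exceed this; if they are more regular/sequential, it will fall below.
mm1_queue_length(0.5)  # ≈ 1.0
mm1_queue_length(0.8)  # ≈ 4.0
```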

  5. CPU Queue Length Calibration Exception – Causes
  • Discrepancies between measured and modeled occur when
    • The transaction arrival pattern differs from the assumed (random) pattern
    • The measurement of queue length is flawed
    • Values are small

  6. CPU Queue Length Calibration – Process
  • Observe Measured vs. Modeled (Predict Calibration Exceptions report)
  • Does any discrepancy matter?
    • Will CPU (or total) response time be reported?
    • Is CPU (or total) response time a performance modeling objective?
  • Is the discrepancy significant?
    • 30% is used as the indicator, because discrepancies are expected here, vs. the 3% threshold for utilization or rate metrics
    • CPU queue lengths less than 1.0 are too small to be significant
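The significance test above can be sketched as a small predicate (the thresholds come from the slide; the exact rule Predict applies is an assumption):

```python
def queue_discrepancy_significant(measured: float, modeled: float) -> bool:
    """Flag a measured-vs-modeled CPU queue discrepancy, using the 30%
    relative threshold for queue length (vs. 3% for utilization and rate
    metrics) and ignoring queues shorter than 1.0 request."""
    if measured < 1.0 and modeled < 1.0:
        return False  # small values: discrepancies are expected
    return abs(measured - modeled) / measured > 0.30

queue_discrepancy_significant(2.0, 1.2)  # True: 40% off, worth resolving
queue_discrepancy_significant(0.6, 0.3)  # False: both values are small
```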

  7. CPU Queue Length Calibration – Process
  • Resolve a significant discrepancy between Measured and Modeled (Predict Calibration Exceptions report)
  • What if the measured queue length is incorrect?
    • Known problems with NT and HP
      • CMG 2002 paper on the NT measured queue length: NT queue lengths are high when CPU utilization is low
      • HP queue lengths are high when CPU utilization is low; the measurement does not allow precise time averaging
  • Resolve the discrepancy with the explicit workload or “Auto-calibration” techniques in Predict

  8. CPU Queue Length Calibration – Effect on Results
  • Too high, i.e. measured is higher than modeled
    • CPU response time should be higher than predicted, so response time could be underestimated
    • Or perhaps the application design/implementation could/should be improved
  • Too low, i.e. measured is lower than modeled
    • CPU response time should be lower than predicted
    • A CPU upgrade may have less effect than expected
    • A CPU upgrade may need to be focused on service time reduction (instead of reduction in overall utilization)
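One way to see why a queue-length miss becomes a response-time miss is Little's law (Q = X · R, so R = Q / X). This standalone sketch is not taken from Predict; the numbers are illustrative:

```python
def cpu_response_time(queue_length: float, throughput: float) -> float:
    """Little's law: average time at the CPU (service + wait) equals the
    average number of requests there divided by throughput, R = Q / X."""
    return queue_length / throughput

# Modeled queue 2.0 vs. measured 3.0 at 50 requests/sec: the measured
# data implies a CPU response time 50% higher than the model predicts.
cpu_response_time(2.0, 50.0)  # ≈ 0.04 s predicted
cpu_response_time(3.0, 50.0)  # ≈ 0.06 s observed
```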

  9. CPU Queue Length Calibration – Reporting
  • Analyze vs. Predict
    • Analyze-only Visualizer input means no discrepancy is possible
    • Predict results will override Analyze results for Visualizer
  • A calibration choice needs to be made
    • Results should be interpreted according to that choice
    • The Manager default is Auto-Calibration ON for all nodes

  10. CPU Queue Length Calibration – Modeling
  • For Predict, the choices for CPU queue length calibration are:
  • No action taken, because
    • The platform has a known measurement deficiency, or
    • The discrepancy is not significant
  • Explicit workload calibration
    • Discussed in the User’s Guide
    • Changes assumptions about the workload arrival pattern
    • Applies only when the calculated queue is higher than measured
  • Auto-calibration turned ON
    • Forces measured and modeled to match
    • The discrepancy (NODE QL-CORRECTION-FACTOR) is carried at the node
    • The correction is applied to all modeling scenarios (baseline and what-if)
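The transcript names NODE QL-CORRECTION-FACTOR but not its formula; a minimal sketch, assuming a simple multiplicative factor (the function names and the multiplicative form are assumptions):

```python
def ql_correction_factor(measured: float, modeled: float) -> float:
    """Ratio that forces the modeled queue to match the measured one.
    Assumed multiplicative form; the actual formula is not published
    in this transcript."""
    return measured / modeled

def apply_correction(modeled_queue: float, factor: float) -> float:
    """Auto-calibration carries the factor at the node and applies it
    to every scenario, baseline and what-if alike."""
    return modeled_queue * factor

# Measured 3.0 vs. modeled 2.0 -> factor 1.5, applied to later scenarios.
apply_correction(2.0, ql_correction_factor(3.0, 2.0))  # ≈ 3.0
```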

  11. CPU Queue Length Calibration – Modeling
  • Auto-Calibration
    • A workload characteristic is carried at the node, so differential changes in workload volumes will not be reflected properly
      • New or deleted workloads are an extreme example of this
    • Correction factors computed for lightly utilized systems may be very dramatic, and may be inappropriate when a what-if increases CPU utilization significantly
    • Correction factors computed for systems with incorrect measurements can be extremely misleading
    • Explicit workload calibration does not have these drawbacks
  • Auto-Calibration is OFF by default in Analyze
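A toy example of the first two drawbacks, with illustrative numbers and an M/M/1 formula standing in for the real model: a factor computed on a lightly utilized baseline is large, and a what-if that raises utilization then inherits it wholesale.

```python
def mm1(u: float) -> float:
    """Stand-in modeled queue under the random-arrival assumption."""
    return u / (1.0 - u)

# Baseline: 20% busy; measured queue 0.75 vs. modeled 0.25 -> factor 3.0.
factor = 0.75 / mm1(0.20)

# What-if at 80% busy: the fixed node-level factor triples the modeled
# queue of 4.0, which may now be far too pessimistic.
corrected = mm1(0.80) * factor  # ≈ 12.0
```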
