Capacity Planning for the Newer Workloads

Linwood Merritt

Capital One Services, Inc.

[email protected]


Disclaimer

  • These generic issues are addressed by this presentation:

    • Vendor capacity ratings

    • e-Commerce

    • Continuous availability

    • Data warehousing

    • Growth rates

  • This presentation contains no specific business-related information.


Introduction: Environment

  • Capital One

    • 5th largest card issuer in the United States

    • Added to the S&P 500 in 1998

    • Fortune 500 company (#260)

    • Managed loans at $48.6 billion as of Q1 2002

    • Accounts at 46.6 million as of Q1 2002

    • Fortune 100 “Best Places to Work in America”

    • CIO 100 Award “Master of the Customer Connection”

    • Information Week “Innovation 100” Award Winner

    • ComputerWorld “Top 100 places to work in IT”


Outline of Approach

  • Understand behavior and issues around workloads, hardware, and data

  • Create projections and build recommendations.

  • Report the findings.


Outline of Presentation

  • Discussion of workload types and capacity projection approaches

  • Overall summary of issues and approaches

  • Examples


What Workloads?

  • E-Commerce

  • Relational database systems

  • Mainframe-class UNIX

  • Multiple platforms

  • New characteristics


e-Commerce Workloads: Direct to Client (business-to-business)

  • Access

    • Internet

    • Leased line

  • Services

    • Point of Care / Point of Sale

    • Value-added analysis


e-Commerce Workloads: Direct to Customer

  • Access

    • Internet

    • Dial-in

  • Services

    • Marketing

    • Account query


e-Commerce Workloads: How to Predict

  • Take business projections of volumes or users (include fudge factor)

  • Estimate transaction volumes and CPU/transaction

  • Convert to normalized unit such as MIPS
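
The three steps above can be sketched in Python. This is only an illustration: the slide gives no numbers, so the user count, fudge factor, transaction rate, CPU seconds per transaction, and the MIPS rating of the reference processor are all hypothetical, as is the function name.

```python
def estimate_mips(projected_users, fudge_factor, peak_tx_per_user_hour,
                  cpu_sec_per_tx, reference_mips):
    """Convert a business projection into a normalized MIPS demand.

    reference_mips is the rating of the processor on which cpu_sec_per_tx
    was measured (an assumption of this sketch, not from the slides).
    """
    users = projected_users * fudge_factor               # padded business projection
    tx_per_sec = users * peak_tx_per_user_hour / 3600.0  # peak transaction rate
    cpu_busy = tx_per_sec * cpu_sec_per_tx               # in units of reference processors
    return cpu_busy * reference_mips                     # normalized demand


# Hypothetical inputs: 10,000 users padded by 25%, 12 transactions per user
# in the peak hour, 0.05 CPU seconds each, measured on a 200-MIPS processor.
demand = estimate_mips(10_000, 1.25, 12, 0.05, 200)
print(f"Estimated demand: {demand:.1f} MIPS")
```

Keeping the fudge factor as an explicit input makes it easy to show how sensitive the estimate is to the business projection.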


Relational Databases

  • Sub-second (OLTP), decision support / data mining

  • Distributed gateways

  • Database machines

  • Redundant data with extracts

  • How to predict: estimate a factor over current database demand or take usage estimates


Mainframe-Class UNIX

  • Types: Mainframe USS or Linux, Future UNIX vendor offerings

  • Candidate applications

    • Web server

    • Vendor-ported applications

    • User-ported / new applications

  • How to predict:

    • Estimate by timeframe

    • Add factor to growth rates


Multiple Platforms

  • Mainframe: plan like existing applications (#users, transactions * CPU/transaction, application look-alikes, sizing tools)

  • Distributed: use vendor sizing, modeling tools, existing applications

  • Network: use network simulation tools, rules-of-thumb, bandwidth calculations


New Characteristics

  • External users

  • Continuous availability

  • New user interfaces

  • Cross-platform


External Users

  • Drive need for continuous availability

  • Different access patterns (e.g., doctor’s office vs. call center)

  • Service level measurement - harder to put agent on external workstations


Continuous Availability

  • Driven by external users

  • 24x7 schedule

    • Application redesign

    • Data Sharing: CPU overhead

    • Coupling Facility

    • Expansion of “prime shift”

  • 99.999% “up time”

    • Redundancy, overhead

    • Availability reporting
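
To put the 99.999% figure in concrete terms, here is a small Python sketch that converts an availability target into allowed downtime per year and shows the effect of redundancy. The redundancy formula assumes component failures are independent, which is a simplification introduced here and not a claim from the slides.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def allowed_downtime_minutes(availability_pct):
    """Minutes of downtime per year permitted by an availability target."""
    return (1 - availability_pct / 100.0) * MINUTES_PER_YEAR


def combined_availability(component_pct, copies):
    """Availability of N redundant copies, assuming independent failures."""
    unavailable = (1 - component_pct / 100.0) ** copies
    return 100.0 * (1 - unavailable)


five_nines = allowed_downtime_minutes(99.999)
paired = combined_availability(99.9, 2)  # two hypothetical 99.9% components
print(f"99.999% allows about {five_nines:.2f} minutes of downtime per year")
print(f"Two independent 99.9% components give about {paired:.4f}%")
```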


User Interfaces

  • TCP/IP - no “definite response” (end-to-end response time measurement)

  • Multiple internal transactions per “mouse click”

  • Response time measurement:

    • Agent on workstations

    • Scripting from “robots”


Cross Platform Applications

  • Only unified view: simulation package

  • Each platform (“silo”) can be analyzed separately.

  • Different application development groups

  • May be able to cross-validate user numbers


Types of Implementation (1)

  • Standalone / “shrink-wrap”

  • Layered onto legacy applications

    • New mainframe application code

    • GUI front-end

    • Browser

    • Middle-tier (Unix or NT)

    • MQSeries - can add middle-tier and new mainframe applications


Types of Implementation (2)

  • Legacy extracts

  • Re-engineered legacy applications

    • Convergence of business rules / applications

    • Re-usable components

    • Redundant access

    • Salvage investment, fix Band-Aids

    • Simplify logic, reduce platform complexity


What Are We Analyzing? (Mainframe)

  • MIPS - growth, latent demand, software cost

  • Memory - track and watch 2 GB limit on central storage (goes away with 64-bit)

  • I/O - channels, gigabytes of disk, tape

  • Coupling Facility - Parallel Sysplex, Shared Data, continuous availability

  • Vendor upgrade paths

  • New partitions


What Are We Analyzing? (Distributed)

  • Number and types of platforms

  • CPU, memory, disk space

  • Bandwidth

  • Location of applications / processes

  • Platform limitations (CPU, memory)

  • Software pricing considerations

  • Porting opportunities


Measurement of New Workloads

  • Summarize by platform:

    • Workload rules (process or user names)

    • Processes by descending CPU%

  • Resources: CPU, memory, disk space, Coupling Facility, network traffic

  • Growth:

    • Resources/user/application

    • Number of users + application changes


Distributed Approach

  • Consider tiers of service (not currently at Capital One)

  • Address service level measurement issue

  • Implement reporting

  • Add to Capacity Plan

  • “Silo” vs. “Application”


Tiers of Service: “Platinum”

  • Most expensive

  • Modeling product

  • Install in one server for each major application, use collection product for other servers


Tiers of Service: “Gold”

  • Collection product

  • Capacity planning with Rules of Thumb


Tiers of Service: “Brass”

  • Least expensive (man-hours only)

  • “Native”

    • Unix scripts

    • NT PerfMon


Service Level Measurement

  • API call at workstation - “Application Response Measurement” (ARM) or Windows 2000 trace API calls

  • Agents: software tracing of Windows API calls - can be installed in a subset of end-user base (sampling)

  • Scripting (“robots”)

  • Stop watch sampling and logging


Distributed Reporting


Add to Capacity Plan


Scope of Analysis

  • Silos

    • Look at each hardware/application environment independently.

  • Applications

    • Look at each application as a whole.

    • Application instrumentation

    • Inference: put platform silos together.


Analyzing the Data: Growth Rates

  • General list of business plans

  • List of technical scenarios

  • Timeline

  • Estimate median and maximum likely MIPS/CPU/users/business units

  • Derive scenario growth rates


Analyzing the Data: Additional Resources

  • Parallel Sysplex (Coupling Facility): important for continuous availability, level set functionality

  • Disk / channels / tape: disk megabytes, channel maximum, tape connectivity

  • Communications connectivity: new partitions for availability

  • Memory: 2 GB constraint, 64-bit


Growth

  • “Baseline” growth

  • “Scenario” growth

  • Independent events (merger/acquisition, potential major project)


Example 1: Mainframe Upgrade

  • Task force, led by Capacity Planner

  • Driven by expiring three-year lease (CPU replacement, three-year planning horizon)

  • “Vendor parade” - presentations and dialogues

    • Upgrade paths

    • Technology / service differences

    • References / site visits

    • Capacity sizing: MIPS charts, LSPR / sizing tools


Mainframe Upgrade Deliverables

  • Document

    • Business drivers and technical scenarios

    • Growth forecasts

    • Vendor options and growth paths

    • Coupling Facility / Parallel Sysplex

  • Evaluation

    • Difference thresholds: MIPS claims, price/MIPS, ICF

    • Differentiators


Business and Technical

  • Technical Scenarios

    • Consolidation of distributed servers

    • Continuous availability

    • Significant external business

    • Data Warehousing

    • Acquisition/merger

  • Business Drivers

    • Cost management

    • External business

    • Improved data access

    • Business expansion


Projections

  • Make educated guess by timeframe for each scenario

  • Add to “baseline” growth

  • Convert to growth rate

  • Use both “baseline” and “scenario growth”

  • Compare maximum scenario growth to maximum for platform family
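
The projection steps above might look like the following Python sketch. Every number is hypothetical (starting MIPS, baseline rate, scenario increments, platform maximum), and treating scenario demand as a flat MIPS increment at the end of the horizon is a simplifying assumption of this example.

```python
def cagr(start, end, years):
    """Compound annual growth rate that takes start to end in the given years."""
    return (end / start) ** (1.0 / years) - 1.0


def bracket_growth(start_mips, baseline_rate, years, scenario_adds):
    """Return (baseline rate, scenario rate, scenario end point).

    scenario_adds holds the educated-guess MIPS added by each scenario
    by the end of the horizon.
    """
    baseline_end = start_mips * (1 + baseline_rate) ** years
    scenario_end = baseline_end + sum(scenario_adds)   # add to "baseline" growth
    return (cagr(start_mips, baseline_end, years),
            cagr(start_mips, scenario_end, years),     # convert to growth rate
            scenario_end)


# Hypothetical: 1,000 MIPS today, 25% baseline growth, a three-year horizon,
# and two scenarios adding 300 and 450 MIPS.
baseline, scenario, peak = bracket_growth(1000, 0.25, 3, [300, 450])
platform_max = 3000  # hypothetical largest model in the platform family
print(f"baseline {baseline:.1%}, scenario {scenario:.1%}")
print("fits platform family" if peak <= platform_max else "exceeds platform family")
```

Reporting both rates gives the bracket described in the slides: the baseline rate as the lower end and the scenario rate as the upper end.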


Impact Analysis


Scenario Timeline

  • Period 1

    • Initial muck exploitation with 250 users

    • First Parallel Sysplex exploitation

  • Period 2

    • First mainframe Wk1 application

  • Period 3

    • (Potential acquisition)

    • Major Project A with 100 users, 150% CAGR

    • New DB2 functionality exploitation

  • Period 4

    • 64-bit OS/390

    • Full Data Sharing exploitation (IMS, CICS, DB2)

  • Period 5

    • Full subsystem redundancy (IMS, CICS, DB2)

  • Period 6

    • 24x7 operation

  • Period 7


Vendor Upgrade Paths: Detail

  • Use logarithms:

    Start*CAGR^x = Threshold

    x years = log(Threshold/Start)/log(CAGR)

  • Model ratings and dates under each growth rate:

    Model     MIPS   MSU   +40%/Yr   +25%/Yr
    GS2068E    952   160   Aug-00    Sep-00
    GS2074E   1013   171   Oct-00    Dec-00
    GS2084E   1141   193   Apr-01    Jul-01
    GS2094E   1260   213   Sep-01    Dec-01
    GS2104E   1378   234   Nov-01    May-02
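
The logarithm formula can be checked with a few lines of Python. Note that CAGR in the formula has to be the annual growth factor (1.40 for +40% per year), since the logarithm of the bare rate 0.40 would be negative. The 1,000-MIPS starting demand below is hypothetical; the slides do not state the starting point behind the table.

```python
import math


def years_to_threshold(start, growth_factor, threshold):
    """Solve start * growth_factor**x = threshold for x (in years)."""
    return math.log(threshold / start) / math.log(growth_factor)


start_mips = 1000.0   # hypothetical current demand
threshold = 1378.0    # MIPS rating used as the upgrade threshold
for label, factor in (("+40%/yr", 1.40), ("+25%/yr", 1.25)):
    x = years_to_threshold(start_mips, factor, threshold)
    print(f"{label}: threshold reached after {x:.2f} years")
```

Adding the result to the current date gives the month in which a given model would be outgrown.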


Vendor Upgrade Paths: Summary


Upgrade Document


Example 2: UNIX Modeling

  • Modeling product installed on MQSeries server

  • Application running with a known number of users

  • Projected rollout schedule used to drive model

  • Mainframe side: CICS application, IMS load


UNIX Platform Workloads

  • Two primary workloads:

    • MQSeries userids (mqm*) - memory intensive

    • Messaging application processes (MDA*) - “CPU intensive”


Workload Modeling Methodology

  • MQSeries - Calculate relative workload intensity, enter model ratio.

  • Messaging application processes - Keep constant until application is removed from platform (“design loop” - always uses 1 CPU). Must adjust across CPU upgrade to continue using 1 CPU.


Track Across Upgrade

  • Chart annotation: CPU upgrade


Model Spreadsheet


Model Presentation

  • Timeframe: April 2000

  • #Users: 180, 100

  • Ratios: 1.27, 1.00

  • Config: F50/02, 2GB

  • Comment: Add Event1 Users


Validation - Tracking Users (on mainframe)

//ECLUSRS EXEC SASV8,REGION=0M
//ECLD1 DD DSN=XYZ.PRD.A.AAAPRD.I.VOLFIL,DISP=SHR
//ECLDPDB DD DSN=CAPLAN.PRD.ECLDPDB,DISP=OLD
//SYSIN DD *,DLM=@@
* Read the volume file and pick out the user-count record;
data ecld1;
  format date date.;
  format dt datetime.;
  infile ecld1 missover;
  input @1  recnum  $char5.
        @6  rectype $char8.
        @14 userct  $char5.
        @19 usermax $char5.;
  * Keep only record 99999 of type TCSCONFG;
  if recnum =: '99999' and rectype =: 'TCSCONFG';
  * Timestamp the sample so it can be keyed by date and hour;
  dt = datetime();
  date = datepart(dt);
  hour = hour(dt);
run;
* Apply the new sample to the history table;
data ecldpdb.users;
  update ecldpdb.users ecld1;
  by date hour;
run;
proc print;
  title 'Ecloud1 Users';
run;


Example 3: Server Replacement

  • Project: replace “old” NT servers

  • Application: Imaging servers

  • Capacity sizing data:

    • Rules-of-thumb analysis by vendor, using projected claims/minute and processor clock speeds

    • Benchmark information


Server Replacement Process

  • Multiple servers: each server is a workload, must be sized separately.

  • Enumerate and measure servers.

  • Apply growth rates and determine processing power requirements for the replacements.

  • Research available configurations and order appropriate server configurations.

  • Track CPU utilization across the upgrades.

  • Update relative capacity specs for next upgrade.


Server Sizing

  • Find (or derive) benchmark capacity ratings for starting and replacement configurations.

  • Apply an estimate of current CPU utilization, a growth percentage, and a “peak/average” and performance buffer (+100% for this study).

  • Output: estimated percentages of a standard configuration. The number of estimated CPUs needed (23) came very close to the vendor’s original number of 24.
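
The sizing arithmetic described above can be sketched as follows. Only the +100% buffer comes from the slide; the benchmark ratings, utilizations, growth percentage, and server names are hypothetical.

```python
import math


def standard_config_fraction(current_rating, utilization, growth, buffer,
                             standard_rating):
    """Fraction of one standard replacement configuration a server needs.

    Ratings are in the same benchmark units for old and new hardware.
    """
    demand = current_rating * utilization  # benchmark units in use today
    demand *= 1 + growth                   # projected growth
    demand *= 1 + buffer                   # peak/average and performance buffer
    return demand / standard_rating


# Hypothetical servers: (name, benchmark rating, measured CPU utilization)
servers = [("img01", 100, 0.60), ("img02", 100, 0.35), ("img03", 80, 0.50)]
fractions = {
    name: standard_config_fraction(rating, util, growth=0.50, buffer=1.00,
                                   standard_rating=40)
    for name, rating, util in servers
}
for name, frac in fractions.items():
    # Each server is its own workload, so round up per server.
    print(f"{name}: {frac:.2f} standard configurations -> order {math.ceil(frac)}")
```

Rounding up per server, not on the total, follows the point made earlier in the deck that each server must be sized separately.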


Sizing Spreadsheet


Example 4: Hundreds of Servers

  • Data capture

  • Reporting

  • Business drivers


Data Capture

  • Time-based scheduling product

  • Script-based data “pull”

  • Issue: data loss, time to find and rebuild

  • Potential fixes:

    • Product

    • Data “push” from servers


Data Reporting, Analysis

  • Color-based “health index” (Concord NetHealth metric).

  • Statistical Analysis (over two standard deviations from mean)

  • Thumbnail drilldown graphs

  • Automatic generation of HTML

  • “Treemap” graphs
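
As an illustration of the statistical check in the list above, this Python sketch flags a server whose latest reading lies more than two standard deviations from its historical mean. The sample readings are made up, and using the sample standard deviation is a choice of this example.

```python
import statistics


def is_exception(history, latest, limit=2.0):
    """True if latest is more than `limit` standard deviations from the mean."""
    mean = statistics.mean(history)
    sd = statistics.stdev(history)  # sample standard deviation
    if sd == 0:
        return latest != mean       # flat history: any change is an exception
    return abs(latest - mean) > limit * sd


# Hypothetical daily CPU% readings for one server
history = [40, 42, 38, 41, 39, 40, 43, 37]
print(is_exception(history, 45))  # reading well above the usual range
print(is_exception(history, 43))  # reading within the usual range
```

With hundreds of servers, running a check like this per server reduces the reports to the handful that need attention.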


Health Index *

* Concord NetHealth metric


Statistical Process Control



Thumbnail HTML


Automatic Generation of HTML

  • Driven by “matrix”

    • Originally spreadsheet

    • Converted to relational database

    • Ultimate capacity planning solution: information by server, application, platform, business driver

  • SAS code - builds web pages and hyperlinks
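
The original pages were built with SAS. As a rough analogue only, the Python sketch below shows how a "matrix" of server records could drive an index page with one hyperlink per server. The field names, server names, and per-server file naming are all hypothetical.

```python
import html


def build_index(matrix):
    """Return an HTML index page with one linked row per server record."""
    rows = []
    for rec in matrix:
        name = html.escape(rec["server"])
        cells = "".join(
            f"<td>{html.escape(rec[key])}</td>"
            for key in ("application", "platform", "business_driver")
        )
        # Each server name links to its own detail page (assumed naming).
        rows.append(f'<tr><td><a href="{name}.html">{name}</a></td>{cells}</tr>')
    header = ("<tr><th>Server</th><th>Application</th>"
              "<th>Platform</th><th>Business driver</th></tr>")
    body = "\n".join([header] + rows)
    return f"<html><body><table>\n{body}\n</table></body></html>"


matrix = [
    {"server": "srv01", "application": "Imaging", "platform": "NT",
     "business_driver": "Claims"},
    {"server": "srv02", "application": "Messaging", "platform": "UNIX",
     "business_driver": "Customers"},
]
page = build_index(matrix)
print(page)
```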


Treemap

Paper by Ben Shneiderman, University of Maryland, http://www.cs.umd.edu/hcil/treemaps


Business Drivers

  • Capacity Councils - business units responsible for capacity planning of “demand” side

  • Capacity Planners - build projections based on business drivers and historical trending


Business Driver Based Forecasts

[Diagram with boxes labeled “Application”, “Business Driver Projections”, and “Server”]


Regression Analysis

  • Input = CPU and business drivers (Widgets, Gadgets, Customers) by month

  • Output = coefficients f1, f2, f3

  • By month (input = Widgets, Gadgets, Customers):

    projection = Widgets*f1 + Gadgets*f2 + Customers*f3;
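
A minimal Python sketch of this regression, assuming an ordinary least-squares fit with no intercept (matching the form of the projection formula). The monthly figures and the "true" coefficients used to generate them are invented for illustration, and the solver is a plain normal-equations implementation, not the tool used in the original study.

```python
def fit_coefficients(drivers, cpu):
    """Least-squares fit of cpu = sum(f_i * driver_i), with no intercept.

    drivers: one row [widgets, gadgets, customers] per month.
    """
    n = len(drivers[0])
    # Normal equations: (X'X) f = X'y
    a = [[sum(r[i] * r[j] for r in drivers) for j in range(n)] for i in range(n)]
    b = [sum(r[i] * y for r, y in zip(drivers, cpu)) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(a[k][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution
    f = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][k] * f[k] for k in range(i + 1, n))
        f[i] = (b[i] - tail) / a[i][i]
    return f


def project(row, f):
    """projection = Widgets*f1 + Gadgets*f2 + Customers*f3"""
    return sum(x * c for x, c in zip(row, f))


# Invented monthly drivers: [widgets, gadgets, customers]
months = [[1000, 400, 50000], [1100, 380, 52000], [1250, 420, 51000],
          [1300, 450, 55000], [1500, 430, 58000]]
true_f = [0.002, 0.005, 0.0001]              # invented, used to build CPU data
cpu = [project(m, true_f) for m in months]   # stands in for measured CPU

f1, f2, f3 = fit_coefficients(months, cpu)
forecast = project([1600, 460, 60000], [f1, f2, f3])
print(f1, f2, f3, forecast)
```

With real measurements the fit will not be exact, so it is worth comparing projected and actual CPU for past months before trusting a forecast.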


Graphical Output

Widgets Gadgets Customers


Enterprise “Capacity at a Glance”


Summary: Issues

  • Access patterns and schedules

  • Platforms (more types and numbers)

  • Resources (what to track)

  • Levels of capacity management

  • Reporting of utilization and service levels, for large numbers of platforms

  • Higher availability (redundancy, reporting)

  • Deriving and reporting projections


Summary: Deriving Projections

  • Basic capacity planning:

    • Growth rates

    • Upgrade thresholds

  • Aggressive estimate of “scenario” demand

  • Bracket growth:

    • Lower end: “baseline”

    • Upper end: “scenarios”


Summary: Types of Projections

  • Number of transactions

  • Number of users

  • Number of platforms

  • Application sizing input

  • Application complexity

  • Fraction of an existing workload

  • Growth rate


Summary: Capacity Planning

  • Projections based on application and platform

  • Levels of capacity planning service

  • Report on all enterprise resources

  • Organize data with “matrix” database

