Balancing Batch Workloads and CPU Activity in a Parallel Sysplex Environment

Balancing Batch Workloads and CPU Activity in a Parallel Sysplex Environment. Prepared by Kevin Martin McKesson For CMG Canada Spring Seminar 2006. Introduction. Pharma applications run in a data center in California. Application support is in San Francisco and Dallas.

Prepared by Kevin Martin

McKesson

For CMG Canada

Spring Seminar 2006

Introduction
  • Pharma applications run in a data center in California. Application support is in San Francisco and Dallas.
  • We implemented parallel sysplex environments last July to improve availability.
  • We also installed a 2086-350 and a 2086-250. The CPU engines run at the same speed, which simplifies reporting and workload balancing.
Reasons for Imbalanced CPU Activity
  • Originally the Pharma application ran on one production LPAR. It was hard to decide how to split the processing while maintaining data integrity.
  • Software licenses: IMS and COMPAREX only on the 350 and SAS only on the 250
  • System tasks: TWS controller (job scheduling) on the 350 and DFHSM migrations and backups on the 250
  • Other restrictions due to problems and data integrity concerns
Job Routing
  • Our goal was to avoid modifying JCL
  • We used WLM scheduling environments, and a tool to assign programs or jobs to the scheduling environments
WLM Scheduling Environments
  • DDCANY: run on DDCA or DDCO
  • DDCA: DDCA jobs
  • DDCOJOBS: DDCO jobs
  • SAS: SAS programs
  • DDCO: jobs that run on DDCO using class 6
  • EDE: EDICKP DD statement
  • MQSERIES: MQSERIES
  • REEL: 3420 tapes
  • EDETEST: EDE test jobs DM99Txxx
  • DDCSPECL: programs that run on the 350
SDSF Resource Display
  • RESOURCE   DDCA   DDCO
  • DDCANY     ON     ON
  • DDCO       OFF    ON
  • DDCOJOBS   OFF    ON
  • DDCSPECL   ON     OFF
  • DDNAMES    ON     OFF
  • EDE        ON     ON
  • EDETEST    ON     ON
  • IMSTEST    ON     ON
  • MQSERIES   ON     OFF
  • REEL       ON     OFF
  • SAS        OFF    ON
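
The display above can be read as a lookup table: a job tied to a resource can only start on an LPAR where that resource is ON. The following is a minimal Python sketch of that lookup, not part of SDSF or WLM. It assumes, as a simplification, that each scheduling environment depends on the single resource of the same name; the function name `eligible_systems` is hypothetical.

```python
# Resource states transcribed from the SDSF resource display above
# (True = ON, False = OFF).
RESOURCE_STATES = {
    "DDCANY":   {"DDCA": True,  "DDCO": True},
    "DDCO":     {"DDCA": False, "DDCO": True},
    "DDCOJOBS": {"DDCA": False, "DDCO": True},
    "DDCSPECL": {"DDCA": True,  "DDCO": False},
    "DDNAMES":  {"DDCA": True,  "DDCO": False},
    "EDE":      {"DDCA": True,  "DDCO": True},
    "EDETEST":  {"DDCA": True,  "DDCO": True},
    "IMSTEST":  {"DDCA": True,  "DDCO": True},
    "MQSERIES": {"DDCA": True,  "DDCO": False},
    "REEL":     {"DDCA": True,  "DDCO": False},
    "SAS":      {"DDCA": False, "DDCO": True},
}


def eligible_systems(resource):
    """Return the LPARs (sorted by name) where the resource is ON.

    Assumption: one scheduling environment maps to one resource of the
    same name. Real scheduling environments can require several resources.
    """
    states = RESOURCE_STATES[resource]
    return sorted(lpar for lpar, is_on in states.items() if is_on)


if __name__ == "__main__":
    for name in ("DDCANY", "SAS", "REEL"):
        print(name, eligible_systems(name))
```

Under this reading, SAS work can only start on DDCO and 3420 tape work only on DDCA, while DDCANY work can start on either LPAR.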
WLM and JES Mode Initiators
  • For each job class you can specify MODE=WLM or MODE=JES in the JES2 parameters
  • WLM mode initiators can start dynamically on any LPAR
  • JES mode initiators are defined on each LPAR as permanent initiators
  • WLM and JES mode classes can run at the same time. However, ensure that there are enough JES mode initiators.
WLM and JES Mode Initiators
  • CLASS Status Mode Wait-Cnt Xeq-Cnt Hold-Cnt JCLim
  • H NOTHELD WLM 3 100
  • L NOTHELD WLM 1 100
  • M NOTHELD WLM 1 100
  • N NOTHELD WLM 100
  • O NOTHELD WLM 100
  • 1 NOTHELD WLM 100
  • 2 NOTHELD JES 100
  • 3 NOTHELD JES 7 100
  • 4 NOTHELD WLM 100
  • 5 NOTHELD JES 100
  • 6 NOTHELD JES 100
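
Because JES mode classes depend on fixed initiators, it can help to check that every JES mode class has at least one initiator somewhere in the sysplex. The sketch below is one way to express that check in Python. The class modes are taken from the display above, but the initiator counts in `FIXED_INITIATORS` are hypothetical values chosen for illustration, and the function name is an assumption rather than a JES2 facility.

```python
# Class modes transcribed from the job class display above.
CLASS_MODES = {
    "H": "WLM", "L": "WLM", "M": "WLM", "N": "WLM", "O": "WLM",
    "1": "WLM", "2": "JES", "3": "JES", "4": "WLM", "5": "JES", "6": "JES",
}

# HYPOTHETICAL fixed-initiator counts per LPAR and job class.
# These numbers are for illustration only and do not come from the slides.
FIXED_INITIATORS = {
    "DDCA": {"3": 2, "5": 4},
    "DDCO": {"5": 4, "6": 2},
}


def jes_classes_without_initiators(class_modes, fixed_initiators):
    """Return the JES mode classes that have no fixed initiator on any LPAR."""
    uncovered = []
    for job_class, mode in class_modes.items():
        if mode != "JES":
            continue  # WLM mode classes get initiators started dynamically
        total = sum(per_class.get(job_class, 0)
                    for per_class in fixed_initiators.values())
        if total == 0:
            uncovered.append(job_class)
    return sorted(uncovered)


if __name__ == "__main__":
    print(jes_classes_without_initiators(CLASS_MODES, FIXED_INITIATORS))
```

With the illustrative counts above, the check flags class 2, because no LPAR defines an initiator for it; jobs submitted in that class would wait on the input queue.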
Problem # 1: Slower turnaround on one LPAR with more jobs running
  • The TWS controller is on DDCA. When a job is released, a WLM initiator becomes available on that same LPAR first.
  • For example, there could be 15 jobs on DDCA and only 5 jobs on DDCO. So the jobs on DDCA get slower turnaround than the ones on DDCO.
  • This gets worse if high priority jobs are running on the busy LPAR. The low priority jobs will run very slowly.
  • We checked DASD response times and tuned the JES MAS parameters.
  • We routed several large priority jobs to DDCO by assigning specific job names to a scheduling environment named DDCOJOBS.
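
The 15-versus-5 example above can be turned into rough arithmetic. The sketch below assumes a deliberately simple model that is not from the presentation: both LPARs have the same capacity, all jobs are CPU-bound with equal priority, and each LPAR divides its capacity evenly among its active jobs. The function name `cpu_share_per_job` is hypothetical.

```python
from fractions import Fraction


def cpu_share_per_job(active_jobs, lpar_capacity=1):
    """Fraction of one LPAR's capacity each job receives under equal sharing.

    Simplifying assumptions: equal-priority, CPU-bound jobs and no other
    work on the LPAR.
    """
    if active_jobs <= 0:
        raise ValueError("active_jobs must be positive")
    return Fraction(lpar_capacity, active_jobs)


ddca_share = cpu_share_per_job(15)   # busy LPAR from the example
ddco_share = cpu_share_per_job(5)    # lightly loaded LPAR

# How many times more capacity a DDCO job gets than a DDCA job.
advantage = ddco_share / ddca_share

if __name__ == "__main__":
    print(f"DDCA share per job: {ddca_share}")
    print(f"DDCO share per job: {ddco_share}")
    print(f"DDCO advantage:     {advantage}x")
```

Under these assumptions a job on DDCO receives three times the capacity of a job on DDCA, which is the imbalance that routing jobs to DDCOJOBS was meant to reduce. Mixed priorities would widen the gap for low priority work.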
Problem # 2: Releasing many jobs at the same time
  • 8 or 16 large jobs are released at once. They are on the critical path for a schedule and they have a high priority.
  • With WLM mode initiators, most of the jobs could start on one LPAR because that LPAR was not busy when the jobs were released.
  • For example, DDCA could get 2 jobs and DDCO could get 6 jobs. The jobs on DDCA would finish earlier, and then DDCA would be idle while DDCO was still busy.
  • We assigned these groups of large priority jobs to JES mode job classes to balance the LPAR activity better. We defined four class 5 initiators on DDCA and four class 5 initiators on DDCO, and assigned the DY65 jobs to class 5.
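
The effect of the split can be illustrated with a small calculation. This Python sketch compares the 2/6 split from the example with the 4/4 split that fixed class 5 initiators enforce. It assumes identical jobs that share an LPAR equally, equal LPAR capacity, and a hypothetical duration of one hour per job when run alone; none of these figures come from the presentation.

```python
def finish_times(jobs_per_lpar, hours_per_job_alone=1.0):
    """Elapsed hours until each LPAR has finished its group of jobs.

    Model assumption: n identical jobs sharing one LPAR equally all
    complete after n * hours_per_job_alone.
    """
    return {lpar: count * hours_per_job_alone
            for lpar, count in jobs_per_lpar.items()}


def makespan(jobs_per_lpar, hours_per_job_alone=1.0):
    """Elapsed hours until the whole group of jobs is done."""
    return max(finish_times(jobs_per_lpar, hours_per_job_alone).values())


uneven = {"DDCA": 2, "DDCO": 6}   # split from the example above
even = {"DDCA": 4, "DDCO": 4}     # split with four class 5 initiators each

if __name__ == "__main__":
    print("uneven:", finish_times(uneven), "makespan", makespan(uneven))
    print("even:  ", finish_times(even), "makespan", makespan(even))
```

In this model the uneven split leaves DDCA idle for four of the six hours, while the even split completes the same eight jobs in four hours.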
Problem # 3: WLM initiators and jobs on the input queue
  • Priority jobs would start, but lower priority jobs would wait on the input queue
  • With over 10,000 jobs running per day, we found some jobs that were incorrectly classified.
  • We defined a WLM policy override to change the BATLOW service class to importance level 3, the same importance level as the higher priority batch. After the FIXINPUT policy override was activated, the jobs on the input queue would start. Sometimes it would take 10 minutes to start all of the jobs. Afterwards the regular policy was activated again.
How to make WLM policy overrides
  • On the WLM service policy selection list, specify action code 2=COPY to copy the base policy to a new policy named FIXINPUT.
  • Then specify action code 7=Override Service Classes to modify the service class goals for FIXINPUT.
  • Then specify action code 3=Override Service Class to modify the goals for specific service classes in the policy override.
  • To activate the policy, enter: V WLM,POLICY=FIXINPUT
  • To display the WLM policy, enter: D WLM
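
Conceptually, a policy override is a copy of the base policy with a few service class goals changed. The Python sketch below models only that idea; it does not call WLM, and the data layout is an assumption. The BATLOW change to importance 3 comes from the text, while the service class name `BATHIGH` and the base importance of 5 for BATLOW are hypothetical placeholders.

```python
import copy

# HYPOTHETICAL base policy: "BATHIGH" and BATLOW's base importance of 5
# are placeholders, not values from the presentation.
BASE_POLICY = {
    "BATHIGH": {"importance": 3},
    "BATLOW": {"importance": 5},
}


def make_override(base_policy, changes):
    """Copy the base policy and apply goal changes to selected service classes."""
    policy = copy.deepcopy(base_policy)   # the base policy stays untouched
    for service_class, goal_changes in changes.items():
        policy[service_class].update(goal_changes)
    return policy


# FIXINPUT raises BATLOW to the same importance as the higher priority batch.
FIXINPUT = make_override(BASE_POLICY, {"BATLOW": {"importance": 3}})

if __name__ == "__main__":
    print("base:    ", BASE_POLICY)
    print("FIXINPUT:", FIXINPUT)
```

Keeping the base policy unchanged mirrors the procedure above: FIXINPUT is activated only while the input queue drains, and the regular policy is activated again afterwards.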
Jobs on the input queue
  • APAR UA21235 applies to z/OS 1.4 systems.
  • The correction was released in October 2005.
  • “Currently WLM does not start additional initiators for local batch work with system affinities when idle initiators exist on other systems in the sysplex. This can lead to situations where local batch jobs are delayed for a significant period of time because a local shortage of initiators exists. The situation is most visible on large sysplex environments with batch work having system affinities to only few systems. WLM improves to start initiators by looking more closely at the number of initiators which can really handle the affinity work.”
Summary
  • Balance LPAR activity in order to optimize capacity in a parallel sysplex environment.
  • WLM mode initiators work well in most cases. It is essential that the correction for UA21235 is installed.
  • It is OK to mix WLM mode and JES mode job classes, provided that there are always enough fixed initiators for each JES mode job class.
Changes in CPU utilization
  • Overall CPU activity decreased from September to January due to tuning.
  • DDCA decreased due to tuning improvements.
  • DDCO increased in August and then remained at the same utilization due to better workload balancing.
  • The following graphs show how the LPAR activity became more balanced.