Presentation Transcript
OpenVMS “Marvel” EV7 Proof Points of Olympic Proportions

Tech Update - Sept 2003

Steve.Lieman@hp.com

OpenVMS Marvel EV7 Proof Points of Olympic Proportions
  • Live, mission-critical, production systems
  • Multi-dimensional
  • Before and after comparisons
  • Upgrades from GS140 & GS160 to GS1280
  • Proof of impact on maximum headroom
Can your enterprise benefit from an upgrade to a GS1280?
  • Systems with high MPsynch (see the screening sketch after this list)
  • Systems with high primary CPU interrupt load
  • Poor SMP scaling
  • Heavy locking
  • Heavy IO: Direct, Buffered, Mailbox
  • Heavy use of Oracle, TCPIP, Multinet
  • Look closer if you have:
    • Systems with poor response time
    • Systems with insufficient peak period throughput
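As a rough illustration of this checklist, the sketch below flags the first few symptoms from peak-interval samples of the usual metrics. The metric names and thresholds are assumptions chosen for illustration, not HP sizing guidance:

```python
# Rough screening sketch: metric names and thresholds are illustrative assumptions,
# not HP guidance. 'sample' holds peak-interval values from a monitoring collection.
def gs1280_upgrade_symptoms(sample):
    symptoms = {
        "high MPsynch": sample["mpsynch_percent"] > 30,
        "high CPU 0 interrupt load": sample["cpu0_interrupt_percent"] > 50,
        "compute queue building up": sample["compute_queue"] > 5,
    }
    return [name for name, present in symptoms.items() if present]

# A Case 1-like peak interval (placeholder values in the spirit of the charts that follow):
peak = {"mpsynch_percent": 90, "cpu0_interrupt_percent": 80, "compute_queue": 57}
print(gs1280_upgrade_symptoms(peak))    # all three symptoms present
```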
T4 - Data Sources
  • Data for these comparisons was collected using the internally developed T4 (tabular timeline tracking tool) suite of coordinated collection utilities and analyzed with TLViz
  • The T4 kit & TLViz have consistently proved themselves invaluable for this kind of before-and-after comparison project (see the sketch after this list). We have now made T4 publicly available for download (it will ship with OpenVMS 7.3-2 in SYS$ETC:)
  • T4 could be a useful adjunct to your performance management program.
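For readers unfamiliar with the T4/TLViz workflow, here is a minimal sketch of the kind of before-and-after overlay TLViz performs on tabular timeline data. The file names and column headings are assumptions for illustration, not the actual T4 output format:

```python
# Illustrative only: assumed file names and column headings, not the real T4/TLViz layout.
import csv

def load_timeline(path, metric):
    """Read one tabular timeline (CSV) and return the samples for a single metric."""
    times, values = [], []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            times.append(row["Sample Time"])     # assumed timestamp column
            values.append(float(row[metric]))    # e.g. "MPsynch" or "Direct IO Rate"
    return times, values

# Overlay the same metric from the "before" (GS140) and "after" (GS1280) collections.
before_t, before_v = load_timeline("gs140_peak_day.csv", "MPsynch")
after_t, after_v = load_timeline("gs1280_peak_day.csv", "MPsynch")
print("GS140 peak:", max(before_v), "  GS1280 peak:", max(after_v))
```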
Would you like to participate in our Marvel Proof Point Program?
  • Contact steve.lieman@hp.com for more information about how you can take part
  • Download T4 kit from public web site:

http://h71000.www7.hp.com/OpenVMS/products/t4/index.html

  • Start creating a compact, portable, T4-based performance history of your most important systems
  • The T4 data will create a common and efficient language for our discussions. We can then work with you and help you evaluate your unique pattern of use and the degree to which the advantages of Marvel EV7 on OpenVMS can most benefit you.
Want even more detail?
  • The electronic version of this presentation contains extensive captions and notes on each slide for your further study, reflection, and review.

CASE 1 – Production System: 12P GS140 at 700 MHz vs. 16P GS1280 at 1.15 GHz. Tremendous Gains in Headroom. Oracle Database Server with Multinet

Compute Queue Completely Evaporates with GS1280

Peak Queues of 57 drop to queues of 1 or 2

Green is GS140 at 700 MHz with 12 CPUs

Red is GS1280 at 1.15 GHz with 16 CPUs

CPU 0 Idle Time

With the GS1280, there is 73% spare capacity on CPU 0 during the absolute peak. With the GS140, CPU 0 is completely consumed during peaks (e.g. at 11 AM)

Green is GS140 at 700 MHz with 12 CPUs

Red is GS1280 at 1.15 GHz with 16 CPUs

Almost 4 to 1 reduction in CPU Busy with GS1280

The GS140 is nearly maxed out at more than 1150% busy out of a possible 1200%, while the GS1280 is cruising along at 250% to 350% busy out of a possible 1600%

Green is GS140 at 700 MHz with 12 CPUs

Red is GS1280 at 1.15 GHz with 16 CPUs

DirectIO (includes network traffic)

GS1280 is able to push to higher peaks when load gets heavy, while still having huge spare capacity for more work.

GS140 is close to maxed out at 10,000 DIRIO per second

Green is GS140 at 700 MHz with 12 CPUs

Red is GS1280 at 1.15 GHz with 16 CPUs

MPsynch

MPsynch drops from 90% to under 10%, leaving plenty of room for further scaling

Green is GS140 at 700 MHz with 12 CPUs

Red is GS1280 at 1.15 GHz with 16 CPUs

Packets Per Second Sent – a key throughput metric. We estimate the actual maximum rate for the GS1280 at more than 20,000/sec

The GS140 maxes out at about 5,000 packets per second with little or no spare capacity. The GS1280 reaches 6,000 with substantial spare capacity

Blue is GS140 at 700 MHz with 12 CPUs

Red is GS1280 at 1.15 GHz with 16 CPUs

Case 1 Summary: 12P GS140 to 16P GS1280
  • GS1280 delivers an estimated increase in headroom of at least 4X
  • Eliminates CPU 0 bottleneck
  • Drastically cuts MPsynch
  • Able to handle higher peaks as they arrive
  • Almost 4 to 1 reduction in CPU use while doing slightly more work

Case 2 – Production System: 10P GS140 at 700 MHz vs. 8P GS1280 at 1.15 GHz. Tremendous Gains in Headroom for an Oracle Database Server despite reduced CPU count. Poised to Scale.

Compute Queue Completely Evaporates with GS1280 and the current workload demand

Peak Queues of 32 drop to 3

Red is GS1280 at 1.15 GHz with 8 CPUs

Green is GS140 at 700 MHz with 10 CPUs

CPU 0 Idle Time

With the GS1280, there is 69% spare capacity on CPU 0 during the absolute peak with this workload. With the GS140, CPU 0 is completely consumed during peaks (e.g. at 10:30, for many minutes at a time)

Red is GS1280 at 1.15 GHz with 8 CPUs

Green is GS140 at 700 MHz with 10 CPUs

More than 3 to 1 reduction in CPU Busy with GS1280

The GS140 is completely maxed out at more than 1000% busy, while the GS1280 is cruising along at 200% to 350% busy out of a possible 800%

Red is GS1280 at 1.15 GHz with 8 CPUs

Green is GS140 at 700 MHz with 10 CPUs

DirectIO (includes network traffic)

The GS1280 is able to push to higher peaks of 10,500 when the load temporarily gets heavier, while still having huge spare capacity for more work (approximately 5 CPUs). The 10P GS140 is maxed out at slightly over 8,000 DIRIO per second.

Red is GS1280 at 1.15 GHz with 8 CPUs

Green is GS140 at 700 MHz with 10 CPUs

MPsynch (more than a 9 to 1 reduction with this workload)

MPsynch drops from peaks of 67% to peaks of only 7%, leaving plenty of room for further scaling

Red is GS1280 at 1.15 GHz with 8 CPUs

Green is GS140 at 700 MHz with 10 CPUs


Packets Per Second Sent – a key throughput metric. We estimate the actual maximum rate for the 8P GS1280 at more than 11,000/sec; with 16P this would rise to 20,000/sec

The 10P GS140 maxes out at about 4,200 packets per second with no spare capacity. The 8P GS1280 reaches 4,800 with more than 4.5 CPUs to spare (see the arithmetic sketch below)

Red is GS1280 at 1.15 GHz with 8 CPUs

Green is GS140 at 700 MHz with 10 CPUs
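The maximum-rate estimates above are consistent with a simple capacity-scaling calculation: scale the observed throughput by the ratio of total CPU capacity to the CPU actually consumed. Treat this as an assumed reading of how the estimates were derived; a minimal sketch using the Case 2 numbers:

```python
def estimated_max_rate(observed_rate, cpus_total, cpus_spare):
    """Scale observed throughput by unused CPU capacity, assuming roughly linear scaling."""
    cpus_used = cpus_total - cpus_spare
    return observed_rate * cpus_total / cpus_used

# 8P GS1280 reaches 4,800 packets/sec with more than 4.5 CPUs to spare:
rate_8p = estimated_max_rate(4800, 8, 4.5)
print(round(rate_8p))            # ~11,000/sec, matching the slide's estimate
print(round(rate_8p * 16 / 8))   # projected onto 16P: ~22,000/sec, i.e. "more than 20,000"
```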

CPU 0 Interrupt – well poised for scaling to 8, 12, and even more CPUs with the GS1280

During peak periods, despite the fact that the 8P GS1280 is doing slightly more work, it uses a factor of 3.5X less of CPU 0 for interrupt activity

Red is GS1280 at 1.15 GHz with 8 CPUs

Green is GS140 at 700 MHz with 10 CPUs

At peaks of only 20%, the GS1280 stands ready to handle substantially higher workloads

Disk operations rate – This shows the same head and shoulders pattern as direct IO and packets per second

During peak periods, the 10P GS140 maxes out at 2,200 disk operations per second. With this workload, the 8P is able to reach 2,900 per second with lots of room to spare

Red is GS1280 at 1.15 GHz with 8 CPUs

Green is GS140 at 700 MHz with 10 CPUs

As the load demand on the GS1280 increases, this 8P model looks capable of driving the disk op rate to 6,000/sec

Interrupt load during peak periods drops by a factor of almost 5 to 1 from 240% to 50%.

This is another excellent sign of the potential future scalability of this GS1280 to 8 CPUs, 12 CPUs and beyond.

Red is GS1280 at 1.15 GHz with 8 CPUs

Green is GS140 at 700 MHz with 10 CPUs

Microseconds of CPU per Direct IO

Normalized statistics like this show that the relative power of each GS1280 CPU at 1.15 GHz is 3 to 4 times that of the GS140's 700 MHz CPUs (see the sketch below)

Red is GS1280 at 1.15 GHz with 8 CPUs

Green is GS140 at 700 MHz with 10 CPUs
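The normalization behind this chart is straightforward arithmetic: CPU busy (in percent, summed across all CPUs) converts to microseconds of CPU consumed per second, which is then divided by the Direct IO rate. The sample values below are hypothetical, chosen only to show the calculation:

```python
def cpu_us_per_directio(cpu_busy_percent, directio_per_sec):
    """Microseconds of CPU consumed per Direct IO.
    cpu_busy_percent is summed over all CPUs (e.g. 300 means 3 CPUs' worth)."""
    cpu_us_per_sec = cpu_busy_percent / 100.0 * 1_000_000
    return cpu_us_per_sec / directio_per_sec

# Hypothetical samples, not figures read off the charts:
print(cpu_us_per_directio(1000, 8000))    # GS140-like interval: ~1250 us of CPU per IO
print(cpu_us_per_directio(300, 10500))    # GS1280-like interval: ~286 us of CPU per IO
```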

Disk Reads Per Second

This shows the same head and shoulders pattern, but even more pronounced than what we saw with network packets

Red is GS1280 at 1.15 GHz with 8 CPUs

Green is GS140 at 700 MHz with 10 CPUs

Case 2 Summary: 10P GS140 to 8P GS1280
  • GS1280 with fewer CPUs delivers an estimated headroom increase of more than 2X
  • Eliminates CPU busy bottleneck
  • Drastically cuts MPsynch
  • Able to handle higher peaks as they arrive
  • Well positioned to scale to 8, 12, or more CPUs and achieve headroom increases of 3.5X or even higher
Proof Point Patterns
  • Dramatic cuts in MPsynch
  • Large drops in Interrupt mode
  • Higher, short-lived bursts of throughput
    • directIO, packets per second, etc.
    • The “HEAD and SHOULDERS”
  • Large increase in spare capacity and headroom
    • Overall CPU, primary CPU

Where the workload stays relatively flat at the point of transition, the overall throughput numbers are not that different, but the shape of the new curve with its sharp peaks tells an important story

Case 3 – Stress Test Marvel 32P – RMS1
  • This case shows a segment of our RMS1 testing on the 32P Marvel EV7 @ 1.15 GHz
  • Using Multiple 4 GB Ramdisks
  • Started at 16P, ramped up workload
  • Then increased to 24P, throughput dropped
  • Then affinitized jobs, throughput jumped
  • Combines timeline data from T4, spl, and bmesh
Background to this test
  • RMS1 is based on a customer-developed database benchmark test originally written using Rdb and converted to carry out the same task with RMS
  • To generate extremely high rates of IO in order to discover the limits of Marvel 32P performance, we ran multiple copies of RMS1, each using its own dedicated RAMdisk
  • Caution: The net effect is a test that generates an extremely heavy load, but that cannot be considered to mirror any typical production environment.
Timing of Changes
  • 12:05 16 CPUs
  • 12:11 Start ramp up with 4GB ramdisks
  • 12:30 Increase to 24 CPUs
  • 12:38 Set Process Affinity
  • 12:55 Turn off dedicated lock manager

(Observe how timelines help make sense of this complicated situation.)
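One simple way to make sense of it is to draw the change times directly on the throughput timeline. The sketch below illustrates the idea; the Direct IO values are placeholders, not the recorded T4 data:

```python
import matplotlib.pyplot as plt

# Placeholder Direct IO timeline, indexed by minutes after 12:00 (not the real T4 data).
minutes = [0, 10, 20, 30, 40, 50, 60]
dirio = [10_000, 30_000, 55_000, 55_000, 70_000, 72_000, 68_000]

# Change times taken from the list above.
changes = {5: "16 CPUs", 11: "ramp up, 4 GB RAMdisks", 30: "24 CPUs",
           38: "set process affinity", 55: "dedicated lock mgr off"}

plt.plot(minutes, dirio, label="Direct IO per second")
for minute, label in changes.items():
    plt.axvline(minute, linestyle="--")                     # mark each configuration change
    plt.text(minute, max(dirio), label, rotation=90, va="top", fontsize=8)
plt.xlabel("minutes after 12:00")
plt.legend()
plt.show()
```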

Direct IO up to 70,000 per second!

For the RMS1 workload, the rate of direct IO per second is a key metric of maximum throughput.

Increasing to 24 CPUs at 12:30 does not increase throughput.

Turning on affinity causes throughput to jump from 55,000 to over 70,000, an increase of approximately 30% (1.3X).

Kernel & MPsynch switch roles

12:30 is when we jumped from 16 CPUs to 24 CPUs. Note how MPsynch (green) jumps up substantially at that time to over 950%.

At 12:37, we started affinitizing the different processes to CPUs we believed to be close to where their associated RAMdisk was located.

Note how MPsynch and Kernel mode cross over at that point.

Lock Busy % from T4 shows jump with affinity

We had the dedicated lock manager turned on for this test, which creates a very heavy locking load.

Note that there is no change when the number of CPUs is increased at around 12:30.

Note the big jump in Lock % busy that happens when we affinitize.

At over 90% busy, locking is a clear primary bottleneck that will prevent further increases in throughput even with more CPUs.

Lock Requests per Sec vs. XFC Writes – A True Linear Relationship

The maximum rate of lock requests is an astounding 450,000 per second.
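A true linear relationship means the two metrics move in a fixed ratio. As a sketch (with made-up sample values; the real data lives in the T4 collection), the slope of a least-squares fit through the origin gives that ratio directly:

```python
def linear_ratio(xs, ys):
    """Least-squares slope through the origin: ys ~ slope * xs."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Made-up interval samples: XFC writes/sec and lock requests/sec from the same intervals.
xfc_writes = [20_000, 45_000, 80_000, 110_000]
lock_requests = [82_000, 180_000, 325_000, 450_000]

print(f"~{linear_ratio(xfc_writes, lock_requests):.1f} lock requests per XFC write")
```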

Case 3 - SUMMARY
  • These are by far the best throughput numbers we have ever seen on this workload for:
    • Direct IO, Lock requests per second.
  • Performance is great out of the box.
  • New tools simplify bottleneck identification
  • Straightforward tuning pushes to even higher values with a surprisingly large upward jump
  • Workloads show consistent ratios between key statistics (e.g. Lock Requests per DIRIO)
  • Spinlock related bottlenecks remain with us, albeit at dramatically higher throughput levels
Case 4 – Production System
  • Upgrade from 16 CPU Wildfire EV68 running at 1.224 GHz (the fastest Wildfire)
  • Compared to 16 CPU Marvel EV7 running at 1.15 GHz
  • Oracle, TCPIP, Mixed Database Server and Application Server
CPU Busy cut in half – Note Color Switch!!!

Red is GS1280 with 16 CPUs at 1.15 GHz

Green is GS160 with 16 CPUs at 1.224 GHz

CPU 0 Interrupt is cut by a factor of more than 3 to 1

Red is GS1280 with 16 CPUs at 1.15 GHz

Green is GS160 with 16 CPUs at 1.224 GHz

Buffered IO – sustained higher peaks

Red is GS1280 with 16 CPUs at 1.15 GHz

Green is GS160 with 16 CPUs at 1.224 GHz

Direct IO – sustained higher peaks

Red is GS1280 with 16 CPUs at 1.15 GHz

Green is GS160 with 16 CPUs at 1.224 GHz

System Wide Interrupt diminished by a factor of 4 to 1

Red is GS1280 with 16 CPUs at 1.15 GHz

Green is GS160 with 16 CPUs at 1.224 GHz

MPsynch shrinks by more than 8 to 1

Red is GS1280 with 16 CPUs at 1.15 GHz

Green is GS160 with 16 CPUs at 1.224 GHz

Kernel Mode decreases from 260% to 150%

Red is GS1280 with 16 CPUs at 1.15 GHz

Green is GS160 with 16 CPUs at 1.224 GHz

User Mode decreases from about 480% to 240%

Red is GS1280 with 16 CPUs at 1.15 GHz

Green is GS160 with 16 CPUs at 1.224 GHz

Compute Queue disappears

Red is GS1280 with 16 CPUs at 1.15 GHz

Green is GS160 with 16 CPUs at 1.224 GHz

Packets per second – head and shoulders with higher peaks

Red is GS1280 with 16 CPUs at 1.15 GHz

Green is GS160 with 16 CPUs at 1.224 GHz

Mailbox Writes – head and shoulders with higher peaks

Red is GS1280 with 16 CPUs at 1.15 GHz

Green is GS160 with 16 CPUs at 1.224 GHz

Dedicated Lock Manager Busy drops from 18% down to about 6%

Red is GS1280 with 16 CPUs at 1.15 GHz

Green is GS160 with 16 CPUs at 1.224 GHz

Case 4 - SUMMARY
  • The GS160 with 16 CPUs had been highly tuned, yet was unable to handle the heaviest peak loads presented.
  • The bottleneck was related to reaching maximum TCPIP throughput, the associated MPsynch, and limits on the maximum BUFIO rate
  • The GS1280 immediately, without further adjustment, provided a dramatic increase in maximum throughput and a huge improvement in spare capacity and headroom.
Case 5 – Production System
  • NOTE color switch in slides
  • Upgrade 12P GS140 to 16P GS1280
  • Mixed Application and Database Server
MPsynch almost disappears – Drops from 130% to under 10%

Red is GS140 with 12 CPUs

Green is GS1280 with 16 CPUs

Kernel Mode shrinks by more than 5 to 1

Red is GS140 with 12 CPUs

Green is GS1280 with 16 CPUs

System Wide Interrupt also shrinks by more than 5 to 1

Red is GS140 with 12 CPUs

Green is GS1280 with 16 CPUs

User Mode is cut in half

Red is GS140 with 12 CPUs

Green is GS1280 with 16 CPUs

CPU busy drops by almost 3 to 1

Red is GS140 with 12 CPUs

Green is GS1280 with 16 CPUs

CPU 0 Interrupt almost disappears and drops by more than 6 to 1

Red is GS140 with 12 CPUs

Green is GS1280 with 16 CPUs

Buffered IO – shows consistently higher peaks. There was a real backlog of work waiting to be serviced

Red is GS140 with 12 CPUs

Green is GS1280 with 16 CPUs

Direct IO shows substantially higher peaks, which are short-lived

Red is GS140 with 12 CPUs

Green is GS1280 with 16 CPUs

Mailbox writes increase from 1,400 to over 2,400

Red is GS140 with 12 CPUs

Green is GS1280 with 16 CPUs

Compute Queue evaporates

Red is GS140 with 12 CPUs

Green is GS1280 with 16 CPUs

Case 5 - SUMMARY
  • A huge backlog of work can now be handled successfully during long-lasting peak periods, as demonstrated by higher buffered IO and other throughput metrics.
  • Substantial further reserves of spare capacity
  • Large changes in key performance metrics such as MPsynch and interrupt mode.
Proof Point Summary
  • Marvel EV7 GS1280 systems are the best-performing VMS systems ever.
  • Excellent out-of-the-box performance
  • Superior SMP scaling
  • Huge increases in maximum throughput, some realized immediately, the rest held in reserve as spare capacity.
  • Marvel provides the headroom for future growth
Can your enterprise benefit from an upgrade to a GS1280?
  • Systems with high MPsynch
  • Systems with high primary CPU interrupt load
  • Poor SMP scaling
  • Heavy locking
  • Heavy IO: Direct, Buffered, Mailbox
  • Heavy use of Oracle, TCPIP, Multinet
  • Look closer if you have:
    • Systems with poor response time
    • Systems with insufficient peak period throughput
Would you like to participate in our Marvel Proof Point Program?
  • Contact steve.lieman@hp.com for more information about how you can take part
  • Download T4 kit from public web site:

http://h71000.www7.hp.com/OpenVMS/products/t4/index.html

  • Start creating a compact, portable, T4-based performance history of your most important systems
  • The T4 data will create a common and efficient language for our discussions. We can then work with you and help you evaluate your unique pattern of use and the degree to which the advantages of Marvel EV7 on OpenVMS can most benefit you.