
### Performance Engineering

Looking at Random Data &

A Simulation Example

Prof. Jerry Breecher

### Goals:

Look at the nature of random data. What happens as random data is used in multiple operations?

Look at how network arrivals really work – are arrivals random or do they follow some other pattern?

Use our simulation techniques to study these patterns (so this is really an example of simulation usage).

Determine the difference in behavior as a result of network arrival patterns.

Random Arrivals

### Random Data

Suppose we have a random number generator. And suppose we run a program using that data multiple times.

Do the results of those multiple program executions converge or diverge?

There is no simple intuitive answer to this question, so let’s try it.

### Random Data

Let’s take a very simple piece of code:

if ( random() >= 0.5 )

else

When we run the program, we collect the value of the variable every 100 million iterations – and do it for a total of 1 billion iterations.

Here’s a sample run.

After 400 million iterations, there were 3192 more “heads” than “tails”.
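A minimal sketch of this experiment in C. The slide does not show the statement bodies or the variable name, so `count`, `RunCoinFlips`, and the small `NextRandom` generator (standing in for `random()`) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Deterministic xorshift64* generator, an assumption standing in for the
   slide's random(); returns a double in the range [0, 1). */
uint64_t RngState = 88172645463325252ULL;

double NextRandom( void ) {
    RngState ^= RngState >> 12;
    RngState ^= RngState << 25;
    RngState ^= RngState >> 27;
    uint64_t r = RngState * 2685821657736338717ULL;
    return (double)( r >> 11 ) / 9007199254740992.0;    /* divide by 2^53 */
}

/* Flip a coin `iterations` times. Every `interval` flips, store the
   running (heads - tails) count in samples[]. Returns the number of
   samples stored (never more than max_samples). */
int RunCoinFlips( long iterations, long interval,
                  long *samples, int max_samples ) {
    long count = 0;
    int  stored = 0;
    assert( interval > 0 );
    for ( long i = 1; i <= iterations; i++ ) {
        if ( NextRandom() >= 0.5 )
            count++;                /* "heads" */
        else
            count--;                /* "tails" */
        if ( i % interval == 0 && stored < max_samples )
            samples[stored++] = count;
    }
    return stored;
}
```

Calling `RunCoinFlips( 1000000000L, 100000000L, samples, 10 )` would mirror the run described above: 1 billion flips, sampled every 100 million.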

### Random Data

Now let’s do that same thing for 8 processes.

What do you think will happen to the numbers?

Will some process always have more heads than tails?

Will the difference between results for processes depend on how many iterations have been done?

Here’s the result for 8 processes:

### Random Data

And here’s the graph for those 8 processes – note there’s been a constant amount added to each value to get all the outputs positive.

### Random Data

As you can see in the last graph, the statistics are terrible – it’s hard to determine the pattern for multiple runs.

So the program was run 10,000 times, and the minimum and maximum counts were taken at each time interval across those 10,000 runs.
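For comparison with the measured minimum and maximum, here is what theory predicts for a fair coin, assuming the flips are independent. This is a standard random-walk result added for reference, not something taken from the slides. Write each flip as $X_i = +1$ for heads and $X_i = -1$ for tails, and let $D_N$ be the count after $N$ flips:

$$
D_N = \sum_{i=1}^{N} X_i, \qquad E[D_N] = 0, \qquad \mathrm{Var}(D_N) = N, \qquad \sigma_{D_N} = \sqrt{N}
$$

At $N = 4 \times 10^8$ the standard deviation is 20,000, so a difference of 3192 heads is unremarkable. The raw difference tends to spread out like $\sqrt{N}$, while the difference as a fraction of $N$ shrinks like $1/\sqrt{N}$.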

### Random Data

But, what happens if the processes doing random events interact with each other?

This is the case if the programs are all accessing the same disk – we randomly choose which block in a large file is being written to. But each process must compete for the file lock and for disk access.

Here’s the behavior of 10 disk-writing processes for 10,000 seconds. The numbers represent disk writes for that process during the time interval.

### Random Data

The accesses are clearly very close to each other.

### Random Data

Comparing the 10 processes. This is the spread (difference) between the maximum and the minimum number of accesses among the processes.

### Random Data

Comparing the 10 processes. Here’s how their relative performance varies over time. Note that no one process is always the minimum or the maximum performer.

### Another Numerical Example

I have two virtual cats, who share a single can of food at each meal. My cats are very finicky and get angry if their portions are unequal. I am finicky too, and I don't like dirtying dishes when I divvy it up.

To split the food, then, I upend the open can of food onto a flat plate, then carefully lift the can off, leaving a perfectly formed virtual cylinder of food.

Then I use the vanishingly small circular edge of the can to carefully cut the food into two exactly equal portions, one of which is shaped like a crescent moon, the other a cat's eye, or mandorla.

### Another Numerical Example

[Figure: two overlapping circles, with diagram labels X, A, B, and a.]

    // //////////////////////////////////////////////////////////////////////
    // We're trying to solve the following problem.
    // Given two circles, how close should the centers of the circles be such
    // that the area subtended by the arcs of the two circles is exactly one
    // half the total area of the circle.
    //
    // See example 2.3.8 in Leemis & Park.
    // We use the book's definition for Uniform - see 2.3.3
    // Here's how this works. Try a number of different distances between
    // the two circle centers. Then for the ones that are most successful,
    // zoom in to do them in more detail.
    // //////////////////////////////////////////////////////////////////////
    #include <math.h>
    #include <stdio.h>      // needed for printf
    #include <stdlib.h>

    #define PI 3.1415927
    #define TRUE 1
    #define FALSE 0

    // Prototypes
    double GetRandomNumber( void );    // uniform value in the range 0 to 1 (body not shown on these slides)
    void InitializeRandomNumber( );    // body not shown on these slides
    double ModelTwoCircles( double, int );

    // Scale a random number from the range 0 to 1 into the range min to max
    double Uniform( double min, double max) {
        return( min + (max - min)*GetRandomNumber() );
    }

### Another Numerical Example

    int main( int argc, char *argv[] ) {
        double Distance, Result = 0;
        double FirstSample = 0.1, LastSample = 1.9;
        double Increment, NewFirstSample = FirstSample;
        double BestDistance = 0;
        int NumberOfSamples = 5000;

        InitializeRandomNumber();

        // The loop header is missing from the slide. This stopping test
        // (zoom in until the search window is narrow) is an assumption.
        while ( (LastSample - FirstSample) > 0.001 ) {
            printf( "\nNext Iteration starts at %f\n", FirstSample );
            Increment = (LastSample - FirstSample)/10;
            NumberOfSamples = 2 * NumberOfSamples;

            for ( Distance = FirstSample; Distance <= LastSample; Distance += Increment ){
                Result = ModelTwoCircles( Distance, NumberOfSamples );
                // Remember the last distance scanned whose overlap is still above one half
                if ( Result - 0.5000 > 0 )
                    NewFirstSample = Distance;
                if ( (0.5 - Result) < 0.0001 && (Result - 0.5) < 0.0001 ) {
                    BestDistance = Distance;
                }
                printf( "Distance = %8.6f, Fraction = %8.6f\n", Distance, Result );
            }

            // Zoom in around the crossing point for the next pass
            FirstSample = NewFirstSample - 2 * Increment;
            LastSample = FirstSample + 4 * Increment;
        }

        printf( "\nThe best Distance is at %f using %d samples\n",
                BestDistance, NumberOfSamples );
        return 0;
    }

    // Returns the fraction of random points in the first unit circle
    // (centered at the origin) that also fall in the second unit circle
    // (centered at x = Distance).
    double ModelTwoCircles( double Distance, int NumberOfSamples ) {
        double HitsInOneCircle = 0, HitsInTwoCircles = 0;
        double x, y, SecondDistance;
        int Samples;

        for ( Samples = 0; Samples < NumberOfSamples; Samples++ ) {
            do {
                x = Uniform( -1, 1 );
                y = Uniform( -1, 1 );
            } while ( (x * x) + (y * y) >= 1 ); // Loop until value in circle
            HitsInOneCircle++;

            SecondDistance = sqrt( ( x - Distance ) * (x - Distance ) + (y * y) );
            if ( SecondDistance < 1.0 ) {
                HitsInTwoCircles++;
                // printf( "Samples: Second Distance = %8.6f\n", SecondDistance );
            }
        } // End of for

        return( HitsInTwoCircles / HitsInOneCircle );
    }

### Network Arrivals

In our queueing analysis, we’ve assumed random arrivals (Poisson distribution, with exponentially distributed inter-arrival times.)

This leads to our analysis of M/M/1 queues with

Utilization U = Service Time / Inter-arrival Time (using the mean of each) and with

Queue Length = U / ( 1 – U ).
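The two formulas above can be sketched as a pair of small C functions. The names `Utilization` and `MeanQueueLength` are hypothetical, chosen for this illustration; both work on mean values.

```c
#include <assert.h>

/* Utilization of a single server: mean service time divided by the
   mean time between arrivals. */
double Utilization( double service_time, double interarrival_time ) {
    assert( service_time > 0.0 && interarrival_time > 0.0 );
    return service_time / interarrival_time;
}

/* Mean queue length for an M/M/1 queue, U / (1 - U).
   Only meaningful for U below 1; at U = 1 the queue grows without bound. */
double MeanQueueLength( double u ) {
    assert( u >= 0.0 && u < 1.0 );
    return u / ( 1.0 - u );
}
```

For example, a service time of 0.8 with an inter-arrival time of 1.0 gives U = 0.8 and a queue length of 4. Pushing U to 0.9 more than doubles that to 9, which is why the arrival pattern matters so much near saturation.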

We generated uniformly distributed random numbers and based on those were able to derive the exponential arrival times and Poisson distributions.

But is this how networks behave?

Self-Similar Arrivals

• On the Self-Similar Nature of Ethernet Traffic
• Leland, Taqqu, Willinger, Wilson. IEEE/ACM ToN, Vol. 2, pp 1-15, 1994
• Establish self-similar nature of Ethernet traffic
• Illustrate the differences between self-similar and standard models
• Show serious implications of self-similar traffic for design, control and performance analysis of packet-based communication systems

### Network Arrivals

Is this how networks really behave?

Millions of packets from many workstations, as recorded on Bellcore internal networks.

### What Did Leland et al. Measure?

Significance of self-similarity

Nature of traffic generated by individual Ethernet users. Aggregate traffic study provides insights into traffic generated by individual users. Nature of congestion produced by self-similar models differs drastically from that predicted by standard formal models. We will show this by the simulation we perform here.

Why is Ethernet traffic self-similar?

Plausible physical explanation of self-similarity in Ethernet traffic: people don’t generate traffic randomly. They come to work at the same time, get tired at the same time, etc.

Mathematical Result

• Superposition of many ON/OFF sources whose ON-periods and OFF-periods have high variability or infinite variance produces aggregate network traffic that is self-similar or long-range dependent.

(Infinite variance here means that there are some samples with a very long inter-arrival time; lunch hour is a very long time!)

The answer is the data is bunched together – it’s not spread uniformly – and to be self-similar, the “bunches” themselves form “super-bunches”.

### Where does “Self-Similar” Data Occur?

It occurs throughout nature. The heavy-tailed distributions behind it go by several names: Pareto, Bradford, Zipf, and various others.

Distribution of books checked out of a library.

Distribution of lengths of rivers in the world.

It’s NOT the same as an exponential distribution! (But it can look fairly close.)

Fractals are an example of self-similarity.

In these equations:

a = 1 (the exponential falls to 1/e at x = 1). The mean of these values is 1, and it turns out the variance is also 1. The exponential is special that way.

X0 = 2. Then b was adjusted so that the distribution has a mean of 1.

Arrivals for both distributions therefore have the same mean value.

### Exponential and Self-Similar Data

Exponential Cumulative Function $F(x) = 1 - e^{-ax}$

Exponential Probability Density Function (PDF) $f(x) = a\,e^{-ax}$

Pareto Cumulative Function $F(x) = 1 - \left( \dfrac{X_0}{X_0 + x} \right)^{b}$

Pareto Probability Density Function (PDF) $f(x) = \dfrac{b\,X_0^{\,b}}{(X_0 + x)^{b+1}}$
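As a worked step, the mean of this Pareto form follows from integrating the tail $1 - F(x)$. The derivation is an addition, not from the slides, and it assumes $b > 1$ so that the integral converges:

$$
E[X] = \int_0^\infty \bigl(1 - F(x)\bigr)\,dx = \int_0^\infty \left( \frac{X_0}{X_0 + x} \right)^{b} dx = \frac{X_0}{b - 1}
$$

With $X_0 = 2$, a mean of 1 requires $b = 3$. The same kind of integral gives the variance, $X_0^2\, b / \bigl((b-1)^2 (b-2)\bigr)$ for $b > 2$, which is 3 for these values: three times the variance of the exponential with $a = 1$.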

### Exponential and Self-Similar Data

Note that the Pareto data has a higher value at the limits. This is what leads to it being self-similar and to the data having a large variance.

[Figure: Pareto PDF (purple) and exponential PDF (black).]

Simulation

So I wrote a simulator.

There are two parts I especially want to show you:

• The “guts” of the simulator – how events are taken off a queue and are processed; that processing generates new events.
• How data is generated: starting with a random number in the range 0 to 1, how do we get an exponential distribution?

Here’s the code I used for the simulation. It’s not beautiful, but the price is right.

http://www.cs.wpi.edu/~jb/CS533/Lectures/ArrivalSimulation.c

Simulation Example

SCHEMATIC OF EVENT DRIVEN SIMULATION OF A NETWORK

• Initialize the event queue.
• Determine the next event. Set current time to the time of this event.
• Is it arrival or completion?
• Arrival (packet approaches network): put packet on network; if queue WAS empty, generate a completion event. Determine future time for next packet arriving and generate event for “Packet arrives at Q”.
• Completion (network service completed): take packet off queue; if queue still has a packet, then generate completion. Determine when next packet will finish and generate event for “Service Completed”.
• Update statistics, then go back to determine the next event.

The Guts of the Simulation

    while( Iterations < RequestedArrivals ) {
        RemoveEvent( &CurrentSimulationTime, &EventType );

        if ( EventType == ARRIVAL ) {
            if ( ArrivalDiscipline == EXPONENTIAL )
                NextEventTimeInterval = GetExponentialArrival( ExponentialArrivalValue );
            if ( ArrivalDiscipline == PARETO )
                NextEventTimeInterval = GetParetoArrival( ParetoArrivalValue );
            StoreStats( NextEventTimeInterval );
            AddEvent( CurrentSimulationTime + NextEventTimeInterval, ARRIVAL );

            if ( QueueLength == 0 ) { // Schedule completion event for this request
                NextEventTimeInterval = GetExponentialArrival( ServiceRate );
                AddEvent( CurrentSimulationTime + NextEventTimeInterval, COMPLETION );
            }
            // Do counting of state for stats purposes
            QueueLength++;
        } // End of EventType == ARRIVAL

        if ( EventType == COMPLETION ) {
            QueueLength--;
            if ( QueueLength > 0 ) { // Something else needs service
                NextEventTimeInterval = GetExponentialArrival( ServiceRate );
                AddEvent( CurrentSimulationTime + NextEventTimeInterval, COMPLETION );
            }
        } // End of EventType == COMPLETION
    } // End of while iterations

    // Print out the statistics:
    PrintStats();

Data Generation

Here’s the question we want to answer: given a distribution, how do we find the value of x that corresponds to a particular probability?

For instance, applying this question to the exponential distribution, with

Exponential Probability Density Function (PDF) $f(x) = a\,e^{-ax}$ and Cumulative Function $F(x) = 1 - e^{-ax}$, or

$F(x) = 1 - e^{-x}$ for a = 1,

what value of x produces a given F(x)? We generate random numbers in the range 0 to 1. These are the F(x). So what values of x will give us this range of F(x)?

For x = 0, F(x) = 0;

For x = infinity, F(x) = 1.

This inverse mapping is most easily accomplished by taking the inverse function: $x = -\ln(1 - F(x))/a$, so $x = -\ln(1 - \mathrm{rand}())/a$. Since 1 - rand() is itself uniformly distributed over 0 to 1, $x = -\ln(\mathrm{rand}())/a$ gives the same distribution.

Here’s the essence of this code:

    double GetExponentialArrival( double Argument ) {
        return( -log( 1.0 - GetRandomNumber() )/ Argument );
    } // End of GetExponentialArrival

Data Generation
• So having an inverse function is very nice. It’s one reason that using the exponential function is so handy, and so universal. But for the Pareto distribution, with PDF
• $f(x) = b\,X_0^{\,b} / (X_0 + x)^{b+1}$ and cumulative function $F(x) = 1 - \left( X_0 / (X_0 + x) \right)^{b}$,
• the inverse function is more difficult to find. I solved this by doing a search. The binary search algorithm goes like this:
1. Pick a random number in the range 0 to 1; R = random();
2. Calculate F(y) and F(z) such that one of these is larger than R and one is smaller than R.
3. Calculate F( (y + z)/2 ), for a value halfway between y and z.
4. Replace y or z with that midpoint so that F(y) and F(z) again straddle R.
5. Loop to Step 3 until the value of ( R – F(y) ) is arbitrarily small.
• All this is messy and compute intensive, but that’s the way it is when you don’t have the inverse function in hand.

Simulation Results

Results look very similar to the analytical functions.

Graphs

### Marriage & Divorce Simulation

The goal of this exercise is to show the simulation of a “society”. In the larger context, it’s an example of how students might perform a simulation.

Given a body of data, how do we arrange that data in order to represent how the society is behaving? This is essentially a “model” using the data.

There are three ways we go about putting numerical values on this model:

Given a series of equations, can we simply solve the equations?

If the equations don’t have a closed form solution, can we solve them recursively? There are no statistics involved here; all we do is solve each equation over and over again and hope that it converges. This method gives us no details about the population since we’re simply solving equations.

We can try for a “real” simulation. In this case, we use the probabilities and a random generator to try to simulate good years and bad years. This allows us to answer much more complex questions. We could now track characteristics for each individual in our society. We could, possibly, see how long a person in our society stays married, for instance.

### Marriage & Divorce Simulation

There’s lots of stuff on the web, confusing and maybe contradictory:

All data is for the US.

In 2007, there were 2,200,000 marriages. This represents a rate of 7.5 per 1000 total population. Note this is 2.2M / 296M = 7.5. (Total US population is higher but some states don’t report.)

Another metric which may be saying the same thing is that there are 39.9 marriages per 1000 single women. We’re going to use the first number here.

In 2007, there were 856,000 divorces. This is 3.6 per 1000 total population.

Interesting numbers, but not used here:

41% of 1st marriages end in divorce.

60% of 2nd marriages end in divorce.

74% of 3rd marriages end in divorce.

The average remarriage occurs 3.3 years after a divorce.

In 2007 there were 2,400,000 deaths, representing a rate of 8.2 per 1000. Details of this are on the next page.

60% of all marriages last until one partner dies.

Birth rate is 13.8 per 1,000 population

Recent statistics say that 51% of the adult population is married. This is important because we don’t use it directly as one of our equations – we use it to test if our model gives approximately this answer.

### Marriage & Divorce Simulation

In 2007 there were 2,400,000 deaths, representing a rate of 8.2 per thousand.

Details on this mortality data are for men and women 65+:

Death rate for a married man is defined as 1.00.

Death rate for a widowed man is 1.06 times that of a married man.

Death rate for a divorced or separated man is 1.14 times that of a married man.

Death rate for a never-married man is 1.05 times that of a married man.

Death rate for a married woman is defined as 1.00.

Death rate for a widowed woman is 1.15 times that of a married woman.

Death rate for a divorced or separated woman is 1.26 times that of a married woman.

Death rate for a never-married woman is 1.18 times that of a married woman.

This information is from “US Mortality by Economic, Demographic, and Social Characteristics: The National Longitudinal Mortality Study”, Sorlie, Backlund, and Keller, 1995

We use a rate that’s above and below the 8.2 per 1000 for the national average to take into account single and married rates.

DeathMarriedRate = 7.6 per 1000

DeathSingleRate = 8.7 per 1000

Marriage & Divorce Simulation

[Figure: state diagram with three states: Zombie, Single, and Married. Arrows are labeled Birth Rate (out of Zombie, Reincarnation = 100%), Marriage Rate, Divorce Rate, Widowed, Death while Single, and Death while Married.]

Leaving Zombie:

DZ = - Rbirth * ( S + M )

Entering Zombie:

DZ = + Rdeath-single * S + Rdeath-married * M

Leaving Single:

DS = -2 * Rmarriage * ( S + M ) - Rdeath-single * S

Entering Single:

DS = + Rbirth * ( S + M ) + 2 * Rdivorce * ( S + M ) + Rdeath-married * M

Leaving Married:

DM= -2 * Rdivorce * ( S + M ) - Rdeath-married * M

Entering Married:

DM= + 2 * Rmarriage * ( S + M )

In Steady State – leaving equals entering

+ Rdeath-single * S + Rdeath-married * M - Rbirth * ( S + M ) = 0

+ Rbirth * ( S + M ) + 2 * Rdivorce * ( S + M ) + Rdeath-married * M -2 * Rmarriage * ( S + M ) - Rdeath-single * S = 0

+ 2 * Rmarriage * ( S + M ) - 2 * Rdivorce * ( S + M ) - Rdeath-married * M = 0

Marriage & Divorce Simulation

Rearranging these equations gives:

- Rbirth * ( S + M ) + Rdeath-single * S + Rdeath-married * M = 0

+ Rbirth * ( S + M ) - 2 * Rmarriage * ( S + M ) + 2 * Rdivorce * ( S + M ) - Rdeath-single * S + Rdeath-married * M = 0

+ 2 * Rmarriage * ( S + M ) - 2 * Rdivorce * ( S + M ) - Rdeath-married * M = 0

Maybe there’s a solution, but they seem redundant to me.
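The second approach, solving the equations over and over until they settle, can be sketched as below. This is not MarriageAndDivorceSimulation1.c; it is a minimal illustration that applies the DS and DM expressions from the earlier slide once per simulated year, using the per-1000 rates quoted in these slides. The function name, the one-year step, and the starting populations are assumptions.

```c
#include <assert.h>

/* Annual rates per person, from the slides (per-1000 figures divided by 1000). */
#define R_BIRTH          0.0138
#define R_MARRIAGE       0.0075
#define R_DIVORCE        0.0036
#define R_DEATH_MARRIED  0.0076
#define R_DEATH_SINGLE   0.0087

/* Apply the DS and DM equations `years` times, starting from the given
   single and married populations. Returns the married fraction M / (S + M). */
double MarriedFraction( double single, double married, int years ) {
    assert( single >= 0.0 && married >= 0.0 && single + married > 0.0 );

    for ( int y = 0; y < years; y++ ) {
        double total = single + married;

        /* Entering Single minus Leaving Single */
        double ds = R_BIRTH * total + 2.0 * R_DIVORCE * total
                  + R_DEATH_MARRIED * married
                  - 2.0 * R_MARRIAGE * total - R_DEATH_SINGLE * single;

        /* Entering Married minus Leaving Married */
        double dm = 2.0 * R_MARRIAGE * total - 2.0 * R_DIVORCE * total
                  - R_DEATH_MARRIED * married;

        single  += ds;
        married += dm;
    }
    return married / ( single + married );
}
```

For instance, `MarriedFraction( 1000.0, 0.0, 1000 )` and `MarriedFraction( 500.0, 500.0, 1000 )` settle on the same value, roughly 0.47 of the total population with these rates, which can be set against the 51% figure used to test the model. The populations themselves keep growing because births outnumber deaths, so it is the fraction that converges, not the counts.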

Marriage & Divorce Simulation

Here are links to the code and executables for this simulation:

MarriageAndDivorceSimulation1.c // Recursively solves the equations

MarriageAndDivorceSimulation1.exe

MarriageAndDivorceSimulation2.c // Does a statistical simulation

MarriageAndDivorceSimulation2.exe

WRAPUP

This section has shown the result of a simulation. It’s gone through the coding, the data generation, and the interpretation of results.

If network arrivals are Self-Similar, what about all kinds of other data generated by computers? What about requests arriving at a disk? What about processes arriving at a ready queue?

Is there any computer data that REALLY is random, or is it all self-similar?
