Comparing alternative systems: Simultaneous output analyses for more than one system
Introduction • Previously, we considered only one system and decided the run length as well as the number of runs (replications) needed for statistical analyses. • Here, we discuss statistical analyses of several different simulation models that might represent competing system designs or alternative operating policies. • This is much more practical than considering just one system, because in reality an organization is usually evaluating several alternatives simultaneously. • As was shown in the previous section, taking just one replication leads to erroneous inferences about multiple systems as well.
Introduction • Considering multiple systems simultaneously also brings additional complexities. • Other issues, such as correlation between the outputs of these systems, must be considered. • Another issue is whether to calculate descriptive statistics for each system individually and then compare them, or to adopt a method that considers all the models together. • That is, whether to look for individual differences in each run or for a collective difference. • First we look at the comparison between two terminating simulation models, and then at the non-terminating case.
Comparing two systems • We consider the special case of comparing two systems on the basis of some performance measure (or expected system response). • We calculate a confidence interval for the difference in the two expectations. • A hypothesis-testing approach is not used because the confidence-interval approach gives us additional information: the interval tells us not only whether the means differ, but also by how much. • For i = 1, 2, let Xi1, Xi2, …, Xini be a sample of ni IID observations from system i. • Let µi = E[Xij] be the expected response of interest. • We want to build a confidence interval for the difference ζ = µ1 – µ2.
Comparing two systems: paired-t confidence interval • Let n1 = n2 = n. • This means that if n1 > n2, then to make the two samples of equal size, we lose some data from sample 1. • Let X1j be the performance measure from system 1 in the jth replication, and correspondingly let X2j be the measure from system 2. • Let Zj = X1j – X2j for j = 1, 2, …, n. • Then the Zj's are IID random variables with expected value E[Zj] = ζ. • And this is the quantity for which we want to build the confidence interval.
Comparing two systems: paired-t confidence interval • We get that:
$$\bar{Z}(n) = \frac{1}{n}\sum_{j=1}^{n} Z_j, \qquad \widehat{\mathrm{Var}}[\bar{Z}(n)] = \frac{\sum_{j=1}^{n}\left(Z_j - \bar{Z}(n)\right)^2}{n(n-1)}$$
• And using this, we form the 100(1 – α) percent confidence interval as:
$$\bar{Z}(n) \pm t_{n-1,\,1-\alpha/2}\,\sqrt{\widehat{\mathrm{Var}}[\bar{Z}(n)]}$$
Comparing two systems: paired-t confidence interval • If the Zj's are normally distributed, then this confidence interval is exact: it covers ζ with probability 1 – α. • Otherwise we rely on the central limit theorem, which implies that the coverage probability will be near 1 – α for large n. • We did not assume that X1j and X2j are independent. • In fact, positive correlation between X1j and X2j reduces the variance of the Zj's, and hence the width of the interval. • Nor did we assume equality of variances, Var(X1j) = Var(X2j). • Such a confidence interval is called a paired-t confidence interval.
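As a concrete illustration, here is a minimal sketch (not from the original slides) of the paired-t interval in Python, assuming x1 and x2 are equal-length arrays holding the n paired replication outputs from systems 1 and 2:

```python
import numpy as np
from scipy import stats

def paired_t_ci(x1, x2, alpha=0.10):
    """100(1 - alpha)% paired-t CI for zeta = mu1 - mu2."""
    z = np.asarray(x1) - np.asarray(x2)       # Z_j = X_1j - X_2j
    n = len(z)
    z_bar = z.mean()                          # point estimate of zeta
    var_z_bar = z.var(ddof=1) / n             # estimated Var[Z-bar(n)]
    half = stats.t.ppf(1 - alpha / 2, n - 1) * np.sqrt(var_z_bar)
    return z_bar - half, z_bar + half
```

If the resulting interval misses zero, we conclude (at the chosen confidence level) that the two systems differ.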
Modified two-sample-t confidence interval • One may argue that in the previous method we lost some information because we forced the two samples to be of the same size. • Some more information may also have been lost by pairing up the observations from the two systems right up front. • We now look at a modified method in which we need not pair up the observations. • However, this method does require that the Xij's be independent. • There are two versions of the two-sample-t confidence interval: classical and modified.
Modified two-sample-t confidence interval • The classical method requires the equal-variance condition Var(X1j) = Var(X2j). • If this condition is not satisfied, the coverage of the confidence interval may degrade seriously. • The degradation is not as severe when the sample sizes are equal. • However, the condition of equal variances is almost never satisfied in practice. • Hence we use the modified method, which does not require this condition.
Modified two-sample-t confidence interval • Let
$$\bar{X}_i(n_i) = \frac{1}{n_i}\sum_{j=1}^{n_i} X_{ij}, \qquad S_i^2(n_i) = \frac{\sum_{j=1}^{n_i}\left(X_{ij} - \bar{X}_i(n_i)\right)^2}{n_i - 1} \quad \text{for } i = 1, 2$$
• The estimated degrees of freedom is given by:
$$\hat{f} = \frac{\left[S_1^2(n_1)/n_1 + S_2^2(n_2)/n_2\right]^2}{\left[S_1^2(n_1)/n_1\right]^2/(n_1-1) + \left[S_2^2(n_2)/n_2\right]^2/(n_2-1)}$$
• And the 100(1 – α) percent confidence interval for ζ is given by:
$$\bar{X}_1(n_1) - \bar{X}_2(n_2) \pm t_{\hat{f},\,1-\alpha/2}\,\sqrt{\frac{S_1^2(n_1)}{n_1} + \frac{S_2^2(n_2)}{n_2}}$$
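A minimal sketch of this modified (Welch) interval, assuming independent samples x1 and x2 of possibly different sizes:

```python
import numpy as np
from scipy import stats

def welch_ci(x1, x2, alpha=0.10):
    """100(1 - alpha)% modified two-sample-t CI for mu1 - mu2."""
    x1, x2 = np.asarray(x1), np.asarray(x2)
    n1, n2 = len(x1), len(x2)
    v1, v2 = x1.var(ddof=1) / n1, x2.var(ddof=1) / n2
    # Estimated degrees of freedom f-hat (generally not an integer)
    f_hat = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    diff = x1.mean() - x2.mean()
    half = stats.t.ppf(1 - alpha / 2, f_hat) * np.sqrt(v1 + v2)
    return diff - half, diff + half
```

Note that scipy's t distribution accepts the non-integer degrees of freedom directly, so no rounding is needed.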
Comparison based on steady-state performance measures • Here, we cannot simply replicate the models, since initialization (warm-up) effects may bias the output. • Hence it is generally more difficult to compare two systems based on steady-state performance measures. • One way of proceeding is to define
$$X_{ij} = \frac{1}{m_i}\sum_{p=l_i+1}^{l_i+m_i} D_{ijp}$$
where D_ijp is the pth observation from the jth replication of system i, l_i is the warm-up period for system i, and m_i is the minimum number of D_ijp's in any replication.
Comparison based on steady-state performance measures • Then we can define the difference variables Zj = X1j – X2j for j = 1, 2, …, n. • Let νi be the steady-state mean of system i; the Zj variables then give an estimate of ν1 – ν2, and the paired-t interval above applies, as sketched below.
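A small sketch of forming the X_ij's by deleting a warm-up of l observations and averaging the next m, then reusing the hypothetical paired_t_ci above; the names are illustrative, not from the source:

```python
import numpy as np

def replication_mean(d, l, m):
    """Average observations l+1 .. l+m of one replication (warm-up deleted)."""
    d = np.asarray(d)
    return d[l:l + m].mean()

# reps1[j], reps2[j]: within-run observation sequences D_ijp for replication j
# x1 = [replication_mean(d, l1, m1) for d in reps1]
# x2 = [replication_mean(d, l2, m2) for d in reps2]
# lo, hi = paired_t_ci(x1, x2, alpha=0.10)   # estimates nu_1 - nu_2
```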
Comparison of more than two systems • While analyzing more than two systems, we have to make several confidence-interval statements simultaneously. • Hence the individual confidence levels must be raised so that the overall confidence level of all intervals covering their respective targets is at the desired level 1 – α. • We use the Bonferroni method. • Suppose that Is is a 100(1 – αs) percent confidence interval for the measure of performance µs (where s = 1, 2, …, k).
Comparison of more than two systems • Then the probability that all k confidence intervals simultaneously contain their respective true measures satisfies the Bonferroni inequality:
$$P(\mu_s \in I_s \text{ for all } s = 1, 2, \ldots, k) \ge 1 - \sum_{s=1}^{k} \alpha_s$$
• Suppose that one constructs 90% confidence intervals (that is, αs = 0.1 for each s) for 10 different systems. • Then the probability that all 10 confidence intervals contain their true measures can only be claimed to be greater than or equal to 1 – 10(0.1) = 0. • Thus one cannot have much overall confidence in drawing any conclusions from such a study.
Comparison of more than two systems • If we want to make some number c of confidence-interval statements, the trick is to make each separate interval at level 1 – α/c, so that the overall confidence level associated with all intervals covering their targets will be at least 1 – α. • For example, if we want to make c = 10 intervals and obtain an overall confidence level of 100(1 – α) percent = 90%, then we must construct each individual interval at the 99% level. • Clearly, for large c this means the individual intervals may become quite wide. A small sketch of the adjustment follows.
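A minimal sketch of the Bonferroni adjustment in Python:

```python
def bonferroni_alpha(alpha, c):
    """Per-interval alpha so that c intervals have overall level >= 1 - alpha."""
    return alpha / c

# Example from the slides: c = 10 intervals, overall level 90%
per_interval = bonferroni_alpha(0.10, 10)                       # 0.01
print(f"Build each interval at the {100 * (1 - per_interval):.0f}% level")  # 99%
```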
Comparison of more than two systems Comparing with the standard • Suppose that one of the model variants is a "standard," perhaps representing the existing system or policy. • Say the standard system is "system 1" and the other variants are systems 2, 3, …, k; the goal then is to construct k – 1 confidence intervals for the k – 1 differences µ2 – µ1, µ3 – µ1, …, µk – µ1 with overall confidence level 1 – α. • Thus we make c = k – 1 individual intervals, so each must be constructed at level 1 – α/(k – 1).
Comparison of more than two systems Comparing with the standard • Then we can say (with a confidence level of at least 1 – α) that for i = 2, 3, …, k, system i differs from the standard if the interval for µi – µ1 misses zero, and that system i is not significantly different from the standard if this interval contains zero. A sketch of this procedure follows.
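Here is a minimal sketch, reusing the hypothetical paired_t_ci defined earlier, for comparing k – 1 variants against the standard; samples is assumed to be a list of k equal-length arrays of replication outputs, with samples[0] the standard:

```python
def compare_with_standard(samples, alpha=0.10):
    """Bonferroni-adjusted paired-t intervals for mu_i - mu_1, i = 2..k."""
    k = len(samples)
    adj_alpha = alpha / (k - 1)          # each interval at level 1 - alpha/(k-1)
    results = {}
    for i in range(1, k):
        lo, hi = paired_t_ci(samples[i], samples[0], adj_alpha)
        differs = lo > 0 or hi < 0       # interval misses zero
        results[i + 1] = (lo, hi, differs)   # keyed by system number 2..k
    return results
```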
Comparison of more than two systems All pairwise comparisons • Sometimes we may want to compare each system with every other system to detect and quantify any significant pairwise differences. • There may not be an existing system: all k alternatives may represent possible implementations that should be treated in the same way. • One approach is to form confidence intervals for the differences µi2 – µi1 for all i1 and i2 between 1 and k with i1 < i2. • Hence there will be k(k – 1)/2 individual intervals. • Each must therefore be made at level 1 – α/[k(k – 1)/2] in order to have a confidence level of at least 1 – α for all the intervals together, as sketched below.
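The same pattern extends to all pairs; a brief sketch, again reusing the hypothetical paired_t_ci:

```python
from itertools import combinations

def all_pairwise(samples, alpha=0.10):
    """Bonferroni-adjusted paired-t intervals for every pair of systems."""
    k = len(samples)
    adj_alpha = alpha / (k * (k - 1) / 2)    # k(k-1)/2 intervals in total
    return {
        (i1 + 1, i2 + 1): paired_t_ci(samples[i2], samples[i1], adj_alpha)
        for i1, i2 in combinations(range(k), 2)   # CI for mu_{i2} - mu_{i1}
    }
```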
Comparison of more than two systems Selection of the best out of the k systems • Let Xij be the performance measure of interest from the jth replication of the ith system. • For all selection problems, we assume that the Xij's are independent of one another: replications for a given alternative are independent, and the runs for the different alternatives are also made independently. • For example, Xij could be the average total cost per month for the jth replication of policy i in the inventory model. • Let µ_il be the lth smallest of the population performance measures µi, so that µ_i1 ≤ µ_i2 ≤ … ≤ µ_ik. • Our goal is to select the system with the smallest expected response, µ_i1.
Selection of the best out of the k systems • Let "CS" denote the event of "correct selection." • The inherent randomness of the observed Xij's implies that we can never be absolutely sure of making the CS, but we would like to be able to pre-specify the probability of CS. • Also, if µ_i1 and µ_i2 are very close to each other, we might not care if we erroneously choose system i2 (the one with mean µ_i2). • So we want a method that avoids making a large number of replications to resolve this unimportant difference. • Problem statement: we want Pr{CS} ≥ P* provided µ_i2 – µ_i1 ≥ d*, where the minimum probability P* and the "indifference" tolerance d* are specified by the analyst.
Selection of the best out of the k systems • One might naturally ask: what if µ_i2 – µ_i1 < d*? • The method specified here guarantees that, with probability at least P*, the expected response of the selected system will be no larger than µ_i1 + d*. • Thus we are protected (with probability at least P*) against selecting a system whose mean is more than d* worse than that of the best system. • The proposed method involves two-stage sampling from each of the k systems.
Selection of the best out of the k systems • In the first stage we make a fixed number of replications of each system. • We then use the resulting variance estimates to determine how many more replications of each system are necessary in the second stage. • We assume that the Xij's are normally distributed; however, we need not assume equality of variances, nor that the population variances σi² = Var(Xij) are known. • In fact, the method is robust to violations of the normality assumption when the Xij's are themselves averages.
Selection of the best out of the k systems • In the first stage of sampling, we make n0 > 2 replications of each of the k systems and define the first-stage sample mean and variance:
$$\bar{X}_i^{(1)}(n_0) = \frac{1}{n_0}\sum_{j=1}^{n_0} X_{ij}, \qquad S_i^2(n_0) = \frac{\sum_{j=1}^{n_0}\left(X_{ij} - \bar{X}_i^{(1)}(n_0)\right)^2}{n_0 - 1}$$
• Then we compute the total sample size Ni needed for system i:
$$N_i = \max\left\{ n_0 + 1,\; \left\lceil \frac{h_1^2\, S_i^2(n_0)}{(d^*)^2} \right\rceil \right\}$$
Selection of the best out of the k systems • Here h1 (which depends on k, P*, and n0) is a constant that can be obtained from a standard table. • Next, we make Ni – n0 additional replications of system i and obtain the second-stage sample mean:
$$\bar{X}_i^{(2)}(N_i - n_0) = \frac{1}{N_i - n_0}\sum_{j=n_0+1}^{N_i} X_{ij}$$
• Then we define two weights, used to form a weighted average of the first-stage and second-stage sample means.
Selection of the best out of the k systems • First weight:
$$W_{i1} = \frac{n_0}{N_i}\left[1 + \sqrt{1 - \frac{N_i}{n_0}\left(1 - \frac{(N_i - n_0)(d^*)^2}{h_1^2\, S_i^2(n_0)}\right)}\right]$$
with the second weight W_i2 = 1 – W_i1. • Finally, define the weighted sample means:
$$\tilde{X}_i(N_i) = W_{i1}\,\bar{X}_i^{(1)}(n_0) + W_{i2}\,\bar{X}_i^{(2)}(N_i - n_0)$$
Selection of the best out of the k systems • Finally, we select the system with the smallest weighted sample mean. • The choice of P* and d* depends on the goals and the particular systems under study. • These should be chosen by trading off the computing cost of a large number of replications (driven by a large P* and a small d*) against the precision of the selection. A sketch of the whole two-stage procedure follows.
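Putting the pieces together, here is a minimal sketch of the two-stage procedure under stated assumptions: simulate(i) is a hypothetical function returning one replication's response for system i, and h1 must be looked up in a published table for the chosen k, P*, and n0 (it is passed in as a parameter here, not computed):

```python
import math
import numpy as np

def select_best(k, simulate, n0, d_star, h1):
    """Two-stage procedure: returns the index of the selected system."""
    weighted_means = []
    for i in range(k):
        # Stage 1: n0 replications; first-stage mean and variance
        stage1 = np.array([simulate(i) for _ in range(n0)])
        mean1, s2 = stage1.mean(), stage1.var(ddof=1)   # assumes s2 > 0
        # Total sample size N_i = max(n0 + 1, ceil(h1^2 * S_i^2 / d*^2))
        n_i = max(n0 + 1, math.ceil(h1 ** 2 * s2 / d_star ** 2))
        # Stage 2: N_i - n0 additional replications; second-stage mean
        mean2 = np.mean([simulate(i) for _ in range(n_i - n0)])
        # Weights combining the two stage means
        w1 = (n0 / n_i) * (1 + math.sqrt(
            1 - (n_i / n0) * (1 - (n_i - n0) * d_star ** 2 / (h1 ** 2 * s2))))
        weighted_means.append(w1 * mean1 + (1 - w1) * mean2)
    # Select the system with the smallest weighted sample mean
    return int(np.argmin(weighted_means))
```

Because N_i is at least h1²S_i²(n0)/(d*)², the expression under the square root is guaranteed to be non-negative.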