
Input Data Analysis 3 Goodness-of-Fit Tests



**1.** Input Data Analysis (3): Goodness-of-Fit Tests

**2.** Input Data Analysis
Activity I: Hypothesizing families of distributions (“What does it look like?”)
Activity II: Estimation of parameters (“How is it represented?”)
Activity III: Determining how representative the fitted distributions are (“How accurate is the representation?”)

**3.** Goodness-of-Fit Test
Used to test the hypothesis that a given set of data points (e.g., field data collected from the system) are independent samples from a particular probability distribution
Observations: $X_1, X_2, X_3, \ldots, X_n$
$H_0$: the $X_i$'s are IID random variables with distribution function $F$
e.g., uniform, normal, exponential, etc.

**4.** Goodness-of-Fit Test
Note:
Failure to reject $H_0$ should NOT be interpreted as “accepting $H_0$ as true”
These tests are not very powerful for small to moderate sample sizes $n$ (real field data are often limited in size), so they are not sensitive to subtle deviations between the sample data and the fitted distribution
If $n$ is very large, these tests will almost always reject $H_0$, since $H_0$ is virtually never exactly true
But for practical purposes, a “nearly” correct distribution is acceptable

**5.** The Chi-Square Test (Pearson, 1900)
First, divide the entire range of the fitted distribution into $k$ adjacent intervals, with equal lengths except for the first and the last: $[a_0, a_1), [a_1, a_2), [a_2, a_3), \ldots, [a_{k-1}, a_k)$
$a_0 = -\infty$ and $a_k = +\infty$, so the first interval is $(-\infty, a_1)$ and the last is $[a_{k-1}, +\infty)$
Then tally
$N_j$ = number of $X_i$'s in the $j$th interval $[a_{j-1}, a_j)$
for $j = 1, 2, \ldots, k$ (note: $\sum_{j=1}^{k} N_j = n$)
Compute the expected proportion $p_j$ of the $X_i$'s that would fall in the $j$th interval if we were sampling from the fitted distribution
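Tallying the $N_j$ can be sketched in Python as follows. This is an illustration only. The function name `tally`, the sample data and the breakpoints are all hypothetical. The outer endpoints $a_0 = -\infty$ and $a_k = +\infty$ are left implicit, so only $a_1 < \cdots < a_{k-1}$ are passed in.

```python
from bisect import bisect_right

def tally(data, inner_breaks):
    """Return [N_1, ..., N_k] for the k = len(inner_breaks) + 1 intervals
    (-inf, a_1), [a_1, a_2), ..., [a_{k-1}, +inf).

    inner_breaks must be sorted: a_1 < a_2 < ... < a_{k-1}.
    """
    counts = [0] * (len(inner_breaks) + 1)
    for x in data:
        # bisect_right counts the breakpoints <= x, which is exactly the
        # 0-based index of the half-open interval [a_{j-1}, a_j) holding x
        counts[bisect_right(inner_breaks, x)] += 1
    return counts

# Hypothetical data and breakpoints, for illustration only
N = tally([3.2, 12.0, 10.0, 45.1, 27.9, 0.4], [10, 20, 30, 40])
print(N)  # [2, 2, 1, 0, 1]
```

Because the intervals are closed on the left, the observation 10.0 is counted in $[10, 20)$, not in the first interval.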

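**6.** The Chi-Square Test
For the continuous case: $p_j = \int_{a_{j-1}}^{a_j} \hat{f}(x)\,dx$, where $\hat{f}$ is the density of the fitted distribution
For discrete data (our testing interest): $p_j = \sum_{a_{j-1} \le x_i < a_j} \hat{p}(x_i)$, where $\hat{p}$ is the probability mass function of the fitted distribution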

**7.** The Chi-Square Test
Test statistic:
$$\chi^2 = \sum_{j=1}^{k} \frac{(N_j - np_j)^2}{np_j}$$
Since $np_j$ is the expected number of the $n$ $X_i$'s that would fall in the $j$th interval if $H_0$ were true, we would expect $\chi^2$ to be small
Reject $H_0$ if $\chi^2$ is too large
Approximate level-$\alpha$ test for large $n$:
Reject $H_0$ if $\chi^2 > \chi^2_{k-1,1-\alpha}$ (L&K: Fig. 6.45)
where $\chi^2_{k-1,1-\alpha}$ is the upper $1-\alpha$ critical point of a chi-square distribution with $k-1$ d.f.
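The statistic and the large-$n$ rejection rule can be sketched as below. This is a minimal sketch with hypothetical function names. The critical value $\chi^2_{k-1,1-\alpha}$ is passed in (e.g., read from a chi-square table) rather than computed, which keeps the code dependency-free. The usage numbers are made up: 100 observations in $k = 5$ cells with every $p_j = 0.2$, tested at $\alpha = 0.05$ with $\chi^2_{4,0.95} \approx 9.488$.

```python
def chi_square_statistic(observed, expected):
    """Pearson's statistic: sum over cells of (N_j - n*p_j)^2 / (n*p_j)."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi_square_test(observed, probs, critical_value):
    """Return (chi2, reject_H0) given cell probabilities p_j and the
    critical value chi^2_{k-1,1-alpha}."""
    n = sum(observed)
    expected = [n * p for p in probs]          # n * p_j for each cell
    chi2 = chi_square_statistic(observed, expected)
    return chi2, chi2 > critical_value         # reject only if chi2 is too large

chi2, reject = chi_square_test([18, 22, 20, 25, 15], [0.2] * 5, 9.488)
print(round(chi2, 3), reject)  # 2.9 False
```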

**9.** The Chi-Square Test
The chi-square test is only valid, i.e., of level $\alpha$, asymptotically as $n \to \infty$
(L&K: Fig. 6.46)
Also, we usually need to estimate $m$ parameters of the fitted distribution ($m \ge 1$)
When maximum likelihood estimates are used and $H_0$ is true, then as $n \to \infty$ the distribution function of $\chi^2$ converges to a distribution function that lies between the distribution functions of chi-square distributions with $k-1$ and $k-m-1$ d.f.

**11.** The Chi-Square Test: Clear-Cut Cases
If $\chi^2 > \chi^2_{k-1,1-\alpha}$ → reject $H_0$
If $\chi^2 < \chi^2_{k-m-1,1-\alpha}$ → do not reject $H_0$
What if $\chi^2_{k-m-1,1-\alpha} \le \chi^2 \le \chi^2_{k-1,1-\alpha}$?
Recommendation:
Reject $H_0$ only if $\chi^2 > \chi^2_{k-1,1-\alpha}$ → conservative
i.e., the actual probability $\alpha'$ of committing a Type I error (rejecting $H_0$ when $H_0$ is true) is no larger than the stated probability $\alpha$
Result: loss of power (a smaller probability of rejecting a false $H_0$) of the test
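The clear-cut cases and the ambiguous middle zone can be written as a small decision function. This is a sketch with hypothetical names, and the caller supplies both critical points. In the usage line, $k = 5$ cells and $m = 1$ estimated parameter at $\alpha = 0.05$ give $\chi^2_{4,0.95} \approx 9.488$ and $\chi^2_{3,0.95} \approx 7.815$. The $\chi^2$ value of 8.5 is made up.

```python
def chi_square_decision(chi2, crit_upper, crit_lower):
    """Classify chi2 when m parameters were estimated by MLE.

    crit_upper = chi^2_{k-1,1-alpha}    (the larger critical point)
    crit_lower = chi^2_{k-m-1,1-alpha}  (the smaller critical point)
    """
    if crit_lower > crit_upper:
        raise ValueError("expected chi^2_{k-m-1} <= chi^2_{k-1}")
    if chi2 > crit_upper:
        return "reject"          # clear-cut: above both critical points
    if chi2 < crit_lower:
        return "do not reject"   # clear-cut: below both critical points
    return "ambiguous"           # the conservative rule does not reject here

print(chi_square_decision(8.5, 9.488, 7.815))  # ambiguous
```

Under the recommended conservative rule, an "ambiguous" result is treated as "do not reject". The less conservative alternative rejects whenever `chi2 > crit_lower`.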

**12.** The Chi-Square Test: Clear-Cut Cases (Continued)
Normally $m \le 2$, and if $k$ is fairly large, the difference between $\chi^2_{k-m-1,1-\alpha}$ and $\chi^2_{k-1,1-\alpha}$ is very small
Difficulty:
Choosing the number and type of intervals
1. Equal length
2. Equiprobable ($p_1 = p_2 = \cdots = p_k$)
- difficult in practice
Less conservative (maybe more practical):
If $\chi^2 > \chi^2_{k-m-1,1-\alpha}$ → reject $H_0$
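Equiprobable intervals are easy to build when the fitted distribution has a closed-form inverse: the endpoints are $a_j = \hat F^{-1}(j/k)$. Below is a minimal sketch for an exponential fit with mean $\beta$, where $\hat F^{-1}(u) = -\beta \ln(1-u)$. The function name and the choice $k = 4$ are hypothetical. The value $\beta = 20.1$ is taken from the TBA example.

```python
import math

def expo_equiprobable_breaks(beta, k):
    """Inner endpoints a_1, ..., a_{k-1} with p_j = 1/k under expo(beta).

    a_0 = 0 and a_k = +inf are implicit; a_j = F^{-1}(j/k) = -beta*ln(1 - j/k).
    """
    return [-beta * math.log(1 - j / k) for j in range(1, k)]

breaks = expo_equiprobable_breaks(20.1, 4)
print([round(a, 2) for a in breaks])  # [5.78, 13.93, 27.86]
```

Each interval then has expected count $n/k$. For a discrete fitted distribution, exactly equal $p_j$ usually cannot be achieved, because the probability is concentrated at the mass points.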

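**13.** The Chi-Square Test
Ex: Time between arrivals (TBA). 60 data points (all greater than 0), fitted distribution expo(20.1)

| Cell | Observed $N_j$ | Expected $np_j$ | $(N_j - np_j)^2 / np_j$ |
|---|---|---|---|
| $(-\infty, 10)$ | 19 | 23.52 | 0.869 |
| $[10, 20)$ | 16 | 14.31 | 0.200 |
| $[20, 30)$ | 12 | 8.69 | 1.261 |
| $[30, 40)$ | 8 | 5.29 | 1.388 |
| $[40, +\infty)$ | 5 | 8.20 | 1.236 |
| Total | 60 | | $\chi^2 = 4.954$ |

$f(x) = \frac{1}{20.1} e^{-x/20.1}$ for $x > 0$
$F(x) = 1 - e^{-x/20.1}$ [$F(x) = \int_0^x f(t)\,dt$]
df $= 5 - 1 - 1 = 3$ → $\chi^2 = 4.954 < \chi^2_{3,1-0.05} = 7.81$, so $H_0$ is not rejected at $\alpha = 0.05$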
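The table can be recomputed directly from the fitted CDF. This sketch uses only the example's data: the cell counts, $n = 60$ and $\beta = 20.1$. It treats the last cell as open-ended, $[40, +\infty)$, so the expected counts sum to $n$ and the last expected count is $60\,e^{-40/20.1} \approx 8.20$. Computed without intermediate rounding, $\chi^2$ comes out near 4.97 rather than 4.954. The conclusion is unchanged.

```python
import math

BETA = 20.1                      # fitted exponential mean
OBSERVED = [19, 16, 12, 8, 5]    # N_j for (-inf,10), [10,20), [20,30), [30,40), [40,+inf)
EDGES = [0.0, 10.0, 20.0, 30.0, 40.0, math.inf]  # F(x) = 0 for x <= 0, so 0 acts as -inf
CRIT = 7.81                      # chi^2_{3,0.95} as quoted in the example

def expo_cdf(x, beta=BETA):
    """F(x) = 1 - exp(-x/beta) for x > 0, else 0."""
    return 1.0 - math.exp(-x / beta) if x > 0 else 0.0

n = sum(OBSERVED)
probs = [expo_cdf(b) - expo_cdf(a) for a, b in zip(EDGES, EDGES[1:])]
expected = [n * p for p in probs]
chi2 = sum((o - e) ** 2 / e for o, e in zip(OBSERVED, expected))
df = len(OBSERVED) - 1 - 1       # k - 1 - m with m = 1 estimated parameter

for o, e in zip(OBSERVED, expected):
    print(f"{o:3d} {e:7.2f} {(o - e) ** 2 / e:6.3f}")
print(f"chi2 = {chi2:.3f}, df = {df}, reject H0: {chi2 > CRIT}")
```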

**14.** The Chi-Square Test
Ex: Inventory Model
$n = 156$ observations on the (discrete) number of items demanded in a week from an inventory over a 3-year period. The weekly demands (with the number of weeks in parentheses) are:
0 (59) 1 (26) 2 (24) 3 (18) 4 (12)
5 (5) 6 (4) 7 (3) 9 (3) 11 (2)

**15.** The Chi-Square Test
Ex: Inventory Model (Continued)
Summary statistics
Min = 0.000, Max = 11.000, Mode = 0
Mean = 1.891, Median = 1.000, Variance = 5.285
Lexis ratio $\tau = \sigma^2/\mu = 2.795$
Skewness $\nu = E[(X - \mu)^3]/(\sigma^2)^{3/2} = 1.655$
Try to fit a geometric distribution, geom(0.346)
Use 3 intervals, each with a roughly equal number of observations:

| $j$ | Interval | $N_j$ | $np_j$ | $(N_j - np_j)^2 / np_j$ |
|---|---|---|---|---|
| 1 | {0} | 59 | 53.960 | 0.471 |
| 2 | {1, 2} | 50 | 58.382 | 1.203 |
| 3 | {3, 4, …} | 47 | 43.658 | 0.256 |

$\chi^2 = 1.930$
Compare: $\chi^2 = 1.930 < \chi^2_{3-1,0.90} = 4.605$
$H_0$ is not rejected at the $\alpha = 0.10$ level
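The inventory example can be reproduced the same way. The sketch assumes the geometric distribution on $\{0, 1, 2, \ldots\}$ with mass function $p(1-p)^x$, whose MLE is $\hat p = 1/(\bar X(n) + 1)$. With $\bar X(n) = 1.891$ this gives $\hat p \approx 0.346$, matching the example. The variance uses the $n - 1$ divisor, which reproduces the quoted 5.285. The variable names are illustrative.

```python
# Weekly demand -> number of weeks with that demand (from the example), n = 156
DEMAND = {0: 59, 1: 26, 2: 24, 3: 18, 4: 12, 5: 5, 6: 4, 7: 3, 9: 3, 11: 2}
CRIT = 4.605  # chi^2_{2,0.90} as quoted in the example

n = sum(DEMAND.values())
mean = sum(x * c for x, c in DEMAND.items()) / n
var = sum(c * (x - mean) ** 2 for x, c in DEMAND.items()) / (n - 1)
lexis = var / mean  # Lexis ratio tau = sigma^2 / mu

# Geometric on {0, 1, 2, ...}: P(X = x) = p * (1 - p)**x, MLE p = 1 / (mean + 1)
p = 1 / (mean + 1)
q = 1 - p
probs = [p, p * q + p * q ** 2, q ** 3]           # P{0}, P{1,2}, P{X >= 3}
observed = [DEMAND[0], DEMAND[1] + DEMAND[2], n - DEMAND[0] - DEMAND[1] - DEMAND[2]]
expected = [n * pj for pj in probs]
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print(f"mean={mean:.3f} var={var:.3f} lexis={lexis:.3f} p={p:.3f}")
print(f"observed={observed} expected={[round(e, 3) for e in expected]}")
print(f"chi2={chi2:.3f}, reject H0 at alpha=0.10: {chi2 > CRIT}")
```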

**16.** Kolmogorov-Smirnov (K-S) Test
The Kolmogorov-Smirnov test (K-S test) is a form of minimum distance estimation used as a nonparametric test of the equality of one-dimensional probability distributions. It can compare a sample with a reference probability distribution (one-sample K-S test) or compare two samples (two-sample K-S test).
The Kolmogorov–Smirnov statistic quantifies a distance between the empirical distribution of the sample and the cumulative distribution function of the reference distribution, or between the empirical distribution functions of two samples.
The null distribution of this statistic is calculated under the null hypothesis that the samples are drawn from the same distribution (in the two-sample case) or that the sample is drawn from the reference distribution (in the one-sample case).
In each case, the distributions considered under the null hypothesis are continuous distributions but are otherwise unrestricted.

**17.** Kolmogorov-Smirnov (K-S) Test
Compares an empirical distribution function with the distribution function of the hypothesized distribution
Advantages
K-S tests do not require grouping the data in any way, so no information is lost; this also eliminates the troublesome problem of specifying intervals
K-S tests are valid (exactly) for any sample size n (in the all-parameters-known case), whereas Chi-Square tests are valid only in an asymptotic sense
K-S tests tend to be more powerful than Chi-Square tests against many alternative distributions

**18.** Kolmogorov-Smirnov (K-S) Test
Disadvantages
The range of applicability is more limited than that of Chi-Square tests
For discrete data, the required critical values are not readily available and must be computed using a complicated set of formulas
The original form of the K-S test is valid only if all the parameters of the hypothesized distribution are known and the distribution is continuous
Extensions allow the parameters to be estimated in the cases of the normal (lognormal), exponential, and Weibull distributions

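**19.** Kolmogorov-Smirnov (K-S) Test
An empirical distribution function $F_n(x)$ from data $X_1, \ldots, X_n$:
$$F_n(x) = \frac{\text{number of } X_i\text{'s} \le x}{n}$$
or
$$F_n(x) = \frac{1}{n}\sum_{i=1}^{n} I(X_i \le x)$$
where $I(X_i \le x)$ is the indicator function, equal to 1 if $X_i \le x$ and equal to 0 otherwise.
Thus, $F_n(x)$ is a step function
If $\hat{F}(x)$ is the fitted distribution function, a natural assessment of goodness of fit is some measure of the distance between $F_n(x)$ and $\hat{F}(x)$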
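Here is a minimal sketch of $F_n$: sort the data once, then count the observations $\le x$ with a binary search. The function name `ecdf` and the sample values are hypothetical.

```python
from bisect import bisect_right

def ecdf(data):
    """Return F_n, where F_n(x) = (number of X_i <= x) / n."""
    xs = sorted(data)
    n = len(xs)

    def F_n(x):
        # bisect_right gives the count of sorted observations that are <= x
        return bisect_right(xs, x) / n

    return F_n

F = ecdf([2.0, 0.5, 3.5, 2.0])
print([F(x) for x in (0.4, 0.5, 2.0, 3.0, 3.5)])  # [0.0, 0.25, 0.75, 0.75, 1.0]
```

The tie at 2.0 produces a jump of $2/n$, which illustrates the step-function shape.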

**20.** Kolmogorov-Smirnov (K-S) Test

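**21.** Kolmogorov-Smirnov (K-S) Test
$D_n = \sup_x \{|F_n(x) - \hat{F}(x)|\}$ can be calculated by:
$$D_n^+ = \max_{1 \le i \le n} \left\{ \frac{i}{n} - \hat{F}(X_{(i)}) \right\}, \qquad D_n^- = \max_{1 \le i \le n} \left\{ \hat{F}(X_{(i)}) - \frac{i-1}{n} \right\}$$
$$D_n = \max\{D_n^+, D_n^-\}$$
Notes:
Direct computation of $D_n^+$ and $D_n^-$ requires sorting the data to obtain the order statistics $X_{(i)}$
For moderate values of $n$ (up to several hundred), sorting can be done quickly by simple methods
If $n$ is large, sorting becomes expensive!
A large value of $D_n$ indicates a poor fit, so we reject the null hypothesis $H_0$ if $D_n$ exceeds some constant $d_{n,1-\alpha}$, where $\alpha$ is the specified level of the test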
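$D_n$ can be computed from the order statistics $X_{(1)} \le \cdots \le X_{(n)}$ using $D_n^+ = \max_i \{ i/n - \hat F(X_{(i)}) \}$, $D_n^- = \max_i \{ \hat F(X_{(i)}) - (i-1)/n \}$ and $D_n = \max\{D_n^+, D_n^-\}$. The sketch below is an illustration with hypothetical function names. It assumes $\hat F$ is continuous and passed in as a Python callable. The usage tests three made-up points against the U(0, 1) CDF.

```python
def ks_statistic(data, cdf):
    """Return (D_n, D_n_plus, D_n_minus) for a continuous fitted CDF."""
    xs = sorted(data)                 # order statistics X_(1) <= ... <= X_(n)
    n = len(xs)
    # With 0-based index i, xs[i] is X_(i+1): so (i + 1)/n plays the role of
    # i/n in the 1-based formula, and i/n plays the role of (i - 1)/n.
    d_plus = max((i + 1) / n - cdf(x) for i, x in enumerate(xs))
    d_minus = max(cdf(x) - i / n for i, x in enumerate(xs))
    return max(d_plus, d_minus), d_plus, d_minus

def uniform01_cdf(x):
    return min(max(x, 0.0), 1.0)

d_n, d_plus, d_minus = ks_statistic([0.7, 0.1, 0.4], uniform01_cdf)
print(round(d_n, 4), round(d_plus, 4), round(d_minus, 4))  # 0.3 0.3 0.1
```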

**24.** Kolmogorov-Smirnov (K-S) Test
Case 1: All parameters of $\hat{F}$ are known
If none of the parameters is estimated in any way from the data, the distribution of $D_n$ does not depend on $\hat{F}$, assuming that $\hat{F}$ is continuous
Instead of testing for $D_n > d_{n,1-\alpha}$, we reject $H_0$ if
$$\left(\sqrt{n} + 0.12 + \frac{0.11}{\sqrt{n}}\right) D_n > c_{1-\alpha}$$
where the $c_{1-\alpha}$ are given in the all-parameters-known row of Table 6.14
This case is the original form of the K-S test
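For Case 1, the sketch below assumes the adjusted statistic $(\sqrt n + 0.12 + 0.11/\sqrt n) D_n$ used in the all-parameters-known row of Table 6.14. It also uses the commonly tabulated critical value $c_{0.95} = 1.358$ for $\alpha = 0.05$. $D_n$ is taken as already computed. The function name and the values $n = 100$, $D_n = 0.14$ are hypothetical.

```python
import math

def ks_adjusted_all_known(d_n, n):
    """Case 1 adjusted statistic: (sqrt(n) + 0.12 + 0.11/sqrt(n)) * D_n."""
    rn = math.sqrt(n)
    return (rn + 0.12 + 0.11 / rn) * d_n

C_95 = 1.358  # assumed c_{1-alpha} for alpha = 0.05, all parameters known

stat = ks_adjusted_all_known(0.14, 100)   # hypothetical D_n and n
print(round(stat, 4), stat > C_95)         # 1.4183 True
```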

**26.** Kolmogorov-Smirnov (K-S) Test
Case 2: Suppose that the hypothesized distribution is $N(\mu, \sigma^2)$ with both $\mu$ and $\sigma^2$ unknown
Estimate $\mu$ and $\sigma^2$ by $\bar{X}(n)$ and $S^2(n)$, respectively
Define $\hat{F}$ to be the $N(\bar{X}(n), S^2(n))$ distribution function
Using this $\hat{F}$, $D_n$ is computed in the same way
We reject $H_0$ if
$$\left(\sqrt{n} - 0.01 + \frac{0.85}{\sqrt{n}}\right) D_n > c'_{1-\alpha}$$
where the $c'_{1-\alpha}$ are given in the $N(\bar{X}(n), S^2(n))$ row of Table 6.14
This case includes a K-S test for the lognormal distribution if the $X_i$'s are the logarithms of the basic data points
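Here is a sketch of Case 2. It assumes the adjusted statistic $(\sqrt n - 0.01 + 0.85/\sqrt n) D_n$ from the normal row of Table 6.14. It estimates $\mu$ by $\bar X(n)$ and $\sigma$ by $S(n)$ (with the $n - 1$ divisor), and uses Python's `statistics.NormalDist` for $\hat F$. It returns both $D_n$ and the adjusted statistic. The adjusted statistic is what gets compared with $c'_{1-\alpha}$ from the table. The function name and data are hypothetical. For the lognormal variant, pass the logarithms of the data.

```python
import math
import statistics

def ks_normal_estimated(data):
    """Case 2: return (D_n, adjusted statistic) for a N(Xbar(n), S^2(n)) fit."""
    n = len(data)
    # NormalDist takes (mu, sigma); stdev uses the n - 1 divisor, i.e. S(n)
    fitted = statistics.NormalDist(statistics.fmean(data), statistics.stdev(data))
    xs = sorted(data)
    d_plus = max((i + 1) / n - fitted.cdf(x) for i, x in enumerate(xs))
    d_minus = max(fitted.cdf(x) - i / n for i, x in enumerate(xs))
    d_n = max(d_plus, d_minus)
    rn = math.sqrt(n)
    return d_n, (rn - 0.01 + 0.85 / rn) * d_n

d_n, adjusted = ks_normal_estimated([-1.0, 0.0, 1.0])
print(round(d_n, 4), round(adjusted, 4))
```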

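**27.** Kolmogorov-Smirnov (K-S) Test
Case 3: Suppose the hypothesized distribution is expo($\beta$) with $\beta$ unknown
$\beta$ is estimated by its MLE $\bar{X}(n)$
Define $\hat{F}$ to be the expo($\bar{X}(n)$) distribution function
Using this $\hat{F}$, $D_n$ is computed
We reject $H_0$ if
$$\left(D_n - \frac{0.2}{n}\right)\left(\sqrt{n} + 0.26 + \frac{0.5}{\sqrt{n}}\right) > c''_{1-\alpha}$$
where the $c''_{1-\alpha}$ are given in the expo($\bar{X}(n)$) row of Table 6.14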
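Case 3 follows the same pattern with an exponential fit. The sketch assumes the adjusted statistic $(D_n - 0.2/n)(\sqrt n + 0.26 + 0.5/\sqrt n)$ from the exponential row of Table 6.14. The result would be compared with $c''_{1-\alpha}$ from the table. The function name and data are hypothetical.

```python
import math

def ks_expo_estimated(data):
    """Case 3: return (D_n, adjusted statistic) for an expo(Xbar(n)) fit."""
    n = len(data)
    beta = sum(data) / n              # MLE of the exponential mean

    def cdf(x):
        return 1.0 - math.exp(-x / beta) if x > 0 else 0.0

    xs = sorted(data)
    d_plus = max((i + 1) / n - cdf(x) for i, x in enumerate(xs))
    d_minus = max(cdf(x) - i / n for i, x in enumerate(xs))
    d_n = max(d_plus, d_minus)
    rn = math.sqrt(n)
    return d_n, (d_n - 0.2 / n) * (rn + 0.26 + 0.5 / rn)

d_n, adjusted = ks_expo_estimated([1.0, 2.0, 3.0])
print(round(d_n, 4), round(adjusted, 4))
```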

**28.** Kolmogorov-Smirnov (K-S) Test
Case 4: Suppose the hypothesized distribution is Weibull with both shape parameter $\alpha$ and scale parameter $\beta$ unknown
Estimate $\alpha$ and $\beta$ by their respective MLEs $\hat{\alpha}$ and $\hat{\beta}$
$\hat{F}$ is taken to be the Weibull($\hat{\alpha}$, $\hat{\beta}$) distribution function
$D_n$ is computed in the usual fashion
We reject $H_0$ if the adjusted K-S statistic $\sqrt{n}\,D_n$ is greater than the modified critical value $c^*_{1-\alpha}$ given in Table 6.15
Note that critical values are available only for certain sample sizes $n$, and that the critical values for $n = 50$ and $n = \infty$ are, fortunately, very similar