WARNING! 175 slides! Print in draft mode, 6 to a page to conserve paper and ink!

TR 555 Statistics “Refresher” Lecture 2: Distributions and Tests

  • Binomial, Normal, Log Normal distributions

  • Chi Square and K.S. tests for goodness of fit and independence

  • Poisson and negative exponential

  • Weibull distributions

  • Test Statistics, sample size and Confidence Intervals

  • Hypothesis testing


Another good reference

  • http://www.itl.nist.gov/div898/handbook/index.htm


Another good reference

  • http://www.ruf.rice.edu/~lane/stat_sim/index.html


Bernoulli Trials

  • Only two possible outcomes on each trial

    (one is arbitrarily labeled success, the other failure)

  • The probability of a success, P(S) = p, is the same for each trial

    (equivalently, the probability of a failure, P(F) = 1 - P(S) = 1 - p, is the same for each trial)

  • The trials are independent


Binomial, A Probability Distribution

  • n = a fixed number of Bernoulli trials

  • p = the probability of success in each trial

  • X = the number of successes in n trials

    The random variable X is called a binomial random variable. Its distribution is called a binomial distribution


The binomial distribution with n trials and success probability p is denoted by the equation

$$P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}, \quad x = 0, 1, \dots, n$$

or

$$P(X = x) = \frac{n!}{x!\,(n-x)!}\, p^x (1-p)^{n-x}$$


The binomial distribution with n trials and success probability p has

  • Mean = $np$

  • Variance = $np(1-p)$

  • Standard deviation = $\sqrt{np(1-p)}$


Binomial Distribution with p=.2, n=5


Binomial Distribution with p=.2, n=10


Binomial Distribution with p=.2, n=30


Binomial Distributions with p=.2 (n=5, n=10, n=30)


Transportation Example

  • The probability of making it safely from city A to city B is 0.9997 (do we generally know this?)

  • Traffic per day is 10,000 trips

  • Assuming independence, what is the probability that there will be more than 3 crashes in a day?

  • What is the expected value of the number of crashes?


Transportation Example

  • Expected value = np = .0003*10000 = 3

  • P(X>3) = 1- [P (X=0) + P (X=1) + P (X=2) + P (X=3)]

  • e.g., P(X=3) = 10000!/(3!*9997!) * .0003^3 * .9997^9997 = .224

  • Don’t just hit 9997! on your calculator!

  • P(X>3) = 1 - [.050 + .149 + .224 + .224] = 1 - .647 = .353, or about 35% (65% is the probability of 3 or fewer crashes)
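
The exact binomial calculation can be checked numerically rather than with a calculator. The sketch below is a minimal illustration, assuming Python 3.8 or later for `math.comb`; the helper name `binomial_pmf` is hypothetical and not part of the lecture.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial random variable with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n = 10_000     # trips per day (from the slide)
p = 0.0003     # probability of a crash on one trip, 1 - 0.9997

expected = n * p                                            # expected crashes per day
p_at_most_3 = sum(binomial_pmf(k, n, p) for k in range(4))  # P(X = 0) + ... + P(X = 3)
p_more_than_3 = 1 - p_at_most_3

print(f"expected crashes = {expected:.1f}")
print(f"P(X <= 3) = {p_at_most_3:.3f}, P(X > 3) = {p_more_than_3:.3f}")
```

`math.comb` works with exact integers, so the large factorials never overflow.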


Continuous probability density functions


Continuous probability density functions

  • The curve describes probability of getting any range of values, say P(X > 120), P(X<100), P(110 < X < 120)

  • Area under the curve = probability

  • Area under whole curve = 1

  • Probability of getting specific number is 0, e.g. P(X=120) = 0


Histogram (Area of rectangle = probability)


Decrease interval size...


Decrease interval size more….


Normal: special kind of continuous p.d.f


Normal distribution


Characteristics of normal distribution

  • Symmetric, bell-shaped curve.

  • Shape of curve depends on population mean μ and standard deviation σ.

  • Center of distribution is μ.

  • Spread is determined by σ.

  • Most values fall around the mean, but some values are smaller and some are larger.


Probability = Area under curve

  • The normal integral has no closed-form solution, so it must be integrated numerically; hence the tables

  • We just need a table of probabilities for every possible normal distribution.

  • But there are an infinite number of normal distributions (one for each μ and σ)!!

  • Solution is to “standardize.”


Standardizing

  • Take value X and subtract its mean μ from it, and then divide by its standard deviation σ. Call the resulting value Z.

  • That is, Z = (X - μ)/σ

  • Z is called the standard normal. Its mean μ is 0 and standard deviation σ is 1.

  • Then, use probability table for Z.


Using Z Table


Suppose we want to calculate $P(X \le x)$, where $X \sim N(\mu, \sigma)$.

We can calculate $z = (x - \mu)/\sigma$

and then use the fact that $P(X \le x) = P(Z \le z)$.

We can find $P(Z \le z)$ from our Z table.


Probability below 65?


Suppose we wanted to calculate $P(Z > z)$.

Then, using the law of complements, we have $P(Z > z) = 1 - P(Z \le z)$.

This is the area under the curve to the right of z.


Probability above 75?


Now suppose we want to calculate $P(a < Z < b)$.

This is the area under the curve between a and b. We calculate this by first calculating the area to the left of b then subtracting the area to the left of a.

Key Formula! $P(a < Z < b) = P(Z < b) - P(Z < a)$
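
The three table look-ups (area to the left, area to the right, area between) can be sketched in a few lines. This is only an illustration: the mean of 70 and standard deviation of 5 are hypothetical values chosen for the example, since the figures behind the "below 65", "above 75" and "between 65 and 70" slides are not in this transcript. The helper names `phi` and `standardize` are also assumptions.

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF, P(Z <= z), computed from the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def standardize(x, mu, sigma):
    """Convert a value x into its Z score, Z = (x - mu) / sigma."""
    return (x - mu) / sigma

# Hypothetical population values, used only for illustration.
mu, sigma = 70.0, 5.0

p_below_65 = phi(standardize(65, mu, sigma))        # area to the left
p_above_75 = 1 - phi(standardize(75, mu, sigma))    # law of complements
p_65_to_70 = phi(standardize(70, mu, sigma)) - phi(standardize(65, mu, sigma))  # key formula

print(f"P(X < 65) = {p_below_65:.4f}")
print(f"P(X > 75) = {p_above_75:.4f}")
print(f"P(65 < X < 70) = {p_65_to_70:.4f}")
```

Here `phi` plays the role of the printed Z table.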


Probability between 65 and 70?


Transportation Example

  • Average speeds are thought to be normally distributed

  • Sample speeds are taken, with sample mean = 74.3 MPH and sigma = 6.9 MPH

  • What is the speed likely to be exceeded only 5% of the time?

  • Z95 = 1.64 (one tail) = (x-74.3)/6.9

  • x = 85.6

  • What % are obeying the 75mph speed limit within a 5MPH grace?
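
Both questions on this slide can be answered with the standard library's `statistics.NormalDist` (Python 3.8 or later). This is a sketch, not the lecture's own solution; it assumes that "within a 5 MPH grace" means speeds of 80 MPH or less.

```python
from statistics import NormalDist

# Mean and standard deviation taken from the slide.
speeds = NormalDist(mu=74.3, sigma=6.9)

# Speed exceeded only 5% of the time (the 95th percentile).
x95 = speeds.inv_cdf(0.95)

# Share of drivers at or below 80 MPH (75 MPH limit plus the ASSUMED 5 MPH grace).
share_within_grace = speeds.cdf(80)

print(f"95th percentile speed = {x95:.1f} MPH")
print(f"share at or below 80 MPH = {share_within_grace:.1%}")
```

The percentile differs slightly from a hand calculation with Z = 1.64 because `inv_cdf` uses the unrounded value 1.645.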


Assessing Normality

  • A normal distribution has a mean approximately equal to its median, is bell shaped, and allows for negative values

  • Histograms

  • Box plots

  • Normal probability plots

  • Chi Square or KS test of goodness of fit


Transforms: Log Normal

  • If data are not normal, log of data may be

  • If so, …


Example of Lognormal transform


Example of Lognormal transform


Chi Square Test

  • AKA cross-classification

  • Non-parametric test

  • Use for nominal scale data (or convert your data to nominal scale/categories)

  • Test for normality (or in general, goodness of fit)

  • Test for independence (can also use Cramer’s coefficient for independence or Kendall’s tau for ratio, interval or ordinal data)

  • If used, it is important to recognize that it formally applies only to discrete data, the bin intervals chosen influence the outcome, and exact methods (Mehta) provide more reliable results, particularly for small sample sizes


Chi Square Test

  • Tests for goodness of fit

  • Assumptions

    • The sample is a random sample.

    • The measurement scale is at least nominal

    • Each cell contains at least 5 observations

    • N observations

    • Break data into c categories

    • H0: observations follow some f(x)


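Chi Square Test

  • Expected number of observations in any cell: $E_i = N p_i$, where $p_i$ is the probability of category i under f(x)

  • The test statistic: $\chi^2 = \sum_{i=1}^{c} (O_i - E_i)^2 / E_i$, where $O_i$ is the observed count in category i

  • Reject (not from the distribution of interest) if chi square exceeds table value at 1-α (c-1-w degrees of freedom, where w is the number of parameters to be estimated)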
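
A goodness-of-fit test follows the same steps every time: compute expected counts, sum the scaled squared differences, and compare with the table value. The sketch below uses hypothetical quarterly crash counts and a uniform H0; neither comes from the lecture. The critical value 7.815 is the standard table entry for α = 0.05 with 3 degrees of freedom.

```python
def chi_square_statistic(observed, probabilities):
    """Return (chi-square statistic, expected counts) for a goodness-of-fit test."""
    n = sum(observed)
    expected = [n * p for p in probabilities]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, expected

# Hypothetical data: crashes counted in each quarter of one year.
observed = [18, 22, 30, 30]
# H0: crashes are equally likely in every quarter (no parameters estimated, so w = 0).
probabilities = [0.25, 0.25, 0.25, 0.25]

chi2, expected = chi_square_statistic(observed, probabilities)
degrees_of_freedom = len(observed) - 1 - 0   # c - 1 - w
CRITICAL_3DF_05 = 7.815                      # table value, alpha = 0.05, 3 d.f.
reject_h0 = chi2 > CRITICAL_3DF_05

print(f"chi-square = {chi2:.2f} with {degrees_of_freedom} d.f., reject H0: {reject_h0}")
```

Every expected count here is 25, which satisfies the "at least 5 observations per cell" assumption.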


Chi Square Test

  • Tests independence of 2 variables

  • Assumptions

    • N observations

    • R categories for one variable

    • C categories for the other variable

    • At least 5 observations in each cell

  • Prepare an r x c contingency table

  • H0: the two variables are independent


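Chi Square Test

  • Expected number of observations in any cell: $E_{ij} = (\text{row } i \text{ total})(\text{column } j \text{ total}) / N$

  • The test statistic: $\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} (O_{ij} - E_{ij})^2 / E_{ij}$

  • Reject (not independent) if chi square exceeds the table value at 1-α with (r - 1)(c - 1) degrees of freedom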
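
For an r x c contingency table the expected counts come from the row and column totals. The following sketch illustrates this with a hypothetical 2 x 2 table (day or night versus injury or no injury); the data are invented for the example. The critical value 3.841 is the standard table entry for α = 0.05 with 1 degree of freedom.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n   # expected count under independence
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)           # (r - 1)(c - 1)
    return statistic, dof

# Hypothetical counts: rows = day / night, columns = injury / no injury.
table = [[30, 20],
         [20, 30]]

chi2, dof = chi_square_independence(table)
CRITICAL_1DF_05 = 3.841                                    # table value, alpha = 0.05, 1 d.f.
reject_independence = chi2 > CRITICAL_1DF_05

print(f"chi-square = {chi2:.2f} with {dof} d.f., reject independence: {reject_independence}")
```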


Transportation Example

Number of crashes during a year


Transportation Example


Transportation Example

Adapted from Ang and Tang, 1975


K.S. Test for goodness of fit

  • Kolmogorov-Smirnov

  • Non-parametric test

  • Use for ratio, interval or ordinal scale data

  • Compare experimental or observed data to a theoretical distribution (CDF)

  • Need to compile a CDF of your data (called an EDF where E means empirical)

  • OK for small samples
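
The K.S. statistic is the largest vertical gap between the empirical distribution function (EDF) and the theoretical CDF. The sketch below shows one way to compute it. The speed sample is hypothetical, and the normal distribution with mean 74.3 and standard deviation 6.9 is borrowed from the earlier speed example purely for illustration.

```python
from statistics import NormalDist

def ks_statistic(sample, cdf):
    """Largest gap between the sample's EDF and a theoretical CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The EDF jumps from (i - 1)/n to i/n at x, so check both sides of the step.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

# Hypothetical spot speeds (MPH), invented for this example.
speeds = [62.1, 68.4, 70.9, 73.5, 75.0, 77.2, 80.3, 86.8]
d_speeds = ks_statistic(speeds, NormalDist(mu=74.3, sigma=6.9).cdf)

print(f"K.S. statistic D = {d_speeds:.3f}")
```

D is then compared with a K.S. table value for the sample size and chosen α; a value larger than the table entry rejects the fit.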


Poisson Distribution

  • The Poisson distribution requires that the mean be approximately equal to the variance

  • Discrete events, whole numbers with small values

  • Positive values

  • e.g., number of crashes or vehicles during a given time


Transportation Example #1

  • On average, 3 crashes per day are experienced on a particular road segment

  • What is the probability that there will be more than 3 crashes in a day?


  • P(X>3) = 1 - [P(X=0) + P(X=1) + P(X=2) + P(X=3)]

  • e.g., P(X=3) = $e^{-3}\,3^{3}/3!$ = .224

  • P(X>3) = 1 - [.050 + .149 + .224 + .224] = 1 - .647 = .353, or about 35% (recognize this number???)
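
The Poisson version of the crash calculation needs only the mean rate. The sketch below is a minimal check of the arithmetic; the helper name `poisson_pmf` is hypothetical.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with mean lam."""
    return exp(-lam) * lam**k / factorial(k)

lam = 3   # average crashes per day (from the slide)

p_at_most_3 = sum(poisson_pmf(k, lam) for k in range(4))
p_more_than_3 = 1 - p_at_most_3

print(f"P(X = 3) = {poisson_pmf(3, lam):.3f}")
print(f"P(X > 3) = {p_more_than_3:.3f}")
```

The result matches the binomial answer for n = 10,000 and p = 0.0003 to three decimals, which is the point of the "recognize this number" remark: with large n and small p, the Poisson with mean np approximates the binomial.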


Transportation Example #2


Negative Binomial Distribution

  • An “over-dispersed” Poisson

  • Variance > mean

  • Also used for crashes and other count data, especially when the data combine Poisson-distributed counts

  • Recall binomial: $P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}$

  • Negative binomial:


(Negative) Exponential Distribution

  • Good for inter-arrival time (e.g., time between arrivals or crashes, gaps)

  • Assumes Poisson counts

  • P(no occurrence in time t) = $e^{-\lambda t}$, where λ is the average arrival rate


Transportation Example

  • In our turn bay design example, what is the probability that no car will arrive in 1 minute? (19%)

  • How many 7-second gaps are expected in one minute? There is an 82% chance that any 7-second period has no car, so 60/7 * 82% = about 7 per minute
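
The gap calculation can be reproduced with the negative exponential formula. The turn bay example itself is not in this transcript, so the flow of 100 vehicles per hour below is an assumption, picked only because it reproduces the 19% and 82% quoted on the slide.

```python
from math import exp

# ASSUMED arrival flow; chosen because it reproduces the slide's 19% and 82%.
flow_per_hour = 100
rate_per_second = flow_per_hour / 3600

def p_no_arrival(t_seconds, rate):
    """P(no arrival in an interval of t seconds) for Poisson arrivals at the given rate."""
    return exp(-rate * t_seconds)

p_empty_minute = p_no_arrival(60, rate_per_second)   # no car in one minute
p_empty_7s = p_no_arrival(7, rate_per_second)        # no car in a 7-second period
gaps_per_minute = 60 / 7 * p_empty_7s                # slide's estimate of 7-second gaps

print(f"P(no car in 1 minute)   = {p_empty_minute:.2f}")
print(f"P(no car in 7 seconds)  = {p_empty_7s:.2f}")
print(f"7-second gaps per minute = {gaps_per_minute:.1f}")
```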


Weibull Distribution

  • Very flexible empirical model


Sampling Distributions


Sampling Distributions

  • Some Definitions

  • Some Common Sense Things

  • An Example

  • A Simulation

  • Sampling Distributions

  • Central Limit Theorem


Definitions

  • Parameter: A number describing a population

  • Statistic: A number describing a sample

  • Random Sample: every unit in the population has an equal probability of being included in the sample

  • Sampling Distribution: the probability distribution of a statistic


Common Sense Thing #1

A random sample should represent the population well, so sample statistics from a random sample should provide reasonable estimates of population parameters


Common Sense Thing #2

All sample statistics have some error in estimating population parameters


Common Sense Thing #3

If repeated samples are taken from a population and the same statistic (e.g. mean) is calculated from each sample, the statistics will vary, that is, they will have a distribution


Common Sense Thing #4

A larger sample provides more information than a smaller sample so a statistic from a large sample should have less error than a statistic from a small sample


Distribution of X-bar when sampling from a normal distribution

  • X-bar has a normal distribution with

  • mean = μ

  • and

  • standard deviation = σ/sqrt(n)


Central Limit Theorem

If the sample size (n) is large enough, X-bar has a normal distribution with

mean = μ

and

standard deviation = σ/sqrt(n)

regardless of the population distribution


What is Large Enough?


Does X-bar have a normal distribution?

  • Is the population normal?

    • Yes: X-bar is normal

    • No: Is n large enough?

      • Yes: X-bar is considered to be normal

      • No: X-bar may or may not be considered normal (We need more info)


Situation

  • Different samples produce different results.

  • Value of a statistic, like mean or proportion, depends on the particular sample obtained.

  • But some values may be more likely than others.

  • The probability distribution of a statistic (“sampling distribution”) indicates the likelihood of getting certain values.


Transportation Example

  • Speed is normally distributed with mean 45 MPH and standard deviation 6 MPH.

  • Take random samples of n = 4.

  • Then, sample means are normally distributed with mean 45 MPH and standard error 3 MPH [from 6/sqrt(4) = 6/2].


Using empirical rule...

  • 68% of samples of n=4 will have an average speed between 42 and 48 MPH.

  • 95% of samples of n=4 will have an average speed between 39 and 51 MPH.

  • 99.7% of samples of n=4 will have an average speed between 36 and 54 MPH.


What happens if we take larger samples?

  • Speed is normally distributed with mean 45 MPH and standard deviation 6 MPH.

  • Take random samples of n = 36 .

  • Then, sample means are normally distributed with mean 45 MPH and standard error 1 MPH [from 6/sqrt(36) = 6/6].


Again, using empirical rule...

  • 68% of samples of n=36 will have an average speed between 44 and 46 MPH.

  • 95% of samples of n=36 will have an average speed between 43 and 47 MPH.

  • 99.7% of samples of n=36 will have an average speed between 42 and 48 MPH.

  • So … the larger the sample, the less the sample averages vary.
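
The effect of sample size on the spread of sample means is easy to tabulate. The sketch below recomputes the empirical-rule ranges for n = 4 and n = 36; the function name `empirical_rule_ranges` is a hypothetical helper.

```python
from math import sqrt

def empirical_rule_ranges(mu, sigma, n):
    """Ranges of mu +/- 1, 2 and 3 standard errors for means of samples of size n."""
    se = sigma / sqrt(n)   # standard error of the mean
    return {k: (mu - k * se, mu + k * se) for k in (1, 2, 3)}

# Population values from the slides: mean 45 MPH, standard deviation 6 MPH.
small = empirical_rule_ranges(45, 6, 4)    # standard error 3 MPH
large = empirical_rule_ranges(45, 6, 36)   # standard error 1 MPH

for label, ranges in (("n=4", small), ("n=36", large)):
    for k, (low, high) in ranges.items():
        print(f"{label}: within {k} SE -> {low:.0f} to {high:.0f} MPH")
```

Going from n = 4 to n = 36 multiplies the sample size by 9 and so divides the standard error by 3.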


Sampling Distributions for Proportions

  • Thought questions

  • Basic rules

  • ESP example

  • Taste test example


Rule for Sample Proportions


Proportion “heads” in 50 tosses

  • Bell curve for possible proportions

  • Curve centered at true proportion (0.50)

  • SD of curve = Square root of [p(1-p)/n]

  • SD = sqrt [0.5(1-0.5)/50] = 0.07

  • By empirical rule, 68% chance that a proportion will be between 0.43 and 0.57


ESP example

  • Five cards are randomly shuffled

  • A card is picked by the researcher

  • Participant guesses which card

  • This is repeated n = 80 times


Many people participate

  • Researcher tests hundreds of people

  • Each person does n = 80 trials

  • The proportion correct is calculated for each person


Who has ESP?

  • What sample proportions go beyond luck?

  • What proportions are within the normal guessing range?


Possible results of ESP experiment

  • 1 in 5 chance of correct guess

  • If guessing, true p = 0.20

  • Typical guesser gets p = 0.20

  • SD of test = Sqrt [0.2(1-0.2)/80] = 0.045


Description of possible proportions

  • Bell curve

  • Centered at 0.2

  • SD = 0.045

  • 99.7% within about 0.07 and 0.33 (+/- 3SD)

  • If hundreds of tests, may find several outside this range (does it mean they have ESP?)
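
The standard deviation of a sample proportion is quick to verify for both the coin-toss and ESP examples. The sketch below is a plain evaluation of sqrt[p(1-p)/n]; for p = 0.2 and n = 80 the formula gives about 0.045. The helper name `proportion_sd` is hypothetical.

```python
from math import sqrt

def proportion_sd(p, n):
    """Standard deviation of a sample proportion for true proportion p and sample size n."""
    return sqrt(p * (1 - p) / n)

coin_sd = proportion_sd(0.5, 50)   # proportion of heads in 50 tosses
esp_sd = proportion_sd(0.2, 80)    # proportion of correct guesses in 80 ESP trials

# Range covering almost all pure guessers (+/- 3 SD around 0.2).
esp_low, esp_high = 0.2 - 3 * esp_sd, 0.2 + 3 * esp_sd

print(f"coin SD = {coin_sd:.3f}")
print(f"ESP SD = {esp_sd:.3f}, +/- 3 SD range = {esp_low:.3f} to {esp_high:.3f}")
```

With hundreds of participants, a handful falling outside the 3 SD range is expected from chance alone.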


Transportation Example


Concepts of Confidence Intervals


Confidence Interval

  • A range of reasonable guesses at a population value, a mean for instance

  • Confidence level = chance that range of guesses captures the population value

  • Most common confidence level is 95%


General Format of a Confidence Interval

  • estimate +/- margin of error


Transportation Example: Accuracy of a mean

  • A sample of n=36 has mean speed = 75.3.

  • The SD = 8 .

  • How well does this sample mean estimate the population mean ?


Standard Error of Mean

  • SEM = SD of sample / square root of n

  • SEM = 8 / square root ( 36) = 8 / 6 = 1.33

  • Margin of error of mean = 2 x SEM

  • Margin of Error = 2.66 , about 2.7


Interpretation

  • 95% chance the sample mean is within 2.7 MPH of the population mean. (Question: what is the implication for enforcement of a Type I error? A Type II error?)

  • A 95% confidence interval for the population mean

  • sample mean +/- margin of error

  • 75.3 +/-2.7 ; 72.6 to 78.0


For Large Population

  • Could the mean speed be 72 MPH ?

  • Maybe, but our interval doesn't include 72.

  • It's likely that population mean is above 72.


C.I. for mean speed at another location

  • n=49

  • sample mean=70.3 MPH, SD = 8

  • SEM = 8 / square root(49) = 1.1

  • margin of error=2 x 1.1 = 2.2

  • Interval is 70.3 +/- 2.2

  • 68.1 to 72.5


Do locations 1 and 2 differ in mean speed?

  • C.I. for location 1 is 72.6 to 78.0

  • C.I. for location 2 is 68.1 to 72.5

  • No overlap between intervals

  • Looks safe to say that population means differ
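
The two rough 95% intervals and the overlap check can be reproduced as follows. This is a sketch using the slides' "2 x SEM" rule of thumb; the helper name `rough_ci` is hypothetical.

```python
from math import sqrt

def rough_ci(mean, sd, n, multiplier=2.0):
    """Approximate 95% confidence interval: mean +/- multiplier * SD / sqrt(n)."""
    sem = sd / sqrt(n)           # standard error of the mean
    margin = multiplier * sem    # margin of error
    return mean - margin, mean + margin

location_1 = rough_ci(75.3, 8, 36)   # n = 36, mean 75.3, SD 8
location_2 = rough_ci(70.3, 8, 49)   # n = 49, mean 70.3, SD 8

# The intervals do not overlap if one ends before the other begins.
no_overlap = location_2[1] < location_1[0] or location_1[1] < location_2[0]

print(f"location 1: {location_1[0]:.1f} to {location_1[1]:.1f}")
print(f"location 2: {location_2[0]:.1f} to {location_2[1]:.1f}")
print(f"no overlap: {no_overlap}")
```

Without the slide's rounding of the SEM to 1.1, the location 2 interval prints as 68.0 to 72.6 rather than 68.1 to 72.5. The intervals still do not overlap, though only narrowly.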


Thought Question

  • Study compares speed reduction due to enforcement vs. education

  • 95% confidence intervals for mean speed reduction

    • Cop on side of road : 13.4 to 18.0

    • Speed monitor only : 6.4 to 11.2


Part A

  • Do you think this means that 95% of locations with cop present will lower speed between 13.4 and 18.0 MPH?

  • Answer : No. The interval is a range of guesses at the population mean.

  • This interval doesn't describe individual variation.


Part B

  • Can we conclude that there's a difference between mean speed reduction of the two programs ?

  • This is a reasonable conclusion. The two confidence intervals don't overlap.

  • It seems the population means are different.


Direct look at the difference

  • For cop present, mean speed reduction = 15.8 MPH

  • For sign only, mean speed reduction = 8.8 MPH

  • Difference = 7 MPH more reduction by enforcement method


Confidence Interval for Difference

  • 95% confidence interval for difference in mean speed reduction is 3.5 to 10.5 MPH.

    • Don't worry about the calculations.

  • This interval is entirely above 0.

  • This rules out "no difference" ; 0 difference would mean no difference.


Confidence Interval for a Mean

when you have a “small” sample...


As long as you have a “large” sample….

A confidence interval for a population mean is:

$$\bar{x} \pm Z \frac{s}{\sqrt{n}}$$

where the average, standard deviation, and n depend on the sample, and Z depends on the confidence level.


Transportation Example

Random sample of 59 similar locations produces an average crash rate of 273.2. Sample standard deviation was 94.40.

We can be 95% confident that the average crash rate was between 249.11 and 297.29
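
The large-sample interval can be checked with the standard library. The sketch below assumes Python 3.8 or later for `statistics.NormalDist`; the function name `z_interval` is a hypothetical helper.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(mean, sd, n, confidence=0.95):
    """Large-sample confidence interval for a mean: mean +/- Z * sd / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    margin = z * sd / sqrt(n)
    return mean - margin, mean + margin

# Values from the slide: 59 locations, average crash rate 273.2, SD 94.40.
low, high = z_interval(273.2, 94.40, 59)

print(f"95% CI: {low:.2f} to {high:.2f}")
```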


What happens if you can only take a “small” sample?

  • Random sample of 15 similar location crash rates had an average of 6.4 with standard deviation of 1.

  • What is the average crash rate at all similar locations?


If you have a “small” sample...

Replace the Z value with a t value to get:

$$\bar{x} \pm t \frac{s}{\sqrt{n}}$$

where “t” comes from Student’s t distribution, and depends on the sample size through the degrees of freedom “n-1”

Can also use the tau test for very small samples


Student’s t distribution versus Normal Z distribution


T distribution

  • Very similar to standard normal distribution, except:

  • t depends on the degrees of freedom “n-1”

  • more likely to get extreme t values than extreme Z values


Let’s compare t and Z values

For small samples, the t value is larger than the Z value.

So, the t interval is made to be longer than the Z interval.


OK, enough theorizing! Let’s get back to our example!

Sample of 15 locations has an average crash rate of 6.4 with standard deviation of 1.

Need t with n-1 = 15-1 = 14 d.f.

For 95% confidence, t14 = 2.145

We can be 95% confident that average crash rate is between 5.85 and 6.95.
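
The small-sample interval uses the same arithmetic with a t value in place of Z. The Python standard library has no t-distribution quantile function, so the sketch below takes the table value 2.145 (14 degrees of freedom, 95% confidence) from the slide as an input. The function name `t_interval` is hypothetical.

```python
from math import sqrt

def t_interval(mean, sd, n, t_value):
    """Small-sample confidence interval for a mean: mean +/- t * sd / sqrt(n)."""
    margin = t_value * sd / sqrt(n)
    return mean - margin, mean + margin

# Values from the slide: 15 locations, average 6.4, SD 1, t with 14 d.f. = 2.145.
low, high = t_interval(6.4, 1.0, 15, t_value=2.145)

print(f"95% CI: {low:.2f} to {high:.2f}")
```

With Z = 1.96 instead of t = 2.145 the interval would be slightly narrower, which is the extra caution the t distribution builds in for small samples.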


What happens as sample gets larger?


What happens to CI as sample gets larger?

For large samples:

Z and t values become almost identical, so CIs are almost identical.


One not-so-small problem!

  • It is only OK to use the t interval for small samples if your original measurements are normally distributed.

  • We’ll learn how to check for normality in a minute.


Strategy for deciding how to analyze

  • If you have a large sample of, say, 60 or more measurements, then don’t worry about normality, and use the z-interval.

  • If you have a small sample and your data are normally distributed, then use the t-interval.

  • If you have a small sample and your data are not normally distributed, then do not use the t-interval, and stay tuned for what to do.


Hypothesis tests

  • Test should begin with a set of specific, testable hypotheses that can be tested using data:

    • Not a meaningful hypothesis – Was safety improved by improvements to roadway

    • Meaningful hypothesis – Were speeds reduced when traffic calming was introduced.

  • Usually to demonstrate evidence that there is a difference in measurable quantities

  • Hypothesis testing is a decision-making tool.


Hypothesis Step 1

  • Provide one working hypothesis – the null hypothesis – and an alternative

  • The null or nil hypothesis convention is generally that nothing happened

    • Example

      • speeds were not reduced after traffic calming – Null Hypothesis

      • Speeds were reduced after traffic calming – Alternative Hypothesis

  • When stating the hypothesis, the analyst must think of the impact of the potential error.


Step 2, select appropriate statistical test

  • The analyst may wish to test

    • Changes in the mean of events

    • Changes in the variation of events

    • Changes in the distribution of events


Step 3, Formulate decision rules and set levels for the probability of error

[Figure: “Accept” and “Reject” regions. The area where we incorrectly accept is the Type II error, referred to as β. The area where we incorrectly reject is the Type I error, referred to as α (significance level).]


Type I and II errors

                    μ lies in              μ lies in
                    acceptance interval    rejection interval
Accept the claim    No error               Type II error
Reject the claim    Type I error           No error


Levels of α and β

  • Often β is not considered in the development of the test.

  • There is a trade-off between α and β

  • Overemphasis is placed on the level of significance of the test.

  • The level of α should be appropriate for the decision being made.

    • Small values for decisions where errors cannot be tolerated and β errors are less likely

    • Larger values where type I errors can be more easily tolerated


Step 4 Check statistical assumption

  • Draw new samples to check answer

  • Check the following assumption

    • Are data continuous or discrete

    • Plot data

    • Inspect to make sure that data meets assumptions

      • For example, the normal distribution assumes that mean = median

    • Inspect results for reasonableness


Step 5 Make decision

  • Typical misconceptions

    • Alpha is the most important error

    • Hypothesis tests are unconditional

      • It does not provide evidence that the working hypothesis is true

    • Hypothesis test conclusions are correct

      • Assume

        • 300 independent tests

        • 100 rejections of the working hypothesis

        • α = 0.05 and β = 0.10

        • Thus 0.05 x 100 = 5 Type I errors

        • And 0.1 x 200 = 20 Type II errors

        • 25 times out of 300 the test results were wrong


Transportation Example

The crash rates below, in crashes per 100 million vehicle miles, were calculated for fifty 20-mile-long segments of interstate highway during 2002

.34  .22  .40  .25  .31  .34  .26  .55  .43  .34

.31  .43  .28  .33  .23  .40  .39  .38  .21  .43

.20  .20  .36  .48  .36  .30  .27  .42  .27  .28

.43  .45  .38  .54  .39  .55  .25  .35  .39  .43

.26  .17  .30  .40  .16  .32  .34  .46  .37  .33

x-bar = 0.35

s^2 = 0.0090

s = 0.095
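
The summary statistics can be recomputed directly from the fifty crash rates. The sketch below uses the standard library's `statistics` module, whose `variance` and `stdev` functions use the n - 1 (sample) denominator.

```python
from statistics import mean, stdev, variance

# The fifty crash rates from the slide, row by row.
crash_rates = [
    .34, .22, .40, .25, .31, .34, .26, .55, .43, .34,
    .31, .43, .28, .33, .23, .40, .39, .38, .21, .43,
    .20, .20, .36, .48, .36, .30, .27, .42, .27, .28,
    .43, .45, .38, .54, .39, .55, .25, .35, .39, .43,
    .26, .17, .30, .40, .16, .32, .34, .46, .37, .33,
]

x_bar = mean(crash_rates)       # sample mean
s2 = variance(crash_rates)      # sample variance (n - 1 denominator)
s = stdev(crash_rates)          # sample standard deviation

print(f"n = {len(crash_rates)}, mean = {x_bar:.4f}, variance = {s2:.4f}, sd = {s:.3f}")
```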


Example continued

Crash rates are collected from non-interstate system highways built to slightly lower design standards. Similarly, an average crash rate is calculated and it is greater (0.53). Also assume that both means have the same standard deviation (0.095). The question is whether we arrive at the same accident rate with both facilities. Our hypothesis is that both have the same means: μf = μnf. Can we accept or reject our hypothesis?


Example continued

[Figure: “Accept” and “Reject” regions, with the values 0.34 and 0.53 marked. The area where we incorrectly accept is the Type II error; the area where we incorrectly reject is the Type I error. Is this part of the crash rate distribution for interstate highways?]


Example continued

  • Let’s set the probability of a Type I error at 5%

    • set (upper boundary - 0.35)/ 0.095 =1.645 (one tail Z (cum) for 95%)

    • Upper boundary = 0.51

    • Therefore, we reject the hypothesis

    • What’s the probability of a Type II error?

    • (0.51 – 0.53)/0.095 = -0.21

    • 41.7%

    • There is a 41.7% chance of what?
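
The boundary and the Type II error probability can be retraced step by step. The sketch below follows the slide's own procedure, including rounding the boundary to 0.51 before computing the Type II probability. Reading β as the chance of accepting "same mean" when the true mean is 0.53 is offered here as an interpretation of the slide's closing question, not as the lecture's stated answer.

```python
from statistics import NormalDist

z = NormalDist()   # standard normal

mean_interstate = 0.35   # interstate mean crash rate (from the slide)
mean_other = 0.53        # non-interstate mean crash rate (from the slide)
sd = 0.095               # common standard deviation (from the slide)

# One-tailed 5% rejection boundary, rounded to two decimals as on the slide.
upper_boundary = round(mean_interstate + z.inv_cdf(0.95) * sd, 2)
reject_hypothesis = mean_other > upper_boundary

# Type II error: chance of landing below the boundary if the true mean is 0.53.
beta = z.cdf((upper_boundary - mean_other) / sd)

print(f"upper boundary = {upper_boundary}")
print(f"reject hypothesis of equal means: {reject_hypothesis}")
print(f"P(Type II error) = {beta:.3f}")
```

Using the unrounded boundary (about 0.506) would give a somewhat smaller β, so the 41.7% figure depends on the rounding.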


The P value … example: Grade inflation

H0: μ = 2.7

HA: μ > 2.7

Data (random sample of students)

n = 36

s = 0.6

Decision Rule

Set significance level α = 0.05.

If p-value ≤ 0.05, reject null hypothesis.


Example: Grade inflation?

Population of 5 million college students

Is the average GPA 2.7?

Sample of 100 college students

How likely is it that 100 students would have an average GPA as large as 2.9 if the population average was 2.7?


The p-value illustrated

How likely is it that 100 students would have an average GPA as large as 2.9 if the population average was 2.7?


Determining the p-value

H0: μ = average population GPA = 2.7

HA: μ = average population GPA > 2.7

If 100 students have average GPA of 2.9 with standard deviation of 0.6, the P-value is:

$$P(\bar{X} \ge 2.9 \mid \mu = 2.7) = P\left(Z \ge \frac{2.9 - 2.7}{0.6/\sqrt{100}}\right) = P(Z \ge 3.33) \approx 0.0004$$


Making the decision

  • The p-value is “small.” It is unlikely that we would get a sample average as large as 2.9 if the average GPA of the population was 2.7.

  • Reject H0. There is sufficient evidence to conclude that the average GPA is greater than 2.7.
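
The test statistic and p-value for the grade inflation example can be computed as below. The sketch uses the figures from the 100-student example (mean 2.9, standard deviation 0.6, null value 2.7); the function name `right_tailed_p_value` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def right_tailed_p_value(sample_mean, null_mean, sd, n):
    """Return (Z statistic, p-value) for H0: mu = null_mean versus HA: mu > null_mean."""
    z = (sample_mean - null_mean) / (sd / sqrt(n))
    p_value = 1 - NormalDist().cdf(z)   # area in the right tail
    return z, p_value

z_stat, p_value = right_tailed_p_value(2.9, 2.7, 0.6, 100)
reject_h0 = p_value <= 0.05

print(f"Z = {z_stat:.2f}, p-value = {p_value:.5f}, reject H0: {reject_h0}")
```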


Terminology

  • H0: μ = 2.7 versus HA: μ > 2.7 is called a “right-tailed” or a “one-sided” hypothesis test, since the p-value is in the right tail.

  • Z = 3.33 is called the “test statistic”.

  • If we consider a p-value small when it is less than 0.05, then the probability that we make a Type I error is 0.05. This is called the “significance level” of the test. We say α = 0.05, where α is “alpha”.


Alternative Decision Rule

  • “Reject if p-value ≤ 0.05” is equivalent to “reject if the sample average, X-bar, is larger than 2.865”

  • X-bar > 2.865 is called “rejection region.”


Minimize chance of Type I error...

  • … by making significance level α small.

  • Common values are α = 0.01, 0.05, or 0.10.

  • “How small” depends on seriousness of Type I error.

  • Decision is not a statistical one but a practical one (set alpha small for safety analysis, larger for traffic congestion, say)


Type II Error and Power

  • “Power” of a test is the probability of rejecting null when alternative is true.

  • “Power” = 1 - P(Type II error)

  • To minimize the P(Type II error), we equivalently want to maximize power.

  • But power depends on the value under the alternative hypothesis ...


Type II Error and Power

(Alternative is true)


Factors affecting power...

  • Difference between value under the null and the actual value

  • P(Type I error) = α

  • Standard deviation

  • Sample size


Strategy for designing a good hypothesis test

  • Use pilot study to estimate std. deviation.

  • Specify α. Typically 0.01 to 0.10.

  • Decide what a meaningful difference would be between the mean in the null and the actual mean.

  • Decide power. Typically 0.80 to 0.99.

  • Simple to use software to determine sample size …


How to determine sample size

Depends on experiment

Basically, use the formulas and let sample size be the factor you want to determine

Vary the confidence interval, alpha and beta

http://www.ruf.rice.edu/~lane/stat_sim/conf_interval/index.html


If sample is too small ...

  • … the power can be too low to identify even large meaningful differences between the null and alternative values.

    • Determine sample size in advance of conducting study.

    • Don’t believe the “fail-to-reject-results” of a study based on a small sample.


If sample is really large ...

  • … the power can be extremely high for identifying even meaningless differences between the null and alternative values.

    • In addition to performing hypothesis tests, use a confidence interval to estimate the actual population value.

    • If a study reports a “reject result,” ask how much different?


The moral of the story as researcher

  • Always determine how many measurements you need to take in order to have high enough power to achieve your study goals.

  • If you don’t know how to determine sample size, ask a statistical consultant to help you.


Important “Boohoo!” Point

  • Neither decision entails proving the null hypothesis or the alternative hypothesis.

  • We merely state there is enough evidence to behave one way or the other.

  • This is also always true in statistics! No matter what decision we make, there is always a chance we made an error.

  • Boohoo!


Comparing the means of two dependent populations l.jpg

Comparing the Means of Two Dependent Populations

The Paired T-test ….


Assumptions 2 sample t test l.jpg

Assumptions: 2-Sample T-Test

  • Data in each group follow a normal distribution.

  • For pooled test, the variances for each group are equal.

  • The samples are independent. That is, who is in the second sample doesn’t depend on who is in the first sample (and vice versa).


What happens if samples aren t independent l.jpg

What happens if samples aren’t independent?

That is, they are “dependent” or “correlated”?


Do signals with all red clearance phases have lower numbers of crashes than those without l.jpg

Do signals with all-red clearance phases have lower numbers of crashes than those without?

                 All-Red   No All-Red
                 60        32
                 32        44
                 80        22
                 50        40
Sample Average:  55.5      34.5

Real question is whether intersections with similar traffic volumes have different numbers of crashes. Better then to compare the difference in crashes in “pairs” of intersections with and without all-red clearance phases.


Now a paired study l.jpg

Now, a Paired Study

Crashes

Traffic    No all-red   All-red   Difference
Low        22           20        2.0
Medium     29           28        1.0
Med-high   35           32        3.0
High       80           78        2.0
Averages   41.5         39.5      2.0
St. Dev    26.1         26.1      0.816

P-value = How likely is it that a paired sample would have an average difference as large as 2 if the true difference were 0? (H0: no difference.) The problem reduces to a one-sample t-test on the differences!


The paired t test statistic l.jpg

The Paired-T Test Statistic

  • If:

  • there are n pairs

  • and the differences are normally distributed

Then:

The test statistic, which follows a t-distribution with n-1 degrees of freedom, gives us our p-value:
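The formula itself did not survive in this transcript. The standard paired-t statistic, written with assumed symbols (d-bar for the sample mean of the n differences, s_d for their sample standard deviation), is:

```latex
t = \frac{\bar{d} - 0}{s_d / \sqrt{n}}
```

For the four intersection pairs in the paired study, this gives t = 2.0 / (0.816 / √4) = 4.90.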


The paired t confidence interval l.jpg

The Paired-T Confidence Interval

  • If:

  • there are n pairs

  • and the differences are normally distributed

Then:

The confidence interval, with t following a t-distribution with n-1 d.f., estimates the actual population difference:
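The interval formula is also missing from this transcript. In the same assumed notation (d-bar and s_d for the mean and standard deviation of the n differences), the standard paired-t confidence interval is:

```latex
\bar{d} \;\pm\; t_{\alpha/2,\; n-1} \, \frac{s_d}{\sqrt{n}}
```

For the four intersection pairs, with the table value t = 3.182 for 95% confidence and 3 d.f., this is 2.000 ± 3.182 × 0.408, or (0.701, 3.299).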


Data analyzed as 2 sample t l.jpg

Data analyzed as 2-Sample T

Two sample T for No all-red vs All-red

       N   Mean   StDev   SE Mean
No     4   41.5   26.2    13
All    4   39.5   26.1    13

95% CI for mu No - mu All: ( -43, 47)

T-Test mu No = mu All (vs not =): T = 0.11

P = 0.92 DF = 6

Both use Pooled StDev = 26.2

P = 0.92. Do not reject null. Insufficient evidence to conclude that there is a real difference.


Data analyzed as paired t l.jpg

Data analyzed as Paired T

Paired T for No all-red vs All-red

             N   Mean    StDev   SE Mean
No           4   41.5    26.2    13.1
All          4   39.5    26.1    13.1
Difference   4   2.000   0.816   0.408

95% CI for mean difference: (0.701, 3.299)

T-Test of mean difference = 0 (vs not = 0):

T-Value = 4.90 P-Value = 0.016

P = 0.016. Reject null. Sufficient evidence to conclude that there IS a difference.
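Both software outputs can be reproduced from the four pairs of crash counts. The sketch below uses only the Python standard library; the helper `two_sided_p` is a hypothetical name, and it gets the p-value by numerically integrating the t density (Simpson's rule) rather than calling a statistics package.

```python
import math
from statistics import mean, stdev, variance

no_all_red = [22, 29, 35, 80]
all_red = [20, 28, 32, 78]


def two_sided_p(t, df, steps=20000):
    """Two-sided p-value for a t statistic via Simpson's rule (steps is even)."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    upper = abs(t)
    h = upper / steps
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central_area = 2 * total * h / 3  # P(-|t| < T < |t|)
    return 1 - central_area


# Pooled two-sample t-test (treats the two columns as independent samples).
n1, n2 = len(no_all_red), len(all_red)
pooled_var = ((n1 - 1) * variance(no_all_red)
              + (n2 - 1) * variance(all_red)) / (n1 + n2 - 2)
se_pooled = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t_pooled = (mean(no_all_red) - mean(all_red)) / se_pooled
p_pooled = two_sided_p(t_pooled, n1 + n2 - 2)

# Paired t-test: a one-sample t-test on the within-pair differences.
diffs = [a - b for a, b in zip(no_all_red, all_red)]
n = len(diffs)
t_paired = mean(diffs) / (stdev(diffs) / math.sqrt(n))
p_paired = two_sided_p(t_paired, n - 1)

print(f"two-sample: T = {t_pooled:.2f}, P = {p_pooled:.2f}")
print(f"paired:     T = {t_paired:.2f}, P = {p_paired:.3f}")
```

The only difference between the two analyses is the standard error: about 18.5 for the pooled test versus 0.408 for the differences.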


What happened l.jpg

What happened?

  • P-value from two-sample t-test is just plain wrong. (Assumptions not met.)

  • We removed or “blocked out” the extra variability in the data due to differences in traffic, thereby focusing directly on the differences in crashes.

  • The paired t-test is more “powerful” because the paired design reduces the variability in the data.


Ways pairing can occur l.jpg

Ways Pairing Can Occur

  • When subjects in one group are “matched” with a similar subject in the second group.

  • When subjects serve as their own control by receiving both of two different treatments.

  • When, in “before and after” studies, the same subjects are measured twice.


If variances of the measurements of the two groups are not equal l.jpg

If variances of the measurements of the two groups are not equal...

Estimate the standard error of the difference as:

Then the sampling distribution is an approximate t distribution with a complicated formula for d.f.
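The standard-error formula is missing from this transcript. The usual unequal-variance (Welch) estimate and the Welch-Satterthwaite approximation for the degrees of freedom are given below; the symbols s_1, s_2 (sample standard deviations) and n_1, n_2 (sample sizes) are assumed notation.

```latex
SE(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}
\qquad
\text{d.f.} \approx
\frac{\left( \dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2} \right)^{2}}
     {\dfrac{(s_1^2 / n_1)^2}{n_1 - 1} + \dfrac{(s_2^2 / n_2)^2}{n_2 - 1}}
```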


If variances of the measurements of the two groups are equal l.jpg

If variances of the measurements of the two groups are equal...

Estimate the standard error of the difference using the common pooled variance:

where

Then the sampling distribution is a t distribution with n1+n2-2 degrees of freedom.

Assume variances are equal only if neither sample standard deviation is more than twice that of the other sample standard deviation.
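The two formulas on this slide are missing from the transcript. The standard pooled-variance versions, in assumed notation (s_1, s_2 for the sample standard deviations, n_1, n_2 for the sample sizes, s_p for the pooled standard deviation), are:

```latex
SE(\bar{x}_1 - \bar{x}_2) = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}
\qquad \text{where} \qquad
s_p^2 = \frac{(n_1 - 1)\, s_1^2 + (n_2 - 1)\, s_2^2}{n_1 + n_2 - 2}
```

For the all-red example, s_1 = 26.2 and s_2 = 26.1 with n_1 = n_2 = 4 give s_p = 26.2, matching the pooled standard deviation in the two-sample output.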


Assumptions for correct p values l.jpg

Assumptions for correct P-values

  • Data in each group follow a normal distribution.

  • If using the pooled t-test, the variances for each group are equal.

  • The samples are independent. That is, who is in the second sample doesn’t depend on who is in the first sample (and vice versa).


Interpreting a confidence interval for the difference in two means l.jpg

Interpreting a confidence interval for the difference in two means…


Difference in variance l.jpg

Difference in variance

  • Use F distribution test

  • Compute F = s1^2/s2^2

  • Largest sample variance on top

  • Look up in F table with n1-1 and n2-1 DOF

  • Reject the hypothesis that the variances are the same if the computed F exceeds the table value

  • If used to test if a model is the same (same coefficients) during 2 periods, it is called the Chow test
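The first three steps above can be sketched as follows, reusing the crash counts from the paired study purely as example data. The function name `variance_ratio` is hypothetical, and the critical value still has to come from an F table at the chosen significance level.

```python
from statistics import variance

no_all_red = [22, 29, 35, 80]
all_red = [20, 28, 32, 78]


def variance_ratio(x, y):
    """Return (F, numerator d.f., denominator d.f.), larger variance on top."""
    var_x, var_y = variance(x), variance(y)
    if var_x >= var_y:
        return var_x / var_y, len(x) - 1, len(y) - 1
    return var_y / var_x, len(y) - 1, len(x) - 1


f_stat, df_num, df_den = variance_ratio(no_all_red, all_red)
print(f"F = {f_stat:.3f} with {df_num} and {df_den} d.f.")
```

An F this close to 1 gives no reason to doubt equal variances, which agrees with the rule of thumb that neither standard deviation is more than twice the other.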


Experiments and pitfalls l.jpg

Experiments and pitfalls

  • Types of safety experiments

    • Before and after tests

    • Cross sectional tests

      • Control sample

      • Modified sample

      • Similar or the same condition must take place for both samples


Regression to the mean l.jpg

Regression to the mean

  • This problem plagues before and after tests

    • Before and after tests require less data and therefore are more popular

  • Because safety improvements are driven by abnormally high crash rates, crash rates are likely to go down whether or not an improvement is made.
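A small simulation makes the effect visible. It is a sketch under stated assumptions, not real data: 1,000 hypothetical sites all share the same true mean of 10 crashes per period, counts are Poisson, and nothing is changed between the two periods. The 100 sites with the most crashes in the first period are then "selected for improvement".

```python
import math
import random
from statistics import mean

rng = random.Random(555)  # fixed seed so the sketch is repeatable
TRUE_MEAN = 10            # assumed true crashes per period at every site
N_SITES = 1000


def poisson(lam):
    """Draw one Poisson count using Knuth's multiplication method."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


before = [poisson(TRUE_MEAN) for _ in range(N_SITES)]
after = [poisson(TRUE_MEAN) for _ in range(N_SITES)]

# Pick the 100 sites that looked worst in the "before" period.
worst = sorted(range(N_SITES), key=lambda i: before[i], reverse=True)[:100]
before_worst = mean([before[i] for i in worst])
after_worst = mean([after[i] for i in worst])

print(f"selected sites, before: {before_worst:.1f} crashes per site")
print(f"selected sites, after:  {after_worst:.1f} crashes per site")
```

The selected sites fall back toward the true mean in the second period even though no improvement was made, so a naive before-and-after comparison would credit the "treatment" with the drop.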


San francisco intersection crash data l.jpg

San Francisco intersection crash data


Spill over and migration impacts l.jpg

Spill over and migration impacts

Improvements, particularly those that are expected to modify behavior, should be expected to have spill over impacts at other locations. For example, suppose that red light running cameras were installed at several locations throughout a city. Based on the video evidence, this jurisdiction has the ability to ticket violators and, therefore, less red light running is expected throughout the system, leading to spill over impacts. A second database is available for a similar control set of intersections that are believed not to be impacted by spill over. The results are listed below:


Spill over impact l.jpg

Spill over impact

Using the control sites as the baseline, the number of crashes that would naturally have occurred at the treated sites in the after period, had the intersections remained untreated, is 100 x 112/140 = 80. Therefore, the reduction is really from 80 crashes to 64 crashes (expected vs. actual after), or 20%, rather than from 100 crashes to 64 crashes (before vs. after), or 36%.
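The same arithmetic as a short sketch. The results table is missing from this transcript, so reading 140 and 112 as the control group's before and after crash counts is an assumption inferred from the ratio used above; the variable names are hypothetical.

```python
treated_before, treated_after = 100, 64
control_before, control_after = 140, 112  # assumed meaning of 140 and 112

# Crashes expected at the treated sites had they been left untreated.
expected_untreated = treated_before * control_after / control_before

naive_reduction = (treated_before - treated_after) / treated_before
adjusted_reduction = (expected_untreated - treated_after) / expected_untreated

print(f"expected without treatment: {expected_untreated:.0f} crashes")
print(f"naive reduction:    {naive_reduction:.0%}")
print(f"adjusted reduction: {adjusted_reduction:.0%}")
```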


Spurious correlations l.jpg

Spurious correlations

During the 1980s and early 1990s the Japanese economy was growing at a much greater rate than the U.S. economy. A professor on loan to the Federal Reserve wrote a paper on the Japanese economy, compared the growth rate of the Japanese economy with Japan's investment in transportation infrastructure, and found a strong correlation. At the same time, the U.S. was investing a much lower percentage of GDP in infrastructure and our GNP was growing at a much lower rate. The resulting conclusion was that if we wanted to grow the economy we should invest in public infrastructure like the Japanese. The American Road and Transportation Builders Association (ARTBA) loved his findings, and at the 1992 annual meeting of the TRB the economist, a Bates College professor, won an award. In 1992 the Japanese economy went in the tank and the U.S. economy started its longest economic boom.


Spurious correlation cont l.jpg

Spurious Correlation Cont.

What is going on here?

What is the nature of the relationship between transportation investment and economic growth?

