Statistical Inference and Regression Analysis: Stat-GB.3302.30, Stat-UB.0015.01


Part 5 – Hypothesis Testing



Professor William Greene

Stern School of Business

IOMS Department

Department of Economics

Objectives of Statistical Analysis
• Estimation
• How long do hard drives last?
• What is the median income among the 99%ers?
• Inference – hypothesis testing
• Did minorities pay higher mortgage rates during the housing boom?
• Is there a link between environmental factors and breast cancer on eastern Long Island?
General Frameworks
• Parametric Tests: features of specific distributions such as the mean of a Bernoulli or normal distribution.
• Specification Tests (Semiparametric)
• Do the data arrive from a Poisson process?
• Are the data normally distributed?
• Nonparametric Tests: Are two discrete processes independent?
Hypotheses
• Hypotheses - labels
• State 0 of Nature – Null Hypothesis
• State 1 – Alternative Hypothesis
• Exclusive: Prob(H0 ∩ H1) = 0
• Exhaustive: Prob(H0) + Prob(H1) = 1
• Symmetric: Neither is intrinsically “preferred” – the objective of the study is only to support one or the other. (Rare?)
Does the New Drug Work?
• Hypotheses: H0: θ = .50, H1: θ = .75
• Priors: P0 = .40, P1 = .60
• Clinical Trial: N = 50, 31 patients “respond”, p = .62
• Likelihoods:
• L0 (31 | θ = .50) = Binomial(50,31,.50) = .0270059
• L1 (31 | θ = .75) = Binomial(50,31,.75) = .0148156
• Posterior odds in favor of H0 = (.4/.6)(.0270059/.0148156) = 1.2152 > 1
• Priors favored H1 1.5 to 1, but the posterior odds favor H0, 1.2152 to 1. The evidence discredits H1 even though the ‘data’ seem more consistent with prior P1.
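The posterior odds calculation can be reproduced with a few lines of Python. This is a minimal sketch using only the numbers on the slide; the function and variable names are illustrative and not part of the course materials.

```python
from math import comb

def binomial_pmf(n, k, theta):
    """Probability of exactly k successes in n Bernoulli(theta) trials."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)

n, responders = 50, 31            # clinical trial outcome from the slide
prior_h0, prior_h1 = 0.40, 0.60   # prior probabilities from the slide

like_h0 = binomial_pmf(n, responders, 0.50)  # likelihood under H0
like_h1 = binomial_pmf(n, responders, 0.75)  # likelihood under H1

# Posterior odds = prior odds times likelihood ratio
posterior_odds_h0 = (prior_h0 / prior_h1) * (like_h0 / like_h1)

print(f"L0 = {like_h0:.7f}, L1 = {like_h1:.7f}")
print(f"Posterior odds in favor of H0 = {posterior_odds_h0:.4f}")
```

The likelihood ratio (about 1.82 in favor of H0) is large enough to outweigh the prior odds of 1.5 to 1 in favor of H1.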
Decision Strategy
• Prefer the hypothesis with the higher posterior odds
• A gap in the theory: How does the investigator do the cost benefit test?
• Starting a new business venture or entering a new market: Priors and market research
• FDA approving a new drug or medical device. Priors and clinical trials
• Statistical Decision Theory adds the costs and benefits of decisions and errors.
An Alternative Strategy
• Recognize the asymmetry of null and alternative hypotheses.
• Eliminate the prior odds (which are rarely formed or available).

http://query.nytimes.com/gst/fullpage.html?res=9C00E4DF113BF935A3575BC0A9649C8B63

Classical Hypothesis Testing
• The scientific method applied to statistical hypothesis testing
• Hypothesis: The world works according to my hypothesis
• Testing or supporting the hypothesis
• Data gathering
• Rejection of the hypothesis if the data are inconsistent with it
• Retention and exposure to further investigation if the data are consistent with the hypothesis
• Failure to reject is not equivalent to acceptance.
Asymmetric Hypotheses
• Null Hypothesis: The proposed state of nature
• Alternative hypothesis: The state of nature that is believed to prevail if the null is rejected.
Hypothesis Testing Strategy
• Formulate the null hypothesis
• Gather the evidence
• Question: If my null hypothesis were true, how likely is it that I would have observed this evidence?
• Very unlikely: Reject the hypothesis
• Not unlikely: Do not reject. (Retain the hypothesis for continued scrutiny.)
Some Terms of Art
• Type I error: Incorrectly rejecting a true null
• Type II error: Failure to reject a false null
• Power of a test: Probability a test will correctly reject a false null
• Alpha level: Probability that a test will incorrectly reject a true null. This is sometimes called the size of the test.
• Significance Level: Another name for the alpha level. The probability that a test will retain a true null is 1 – alpha.
• Rejection Region: Evidence that will lead to rejection of the null
• Test statistic: Specific sample evidence used to test the hypothesis
• Distribution of the test statistic under the null hypothesis: Probability model used to compute probability of rejecting the null. (Crucial to the testing strategy – how does the analyst assess the evidence?)
Possible Errors in Testing

                                 Hypothesis is True    Hypothesis is False
I Do Not Reject the Hypothesis   Correct decision      Type II error
I Reject the Hypothesis          Type I error          Correct decision

A Legal Analogy: The Null Hypothesis is INNOCENT

                              Null Hypothesis: Not Guilty   Alternative Hypothesis: Guilty
Finding: Verdict Not Guilty   Correct decision              Type II error
Finding: Verdict Guilty       Type I error                  Correct decision

The errors are not symmetric. Most thinkers consider Type I errors to be more serious than Type II in this setting.

(Jerzy) Neyman – (Egon) Pearson Methodology
• “Statistical” testing
• Methodology
• Formulate the “null” hypothesis
• Decide (in advance) what kinds of “evidence” (data) will lead to rejection of the null hypothesis. I.e., define the rejection region
• Gather the data
• Mechanically carry out the test.
Formulating the Null Hypothesis
• Stating the hypothesis: A belief about the “state of nature”
• A parameter takes a particular value
• There is a relationship between variables
• And so on…
• The null vs. the alternative
• By induction: If we wish to find evidence of something, first assume it is not true.
• Look for evidence that leads to rejection of the assumed hypothesis.
• Evidence that rejects the null hypothesis is significant
Example: Credit Scoring Rule
• Investigation: I believe that Fair Isaac relies on home ownership in deciding whether to “accept” an application.
• Null hypothesis: There is no relationship
• Alternative hypothesis: They do use homeownership data.
• What decision rule should I use?
Some Evidence

              Rejected   Accepted
Renters          1,845      5,469
Homeowners       1,100      5,030

Hypothesis Test
• Acceptance rate for homeowners = 5030/(5030+1100) = .82055
• Acceptance rate for renters is .74774
• H0: Acceptance rate for renters is not less than for owners.
• H0: p(renters) ≥ .82055
• H1: p(renters) < .82055
The Rejection Region

What is the “rejection region?”

• Data (evidence) that are inconsistent with my hypothesis
• Evidence is divided into two types:
• Data that are inconsistent with my hypothesis (the rejection region)
• Everything else
My Testing Procedure
• I will reject H0 if p(renters) < .815 (chosen arbitrarily)
• Rejection region is sample values of p(renters) < 0.815
Distribution of the Test Statistic Under the Null Hypothesis
• Test statistic p(renters) = (1/N) Σi Accept(i), with each Accept(i) = 1 or 0
• Use the central limit theorem:
• Assumed mean = .82055
• Implied standard deviation = sqrt(.82055*.17945/7314) = .00459
• Using CLT, normally distributed. (N is very large).
• Use z = (p(renters) - .82055) / .00459
Alpha Level and Rejection Region
• Prob(Reject H0 | H0 true) = Prob(p < .815 | H0 is true) = Prob[(p - .82055)/.00459 < (.815 - .82055)/.00459] = Prob[z < -1.209] = .11333
• Probability of a Type I error
• Alpha level for this test
The Test
• The observed proportion is 5469/(5469+1845) = 5469/7314 = .74774
• The null hypothesis is rejected at the 11.333% significance level (by the design of the test)
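The steps of this test can be sketched in Python. The sketch uses the standard deviation printed on the slide (.00459) so that the z value and alpha level match the slide. Evaluating the formula directly with N = 7,314 renters gives a slightly smaller value of about .0045, which is reported separately. All names are illustrative.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal CDF, Prob[Z < z]."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

p_null = 0.82055            # acceptance rate assumed under H0 (owners' rate)
n_renters = 5469 + 1845     # renter applications
cutoff = 0.815              # rejection region: p(renters) < cutoff

se_slide = 0.00459          # standard deviation as printed on the slide
se_formula = sqrt(p_null * (1 - p_null) / n_renters)  # direct evaluation

# Alpha level: probability of landing in the rejection region when H0 is true
z_cut = (cutoff - p_null) / se_slide
alpha = normal_cdf(z_cut)

# Carry out the test with the observed renter acceptance rate
p_renters = 5469 / n_renters
reject_h0 = p_renters < cutoff

print(f"z at cutoff = {z_cut:.3f}, alpha = {alpha:.5f}")
print(f"observed p(renters) = {p_renters:.5f}, reject H0: {reject_h0}")
print(f"standard deviation from the formula = {se_formula:.5f}")
```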
Application: Breast Cancer On Long Island
• Null Hypothesis: There is no link between the high cancer rate on LI and the use of pesticides and toxic chemicals in dry cleaning, farming, etc.
• Neyman-Pearson Procedure
• Examine the physical and statistical evidence
• If there is convincing covariation, reject the null hypothesis
• What is the rejection region?
• The NCI study:
• Working null hypothesis: There is a link: We will find the evidence.
• How do you reject this hypothesis?
Formulating the Testing Procedure
• Usually: What kind of data will lead me to reject the hypothesis?
• Thinking scientifically: If you want to “prove” a hypothesis is true (or you want to support one) begin by assuming your hypothesis is not true, and look for evidence that contradicts the assumption.
Hypothesis About a Mean
• I believe that the average income of individuals in a population is \$30,000.
• H0 : μ = \$30,000 (The null)
• H1: μ ≠ \$30,000 (The alternative)
• I will draw the sample and examine the data.
• The rejection region is data for which the sample mean is far from \$30,000.
• How far is far????? That is the test.
Application
• The mean of a population takes a specific value:
• Null hypothesis: H0: μ = \$30,000; alternative: H1: μ ≠ \$30,000
• Test: Sample mean close to hypothesized population mean?
• Rejection region: Sample means that are far from \$30,000
Deciding on the Rejection Region
• If the sample mean is far from \$30,000, reject the hypothesis.
• Choose the region, for example, sample means below 29,500 or above 30,500.

The probability that the mean falls in the rejection region even though the hypothesis is true (should not be rejected) is the probability of a type I error. Even if the true mean really is \$30,000, the sample mean could fall in the rejection region.

Reduce the Probability of a Type I Error by Making the (non)Rejection Region Wider

Reduce the probability of a type I error by moving the boundaries of the rejection region farther out.

Non-rejection region from 29,500 to 30,500: Probability outside this interval is large.

Non-rejection region from 28,500 to 31,500: Probability outside this interval is much smaller.

You can make a type I error impossible by making the rejection region very far from the null. Then you would never make a type I error because you would never reject H0.

Setting the α Level
• “α” is the probability of a type I error
• Choose the width of the interval by choosing the desired probability of a type I error, based on the t or normal distribution. (How confident do I want to be?)
• Multiply the z or t value by the standard error of the mean.
Testing Procedure
• The rejection region will be the range of values greater than μ0 + zσ/√N or less than μ0 - zσ/√N
• Use z = 1.96 for 1 - α = 95%
• Use z = 2.576 for 1 - α = 99%
• Use the t table if small sample, variance is estimated and sampling from a normal distribution.
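The rule above translates directly into a small function. This is a sketch under stated assumptions: the slides give μ0 = 30,000 but no σ or N, so the values σ = 5,000 and N = 400 below are hypothetical and chosen only to make the arithmetic visible.

```python
from math import sqrt
from statistics import NormalDist

def rejection_bounds(mu0, sigma, n, alpha):
    """Two-sided test of H0: mu = mu0 with known sigma.

    Returns (lower, upper). H0 is rejected when the sample mean falls
    below lower or above upper.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)   # 1.96 for alpha = .05
    half_width = z * sigma / sqrt(n)          # z times the standard error
    return mu0 - half_width, mu0 + half_width

mu0 = 30_000
sigma = 5_000   # hypothetical population standard deviation
n = 400         # hypothetical sample size

for alpha in (0.05, 0.01):
    lower, upper = rejection_bounds(mu0, sigma, n, alpha)
    print(f"alpha = {alpha}: reject if mean < {lower:,.0f} or mean > {upper:,.0f}")
```

Moving from α = .05 to α = .01 pushes both boundaries farther from 30,000, which is the widening of the non-rejection region described earlier.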
Deciding on the Rejection Region
• If the sample mean is far from \$30,000, reject the hypothesis.
• Choose the region, say, sample means more than 1.96 standard errors from \$30,000.

I am 95% certain that I will not commit a type I error (reject the hypothesis in error). (I cannot be 100% certain.)

The Test Procedure
• Choosing z = 1.96 makes the probability of a Type I error 0.05.
• Choosing z = 2.576 would reduce the probability of a Type I error to 0.01.
• Reducing the probability of a Type I error reduces the power of the test because it reduces the probability that the null hypothesis will be rejected.
P Value
• Probability of observing sample evidence at least as extreme as what was actually observed, assuming the null hypothesis is true.
• Null hypothesis is rejected if P value < α

P value = Prob[p(renters) < .74774] = Prob[z < (.74774 - .82055)/.00459] = Φ(-15.86) = .59946942854362260 × 10^-56. Impossible.

α = .11333

Confidence Intervals
• For a two sided test about a parameter, a confidence interval is the complement of the rejection region. (Proof in text, p. 338)
Confidence Interval
• If the sample mean is far from \$30,000, reject the hypothesis.
• Choose the region, say, sample means more than 1.96 standard errors from \$30,000. The confidence interval is the range between the two rejection regions.

I am 95% certain that the confidence interval contains the true mean of the distribution of incomes. (I cannot be 100% certain.)

One Sided Tests
• H0 = 0, H10 Rejection region is sample mean far from 0 in either direction
• H0 = 0, H1>0. Sample means less than 0 cannot be in the rejection region.
• Entire rejection region is above 0.
• Reformulate: H0<0, H1>0.
Carrying Out the LR Test
• In most cases, exact distribution of the statistic is unknown
• Use -2 log(likelihood ratio), which is approximately distributed as chi squared [1]
• For a test about 1 parameter, threshold value is 3.84 (5%) or 6.63 (1%)
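As an illustration of the mechanics, the following sketch applies the likelihood ratio test to the clinical trial data used earlier (31 responders out of 50), testing H0: θ = .50 against an unrestricted θ. This particular application is not on the slides. It is an assumption chosen because the data are already at hand.

```python
from math import log

def bernoulli_loglike(theta, successes, n):
    """Log likelihood of theta, omitting the binomial coefficient.

    The coefficient is the same under both hypotheses and cancels
    in the likelihood ratio.
    """
    return successes * log(theta) + (n - successes) * log(1 - theta)

n, successes = 50, 31
theta_null = 0.50
theta_mle = successes / n      # unrestricted maximum likelihood estimate

# LR statistic = -2 log(L_restricted / L_unrestricted)
lr_stat = 2 * (bernoulli_loglike(theta_mle, successes, n)
               - bernoulli_loglike(theta_null, successes, n))

critical_5pct = 3.84           # chi squared [1] threshold at 5%
reject_h0 = lr_stat > critical_5pct

print(f"LR statistic = {lr_stat:.3f}, reject H0 at 5%: {reject_h0}")
```

With these data the statistic is below 3.84, so H0: θ = .50 is not rejected at the 5% level.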
Specification Tests
• Generally a test about a distribution where the alternative is “some other distribution.”
• Test is generally based on a feature of the distribution that is true under the null but not true under the alternative.
Poisson Specification Tests
• 3820 observations on doctor visits
• Poisson distribution?
Deviance Test
• Poisson Distribution p(x) = exp(-λ) λ^x / x!
• H0: Everyone has the same Poisson Distribution
• H1: Everyone has their own Poisson distribution
• Under H0, observations will tend to be near the mean. Under H1, there will be much more variation.
• Likelihood ratio statistic (Text, p. 348)
Dispersion Test
• Poisson Distribution p(x) = exp(-λ) λ^x / x!
• H0: The distribution is Poisson
• H1: The distribution is something else
• Under H0, the mean will be (almost) the same as the variance
• Approximate Likelihood ratio statistic (Text, p. 348) = N * Variance / Mean
• For the doctor visit data, this is 22,348.6 vs. chi squared with 1 degree of freedom. H0 is rejected.
Specification Test - Normality
• Normal Distribution is symmetric and has kurtosis = 3.
• Compare observed 3rd and 4th moments to what would be expected from a normal distribution.
Kurtosis: t[5] vs. Normal

Kurtosis of normal(0,1) = 3

Kurtosis of t[k] = 3 + 6/(k-4); for t[5] = 3+6/(5-4) = 9.
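One common way to turn the comparison of third and fourth moments into a single statistic is the Jarque-Bera form, N[skewness²/6 + (kurtosis - 3)²/24], compared with chi squared with 2 degrees of freedom. The slides do not say which statistic the course uses, so treat the choice below as an assumption. The sample data are hypothetical.

```python
def central_moment(data, order):
    """Sample central moment of the given order (divides by N)."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** order for x in data) / len(data)

def skewness(data):
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def kurtosis(data):
    return central_moment(data, 4) / central_moment(data, 2) ** 2

def jarque_bera(data):
    """N * [skewness^2 / 6 + (kurtosis - 3)^2 / 24]."""
    n = len(data)
    return n * (skewness(data) ** 2 / 6 + (kurtosis(data) - 3) ** 2 / 24)

def t_kurtosis(df):
    """Kurtosis of a t distribution with df > 4 degrees of freedom."""
    return 3 + 6 / (df - 4)

sample = [-2, -1, 0, 1, 2]   # hypothetical data, symmetric by construction
print(f"skewness = {skewness(sample):.3f}, kurtosis = {kurtosis(sample):.3f}")
print(f"Jarque-Bera statistic = {jarque_bera(sample):.4f} (5% critical value 5.99)")
print(f"kurtosis of t[5] = {t_kurtosis(5)}")
```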

Testing for a Distribution
• H0: The distribution is assumed
• H1: The assumed distribution is incorrect
• Strategy: Do the features of the sample resemble what we would observe if H0 were correct?
• Continuous: CDF of data resemble CDF of the assumed distribution
• Discrete: Sample cell probabilities resemble predictions from the assumed distribution
Chi Squared Test for a Discrete Distribution
• Outcomes = A1, A2,…, AM
• Predicted probabilities based on a theoretical distribution = E1(), E2(),…,EM().
• Sample cell frequencies = O1,…,OM
V2 Rocket Hits

Adapted from Richard Isaac, The Pleasures of Probability, Springer Verlag, 1995, pp. 99-101.

576 areas of 0.25 km² each in South London, arranged in a grid (24 by 24)

535 rockets were fired randomly into the grid = N

P(a rocket hits a particular grid area) = 1/576 = 0.001736 = θ

Expected number of rocket hits in a particular area = 535/576 = 0.92882

How many rockets will hit any particular area? 0,1,2,… could be anything up to 535.

The 0.9288 is the λ for a Poisson distribution:

[Figure: three 13 by 13 grids, with rows and columns numbered 1 to 13.]

Poisson Process
• θ = 1/169
• N = 144
• λ = 144 * 1/169 = 0.852
• Probabilities:
• P(X=0) = .4266
• P(X=1) = .3634
• P(X=2) = .1548
• P(X=3) = .0440
• P(X=4) = .0094
• P(X>4) = .0019

There are 169 squares

There are 144 “trials”

Expect .4266*169 = 72.1 to have 0 hits/square

Expect .3634*169 = 61.4 to have 1 hit/square

Etc.

Expect the average number of hits/square to = .852.
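The Poisson probabilities and the expected number of squares with each hit count can be computed as follows. This is a minimal sketch for the 13 by 13 example (169 squares, 144 trials). The names are illustrative.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with mean lam."""
    return exp(-lam) * lam**k / factorial(k)

squares = 169              # cells in the 13 by 13 grid
trials = 144               # number of random hits
lam = trials / squares     # expected hits per square

probs = [poisson_pmf(k, lam) for k in range(5)]   # P(X=0), ..., P(X=4)
tail = 1 - sum(probs)                             # P(X > 4)
expected_squares = [p * squares for p in probs]

for k, (p, e) in enumerate(zip(probs, expected_squares)):
    print(f"P(X={k}) = {p:.4f}   expected squares = {e:.1f}")
print(f"P(X>4) = {tail:.4f}")
```

Comparing these expected counts with the observed number of squares at each hit count is the basis of the chi squared test for a discrete distribution.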

Interpreting The Process
Difference in Means of Two Populations
• Two Independent Normal Populations
• Common known variance
• Common unknown variance
• Different Variances
• One and two sided tests
• Paired Samples
• Means of paired observations
• Treatments and Controls – Diff-in-Diff SAT
• Nonparametric – Mann/Whitney
• Two Bernoulli Populations
Household Incomes, Equal Variances

------------------------------------------------------
t test of equal means INCOME by MARRIED
------------------------------------------------------
MARRIED = 0  Nx =  817      MARRIED = 1  Ny = 3057
t [ 3872] = 3.7238          P value = .0002
------------------------------------------------------
               Mean     Std.Dev.   Std.Error
INCOME ----------------------------------------------
MARRIED = 0   .27982    .12939     .00453
MARRIED = 1   .30145    .15194     .00275
------------------------------------------------------
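The t statistic in this output can be reproduced from the summary statistics alone. The sketch below assumes the pooled (equal variance) form of the two-sample t test, which matches the slide title and the 3,872 degrees of freedom. The function name is illustrative.

```python
from math import sqrt

def pooled_t(mean_x, sd_x, n_x, mean_y, sd_y, n_y):
    """Two-sample t statistic assuming a common variance."""
    df = n_x + n_y - 2
    pooled_var = ((n_x - 1) * sd_x**2 + (n_y - 1) * sd_y**2) / df
    std_error = sqrt(pooled_var * (1 / n_x + 1 / n_y))
    return (mean_y - mean_x) / std_error, df

# Summary statistics from the output above (x: MARRIED = 0, y: MARRIED = 1)
t_stat, df = pooled_t(0.27982, 0.12939, 817, 0.30145, 0.15194, 3057)
print(f"t[{df}] = {t_stat:.4f}")
```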

2 Proportions
• Two Bernoulli Populations: Xi ~ Bernoulli with Prob(xi=1) = θx; Yi ~ Bernoulli with Prob(yi=1) = θy
• H0: θx = θy
• The sample proportions are px = (1/Nx) Σi xi and py = (1/Ny) Σi yi
• Sample variances are px(1-px) and py(1-py).
• Use the Central Limit Theorem to form the test statistic.
z Test for Equality of Proportions

Application: Take up of public health insurance.

------------------------------------------------------
t test of equal means PUBLIC by FEMALE
------------------------------------------------------
FEMALE = 0  Nx = 1812       FEMALE = 1  Ny = 1565
t [ 3375] = 5.8627          P value = .0000
------------------------------------------------------
               Mean     Std.Dev.   Std.Error
PUBLIC ----------------------------------------------
FEMALE = 0    .84713    .35996     .00846
FEMALE = 1    .91310    .28178     .00712
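A z statistic for the difference in proportions can be formed from the sample proportions and the Bernoulli variances px(1-px) and py(1-py). The sketch below uses separate variances for the two groups. That is an assumption, since the slides do not show the exact formula, and it is why the result differs slightly from the pooled t value in the output above.

```python
from math import sqrt

def two_proportion_z(p_x, n_x, p_y, n_y):
    """z statistic for H0: equal proportions, using separate Bernoulli variances."""
    std_error = sqrt(p_x * (1 - p_x) / n_x + p_y * (1 - p_y) / n_y)
    return (p_y - p_x) / std_error

# Take-up of public insurance (x: FEMALE = 0, y: FEMALE = 1), from the output above
z = two_proportion_z(0.84713, 1812, 0.91310, 1565)
print(f"z = {z:.3f}  (two-sided 5% critical value is 1.96)")
```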

Paired Sample t and z Test
• Observations are pairs (Xi,Yi), i = 1,…,N
• Hypothesis x = y.
• Both normal distributions. May be correlated.
• Medical Trials: Smoking vs. Nonsmoking (separate individuals, probably independent)
• SAT repeat tests, before and after. (Definitely correlated)
• Test is based on Di = Xi – Yi. Same as earlier with H0:D = 0.
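The paired test reduces to a one-sample t test on the differences. The sketch below shows the computation. The four pairs of scores are hypothetical and serve only to illustrate the arithmetic.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(x, y):
    """t statistic for H0: mean of D = X - Y is zero, with N - 1 degrees of freedom."""
    diffs = [xi - yi for xi, yi in zip(x, y)]
    n = len(diffs)
    std_error = stdev(diffs) / sqrt(n)   # stdev uses the N - 1 divisor
    return mean(diffs) / std_error, n - 1

first_scores = [1200, 1100, 1300, 1250]    # hypothetical first attempts (X)
second_scores = [1250, 1120, 1310, 1300]   # hypothetical second attempts (Y)

t_stat, df = paired_t(first_scores, second_scores)
print(f"t[{df}] = {t_stat:.3f}")
```

A negative statistic here means second scores are higher on average, since D is defined as first minus second.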
Treatment Effects
• SAT Do Overs
• Experiment: X1, X2, …, XN = first SAT score, Y1, Y2, …, YN = second
• Treatment: T1,…,TN = whether or not the student took a Kaplan (or similar) prep course
• Hypothesis: μy > μx.
• Placebo: In Medical trials, N1 subjects receive a drug (treatment), N2 receive a placebo.
• Hypothesis: Effect is greater in the treatment group than in the control (placebo) group.
Treatment Effects in Clinical Trials
• Does Phenogyrabluthefentanoel (Zorgrab) work?
• Investigate: Carry out a clinical trial.
• N+0 = “The placebo effect”
• N+T – N+0 = “The treatment effect”
• The hypothesis is that the difference in differences has mean zero.

                  Placebo    Drug Treatment
No Effect         N00        N0T
Positive Effect   N+0        N+T

A Test of Independence
• In the credit card example, are Own/Rent and Accept/Reject independent?
• Hypothesis: Prob(Ownership) and Prob(Acceptance) are independent
• Formal hypothesis, based only on the laws of probability: Prob(Own,Accept) = Prob(Own)Prob(Accept) (and likewise for the other three possibilities).
• Rejection region: Joint frequencies that do not look like the products of the marginal frequencies.
Contingency Table Analysis

The Data: Frequencies

         Reject    Accept     Total
Rent      1,845     5,469     7,314
Own       1,100     5,030     6,130
Total     2,945    10,499    13,444

Step 1: Convert to Actual Proportions

         Reject    Accept     Total
Rent    0.13724   0.40680   0.54404
Own     0.08182   0.37414   0.45596
Total   0.21906   0.78094   1.00000

Independence Test

Step 2: Expected proportions assuming independence: If the factors are independent, then the joint proportions should equal the product of the marginal proportions.

[Rent,Reject] 0.54404 x 0.21906 = 0.11918

[Rent,Accept] 0.54404 x 0.78094 = 0.42486

[Own,Reject] 0.45596 x 0.21906 = 0.09988

[Own,Accept] 0.45596 x 0.78094 = 0.35608

When is the Chi Squared Large?
• Critical values from chi squared table
• Degrees of freedom = (R-1)(C-1).

Critical chi squared

D.F.     .05      .01
 1      3.84     6.63
 2      5.99     9.21
 3      7.81    11.34
 4      9.49    13.28
 5     11.07    15.09
 6     12.59    16.81
 7     14.07    18.48
 8     15.51    20.09
 9     16.92    21.67
10     18.31    23.21
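For the credit card data, the two steps above lead to the standard Pearson chi squared statistic, N times the sum over cells of (observed proportion - expected proportion)² / expected proportion. The formula itself does not appear in this transcript, so the sketch below is an assumption about how the comparison is carried out. The names are illustrative.

```python
def chi_squared_from_proportions(table):
    """Pearson chi squared for independence, computed from cell proportions.

    table is a list of rows of counts. Returns (statistic, degrees of freedom).
    """
    n = sum(sum(row) for row in table)
    row_p = [sum(row) / n for row in table]          # marginal row proportions
    col_p = [sum(col) / n for col in zip(*table)]    # marginal column proportions
    total = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            observed = count / n                     # Step 1: actual proportion
            expected = row_p[i] * col_p[j]           # Step 2: product of marginals
            total += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return n * total, df

# Rows: Rent, Own. Columns: Reject, Accept.
credit = [[1845, 5469], [1100, 5030]]
chi2, df = chi_squared_from_proportions(credit)
critical_5pct = 3.84   # 1 degree of freedom, from the table above

print(f"chi squared[{df}] = {chi2:.2f}, reject independence: {chi2 > critical_5pct}")
```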

Analyzing Default
• Do renters default more often (at a different rate) than owners?
• To investigate, we study the cardholders (only)

                 DEFAULT
OWNRENT        0        1      All
0           4854      615     5469
           46.23     5.86    52.09
1           4649      381     5030
           44.28     3.63    47.91
All         9503      996    10499
           90.51     9.49   100.00

(Second line of each row: percent of all 10,499 cardholders.)

Multiple Choices: Travel Mode
• 210 Travelers between Sydney and Melbourne
• 4 available modes, air, train, bus, car
• Among the observed variables is income.
• Does income help to explain mode choice?
• Hypothesis: Mode choice and income are independent.
Travel Mode Choices and Income

+----------------------------------------------------------+
|                     Travel MODE Data                     |
+--------+--------------------------------------++---------+
|INCOME  |     AIR    TRAIN      BUS      CAR   ||  Total  |
+--------+--------------------------------------++---------+
|LOW     |      10       36        9        8   ||      63 |
|        | 0.04761  0.17143  0.04286  0.03810   || 0.30000 |
+--------+--------------------------------------++---------+
|MEDIUM  |      19       20       13       24   ||      76 |
|        | 0.09048  0.09524  0.06190  0.11429   || 0.36190 |
+--------+--------------------------------------++---------+
|HIGH    |      29        7        8       27   ||      71 |
|        | 0.13810  0.03333  0.03810  0.12857   || 0.33810 |
+========+======================================++=========+
|Total   |      58       63       30       59   ||     210 |
|        | 0.27619  0.30000  0.14286  0.28095   || 1.00000 |
+--------+--------------------------------------++---------+

Contingency Table

+----------------------------------------------------------+
|                     Travel MODE Data                     |
+--------+--------------------------------------++---------+
|INCOME  |     AIR    TRAIN      BUS      CAR   ||  Total  |
+--------+--------------------------------------++---------+
|LOW     |      10       36        9        8   ||      63 |
|        | 0.04761  0.17143  0.04286  0.03810   || 0.30000 |
|        | 0.08286  0.09000  0.04286  0.08429   ||         |
+--------+--------------------------------------++---------+
|MEDIUM  |      19       20       13       24   ||      76 |
|        | 0.09048  0.09524  0.06190  0.11429   || 0.36190 |
|        | 0.09995  0.10857  0.05170  0.10168   ||         |
+--------+--------------------------------------++---------+
|HIGH    |      29        7        8       27   ||      71 |
|        | 0.13810  0.03333  0.03810  0.12857   || 0.33810 |
|        | 0.09338  0.10143  0.04830  0.09499   ||         |
+========+======================================++=========+
|Total   |      58       63       30       59   ||     210 |
|        | 0.27619  0.30000  0.14286  0.28095   || 1.00000 |
+--------+--------------------------------------++---------+

Each cell shows the count, the observed proportion, and the proportion expected under independence.

Assuming independence, P(Income,Mode) = P(Income) x P(Mode).

Computing Chi Squared

For our transport mode problem, R = 3, C = 4, so DF = 2x3 = 6. The critical value is 12.59. The hypothesis of independence is rejected.
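The statistic behind this conclusion is not shown in the transcript. A minimal sketch of the standard Pearson chi squared computation, applied to the travel mode counts from the table, is given below. It works with expected counts (row total times column total divided by N). The function name is illustrative.

```python
def pearson_chi_squared(counts):
    """Pearson chi squared for independence from an R x C table of counts."""
    n = sum(sum(row) for row in counts)
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    stat = 0.0
    for i, row in enumerate(counts):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n   # count under independence
            stat += (observed - expected) ** 2 / expected
    df = (len(counts) - 1) * (len(counts[0]) - 1)
    return stat, df

# Rows: LOW, MEDIUM, HIGH income. Columns: AIR, TRAIN, BUS, CAR.
travel = [
    [10, 36, 9, 8],
    [19, 20, 13, 24],
    [29, 7, 8, 27],
]
chi2, df = pearson_chi_squared(travel)
critical_5pct = 12.59   # 6 degrees of freedom

print(f"chi squared[{df}] = {chi2:.2f}, reject independence: {chi2 > critical_5pct}")
```

The computed statistic is well above 12.59, consistent with the rejection stated above.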