
CH7 Distribution Free Inference: Computer-Intensive Techniques



Presentation Transcript


  1. CH7 Distribution-Free Inference: Computer-Intensive Techniques 1. Random Sampling 2. Bootstrap Sampling 3. Bootstrap Testing

  2. Why do we need distribution-free inference? For parametric models we need to specify the functional form, or the parametric family, of the distribution.

  3. In this Chapter • We do not rely on a functional form or on parameter values; • we rely only on the observations in our sample; • we present computer-intensive techniques for inference.

  4. 7.1 Random Sampling from a Reference Distribution • Generate the empirical reference distribution of a statistic. (Once we have this distribution, we can find the p-value of our test.) 1. Generating the empirical reference distribution in the parametric model; 2. generating the empirical reference distribution in the nonparametric model.

  5. Generating the Empirical Reference Distribution in the parametric model Idea: (1) Fix N, the number of repetitions. (2) Generate n observations from the distribution under the null hypothesis (with the claimed parameter values) and calculate the value of the statistic of interest. (3) Repeat (2) N times to obtain N values of the statistic. (4) Build the distribution table of the statistic: the empirical reference distribution.

  6. Ex1: In a manufacturing process, the proportion of plates having more than one blemish needs to be kept under 10%. Suppose that in a sample of 50 plates we find 8 blemished plates; should we be satisfied? Solution: Generate 50 observations from Bin(1, 0.1) and let X1 = (# of blemished plates among the 50 observations)/50. Repeating this 1000 times gives X1, X2, ..., X1000. From these 1000 statistics we can build a histogram (the empirical distribution) and find the p-value of the test.
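The simulation in Ex1 can be sketched in a few lines of NumPy. This is a minimal sketch, not code from the textbook; the seed and the one-sided rejection direction (large proportions are evidence against the 10% claim) are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n, p0 = 50, 0.10        # H0: blemish rate is 10%
observed = 8 / 50       # 8 blemished plates in the sample of 50
N = 1000                # number of simulated samples

# Simulate the proportion of blemished plates under H0, N times.
# Summing 50 Bin(1, 0.1) draws is the same as one Bin(50, 0.1) draw.
props = rng.binomial(n, p0, size=N) / n

# One-sided p-value: how often a simulated proportion reaches 8/50.
p_value = np.mean(props >= observed)
```

The histogram of `props` is the empirical reference distribution from slide 6; `p_value` is the fraction of its mass at or beyond the observed 16%.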

  7. Generating the Empirical Reference Distribution in the Nonparametric model Idea: (1) Fix N, the number of repetitions. (2) Generate n observations from the distribution under the null hypothesis (with the claimed information) and calculate the value of the statistic of interest. (3) Repeat (2) N times to obtain N values of the statistic. (4) Build the distribution table of the statistic: the empirical reference distribution.

  8. Ex2: Consider a hypothesis on the length of aluminum pins (Lengthwcp in Almpin.dat). The sample average is 60.028, and we want to test H0: pop avg = 60.1 vs H1: pop avg < 60.1. Sol: In our sample we have z1, z2, ..., z70. Let yi = zi + (60.1 - 60.028). Then the average of the y's is 60.1, and the distribution of the y's is just a shift of the distribution of the z's. We draw 70 observations from the y's with replacement and calculate the average of these 70 observations, denoting it X1. Repeating this 1000 times gives X1, X2, ..., X1000. In the textbook, page 233, this distribution is tabulated as Table 7.2. Since the 0.01 quantile of the distribution is 60.0869 > 60.028, the p-value of the test is < 1%, so we reject H0.
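The shift-and-resample scheme of Ex2 can be sketched as follows. Since the Almpin.dat data are not available here, `z` is a hypothetical stand-in sample centred near the observed mean of 60.028; its spread and the seed are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-in for the 70 Lengthwcp measurements in Almpin.dat.
z = rng.normal(60.028, 0.05, size=70)

mu0 = 60.1
y = z + (mu0 - z.mean())      # shift the sample so its mean is exactly mu0

# Draw 1000 resamples of size 70 with replacement and record their means.
means = np.array([rng.choice(y, size=70, replace=True).mean()
                  for _ in range(1000)])

# One-sided test of H0: mu = 60.1 vs H1: mu < 60.1 --
# how often does a resample mean fall at or below the observed mean?
p_value = np.mean(means <= z.mean())
```

The array `means` plays the role of Table 7.2: comparing the observed average with its low quantiles gives the p-value.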

  9. 7.2 Bootstrap Sampling • 7.2.1 The Bootstrap Method 1. The bootstrap method was introduced in 1979 by B. Efron. 2. The method performs statistical inference by computer, without extensive assumptions or intricate theory.

  10. Details of Bootstrap Method

  11. Properties of the EBD • It is centered at the sample statistic tn. • The mean of the EBD is an estimate of the mean of the sampling distribution of the statistic T over all possible samples. • The SD of the EBD is the bootstrap estimate of the standard error of T. • We can use the alpha-quantiles of the EBD as quantile limits of the distribution of T (for the purpose of a CI).

  12. Example 7.3 • Generate n=100 obs from Normal(20.5, 12.5^2). • From the above obs, assuming we do not know the original distribution, find the estimator of the mean and its bootstrap confidence interval. • Use the sample average (18.709) to estimate the population average (20.5). Let M=100; using the bootstrap method, we get 100 bootstrap sample means (BSMs). The SD of these 100 BSMs is an estimator of the SE of the sample average. • Let alpha=5%: the 97.5% quantile of the BSMs is 21.1142 and the 2.5% quantile is 16.0498. Thus the bootstrap confidence interval for the mean is (16.0498, 21.1142).
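Example 7.3 can be reproduced with the sketch below. The seed is an assumption, so the simulated draws, and hence the numerical CI, will not match the slide's values (18.709, 16.0498, 21.1142) exactly; the structure of the computation is the same.

```python
import numpy as np

rng = np.random.default_rng(3)

# Step 1: generate n = 100 observations from Normal(20.5, 12.5^2).
x = rng.normal(20.5, 12.5, size=100)

# Step 2: pretend the distribution is unknown; estimate the mean by the
# sample average and attach a bootstrap CI using M = 100 resamples.
M = 100
bsm = np.array([rng.choice(x, size=x.size, replace=True).mean()
                for _ in range(M)])

se_boot = bsm.std(ddof=1)                     # bootstrap SE of the sample mean
ci = tuple(np.quantile(bsm, [0.025, 0.975]))  # 95% bootstrap CI for the mean
```

The quantile pair `ci` is the percentile-type interval described on the slide; `bsm` is the EBD of slide 11, with `se_boot` its SD.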

  13. 7.2.2 Examining the bootstrap method • Question: Is Bootstrap Method good? • Answer:

  14. 7.2.3 Harnessing the Bootstrap Method • Sometimes it is hard to calculate a statistic because it has a very complicated formula (or no known formula at all); the bootstrap method makes things easier (e.g., look at the formula in 5.2.6 on page 167). • And when the sample size is large, the approximation is very precise in many cases.

  15. 7.3 Bootstrap Testing of Hypothesis • Idea: (1) Under the null hypothesis, construct an empirical reference distribution for the statistic of interest. (2) Based on the empirical reference distribution, find the corresponding p-value.

  16. 7.3.1 Bootstrap testing and CI for the mean

  17. 7.3.2 Studentized Test for the mean

  18. Ex7.4: In the data file "Hybrid1.dat", consider Res3 with n=32 obs, sample average 2143.4, and H0: mean = 2150. Sol: M=500. (1) Using the bootstrap test/confidence interval for the mean, the 95% confidence interval is (2109.5, 2179.91), which covers 2150, so we do not reject H0. (2) Using the bootstrap studentized test, tn = -0.374 with p-value = 0.708; again we do not reject H0.
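The studentized bootstrap test of 7.3.2, as applied in Ex7.4, can be sketched as follows. The Hybrid1.dat data are not available here, so `x` is a hypothetical stand-in sample; its mean, SD, and the seed are assumptions, and the resulting `t_n` and `p_value` will not reproduce the slide's -0.374 and 0.708.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical stand-in for the 32 Res3 values in Hybrid1.dat.
x = rng.normal(2143.4, 99.0, size=32)
mu0, M = 2150.0, 500

n = x.size
# Observed studentized statistic for H0: mean = 2150.
t_n = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(n))

# Bootstrap reference distribution of the studentized statistic,
# centring each resample's t at the observed sample mean (so it is
# computed "under H0" in the bootstrap world).
t_star = np.empty(M)
for m in range(M):
    xb = rng.choice(x, size=n, replace=True)
    t_star[m] = (xb.mean() - x.mean()) / (xb.std(ddof=1) / np.sqrt(n))

# Two-sided p-value: bootstrap t's at least as extreme as the observed one.
p_value = np.mean(np.abs(t_star) >= np.abs(t_n))
```

Centring at `x.mean()` rather than `mu0` inside the loop is what makes `t_star` a valid null reference distribution for the studentized statistic.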
