Inferential statistics

Suppose we have a bag of nuts. I pick one nut, crack it, and find it empty. What can I conclude?

The optimist says: "Well, well! Only one nut is bad, and I happened to pull it. At least we got rid of it."

The pessimist says: "This is what I was afraid of: the bag is full of bad nuts." What will the statistician say? "I declare that both the pessimist and the optimist may be right."

Inferential statistics

To determine whether the nuts in the bag are bad, it is enough to select a few nuts from different places in the bag and crack them …

doc.Ing. Zlata Sojková, CSc.

Statistical inference is based on sample surveys

Statistical inference is the process of using sample results to draw conclusions about the parameters of a population.

The sample should be representative of the population. In the picture, it is not ...

Examples of inferential statistics
• Household accounts
• Marketing research on consumer behavior patterns
• Sample surveys of agricultural enterprises
• Public opinion surveys
• Quality control

Inferential statistics (or statistical inference)
• Assume that we work with a sample and calculate sample statistics such as the sample average, sample variance, and sample standard deviation.
• Based on the sample, we infer the properties of the population.
• This means that the values of sample statistics are used to estimate the unknown values of population parameters.
• Usually we estimate population parameters such as the population mean, population variance, and population standard deviation.

Graphically

Symbols:

Population parameters: $\mu$, $\sigma^2$, $\sigma$; in general $Q$

Sample characteristics: $\bar{x}$, $s^2$, $s$; in general $u_n$

[Figure: a sample of size $n$ drawn from a population of size $N$ (or an infinite population, $N \to \infty$)]

Statistical inference (SI)

Statistical estimation: unknown population parameters are estimated by sample characteristics.

Statistical hypothesis testing: we state assumptions about the unknown parameters of the population. If we can formulate these assumptions as statistical hypotheses and verify their validity by statistical procedures, the process is called statistical hypothesis testing.

• To determine the sample size (n) sufficient for reliable estimation of parameters
• To determine the method of sampling statistical units from the population

Explanation: sample characteristics are fixed numbers for a given sample, but with respect to the population they are random variables, so they have a probability distribution.

It is therefore important to choose the right model for the distribution of the sample characteristic used in statistical inference (statisticians have already worked this out for us). For the arithmetic average of a normally distributed attribute, standardized with the sample standard deviation, this is Student's t distribution; for large samples (n > 30) the Student distribution can be approximated by the normal distribution.
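The claim that a sample characteristic is a random variable can be illustrated by simulation. The sketch below is a hypothetical example using only Python's standard library: it assumes a normal population with μ = 100 and σ = 15, draws many samples of size n = 25, and records each sample average. The averages differ from sample to sample, and their spread comes out close to σ/√n = 3.

```python
import random
import statistics

def sample_means(population_mean, population_sd, n, repetitions, seed=1):
    """Draw `repetitions` samples of size n from N(mean, sd^2) and return their averages."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    means = []
    for _ in range(repetitions):
        sample = [rng.gauss(population_mean, population_sd) for _ in range(n)]
        means.append(statistics.fmean(sample))  # one value of the random variable x̄
    return means

# Hypothetical population N(100, 15^2), samples of size 25.
means = sample_means(population_mean=100.0, population_sd=15.0, n=25, repetitions=2000)
print(f"first five sample averages: {[round(m, 1) for m in means[:5]]}")
print(f"spread of the averages:      {statistics.stdev(means):.2f}")
print(f"theory, sigma / sqrt(n):     {15.0 / 25 ** 0.5:.2f}")
```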

Random sampling

There are many methods for selecting a sample from a population:

• From the repetition point of view:
  • selection with replacement
  • selection without replacement
• By subdivision of the population:
  • simple random sample (finite or infinite population)
  • composite sample, which can be:
    • based on choosing groups
    • quota sampling, etc.
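The two schemes from the repetition point of view correspond directly to two functions of Python's `random` module: `choices` draws with replacement and `sample` draws without replacement. A minimal sketch with a hypothetical population of 10 numbered units (composite and quota sampling are not shown):

```python
import random

# Hypothetical population of 10 statistical units, labelled 1..10.
population = list(range(1, 11))
rng = random.Random(42)  # fixed seed for reproducibility

# Selection with replacement: the same unit can be drawn more than once.
with_replacement = rng.choices(population, k=6)

# Selection without replacement (simple random sample): each unit at most once.
without_replacement = rng.sample(population, k=6)

print("with replacement:   ", with_replacement)
print("without replacement:", without_replacement)
```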

Theory of estimation

Recap:

The main goal of estimation theory is to estimate population parameters such as $\mu$ and $\sigma$ by using sample characteristics.

There are two types of estimates:

• Point estimate
• Interval estimate

Point estimation of a population parameter Q (in general)
• Point estimator: a single numerical value used as an estimate of the population parameter Q; geometrically, it is one point.
• Estimate (estimator) is abbreviated as est.

Notation: $\text{est}\,Q = u_n$

$Q \approx u_n$

Most often we estimate:

• the population mean $\mu$
• the population variance $\sigma^2$ and the population standard deviation $\sigma$

Properties of point estimators

The best estimator satisfies the following conditions:

• Unbiasedness
• Consistency
• Efficiency

We explain the first two conditions in more detail.

Unbiasedness

$E(u_n - Q) = 0 \quad\Longleftrightarrow\quad E(u_n) = Q$

If we repeat the sampling many times, each sample gives a different error and therefore a different average. Unbiasedness requires the expected value of all these errors to be zero: the errors should be purely random, so that we neither systematically underestimate nor overestimate the population mean.

An asymptotically unbiased estimator of Q is a sample characteristic $u_n$ that satisfies the condition:

$\lim_{n \to \infty} E(u_n) = Q$
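A small simulation makes "errors that average out to zero" concrete. Assuming a normal population with hypothetical values μ = 50 and σ = 10, the sketch repeats sampling 5000 times, records the error x̄ − μ of each sample average, and then averages those errors. Individual errors are clearly nonzero, but their mean is close to 0, as unbiasedness requires.

```python
import random
import statistics

def mean_error_of_average(mu, sigma, n, repetitions, seed=7):
    """Average of the errors (x̄ - mu) over many repeated samples of size n."""
    rng = random.Random(seed)
    errors = []
    for _ in range(repetitions):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        errors.append(statistics.fmean(sample) - mu)  # error of this one estimate
    return statistics.fmean(errors), errors

# Hypothetical population N(50, 10^2), samples of size 10.
avg_error, errors = mean_error_of_average(mu=50.0, sigma=10.0, n=10, repetitions=5000)
print(f"single errors vary, e.g. {errors[0]:+.2f}, {errors[1]:+.2f}")
print(f"average error over all samples: {avg_error:+.3f}")
```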

Consistency

The principle of consistency rests on the law of large numbers. In statistical practice, consistency ensures that the estimation error decreases as the sample size increases.

For large samples, the estimation error is very small.

A sufficient condition for consistency is that $u_n$ is an asymptotically unbiased estimator and that it meets the condition:

$\lim_{n \to \infty} \operatorname{Var}(u_n) = 0$
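Consistency can be seen by measuring the typical estimation error at several sample sizes. This hypothetical sketch draws repeated samples from an assumed N(0, 1) population and reports the average absolute error |x̄ − μ| for n = 10, 100 and 1000; each tenfold increase in n shrinks the error by roughly √10.

```python
import random
import statistics

def mean_absolute_error(mu, sigma, n, repetitions=500, seed=3):
    """Average |x̄ - mu| over repeated samples of size n from N(mu, sigma^2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(repetitions):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        total += abs(statistics.fmean(sample) - mu)  # size of this estimation error
    return total / repetitions

for n in (10, 100, 1000):
    print(f"n = {n:4d}: mean |error| = {mean_absolute_error(0.0, 1.0, n):.4f}")
```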

Efficiency of point estimators
• Every sample characteristic is a random variable with some variance.
• If we have two unbiased point estimators of the same population parameter, the one with the smaller variance is said to be more efficient.
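To illustrate efficiency, the sketch compares two estimators of μ that are both unbiased for a normal population: the sample mean and the sample median. The median is our choice of comparison for illustration, not taken from the slides. Over repeated samples of hypothetical size n = 25 from N(0, 1), the median's variance comes out noticeably larger (for large n, theory gives about π/2 times larger), so the mean is the more efficient estimator.

```python
import random
import statistics

def sampling_variances(n=25, repetitions=3000, seed=11):
    """Variance of the sample mean and of the sample median over repeated N(0, 1) samples."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(repetitions):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.variance(means), statistics.variance(medians)

var_mean, var_median = sampling_variances()
print(f"variance of the sample mean:     {var_mean:.4f}")
print(f"variance of the sample median:   {var_median:.4f}")
print(f"relative efficiency of the mean: {var_median / var_mean:.2f}")
```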

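Point estimator of the population mean $\mu$

Standard deviation of the sample average, i.e. the standard error of the estimate: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

Since $E(\bar{x}) = \mu$, the average $\bar{x}$ is an unbiased estimator of $\mu$, and since

$\lim_{n \to \infty} \operatorname{Var}(\bar{x}) = \lim_{n \to \infty} \frac{\sigma^2}{n} = 0,$

the sufficient condition for consistency is satisfied, so

$\bar{x}$ is an unbiased and consistent estimator of the population mean $\mu$.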
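In practice σ is unknown, so the standard error σ/√n is estimated by replacing σ with s1, the sample standard deviation with divisor n − 1 that is introduced with the variance estimator. A minimal sketch; the helper name `point_estimate_of_mean` and the data are hypothetical:

```python
import math
import statistics

def point_estimate_of_mean(data):
    """Return (x̄, s1, estimated standard error s1 / sqrt(n)) for a list of observations."""
    n = len(data)
    x_bar = statistics.fmean(data)
    s1 = statistics.stdev(data)         # divisor n - 1, i.e. the corrected s1
    standard_error = s1 / math.sqrt(n)  # estimate of sigma / sqrt(n)
    return x_bar, s1, standard_error

# Hypothetical observations, for illustration only.
data = [12.0, 15.0, 11.0, 14.0, 13.0, 16.0, 12.0, 15.0]
x_bar, s1, se = point_estimate_of_mean(data)
print(f"x_bar = {x_bar:.3f}, s1 = {s1:.3f}, standard error = {se:.3f}")
```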

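Point estimator of the variance $\sigma^2$ and standard deviation $\sigma$

The sample variance $s^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$ is not an unbiased estimator of the population variance $\sigma^2$: it is negatively biased, i.e. on average it underestimates $\sigma^2$.

Its expected value is $E(s^2) = \frac{n-1}{n}\sigma^2$, so the bias equals $E(s^2) - \sigma^2 = -\frac{\sigma^2}{n}$.

The sample variance is an asymptotically unbiased estimator of $\sigma^2$, since

$\lim_{n \to \infty} E(s^2) = \sigma^2$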
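The negative bias can be checked by simulation. Assuming a normal population with hypothetical σ = 2 and very small samples (n = 5), the sketch averages s² over many samples. Python's `statistics.pvariance` divides by n, so it computes s² as defined here. The average lands near (n − 1)/n · σ² = 3.2 rather than σ² = 4.

```python
import random
import statistics

def average_biased_variance(sigma, n, repetitions=10000, seed=5):
    """Average of s^2 (divisor n) over many samples of size n from N(0, sigma^2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(repetitions):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        total += statistics.pvariance(sample)  # pvariance divides by n, like s^2
    return total / repetitions

sigma, n = 2.0, 5
avg_s2 = average_biased_variance(sigma, n)
print(f"average s^2:         {avg_s2:.3f}")
print(f"(n - 1)/n * sigma^2: {(n - 1) / n * sigma ** 2:.3f}")
print(f"true sigma^2:        {sigma ** 2:.3f}")
```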

So an unbiased point estimator of the population variance $\sigma^2$ is the sample variance $s_1^2$, computed as:

$s_1^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{n}{n-1}\,s^2$

The difference between $s_1^2$ and $s^2$ decreases as the sample size n increases. For samples larger than 50 (n > 50), the difference is negligible.

The factor $\frac{n}{n-1}$ is known as Bessel's correction.

Conclusion: $\text{est}\,\sigma^2 = s_1^2$
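Python's standard library provides both versions: `statistics.pvariance` divides by n (s²) and `statistics.variance` divides by n − 1 (s1²). A short sketch with a hypothetical five-value sample, followed by the correction factor n/(n − 1) for a few sample sizes:

```python
import statistics

# Hypothetical sample, for illustration only.
sample = [4.0, 7.0, 6.0, 3.0, 5.0]
n = len(sample)

s2 = statistics.pvariance(sample)   # s^2: divides by n (biased)
s1_2 = statistics.variance(sample)  # s1^2: divides by n - 1 (Bessel's correction)

print(f"s^2  = {s2}")
print(f"s1^2 = {s1_2}")
print(f"n/(n-1) * s^2 = {n / (n - 1) * s2}")

# The correction factor n/(n-1) approaches 1 as the sample size grows.
for size in (5, 20, 50, 200):
    print(f"n = {size:3d}: n/(n-1) = {size / (size - 1):.4f}")
```

For n = 50 the factor is about 1.02, a difference of roughly 2%, which is why the difference is treated as negligible for larger samples.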

Example: Expenditures on alcoholic drinks and cigarettes were surveyed in 400 randomly selected households in one region of the Slovak Republic (SR). We make a point estimate of the mean and of its standard error.

The estimated standard error of the mean is relatively small: only 1.5% of the mean. We can therefore expect the error in estimating average expenditure on alcoholic drinks and cigarettes to be small.
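The 1.5% figure is simple arithmetic. The numbers for this slide did not survive, so the sketch assumes the values that appear in the later interval example for the same survey: a mean of 973 Sk and a standard error of 14.3 Sk.

```python
# Assumed values, taken from the interval example for the same survey (n = 400).
mean_expenditure = 973.0  # point estimate of the mean, Sk
standard_error = 14.3     # estimated standard error of the mean, Sk

relative_error = standard_error / mean_expenditure * 100  # in percent of the mean
print(f"standard error is {relative_error:.1f} % of the mean")
```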

Comparison of the distribution of attribute X in the population with the distribution of the sample average:

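Interval estimate of parameter Q

$P(q_1 < Q < q_2) = 1 - \alpha$

where
• $q_1, q_2$ are the lower and upper limits of the interval (random variables),
• $\alpha$ is the risk of the estimate,
• $1 - \alpha$ is the confidence level.

[Figure: density $f(g)$ with area $1 - \alpha$ between $q_1$ and $q_2$ and area $\alpha/2$ in each tail]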

Interval estimation of the population mean $\mu$

Suppose that the statistical attribute has a normal distribution, $X \sim N(\mu, \sigma^2)$.

If we choose a sample of size n, then the arithmetic average also has a normal distribution, $\bar{x} \sim N(\mu, \sigma^2/n)$.

The confidence interval for $\mu$ depends on the available information and on the sample size:

a) If the population variance is known (a theoretical assumption), we can create the standardized normal variable

$u = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$

which has the distribution N(0,1) regardless of the value being estimated.

[Figure: standard normal density $f(u)$ with central area $1 - \alpha$]

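After transformation we get

$\bar{x} - u_{\alpha}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + u_{\alpha}\frac{\sigma}{\sqrt{n}}$

where $u_{\alpha}$ is the two-sided critical value of N(0,1) for risk $\alpha$ (for $\alpha = 0.05$, $u_{0.05} = 1.96$), and

$\Delta = u_{\alpha}\frac{\sigma}{\sqrt{n}}$ is the sampling error: half the width of the interval, which determines the accuracy of the estimate.

The interval estimate is thus the point estimate $\bar{x}$ widened by $\pm\Delta$, i.e. $\bar{x} - \Delta < \mu < \bar{x} + \Delta$.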
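A minimal Python sketch of case a), assuming σ is known; `mean_confidence_interval_z` is a hypothetical helper. `statistics.NormalDist().inv_cdf(1 - alpha / 2)` gives the two-sided critical value u_α, the same number Excel's NORMSINV(0.975) returns for α = 0.05.

```python
from statistics import NormalDist

def mean_confidence_interval_z(x_bar, sigma, n, alpha=0.05):
    """Case a): confidence interval for mu when sigma is known (hypothetical helper)."""
    u_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value, about 1.96 for alpha = 0.05
    delta = u_alpha * sigma / n ** 0.5             # sampling error = half-width of the interval
    return x_bar - delta, x_bar + delta, delta

# Hypothetical inputs: x_bar = 50, known sigma = 10, n = 100.
low, high, delta = mean_confidence_interval_z(50.0, 10.0, 100)
print(f"delta = {delta:.2f}, interval: {low:.2f} < mu < {high:.2f}")
```

For a large sample with unknown variance, the same helper could be called with s1 in place of sigma.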

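b) The population variance is unknown, $\text{est}\,\sigma^2 = s_1^2$, and the sample is large, n > 30.

We can use N(0,1):

$\bar{x} - u_{\alpha}\frac{s_1}{\sqrt{n}} < \mu < \bar{x} + u_{\alpha}\frac{s_1}{\sqrt{n}}$

c) The population variance is unknown, $\text{est}\,\sigma^2 = s_1^2$, and the sample is small, n ≤ 30:

$\bar{x} - t_{\alpha}(n-1)\frac{s_1}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha}(n-1)\frac{s_1}{\sqrt{n}}$

where $t_{\alpha}(n-1)$ is the critical value of Student's t distribution at significance level $\alpha$ with n − 1 degrees of freedom.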

Example: Based on the point estimate of household expenditure on cigarettes and alcohol, we construct an interval estimate with 95% confidence.

n = 400

$\Delta = 1.96 \cdot 14.3 = 28.03$

$973 - 28.03 < \mu < 973 + 28.03$,

i.e. $944.97 < \mu < 1001.03$

With 95% confidence, we estimate average expenditure to be between 945 Sk and 1001 Sk.

Excel: NORMSINV(0.975) = 1.96
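The same calculation in Python, as a sketch: `NormalDist().inv_cdf(0.975)` plays the role of Excel's NORMSINV(0.975), and the inputs 973 Sk and 14.3 Sk are the values from this example.

```python
from statistics import NormalDist

x_bar = 973.0          # point estimate of mean expenditure, Sk
standard_error = 14.3  # estimated standard error s1 / sqrt(n) for n = 400, Sk

u = NormalDist().inv_cdf(0.975)  # same as Excel NORMSINV(0.975), about 1.96
delta = u * standard_error       # sampling error (half-width of the interval)
print(f"u = {u:.2f}, delta = {delta:.2f}")
print(f"{x_bar - delta:.2f} < mu < {x_bar + delta:.2f}")
```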

Example: A study investigated the weight loss of carrots after one week of storage. Twenty samples, each weighing 1 kg at the beginning of storage, were analyzed and their weight loss was measured. The average weight loss was 49 g, with a sample standard deviation of 4 g. We assume that weight loss is normally distributed. We estimate the average weight loss with 95% confidence. Because n < 30, we use Student's t distribution:

$t_{\alpha}(n-1)$ is the quantile (critical value) of Student's t distribution, $t_{0.05}(19) = 2.09$

Excel: TINV(0.05;19)

With 95% confidence, the average weight loss of a 1 kg carrot sample lies in the interval 47.1 g to 50.9 g.
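The arithmetic behind this interval, as a sketch of case c). The slide does not say whether the 4 g is s or s1; the code assumes it is s1 (divisor n − 1) and uses the tabulated critical value t₀.₀₅(19) = 2.09 given above rather than computing it.

```python
import math

n = 20
x_bar = 49.0   # average weight loss, g
s1 = 4.0       # sample standard deviation, g (assumed to be s1, divisor n - 1)
t_crit = 2.09  # t_0.05(19) from the table, Excel TINV(0.05;19)

delta = t_crit * s1 / math.sqrt(n)  # sampling error
print(f"delta = {delta:.2f} g")
print(f"{x_bar - delta:.1f} g < mu < {x_bar + delta:.1f} g")
```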

The size of the sampling error $\Delta$ depends on:
• the confidence level (1 − α)
• the standard error of the average, which depends on:
  • the variability of the attribute: we cannot change it,
  • the sample size: this we can change!

The sample size needed to achieve the required reliability and accuracy can be determined with the following formula:

$n = \left(\frac{u_{\alpha}\,\sigma}{\Delta}\right)^2$
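A minimal sketch of the sample-size calculation, obtained by solving Δ = u_α σ/√n for n and rounding up. The helper name and the inputs are hypothetical; σ and Δ must be in the same units, and in practice σ is assumed known or estimated by s1 from an earlier sample.

```python
import math
from statistics import NormalDist

def required_sample_size(sigma, delta, alpha=0.05):
    """Smallest n with u_alpha * sigma / sqrt(n) <= delta (hypothetical helper)."""
    u_alpha = NormalDist().inv_cdf(1 - alpha / 2)     # two-sided critical value of N(0, 1)
    return math.ceil((u_alpha * sigma / delta) ** 2)  # round up: n must be a whole number

# Hypothetical targets: sigma = 1 with required accuracy delta = 0.1, then 0.05.
print(required_sample_size(1.0, 0.1))
print(required_sample_size(1.0, 0.05))
```

Halving the required Δ roughly quadruples the necessary sample size, because n grows with 1/Δ².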

f(2)

/2

1 - 

/2

2 1-/2

2/2

Confidence Interval for variance  2a 

Critical values of CHÍ-square distribution

doc.Ing. Zlata Sojková, CSc.
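The interval formula itself did not survive on this slide. Under the usual assumption that the attribute is normally distributed, the standard textbook interval relies on the fact that (n − 1)s1²/σ² follows a chi-square distribution with n − 1 degrees of freedom. Writing χ²_p(n − 1) for the critical value with upper-tail probability p (the convention the figure labels appear to use), it can be sketched as:

```latex
\frac{(n-1)\,s_1^2}{\chi^2_{\alpha/2}(n-1)} \;<\; \sigma^2 \;<\; \frac{(n-1)\,s_1^2}{\chi^2_{1-\alpha/2}(n-1)},
\qquad
\sqrt{\frac{(n-1)\,s_1^2}{\chi^2_{\alpha/2}(n-1)}} \;<\; \sigma \;<\; \sqrt{\frac{(n-1)\,s_1^2}{\chi^2_{1-\alpha/2}(n-1)}}
```

Because (n − 1)s1² = n s², the numerators can equally be written as n s².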