Statistical Inference: Estimation for Single Populations Chapter 8 MSIS 111 Prof. Nick Dedeke PowerPoint presentations prepared by Lloyd Jaisingh, Morehead State University
Learning Objectives • Know the difference between point and interval estimation. • Estimate a population mean from a sample mean when σ is known. • Estimate a population mean from a sample mean when σ is unknown. • Estimate the population variance from a sample variance. • Estimate the minimum sample size necessary to achieve given statistical goals.
Concept of Inferential Statistics • In inferential statistics, the objective is to estimate the parameters of a large population using the statistics of a smaller sample drawn from it. • Unknown parameters: population mean, population variance. • Known statistics: sample mean, sample variance, z-value.
Example: Concept of Estimation • Three managers wanted to investigate absenteeism in their organization. Each of them took a random sample of 2,000 employees. Here are the results: • Bill’s sample yielded an average of 4 days per year. • Chen’s sample yielded an average of 3.2 days per year. • Ayo’s sample yielded an average of 3.7 days per year. • What should we accept as the average absenteeism for all 10,000 employees of the firm?
Concept of Confidence Level • After one has specified an interval, the question becomes: how confident is one that the population parameter truly lies in the range we define? • This is an area where the central limit theorem may help us. • The central limit theorem states that, given a sufficiently large sample size, the sample means are approximately normally distributed.
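The central limit theorem statement above can be checked with a small simulation. This is a minimal sketch, not part of the original slides: the skewed (exponential) population, the sample size of 50, and the helper name `simulate_sample_means` are all assumptions chosen for illustration.

```python
import random
import statistics

def simulate_sample_means(population, sample_size, num_samples, seed=0):
    """Draw repeated random samples and return the mean of each one."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.choices(population, k=sample_size))
        for _ in range(num_samples)
    ]

# Hypothetical, strongly skewed population (exponential); not data from the slides.
_rng = random.Random(42)
population = [_rng.expovariate(1.0) for _ in range(100_000)]

sample_size = 50
means = simulate_sample_means(population, sample_size, num_samples=2_000)

pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

# The sample means center on the population mean, and their spread is
# close to sigma / sqrt(n), even though the population itself is skewed.
print("population mean vs mean of sample means:",
      round(pop_mean, 3), round(statistics.fmean(means), 3))
print("sigma / sqrt(n) vs SD of sample means:",
      round(pop_sd / sample_size ** 0.5, 3), round(statistics.stdev(means), 3))
```

A histogram of `means` would look close to a bell curve, which is what justifies using z-values to build intervals around a sample mean.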
Confidence Level [Figure: distribution of the means of all samples drawn from the population, on the X and Z axes, with three interval estimates.] • If we picked three different samples and calculated their sample means and intervals, we could obtain the intervals shown in the figure. The three intervals, all of the same width, each include the population mean.
Confidence Level [Figure: distribution of sample means on the X and Z axes, with 95% confidence lines and the area between them shaded grey.] • The 95% confidence lines are defined so that the area between the mean and z is 0.95/2 on each side. The grey area is 0.95. • A 95% confidence level means that if one took many different samples from the population and calculated each sample mean, about 95 out of 100 sample means would fall within the grey area.
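The "95 out of 100 sample means" reading can be illustrated by simulation. The sketch below assumes a normal population with hypothetical values (mean 100, standard deviation 15, samples of 30); none of these numbers come from the slides.

```python
import random
import statistics
from statistics import NormalDist

def fraction_within_limits(mu, sigma, n, num_samples, confidence=0.95, seed=1):
    """Share of simulated sample means inside mu +/- z * sigma / sqrt(n)."""
    rng = random.Random(seed)
    # The area between the mean and z is confidence / 2 on each side.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sigma / n ** 0.5
    inside = 0
    for _ in range(num_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        if abs(statistics.fmean(sample) - mu) <= half_width:
            inside += 1
    return inside / num_samples

# Hypothetical population values, chosen only for illustration.
share = fraction_within_limits(mu=100.0, sigma=15.0, n=30, num_samples=5_000)
print(f"share of sample means inside the 95% lines: {share:.3f}")
```

The printed share should sit near 0.95, and it moves closer as `num_samples` grows.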
Confidence Level and Interval Estimates [Figure: 40%, 60%, and 95% confidence interval lines on the distribution of sample means, X and Z axes.] • The three intervals presented are of different widths. Specifically, to have greater confidence, the interval estimate must be wider. Narrower interval estimates reduce our confidence that the population mean lies in the interval.
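The trade-off between confidence and width can be made concrete by computing the half-width of the interval at each of the three levels on this slide. This is a sketch under assumed values: the population standard deviation of 10 and sample size of 25 are hypothetical, and `half_width` is an illustrative helper name.

```python
from statistics import NormalDist

def half_width(confidence, sigma, n):
    """Half-width z * sigma / sqrt(n) of a two-sided interval for the mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sigma / n ** 0.5

# Hypothetical values; the slides do not give sigma or n for this figure.
sigma, n = 10.0, 25

widths = {level: half_width(level, sigma, n) for level in (0.40, 0.60, 0.95)}
for level, width in widths.items():
    print(f"{level:.0%} confidence: sample mean +/- {width:.2f}")
```

With the sample size fixed, the only way to raise the confidence level is to accept a wider interval.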
Known Population Standard Deviation [Figure: two samples drawn from the same population, with their sample means and extreme values marked relative to the population mean μ.] • The figure presents two samples that were taken from the same population. In the first case the sample mean is higher than the population mean; in the other case it is lower.
Confidence Interval Estimates [Figure: confidence interval #1 and confidence interval #2 drawn around the two sample means, relative to the population mean μ.] • The interval estimate approach defines upper and lower limits around the sample mean using confidence levels. If the acceptable mean of the population falls within the limits, the population is accepted; if not, it is rejected.
Inferential Statistics Assumptions • For inferential statistics to be accurate, some assumptions must be fulfilled: • The process that the objects or entities passed through is stable, i.e., the variations in attribute observations are not due to special causes. • The sample is statistically drawn from the population. • The sample is large enough to represent the population. • The distribution of values for the attribute of the sample and population can be assumed to be normal. • Statistical estimates about a population can reasonably be used as a basis for decision-making.
Statistical Estimation • Point estimate -- the single value of a statistic calculated from a sample which is used to estimate a population parameter • Interval Estimate -- a range of values calculated from a sample statistic(s) and standardized statistics, such as the z • Selection of the standardized statistic is determined by the sampling distribution. • Selection of critical values of the standardized statistic is determined by the desired level of confidence.
Concept of Inferential Statistics • The z statistic can be used if the sample mean, the sample standard deviation, and the population standard deviation are all known. • Unknown parameter: population mean. • Known parameter: population standard deviation. • Known statistics: sample mean, sample variance, z-value.
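Confidence Interval Estimate for μ when σ is Known • Point estimate: $\bar{x} = \frac{\sum x}{n}$ • Interval estimate: $\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$, that is, $\bar{x} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$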
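The standard interval estimate for a mean with known population standard deviation, sample mean plus or minus z times σ/√n, can be sketched in a few lines. The inputs below (sample mean 510, σ = 46, n = 85) are assumptions that do not appear on the slides; they were chosen because they reproduce the interval 500.22 to 519.78 quoted later in the deck.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Return (lower, upper) limits of sample_mean +/- z * sigma / sqrt(n)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95% confidence
    margin = z * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Assumed inputs (not stated on the slides): x_bar = 510, sigma = 46, n = 85.
lower, upper = z_confidence_interval(sample_mean=510, sigma=46, n=85)
print(f"{lower:.2f} <= mu <= {upper:.2f}")
```

The point estimate is the sample mean itself (510 here); the interval estimate adds the margin of error on both sides.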
Distribution of Sample Means for 100(1 - α)% Confidence [Figure: normal curve of sample means on the X and Z axes, centered at z = 0.]
Areas Under Curve: 100(1 - α)% Confidence [Figure: the same curve on the X and Z axes, centered at z = 0.]
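Distribution of Sample Means for 95% Confidence [Figure: normal curve on the X and Z axes with the central 95% between z = -1.96 and z = 1.96, split into areas of .4750 on each side of 0, and .025 in each tail.]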
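95% Confidence Intervals for μ [Figure: interval estimates from several sample means plotted against the central 95% of the sampling distribution.]
95% Confidence Intervals for μ [Figure: the same interval estimates against the central 95% region.] • Is our interval, 500.22 ≤ μ ≤ 519.78, in the red?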
Concept of Inferential Statistics • The z statistic cannot be used if the population standard deviation is unknown. • If the distribution is not normal and the sample size exceeds 30, we can still estimate the parameter. • Unknown parameters: population mean, population standard deviation. • Known statistics: sample mean, sample variance, z-value.
Exercise: Derive Z Values for Common Levels of Confidence
Confidence Level    $z_{\alpha/2}$ Value
90%                 ??
95%                 1.96
98%                 ???
99%                 ???
Worked case for 95%: the area between the mean and $z_{\alpha/2}$ is 0.5 - (1 - 0.95)/2 = 0.5 - 0.025 = 0.475. From Table A5 (page 788), $z_{\alpha/2}$ = 1.96.
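The table lookup in this exercise can be cross-checked numerically. The sketch below follows the same steps as the worked 95% case, with Python's standard-library `statistics.NormalDist` standing in for Table A5; the helper name `z_for_confidence` is an illustrative assumption.

```python
from statistics import NormalDist

def z_for_confidence(confidence):
    """Critical z such that the area between the mean and z is confidence / 2."""
    area_mean_to_z = 0.5 - (1 - confidence) / 2  # 0.475 for 95% confidence
    # inv_cdf expects the total area to the left of z, so add the lower half.
    return NormalDist().inv_cdf(0.5 + area_mean_to_z)

for level in (0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%} confidence: z = {z_for_confidence(level):.3f}")
```

Printed values may differ from a four-decimal table in the last digit because the table is rounded.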
Estimating the Mean of a Normal Population: σ Unknown • The population has a normal distribution. • The value of the population standard deviation is unknown. • The z distribution is not appropriate for these conditions. • The t distribution is appropriate.
The t Distribution • Developed by the British statistician William Gosset • A family of distributions -- a unique distribution for each value of its parameter, degrees of freedom (d.f.) • Symmetric, unimodal, mean = 0, flatter than the z distribution • t formula: $t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$
Comparison of Selected t Distributions to the Standard Normal [Figure: the standard normal curve plotted with t distributions for d.f. = 25, d.f. = 5, and d.f. = 1, over the range -3 to 3.]
Table of Critical Values of t
df     t0.100   t0.050   t0.025   t0.010   t0.005
1      3.078    6.314    12.706   31.821   63.656
2      1.886    2.920    4.303    6.965    9.925
3      1.638    2.353    3.182    4.541    5.841
4      1.533    2.132    2.776    3.747    4.604
5      1.476    2.015    2.571    3.365    4.032
23     1.319    1.714    2.069    2.500    2.807
24     1.318    1.711    2.064    2.492    2.797
25     1.316    1.708    2.060    2.485    2.787
29     1.311    1.699    2.045    2.462    2.756
30     1.310    1.697    2.042    2.457    2.750
40     1.303    1.684    2.021    2.423    2.704
60     1.296    1.671    2.000    2.390    2.660
120    1.289    1.658    1.980    2.358    2.617
∞      1.282    1.645    1.960    2.327    2.576
With df = 24 and α = 0.05, tα = 1.711.
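A table lookup like the one above feeds directly into an interval estimate when σ is unknown: sample mean plus or minus t times s/√n, with n - 1 degrees of freedom. The sketch below copies a few rows of the table, reading the columns as upper-tail areas 0.100, 0.050, 0.025, 0.010, and 0.005. The five-value sample and the helper names are hypothetical and are not taken from the slides.

```python
from math import sqrt
from statistics import mean, stdev

# Upper-tail areas for the table columns, in order.
ALPHAS = (0.100, 0.050, 0.025, 0.010, 0.005)

# A few rows copied from the table of critical values of t (df -> row).
T_TABLE = {
    1: (3.078, 6.314, 12.706, 31.821, 63.656),
    2: (1.886, 2.920, 4.303, 6.965, 9.925),
    3: (1.638, 2.353, 3.182, 4.541, 5.841),
    4: (1.533, 2.132, 2.776, 3.747, 4.604),
    5: (1.476, 2.015, 2.571, 3.365, 4.032),
    24: (1.318, 1.711, 2.064, 2.492, 2.797),
}

def t_critical(df, alpha):
    """Look up the critical t for the given degrees of freedom and tail area."""
    return T_TABLE[df][ALPHAS.index(alpha)]

def t_confidence_interval(sample, confidence=0.95):
    """Return (lower, upper) limits of x_bar +/- t * s / sqrt(n), df = n - 1."""
    n = len(sample)
    x_bar = mean(sample)
    s = stdev(sample)  # sample standard deviation (divides by n - 1)
    tail_area = round((1 - confidence) / 2, 3)  # alpha / 2 in each tail
    margin = t_critical(n - 1, tail_area) * s / sqrt(n)
    return x_bar - margin, x_bar + margin

print(t_critical(24, 0.050))  # the highlighted table entry for df = 24

# Hypothetical sample of five observations, for illustration only.
sample = [12.0, 15.0, 11.0, 14.0, 13.0]
lower, upper = t_confidence_interval(sample, confidence=0.95)
print(f"{lower:.3f} <= mu <= {upper:.3f}")
```

Only the degrees of freedom present in `T_TABLE` can be looked up; a fuller version would include every row or compute the critical value numerically.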
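Determining Sample Size when Estimating μ • z formula: $z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$ • Error of estimation (tolerable error): $E = \bar{x} - \mu$ • Estimated sample size: $n = \frac{z_{\alpha/2}^2 \sigma^2}{E^2} = \left( \frac{z_{\alpha/2} \, \sigma}{E} \right)^2$ • Estimated σ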
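Solving the z formula for n gives the usual sample-size rule, n = (zσ/E)², rounded up to the next whole observation. The sketch below applies it to hypothetical inputs (σ = 20, tolerable error of 5, 95% confidence), none of which come from the slides; `required_sample_size` is an illustrative helper name.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(sigma, error, confidence=0.95):
    """Smallest n for which z * sigma / sqrt(n) does not exceed the tolerable error."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return ceil((z * sigma / error) ** 2)  # always round up, never down

# Hypothetical planning values, for illustration only.
n = required_sample_size(sigma=20.0, error=5.0, confidence=0.95)
print("required sample size:", n)
```

Halving the tolerable error roughly quadruples the required sample size, because E is squared in the denominator.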