Statistics Sampling and Sampling Distribution
STATISTICS in PRACTICE • MeadWestvaco Corporation’s products include textbook paper, magazine paper, and office products. • MeadWestvaco’s internal consulting group uses sampling to provide information that enables the company to obtain significant productivity benefits and remain competitive.
STATISTICS in PRACTICE • Managers need reliable and accurate information about the timberlands and forests to evaluate the company’s ability to meet its future raw material needs. • Data collected from sample plots throughout the forests are the basis for learning about the population of trees owned by the company.
Contents • The Electronics Associates Sampling Problem • Simple Random Sampling • Point Estimation • Introduction to Sampling Distributions • Sampling Distribution of $\bar{x}$ • Sampling Distribution of $\bar{p}$ • Properties of Point Estimators • Other Sampling Methods
Statistical Inference • The purpose of statistical inference is to obtain information about a population from information contained in a sample. • A population is the set of all the elements of interest. • A sample is a subset of the population.
Statistical Inference • The sample results provide only estimates of the values of the population characteristics. • With proper sampling methods, the sample results can provide “good” estimates of the population characteristics. • A parameter is a numerical characteristic of a population.
The Electronics Associates Sampling Problem • Often the cost of collecting information from a sample is substantially less than from a population, especially when personal interviews must be conducted to collect the information.
Simple Random Sampling: Finite Population • Finite populations are often defined by lists such as: • Organization membership roster • Credit card account numbers • Inventory product numbers
Simple Random Sampling: Finite Population • A simple random sample of size n from a finite population of size N is a sample selected such that each possible sample of size n has the same probability of being selected.
Simple Random Sampling: Finite Population • Replacing each sampled element before selecting subsequent elements is called sampling with replacement. • Sampling without replacement is the procedure used most often.
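The difference between sampling without and with replacement can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the list `account_numbers` and both function names are hypothetical, not taken from the slides.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Sampling without replacement: every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def sample_with_replacement(population, n, seed=None):
    """Sampling with replacement: an element may be selected more than once."""
    rng = random.Random(seed)
    return rng.choices(population, k=n)

# Hypothetical finite population defined by a list of N = 900 account numbers.
account_numbers = list(range(1, 901))

without = simple_random_sample(account_numbers, 30, seed=1)
with_repl = sample_with_replacement(account_numbers, 30, seed=1)

print(len(set(without)))    # always 30: no element can repeat
print(len(set(with_repl)))  # may be fewer than 30 if an element repeats
```

Fixing `seed` only makes the illustration reproducible; in practice the seed would be left unset.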
Simple Random Sampling • Random Numbers: the numbers in the table are random; each four-digit number is equally likely.
Simple Random Sampling: Infinite Population • Infinite populations are often defined by an ongoing process whereby the elements of the population consist of items generated as though the process would operate indefinitely.
Simple Random Sampling: Infinite Population • A simple random sample from an infinite population is a sample selected such that the following conditions are satisfied. • Each element selected comes from the same population. • Each element is selected independently.
Simple Random Sampling: Infinite Population • In the case of infinite populations, it is impossible to obtain a list of all elements in the population. • The random number selection procedure cannot be used for infinite populations.
Point Estimation • In point estimation we use the data from the sample to compute a value of a sample statistic that serves as an estimate of a population parameter. • We refer to $\bar{x}$ as the point estimator of the population mean $\mu$.
Point Estimation • $s$ is the point estimator of the population standard deviation $\sigma$. • $\bar{p}$ is the point estimator of the population proportion $p$.
Point Estimation • Example: estimating the population mean, the population standard deviation, and the population proportion.
Sampling Error • When the expected value of a point estimator is equal to the population parameter, the point estimator is said to be unbiased. • The absolute value of the difference between an unbiased point estimate and the corresponding population parameter is called the sampling error.
Sampling Error • Sampling error is the result of using a subset of the population (the sample), and not the entire population. • Statistical methods can be used to make probability statements about the size of the sampling error.
Sampling Error • The sampling errors are: • $|\bar{x} - \mu|$ for sample mean • $|s - \sigma|$ for sample standard deviation • $|\bar{p} - p|$ for sample proportion
Example: St. Andrew’s St. Andrew’s College receives 900 applications annually from prospective students. The application form contains a variety of information including the individual’s scholastic aptitude test (SAT) score and whether or not the individual desires on-campus housing.
Example: St. Andrew’s The director of admissions would like to know the following information: • the average SAT score for the 900 applicants, and • the proportion of applicants that want to live on campus.
Example: St. Andrew’s We will now look at three alternatives for obtaining the desired information. • Conducting a census of the entire 900 applicants • Selecting a sample of 30 applicants, using a random number table • Selecting a sample of 30 applicants, using Excel
Conducting a Census • If the relevant data for the entire 900 applicants were in the college’s database, the population parameters of interest could be calculated using the formulas presented in Chapter 3. • We will assume for the moment that conducting a census is practical in this example.
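Conducting a Census • Population Mean SAT Score: $\mu = \frac{\sum x_i}{900} = 990$ • Population Standard Deviation for SAT Score: $\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{900}} = 80$ • Population Proportion Wanting On-Campus Housing: $p = .72$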
Simple Random Sampling • Now suppose that the necessary data on the current year’s applicants were not yet entered in the college’s database. • Furthermore, the Director of Admissions must obtain estimates of the population parameters of interest for a meeting taking place in a few hours.
Simple Random Sampling • The applicants were numbered, from 1 to 900, as their applications arrived.
Simple Random Sampling: Using a Random Number Table • Taking a Sample of 30 Applicants • Because the finite population has 900 elements, we will need 3-digit random numbers to randomly select applicants numbered from 1 to 900. • We will use the last three digits of the 5-digit random numbers in the third column of the textbook’s random number table, and continue into the fourth column as needed.
Simple Random Sampling: Using a Random Number Table • Taking a Sample of 30 Applicants • The numbers we draw will be the numbers of the applicants we will sample unless the random number is greater than 900 or the random number has already been used. • We will continue to draw random numbers until we have selected 30 applicants for our sample.
Simple Random Sampling: Using a Random Number Table • (We will go through all of column 3 and part of column 4 of the random number table, encountering in the process five numbers greater than 900 and one duplicate, 835.)
Simple Random Sampling: Using a Random Number Table • Use of Random Numbers for Sampling

3-Digit Random Number    Applicant Included in Sample
744                      No. 744
436                      No. 436
865                      No. 865
790                      No. 790
835                      No. 835
902                      Number exceeds 900
190                      No. 190
836                      No. 836
. . . and so on
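The selection rule described above (skip numbers greater than 900, skip numbers already used, stop at the required sample size) can be sketched as follows. The function name is hypothetical, and the final 835 in `draws` is appended here only to show a duplicate being skipped; it is not the position at which the duplicate occurs in the textbook's table.

```python
def select_by_random_numbers(random_numbers, population_size, sample_size):
    """Walk through 3-digit random numbers and keep the usable ones."""
    selected = []
    seen = set()
    for number in random_numbers:
        if number < 1 or number > population_size:
            continue  # no applicant carries this number
        if number in seen:
            continue  # applicant already in the sample
        seen.add(number)
        selected.append(number)
        if len(selected) == sample_size:
            break  # sample is complete
    return selected

# First draws shown on the slide, plus an illustrative repeated 835.
draws = [744, 436, 865, 790, 835, 902, 190, 836, 835]
print(select_by_random_numbers(draws, population_size=900, sample_size=30))
# → [744, 436, 865, 790, 835, 190, 836]
```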
Simple Random Sampling: Using a Random Number Table • Sample Data

No.   Random Number   Applicant        SAT Score   Live On-Campus
1     744             Conrad Harris    1025        Yes
2     436             Enrique Romero   950         Yes
3     865             Fabian Avante    1090        No
4     790             Lucila Cruz      1120        Yes
5     835             Chan Chiang      930         No
.     .               .                .           .
30    498             Emily Morse      1010        No
Simple Random Sampling: Using a Computer • Taking a Sample of 30 Applicants • Computers can be used to generate random numbers for selecting random samples. • For example, Excel’s function =RANDBETWEEN(1,900) can be used to generate random integers between 1 and 900. • Alternatively, a random number is generated for each of the 900 applicants (for example, with =RAND()), and we choose the 30 applicants corresponding to the 30 smallest random numbers as our sample.
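One way to read the "30 smallest random numbers" step is to attach a random number to every applicant and keep the applicants with the 30 smallest values. The sketch below does this in Python rather than Excel; the function name and the seed are assumptions made for illustration.

```python
import random

def sample_by_smallest_random_numbers(applicant_ids, n, seed=None):
    """Assign each applicant a uniform random number and keep the n smallest."""
    rng = random.Random(seed)
    keyed = [(rng.random(), applicant) for applicant in applicant_ids]
    keyed.sort()  # order applicants by their random number
    return [applicant for _, applicant in keyed[:n]]

applicants = range(1, 901)  # applicants numbered 1 to 900
chosen = sample_by_smallest_random_numbers(applicants, 30, seed=42)
print(sorted(chosen))
```

Because every applicant receives its own random number, no applicant can be selected twice, so the result is a sample without replacement.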
Point Estimation • $\bar{x}$ as Point Estimator of $\mu$: $\bar{x} = 997$ • $s$ as Point Estimator of $\sigma$: $s = 75.2$ • $\bar{p}$ as Point Estimator of $p$: $\bar{p} = .68$
Point Estimation • Note: Different random numbers would have identified a different sample, which would have resulted in different point estimates.
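The three point estimators can be computed with Python's standard library. As a loud caveat, the data below are only the six applicants listed on the sample-data slide (rows 1 to 5 and row 30), not the full sample of 30, so the results will not match the estimates reported for the full sample.

```python
import statistics

# Assumption: only the 6 rows visible on the sample-data slide are used.
sat_scores = [1025, 950, 1090, 1120, 930, 1010]
wants_housing = ["Yes", "Yes", "No", "Yes", "No", "No"]

x_bar = statistics.mean(sat_scores)   # point estimator of the population mean
s = statistics.stdev(sat_scores)      # sample std. deviation (divides by n - 1)
p_bar = wants_housing.count("Yes") / len(wants_housing)  # sample proportion

print(x_bar, s, p_bar)
```

Drawing a different six applicants would give different values, which is the point of the note above.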
Summary of Point Estimates Obtained from a Simple Random Sample

Population Parameter                                      Parameter Value   Point Estimator                                      Point Estimate
$\mu$ = Population mean SAT score                         990               $\bar{x}$ = Sample mean SAT score                    997
$\sigma$ = Population std. deviation for SAT score        80                $s$ = Sample std. deviation for SAT score            75.2
$p$ = Population proportion wanting campus housing        .72               $\bar{p}$ = Sample proportion wanting campus housing .68
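Applying the absolute-difference definition of sampling error to the parameter values and point estimates in the summary above gives the following worked arithmetic (a derivation from the listed numbers, not an additional result from the source):

```latex
\begin{aligned}
|\bar{x} - \mu| &= |997 - 990| = 7 \\
|s - \sigma|    &= |75.2 - 80| = 4.8 \\
|\bar{p} - p|   &= |.68 - .72| = .04
\end{aligned}
```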
Sampling Distribution • Example: Relative Frequency Histogram of Sample Mean Values from 500 Simple Random Samples of 30 each.
Sampling Distribution • Example: Relative Frequency Histogram of Sample Proportion Values from 500 Simple Random Samples of 30 each.
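The idea behind these histograms, repeating the sampling 500 times and recording each sample's mean and proportion, can be simulated. The population below is hypothetical: 900 scores generated from a normal distribution with mean 990 and standard deviation 80, and housing preferences generated with probability .72. The actual applicant data are not available here, so the output only illustrates the procedure.

```python
import random
import statistics

rng = random.Random(0)
N, n, num_samples = 900, 30, 500

# Hypothetical population (assumption, not the real applicant file).
sat_scores = [rng.gauss(990, 80) for _ in range(N)]
wants_housing = [rng.random() < 0.72 for _ in range(N)]

sample_means, sample_props = [], []
for _ in range(num_samples):
    idx = rng.sample(range(N), n)  # one simple random sample of 30 applicants
    sample_means.append(statistics.mean([sat_scores[i] for i in idx]))
    sample_props.append(sum(wants_housing[i] for i in idx) / n)

print("population mean:", statistics.mean(sat_scores))
print("mean of the 500 sample means:", statistics.mean(sample_means))
print("std. deviation of the 500 sample means:", statistics.stdev(sample_means))
print("mean of the 500 sample proportions:", statistics.mean(sample_props))
```

A relative frequency histogram of `sample_means` or `sample_props` approximates the corresponding sampling distribution.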
Sampling Distribution of $\bar{x}$ • Process of Statistical Inference • Population with mean $\mu$ = ? • A simple random sample of n elements is selected from the population. • The sample data provide a value for the sample mean $\bar{x}$. • The value of $\bar{x}$ is used to make inferences about the value of $\mu$.
Sampling Distribution of $\bar{x}$ • The sampling distribution of $\bar{x}$ is the probability distribution of all possible values of the sample mean $\bar{x}$. • Expected Value of $\bar{x}$: $E(\bar{x}) = \mu$, where $\mu$ = the population mean
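Sampling Distribution of $\bar{x}$ • Standard Deviation of $\bar{x}$ • Infinite Population: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$ • Finite Population: $\sigma_{\bar{x}} = \sqrt{\frac{N-n}{N-1}} \left( \frac{\sigma}{\sqrt{n}} \right)$ • A finite population is treated as being infinite if n/N < .05.
Sampling Distribution of $\bar{x}$ • $\sqrt{(N-n)/(N-1)}$ is the finite population correction factor. • $\sigma_{\bar{x}}$ is referred to as the standard error of the mean.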
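The two standard-error formulas and the n/N rule can be combined in one small function. This is a sketch: the function name and the `threshold` parameter are hypothetical, and the values σ = 80, n = 30, N = 900 are those of the St. Andrew's example.

```python
import math

def standard_error_of_mean(sigma, n, N=None, threshold=0.05):
    """Standard error of the sample mean.

    The finite population correction factor is applied only when N is given
    and n/N is not below the threshold.
    """
    se = sigma / math.sqrt(n)
    if N is not None and n / N >= threshold:
        se *= math.sqrt((N - n) / (N - 1))
    return se

# St. Andrew's: n/N = 30/900 is below .05, so the correction is skipped.
se_st_andrews = standard_error_of_mean(sigma=80, n=30, N=900)

# Forcing the correction (threshold=0) shows how little it changes the result.
se_with_fpc = standard_error_of_mean(sigma=80, n=30, N=900, threshold=0.0)

print(round(se_st_andrews, 1), round(se_with_fpc, 1))
```

The small gap between the two values is the reason the correction is commonly ignored when the sample is a small fraction of the population.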
Form of the Sampling Distribution of $\bar{x}$ • If we use a large (n ≥ 30) simple random sample, the central limit theorem enables us to conclude that the sampling distribution of $\bar{x}$ can be approximated by a normal distribution. • When the simple random sample is small (n < 30), the sampling distribution of $\bar{x}$ can be considered normal only if we assume the population has a normal distribution.
Central Limit Theorem • Illustration of the Central Limit Theorem
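A small simulation can stand in for the illustration. It assumes a strongly right-skewed population (exponential with mean 1, chosen here for illustration and not taken from the slides) and measures how the skewness of the sample means shrinks as n grows, which is what the central limit theorem predicts.

```python
import random
import statistics

rng = random.Random(0)

def skewness(values):
    """Population skewness: mean of the cubed standardized values."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - m) / sd) ** 3 for v in values) / len(values)

def simulate_sample_means(n, replications=5000):
    """Means of repeated samples of size n from an exponential population."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(replications)
    ]

skew_by_n = {n: skewness(simulate_sample_means(n)) for n in (2, 5, 30)}
for n, value in skew_by_n.items():
    print(f"n = {n:>2}: skewness of sample means = {value:.2f}")
```

Skewness near zero indicates a roughly symmetric, bell-shaped distribution of sample means.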
Relationship Between the Sample Size and the Sampling Distribution of $\bar{x}$ • A Comparison of the Sampling Distributions of $\bar{x}$ for Simple Random Samples of n = 30 and n = 100.
Sampling Distribution of $\bar{x}$ for SAT Scores • $E(\bar{x}) = \mu = 990$ • $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{80}{\sqrt{30}} = 14.6$
Sampling Distribution of $\bar{x}$ for SAT Scores • What is the probability that a simple random sample of 30 applicants will provide an estimate of the population mean SAT score that is within ±10 of the actual population mean $\mu$? • In other words, what is the probability that $\bar{x}$ will be between 980 and 1000?
Sampling Distribution of $\bar{x}$ for SAT Scores • Step 1: Calculate the z-value at the upper endpoint of the interval. $z = (1000 - 990)/14.6 = .68$ • Step 2: Find the area under the curve to the left of the upper endpoint. $P(z \le .68) = .7517$
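The full probability for the interval can be checked numerically with `statistics.NormalDist` from the Python standard library. This sketch assumes the normal approximation with mean 990 and standard error 80/√30. Because it does not round z to two decimals, its answer differs slightly from the table-based value; with z = .68 and symmetry of the normal curve, the table gives .7517 − (1 − .7517) = .5034.

```python
import math
from statistics import NormalDist

mu, sigma, n = 990, 80, 30
standard_error = sigma / math.sqrt(n)  # about 14.6

# Sampling distribution of the sample mean under the normal approximation.
sampling_dist = NormalDist(mu, standard_error)

z_upper = (1000 - mu) / standard_error
prob_within_10 = sampling_dist.cdf(1000) - sampling_dist.cdf(980)

print(f"z at upper endpoint: {z_upper:.2f}")
print(f"P(980 <= sample mean <= 1000): {prob_within_10:.4f}")
```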
Sampling Distribution of $\bar{x}$ for SAT Scores • Cumulative Probabilities for the Standard Normal Distribution