Confidence Intervals in Statistics and Research

HLST/RECL 3P07 Estimating the Confidence Interval for a Sample Mean

Re-consider our sampling rules • In research, we often collect information from selected individuals. • Since the procedures for information collection cost money, researchers are forced to make the assumption that the selected individuals are a good representation of a larger group. • In quantitative analyses we call the larger group the population, (represented by the letter 'N'), and we call the selected subgroup the sample (represented by the letter 'n').

Most Important! • If we assume that our sample represents a population, then we must also assume that any computations, estimates, or inferences based on the numbers from the sample also represent the population from which the sample was selected.

As such, the average score computed for the sample is assumed to represent the average score for the population; • Likewise, the variability of scores within the sample (the subgroup) should represent the variability of the scores within the population (the larger group); and • The standardized estimate of the differences computed for the sample should represent the standardized estimate of differences computed for the population.

Therefore: µ = sample mean ± (sampling error) where: • µ refers to the measure of central tendency for the population • sample mean refers to the measure of central tendency in the sample • sampling error refers to the error due to randomness

Two basic assumptions • First, we assume that the sample mean is only our best estimate of the true population mean. • Second, we assume that the chance associated with the sample mean's ability to represent the true population mean is dependent upon the ability of the sample scores to represent the population scores. • So that by adding or subtracting the sampling error to or from the sample mean we will be able to identify the range within which the true population estimate falls.

Determine the width of the sampling error • The term sampling error refers to the difference between a sample estimate and the true population value that arises by chance, because only part of the population is measured. • Sampling error is expected and should thus be accounted for in the computation of the estimates which represent the data. • Researchers state that the estimates (measures of central tendency, frequencies, or ratio estimates) produced from a selected sample are expected to represent the true population estimate within a specific range.

For example, a researcher may state that they are 95% confident that the sample mean represents the true population mean within 10% error. • Typically, researchers indicate that they would like to be at least 95% confident that the sample mean is an estimate of the population mean. Therefore, the researcher is suggesting that 19 out of 20 times the range given by sample mean ± sampling error will include µ.
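The "19 out of 20 times" reading can be checked with a small simulation. The sketch below is illustrative only: the population values (mean 50, standard deviation 10), the sample size, the number of trials, and the function name `coverage_rate` are assumptions and do not come from the slides. It draws repeated samples from a normal population, builds the range sample mean ± 1.96 * s/√n for each one, and counts how often the range contains the population mean.

```python
import random
import statistics


def coverage_rate(pop_mean=50.0, pop_sd=10.0, n=100, trials=2000, z=1.96, seed=1):
    """Fraction of simulated samples whose range mean ± z * s/sqrt(n) contains pop_mean.

    All default values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(pop_mean, pop_sd) for _ in range(n)]
        mean = statistics.fmean(sample)
        se = statistics.stdev(sample) / n ** 0.5  # standard error of the mean
        if mean - z * se <= pop_mean <= mean + z * se:
            hits += 1
    return hits / trials


rate = coverage_rate()
print(f"share of ranges that include the population mean: {rate:.3f}")
```

The printed share should land near 0.95, which is what "95% confident" describes: a property of the procedure over many samples.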

The confidence interval is based on the following relationship between the sample mean and the true population mean or [µ]: lower limit of sample mean < µ < upper limit of sample mean • This sentence is read as: The lower limit of the sample mean is less than the true population estimate which is less than the upper limit of the sample mean.

Unpacking the concept of a 95% C.I. So how does 95% fit into this computation? This exercise is essentially meant to demonstrate to the research community how the set of sample scores is associated with the true set of population scores, so we need some way of relating the sample distribution to the population distribution (that is, how the set of scores for the sample is related to the set of scores for the population).

Unpacking the concept of a 95% C.I. • One way to illustrate such a relationship is to standardize the scores for both the sample distribution and the population distribution. • In statistics, when we want to standardize an estimate we typically relate the estimate to a device called the normal curve.

The Normal Curve • The normal curve is a graphical representation of the standard normal distribution (i.e., the frequency distribution graph of an expected distribution of scores within a "normal population"). By using the normal curve, researchers can describe how closely their sample distribution represents a population distribution.

Unpacking the Normal Curve • Understanding the role of the normal curve is important to inferential statistics. • The normal curve is a graphical presentation of the frequency distribution for a set of standardized (or adjusted) scores. • The standardized scores are ratio scores based on the difference between any score within a set of scores and the measure of central tendency for that set of scores, divided by the standard deviation of that set of scores.

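A sample data set to compute z scores

$x_i$    $(x_i - \bar{x})$       $(x_i - \bar{x})^2$
13     13 - 31.2 = -18.2     331.24
19     19 - 31.2 = -12.2     148.84
30     30 - 31.2 =  -1.2       1.44
43     43 - 31.2 =  11.8     139.24
51     51 - 31.2 =  19.8     392.04
Sum    156        0          1012.8

$\sum x_i = 156$, mean $\bar{x} = 156/5 = 31.2$, $\sum (x_i - \bar{x}) = 0$, $\sum (x_i - \bar{x})^2 = 1012.8$
$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{1012.8}{5 - 1} = 253.2$
$s = \sqrt{253.2} = 15.91$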
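The arithmetic for this data set can be reproduced in a few lines. This is a minimal sketch using only Python's standard library. The variable names are chosen for illustration, and the variance uses the n - 1 denominator, as in the slide.

```python
import statistics

scores = [13, 19, 30, 43, 51]  # the five observations from the slide

n = len(scores)
mean = statistics.mean(scores)                       # sum of scores divided by n
deviations = [x - mean for x in scores]              # (xi - mean) for each score
sum_sq = sum(d ** 2 for d in deviations)             # sum of squared deviations
variance = sum_sq / (n - 1)                          # sample variance, n - 1 denominator
sd = variance ** 0.5                                 # sample standard deviation

print("mean:", round(mean, 2))
print("sum of squared deviations:", round(sum_sq, 2))
print("variance:", round(variance, 2))
print("standard deviation:", round(sd, 2))
```

`statistics.stdev(scores)` returns the same standard deviation directly. The long form is shown only to mirror the columns of the table.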

The set of standard scores computed from the original observations is also called the set of "z" scores and can be computed by the following formula: $z = \frac{x_i - \bar{x}}{s}$, where $s$ is the standard deviation. • Therefore, for any set of numbers we could create a set of z scores or standard scores.

Use the mean and s to compute z, where $z = \frac{x_i - \bar{x}}{s}$

$x_i$    $(x_i - \bar{x})$       $(x_i - \bar{x})^2$    $z = (x_i - \bar{x})/s$
13     13 - 31.2 = -18.2     331.24     (13 - 31.2)/15.91 = -1.14
19     19 - 31.2 = -12.2     148.84     (19 - 31.2)/15.91 = -0.77
30     30 - 31.2 =  -1.2       1.44     (30 - 31.2)/15.91 = -0.08
43     43 - 31.2 =  11.8     139.24     (43 - 31.2)/15.91 =  0.74
51     51 - 31.2 =  19.8     392.04     (51 - 31.2)/15.91 =  1.24
Sum    156        0          1012.8     $\sum z_i = 0$

The range of z scores in this sample is from -1.14 to 1.24.
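As a quick check on the z column, the same standardization can be sketched in Python. The list name `z_scores` is an illustrative choice, and `statistics.stdev` is used for s because it applies the n - 1 denominator.

```python
import statistics

scores = [13, 19, 30, 43, 51]

mean = statistics.mean(scores)
sd = statistics.stdev(scores)  # sample standard deviation (n - 1 denominator)

# z = (xi - mean) / s for every observation
z_scores = [(x - mean) / sd for x in scores]

for x, z in zip(scores, z_scores):
    print(f"x = {x:2d}   z = {z:6.2f}")
```

Because the deviations from the mean sum to zero, the z scores also sum to zero, apart from floating-point rounding.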

.... and likewise, • For any set of z scores a percentile estimate can be attributed to each z score. • This has been shown several times and is commonly known as the Z table of estimates or the Table for the normal curve. • Conversely then for any percentile we could determine a standardized estimate or z score. That is, we could determine the z score for a percent of confidence such as the 95% confidence value.

To read the table of the normal curve we proceed through the following stepwise procedures: i) determine how confident you want to be that the true population mean is captured by the range of the sample mean. For example, 95% confident. ii) divide the selected confidence value by 100 to eliminate the per cent value. 95/100 = 0.95 iii) divide the quotient by 2, because the interval is two-tailed and the table lists the area between the mean and z on one side only, as in the following example using 95%: 0.95/2 = 0.4750

iv) find the value (i.e. 0.4750) in the body of the normal distribution table (also called the z table or table of z scores), move to the left column to identify the prefix of the z score, then move to the top row of the table to find the trailing digit of the z score. The value compiled from the left vertical column combined with the top horizontal row is the z score attributed to the given % confidence value. For example, to identify the z score associated with a 95% confidence interval, work through the following steps with the table: 95/100 = 0.95 0.95/2 = 0.4750

Go to the table and find the number 0.4750:

z      .00   .01   .02   .03   .04   .05   .06     .07   .08   .09
0.0
0.1
0.2
...
1.9                                        0.4750

Once we identify the number within the table, look to the left column to find the prefix of the z score (e.g. 1.9); then look to the horizontal row across the top of the table to find the trailing digit (in this example, .06). When we combine 1.9 with .06 we get a z score of 1.96. The z score for a 95% C.I. is 1.96.
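The table lookup can be cross-checked numerically. The sketch below uses `statistics.NormalDist` from Python's standard library (available from Python 3.8) and is only one way to do it. The variable names are illustrative. It assumes, as the lookup of 0.4750 implies, that the printed table lists the area between the mean and z.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

confidence = 95 / 100        # 0.95
half_area = confidence / 2   # 0.4750, the value searched for in the table

# The table lists the area between the mean and z, so the cumulative
# area to the left of z is 0.5 + 0.4750 = 0.9750.
z = std_normal.inv_cdf(0.5 + half_area)

# Reverse check: the area between the mean and z = 1.96
area_check = std_normal.cdf(1.96) - 0.5

print("z for 95% confidence:", round(z, 2))
print("area between mean and 1.96:", round(area_check, 4))
```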

You may be asked to find the z score associated with the two-tailed alpha coefficient. This is the same as asking you to find the two-tailed z score associated with a 95% confidence level, as you have just done. The alpha coefficient is computed by subtracting the confidence level, expressed as a proportion, from 1, as in the following example.

given: 95% confidence • 95/100 = .95 • alpha coefficient = 1 - .95 = .05 • In this example the formula for the confidence interval may be written as: µ = sample mean ± $z_{\alpha/2}$ * (sampling error)
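The step from a confidence level to alpha and then to the two-tailed z score can be sketched as a small helper. The function name `z_alpha_over_2` is hypothetical, and the 90% and 99% levels are added only for comparison. The slides themselves work with 95%.

```python
from statistics import NormalDist


def z_alpha_over_2(confidence_percent):
    """Two-tailed z score for a confidence level given in per cent (e.g. 95)."""
    alpha = 1 - confidence_percent / 100      # e.g. 1 - .95 = .05
    # alpha is split across the two tails, so alpha/2 lies above the z score
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (90, 95, 99):
    print(f"{level}% confidence -> z = {z_alpha_over_2(level):.3f}")
```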

The basic premise of estimation and confidence intervals: µ = sample mean ± (sampling error) where: • µ refers to the measure of central tendency for the population • sample mean refers to the measure of central tendency in the sample • sampling error refers to the error due to randomness

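Estimates in the sample vs. estimates in the population • In the sample: mean $\bar{x}$ and variance $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$ • In the population: mean $\mu$ and variance $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$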
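The two denominators can be compared directly. In the sketch below the five scores from the earlier data set are reused purely for illustration: treating them as a sample gives the n - 1 form, while treating the same five numbers as if they were a complete population gives the N form. The standard-library functions `statistics.variance` and `statistics.pvariance` implement the two formulas.

```python
import statistics

scores = [13, 19, 30, 43, 51]

# Sample variance: sum of squared deviations divided by n - 1
sample_var = statistics.variance(scores)

# Population variance: sum of squared deviations divided by N
# (only appropriate if the five scores were the entire population)
population_var = statistics.pvariance(scores)

print("divide by n - 1:", round(sample_var, 2))
print("divide by N:    ", round(population_var, 2))
```

The n - 1 version is always the larger of the two, and the gap shrinks as the number of observations grows.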

Our formula for computing the confidence interval includes the z score associated with the 95% confidence interval, as follows: µ = sample mean ± Z95% * (sampling error) • or written in a useful form as: µ = sample mean ± 1.96 * (sampling error)

All that remains in computing confidence intervals is to determine the estimate of the error of the sample selected or the sampling error. • This error is also called the standard error of the mean, and is a measure of “the extent to which the sample means can be expected to vary due to chance”. • In other words, the standard error of the mean is “an estimate of the error associated with the observed mean in this specific sample”, and is due to the sampling characteristics associated with this sample.

The standard error of the mean is computed by the formula: $s.e. = \frac{s}{\sqrt{n}}$ OR IN WORDS: The standard error is equal to the standard deviation of the sample divided by the square root of the number of subjects in the sample.
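As a small illustration of the rule in words, the sketch below computes the standard error for the five-score data set used earlier in these slides. The helper name `standard_error` is hypothetical.

```python
import math
import statistics


def standard_error(sample):
    """Standard deviation of the sample divided by the square root of n."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


scores = [13, 19, 30, 43, 51]
se = standard_error(scores)
print("standard error of the mean:", round(se, 2))
```

With only five subjects the standard error is large relative to the mean. Because n sits under a square root, four times as many subjects would be needed to halve it, all else being equal.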

Example: Compute the confidence interval at 95% for a given sample mean. mean = 58, s = 13 for n = 25 subjects • i) compute s.e. using s.e. = s/√n: s.e. = 13/√25 = 2.6 • ii) the 95% confidence interval for the mean = 58 is: 58 ± [1.96 * 2.6], so the 95% confidence interval is 58 ± 5.1 • Which means that we can be 95% confident that the range 52.9 to 63.1 captures the true population mean µ; that is, ranges built this way are expected to capture µ 19 out of 20 times.
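The worked example can be reproduced from the summary figures alone. This is a minimal sketch. The function name `confidence_interval` and its default z of 1.96 are illustrative choices, not part of the original slides.

```python
import math


def confidence_interval(mean, sd, n, z=1.96):
    """Return (lower, upper) limits of mean ± z * sd / sqrt(n)."""
    se = sd / math.sqrt(n)   # standard error of the mean
    margin = z * se          # half-width of the interval
    return mean - margin, mean + margin


# Figures from the example: mean = 58, s = 13, n = 25
lower, upper = confidence_interval(58, 13, 25)
print(f"95% C.I.: {lower:.1f} to {upper:.1f}")
```

Passing a different `z` (for example, the value for 99% confidence) widens or narrows the range without changing the rest of the calculation.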
