

  1. Psych 5500/6500 Statistics and Parameters Fall, 2008

  2. Statistics Two uses of the term: • ‘Statistics’ is a branch of mathematics. • ‘Statistics’ are measures that arise from your sample. The mean, variance, and standard deviation of your sample are all ‘statistics’. Statistics are usually symbolized with Roman letters.

  3. Parameters ‘Parameters’ are measures that arise from the population from which you sampled. The mean, variance, and standard deviation of the population are ‘parameters’. Parameters are usually symbolized with Greek letters.

  4. Estimating Parameters While it is good to be able to describe your sample (using statistics), the goal of research is to understand the population from which the sample was drawn (i.e., to know the values of parameters). We usually cannot calculate the parameters directly, as that would require measuring everyone in the population. Thus we need tools to estimate parameters based upon our sample data.

  5. Desired Qualities of Estimators • Unbiased • Consistent • Relatively Efficient

  6. Unbiased Any estimate of a population parameter based upon sample data is unlikely to be exactly correct. If several samples are drawn then the estimates of the population parameter are likely to vary across the samples. A method of estimating a parameter is called unbiased if the expected value of the estimate equals the parameter being estimated. The expected value of the estimate is the mean value that would be obtained if an infinite number of estimates were obtained.
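
A quick simulation can make ‘expected value of the estimate’ concrete. This sketch is not from the original slides; it assumes numpy and an arbitrarily chosen normal population with μ = 100:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, N = 100, 15, 10            # hypothetical population and sample size

# Draw many samples and estimate mu from each with the sample mean.
estimates = rng.normal(mu, sigma, size=(100_000, N)).mean(axis=1)

# Unbiased: the mean of the estimates converges on the parameter itself.
print(estimates.mean())               # ≈ 100.0
```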

  7. Consistent A method for estimating a parameter is called ‘consistent’ if the probability of the estimate being close to the value of the parameter increases as the sample size increases.

  8. Relatively Efficient A method of estimating a parameter is more ‘efficient’ than other methods if the variance of its estimates is less than the other methods. In other words, for any given N, a method is more efficient if its estimates are more closely clustered around the true value of the parameter than the estimates of the other method.
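
To illustrate, here is a hypothetical comparison (not from the slides) of two estimators of μ for a symmetric population: the sample mean and the sample median. Both are unbiased here, but the mean is the more efficient of the two:

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(100, 15, size=(100_000, 10))

means   = samples.mean(axis=1)
medians = np.median(samples, axis=1)

# Same N, same parameter being estimated; the mean's estimates cluster
# more tightly around mu = 100, so the mean is the more efficient estimator.
print(means.var())    # ≈ 22.5
print(medians.var())  # ≈ 35, noticeably larger
```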

  9. The Mean Statistic: the mean of the sample is a statistic; the formula for computing it is: Ȳ = ΣY / N. Parameter: the mean of the population is a parameter; its symbol is μ, and the formula for computing it is: μ = ΣY / Npop (summing over every score in the population).

  10. Estimating μ The mean of the sample is an unbiased, consistent, and efficient estimate of the mean of the population: est. μ = Ȳ. Note: be sure to write ‘est. μ’ to indicate that this is an estimate of μ.

  11. Improving our estimate Our estimate of μ has a higher probability of being close to correct if: • We increase N (remember ‘consistency’). • We decrease the variance of the variable we are studying.
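
Both effects can be seen directly by simulating how tightly the sample means cluster around μ. A sketch, again assuming numpy and made-up population values:

```python
import numpy as np

rng = np.random.default_rng(0)

def spread_of_estimates(N, sigma, mu=100, reps=100_000):
    """Standard deviation of est. mu (the sample mean) across many samples."""
    return rng.normal(mu, sigma, size=(reps, N)).mean(axis=1).std()

print(spread_of_estimates(N=10, sigma=15))  # baseline
print(spread_of_estimates(N=40, sigma=15))  # larger N: estimates cluster tighter
print(spread_of_estimates(N=10, sigma=5))   # smaller variance: tighter as well
```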

  12. The Variance Statistic: the variance of the sample is a statistic; the formula for computing it is: S² = Σ(Y − Ȳ)² / N = SS/N. Parameter: the variance of the population is a parameter; its symbol is σ², and the formula for computing it is: σ² = Σ(Y − μ)² / Npop.

  13. Estimating σ2 The variance of the sample is a biased estimate of the variance of the population, as the expected value of the sample variance is less than the variance of the population (in other words, the variance of the sample is usually less than the variance of the population). See the handout on why this is so.

  14. Unbiased Estimate of σ2 By dividing by (N−1) rather than by N we obtain an unbiased estimate of the population variance: est. σ² = SS / (N−1) = Σ(Y − Ȳ)² / (N−1).
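
A simulation (not part of the slides; numpy’s ddof argument selects the divisor) shows the bias of SS/N and how dividing by N−1 removes it. The population is a made-up normal with σ² = 225:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10
samples = rng.normal(100, 15, size=(100_000, N))   # sigma^2 = 225

s2     = samples.var(axis=1, ddof=0)   # SS/N     : variance of the sample
est_s2 = samples.var(axis=1, ddof=1)   # SS/(N-1) : est. sigma^2

print(s2.mean())      # ≈ 202.5 = 225 * (N-1)/N, biased low
print(est_s2.mean())  # ≈ 225.0, unbiased
```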

  15. The Standard Deviation Statistic: the standard deviation of the sample is a statistic; its formula is: S = √(S²) = √(SS/N). Parameter: the standard deviation of the population is a parameter; its symbol is σ, and the formula for computing it is: σ = √(σ²).

  16. Estimate of σ Taking the square root of the unbiased estimate of the variance gives our estimate of the standard deviation: est. σ = √(est. σ²) = √(SS/(N−1)). Unlike est. σ², however, this is not an unbiased estimate of σ.

  17. The problem has to do with the distribution of the estimates around the true value of the standard deviation: taking the square root affects estimates that are too high differently than it affects estimates that are too low. Example: say σ² = 81, and so σ = 9. Sample One: est. σ² = 70 (11 below σ²). Sample Two: est. σ² = 92 (11 above σ²). But: Sample One: √70 = 8.36 (.64 below σ). Sample Two: √92 = 9.59 (.59 above σ).

  18. What this Means Despite that bias we will still use est. σ = √(SS/(N−1)). The bias of the estimate of σ is kept in the back of our minds but is not important, because the context in which we will use ‘est. σ’ will take the bias into account.
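
The size of that bias can be checked by simulation; in this sketch (mine, not the lecture’s), with σ = 9 and N = 10 as in the example above, est. σ averages a bit below 9:

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(0, 9, size=(100_000, 10))   # sigma = 9, N = 10

est_sd = samples.std(axis=1, ddof=1)             # sqrt(SS/(N-1))
print(est_sd.mean())                             # ≈ 8.75, slightly below sigma
```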

  19. Formulas Summary: S² = SS/N and S = √(S²) describe the sample; est. σ² = SS/(N−1) and est. σ = √(est. σ²) estimate the population; where SS = Σ(Y − Ȳ)².

  20. Useful Formulas for ‘Going Back and Forth’ est. σ² = S² · (N / (N−1)), and in the other direction, S² = est. σ² · ((N−1) / N).
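
As a check, these conversions can be applied to the sample that appears on slide 26 below (a sketch, assuming numpy):

```python
import numpy as np

Y = np.array([88, 85, 92, 90, 79, 84, 93, 72, 84, 99])   # sample from slide 26
N = len(Y)

S2 = Y.var(ddof=0)               # variance of the sample: SS/N = 52.44
est_var = S2 * N / (N - 1)       # 'going forth': est. sigma^2 ≈ 58.27
print(S2, est_var, est_var * (N - 1) / N)   # last value recovers S^2
```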

  21. Other Texts and Software (1) Some texts use ‘S²’ to represent the variance of the sample (like I do) but use ‘s²’ (lower case ‘s’) rather than ‘est. σ²’ to refer to the estimate of the population variance. They then use ‘S’ to represent the standard deviation of the sample and ‘s’ rather than ‘est. σ’ to refer to the estimate of the population standard deviation.

  22. Other Texts and Software (2) Many texts use the term ‘sample variance’ to refer to the estimate of the population variance based upon the sample (est. σ²), rather than to the actual variance of the sample, and they have no term for, and never refer to, the actual variance of the sample. I prefer to use the term ‘sample variance’ for the actual variance of the sample. The best way to tell which variance is being referred to in a context outside this class is to check whether the formula uses N or N−1 in the denominator.

  23. Other Texts and Software (3) What SPSS calls ‘Variance’ is SS/(N−1), the estimate of the population variance based upon the sample data (est. σ²). What it calls ‘Standard Deviation’ is the square root of that (est. σ). SPSS doesn’t tell you that, and its ‘Help’ menu doesn’t either. This is one of the challenges of using statistical software: determining exactly what it is giving you. In this case I found out what it was by computing S² and est. σ² with a calculator and then seeing which value SPSS gave me for the variance of the data.
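
The same ambiguity exists in other software, so the same check is worth running there. In numpy, for example, the divisor is explicit (the ddof argument), which makes the two quantities easy to see side by side; a sketch using the slide-26 sample:

```python
import numpy as np

Y = np.array([88, 85, 92, 90, 79, 84, 93, 72, 84, 99])

print(np.var(Y))           # 52.44  -> S^2, SS/N (numpy's default)
print(np.var(Y, ddof=1))   # ≈ 58.27 -> est. sigma^2, SS/(N-1): what SPSS calls 'Variance'
```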

  24. Descriptive and Inferential Statistics Descriptive statistics are those that describe the sample: Ȳ, S², and S. Inferential statistics are those that make inferences about the population; they ‘arise from the sample’ but are used to estimate the values of the parameters: est. μ, est. σ², and est. σ.

  25. Confidence Intervals Making an estimate of a parameter does not tell us how far off that estimate might be; we simply know the estimating method is unbiased (i.e., across samples the mean of the estimates equals the value of the parameter). It is therefore useful to be able to generate a range of possible values of the parameter.

  26. Confidence Intervals of the Mean Let’s say our sample is as follows: Y = 88, 85, 92, 90, 79, 84, 93, 72, 84, 99. The mean of this sample, Ȳ = 86.6, is our single best estimate of μ, but it is unlikely to be exactly correct. It is also possible to generate ‘confidence intervals’ concerning μ, which shed light on how far off that estimate might be. We will look at how to compute these in a later lecture; here we will take a look at what they are.

  27. Confidence Intervals Y = 88, 85, 92, 90, 79, 84, 93, 72, 84, 99. est. μ = 86.6 (this is called a ‘point estimate’). 95% confidence interval: 81.14 ≤ μ ≤ 92.06. This is the interval that we are 95% confident contains the true value of μ. 99% confidence interval: 78.76 ≤ μ ≤ 94.44. This is the interval that we are 99% confident contains the true value of μ.
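
The computation itself comes in a later lecture; as a preview, here is a sketch (not the lecture’s presentation) of the usual t-based interval for μ, using scipy, which reproduces the numbers above:

```python
import numpy as np
from scipy import stats

Y = np.array([88, 85, 92, 90, 79, 84, 93, 72, 84, 99])
N = len(Y)

est_mu = Y.mean()                    # 86.6, the point estimate
se = Y.std(ddof=1) / np.sqrt(N)      # est. sigma / sqrt(N)

for conf in (0.95, 0.99):
    t_crit = stats.t.ppf(1 - (1 - conf) / 2, df=N - 1)
    print(f"{conf:.0%}: {est_mu - t_crit * se:.2f} <= mu <= {est_mu + t_crit * se:.2f}")
# 95%: 81.14 <= mu <= 92.06
# 99%: 78.76 <= mu <= 94.44
```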

  28. Understanding Confidence Intervals 95% confidence interval: 81.14 ≤ μ ≤ 92.06. 99% confidence interval: 78.76 ≤ μ ≤ 94.44. • Note that the 99% confidence interval is larger than the 95% interval. To be more confident that the interval contains the true value of μ we need to make the interval larger.

  29. Understanding Confidence Intervals • Confidence intervals get narrower (which is good as it gives us more precision in our estimate) as N increases or variance decreases.

  30. Effect of increasing N. To demonstrate this I’ll simply repeat each score in the sample twice (to simulate doubling N while keeping the variance and the mean of the sample the same): Y = 88, 85, 92, 90, 79, 84, 93, 72, 84, 99, 88, 85, 92, 90, 79, 84, 93, 72, 84, 99. 95% confidence interval when N=20: 83.12 ≤ μ ≤ 90.08. Compare to the 95% confidence interval when N=10: 81.14 ≤ μ ≤ 92.06. Greater N led to a narrower (more precise) confidence interval.

  31. Effect of decreasing variance. To demonstrate this I’ve gone back to an N of 10 but have decreased the variance (without changing the mean): Y = 87, 86, 91, 89, 80, 85, 93, 78, 86, 91. 95% confidence interval when S² = 20.64: 83.17 ≤ μ ≤ 90.03. Compare to the 95% confidence interval when S² = 52.44: 81.14 ≤ μ ≤ 92.06. Less variance led to a narrower (more precise) confidence interval.
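
Both demonstrations can be reproduced with the same t-based sketch as above (again assuming scipy; not part of the original slides):

```python
import numpy as np
from scipy import stats

def ci95(Y):
    """95% t-based confidence interval for mu, rounded to two places."""
    Y = np.asarray(Y, dtype=float)
    N = len(Y)
    half = stats.t.ppf(0.975, df=N - 1) * Y.std(ddof=1) / np.sqrt(N)
    return round(Y.mean() - half, 2), round(Y.mean() + half, 2)

base = [88, 85, 92, 90, 79, 84, 93, 72, 84, 99]
print(ci95(base))       # (81.14, 92.06)  N = 10
print(ci95(base * 2))   # (83.12, 90.08)  N = 20: narrower
print(ci95([87, 86, 91, 89, 80, 85, 93, 78, 86, 91]))  # (83.17, 90.03) less variance
```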

  32. Understanding Confidence Intervals • a) One common mistake is to say that if our 95% confidence interval is 47 ≤ μ ≤ 53, then 95% of our sample means will fall in that range. The confidence interval, however, is about the possible values of μ, not about the possible values of the sample mean. • b) Another common mistake is to say that there is a 95% chance that μ is between 47 and 53. What is correct is to say that the formula for computing the confidence interval will produce an interval that contains the true value of μ 95% of the time. See the supplemental handout.
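
Point (b) can be demonstrated by simulation: draw many samples from a known population, compute each sample’s interval, and count how often the interval contains μ. A sketch (mine, with an arbitrary normal population):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mu, sigma, N = 100, 15, 10
t_crit = stats.t.ppf(0.975, df=N - 1)

hits = 0
for _ in range(10_000):
    Y = rng.normal(mu, sigma, N)
    half = t_crit * Y.std(ddof=1) / np.sqrt(N)
    hits += (Y.mean() - half <= mu <= Y.mean() + half)   # interval contains mu?

print(hits / 10_000)   # ≈ 0.95: the procedure captures mu about 95% of the time
```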

  33. Other Confidence Intervals Confidence intervals are available for other parameters as well, including the variance and the standard deviation.
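
For instance, a common interval for σ² uses the chi-square distribution. A sketch (not covered in these slides; assumes a normal population and scipy), using the slide-26 sample:

```python
import numpy as np
from scipy import stats

Y = np.array([88, 85, 92, 90, 79, 84, 93, 72, 84, 99])
N = len(Y)
est_var = Y.var(ddof=1)   # est. sigma^2

# 95% interval for sigma^2; taking square roots gives an interval for sigma.
lo = (N - 1) * est_var / stats.chi2.ppf(0.975, df=N - 1)
hi = (N - 1) * est_var / stats.chi2.ppf(0.025, df=N - 1)
print(lo, hi)
```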
