Introduction to Statistics in Psychology: Fundamentals and Terminologies

Statistics • This lecture covers chapters 1 and 2 and sections 3.1-3.2 in Howell • Why study maths in psychology? • “Mathematics has the advantage of teaching you the habit of thinking without passion. You learn to use your mind primarily upon material where passion can’t come in, and having trained it that way you can then use it passionately upon matters about which you feel passionately. Then you’re much more likely to come to true conclusions” - Bertrand Russell

Statistical terminology • 2 types of statistics: • Descriptive - describe a sample or population • Inferential - draw inferences about relationships between samples and populations • Samples and populations: • Population: complete set of events we are investigating (eg all IQ scores) • Sample: subset of a population (IQ scores of 10 people)

Terminology 2 • Statistics and parameters: • Statistic: a number which describes a sample (abbreviated with a Latin letter, eg. s) • Parameter: a number which describes a population (abbreviated with a Greek letter, eg. σ) • Variable: • a property of an object/event that is measured
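A minimal Python sketch of the statistic/parameter distinction. The scores are invented for illustration, and the names `population`, `sample`, `s` and `sigma` are assumptions, not taken from Howell. A small list is treated as the whole population so that the parameter can be computed directly, which is rarely possible in real research.

```python
import random
import statistics

# Hypothetical "population" of 20 IQ scores (made-up numbers for illustration).
population = [85, 90, 92, 95, 97, 98, 100, 100, 101, 102,
              103, 104, 105, 106, 108, 110, 112, 115, 118, 125]

# Draw a sample of 10 scores; the fixed seed only makes the run repeatable.
rng = random.Random(0)
sample = rng.sample(population, 10)

# Parameter: describes the population (Greek letter, sigma).
sigma = statistics.pstdev(population)

# Statistic: describes the sample (Latin letter, s).
s = statistics.stdev(sample)

print(f"sigma (population) = {sigma:.2f}")
print(f"s (sample)         = {s:.2f}")
```

The two numbers will usually differ a little, because the sample holds only part of the population.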

Variables • Statistics allows one to look at variables • behaviour • relationships between variables • Types of variables: • Discrete variables: can only take on certain values, eg: • 1 2 3 4 5 …. (only whole numbers) • 1.5 2 2.5 3 3.5 4 4.5…. (whole numbers and halves) • Examples: gender, number of children, sexual preference

Variables (2) • Continuous variables • Can take on any value • (there exists a value between any two values) • eg: 1, 1.1, 1.11, 1.111, 1.1111, 1.11111….. • Examples: length, age, IQ, dosage of Valium • For stats, all variables must contain only numbers • convert “word” values into numbers • eg: male/female becomes 100/101
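Converting "word" values into numbers can be sketched in Python as follows. The codes 100 and 101 come from the example above; the list of responses is hypothetical.

```python
# Codes taken from the example above; any two distinct numbers would do.
codes = {"male": 100, "female": 101}

# Hypothetical raw responses recorded as words.
responses = ["female", "male", "male", "female"]

# Replace each word with its numeric code.
coded = [codes[r] for r in responses]
print(coded)  # [101, 100, 100, 101]
```

The numbers are only labels here, so arithmetic on them (such as an average) would not mean anything.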

Scales of measurement • Not all statistical techniques can be applied to all types of variable • which is more - male or female? • By looking at the property a variable represents, and how that property was measured (its scale), we can decide if a particular technique is appropriate

Nominal scale • Simply labels items • 723 = male, 742 = female, 857 = Prince • Differences between numbers mean nothing • Order of numbers means nothing • Often expressed as words rather than numbers • Cannot do very much stats with nominal scales

Ordinal Scale • Labels items, puts them in order • Eg expense: 1 = Woolworths, 2 = Pick n Pay, 3 = Shoprite • Differences between numbers mean nothing • eg. 4 is not twice as bad as 2 • Order is important • eg. 1 is the best; 5 is worse than 1-4 but better than 6 and below, etc. • Useful in ranking items (highest to lowest) when specific values are not important

Interval Scale • Order is important, as is the difference between points • eg. Degrees Celsius: 10 °C is the same distance from 0 °C as 40 °C is from 50 °C • BUT: it has no absolute zero, so we cannot speak about multiplication • eg. “40 is twice as much as 20” - WRONG! • Most Likert-type items are of this scale
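One way to see why multiplication fails on an interval scale is to re-express the same temperatures in Kelvin, which does have an absolute zero. This is a small sketch; the helper name `celsius_to_kelvin` is an assumption made for illustration.

```python
import math

def celsius_to_kelvin(c):
    """Shift a Celsius reading onto the Kelvin scale (zero = absolute zero)."""
    return c + 273.15

# Differences are meaningful: 0 -> 10 is the same step as 40 -> 50.
diff_low = 10 - 0
diff_high = 50 - 40
print(diff_low == diff_high)  # True

# The same differences survive the change of scale.
kelvin_diff = celsius_to_kelvin(10) - celsius_to_kelvin(0)
print(math.isclose(kelvin_diff, 10))  # True

# Ratios do not survive: "40 is twice 20" depends on where zero sits.
ratio_celsius = 40 / 20
ratio_kelvin = celsius_to_kelvin(40) / celsius_to_kelvin(20)
print(ratio_celsius)           # 2.0
print(round(ratio_kelvin, 3))  # 1.068
```

Because the ratio changes when the zero point moves, "twice as hot" is not a property of the temperatures themselves.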

Ratio Scale • The most versatile: has differences and multiplication • 40 is twice as much as 20, AND 40-30 = 110-100 • It is like an interval scale, but has an absolute zero. • Very few in psychology: reaction time is one example (IQ is usually treated as interval, as it has no absolute zero)

Notes on the scales • Discrete variables may be on the nominal or ordinal scales only • Continuous variables can be on any, mostly interval & ratio • Difficult to decide what scale a variable belongs to • “Absolute zero” is contentious • Making a wrong decision can lead to silly stats - the average family has 2.3 children!!

Frequency • A descriptive statistic • Applies to all scales of measurement • Asks: How often did particular things come up? • Mostly a matter of counting!

Expressing frequency • Work with four varieties of frequency • Frequency: how often did this observation occur? • Eg. How many males in this sample? • Cumulative frequency: how often has this score, or scores less than this score, occurred? • Eg. How many people scored 25 marks or less for the test?

Expressing frequency • Percentage frequency: frequency expressed as a percentage of all observations • Eg. 52% of all Capetonians are male • Percentage cumulative frequency: cumulative frequency expressed as a percentage of all observations • Eg. 30% of the class failed the test

Frequency tables • All 4 types of frequency are summarised on a frequency table, which has the columns: • Value F Cum. F %F % Cum F.

Making a freq table - discrete var • Given a sample of x, a discrete variable which ranges from 1-6: • 3 3 5 2 4 3 3 5 6 2 4 • Start the table by putting in the values: • Value F Cum F %F % Cum F • 1 • 2 • 3 • 4 • 5 • 6

Working out F • Add in the F - count how often each value occurs, add it in • Value F Cum F %F % Cum F • 1 0 • 2 2 • 3 4 • 4 2 • 5 2 • 6 1
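The counting step can be sketched in Python with `collections.Counter`. The variable names are assumptions; the data are the sample of x given above.

```python
from collections import Counter

x = [3, 3, 5, 2, 4, 3, 3, 5, 6, 2, 4]

# Counter tallies how often each value occurs.
counts = Counter(x)

# List every possible value (1-6), including ones that never occurred;
# a Counter returns 0 for a value it has not seen.
freq = {value: counts[value] for value in range(1, 7)}
print(freq)  # {1: 0, 2: 2, 3: 4, 4: 2, 5: 2, 6: 1}
```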

Working out Cum. F • Add the F for this value to the Cum.F score for the previous value • Value F Cum F %F % Cum F • 1 0 0 • 2 2 2 • 3 4 6 • 4 2 8 • 5 2 10 • 6 1 11
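The running-total rule is what `itertools.accumulate` computes, so the Cum. F column can be sketched as below. The list `f` holds the F column for values 1 to 6.

```python
from itertools import accumulate

# F column for the values 1, 2, 3, 4, 5, 6.
f = [0, 2, 4, 2, 2, 1]

# Each entry is this value's F plus the Cum. F of the previous value.
cum_f = list(accumulate(f))
print(cum_f)  # [0, 2, 6, 8, 10, 11]
```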

Working out %F • Count the total number of observations, n (11) • For each value, divide F by n, multiply by 100 • Value F Cum F %F % Cum F • 1 0 0 0% • 2 2 2 18% • 3 4 6 36% • 4 2 8 18% • 5 2 10 18% • 6 1 11 9%

Working out % Cum. F • Count the total number of observations, n (11) • For each value, divide Cum. F by n, multiply by 100 • Value F Cum F %F % Cum F • 1 0 0 0% 0% • 2 2 2 18% 18% • 3 4 6 36% 55% • 4 2 8 18% 73% • 5 2 10 18% 91% • 6 1 11 9% 100%
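Putting the four columns together, a minimal sketch of the whole procedure might look like this. The function name `frequency_table` is an assumption, percentages are rounded to the nearest whole percent, and every observation is assumed to be one of the listed values.

```python
from collections import Counter

def frequency_table(data, values):
    """Return rows of (value, F, Cum F, %F, % Cum F) for the listed values."""
    counts = Counter(data)
    n = len(data)  # total number of observations
    rows = []
    cum = 0
    for value in values:
        f = counts[value]   # F: how often this value occurred
        cum += f            # Cum F: running total so far
        rows.append((value, f, cum, round(100 * f / n), round(100 * cum / n)))
    return rows

x = [3, 3, 5, 2, 4, 3, 3, 5, 6, 2, 4]
table = frequency_table(x, range(1, 7))

print(f"{'Value':>5} {'F':>3} {'Cum F':>6} {'%F':>4} {'% Cum F':>8}")
for value, f, cum, pct, cum_pct in table:
    print(f"{value:>5} {f:>3} {cum:>6} {pct:>3}% {cum_pct:>7}%")
```

Note that % Cum F is computed from Cum F directly rather than by adding up the rounded %F column, which avoids accumulating rounding error.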

Things to remember • The Cum. F. for the last value must be the same as n • The % Cum. F. for the last value must be 100% • Cum. F. and % Cum. F. never get smaller as you go down (they stay the same only where F is 0)
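These rules work as a quick self-check on a finished table. A small sketch, with the function name `check_cumulative` assumed for illustration:

```python
def check_cumulative(cum_f, pct_cum_f, n):
    """Return True if the cumulative columns pass the three sanity checks."""
    ends_at_n = cum_f[-1] == n              # last Cum. F equals n
    ends_at_100 = pct_cum_f[-1] == 100      # last % Cum. F equals 100%
    # No entry may be smaller than the one before it.
    never_decreases = all(a <= b for a, b in zip(cum_f, cum_f[1:]))
    return ends_at_n and ends_at_100 and never_decreases

cum_f = [0, 2, 6, 8, 10, 11]
pct_cum_f = [round(100 * c / 11) for c in cum_f]

ok = check_cumulative(cum_f, pct_cum_f, 11)
bad = check_cumulative([0, 2, 6, 5, 10, 11], pct_cum_f, 11)  # 5 after 6 is an error
print(ok)   # True
print(bad)  # False
```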

Distribution of a variable • The frequency table tells us how x is distributed • The proportion of high and low scores; what scores come up most often; how “wide” or “narrow” the data are • Distributions tell us what we can expect from a variable - which scores are likely and which are unlikely?

Example: distribution of x • Look at the freq table: • Value F Cum F %F % Cum F • 1 0 0 0% 0% • 2 2 2 18% 18% • 3 4 6 36% 55% • 4 2 8 18% 73% • 5 2 10 18% 91% • 6 1 11 9% 100% • Which values are most likely to occur again? (3, followed by 2, 4 and 5) • The data are widely spread (from 2 all the way to 6)
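The two questions asked of the table (which values are most likely, and how widely the data are spread) can be read off in Python as a rough sketch. Using relative frequency as a guide to what is "likely" is an assumption that the sample resembles future observations.

```python
from collections import Counter

x = [3, 3, 5, 2, 4, 3, 3, 5, 6, 2, 4]
counts = Counter(x)
n = len(x)

# Relative frequency of each observed value, most frequent first.
for value, f in counts.most_common():
    print(value, f"{f / n:.2f}")

# The single most frequent value and the lowest/highest observed scores.
most_likely = counts.most_common(1)[0][0]
spread = (min(x), max(x))
print(most_likely)  # 3
print(spread)       # (2, 6)
```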

Drawing a picture of x • We can draw a histogram of x to see things better • Shows the distribution visually - handy for understanding what is happening

Drawing histograms • Very simple: Use the F column from the table • For each value, draw (in scale) a bar of the height represented by F • Do this for all values • Remember: label the X and Y axes (X: variable name; Y: “Frequency”)
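As a rough sketch, the same recipe can produce a text-only histogram in Python. The function name `text_histogram` is an assumption, and the bars run sideways (one `#` per observation) rather than upwards, purely so the picture fits in plain text.

```python
def text_histogram(freq, variable_name="x"):
    """Return the lines of a sideways histogram: one '#' per observation."""
    lines = [f"{variable_name:>3} | Frequency"]   # axis labels
    for value, f in freq.items():
        lines.append(f"{value:>3} | {'#' * f}")   # bar length = F
    return lines

# F column from the frequency table for x.
freq = {1: 0, 2: 2, 3: 4, 4: 2, 5: 2, 6: 1}

lines = text_histogram(freq)
print("\n".join(lines))
```

The longest bar (value 3) stands out immediately, which is the point of drawing the picture.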

Frequency of continuous variables • Problem: cannot write all the values of a continuous variable: • value: 1, 1.1, 1.111, 1.1111, 1.111111…. • Infinitely many! • This problem can be overcome by using data buckets

Buckets • A bucket is a range of values which you group together, eg [2-3], [3-4]…. • Here, the first bucket holds all values greater than or equal to 2 and less than 3, the second all values greater than or equal to 3 and less than 4, etc. • Each value in the dataset is placed into exactly one bucket • Once buckets are created, you make a frequency table and histogram in the normal way

Bucket example • x is a continuous variable, from which a sample is drawn: • 2.2, 3.5, 3.75, 2.34, 5.33, 3.2, 3.51 • Use the following buckets: • [0 - 1.5], [1.5 - 3], [3 - 4.5], [4.5 - 6]

Bucket example: F • Bucket F • [0-1.5] 0 • [1.5-3] 2 • [3-4.5] 4 • [4.5-6] 1 • CF, %F, and %CF are worked out as before. A histogram is drawn as before, but labelling the X axis with the buckets.
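The bucket counts above can be reproduced with a short Python sketch. The function name `bucket_counts` is an assumption. Each bucket includes its lower edge and excludes its upper edge, and any value outside all buckets is ignored.

```python
def bucket_counts(data, edges):
    """Count how many values fall in each bucket [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for v in data:
        for i in range(len(edges) - 1):
            # Lower edge included, upper edge excluded.
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    return counts

x = [2.2, 3.5, 3.75, 2.34, 5.33, 3.2, 3.51]
edges = [0, 1.5, 3, 4.5, 6]   # buckets: 0-1.5, 1.5-3, 3-4.5, 4.5-6

f = bucket_counts(x, edges)
print(f)  # [0, 2, 4, 1]
```

Because each value lands in exactly one bucket, the counts add up to n (7 here), so the remaining columns follow as for a discrete variable.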
