Learning outcomes

Introduction to Hypothesis Testing, Dr Jenny Freeman, Mathematics & Statistics Help, University of Sheffield

Learning outcomes By the end of the session you should: • Understand what is meant by a probability distribution • Understand the terminology needed for basic hypothesis testing • Understand the difference between a statistically significant difference and a meaningful difference

Download the slides from the MASH website MASH > Resources > Statistics Resources > Workshop materials

Frequency distribution: birthweight • A histogram is a frequency distribution • Histograms are commonly used to look at the spread and shape of the data • Most babies are in the middle with fewer babies at the extremes • The % of babies in the sample between 3 and 4kg = 60% • 15% of babies were above 4kg • Would you expect the same results with a different sample? These data look like they are normally distributed…

Probability distributions example: Normal distribution Probability curve: • a very smooth histogram used to estimate probabilities/ percentages in the population • It describes the theoretical distribution of values, for a population with the same mean and standard deviation as the sample data • Sample data can be used to estimate population probabilities/ percentages • For the normal distribution, if we have the sample mean and standard deviation, the population probability curve and associated probabilities can be estimated

Probability distributions example: Normal distribution • The Normal distribution is: • Bell shaped and symmetrical about the mean • Completely described by the mean and standard deviation (i.e. if you know these two quantities, you can draw the entire curve) • Sometimes called the Gaussian distribution after the German mathematician Carl Friedrich Gauss (1777 to 1855) • The mean and the median are the SAME for normally distributed data

Estimating probabilities • We can think of this theoretical curve as representing the Normal probability distribution • The total area under the curve = 1. Can you think why? • It can be used to estimate the probability of individuals having a particular range of values

Estimating probabilities The shaded area represents the probability (p) of obtaining a value greater than a; i.e. P(X > a) = p For example, we could look at P(having a birthweight greater than 4kg)

Using tables for probabilities X = birth weight For ‘greater than’ probabilities, the probabilities get smaller as x increases What if we want to calculate probability of a baby weighing more than 4.2kg?

Using tables for probabilities • Probabilities tabulated for distribution with mean = 3.4, SD = 0.57 What is the probability that a baby weighs more than 4.2 kgs?
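The table lookup can also be mirrored in code. This is a minimal sketch, assuming birthweight follows a Normal distribution with the tabulated mean and SD, and using Python's standard-library `statistics.NormalDist`. The function name `prob_greater_than` is illustrative and does not come from the slides.

```python
from statistics import NormalDist

# Assumed model: birthweight ~ Normal(mean = 3.4 kg, SD = 0.57 kg), as on the slide
birthweight = NormalDist(mu=3.4, sigma=0.57)

def prob_greater_than(dist, a):
    """Right-tail probability P(X > a) = 1 - P(X <= a)."""
    return 1 - dist.cdf(a)

p = prob_greater_than(birthweight, 4.2)
print(f"P(birthweight > 4.2 kg) = {p:.3f}")
```

The result should agree with the tabulated value up to the rounding used in the table.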

Estimating probabilities • The shaded area is the probability of getting a value between a and b: P(a < X < b) • Given that probability tables usually show P(X > a), how might we work out P(a < X < b)?
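One way to approach this is to subtract two ‘greater than’ probabilities: P(a < X < b) = P(X > a) - P(X > b). A minimal sketch, assuming the birthweight distribution used elsewhere in the session (mean 3.4kg, SD 0.57kg); the helper names `right_tail` and `prob_between` are illustrative only.

```python
from statistics import NormalDist

# Assumed model: birthweight ~ Normal(mean = 3.4 kg, SD = 0.57 kg)
birthweight = NormalDist(mu=3.4, sigma=0.57)

def right_tail(dist, a):
    """P(X > a), the quantity usually shown in one-tailed tables."""
    return 1 - dist.cdf(a)

def prob_between(dist, a, b):
    """P(a < X < b) = P(X > a) - P(X > b), for a < b."""
    return right_tail(dist, a) - right_tail(dist, b)

p_3_to_4 = prob_between(birthweight, 3.0, 4.0)
print(f"P(3 kg < birthweight < 4 kg) = {p_3_to_4:.2f}")
```

The values 3kg and 4kg are chosen to match the range quoted for the histogram, so the estimate can be compared with the sample percentage.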

Estimating probabilities • Note that for continuous probability density functions, we estimate the probability for an interval (based on area), NOT the probability for a single value • This is because as the interval gets smaller, the area gets smaller such that for a single value, the area is zero and thus the exact probability for a single value is 0 See: http://davidmlane.com/hyperstat/z_table.html
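The shrinking-interval argument can be illustrated numerically. A small sketch, assuming the birthweight distribution from the earlier slides (mean 3.4kg, SD 0.57kg); the interval centre of 4.0kg and the list of half-widths are arbitrary choices for illustration.

```python
from statistics import NormalDist

# Assumed model: birthweight ~ Normal(mean = 3.4 kg, SD = 0.57 kg)
birthweight = NormalDist(mu=3.4, sigma=0.57)

def prob_within(dist, centre, half_width):
    """P(centre - half_width < X < centre + half_width), i.e. an area under the curve."""
    return dist.cdf(centre + half_width) - dist.cdf(centre - half_width)

half_widths = [0.5, 0.1, 0.01, 0.001, 0.0]  # progressively narrower intervals
probs = [prob_within(birthweight, 4.0, h) for h in half_widths]

for h, p in zip(half_widths, probs):
    print(f"4.0 ± {h} kg: probability = {p:.6f}")
```

As the interval narrows the area falls towards zero, and for a half-width of 0 (a single value) it is exactly zero.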

Exercise 1: Normal probabilities X = Birthweight; Mean = 3.4kg, SD = 0.57kg What’s the probability of a baby weighing: • More than 4.5kg • More than 2.3kg

Exercise 1: Normal probabilities X = Birthweight; Mean = 3.4kg, SD = 0.57kg What’s the probability of a baby weighing: • Less than or equal to 2.3kg • Between 2.3kg & 4.5kg

Key properties of normal distribution • Values are often discussed as being a number of standard deviations from the mean • Approximately 68% of data lie within 1 standard deviation above and below the mean

Key properties of normal distribution • The middle 95% is often used to describe ‘most people’ • For normally distributed data 95% of data lie within 1.96 standard deviations above and below the mean (Sometimes rounded so approximately 95% of data lie within 2 standard deviations of the mean) • Limits are calculated as: mean ± 1.96 × SD
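These coverage figures can be checked against the standard normal distribution. A minimal sketch using Python's standard-library `statistics.NormalDist`; the function name `coverage` is illustrative.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1

def coverage(k):
    """Proportion of a Normal distribution lying within k SDs of the mean."""
    return z.cdf(k) - z.cdf(-k)

# Multiplier that leaves 2.5% in each tail, i.e. 95% in the middle
multiplier_95 = z.inv_cdf(0.975)

for k in (1, 1.96, 2):
    print(f"within {k} SD: {coverage(k):.4f}")
print(f"multiplier for the middle 95%: {multiplier_95:.3f}")
```

Because every Normal distribution has the same shape once measured in SDs from the mean, the same proportions apply whatever the mean and SD.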

Example What is the birthweight of most babies? (mean = 3.4kg and SD = 0.57kg) Limits are calculated as: • First calculate 1.96 SDs: 1.96 x 0.57 = 1.12 • Lower limit: 3.4 – 1.12 = 2.28 • Upper limit: 3.4 + 1.12 = 4.52 95% of babies weigh between 2.28 and 4.52kg
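The limits in this example follow the pattern mean ± 1.96 × SD, which can be sketched as a small function. This assumes the data are normally distributed; the function name `normal_range_95` is illustrative and not part of the slides.

```python
def normal_range_95(mean, sd, multiplier=1.96):
    """Return (lower, upper) limits containing about 95% of normally distributed values."""
    margin = multiplier * sd  # 1.96 SDs
    return mean - margin, mean + margin

# Birthweight example: mean = 3.4 kg, SD = 0.57 kg
lower, upper = normal_range_95(3.4, 0.57)
print(f"95% of babies weigh between {lower:.2f} and {upper:.2f} kg")
```

The same function can be reused for any other normally distributed measurement by passing a different mean and SD.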

Travel time to work (sample of 30 journeys) X = travel time in minutes We can use this sample to estimate probabilities in the general population

Exercise 2 By how much can I expect my journey time to vary for direction A? (mean = 32.8 minutes and SD = 4.6 minutes) Limits are calculated as: • First calculate 1.96 SD’s: x = • Lower limit: – = • Upper limit: + = 95% of journey times are between and minutes

Commonly used ranges (for info) • Interquartile range contains the middle 50% of values for sample data • Measurements for people are often divided into percentiles e.g. ‘child’s height is in the bottom 20% for their age’ • Normal ranges are based on sample data but are used to represent individuals in the population. They are also known as reference ranges • Confidence interval is used to give a range of values for a population parameter e.g. mean (discussed in the next section)

Standard Normal Distribution • A different probability distribution is needed for every combination of mean and SD • Before computers, probabilities were looked up in tables for one special distribution (z), which has a mean of 0 and SD of 1

How do we get from a distribution with a mean ≠ 0 or SD ≠ 1, to the standard normal distribution, which has a mean of 0 and SD of 1? We standardise! As standardisation is used in other parts of statistics, we will cover it here

Distributions with mean ≠ 0 Standardise data in order to get mean = 0, SD = 1. Standardise using the following formula (where x is the original score for an individual and z represents the transformed score): z = (x – mean) / SD Standardising all values takes the data from mean = 32.8, SD = 4.6 to mean = 0, SD = 1 Note: Z is sometimes called a Z score or a standard deviation (SD) score
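Standardisation subtracts the mean from each value and divides by the SD. A minimal sketch of this, in which the journey times are hypothetical values made up for illustration (they are not the data from the slides) and the function name `standardise` is likewise an assumption.

```python
from statistics import mean, stdev

def standardise(values):
    """Convert each value x to z = (x - mean) / SD, using the sample mean and SD."""
    m = mean(values)
    s = stdev(values)
    return [(x - m) / s for x in values]

# Hypothetical journey times in minutes (illustrative only, not the slide data)
times = [27.0, 30.5, 31.0, 33.5, 36.0, 39.0]
z_scores = standardise(times)

print([round(z, 2) for z in z_scores])
print(f"mean of z scores = {mean(z_scores):.2f}, SD of z scores = {stdev(z_scores):.2f}")
```

Whatever the original units, the standardised values have mean 0 and SD 1, which is what allows a single table to be used.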

Example: 11 journeys to work

Exercise 3: calculating Z scores A baby is born weighing 4.5 kg. Given the mean weight is 3.4 and SD is 0.57, calculate the Z score for this baby x = Individual score of interest • Is this within the 95% normal range? • Use the normal distribution one-sided probability table to calculate the probability of getting a Z score above 1.93

Table: Normal curve tail probabilities (one tailed). Standard normal probability in right-hand tail
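A table of this kind can be regenerated in code. A minimal sketch using Python's standard-library `statistics.NormalDist`; the selection of z values is arbitrary and the function name `right_tail_probability` is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def right_tail_probability(z):
    """One-tailed probability P(Z > z) for the standard normal distribution."""
    return 1 - standard_normal.cdf(z)

# A few rows of a one-tailed table (z values chosen for illustration)
for z in [0.0, 0.5, 1.0, 1.5, 1.93, 1.96, 2.0, 2.5, 3.0]:
    print(f"z = {z:4.2f}   P(Z > z) = {right_tail_probability(z):.4f}")
```

Printed tables round the probabilities, so small differences in the last decimal place are to be expected.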

Other probability distributions • Other distributions can be used to calculate probabilities • Depends upon data type, distribution, question to be answered • Each has a particular statistic that you calculate. Here are a few examples (there are many others): F distribution: F statistic; t distribution: t statistic; χ² distribution: χ² statistic • The Z statistic is used for the standard normal distribution

Hypothesis testing

Populations and samples Taking a sample from a population Sample data are used to ‘represent’ the whole population

Types of statistics • Descriptive statistics summarise and describe sample data we have collected • Inferential statistics are obtained when we use sample data to infer something about the wider population

Hypothesis testing • Hypothesis testing is a method of making decisions about populations using sample data • Sample data are used to decide which of two possible statements about a population is most likely to be true • We do this by comparing what we have observed to what we expected

Hypothesis testing: main steps

Define study question • Think carefully about the main research question. What do you want to know? • What variables will be used to test the question? There are specific tests for different types of data • Think about the analysis before carrying out the study

Null and alternative hypothesis (H0 & H1) State your null hypothesis (H0) (statement you are looking for evidence to disprove) State your study (alternative) hypothesis (H1 or HA) which is usually the opposite of the null hypothesis

The court case • Members of a jury have to decide whether a person is guilty or innocent based on evidence Null: The person is innocent Alternative: The person is not innocent • The null (innocent) can only be rejected if there is enough evidence to disprove it

The court case • A man may be guilty or innocent of a crime • He is presumed innocent unless there is evidence to suggest otherwise • Members of a jury have to decide whether a person is guilty or innocent based on evidence Decision: Convict/Release

Null and alternative hypothesis (H0 & H1) • For comparing means the null is: There is no difference in the population means, i.e. H0: μA = μB Where: μA is the population mean for group A μB is the population mean for group B • When investigating relationships the null is: There is no association between x and y

Exercise 4: Hypotheses What would the null and alternative hypotheses be for these research questions? • Did class affect survival on board the Titanic? • Do students who attend MASH workshops do better in their statistics module than those who do not?

Example: Module marks 10 students who attended a MASH workshop and 10 students who did not attend a MASH workshop Results: Can we conclude that there is a difference between the populations?

Variation in single samples Every sample taken from a population will contain different numbers so the difference between means varies between samples

Test Statistic A test statistic is a number calculated from sample data to decide whether or not to reject the null hypothesis about the population. It varies between different tests, and compares what we observed with what we would expect under the null hypothesis. For our test (final module mark): Test statistic =

P values If you repeated a study numerous times you would get a variety of test statistics, which form a distribution P-value = Probability of getting a test statistic at least as extreme as the one calculated, if the null is true
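The definition can be turned into a calculation once the distribution of the test statistic under the null is known. The sketch below assumes, purely for illustration, a test statistic that follows the standard normal distribution when the null is true; the slides do not say which test is used, and other tests use other distributions (e.g. t). The function name `p_value_from_z` and the statistic value 2.33 are hypothetical.

```python
from statistics import NormalDist

def p_value_from_z(z_stat, two_sided=True):
    """P-value for a test statistic assumed to be standard normal under the null."""
    std = NormalDist()  # mean 0, SD 1
    if two_sided:
        # Extreme in either direction: both tails beyond |z_stat|
        return 2 * (1 - std.cdf(abs(z_stat)))
    # One-sided (upper tail): values at least as large as z_stat
    return 1 - std.cdf(z_stat)

z_stat = 2.33  # hypothetical test statistic, not taken from the slides
print(f"two-sided p = {p_value_from_z(z_stat):.3f}")
print(f"one-sided p = {p_value_from_z(z_stat, two_sided=False):.3f}")
```

Whether a one-sided or two-sided p-value is appropriate depends on how the alternative hypothesis is stated.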

Example: Module marks Null: The mean module mark is the same for students who attended a MASH workshop and those who did not If this is true, we would expect some test statistics to be negative and some positive just by chance

Statistical significance We say that our result is statistically significant if the p-value is less than some predefined level, referred to as the significance level (α), usually set at 5% Small p-value = the observed result would be unlikely if the null were true, so there is evidence against the null If the p-value is not small, we cannot say that the null hypothesis is true, only that there is not enough evidence to reject it

Statistical significance Null: The mean module mark for students who attend MASH workshops is the same as for students who do not attend MASH workshops Alternative: The mean module mark for students who attend MASH workshops is higher than for students who do not attend MASH workshops

Exercise 5: Statistical significance • The significance level is usually set at 5%; this is conventional rather than fixed. For stronger evidence, a level of 1% (0.01) could be used • The smaller the p-value, the more confident we are with our decision to reject • The p-value for the test of a difference in module marks between students who do and do not attend a MASH workshop was 0.02. What would you conclude and how confident are you with your decision?
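The decision rule compares the p-value with the chosen significance level. A minimal sketch, in which the function name `decide` and the wording of the returned strings are illustrative choices.

```python
def decide(p_value, alpha=0.05):
    """Reject the null if the p-value is below the significance level alpha."""
    if p_value < alpha:
        return "reject H0"
    return "do not reject H0"

p = 0.02  # p-value quoted in the exercise
for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: {decide(p, alpha)}")
```

The same p-value can lead to different decisions at different significance levels, which is why the level should be chosen before looking at the results.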

Example: Module marks • As p < 0.05, there is evidence to suggest that students who attend a MASH workshop do better in their statistics module than students who do not attend a workshop. As p = 0.02, if the null were true there would be only a 2% chance (i.e. 1 in 50) of getting a test statistic at least as extreme as the one observed • What is the difference? • For the sample tested, those who attended a MASH workshop scored 4 percentage points higher than those who did not attend a workshop in their final module exam
