Analysing continuous data: Parametric versus Non-parametric methods

Scott Harris, October 2009

Learning outcomes
By the end of this session you should be able to choose between, perform (using SPSS) and interpret the results from the following tests (parametric / non-parametric equivalent):
• One sample t test / Sign test or Wilcoxon signed ranks test.
• Independent samples t test / Mann-Whitney U test.
• Paired samples t test / Wilcoxon signed ranks test.

Contents
• Introduction
  • Refresher - types of data.
  • Data requirements.
  • The difference between parametric and non-parametric methods.
  • The example dataset: CISR data.
• One sample versus a fixed value (Parametric and non-parametric equivalent: P/NP)
  • Test information.
  • ‘How to’ in SPSS.
• Comparison of two Paired samples (P/NP)
  • Test information.
  • ‘How to’ in SPSS.
• Comparison of two Independent groups (P/NP)
  • Test information.
  • ‘How to’ in SPSS.

Refresher: Types of data
• Quantitative – a measured quantity.
  • Continuous – Measurements from a continuous scale: Height, weight, age.
  • Discrete – Count data: Children in a family, number of days in hospital.
• Qualitative – Assessing a quality.
  • Ordinal – An order to the data: Likert scale (much worse, worse, the same, better, much better), age group (18-25, 26-30…).
  • Categorical / Nominal – Simple categories: Blood group (O, A, B, AB). A special case is binary data (two levels): Status (Alive, dead), Infection (yes, no).

Data requirements
• The statistical tests covered in this session compare a sample with a continuous outcome against one of:
  • a published or hypothesised value,
  • a repeated sample from the same individuals, or
  • a sample from another group.
• A different type of test is used in each of these situations.

Test requirements: t test • A continuous outcome variable. • Approximately ‘Normally’ distributed.

Skewed distributions
[Figure: two example histograms, one showing negative skew and one showing positive skew.]

Skewed distributions…
With an obviously skewed distribution, such as either of those on the previous slide, you have two options:
• Transform the data (for example by taking the log) to try to remove the skew and make the data more Normal.
• Use the equivalent non-parametric test.
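The effect of a log transformation on positive skew can be sketched in a few lines of Python. This is only an illustration: the data values are hypothetical, and `sample_skewness` is a helper written for this example (a simple moment-based skewness), not an SPSS function.

```python
import math
import statistics

def sample_skewness(values):
    """Moment-based skewness: mean cubed deviation divided by the (population) SD cubed."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((v - mean) ** 3 for v in values) / (len(values) * sd ** 3)

# Hypothetical positively skewed measurements (all values must be > 0 to take logs).
raw = [1, 2, 2, 3, 3, 4, 5, 8, 15, 40]
logged = [math.log(v) for v in raw]

skew_raw = sample_skewness(raw)
skew_log = sample_skewness(logged)

print(f"skewness before log: {skew_raw:.2f}")
print(f"skewness after log:  {skew_log:.2f}")
```

A skewness near 0 suggests a roughly symmetric distribution. A histogram of the transformed variable should still be checked before relying on a t test.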

Parametric and non-parametric methods
Parametric methods are based around the assumption of Normality: they use two parameters to describe the underlying distribution. These parameters are:
• The mean (describing the average result) and
• The standard deviation (describing the variability in the data).
Non-parametric methods are, as their name suggests, methods that do not rely on any parameters to summarise an underlying distribution. For this reason non-parametric tests are sometimes referred to as ‘distribution free’ tests. They can also be used for ordinal variables, even those with only a few levels.
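As a small illustration of the two kinds of summary, the sketch below computes the parametric summaries (mean and standard deviation) and the rank-based summaries (median and quartiles) for the same data. The scores are hypothetical, chosen to be positively skewed so that the two approaches disagree.

```python
import statistics

# Hypothetical positively skewed scores.
scores = [2, 3, 3, 4, 5, 6, 7, 9, 14, 37]

# Parametric summaries: the two parameters of a Normal distribution.
mean = statistics.mean(scores)
sd = statistics.stdev(scores)  # sample SD (n - 1 denominator)

# Rank-based summaries: no distributional assumption needed.
median = statistics.median(scores)
q1, q2, q3 = statistics.quantiles(scores, n=4)

print(f"mean = {mean:.2f}, SD = {sd:.2f}")
print(f"median = {median}, quartiles = ({q1}, {q3})")
```

With skewed data the mean is pulled towards the long tail while the median is not, which is one reason the median and quartiles are usually reported alongside a non-parametric test.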

Example dataset: Information
CISR (Clinical Interview Schedule: Revised) data:
• Measure of depression – the higher the score the worse the depression.
• A CISR value of 12 or greater is used to indicate a clinical case of depression.
• 3 groups of patients (each receiving a different form of treatment: GP, CMHN and CMHN problem solving).
• Data collected at two time points (baseline and then a follow-up visit 6 months later).

Example CISR dataset: Raw data

Example CISR dataset: Labelled data

Refresher: Hypothesis testing
• First you have to set up a null (H0) and alternative (H1) hypothesis. You then calculate the specific test value for the sample. This is then compared to a critical cut-off value, which can come from published statistical tables or can be left to the computer.
• If the calculated value exceeds the cut-off then the null hypothesis is rejected and the alternative hypothesis is accepted (accept H1). If the calculated value is smaller than the cut-off then there is insufficient evidence against the null hypothesis (do not reject H0).

Comparing one sample against a specific hypothesised value One sample t test or Sign test / Wilcoxon signed ranks test

Normally distributed data One sample t test

One sample t test: hypotheses
H0 is the default hypothesis (null). We are testing whether we have enough evidence to reject this and instead accept the alternative hypothesis (H1). Our null hypothesis is that the mean of the population from which the sample was drawn is equal to a certain value. The alternative is that it isn’t.

Theory: One sample t test
The following equation is used to calculate the value of t:

$$ t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} $$

Where:
$\bar{x}$ = Sample mean
$\mu_0$ = Hypothesised mean (or test mean)
$s$ = Sample standard deviation
$n$ = Size of sample
Under H0, t follows a t distribution with (n-1) degrees of freedom.

Theory: One sample t test… • The t value is then compared against the appropriate critical value from Statistical tables: • Significant evidence against H0 if the absolute value of t is greater than the two-sided 5% significance (0.05) value, for the appropriate d.f.
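The calculation and the comparison against a tabulated critical value can be sketched as follows. The scores are hypothetical (they are not the CISR data), `one_sample_t` is a helper written for this example, and 2.262 is the two-sided 5% critical value for 9 degrees of freedom taken from standard t tables.

```python
import math
import statistics

def one_sample_t(values, test_mean):
    """Return (t, degrees of freedom) for a one sample t test against test_mean."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
    t = (mean - test_mean) / (sd / math.sqrt(n))
    return t, n - 1

# Hypothetical scores for 10 subjects, tested against a value of 12.
scores = [14, 9, 17, 21, 12, 8, 19, 15, 23, 12]
t_value, df = one_sample_t(scores, 12)

# Two-sided 5% critical value for 9 d.f., read from standard t tables.
T_CRIT_DF9 = 2.262
reject_h0 = abs(t_value) > T_CRIT_DF9

print(f"t = {t_value:.3f} on {df} d.f.; reject H0 at 5%: {reject_h0}")
```

In practice SPSS reports the exact p value, so the table lookup is only needed when working by hand.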

Checking the distribution of B0SCORE
Graphs → Legacy Dialogs → Histogram…
    * Checking the distribution of B0SCORE .
    GRAPH
      /HISTOGRAM(NORMAL)=B0SCORE
      /TITLE= 'Histogram of baseline CISR score'.

Info: Histograms in SPSS
• From the menus select ‘Graphs’ → ‘Legacy Dialogs’ → ‘Histogram…’.
• Put the variable that you want to draw the histogram for into the ‘Variable:’ box.
• Tick the option to ‘Display normal curve’.
• Click the ‘Titles’ button to enter any titles and then click the ‘Continue’ button.
• If you want separate histograms for each level of another categorical variable then you can either:
  • Add the categorical variable into the ‘Panel by’ → ‘Rows’ box or
  • Make use of the ‘Split file…’ command and draw the histogram as normal. This will produce a separate full size plot for each level of the categorical variable.
• Finally click ‘OK’ to produce the histogram(s) or ‘Paste’ to add the syntax for this into your syntax file.

The distribution of baseline CISR

The distribution?

SPSS – One sample t test
Analyze → Compare Means → One-Sample T Test…
    * One sample t test vs. a value of 12 .
    T-TEST
      /TESTVAL = 12
      /MISSING = ANALYSIS
      /VARIABLES = B0SCORE
      /CRITERIA = CI(.95) .

Info: One sample t test in SPSS
• From the menus select ‘Analyze’ → ‘Compare Means’ → ‘One-Sample T Test…’.
• Put the variable that you want to test into the ‘Test Variable(s):’ box.
• Put the value that you want to test against into the ‘Test Value:’ box.
• Finally click ‘OK’ to produce the one sample t test or ‘Paste’ to add the syntax for this into your syntax file.

SPSS – One sample t test: Output
Annotations on the output tables:
• Observed summary statistics.
• Mean difference.
• 95% confidence interval for the true mean difference.
• 2 sided p value with an alternative hypothesis of non-equality. Highly significant (P<0.001), hence significant evidence against the mean being 12.

Practical Questions Analysing Continuous Data Questions 1 and 2

Practical Questions From the course webpage download the file HbA1c.sav by clicking the right mouse button on the file name and selecting Save Target As. The dataset is pre-labelled and contains data on Blood sugar reduction for 245 patients divided into 3 groups. • Produce Histograms for the reduction in blood sugar (HBA1CRED), both combined across the three treatment groups and split separately by treatment group. Do you think that the data follow a normal distribution?

Practical Questions • Assuming that the outcome variable is normally distributed conduct a suitable statistical test to compare the starting HbA1c level (HBA1C_1) against a value of 7 (the level below which good control of glucose levels is accepted). What are the key statistics that should be reported and what are your conclusions from this test?

Practical Solutions
• To produce the overall histogram you can use the options exactly as given. This results in the following syntax:
    * Producing the overall histogram .
    GRAPH
      /HISTOGRAM(NORMAL)=HBA1CRED
      /TITLE= 'Overall Histogram of blood sugar reduction'.
• To split the graphs by treatment group, the easiest way is to add the group variable to the ‘Panel by’ → ‘Rows’ box. This results in the following syntax:
    * Producing the histograms split by group .
    GRAPH
      /HISTOGRAM(NORMAL)=HBA1CRED
      /PANEL ROWVAR=GROUP ROWOP=CROSS
      /TITLE= 'Histograms of blood sugar reduction, by group'.

Practical Solutions • The histograms do not look highly skewed (although the individual group histograms show some skew) and the combined histogram actually appears to follow a normal distribution very well. There may be an outlier in the Active A group.

Practical Solutions • Along with the observed mean difference, its confidence interval and the p value should be reported. The mean difference between the starting HbA1c level and 7 is 0.13 (95% CI: -0.06, 0.32). The starting HbA1c level is not statistically significantly different from 7 (p=0.168). (NOTE: The CI also includes 0)

Non-normally distributed data Sign test / Wilcoxon signed ranks test

Theory: Sign Test
• The simplest non-parametric test.
• For each subject, subtract the value you are testing against from the observed value and write down the sign of the difference. (That is, write “-” if the difference is negative and “+” if it is positive.)
• If the sample is consistent with the test value then we should have roughly equal numbers of “+” and “-”. If we get many more of one than the other then we start to build evidence against the sample being the same as the test value.
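The counting described above, together with a two-sided exact binomial p value, can be sketched as follows. The scores are hypothetical and `sign_test` is a helper written for this illustration; observations equal to the test value are treated as ties and dropped, which is an assumption of this sketch.

```python
from math import comb

def sign_test(values, test_value):
    """Return (n_positive, n_negative, two-sided exact binomial p value)."""
    pos = sum(1 for v in values if v > test_value)
    neg = sum(1 for v in values if v < test_value)
    n = pos + neg            # ties (differences of zero) are dropped
    k = min(pos, neg)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return pos, neg, min(1.0, 2 * tail)

# Hypothetical scores tested against a value of 12.
scores = [14, 9, 5, 7, 12, 8, 3, 10, 6, 11, 4, 9]
pos, neg, p_value = sign_test(scores, 12)

print(f"+: {pos}, -: {neg}, two-sided p = {p_value:.4f}")
```

Only the direction of each difference is used here, so a score of 11 and a score of 3 count equally as one “-”.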

Sign test: Example

Theory: Wilcoxon signed rank test (WSRT)
The Sign test ignores all information about the magnitude of the differences. The WSRT uses both the sign and the magnitude of each difference.
• Calculate the difference between each observed value and the test value.
• Order the differences from smallest to largest, ignoring their signs.
• Assign ranks to the ordered differences: 1 for the smallest, all the way up to n for the largest.
• Add up the ranks for the positive differences and then, separately, for the negative differences.
• If the sample is the same as the test value then we should have roughly equal sums of ranks for the positive and negative differences. If one is much higher than the other then we start to build evidence against the sample being the same as the test value.

Wilcoxon signed rank test: Example
[Table of example data, annotated with step numbers 1 to 5.]
• When there are ties you apply the average rank to all tied scores.
• Sums of ranks for this section of data:
  Positive: 1.5 + 8 = 9.5
  Negative: 1.5 + 3 + 4 + 5 + 6 + 7 = 26.5
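The ranking steps can be sketched in Python. The slide's own data table is not available, so the scores below are hypothetical values chosen so that the rank sums match those quoted above (9.5 and 26.5); `signed_rank_sums` is a helper written for this illustration, and dropping zero differences is an assumption of the sketch.

```python
def signed_rank_sums(values, test_value):
    """Return (sum of ranks of positive differences, sum of ranks of negative differences)."""
    # Differences from the test value; zero differences are dropped.
    diffs = [v - test_value for v in values if v != test_value]
    # Order by absolute size, ignoring the sign.
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        # Find the run of tied absolute differences starting at position i.
        j = i
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        # Tied values all receive the average of the ranks they span.
        avg_rank = ((i + 1) + (j + 1)) / 2
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    pos_sum = sum(r for d, r in zip(ordered, ranks) if d > 0)
    neg_sum = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return pos_sum, neg_sum

# Hypothetical scores tested against 12: differences are +1, -1, -2, -3, -4, -5, -6, +7.
scores = [13, 11, 10, 9, 8, 7, 6, 19]
pos_sum, neg_sum = signed_rank_sums(scores, 12)

print(f"positive rank sum = {pos_sum}, negative rank sum = {neg_sum}")
```

The two smallest differences (+1 and -1) are tied, so each receives rank 1.5. The two sums always add up to n(n+1)/2, here 36, which is a useful check when working by hand.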

Checking the distribution of M6SCORE
Graphs → Legacy Dialogs → Histogram…
    * Checking the distribution of M6SCORE .
    GRAPH
      /HISTOGRAM(NORMAL)=M6SCORE
      /TITLE= 'Histogram of 6 month CISR scores'.

The distribution of 6 month CISR

SPSS – Setting up the test value
Transform → Compute Variable…
    * Setting up the test value .
    COMPUTE TESTVALUE = 12 .
    EXECUTE .

Info: Creating new variables in SPSS
• From the menus select ‘Transform’ → ‘Compute Variable…’.
• Enter the name of the new variable that you want to create into the ‘Target Variable:’ box.
• Enter the formula for the new variable into the ‘Numeric Expression’ box.
• In this case we want to create a variable that just contains a constant value, so we just enter that value into the ‘Numeric Expression’ box.
• Finally click ‘OK’ to produce the new variable or ‘Paste’ to add the syntax for this into your syntax file.

SPSS – Sign & Wilcoxon signed ranks tests
Analyze → Nonparametric Tests → 2 Related Samples…

SPSS – Sign & Wilcoxon signed ranks tests
Sign test:
    * One sample sign test vs. a value of 12 .
    NPAR TEST
      /SIGN= M6SCORE WITH TESTVALUE (PAIRED)
      /STATISTICS DESCRIPTIVES QUARTILES
      /MISSING ANALYSIS.
Wilcoxon signed ranks test:
    * One sample Wilcoxon signed ranks test vs. a value of 12 .
    NPAR TEST
      /WILCOXON=M6SCORE WITH TESTVALUE (PAIRED)
      /STATISTICS DESCRIPTIVES QUARTILES
      /MISSING ANALYSIS.

Info: Sign & Wilcoxon signed ranks tests in SPSS
• From the menus select ‘Analyze’ → ‘Nonparametric Tests’ → ‘2 Related Samples…’.
• Click the two variables that you want to test from the list on the left. After you click the first variable hold down the Ctrl key to select the 2nd.
• In this case we select the variable that we want to compare, as well as the newly created constant variable.
• Click the arrow button to move this pair of variables into the ‘Test Pairs:’ box.
• Ensure that the test(s) that you want to conduct are ticked in the ‘Test Type’ box.
• Click the ‘Options’ button and then select ‘Descriptive’ and ‘Quartiles’ from the ‘Statistics’ box.
• Finally click ‘OK’ to produce the selected test(s) or ‘Paste’ to add the syntax for this into your syntax file.

SPSS – Sign test: Output
Annotations on the output tables:
• Observed numbers of positive, negative and tied values. You would expect half positive and half negative if there was no difference.
• 2 sided p value with an alternative hypothesis of non-equality. Highly significant (P=0.001), hence strong evidence against the population median being 12.

SPSS – Wilcoxon signed ranks test: Output
Annotations on the output tables:
• Observed sums of ranks (and mean rank) for the positive and negative values.
• 2 sided p value with an alternative hypothesis of non-equality. Just significant (P=0.047), hence some evidence against the population median being 12.

Practical Questions Analysing Continuous Data Question 3

Practical Questions • Assuming that the outcome variable is NOT normally distributed conduct a suitable statistical test to compare the starting HbA1c level (HBA1C_1) against a value of 7. What are the key statistics that should be reported and what are your conclusions from this test?

Practical Solutions • For the non-parametric test there is only a p value to report from the test (although the median could be reported from elsewhere and a CI could be calculated from CIA). The starting HbA1c level is not statistically significantly different from 7 (p=0.253).

Comparing two sets of paired observations Paired samples t test or Wilcoxon signed ranks test
