Learn useful approaches for analyzing data and recognizing different data types. Use flowcharts to determine analysis methods and perform basic analyses with appropriate charts.
Choosing the right test
Mathematics & Statistics Help, University of Sheffield
Learning outcomes • By the end of this session you should know about: • Some useful approaches to analysing data • By the end of this session you should be able to: • Recognise different data types • Use a flowchart to decide which analysis method to use • Undertake some basic analyses and construct appropriate charts for your data
Planning a study • What do you want to investigate and why? What are your aims? • How are you going to investigate it? • How will you collect your data? • Who/what is in the sample? • How will you summarise your data? • How will you analyse your data?
Steps for choosing the right test (1) • Clearly define your research question • What is your main outcome of interest? There may be more than one. • What data type is it? The data type will determine the type of analysis • Are the observations paired? • Can it be characterised using a known distribution (i.e. parametric vs non-parametric test)? • What may affect the outcome of interest? • What data type is it/are they? • How will your results be summarised? • What charts can you use to display your results?
Chart types: recap • One variable • Categorical: Pie chart, barchart • Numerical discrete: barchart • Numerical continuous: histogram, boxplots • Two variables • Both categorical: stacked barchart, clustered barchart, multiple pie charts • One categorical / one numerical discrete: boxplots (sometimes!), multiple barcharts • One categorical / one numerical continuous: boxplots, multiple histograms • Both numerical: scatterplot
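The chart recap works as a lookup table keyed on the data type of one or two variables. The following is a minimal sketch of that lookup. The type labels and the helper name `suggest_charts` are hypothetical, chosen for illustration only.

```python
# Hypothetical encoding of the chart-type recap; the labels are illustrative.
CATEGORICAL = "categorical"
DISCRETE = "numerical discrete"
CONTINUOUS = "numerical continuous"

ONE_VARIABLE = {
    CATEGORICAL: ["pie chart", "bar chart"],
    DISCRETE: ["bar chart"],
    CONTINUOUS: ["histogram", "boxplot"],
}

# frozenset keys make the lookup order-independent:
# (categorical, continuous) and (continuous, categorical) hit the same entry.
TWO_VARIABLES = {
    frozenset([CATEGORICAL]): ["stacked bar chart", "clustered bar chart", "multiple pie charts"],
    frozenset([CATEGORICAL, DISCRETE]): ["multiple bar charts", "boxplots (sometimes)"],
    frozenset([CATEGORICAL, CONTINUOUS]): ["boxplots", "multiple histograms"],
    frozenset([DISCRETE]): ["scatterplot"],
    frozenset([CONTINUOUS]): ["scatterplot"],
    frozenset([DISCRETE, CONTINUOUS]): ["scatterplot"],
}


def suggest_charts(*types):
    """Return candidate chart types for one or two variables of the given data types."""
    if len(types) == 1:
        return ONE_VARIABLE[types[0]]
    if len(types) == 2:
        return TWO_VARIABLES[frozenset(types)]
    raise ValueError("the recap only covers one or two variables")
```

For example, `suggest_charts(CATEGORICAL, CONTINUOUS)` gives boxplots or multiple histograms, matching the recap. The lookup cannot judge the "(sometimes!)" caveat for boxplots of discrete data.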
Steps for choosing the right test (2) • Are you interested in: • Testing for differences between groups? If so, how many groups are there? • Assessing/modelling the relationship between variables? • Are the observations paired? • Is the pairing due to having repeated measurements of the same variable for each subject? • Does the test you have chosen make any assumptions? Are the assumptions met? e.g. the assumption of normality for the t-test
Test assumptions • Parametric tests: generally assume that the data, or some function of the data, follow a known distribution, e.g. the normal distribution • Non-parametric tests: usually based on ranks/signs rather than the actual data values
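Rank-based tests use the position of each value rather than the value itself. The following is a minimal sketch of the rank transformation, assuming the common convention that tied values share the mean of the ranks they occupy. The function name `average_ranks` is hypothetical.

```python
def average_ranks(values):
    """Rank values 1..n; tied values receive the mean of the ranks they span."""
    # Indices of the values in ascending order of value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Sorted positions i..j correspond to ranks i+1..j+1; use their mean.
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks
```

For example, `average_ranks([1, 2, 3, 1000])` gives `[1.0, 2.0, 3.0, 4.0]`. The extreme value 1000 contributes no more than rank 4, which is why rank-based methods are less sensitive to outliers.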
Non-parametric methods are used when: • Dependent variable is ordinal • A plot of the data appears to be very skewed or the data do not seem to follow any particular shape or distribution (e.g. Normal) • Assumptions underlying parametric test not met • There are potentially influential outliers in the dataset • Sample size is small
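Besides looking at a plot, skewness can be put into a number. The following is a minimal sketch of the moment coefficient of skewness, g1 = m3 / m2^1.5, using population-style moments. The function name is hypothetical. No cut-off for "very skewed" is implied here, and a histogram or boxplot remains the check suggested above.

```python
from statistics import mean


def sample_skewness(data):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (central moments divided by n)."""
    n = len(data)
    if n < 3:
        raise ValueError("need at least three observations")
    xbar = mean(data)
    m2 = sum((x - xbar) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - xbar) ** 3 for x in data) / n  # third central moment
    if m2 == 0:
        raise ValueError("all values are identical")
    return m3 / m2 ** 1.5
```

Symmetric data give a value near 0. A long right tail, such as a few very expensive tickets, gives a positive value.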
Comparing averages (1) • Comparing BETWEEN groups: • 2 groups: independent samples t-test (normally distributed) or Mann-Whitney U test (skewed or ordinal) • 3+ groups: one-way ANOVA (normally distributed) or Kruskal-Wallis test (skewed or ordinal)
Paired data (1) • Most commonly, measurements from the same individuals collected on more than one occasion • Can be used to look at differences in mean score across: • 2 or more time points, e.g. before/after a diet • 2 or more conditions, e.g. a hearing test at different frequencies Example: each person listened to a sound at three different frequencies until they could no longer hear it. A repeated measures ANOVA would be used to test for a difference between the frequencies.
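With two time points, such as before/after a diet, the paired t-test reduces each subject to a single difference. It then asks whether the mean difference is zero. The following is a minimal sketch of the test statistic t = mean(d) / (sd(d) / √n) with n − 1 degrees of freedom. The function name and the example weights are hypothetical. The p-value is left to statistical software.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(before, after):
    """Paired t statistic on the differences d = after - before, with n - 1 df."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    # Standard error of the mean difference uses the sample SD (n - 1 denominator).
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Hypothetical weights (kg) for four people before and after a diet.
t, df = paired_t_statistic([80, 82, 90, 75], [78, 80, 85, 74])
```

This only covers two time points. Three or more, as in the hearing-test example, call for a repeated measures ANOVA.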
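Comparing averages (2) • Comparing BETWEEN groups: • 2 groups: independent samples t-test (normally distributed) or Mann-Whitney U test (skewed or ordinal) • 3+ groups: one-way ANOVA (normally distributed) or Kruskal-Wallis test (skewed or ordinal) • Comparing measurements WITHIN the same subject: • 2 measurements: paired t-test (normally distributed) or Wilcoxon signed rank test (skewed or ordinal) • 3+ measurements: repeated measures ANOVA (normally distributed) or Friedman test (skewed or ordinal)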
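The comparing-averages table can be read as three yes/no questions: how many groups or measurements are being compared, whether the observations are paired, and whether the data look normally distributed. The following is a minimal sketch of that lookup. The helper `choose_test` and its parameters are hypothetical, and it does not replace checking the assumptions.

```python
def choose_test(n_groups, paired, normal):
    """Look up the comparing-averages table.

    n_groups: number of groups (or repeated measurements per subject)
    paired:   True if measurements are WITHIN the same subject
    normal:   True if normally distributed, False if skewed or ordinal
    """
    if n_groups < 2:
        raise ValueError("need at least two groups or measurements")
    table = {
        # (paired, normal): (test for 2, test for 3+)
        (False, True): ("Independent samples t-test", "One-way ANOVA"),
        (False, False): ("Mann-Whitney U test", "Kruskal-Wallis test"),
        (True, True): ("Paired t-test", "Repeated measures ANOVA"),
        (True, False): ("Wilcoxon signed rank test", "Friedman test"),
    }
    two, three_plus = table[(paired, normal)]
    return two if n_groups == 2 else three_plus
```

For example, `choose_test(2, paired=False, normal=False)` returns "Mann-Whitney U test". Three hearing-test frequencies measured on the same people with normal data give "Repeated measures ANOVA".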
Example 1: Did gender affect ticket price paid on the Titanic? Steps: • What is the outcome variable? • What is the grouping / explanatory variable? • What methods are available to analyse these data? • Check the assumptions • Conduct the appropriate analysis and report the results What test do you think would be appropriate?
Example 1: Did gender affect ticket price paid on the Titanic? Steps: • What is the outcome variable? Ticket price • What is the grouping / explanatory variable? Gender • What methods are available to analyse these data? We are comparing ticket price between two groups (male and female), so the most appropriate parametric method is the independent samples t-test • Check the assumptions. The t-test assumes that the groups are independent, that the data in each group are normally distributed and that the variability in the two groups is similar • Conduct the appropriate analysis and report the results. If the assumptions for the t-test are not met, use the Mann-Whitney U test instead
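The "similar variability" assumption is what allows the two groups' variances to be pooled. The following is a minimal sketch of the pooled-variance (Student's) two-sample t statistic with n1 + n2 − 2 degrees of freedom. The function name and example values are hypothetical, not the Titanic data. If the variances clearly differ, Welch's version of the test is the usual alternative.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(group_a, group_b):
    """Independent samples t statistic using a pooled variance estimate."""
    n1, n2 = len(group_a), len(group_b)
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((n1 - 1) * variance(group_a) + (n2 - 1) * variance(group_b)) / (n1 + n2 - 2)
    se = sqrt(sp2 * (1 / n1 + 1 / n2))  # standard error of the difference in means
    return (mean(group_a) - mean(group_b)) / se, n1 + n2 - 2


t, df = pooled_t_statistic([1, 2, 3], [4, 5, 6])
```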
Example 1: Did gender affect ticket price paid on the Titanic? • The data were positively skewed • A Mann-Whitney U test was carried out to compare the ticket price for men and women • There was highly significant evidence (U = 5.5, p < 0.001) of a difference in the distribution of ticket price between males and females What else would be useful to know when interpreting these results? Medians: women £23 vs men £12
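The U statistic counts how often a value from one group exceeds a value from the other. Ties count as one half. The following is a minimal sketch using that pair-counting definition, which is equivalent to the rank-sum form. The function name and inputs are hypothetical. It returns U for both groups, and the smaller of the two is the one compared against classic tables. The p-value, from exact tables or a normal approximation, is left to statistical software.

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y): pairs where x beats y (ties count 0.5), and the complement."""
    u_x = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u_x += 1
            elif xi == yj:
                u_x += 0.5
    # The two U values always sum to n_x * n_y.
    u_y = len(x) * len(y) - u_x
    return u_x, u_y
```

For example, `mann_whitney_u([1, 5], [5, 2])` gives `(1.5, 2.5)`, so the reported U would be 1.5. A U close to 0 means one group's values almost always exceed the other's.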
Example 2: two categorical variables Survival of the pushiest?
Example 2: Survival of the pushiest Research question: Was survival on the Titanic linked to nationality? Dependent: Survival Independent: Nationality What test do you think you should use? • Chi-squared test http://www.independent.co.uk/news/world/australasia/more-britons-than-americans-died-on-titanic-because-they-queued-1452299.html
Example 2: Survival of the pushiest • The data suggest that Americans were more likely to survive: 56% survived, compared to 32% of the British and 35% of those from other countries • Results from the χ² test suggest that there is evidence of a significant relationship between nationality and survival (p < 0.001)
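The χ² test compares the observed counts in each nationality × survival cell with the counts expected if the two variables were unrelated. Each expected count is the row total times the column total, divided by the grand total. The following is a minimal sketch of the Pearson statistic without a continuity correction. The function name is hypothetical, and the counts are made up because the slides do not give the Titanic numbers.

```python
def chi_squared_statistic(table):
    """Pearson chi-squared statistic and df for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of the row and column variables.
            expected = row_totals[i] * col_totals[j] / grand
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Hypothetical 2 x 2 table: rows = two groups, columns = survived / died.
chi2, df = chi_squared_statistic([[10, 20], [30, 40]])
```

The statistic is then referred to a χ² distribution with (rows − 1) × (columns − 1) degrees of freedom. Software also warns when expected counts are small.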
Example 2: Further thoughts • Class was one of the most important predictors of survival on the Titanic • 70% of Americans were travelling in 1st class • A more detailed analysis using logistic regression showed that nationality was NOT a significant predictor of survival after controlling for class When looking at these data, is there any other information that would be useful? The number of people of each nationality
Learning outcomes • You should now know about: • Some useful approaches to analysing data • You should now be able to: • Recognise different data types • Use a flowchart to decide which analysis method to use • Undertake some basic analyses and construct appropriate charts for your data
Exercises • Attempt the 4 exercises in SPSS • In each case you need to identify an appropriate analysis based on the dataset provided • Remember to check the assumptions for any analysis you conduct • Add value labels to the data if required • Use the flow charts & table to assist you
Download the data In your web browser, type in the following address and save the files to your computer: http://www.sheffield.ac.uk/mash/workshop_materials
Maths and Statistics Help Statistics appointments: Mon-Fri (10am-1pm) Statistics drop-in: Mon-Fri (10am-1pm), Weds (4-7pm) http://www.sheffield.ac.uk/mash
Resources: All resources are available in paper form at MASH or on the MASH website
Contacts Follow MASH on twitter: @mash_uos