Statistics

Statistics Chemistry December 17, 2007

Statistics • Statistics are a way to test that any differences in your data are a result of the variables you are testing and not a result of random chance. • There are different statistical tests that you can run. The test you need depends on the type of data that you collected. • The following slides will help you determine which statistical test you will need to run on your data.

Statistics: What does everything mean? • Null hypothesis: This is a statement that is the antithesis (opposite) of the hypothesis you are testing in your experiment. • Confidence level: This is the amount of certainty that you require in your experiment. For our purposes this should be set at 95%. In other words, you accept at most a 5% chance that differences as large as the ones in your data would arise from random error alone. • The results of your statistical test will generate either a p-value or a t value (others are possible, but these are the two most common). Your p-value should be compared to the amount of error you set at the beginning of the experiment. For our purposes this means 5%, or 0.05. The t value should be compared to a critical value (looked up in a table of values). • If the p-value is <= 0.05, the null hypothesis is rejected (the differences in the data are attributed to the tested variables). • The absolute value of the calculated t value must be larger than the critical value for there to be a significant difference between the data sets. • In your appendix you can list the raw data, and you should include a box-and-whiskers plot.

Sample Statistics Section • Your statistics discussion should go in the results section of your paper. After you summarize your data in the results section, your statistics discussion may read… • …A one-way analysis of variance test was used to determine the significance of the relationship between the cobalt(III) ion concentrations and the measured absorbance. Statistical analyses were calculated using StatView TMSE+ (Abacus Concepts, Inc., Berkeley, CA) statistical software. The ANOVA test returned a p-value = 0.023, indicating that there was a significant difference between the data. Since the p-value was less than 0.05, the null hypothesis, "cobalt(III) ion concentrations do not affect absorbance," can be rejected. Bonferroni paired t-tests were performed to determine which cobalt(III) ion concentrations produced significantly different results. The Bonferroni paired t-tests showed that there was a significant difference between the 0.1 M and 0.001 M solutions only.

What Statistics Test Do I Use?

t-test You read of a survey that claims that the average teenager watches 25 hours of TV a week and you want to check whether or not this is true in your school (too simple a project!). • Predicted value of the variable: the predicted 25 hours of TV • Variable under study: actual hours of TV watched • Statistical test you would use: t-test • Use this test to compare the mean values (averages) of one set of data to a predicted mean value. • Back to “What Statistics Test Do I Use?”.
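
As a rough illustration of the one-sample t-test described on this slide, the sketch below computes the t value by hand and compares it to a critical value from a table. The survey answers are made up for the example, and the function name `one_sample_t` is hypothetical.

```python
import math
import statistics

def one_sample_t(sample, predicted_mean):
    """Return the t value and degrees of freedom for a one-sample t-test."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
    t = (mean - predicted_mean) / (sd / math.sqrt(n))
    return t, n - 1

# Hypothetical data: hours of TV per week reported by 8 students.
hours = [20, 22, 19, 24, 21, 23, 18, 21]
t, df = one_sample_t(hours, 25)  # 25 hours is the predicted value from the survey

# Two-tailed critical value for df = 7 at the 5% level, read from a t table.
critical = 2.365
significant = abs(t) > critical
print(f"t = {t:.3f}, df = {df}, significant: {significant}")
```

With these invented numbers the sample mean is 21 hours, so the t value is negative; it is the absolute value that is compared to the critical value.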

2 Sample t-test You grow 20 radish plants in pH=10.0 water and 20 plants in pH=3.0 water and measure the final mass of the leaves of the plants (too simple an experiment!) to see if they grew better in one fluid than in the other fluid. • Independent variable: pH of the fluid in which the plants were grown • Dependent variable: plant biomass • Statistical test you would use: 2-sample t-test • Use this test to compare the mean values (averages) of two sets of data. • A Mann-Whitney test is the counterpart of the 2-sample t-test for data that are given rank numbers, rather than quantitative values. For example, you want to compare the overall letter-grade GPA of students in one class with the overall letter-grade GPA of students in another class. You rank the data from low to high according to the letter grade (here, A = 1, B = 2, C = 3, D = 4, E = 5 might be your rankings; you could also have set A = 5, B = 4, ...). • Back to “What Statistics Test Do I Use?”.
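
The 2-sample comparison above might be sketched as follows. This version assumes both groups have roughly equal variance (the pooled, or Student's, form of the test); the leaf masses are invented and use 5 plants per group instead of 20 to keep the arithmetic easy to follow. The name `two_sample_t` is hypothetical.

```python
import math
import statistics

def two_sample_t(a, b):
    """Pooled-variance two-sample t value and degrees of freedom."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    t = (statistics.mean(a) - statistics.mean(b)) / se
    return t, na + nb - 2

# Hypothetical leaf masses in grams.
high_ph = [4, 5, 6, 5, 5]  # plants grown in pH = 10.0 water
low_ph = [2, 3, 4, 3, 3]   # plants grown in pH = 3.0 water
t, df = two_sample_t(high_ph, low_ph)

# Two-tailed critical value for df = 8 at the 5% level, read from a t table.
critical = 2.306
significant = abs(t) > critical
print(f"t = {t:.3f}, df = {df}, significant: {significant}")
```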

Matched Pairs t-test You give a math test to a group of students. Afterwards you tell ? of the students a method of analyzing the problems, then re-test all the students to see if use of the method led to improved test scores. • Independent variable: test-taking method (your method vs. no imparted method) • Dependent variable: (test scores after method - test scores before method) • Statistical test you would use: matched-pairs t-test • Use this test to compare data from the same subjects under two different conditions. • Back to “What Statistics Test Do I Use?”.
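
A matched-pairs t-test works on the difference in each subject's two scores, which a short sketch can make concrete. The before and after scores below are invented for six students, and `paired_t` is a hypothetical helper name.

```python
import math
import statistics

def paired_t(before, after):
    """t value and degrees of freedom for the per-subject differences."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    # Test whether the mean difference is distinguishable from zero.
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Hypothetical test scores for the same six students, in the same order.
before = [70, 65, 80, 75, 60, 85]
after = [73, 70, 84, 81, 62, 89]
t, df = paired_t(before, after)

# Two-tailed critical value for df = 5 at the 5% level, read from a t table.
critical = 2.571
significant = abs(t) > critical
print(f"t = {t:.3f}, df = {df}, significant: {significant}")
```

The two lists must line up subject by subject; shuffling one of them would turn this into a different (and wrong) comparison.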

ANalysis Of VAriance You grow radish plants given pesticide-free water every other day, radish plants given a 5% pesticide solution every other day, and radish plants given a 10% pesticide solution every other day, then measure the biomass of the plants after 30 days to find whether there was any difference in plant growth among the three groups of plants. • Independent variable: pesticide dilution • Dependent variable: plant biomass • Statistical test you would use: ANOVA • Use this test to compare the mean values (averages) of more than two sets of data, where the independent variable takes more than two values (one per group) but there is only one dependent variable. If you find that your data differ significantly, this says only that at least two of the data sets differ from one another, not that all of your tested data sets differ from one another. • If your ANOVA test indicates that there is a statistical difference in your data, you should also run Bonferroni paired t-tests to see which groups produce significantly different results. This test essentially penalizes you more and more as you add more and more groups, making it more difficult to reject the null hypothesis than if you had tested fewer groups. • One assumption in the ANOVA test is that your data are normally distributed (plot as a bell curve, approximately). If this is not true, you must use the Kruskal-Wallis test below. • Back to “What Statistics Test Do I Use?”.
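
To show where the ANOVA result comes from, the sketch below computes the F value for three groups and the stricter significance level that a Bonferroni correction would apply to the follow-up paired comparisons. The biomass values are invented, and the function name `one_way_anova_f` is hypothetical.

```python
import statistics

def one_way_anova_f(groups):
    """F value and degrees of freedom for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each value around its own group mean.
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical plant biomass in grams after 30 days.
groups = [
    [10, 11, 12],  # pesticide-free water
    [8, 9, 10],    # 5% pesticide solution
    [5, 6, 7],     # 10% pesticide solution
]
f, df_between, df_within = one_way_anova_f(groups)

# Critical F value for (2, 6) degrees of freedom at the 5% level, from an F table.
critical = 5.14
significant = f > critical

# Bonferroni: divide 0.05 by the number of pairwise comparisons (3 for 3 groups).
comparisons = len(groups) * (len(groups) - 1) // 2
bonferroni_alpha = 0.05 / comparisons
print(f"F = {f:.2f}, significant: {significant}, per-pair alpha = {bonferroni_alpha:.4f}")
```

The shrinking per-pair alpha is the "penalty" mentioned above: with more groups there are more pairs, so each paired t-test must clear a stricter threshold.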

Kruskal-Wallis Test You ask children, teens, and adults to rate their response to a set of statements, where 1 = strongly agree with the statement, 2 = agree with the statement, 3 = no opinion, 4 = disagree with the statement, 5 = strongly disagree with the statement, and you want to see if the answers are dependent on the age group of the tested subjects. • Independent variables: age groups of subject • Dependent variable: responses of members of those age groups to your statements • Statistical test you would use: Kruskal-Wallis Test. Use this test to compare the mean values (averages) of more than two sets of data where the data are chosen from some limited set of values or if your data otherwise don't form a normal (bell-curve) distribution. This example could also be done using a two-way chi-square test. An example of the Kruskal-Wallis Test for non-normal data is: You compare scores of students on Math and English tests under different circumstances: no music playing, Mozart playing, rock music playing. When you score the tests, you find in at least one case that the average score is a 95 and the data do not form a bell-shaped curve because there are no scores above 100, many scores in the 90s, a few in the 80s, and fewer still in the 70s, for example. • Independent variables: type of background music • Dependent variable: score on the tests, with at least one set of scores not normally distributed • Back to “What Statistics Test Do I Use?”.
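
The Kruskal-Wallis test replaces every value with its rank in the pooled data and then compares the rank sums of the groups. The sketch below computes the H value for the music example with invented scores; it leaves out the correction that statistics packages apply when many values are tied, and the helper names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 to N; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """H value (no tie correction) from the rank sums of each group."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)

# Hypothetical test scores under three conditions.
groups = [
    [71, 75, 78],  # no music
    [82, 85, 88],  # Mozart
    [91, 94, 97],  # rock music
]
h = kruskal_wallis_h(groups)

# H is compared to a chi-square critical value with (groups - 1) = 2 degrees of
# freedom; this is only an approximation for samples this small.
critical = 5.991
significant = h > critical
print(f"H = {h:.2f}, significant: {significant}")
```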

Wilcoxon Signed Rank Test You think that student grades are dependent on the number of hours a week students study. You collect letter grades from students and the number of hours each student studies in a week. • Independent variable: hours studied • Dependent variable: letter grade in a specific class • Statistical test you would use: Wilcoxon Signed Rank Test. Use this test to compare the mean values (averages) of two sets of data, or the mean value of one data set to a hypothetical mean, where the data are ranked from low to high (here, A = 1, B = 2, C = 3, D = 4, E = 5 might be your rankings; you could also have set A = 5, B = 4, ...). • Back to “What Statistics Test Do I Use?”.

Chi-Square Test You ask subjects to rate their response to a set of statements that are provided with a set of possible responses such as: strongly agree with the statement, agree with the statement, no opinion, disagree with the statement, strongly disagree with the statement. • Independent variable: each statement asked • Dependent variable: response to each statement • Statistical test you would use: $\chi^2$ (chi-square) test (the 'chi' is pronounced like the 'chi' in 'chiropractor') for within-age-group variations. • For this test, typically, you assume that all choices are equally likely and test to find whether this assumption was true. You would assume that, for 50 subjects tested, 10 chose each of the five options listed in the example above. In this case, your observed values (O) would be the number of subjects who chose each response, and your expected values (E) would be 10. • The chi-square statistic is the sum, over all possible responses, of (Observed value - Expected value)^2 / Expected value: $\chi^2 = \sum (O - E)^2 / E$ • Use this test when your data can take only a limited number of possible values. Example 2: you ask subjects which toy they like best from a group of toys that are identical except that they come in several different colors. Independent variable: toy color; dependent variable: toy choice. • McNemar's test is used when you are comparing some aspect of the subject with that subject's response (e.g., answer to the survey compared to whether or not the student went to a particular middle school). McNemar's test is basically the same as a chi-square test in calculation and interpretation. • Back to “What Statistics Test Do I Use?”.
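
The chi-square sum described above can be worked through in a few lines. The observed counts are invented for 50 subjects and five responses, and `chi_square` is a hypothetical helper name.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all response categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts for: strongly agree, agree, no opinion, disagree,
# strongly disagree (50 subjects in total).
observed = [16, 12, 4, 10, 8]
# If every choice were equally likely, each response would get 50 / 5 = 10.
expected = [sum(observed) / len(observed)] * len(observed)

stat = chi_square(observed, expected)

# Critical value for (5 - 1) = 4 degrees of freedom at the 5% level,
# read from a chi-square table.
critical = 9.488
significant = stat > critical
print(f"chi-square = {stat:.2f}, significant: {significant}")
```

Here the statistic falls below the critical value, so with these invented counts the assumption that all choices are equally likely would not be rejected.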

Correlation Test You look for a relationship between the size of a letter that a subject can read at a distance of 5 meters and the score that the subject achieves in a game of darts (having had them write down their previous experience at playing darts). • Independent variable #1: vision-test result (letter size) • Independent variable #2: darts score • Statistical test you would use: Correlation (statistics: $r^2$ and r). The closer $r^2$ is to 1 (and r to 1 or -1), the better the correlation. • Use this statistic to identify whether changes in one independent variable are matched by changes in a second independent variable. Notice that you didn't change any conditions of the test; you only made two separate sets of measurements. • Back to “What Statistics Test Do I Use?”.
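
As an illustration of the correlation statistics, the sketch below computes Pearson's r and $r^2$ for five invented subjects. The function name `pearson_r` and all of the numbers are assumptions made for the example.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical measurements for five subjects.
letter_size_mm = [4, 6, 8, 10, 12]  # smallest letter readable at 5 meters
darts_score = [10, 8, 9, 5, 3]

r = pearson_r(letter_size_mm, darts_score)
r_squared = r * r
print(f"r = {r:.3f}, r^2 = {r_squared:.2f}")
```

In this made-up data r is negative: subjects who need larger letters tend to score lower at darts. A value of r near -1 is as strong a relationship as one near +1, which is why $r^2$ is often the easier number to read.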

Linear Regression You load weights on four different lengths of the same type and cross-sectional area of wood to see if the maximum weight a piece of the wood can hold is directly dependent on the length of the wood. • Independent variable: length of wood • Dependent variable: weight that causes the wood to break • Statistical test you would use: Linear regression (statistics: $r^2$ and r). The closer the values are to 1, the better the correlation. • Fit a line to data having only one independent variable and one dependent variable. • Back to “What Statistics Test Do I Use?”.
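
Fitting the line can be sketched with the usual least-squares formulas. The four lengths and breaking weights below are invented, and `fit_line` is a hypothetical helper name.

```python
def fit_line(x, y):
    """Least-squares slope, intercept, and r^2 for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # r^2 is the share of the variation in y that the line accounts for.
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return slope, intercept, 1 - ss_res / ss_tot

# Hypothetical measurements for four pieces of wood.
length_cm = [20, 40, 60, 80]
breaking_weight_kg = [50, 41, 29, 20]

slope, intercept, r_squared = fit_line(length_cm, breaking_weight_kg)
print(f"weight = {intercept:.1f} + ({slope:.2f}) * length, r^2 = {r_squared:.4f}")
```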

Multiple Linear Regression You load weights on four different lengths and four different thicknesses of the same type of wood to see if the maximum weight a piece of the wood can hold is directly dependent on the length and thickness of the wood, and to find which is more important, length or thickness. • Independent variables: length of wood, thickness of wood • Dependent variable: weight that causes the wood to break • Statistical test you would use: Multiple linear regression (statistics: $r^2$ and r). The closer the values are to 1, the better the correlation. • Fit a linear model to data having two or more independent variables and one dependent variable. • Back to “What Statistics Test Do I Use?”.
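
With two independent variables the fit becomes a plane, y = b0 + b1 * length + b2 * thickness. One way to sketch it without a statistics package is to solve the least-squares "normal equations" directly. The data below are constructed to lie exactly on a known plane so the result can be checked; the helper names `solve` and `fit_plane` are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a * x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_plane(x1, x2, y):
    """Least-squares coefficients [b0, b1, b2] for y = b0 + b1*x1 + b2*x2."""
    rows = [[1.0, u, v] for u, v in zip(x1, x2)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(3)]
    return solve(xtx, xty)

# Hypothetical data built from: weight = 100 - 0.5 * length + 10 * thickness.
length_cm = [20, 40, 60, 80, 20, 40, 60, 80]
thickness_cm = [1, 1, 1, 1, 2, 2, 2, 2]
breaking_weight_kg = [100, 90, 80, 70, 110, 100, 90, 80]

b0, b1, b2 = fit_plane(length_cm, thickness_cm, breaking_weight_kg)
print(f"weight = {b0:.1f} + ({b1:.2f}) * length + ({b2:.1f}) * thickness")
```

Comparing b1 and b2 directly only says how much the weight changes per centimeter of each variable; deciding which variable is "more important" also depends on how widely each one was varied in the experiment.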

Power Regression You load weights on strips of plastic trash bags to find how much the plastic stretches from each weight. Research that you do indicates that plastics stretch more and more as the weight placed on them increases; therefore the data do not plot along a straight line. • Independent variable: weight loaded on the plastic strip • Dependent variable: length of the plastic strip • Statistical test you would use: Power regression of the form $y = ax^b$, or Exponential regression of the form $y = ab^x$, or Quadratic regression of the form $y = a + bx + cx^2$ (statistics: $r^2$ and r) • Fit a curve to data having only one independent variable and one dependent variable. • There are numerous polynomial regressions of this form, found on the STAT:CALC menu of your graphing calculator. • Back to “What Statistics Test Do I Use?”.
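
A power curve $y = ax^b$ becomes a straight line after taking logarithms, since ln y = ln a + b ln x, so one common way to fit it is ordinary linear regression on the logged data. The sketch below assumes that approach (a graphing calculator may or may not do exactly the same thing internally). The data are constructed to follow y = 2x^1.5 exactly so the fit can be checked, and `fit_power` is a hypothetical name.

```python
import math

def fit_power(x, y):
    """Fit y = a * x**b by least squares on (ln x, ln y); needs x > 0 and y > 0."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    sxx = sum((u - mx) ** 2 for u in lx)
    b = sxy / sxx                # slope of the log-log line is the exponent
    a = math.exp(my - b * mx)    # intercept of the log-log line is ln(a)
    return a, b

# Hypothetical data: load on the strip and the resulting stretch.
weight = [1, 4, 9, 16]
stretch = [2, 16, 54, 128]  # equals 2 * weight ** 1.5

a, b = fit_power(weight, stretch)
print(f"stretch = {a:.2f} * weight ^ {b:.2f}")
```

An exponent b greater than 1 matches the observation on this slide that the plastic stretches more and more as the weight increases.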
