Laboratory for Interdisciplinary Statistical Analysis Anne Ryan email@example.com Virginia Tech
1948: The Statistical Laboratory was founded as a division of the Virginia Agricultural Experiment Station to help agronomists design experiments and calculate sums of squares.
1949: Based on the success of the Statistical Laboratory, the Department of Statistics at Virginia Polytechnic Institute (VPI) was founded—the 3rd oldest statistics department in the United States.
1973: The Statistical Laboratory was re-formed as the Statistical Consulting Center to assist with statistical analyses in every college of Virginia Polytechnic Institute & State University (VPI&SU).
2007: The Graduate Student Assembly led a movement to save statistical consulting and collaboration from death by budget cuts, ensuring that graduate students could receive help with their research. The College of Science, Provost, Vice President of Research, Graduate School, and six additional colleges agreed that researchers should be able to receive free statistical consulting and collaboration.
2008: The Statistical Consulting Center was re-organized as the Laboratory for Interdisciplinary Statistical Analysis (LISA) to collaborate with researchers across the Virginia Tech (VT) campuses.
Laboratory for Interdisciplinary Statistical Analysis: LISA helps VT researchers benefit from the use of Statistics • Experimental Design • Data Analysis • Interpreting Results • Grant Proposals • Software (R, SAS, JMP, SPSS...) Our goal is to improve the quality of research and the use of statistics at Virginia Tech. www.lisa.stat.vt.edu
Collaboration: LISA statisticians meet with faculty, staff, and graduate students to understand their research and think of ways to help them using statistics.
Walk-In Consulting: Every day from 1-3PM, clients get answers to their (quick) questions about using statistics in their research.
Short Courses: designed to teach graduate students how to apply statistics in their research.
All services are FREE for VT researchers. We assist with research, not class projects or homework.
How can LISA help? • Formulate the research question. • Screen data for integrity and unusual observations. • Implement graphical techniques to showcase the data: what is the story? • Develop and implement an analysis plan to address the research question. • Help interpret results. • Communicate! Help with writing the report or giving the talk. • Identify future research directions.
To request a collaboration meeting, go to www.lisa.stat.vt.edu • 1. Sign in to the website using your VT PID and password. • 2. Enter your information (email address, college, etc.). • 3. Describe your project (project title, research goals, specific research questions, whether you have already collected data, special requests, etc.). • 4. Wait 0-3 days, then contact the LISA collaborators assigned to your project to schedule an initial meeting.
Introduction to R • R is a free software environment for statistical computing and graphics. Download: http://www.r-project.org/ • Topics Covered: • Data objects in R, loops, import/export datasets, data manipulation • Graphing • Basic Analyses: T-tests, Regression, ANOVA
Linear Regression & Structural Equation Modeling • Linear regression is used to model the relationship between a continuous response and a continuous predictor. • SEM is a modeling technique that investigates causal relationships among variables. • Topics include time-related latent variables, modification indices and critical ratios in exploratory analyses, and computation of implied moments, factor score weights, total effects, and indirect effects.
Generalized Linear Models • Modeling technique for situations where the errors are not necessarily normal. • Can handle situations where you have binary responses, counts, etc. • Uses a link function to relate the mean of the response to the linear predictor. • Covers: basic statistical concepts of GLMs and how they relate to regression with normal errors.
Mixed Models and Random Effects • Mixed Model: A statistical model that has both random effects and fixed effects. • Fixed Effect: Levels of the factor are predetermined. • Random Effect: Levels of the factor were chosen at random. • The primary focus of the course will be to identify scenarios where a mixed model approach is appropriate. The concepts will be explained almost wholly through examples in SAS or in R.
T-Tests and Analysis of Variance Anne Ryan
Criminal Trial • Defense: Represents the accused (defendant). • Prosecution: Holds the "burden of proof", the obligation to shift the assumed conclusion from an opposing position to its own through evidence. • What's the assumed conclusion? • ANSWER: The accused is innocent until proven guilty. • The prosecution must convince the judge/jury that the defendant is guilty beyond a reasonable doubt.
Similarities between Criminal Trials and Hypothesis Testing • Burden of Proof: the obligation to shift the conclusion using evidence • Hypothesis Test: Assume the status quo (what was believed beforehand) until the data suggest otherwise. • Trial: Innocent until proven guilty.
Similarities between Criminal Trials and Hypothesis Testing • Decision Criteria • Hypothesis Test: The observed result would occur by chance less than 100α% of the time (e.g., 5%) if the null hypothesis were true. • Trial: Evidence has to be convincing beyond a reasonable doubt.
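The "less than 100α% of the time" criterion can be checked by simulation: if the null hypothesis is true and we reject whenever the p-value is below α, we should reject in roughly 100α% of repeated experiments. The sketch below assumes normally distributed data with a known standard deviation, so that a simple z test (built on Python's `statistics.NormalDist`) can stand in for the t test; the mean, standard deviation, sample size, and seed are arbitrary illustrative choices.

```python
import random
from statistics import NormalDist, fmean

def two_sided_z_p_value(sample, mu0, sigma):
    """Two-sided p-value of a z test for the mean when sigma is known (assumption)."""
    n = len(sample)
    z = (fmean(sample) - mu0) / (sigma / n ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))

def type_one_error_rate(alpha=0.05, mu0=10.0, sigma=2.0, n=25, reps=20_000, seed=1):
    """Fraction of simulated experiments, all with H0 true, in which H0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        # Each sample comes from a population whose true mean really is mu0.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        if two_sided_z_p_value(sample, mu0, sigma) < alpha:
            rejections += 1
    return rejections / reps

print(type_one_error_rate(alpha=0.05))
```

The printed rejection rate should land near 0.05: like a jury that occasionally convicts an innocent defendant, the test wrongly rejects a true null hypothesis about 100α% of the time.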
Introduction to Hypothesis Testing • Hypothesis Test: Procedure for examining a claim about the value of a parameter • i.e. • Hypothesis tests are very methodical with several key pieces.
Steps in a Hypothesis Test • Test • Assumptions • Hypotheses • Mechanics • Conclusion
1. Test • State the name of the testing method to be used • It is important not to get off track at the very beginning • Hypothesis tests we will perform: • One-sample t test for μ • Two-sample t test for μ1 − μ2 • Paired t test • ANOVA
2. Assumptions • List all the assumptions required for your test to be valid. • All tests have assumptions. • If assumptions are not met, you should still comment on how this affects your results.
3. Hypotheses • State the hypotheses of interest • There are two hypotheses • Null Hypothesis: Denoted H0 • Alternative Hypothesis: Denoted Ha • Examples of possible hypotheses:
3. Hypotheses Continued • There are three common versions of a hypothesis test: • Left Tailed Hypothesis Test • Right Tailed Hypothesis Test • Two Tailed or Two Sided Hypothesis Test
3. Hypotheses Continued • Left Tailed Hypothesis Test: • Researchers are only interested in whether the true value is below the hypothesized value. • e.g., Ha: μ < μ0 • Right Tailed Hypothesis Test: • Researchers are only interested in whether the true value is above the hypothesized value. • e.g., Ha: μ > μ0
3. Hypotheses Continued • Two Tailed or Two Sided Hypothesis Test: The researcher is interested in looking both above and below the hypothesized value. • e.g., Ha: μ ≠ μ0
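The choice of tail decides where the rejection region sits. The sketch below assumes a standard normal reference distribution (Python's `statistics.NormalDist`) so that it needs only the standard library; for a t test the cutoffs would come from a t distribution instead. The function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def critical_value(alternative, alpha=0.05):
    """Cutoff on the z scale for a test at significance level alpha."""
    if alternative == "less":        # left-tailed, Ha: mu < mu0
        return STD_NORMAL.inv_cdf(alpha)
    if alternative == "greater":     # right-tailed, Ha: mu > mu0
        return STD_NORMAL.inv_cdf(1 - alpha)
    if alternative == "two-sided":   # two-tailed, Ha: mu != mu0; alpha is split over both tails
        return STD_NORMAL.inv_cdf(1 - alpha / 2)
    raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")

def rejects_h0(z, alternative, alpha=0.05):
    """True if the z statistic falls in the rejection region for this alternative."""
    c = critical_value(alternative, alpha)
    if alternative == "less":
        return z < c
    if alternative == "greater":
        return z > c
    return abs(z) > c

for alt in ("less", "greater", "two-sided"):
    print(alt, round(critical_value(alt), 3), rejects_h0(1.8, alt))
```

For example, z = 1.8 leads to rejection in the right-tailed test (cutoff about 1.645) but not in the two-tailed test (cutoff about 1.960), because the two-tailed test spends only α/2 in each tail.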
3. Hypotheses Continued • Three Requirements for Stating Hypotheses: • Two complementary hypotheses • A parameter on which the test is based • e.g., μ • A hypothesized value for the parameter • Denoted μ0, but in practice it takes a specific numeric value
4. Mechanics • Computational Part of the Test • What is part of the Mechanics step? • Stating the Significance Level • Finding the Rejection Rule • Computing the Test Statistic • Computing the p-value
4. Mechanics Continued • Significance Level: Here we choose the significance level, the threshold on the p-value below which we reject the null hypothesis. It is also the probability of rejecting a true null hypothesis that we are willing to tolerate. • Denoted by α • The default value is α = 0.05; use α = 0.05 unless otherwise noted!
4. Mechanics Continued • Rejection Rule: State our criteria for rejecting the null hypothesis. • e.g., "Reject the null hypothesis if p-value < α = 0.05." • p-value: The probability, assuming the null hypothesis is true, of obtaining a point estimate at least as "extreme" as the observed value, where the direction of "extreme" is determined by the alternative hypothesis.
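The p-value definition can be read literally as a simulation: generate many data sets in which the null hypothesis is true and count how often the statistic is at least as extreme as the one observed. This sketch assumes normal data with a known standard deviation (so the statistic is a z score) and a two-sided alternative, where "extreme" means far from μ0 in either direction; the observed value z = 1.5, the sample size, and the seed are hypothetical.

```python
import random
from statistics import NormalDist, fmean

def simulated_p_value(z_observed, n=10, mu0=0.0, sigma=1.0, reps=40_000, seed=7):
    """Share of H0-true simulations whose |z| is at least |z_observed| (two-sided Ha)."""
    rng = random.Random(seed)
    extreme = 0
    for _ in range(reps):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]   # H0 holds: true mean is mu0
        z = (fmean(sample) - mu0) / (sigma / n ** 0.5)
        if abs(z) >= abs(z_observed):                        # as extreme or more, either tail
            extreme += 1
    return extreme / reps

z_obs = 1.5
exact = 2 * (1 - NormalDist().cdf(abs(z_obs)))   # exact two-sided p-value for a z test
print(round(simulated_p_value(z_obs), 3), round(exact, 3))
```

The simulated and exact values should agree to about two decimal places; statistical software skips the simulation and uses the known distribution of the test statistic directly.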
4. Mechanics Continued • Test Statistic: Compute the test statistic, which is usually a standardization of your point estimate. • Translates your point estimate, a statistic, so that it follows a known distribution and can be used for a test.
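For a one-sample mean, standardizing means subtracting the hypothesized value μ0 and dividing by the estimated standard error s/√n. A minimal sketch; the five measurements and μ0 = 5 are made-up values used only for illustration.

```python
from math import sqrt
from statistics import fmean, stdev

def one_sample_t_statistic(sample, mu0):
    """Standardize the sample mean: (x_bar - mu0) / (s / sqrt(n))."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)   # stdev uses the n - 1 denominator
    return (fmean(sample) - mu0) / standard_error

data = [5.1, 4.9, 5.6, 5.8, 5.4]   # hypothetical measurements
print(round(one_sample_t_statistic(data, mu0=5.0), 3))
```

Here the sample mean 5.36 sits about 2.2 standard errors above μ0 = 5, a distance that can then be compared against a t distribution with n − 1 = 4 degrees of freedom.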
4. Mechanics Continued • p-value: After computing the test statistic, now you can compute the p-value. • Use software to compute p-values.
5. Conclusion • Conclusion: The last step of the hypothesis test, just as it is the last step when computing confidence intervals. • Conclusions should always include: • Decision: reject or fail to reject • Linkage: why you made the decision (interpret the p-value) • Context: what your decision means in the context of the problem.
5. Conclusion • Note: Your decision can only be one of two choices: • Reject H0: the data give strong indication that Ha is more likely • Fail to reject H0: the data give no strong indication that Ha is more likely • When conducting hypothesis tests, we assume that H0 is true; therefore the decision CANNOT be to accept the null hypothesis.
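The decision and linkage parts of a conclusion follow mechanically from the p-value and α; the context part still has to be written for each problem. A minimal sketch with a hypothetical helper name and wording; note that it can only ever return "reject" or "fail to reject", never "accept".

```python
def decision(p_value, alpha=0.05):
    """Return the decision plus its linkage; there is deliberately no 'accept H0' branch."""
    if not 0 <= p_value <= 1:
        raise ValueError("p-value must be between 0 and 1")
    if p_value < alpha:
        return f"Reject H0 because the p-value ({p_value:.4f}) is less than alpha ({alpha})."
    return f"Fail to reject H0 because the p-value ({p_value:.4f}) is not less than alpha ({alpha})."

print(decision(0.1854))
print(decision(0.0123))
```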
One Sample T-Test • Used to test whether the population mean is different from a specified value. • Example: Is the mean height of 12-year-old girls greater than 60 inches?
Step 1: Formulate the Hypotheses • The population mean is not equal to a specified value. Null Hypothesis, H0: μ = μ0 Alternative Hypothesis: Ha: μ ≠ μ0 • The population mean is greater than a specified value. H0: μ = μ0 Ha: μ > μ0 • The population mean is less than a specified value. H0: μ = μ0 Ha: μ < μ0
Step 2: Check the Assumptions The sample is random. The population from which the sample is drawn is either normal or the sample size is large.
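Steps 3-5 • Step 3: Calculate the test statistic: $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$, where $\bar{x}$ is the sample mean, $s$ is the sample standard deviation, and $n$ is the sample size; if H0 is true, $t$ follows a t distribution with $n - 1$ degrees of freedom. • Step 4: Calculate the p-value based on the appropriate alternative hypothesis. • Step 5: Write a conclusion.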
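Steps 3 and 4 can be sketched with the Python standard library. Because the `statistics` module has no t distribution, this sketch assumes a numerical shortcut: t-distribution probabilities come from integrating the Student t density with Simpson's rule. In practice the software's built-in t test (such as JMP's Test Mean) does this step. The data values are hypothetical.

```python
from math import exp, lgamma, pi, sqrt
from statistics import fmean, stdev

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    const = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return const * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x) via Simpson's rule on [0, |x|]; steps must be even."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        weight = 4 if i % 2 == 1 else 2
        total += weight * t_pdf(i * h, df)
    area = total * h / 3                 # P(0 <= T <= |x|); the density is symmetric about 0
    return 0.5 + area if x >= 0 else 0.5 - area

def t_p_value(t, df, alternative="two-sided"):
    """Step 4: p-value for the chosen alternative hypothesis."""
    if alternative == "less":            # Ha: mu < mu0
        return t_cdf(t, df)
    if alternative == "greater":         # Ha: mu > mu0
        return 1 - t_cdf(t, df)
    if alternative == "two-sided":       # Ha: mu != mu0
        return 2 * (1 - t_cdf(abs(t), df))
    raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")

def one_sample_t_test(sample, mu0, alternative="two-sided"):
    """Step 3 (test statistic) and Step 4 (p-value) of the one-sample t test."""
    n = len(sample)
    t = (fmean(sample) - mu0) / (stdev(sample) / sqrt(n))
    df = n - 1
    return t, df, t_p_value(t, df, alternative)

data = [5.1, 4.9, 5.6, 5.8, 5.4]   # hypothetical measurements
for alt in ("two-sided", "greater", "less"):
    t, df, p = one_sample_t_test(data, mu0=5.0, alternative=alt)
    print(f"{alt:>9}: t = {t:.3f}, df = {df}, p = {p:.4f}")
```

With these made-up data the same t statistic gives a two-sided p-value between 0.05 and 0.10 but a right-tailed p-value below 0.05, which is why the alternative must be fixed in Step 1, before looking at the data.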
Iris Example http://en.wikipedia.org/wiki/Iris_flower_data_set • A researcher would like to know whether the mean sepal width of a variety of irises is different from 3.5 cm. Use α = 0.05. • The researcher randomly selects 50 irises and measures the sepal width. • Step 1: Hypotheses H0: μ = 3.5 cm vs. Ha: μ ≠ 3.5 cm
JMP Steps 2-4: JMP Demonstration • Analyze > Distribution • Y, Columns: Sepal Width • Normal Quantile Plot • Test Mean • Specify Hypothesized Mean: 3.5
JMP Output • Step 5 Conclusion: Fail to reject H0 since the p-value = 0.1854 is greater than α = 0.05. There is not sufficient sample evidence to indicate that the mean sepal width is different from 3.5 cm.
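As a cross-check of the JMP output, the test can be recomputed from summary statistics alone. This sketch assumes the 50 irises are the Iris setosa sample in Fisher's iris data set linked above, whose sepal widths have mean 3.428 cm and standard deviation about 0.3791 cm; those summary values do not appear on the slides. As before, t-distribution probabilities come from Simpson's-rule integration of the t density, a standard-library stand-in for statistical software.

```python
from math import exp, lgamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    const = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return const * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x) via Simpson's rule on [0, |x|]; steps must be even."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 == 1 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

# Assumed summary statistics (Iris setosa sepal widths); not taken from the slides.
n, x_bar, s = 50, 3.428, 0.3791
mu0, alpha = 3.5, 0.05

t = (x_bar - mu0) / (s / sqrt(n))   # Step 3: test statistic
df = n - 1
p = 2 * (1 - t_cdf(abs(t), df))     # Step 4: two-sided p-value, Ha: mu != 3.5
print(f"t = {t:.3f}, df = {df}, p = {p:.4f}")
print("Reject H0" if p < alpha else "Fail to reject H0")   # Step 5: decision
```

Under that assumption the sketch gives t ≈ −1.34 with 49 degrees of freedom and a p-value of about 0.185, consistent with the JMP p-value of 0.1854 and the decision to fail to reject H0.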