1 / 24

Statistics Lecture 18 Final Review

This is a comprehensive review of the topics covered in Stat 111 Lecture 18, including collecting data, exploring data, probability, sampling distributions, and inference. It also covers experiments, sampling and surveys, different types of graphs, measures of center and spread, relationships between continuous variables, probability, the normal distribution, and inference using samples.

benk
Download Presentation

Statistics Lecture 18 Final Review

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Statistics 111 - Lecture 18 Final review Stat 111 - Lecture 18 - Review

  2. Administrative Notes • Pass in homework • Final exam on Thursday • Office hours today from 12:30-2:00pm Stat 111 - Lecture 18 - Review

  3. Final Exam • Tomorrow from 10:40-12:10 • Show up a bit early, if you arrive after 10:40 you will be wasting time for your exam • Calculators are definitely needed! • Single 8.5 x 11 cheat sheet (two-sided) allowed • Sample midterm is posted on the website • Solutions for the midterm are posted on the website Stat 111 - Lecture 18 - Review

  4. Final Exam • The test is going to be long – 6 or 7 questions • Pace yourself • Do NOT leave anything blank! • I want to give you credit for what you know give me an excuse to give you points! • The questions are NOT in order of difficulty • If you get stuck on a question put it on hold and try a different question Stat 111 - Lecture 18 - Review

  5. Outline • Collecting Data (Chapter 3) • Exploring Data - One variable (Chapter 1) • Exploring Data - Two variables (Chapter 2) • Probability (Chapter 4) • Sampling Distributions (Chapter 5) • Introduction to Inference (Chapter 6) • Inference for Population Means (Chapter 7) • Inference for Population Proportions (Chapter 8) • Inference for Regression (Chapter 10) Stat 111 - Lecture 18 - Review

  6. Experiments Treatment Group Treatment Experimental Units Population Control Group No Treatment • Try to establish the causal effect of a treatment • Key is reducing presence of confounding variables • Matching: ensure treatment/control groups are very similar on observed variables eg. race, gender, age • Randomization: randomly dividing into treatment or control leads to groups that are similar on observed and unobserved confounding variables • Double-Blinding: both subjects and evaluators don’t know who is in treatment group vs. control group Stat 111 - Lecture 18 - Review

  7. Sampling and Surveys ? Population Parameter • Just like in experiments, we must be cautious of potential sources of bias in our sampling results • Voluntary response samples, undercoverage, non-response, untrue-response, wording of questions • Simple Random Sampling: less biased since each individual in the population has an equal chance of being included in the sample Sampling Inference Estimation Sample Statistic Stat 111 - Lecture 18 - Review

  8. Different Types of Graphs • A distribution describes what values a variable takes and how frequently these values occur • Boxplots are good for center,spread, and outliers but don’t indicate shape of a distribution • Histograms much more effective at displaying the shape of a distribution Stat 111 - Lecture 18 - Review

  9. Measures of Center and Spread • Center: Mean • Spread: Standard Deviation • For outliers or asymmetry, median/IQR are better • Center: Median - “middle number in distribution” • Spread: Inter-Quartile Range IQR = Q3 - Q1 • We use mean and SD more since most distributions we use are symmetric with no outliers(eg. Normal) Stat 111 - Lecture 18 - Review

  10. Relationships between continuous var. • Scatterplot examines relationship between response variable (Y) and a explanatory variable (X): • Positive vs. negativeassociations • Correlation is a measure of the strength of linear relationship between variables X and Y • r near 1 or -1 means strong linear relationship • r near 0 means weak linear relationship • Linear Regression: come back to later… Education and Mortality: r = -0.51 Stat 111 - Lecture 18 - Review

  11. Probability • Random process: outcome not known exactly, but have probability distribution of possible outcomes • Event: outcome of random process with prob. P(A) • Probability calculations: combinations of rules • Equally likely outcomes rule • Complement rule • Additive rule for disjoint events • Multiplication rule for independent events • Random variable: a numerical outcome or summary of a random process • Discrete r.v. has a finite number of distinct values • Continuous r.v. has a non-countable number of values • Linear transformations of variables Stat 111 - Lecture 18 - Review

  12. The Normal Distribution N(0,1) N(2,1) N(-1,2) N(0,2) • The Normal distribution has center  and spread 2 • Have tables for any probability from the standard normal distribution ( = 0 and 2 = 1) • Standardization: converting X which has a N(,2) distribution to Z which has a N(0,1) distribution: • Reverse standardization: converting a standard normal Z into a non-standard normal X Stat 111 - Lecture 18 - Review

  13. Inference using Samples ? Population Parameters:  or p Sampling Inference Estimation Sample Statistics: or • Continuous: pop. mean estimated by sample mean • Discrete: pop. proportion estimated by sample proportion • Key for inference: Sampling Distributions • Distribution of values taken by statistic in all possible samples from the same population Stat 111 - Lecture 18 - Review

  14. Sampling Distribution of Sample Mean • The center of the sampling distribution of the sample mean is the population mean: • Over all samples, the sample mean will, on average, be equal to the population mean (no guarantees for 1 sample!) • The standard deviation of the sampling distribution of the sample mean is • As sample size increases, standard deviation of the sample mean decreases! • Central Limit Theorem: if the sample size is large enough, then the sample mean has an approximately Normal distribution Stat 111 - Lecture 18 - Review

  15. Binomial/Normal Dist. For Proportions • Sample count Y follows Binomial distribution which we can calculate from Binomial tables in small samples • If the sample size is large (n·p and n(1-p)≥ 10), sample count Y follows a Normal distribution: • If the sample size is large, the sample proportion also approximately follows a Normal distribution: Stat 111 - Lecture 18 - Review

  16. Summary of Sampling Distribution Stat 111 - Lecture 18 - Review

  17. Introduction to Inference • Use sample estimate as center of a confidenceinterval of likely values for population parameter • All confidence intervals have the same form: • Estimate ± Margin of Error • The margin of error is always some multiple of the standard deviation (or standard error) of statistic • Hypothesis test: data supports specific hypothesis? • Formulate your Null and Alternative Hypotheses • Calculate the test statistic: difference between data and your null hypothesis • Find the p-value for the test statistic: how probable is your data if the null hypothesis is true? Stat 111 - Lecture 18 - Review

  18. Inference: Single Population Mean  • Known SD : confidence intervals and test statistics involve standard deviation and normal critical values • Unknown SD : confidence intervals and test statistics involve standard error and critical values from a t distribution with n-1 degrees of freedom • t distribution has wider tails (more conservative) Stat 111 - Lecture 18 - Review

  19. Inference: Comparing Means 1and 2 • Known 1 and2: two-sample Z statistic uses normal distribution • Unknown 1 and2: two-sample T statistic uses t distribution with degrees of freedom = min(n1-1,n2-1) • Matched pairs: instead of difference of two samples X1 and X2, do a one-sample test on the difference d Stat 111 - Lecture 18 - Review

  20. Inference: Population Proportion p • Confidence interval for p uses the Normal distribution and the sample proportion: • Hypothesis test for p = p0 also uses the Normal distribution and the sample proportion: Stat 111 - Lecture 18 - Review

  21. Inference: Comparing Proportions p1and p2 • Hypothesis test for p1 - p2 = 0 uses Normal distribution and complicated test statistic with pooled standard error: • Confidence interval for p1 = p2 also uses Normal distribution and sample proportions Stat 111 - Lecture 18 - Review

  22. Linear Regression • Use best fit line to summarize linear relationship between two continuous variables X and Y: • The slope ( ): average change you get in the Y variable if you increased the X variable by one • The intercept( ):average value of the Y variable when the X variable is equal to zero • Linear equation can be used to predict response variable Y for a value of our explanatory variable X Stat 111 - Lecture 18 - Review

  23. Significance in Linear Regression • Does the regression line show a significant linear relationship between the two variables? H0 : = 0 versus Ha :  0 • Uses the t distribution with n-2 degrees of freedom and a test statistic calculated from JMP output • Can also calculate confidence intervals using JMP output and t distribution with n-2 degrees of freedom Stat 111 - Lecture 18 - Review

  24. Good Luck on Final! • I have office hours until 2pm today • Might be able to meet at other times Stat 111 - Lecture 18 - Review

More Related