
STAT 5372: Experimental Statistics

STAT 5372: Experimental Statistics. Wayne Woodward. Office: 143 Heroy. Phone: (214)768-2457. E-mail: waynew@smu.edu. URL: faculty.smu.edu/waynew. Hours: 2:00-3:00 MWF; 3:00-4:00 Th; others by appointment.





Presentation Transcript



  2. On a sheet of paper: • Name • Major (undergraduate/graduate) • Previous stat courses: • STAT 5371? • STAT/CSE/EMIS 4340? • other – describe briefly • Have you used SAS?

  3. Review • Sampling Distributions • Statistical Inference • Confidence Intervals • Hypothesis Tests

  4. Sampling / Sampling Distributions • Population: the totality of all observations of interest • Random Variable (rv): a characteristic that can take on different values from object to object • Sample: a subset of a population • Random sample: observations made independently and at random; Y₁, Y₂, …, Yₙ is the typical notation for a random sample • Parameter: a characteristic of a population, e.g., the population mean (μ) and standard deviation (σ)

  5. Random Variables • Discrete: you can count the possible outcomes • Discrete distributions: binomial, Poisson, … • Continuous: possible values fall along a continuum • Continuous distributions: normal (Gaussian), chi-square, t, F, …

  6. Normal Curve • symmetric, bell-shaped • for the particular distribution pictured: data concentrated about 60, very few data values above 100 or below 20 [Figure: normal curve centered at about 60]

  7. Standard Normal (Z-score) • has mean zero and standard deviation 1 • its graph is symmetric about 0 • the normal table gives P[Z ≤ z] • if X is normal with mean μ and standard deviation σ, then Z = (X - μ)/σ is standard normal

  8. Find: P[Z ≤ 2.5], P[Z > 1.6]. Suppose μ = 50 and σ = 10. Find: P[X ≤ 45], P[X > 70].
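The slide 8 probabilities can be checked without a printed table. This is a minimal sketch using Python's standard-library `statistics.NormalDist`; it assumes, as the slide implies, that X is normally distributed with μ = 50 and σ = 10. Each X probability is computed directly, which is equivalent to standardizing with Z = (X - μ)/σ and reading the normal table.

```python
from statistics import NormalDist

# Standard normal: mean 0, standard deviation 1
Z = NormalDist(mu=0, sigma=1)

p_z_le_2_5 = Z.cdf(2.5)        # P[Z <= 2.5]
p_z_gt_1_6 = 1 - Z.cdf(1.6)    # P[Z > 1.6] = 1 - P[Z <= 1.6]

# Assumption: X is normal with mu = 50, sigma = 10
X = NormalDist(mu=50, sigma=10)
p_x_le_45 = X.cdf(45)          # same as P[Z <= (45 - 50)/10] = P[Z <= -0.5]
p_x_gt_70 = 1 - X.cdf(70)      # same as P[Z > (70 - 50)/10] = P[Z > 2]

for label, p in [("P[Z <= 2.5]", p_z_le_2_5), ("P[Z > 1.6]", p_z_gt_1_6),
                 ("P[X <= 45]", p_x_le_45), ("P[X > 70]", p_x_gt_70)]:
    print(f"{label} = {p:.4f}")
```

The printed values should agree with a four-decimal normal table.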

  9. Statistic: a function of random variables, typically used to estimate parameters. Examples of statistics: the sample mean Ȳ = (1/n) Σ Yᵢ and the sample variance S² = Σ (Yᵢ - Ȳ)² / (n - 1)
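As a small illustration of the two statistics on slide 9, the sketch below computes the sample mean and the sample variance (divisor n - 1) by hand and checks the variance against the standard library. The data values are hypothetical and serve only as an example.

```python
import statistics

# Hypothetical data values, used only for illustration
y = [4.1, 5.3, 3.8, 6.0, 4.8]

n = len(y)
y_bar = sum(y) / n                                   # sample mean
s2 = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)    # sample variance, divisor n - 1

# statistics.variance also uses the n - 1 divisor
assert abs(s2 - statistics.variance(y)) < 1e-12
print(f"mean = {y_bar:.3f}, variance = {s2:.3f}, sd = {s2 ** 0.5:.3f}")
```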

  10. Key Concept: Statistics are random variables and have their own distributions, called sampling distributions.

  11. Sampling Distribution of the Sample Mean IF: • Data are normally distributed • Observations are independent Then: the sample mean Ȳ has a normal probability distribution with mean μ and standard error σ/√n, so Z = (Ȳ - μ)/(σ/√n) has a standard normal distribution
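The claim that Ȳ has mean μ and standard error σ/√n can be seen in a quick simulation. This sketch borrows μ = 50, σ = 10 and n = 25 from the next slide's setting; the number of simulated samples and the random seed are arbitrary choices.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

mu, sigma, n = 50, 10, 25   # normal population and sample size
reps = 20000                # number of simulated samples (arbitrary)

# Draw many samples of size n and record each sample mean
means = [statistics.fmean(random.gauss(mu, sigma) for _ in range(n)) for _ in range(reps)]

print("average of sample means:", round(statistics.fmean(means), 3))  # close to mu = 50
print("sd of sample means:     ", round(statistics.stdev(means), 3))  # close to sigma/sqrt(n) = 2
```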

  12. Suppose μ = 50 and σ = 10 for a normal population, and suppose further that a random sample of size n = 25 is taken. Find:

  13. Central Limit Theorem IF: • Independent observations • Sample size is sufficiently large Then: Z = (Ȳ - μ)/(σ/√n) has an approximate standard normal distribution
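The Central Limit Theorem can be illustrated with a skewed population. In this sketch the population is a hypothetical choice, 40 plus an exponential variable with mean 10, which has mean 50 and standard deviation 10 (matching the next slide) but is clearly non-normal. With n = 50, the standardized sample means should behave approximately like a standard normal variable.

```python
import math
import random
import statistics

random.seed(2)

mu, sigma, n = 50, 10, 50
reps = 20000

def skewed_draw():
    # Hypothetical non-normal population: 40 plus an exponential with mean 10.
    # It has mean 50 and standard deviation 10 but is strongly right-skewed.
    return 40 + random.expovariate(1 / sigma)

z_values = []
for _ in range(reps):
    y_bar = statistics.fmean(skewed_draw() for _ in range(n))
    z_values.append((y_bar - mu) / (sigma / math.sqrt(n)))

# Compare a simulated probability with the standard normal value
sim = sum(z <= 1 for z in z_values) / reps
exact_normal = statistics.NormalDist().cdf(1)
print(f"simulated P[Z <= 1] = {sim:.3f}, standard normal = {exact_normal:.3f}")
```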

  14. Suppose μ = 50 and σ = 10 for a non-normal population, and suppose further that a random sample of size n = 50 is taken. Find:

  15. Distribution of the Sample Mean: σ Unknown • IF: • Data values are normally distributed • Observations are independent • Then: t = (Ȳ - μ)/(S/√n) has a Student's t distribution with n - 1 df

  16. t-distribution: Figure 5.16, page 229

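  17. t_α Notation: t_α is the value with area α to its right under the t curve; z_α is obtained from the bottom (infinite df) row of the t-table. [Figure: t curve with upper-tail area α beyond t_α]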
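Since z_α has area α to its right under the standard normal curve, it is the (1 - α) quantile of Z. The sketch below computes a few z_α values with `statistics.NormalDist.inv_cdf`; they should match the bottom row of a t-table (about 1.282, 1.645, 1.960, 2.326, 2.576). The list of α values is just an example.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

# z_alpha has area alpha to its right, so it is the (1 - alpha) quantile of Z
z_table = {alpha: Z.inv_cdf(1 - alpha) for alpha in (0.10, 0.05, 0.025, 0.01, 0.005)}

for alpha, z_alpha in z_table.items():
    print(f"alpha = {alpha:<6} z_alpha = {z_alpha:.3f}")
```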

  18. (1 - α)×100% Confidence Intervals for μ • Setting: • Data are normally distributed • Observations are independent • We want an interval that probably contains the population mean μ Case 1: σ known: Ȳ ± z_{α/2} σ/√n Case 2: σ unknown: Ȳ ± t_{α/2, n-1} S/√n

  19. CI Example An insurance company is concerned about the number and magnitude of hail damage claims it received this year. A random sample of 20 of the thousands of claims it received this year showed an average claim amount of $6,500 and a standard deviation of $1,500. (You can assume that claims have a normal distribution.) Find a 95% confidence interval on the mean claim damage amount. Suppose that company actuaries believe the company does not need to increase insurance rates for hail damage if the mean claim damage amount is no greater than $7,000. Use the above information to make a recommendation regarding whether rates should be raised.
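A sketch of the Case 2 (σ unknown) interval for the hail-claim data: Ȳ ± t_{.025, 19} S/√n. The critical value 2.093 is the standard t-table entry for 19 df and is hard-coded here rather than computed. The last line only checks whether the interval reaches above the $7,000 threshold; the recommendation itself is left to the reader.

```python
import math

n, y_bar, s = 20, 6500.0, 1500.0
t_crit = 2.093                   # t_{.025, 19} from a standard t-table (df = n - 1 = 19)

se = s / math.sqrt(n)            # estimated standard error of the sample mean
margin = t_crit * se
lower, upper = y_bar - margin, y_bar + margin
print(f"95% CI: (${lower:,.0f}, ${upper:,.0f})")

# Does the interval contain values above the $7,000 threshold?
print("interval extends above $7,000:", upper > 7000)
```

This reproduces the interval ($5798, $7202) quoted on the following slides.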

  20. Last time we found the 95% CI to be ($5798, $7202). What does this mean? “There is a .95 probability that the population mean (μ) is between $5798 and $7202”? Not exactly.

  21. Interpretation of a 95% Confidence Interval: about 95% of these confidence intervals should “cover” the true mean. [Figure: 100 different 95% CIs plotted for a case in which the true mean is 80]
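The "about 95% cover the true mean" interpretation can be reproduced by simulation. This sketch uses the slide's true mean of 80; σ = 10, n = 25, the known-σ (Case 1) interval, the seed, and 1000 repetitions are all assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist, fmean

random.seed(3)

mu_true, sigma, n = 80, 10, 25   # true mean 80 from the slide; sigma and n are assumptions
z = NormalDist().inv_cdf(0.975)  # z_{.025} for a 95% interval with sigma known
reps = 1000

covered = 0
for _ in range(reps):
    y_bar = fmean(random.gauss(mu_true, sigma) for _ in range(n))
    half_width = z * sigma / math.sqrt(n)
    if y_bar - half_width <= mu_true <= y_bar + half_width:
        covered += 1             # this interval "covers" the true mean

coverage = covered / reps
print(f"{covered} of {reps} intervals covered the true mean ({coverage:.1%})")
```

Any single interval either covers 80 or does not; the 95% describes the long-run proportion of intervals produced this way.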

  22. Last time we found the 95% CI to be ($5798, $7202). What does this mean? “There is a .95 probability that the population mean (μ) is between $5798 and $7202”? Not exactly. A better statement: “About 95% of confidence intervals obtained in this manner will cover the true mean.” We say: “We are 95% confident that the mean falls in the interval …”

  23. Concern has been mounting that SAT scores are falling. • 3 years ago: national AVG = 955 • A random sample of 200 graduating high school students this year has sample average 935 (each year the standard deviation is about 100) Question: Have SAT scores dropped? Procedure: Determine how “extreme” or “rare” our sample AVG of 935 is if the population AVG really is 955.

  24. If Population average = 955, what is the probability of getting a sample average (from a sample of size 200) that is less than or equal to 935?
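A sketch of the slide 24 calculation. With n = 200 the Central Limit Theorem justifies treating Ȳ as approximately normal with mean 955 and standard error 100/√200; the standard deviation of about 100 is treated as known.

```python
import math
from statistics import NormalDist

mu0, sigma, n, y_bar = 955, 100, 200, 935

se = sigma / math.sqrt(n)          # standard error of the sample mean
z = (y_bar - mu0) / se             # standardized sample average
prob = NormalDist().cdf(z)         # approx. P[sample mean <= 935] if the population mean is 955
print(f"z = {z:.2f}, probability = {prob:.4f}")
```

A probability this small (roughly 0.002) is what makes 935 look "rare" under a population average of 955.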

  25. We must decide: • The sample came from population with population AVG = 955 and just by chance the sample AVG is “small.” OR • We are not willing to believe that the pop. AVG this year is really 955. (Conclude SAT scores have fallen.)

  26. Hypothesis Testing Terminology Statistical Hypothesis: a statement about the parameters of one or more populations • Null Hypothesis (H0) • the hypothesis to be “tested” (standard, traditional, claimed, etc.) • the hypothesis of no change, effect, or difference (usually what the investigator wants to disprove) • Alternative Hypothesis (Ha) • the null is not correct (usually the hypothesis the investigator suspects or wants to show)

  27. Basic Hypothesis Testing Question: Do the data provide sufficient evidence to refute the null hypothesis? Test Statistic: measures how far the observed statistic is from the hypothesized parameter (under H0). Example: H0: μ = 50. Test statistic: t = (Ȳ - 50)/(S/√n)

  28. Hypothesis Testing (cont.) Critical Region (Rejection Region): the region of the test statistic that leads to rejection of the null (e.g., t > c) Critical Value: an endpoint of the critical region Significance Level (α): the probability that the test statistic falls in the critical region when the null is true, i.e., the probability of rejecting H0 when it is true

  29. Types of Hypotheses One-Sided Tests: H0: μ = μ₀ vs. Ha: μ > μ₀ (or Ha: μ < μ₀) Two-Sided Tests: H0: μ = μ₀ vs. Ha: μ ≠ μ₀

  30. Rejection Regions for One- and Two-Sided Alternatives [Figure: rejection region of area α beyond the critical value -t_α]

  31. A Standard Hypothesis Test Write-up 1. State the null and alternative hypotheses 2. Give the significance level, test statistic, and rejection region 3. Show calculations 4. State the conclusion: the statistical decision, and the conclusion in the language of the problem

  32. Hypothesis Testing Example 1 A solar cell requires a special crystal. If properly manufactured, the mean weight of these crystals is 0.4 g. Suppose that 25 crystals are selected at random from a batch of crystals and that, for these crystals, the average weight is 0.41 g with a standard deviation of 0.02 g. At the α = .01 level of significance, can we conclude that the batch is bad?
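A sketch of the Example 1 test. It assumes a two-sided alternative (Ha: μ ≠ 0.4, since a batch could be bad in either direction), so the critical value is t_{.005, 24} = 2.797, hard-coded from a standard t-table.

```python
import math

mu0, n, y_bar, s = 0.4, 25, 0.41, 0.02
alpha = 0.01

t = (y_bar - mu0) / (s / math.sqrt(n))   # t statistic with n - 1 = 24 df

# Assumed two-sided alternative Ha: mu != 0.4, so use t_{.005, 24}
t_crit = 2.797                            # from a standard t-table

reject = abs(t) > t_crit
print(f"t = {t:.2f}, critical value = ±{t_crit}, reject H0: {reject}")
```

Under this assumption, t = 2.5 does not fall in the rejection region at α = .01.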

  33. Hypothesis Testing Example 2 A box of detergent is designed to weigh on the average 3.25 lbs per box. A random sample of 18 boxes taken from the production line on a single day has a sample average of 3.238 lbs and a standard deviation of 0.037 lbs. Test whether the boxes seem to be underfilled.
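A sketch of Example 2 as a lower-tailed test (Ha: μ < 3.25, "underfilled"). The slide does not give a significance level, so α = .05 is an assumption here; the critical value t_{.05, 17} = 1.740 is taken from a standard t-table.

```python
import math

mu0, n, y_bar, s = 3.25, 18, 3.238, 0.037
alpha = 0.05   # assumption: the slide does not state a significance level

t = (y_bar - mu0) / (s / math.sqrt(n))   # df = n - 1 = 17

# Lower-tailed alternative: reject H0 if t < -t_{.05, 17}
t_crit = 1.740                            # t-table value for 17 df, upper-tail area .05
reject = t < -t_crit
print(f"t = {t:.3f}, reject H0 if t < -{t_crit}: {reject}")
```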

  34. Errors in Hypothesis Testing (Conclusion vs. Actual Situation) • Do Not Reject H0: if the null is true, correct decision (1 - α); if the null is false, Type II error (β) • Reject H0: if the null is true, Type I error (α); if the null is false, correct decision, i.e., power (1 - β)

  35. Note: There are many ways that H0 can be false. Example: H0: μ = 50. This null hypothesis is “false” if: (a) μ = 51, (b) μ = 60, (c) μ = 80. If (c) is the actual situation, then the “power” of the test will probably be large; in case (a), the “power” will likely be small.
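To make the power comparison concrete, this sketch computes power for a hypothetical setting not given on the slide: an upper-tailed z-test of H0: μ = 50 vs. Ha: μ > 50 with σ = 10 known, n = 25, and α = .05. Power is the probability that Ȳ lands beyond the cutoff when the true mean is μ.

```python
import math
from statistics import NormalDist

Z = NormalDist()

# Hypothetical setting: H0: mu = 50 vs Ha: mu > 50, sigma = 10 known, n = 25, alpha = .05
mu0, sigma, n, alpha = 50, 10, 25, 0.05
se = sigma / math.sqrt(n)
cutoff = mu0 + Z.inv_cdf(1 - alpha) * se   # reject H0 when the sample mean exceeds this

def power(mu_actual):
    """P[reject H0] when the true mean is mu_actual (normal sampling distribution)."""
    return 1 - NormalDist(mu_actual, se).cdf(cutoff)

for mu_actual in (51, 60, 80):
    print(f"true mean {mu_actual}: power = {power(mu_actual):.4f}")
```

In this setting power is only about 0.13 at μ = 51 but essentially 1 at μ = 60 and μ = 80, matching the slide's intuition. At μ = 50 the same function returns α.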

  36. p-Value: the probability, computed assuming the null is true, of an observation as extreme as or more extreme than the one observed. Note: “large negative values” of t make us believe the alternative is true. Suppose t = -2.39 is observed from the data for the test above; then the p-value is the area under the t curve to the left of -2.39. [Figure: p-value shaded to the left of -2.39 (observed value of t)]
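Computing an exact t-distribution p-value requires the t CDF, which the Python standard library does not provide. This sketch integrates the Student t density numerically with Simpson's rule (a swapped-in numerical method, not the table method used in class). The 17 df are a hypothetical choice, because the slide does not state the sample size behind t = -2.39. Because of `math.gamma`, the function is meant for moderate df only.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P[T <= x], using symmetry and Simpson's rule on [0, |x|] (steps must be even)."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_a = total * h / 3          # area under the density between 0 and |x|
    return 0.5 + area_0_to_a if x >= 0 else 0.5 - area_0_to_a

# Lower-tailed test: p-value = P[T <= observed t] when H0 is true.
# df = 17 is hypothetical; the slide does not give the sample size.
t_obs, df = -2.39, 17
p_value = t_cdf(t_obs, df)
print(f"p-value = {p_value:.4f}")
```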

  37. Note: • if the p-value is less than or equal to α, then we reject the null at the α significance level • the p-value is the smallest level of significance at which the null hypothesis would be rejected

  38. Find the p-values for Examples 1 and 2
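With only a printed t-table, a p-value is usually bracketed between two tabled tail areas. This sketch does that for both examples, using rows for 24 and 17 df copied from a standard t-table. It assumes Example 1 is two-sided (t = 2.5, so the one-tail bounds are doubled) and Example 2 is lower-tailed.

```python
import math

# Rows of a standard t-table: upper-tail area -> critical value
T_TABLE = {
    24: {0.10: 1.318, 0.05: 1.711, 0.025: 2.064, 0.01: 2.492, 0.005: 2.797},
    17: {0.10: 1.333, 0.05: 1.740, 0.025: 2.110, 0.01: 2.567, 0.005: 2.898},
}

def tail_area_bounds(t_abs, df):
    """Bracket the one-tail area beyond |t| using the tabled critical values."""
    row = sorted(T_TABLE[df].items(), key=lambda kv: kv[1])  # increasing critical value
    lower, upper = 0.0, 0.5
    for area, crit in row:
        if t_abs >= crit:
            upper = area        # |t| is beyond this critical value, so tail area < area
        else:
            lower = area        # first critical value exceeding |t|, so tail area > area
            break
    return lower, upper

# Example 1: t = 2.5 with 24 df, assumed two-sided, so double the one-tail bounds
lo1, hi1 = tail_area_bounds(2.5, 24)
print(f"Example 1: {2 * lo1:.3f} < p-value < {2 * hi1:.3f}")

# Example 2: lower-tailed test with 17 df
t2 = (3.238 - 3.25) / (0.037 / math.sqrt(18))
lo2, hi2 = tail_area_bounds(abs(t2), 17)
print(f"Example 2: {lo2:.3f} < p-value < {hi2:.3f}")
```

Under these assumptions, Example 1 gives .01 < p-value < .02 and Example 2 gives .05 < p-value < .10.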
