Probability & Statistical Inference Lecture 6

reuben

Presentation Transcript


  1. Probability & Statistical Inference Lecture 6 MSc in Computing (Data Analytics)

  2. Lecture Outline • Quick Recap • Testing the difference between two sample means • Practical Hypothesis Testing • Analysis Of Variance

  3. General Steps in Hypothesis Testing • From the problem context, identify the parameter of interest. • State the null hypothesis, H0. • Specify an appropriate alternative hypothesis, H1. • Choose a significance level, α. • Determine an appropriate test statistic. • State the rejection region for the statistic. • Compute any necessary sample quantities, substitute these into the equation for the test statistic, and compute that value. • Decide whether or not H0 should be rejected and report that in the problem context.

  4. Types of Questions That Can Be Answered with Two-Sample Hypothesis Tests • A manufacturing plant wants to compare the defective rates of items coming off two different process lines. • Whether the test results of patients who received a drug are better than the test results of those who received a placebo. • Whether there is a significant (rather than merely random) difference in the average cycle time to deliver a pizza from Pizza Company A vs. Pizza Company B.

  5. Difference in Means of Two Normal Distributions, Variances Known
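A minimal Python sketch of the two-sample z-test for this case, assuming the standard statistic $Z_0 = (\bar{x}_1 - \bar{x}_2 - \Delta_0)/\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$ and a two-sided alternative. The function names and the summary numbers are hypothetical, chosen only for illustration; only `statistics.NormalDist` from the standard library is used.

```python
from statistics import NormalDist


def two_sample_z(xbar1, xbar2, sigma1, sigma2, n1, n2, delta0=0.0):
    """Z statistic for H0: mu1 - mu2 = delta0 when sigma1 and sigma2 are known."""
    std_error = (sigma1 ** 2 / n1 + sigma2 ** 2 / n2) ** 0.5
    return (xbar1 - xbar2 - delta0) / std_error


def reject_two_sided(z0, alpha=0.05):
    """True if z0 falls in the rejection region |z0| > z_{alpha/2}."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return abs(z0) > z_crit


# Hypothetical summary statistics, for illustration only
z0 = two_sample_z(xbar1=52.0, xbar2=50.0, sigma1=4.0, sigma2=4.0, n1=16, n2=16)
print(round(z0, 2), reject_two_sided(z0))
```

With these made-up numbers $z_0 = \sqrt{2} \approx 1.41$, which lies inside $\pm z_{0.025} \approx \pm 1.96$, so H0 would not be rejected at α = 0.05.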

  6. Test Assumptions

  7. Example

  8. Example

  9. Example The P-value is the exact significance level of a statistical test; that is, it is the probability, computed assuming the null hypothesis is true, of obtaining a value of the test statistic at least as extreme as the one actually observed.
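For a z statistic the P-value is a standard normal tail area, and the decision rule is "reject H0 if P < α". A small sketch, assuming a standard normal null distribution; the function name and the observed value 2.5 are hypothetical.

```python
from statistics import NormalDist


def z_p_value(z0, alternative="two-sided"):
    """P-value of an observed z statistic, using the standard normal as the null distribution."""
    phi = NormalDist().cdf
    if alternative == "two-sided":   # H1: mu1 - mu2 != delta0
        return 2 * (1 - phi(abs(z0)))
    if alternative == "greater":     # H1: mu1 - mu2 > delta0
        return 1 - phi(z0)
    if alternative == "less":        # H1: mu1 - mu2 < delta0
        return phi(z0)
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")


alpha = 0.05
p = z_p_value(2.5)  # hypothetical observed z0
print(f"p = {p:.4f}:", "reject H0" if p < alpha else "fail to reject H0")
```

A two-sided P-value of 0.05 corresponds to $|z_0| \approx 1.96$, which is why the P-value rule and the fixed rejection region give the same decision.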

  10. Confidence Interval on a Difference in Means, Variances Known
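The interval in this case is usually written $(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$. A sketch under that assumption, with a hypothetical function name and made-up inputs:

```python
from statistics import NormalDist


def diff_means_ci_known_var(xbar1, xbar2, sigma1, sigma2, n1, n2, conf=0.95):
    """Two-sided CI on mu1 - mu2 when both population variances are known."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    half_width = z * (sigma1 ** 2 / n1 + sigma2 ** 2 / n2) ** 0.5
    diff = xbar1 - xbar2
    return diff - half_width, diff + half_width


# Hypothetical numbers: xbar1 = 52, xbar2 = 50, sigma1 = sigma2 = 4, n1 = n2 = 16
lo, hi = diff_means_ci_known_var(52.0, 50.0, 4.0, 4.0, 16, 16)
print(f"95% CI on mu1 - mu2: ({lo:.2f}, {hi:.2f})")
```

Here the interval is roughly (−0.77, 4.77). It contains 0, which matches a two-sided z-test at α = 0.05 failing to reject $H_0: \mu_1 = \mu_2$ for the same numbers.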

  11. Example

  12. Example

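  13. Difference in Means of Two Normal Distributions, Variances Unknown We wish to test $H_0: \mu_1 - \mu_2 = \Delta_0$, assuming both populations share a common variance $\sigma^2$. The pooled estimator of $\sigma^2$ is $S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$.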

  14. Difference in Means of Two Normal Distributions, Variances unknown
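A sketch of the pooled two-sample t-test, assuming the usual statistic $T_0 = (\bar{x}_1 - \bar{x}_2 - \Delta_0)/\big(S_p\sqrt{1/n_1 + 1/n_2}\big)$ with $n_1 + n_2 - 2$ degrees of freedom. The function name is hypothetical; the data are Groups B and A from slide 26.

```python
from statistics import mean, variance


def pooled_t(sample1, sample2, delta0=0.0):
    """Pooled two-sample t statistic for H0: mu1 - mu2 = delta0 (equal variances assumed).

    Returns (t0, sp_sq, df) where df = n1 + n2 - 2.
    """
    n1, n2 = len(sample1), len(sample2)
    # statistics.variance uses the n - 1 divisor, i.e. the sample variance S^2
    sp_sq = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / (n1 + n2 - 2)
    t0 = (mean(sample1) - mean(sample2) - delta0) / (sp_sq * (1 / n1 + 1 / n2)) ** 0.5
    return t0, sp_sq, n1 + n2 - 2


t0, sp_sq, df = pooled_t([14, 12, 10], [6, 4, 2])
print(f"Sp^2 = {sp_sq}, t0 = {t0:.3f}, df = {df}")
```

For these groups $S_p^2 = 4$ and $t_0 = \sqrt{24} \approx 4.90$ on 4 degrees of freedom. With exactly two groups, $t_0^2$ equals the one-way ANOVA F statistic computed from the same data.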

  15. Example

  16. Example

  17. Example

  18. Confidence Interval on the Difference in Means, Variance Unknown
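With a pooled variance the interval is usually written $(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\,n_1+n_2-2}\,S_p\sqrt{1/n_1 + 1/n_2}$. Python's standard library has no t quantile function, so this sketch assumes the caller passes the critical value in (from a t table, or e.g. `scipy.stats.t.ppf` if SciPy is available). The function name is hypothetical.

```python
from statistics import mean, variance


def diff_means_ci_pooled(sample1, sample2, t_crit):
    """Two-sided CI on mu1 - mu2 with equal but unknown variances.

    t_crit must be t_{alpha/2, n1+n2-2}, supplied by the caller (e.g. from a t table).
    """
    n1, n2 = len(sample1), len(sample2)
    sp_sq = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / (n1 + n2 - 2)
    half_width = t_crit * (sp_sq * (1 / n1 + 1 / n2)) ** 0.5
    diff = mean(sample1) - mean(sample2)
    return diff - half_width, diff + half_width


# Groups B and A from slide 26; t_{0.025,4} = 2.776 from a t table
lo, hi = diff_means_ci_pooled([14, 12, 10], [6, 4, 2], t_crit=2.776)
print(f"95% CI on mu_B - mu_A: ({lo:.2f}, {hi:.2f})")
```

The result is about (3.47, 12.53); since the interval excludes 0, the two group means differ at the 5% level.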

  19. Example

  20. Example

  21. Example

  22. Practical Hypothesis Testing • From the problem context, identify the parameter of interest. • State the null hypothesis, H0. • Specify an appropriate alternative hypothesis, H1. • Choose a significance level, α. • Calculate the P-value using a software package of your choice. • Decide whether or not H0 should be rejected and report that in the problem context. Reject H0 when the P-value is less than α. (Golden rule: reject H0 for small P-values.)

  23. Some Research • Look up the correct formula for the hypothesis test of the difference between two proportions. • What are the assumptions of the test? • Find an example of the test being used in research.

  24. Analysis of Variance

  25. Introduction • In the previous section we were concerned with analysing data by comparing two sample means. • Frequently data contain more than two samples; for example, they may compare several treatments. • In this lecture we introduce a statistical analysis that allows us to compare the means of more than two samples. The method is called ‘Analysis of Variance’, or ANOVA for short.

  26. Total Sum of Squares Data set: 14, 12, 10, 6, 4, 2 Group A: 6, 4, 2 Group B: 14, 12, 10 Overall mean: 8 Total Sum of Squares: $SS_T = (14-8)^2 + (12-8)^2 + (10-8)^2 + (6-8)^2 + (4-8)^2 + (2-8)^2 = 112$

  27. Between Group Variation • Sum of Squares of the Model: $SS_M = n_a(\mu - \mu_a)^2 + n_b(\mu - \mu_b)^2 = 3(8-4)^2 + 3(8-12)^2 = 96$

  28. Within Group Variation • Sum of Squares of the Error: $SS_E = (14-12)^2 + (12-12)^2 + (10-12)^2 + (6-4)^2 + (4-4)^2 + (2-4)^2 = 16$ • The two parts add up to the total: $SS_T = SS_M + SS_E = 96 + 16 = 112$.
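The same three sums of squares can be computed directly from the slide 26 data. This is only a sketch using the standard library; the variable names are illustrative.

```python
from statistics import mean

# Data from slide 26
groups = {"A": [6, 4, 2], "B": [14, 12, 10]}
all_obs = [y for ys in groups.values() for y in ys]
grand_mean = mean(all_obs)  # overall mean, 8

# Total variation of every observation about the overall mean
ss_total = sum((y - grand_mean) ** 2 for y in all_obs)
# Between-group (model) variation: group size times squared deviation of each group mean
ss_model = sum(len(ys) * (mean(ys) - grand_mean) ** 2 for ys in groups.values())
# Within-group (error) variation: each observation about its own group mean
ss_error = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys)

print(f"SST = {ss_total}, SSm = {ss_model}, SSe = {ss_error}")
assert ss_total == ss_model + ss_error  # the ANOVA partition of total variability
```

The final assertion checks the identity that ANOVA relies on: total variability splits exactly into a between-group part and a within-group part.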

  29. Structure of the Data

  30. ANOVA Table Where: n is the sample size and a is the number of groups

  31. ANOVA Table – Original Example Where: n is the sample size and a is the number of groups
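A sketch of how the single-factor ANOVA table is filled in, assuming the usual layout: treatments on $a - 1$ degrees of freedom, error on $N - a$ (which is $a(n-1)$ when every group has $n$ observations), mean squares as SS/df, and $F_0 = MS_{Treatments}/MS_E$. The function and dictionary keys are hypothetical names.

```python
from statistics import mean


def one_way_anova_table(groups):
    """Return SS, df, MS and F0 for a one-way (single-factor) ANOVA.

    groups: list of lists, one list of observations per treatment.
    """
    all_obs = [y for g in groups for y in g]
    n_total, a = len(all_obs), len(groups)
    grand = mean(all_obs)
    ss_treat = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_error = sum((y - mean(g)) ** 2 for g in groups for y in g)
    df_treat, df_error = a - 1, n_total - a
    ms_treat, ms_error = ss_treat / df_treat, ss_error / df_error
    return {
        "SS_treatments": ss_treat, "df_treatments": df_treat, "MS_treatments": ms_treat,
        "SS_error": ss_error, "df_error": df_error, "MS_error": ms_error,
        "SS_total": ss_treat + ss_error, "df_total": n_total - 1,
        "F0": ms_treat / ms_error,
    }


# Original example (slide 26): Group A and Group B
table = one_way_anova_table([[6, 4, 2], [14, 12, 10]])
print(table["F0"])  # 24.0
```

For the original example, $MS_{Treatments} = 96/1$ and $MS_E = 16/4 = 4$, so $F_0 = 24$, well above $F_{0.05,1,4} \approx 7.71$; H0 of equal group means would be rejected.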

  32. Model Assumptions • Independence of observations within and between samples • Normality of the sampling distribution • Equal variance, also called the homoscedasticity assumption

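  33. The ANOVA Equation • We can describe the observations in the above table using the following equation: $y_{ij} = \mu + \tau_i + \epsilon_{ij}$, for $i = 1, \dots, a$ and $j = 1, \dots, n$, where n is the number of observations in each group, a is the number of groups, $\mu$ is the overall mean, $\tau_i$ is the effect of the ith treatment and $\epsilon_{ij}$ is a random error term.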
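One way to get a feel for the effects model $y_{ij} = \mu + \tau_i + \epsilon_{ij}$ is to simulate from it. This sketch assumes independent errors $\epsilon_{ij} \sim N(0, \sigma^2)$, in line with the model assumptions; the function name and all parameter values are hypothetical.

```python
import random


def simulate_effects_model(mu, taus, n, sigma, seed=None):
    """Draw n observations per treatment from y_ij = mu + tau_i + eps_ij, eps_ij ~ N(0, sigma^2)."""
    rng = random.Random(seed)  # separate generator so results are reproducible
    return [[mu + tau + rng.gauss(0.0, sigma) for _ in range(n)] for tau in taus]


# Hypothetical parameters: overall mean 8, treatment effects -4 and +4 (summing to zero)
data = simulate_effects_model(mu=8.0, taus=[-4.0, 4.0], n=3, sigma=2.0, seed=1)
for i, ys in enumerate(data, start=1):
    print(f"treatment {i}:", [round(y, 2) for y in ys])
```

Setting `sigma=0.0` removes the error term, so every observation in group i equals $\mu + \tau_i$; increasing `sigma` makes the within-group spread, and hence $SS_E$, larger relative to the treatment effects.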

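  34. ANOVA Hypotheses We wish to test the hypotheses $H_0: \mu_1 = \mu_2 = \cdots = \mu_a$ against $H_1: \mu_i \neq \mu_j$ for at least one pair $(i, j)$. The analysis of variance partitions the total variability into two parts: $SS_T = SS_{Treatments} + SS_E$.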

  35. Example

  36. Graphical Display of Data Figure 13-1 (a) Box plots of hardwood concentration data. (b) Display of the model in Equation 13-1 for the completely randomized single-factor experiment.

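  37. Example • We can use ANOVA to test the hypothesis that different hardwood concentrations do not affect the mean tensile strength of the paper. The hypotheses are: H0: every hardwood concentration gives the same mean tensile strength; H1: at least one concentration gives a different mean tensile strength. • The ANOVA table is below: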

  38. Example • The p-value is less than 0.05, therefore H0 is rejected and we conclude that hardwood concentration affects the mean tensile strength of the paper: at least one concentration has a different mean tensile strength.

  39. Test Model Assumptions • Use Bartlett's test to check the homoscedasticity assumption. • Bartlett's test (Snedecor and Cochran, 1983) is used to test whether k samples have equal variances. • Bartlett's test is sensitive to departures from normality. That is, if your samples come from non-normal distributions, then Bartlett's test may simply be testing for non-normality. The Levene test is an alternative to the Bartlett test that is less sensitive to departures from normality.

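  40. Bartlett Test for Equal Variance • The hypotheses for the Bartlett test are as follows: $H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2$ against $H_1: \sigma_i^2 \neq \sigma_j^2$ for at least one pair $(i, j)$. • Under H0, the Bartlett test statistic approximately follows a chi-squared distribution with k − 1 degrees of freedom. • Interpret the p-value as in any other hypothesis test.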
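A sketch of Bartlett's statistic in its standard form, $\chi^2 = \dfrac{(N-k)\ln S_p^2 - \sum_i (n_i - 1)\ln S_i^2}{1 + \frac{1}{3(k-1)}\left(\sum_i \frac{1}{n_i - 1} - \frac{1}{N-k}\right)}$, where $S_p^2$ is the pooled variance and $N$ the total sample size. The function name and the example groups are hypothetical; `scipy.stats.bartlett` can serve as a cross-check where SciPy is installed.

```python
import math
from statistics import variance


def bartlett_statistic(groups):
    """Bartlett's statistic for H0: all k group variances are equal.

    Each group needs at least two observations and a non-zero sample variance.
    Under H0 the result is approximately chi-squared with k - 1 degrees of freedom.
    """
    k = len(groups)
    ns = [len(g) for g in groups]
    n_total = sum(ns)
    s_sq = [variance(g) for g in groups]  # sample variances S_i^2
    sp_sq = sum((n - 1) * s for n, s in zip(ns, s_sq)) / (n_total - k)  # pooled variance
    numerator = (n_total - k) * math.log(sp_sq) - sum(
        (n - 1) * math.log(s) for n, s in zip(ns, s_sq)
    )
    correction = 1 + (sum(1 / (n - 1) for n in ns) - 1 / (n_total - k)) / (3 * (k - 1))
    return numerator / correction


# Hypothetical groups with very different spreads (sample variances 1 and 100)
stat = bartlett_statistic([[1, 2, 3], [10, 20, 30]])
p_value = math.erfc(math.sqrt(stat / 2))  # chi-squared upper tail, valid only for k - 1 = 1 df
print(f"chi2 = {stat:.3f}, p = {p_value:.4f}")
```

For k = 2 the chi-squared tail with 1 degree of freedom has the closed form `math.erfc(math.sqrt(x / 2))`, so no extra library is needed; for more groups a chi-squared table or `scipy.stats.chi2.sf` would give the p-value. Applied to the slide 26 groups (both with variance 4) the statistic is 0.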

  41. If the Assumption of Equal Variance Is Not Met • If the assumption of equal variance is not met, use Welch's ANOVA. • Assignment for next week: • Investigate the difference between the standard ANOVA and Welch's ANOVA.

  42. Demo

  43. Confidence Interval on a Treatment Mean For 20% hardwood, the resulting confidence interval on the mean is

  44. Confidence Interval on the Difference of Two Treatment Means For the hardwood concentration example,
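After an ANOVA, intervals on a treatment mean and on a difference of two treatment means are commonly written $\bar{y}_{i\cdot} \pm t_{\alpha/2,\,N-a}\sqrt{MS_E/n}$ and $(\bar{y}_{i\cdot} - \bar{y}_{j\cdot}) \pm t_{\alpha/2,\,N-a}\sqrt{2MS_E/n}$, with $n$ observations per treatment. The hardwood numbers are not reproduced here, so this sketch uses the slide 26 toy data; the function names are hypothetical and the t critical value is assumed to come from a t table.

```python
def treatment_mean_ci(ybar_i, ms_error, n, t_crit):
    """CI on a single treatment mean: ybar_i +/- t_crit * sqrt(MS_E / n)."""
    half_width = t_crit * (ms_error / n) ** 0.5
    return ybar_i - half_width, ybar_i + half_width


def treatment_diff_ci(ybar_i, ybar_j, ms_error, n, t_crit):
    """CI on mu_i - mu_j with n observations per treatment: diff +/- t_crit * sqrt(2 MS_E / n)."""
    half_width = t_crit * (2 * ms_error / n) ** 0.5
    diff = ybar_i - ybar_j
    return diff - half_width, diff + half_width


# Toy values from slide 26: group means 12 (B) and 4 (A), MS_E = 16 / 4 = 4, n = 3,
# and t_{0.025,4} = 2.776 from a t table (N - a = 6 - 2 = 4 error degrees of freedom)
print(treatment_mean_ci(12, 4.0, 3, 2.776))
print(treatment_diff_ci(12, 4, 4.0, 3, 2.776))
```

Using $MS_E$ rather than a single group's variance pools information from all treatments, which is why the error degrees of freedom are $N - a$.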

  45. An Unbalanced Experiment

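  46. Multiple Comparisons Following the ANOVA • The least significant difference (LSD) is $LSD = t_{\alpha/2,\,a(n-1)}\sqrt{2MS_E/n}$. If the sample sizes are different in each treatment: $LSD = t_{\alpha/2,\,N-a}\sqrt{MS_E\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$.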
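A sketch of Fisher's LSD comparisons: a pair of treatments is declared different when $|\bar{y}_{i\cdot} - \bar{y}_{j\cdot}| > LSD$, with $LSD = t_{\alpha/2,\,N-a}\sqrt{MS_E(1/n_i + 1/n_j)}$, which reduces to $t\sqrt{2MS_E/n}$ for equal sample sizes. The function name and dictionary inputs are hypothetical, and the t critical value is assumed to be looked up by the caller.

```python
from itertools import combinations


def lsd_comparisons(means, ns, ms_error, t_crit):
    """Fisher LSD pairwise comparisons after a significant ANOVA F test.

    means, ns: dicts keyed by treatment label (sample means and sample sizes).
    t_crit: t_{alpha/2, N-a}, supplied by the caller.
    Returns tuples (i, j, ybar_i - ybar_j, LSD, significant).
    """
    results = []
    for i, j in combinations(means, 2):
        lsd = t_crit * (ms_error * (1 / ns[i] + 1 / ns[j])) ** 0.5
        diff = means[i] - means[j]
        results.append((i, j, diff, lsd, abs(diff) > lsd))
    return results


# Slide 26 toy data: means 4 and 12, n = 3 each, MS_E = 4, t_{0.025,4} = 2.776
for i, j, diff, lsd, significant in lsd_comparisons(
    {"A": 4, "B": 12}, {"A": 3, "B": 3}, ms_error=4.0, t_crit=2.776
):
    print(f"{i} vs {j}: diff = {diff}, LSD = {lsd:.3f}, significant = {significant}")
```

Using the general $1/n_i + 1/n_j$ form means the same function covers both the balanced and the unbalanced case.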

  47. Example: Multi-comparison Test

  48. Example: Multi-comparison Test

  49. Demo

  50. Exercises
