Multiple Regression Assumptions and Outliers
Jims
Added: 11-06-2013




1. SW388R7 Data Analysis & Computers II Slide 1 Multiple Regression: Assumptions and Outliers
Multiple Regression and Assumptions
Multiple Regression and Outliers
Strategy for Solving Problems
Practice Problems

2. Multiple Regression and Assumptions
Multiple regression is most effective at identifying the relationship between a dependent variable and a combination of independent variables when its underlying assumptions are satisfied: each of the metric variables is normally distributed, the relationships between metric variables are linear, and the relationship between metric and dichotomous variables is homoscedastic.
Failing to satisfy the assumptions does not mean that our answer is wrong. It means that our solution may under-report the strength of the relationships.

3. Multiple Regression and Outliers
Outliers can distort the regression results. When an outlier is included in the analysis, it pulls the regression line towards itself. This can result in a solution that is more accurate for the outlier, but less accurate for all of the other cases in the data set.
We will check for univariate outliers on the dependent variable and multivariate outliers on the independent variables.
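The pull of an outlier on the regression line can be seen in a small sketch. The data below are hypothetical (five cases lying exactly on y = 2x plus one invented outlying case), and the helper names `fit_line` and `sum_squared_error` are illustrative only; the deck itself works in SPSS, so Python is used here purely for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def sum_squared_error(xs, ys, intercept, slope):
    """Sum of squared residuals of the given cases around a fitted line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data: five well-behaved cases and one outlier.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
outlier_x, outlier_y = 6, 40

clean_intercept, clean_slope = fit_line(xs, ys)
all_intercept, all_slope = fit_line(xs + [outlier_x], ys + [outlier_y])

# Error for the five ordinary cases under each line.
error_others_clean = sum_squared_error(xs, ys, clean_intercept, clean_slope)
error_others_all = sum_squared_error(xs, ys, all_intercept, all_slope)

# Error for the outlier itself under each line.
error_outlier_clean = sum_squared_error([outlier_x], [outlier_y], clean_intercept, clean_slope)
error_outlier_all = sum_squared_error([outlier_x], [outlier_y], all_intercept, all_slope)

print("slope without outlier:", clean_slope)
print("slope with outlier:", all_slope)
print("error for other cases:", error_others_clean, "->", error_others_all)
print("error for the outlier:", error_outlier_clean, "->", error_outlier_all)
```

Including the outlier steepens the line, which shrinks the error for the outlier and increases it for every other case.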

4. Relationship between assumptions and outliers
The problems of satisfying assumptions and detecting outliers are intertwined. For example, if a case has a value on the dependent variable that is an outlier, it will affect the skew, and hence, the normality of the distribution.
Removing an outlier may improve the distribution of a variable. Transforming a variable may reduce the likelihood that the value for a case will be characterized as an outlier.

5. Order of analysis is important
The order in which we check assumptions and detect outliers will affect our results because we may get a different subset of cases in the final analysis.
In order to maximize the number of cases available to the analysis, we will evaluate assumptions first. We will substitute any transformations of variables that enable us to satisfy the assumptions. We will use any transformed variables that are required in our analysis to detect outliers.

6. Strategy for solving problems
Step 1. Run the type of regression specified in the problem statement on the variables, using the full data set.
Step 2. Test the dependent variable for normality. If it satisfies the criteria for normality only when transformed, substitute the transformed variable in the remaining tests that call for the use of the dependent variable.
Step 3. Test for normality, linearity, and homoscedasticity using the scripts. Decide which transformations should be used.
Step 4. Substitute the transformations and run the regression entering all independent variables, saving studentized residuals and Mahalanobis distance scores. Compute probabilities for D².
Step 5. Remove the outliers (studentized residual greater than 3 in absolute value, or Mahalanobis D² with p <= 0.001), and run the regression with the method and variables specified in the problem.
Step 6. Compare R² for the analysis using transformed variables and omitting outliers (step 5) to R² obtained for the model using all data and original variables (step 1).
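The outlier rule in the strategy can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the course's SPSS procedure: it assumes the probability for D² is the upper-tail chi-square probability with degrees of freedom equal to the number of independent variables, and that the studentized residual cutoff applies in absolute value. The function names `chi2_sf` and `is_outlier` and the sample values are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series in x.
    term, total = 1.0, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term              # adds x^(r-1) / (1*3*...*(2r-1))
        term *= x / (2 * r + 1)
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 * x / math.pi) * math.exp(-half) * total


def is_outlier(studentized_residual, mahalanobis_d2, n_predictors,
               residual_cutoff=3.0, p_cutoff=0.001):
    """Flag a case as a univariate or multivariate outlier (assumed rule, see lead-in)."""
    p_d2 = chi2_sf(mahalanobis_d2, n_predictors)
    return abs(studentized_residual) > residual_cutoff or p_d2 <= p_cutoff


# Hypothetical cases: (studentized residual, Mahalanobis D²), with 3 independent variables.
cases = [(0.5, 2.1), (3.4, 2.1), (-0.2, 18.0)]
flags = [is_outlier(resid, d2, n_predictors=3) for resid, d2 in cases]
print(flags)
```

Cases flagged `True` would be the ones omitted before the regression is run again.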

7. Transforming dependent variables
If the dependent variable is not normally distributed:
Try log, square root, and inverse transformation. Use the first transformed variable that satisfies the normality criteria.
If no transformation satisfies the normality criteria, use the untransformed variable and add a caution for violation of the assumption.
If a transformation satisfies normality, use the transformed variable in the tests of the independent variables.
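The "first transformation that satisfies normality" rule can be sketched as follows. The slides do not state the normality criteria at this point, so the sketch assumes a common rule of thumb (skewness and excess kurtosis both within ±1) and uses simple moment-based statistics rather than the bias-corrected versions a statistics package would report. All names and the sample values are hypothetical, and the values must be positive for the log and inverse transformations to be defined.

```python
import math


def skewness_and_kurtosis(values):
    """Moment-based skewness and excess kurtosis (no small-sample correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def looks_normal(values, limit=1.0):
    """Assumed criterion: skewness and excess kurtosis both within +/- limit."""
    skew, kurt = skewness_and_kurtosis(values)
    return abs(skew) <= limit and abs(kurt) <= limit


# Transformations in the order the slide lists them.
TRANSFORMS = [
    ("log", math.log10),
    ("square root", math.sqrt),
    ("inverse", lambda v: 1.0 / v),
]


def choose_dependent_transform(values):
    """Return (label, values to use) following the slide's decision rule."""
    if looks_normal(values):
        return "none", list(values)
    for label, transform in TRANSFORMS:
        transformed = [transform(v) for v in values]
        if looks_normal(transformed):
            return label, transformed
    # Nothing worked: keep the original variable and report a caution.
    return "none, with caution", list(values)


# Hypothetical right-skewed variable.
skewed = [1, 10, 10, 100, 100, 100, 1000, 1000, 10000]
label, used_values = choose_dependent_transform(skewed)
print(label)
```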

8. Transforming independent variables - 1
If the independent variable is normally distributed and linearly related to the dependent variable, use it as is.
If the independent variable is normally distributed but not linearly related to the dependent variable:
Try log, square root, square, and inverse transformation. Use the first transformed variable that satisfies the linearity criteria and does not violate the normality criteria.
If no transformation satisfies the linearity criteria and does not violate the normality criteria, use the untransformed variable and add a caution for violation of the assumption.

9. Transforming independent variables - 2
If the independent variable is linearly related to the dependent variable but not normally distributed:
Try log, square root, and inverse transformation. Use the first transformed variable that satisfies the normality criteria and does not reduce the correlation.
If none qualifies, try log, square root, and inverse transformation again. Use the first transformed variable that satisfies the normality criteria and has a significant correlation.
If no transformation satisfies the normality criteria with a significant correlation, use the untransformed variable and add a caution for violation of the assumption.
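The first pass of this rule (normality satisfied, correlation not reduced) might be sketched as below. As before, the normality criterion (moment-based skewness and excess kurtosis within ±1) is an assumption, as are the function names and the invented data. The fallback pass based on a significant correlation is not implemented here, because it needs a significance test that the sketch leaves out.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def looks_normal(values, limit=1.0):
    """Assumed criterion: moment-based skewness and excess kurtosis within +/- limit."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return abs(m3 / m2 ** 1.5) <= limit and abs(m4 / m2 ** 2 - 3.0) <= limit


def first_transform_keeping_correlation(x, y):
    """Return the first transformation label that normalizes x without weakening |r|, else None."""
    baseline = abs(pearson_r(x, y))
    candidates = [
        ("log", math.log10),
        ("square root", math.sqrt),
        ("inverse", lambda v: 1.0 / v),
    ]
    for label, transform in candidates:
        transformed = [transform(v) for v in x]
        if looks_normal(transformed) and abs(pearson_r(transformed, y)) >= baseline:
            return label
    return None


# Hypothetical data: a skewed independent variable and a dependent variable.
x_values = [1, 10, 10, 100, 100, 100, 1000, 1000, 10000]
y_values = [3, 5, 4, 7, 6, 8, 9, 10, 12]
chosen = first_transform_keeping_correlation(x_values, y_values)
print(chosen)
```

A return value of `None` corresponds to moving on to the next rule on the slide.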

10. Transforming independent variables - 3
If the independent variable is not linearly related to the dependent variable and not normally distributed:
Try log, square root, square, and inverse transformation. Use the first transformed variable that satisfies the normality criteria and has a significant correlation.
If no transformation satisfies the normality criteria with a significant correlation, use the untransformed variable and add a caution for violation of the assumption.

11. Impact of transformations and omitting outliers
We evaluate the regression assumptions and detect outliers with a view toward strengthening the relationship. This may not happen. The regression may be the same, it may be weaker, or it may be stronger. We cannot be certain of the impact until we run the regression again.
In the end, we may opt not to exclude outliers and not to employ transformations; the analysis informs us of the consequences of doing either.
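The three possible outcomes can be expressed as a small comparison. This sketch assumes the change is measured as the simple difference between the two R² values, treats differences below a small tolerance as "the same", and uses invented R² values; none of these choices come from the slides.

```python
def describe_change(r2_before, r2_after, tolerance=0.0005):
    """Return (change in R², label) comparing the revised model to the original one."""
    change = r2_after - r2_before
    if abs(change) < tolerance:
        return change, "same"
    return change, "stronger" if change > 0 else "weaker"


# Hypothetical R² values: original model, then model with transformations and outliers omitted.
change, label = describe_change(r2_before=0.40, r2_after=0.45)
print(f"R² changed by {change:+.3f} ({label})")
```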

12. Notes
Whenever you start a new problem, make sure you have removed variables created for previous analysis and have included all cases back into the data set.
I have added the square transformation to the checkboxes for transformations in the normality script. Since this is an option for linearity, we need to be able to evaluate its impact on normality.
If you change the options for output in pivot tables from labels to names, you will get an error message when you use the linearity script. To solve the problem, change the option for output in pivot tables back to labels.

13. Problem 1
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic? Assume that there is no problem with missing data. Use a level of significance of 0.01 for the regression analysis. Use a level of significance of 0.01 for evaluating assumptions.
The research question requires us to identify the best subset of predictors of "total family income" [income98] from the list: "sex" [sex], "how many in family earned money" [earnrs], and "income" [rincom98].
After substituting transformed variables to satisfy regression assumptions and removing outliers, the total proportion of variance explained by the regression analysis increased by 10.8%.
  1. True
  2. True with caution
  3. False
  4. Inappropriate application of a statistic

14. Dissecting problem 1 - 1

15. Dissecting problem 1 - 2

16. Dissecting problem 1 - 3

17. R² before transformations or removing outliers

18. R² before transformations or removing outliers

19. R² before transformations or removing outliers

20. Normality of the dependent variable: total family income

21. Normality of the dependent variable: total family income

22. Linearity and independent variable: how many in family earned money

23. Linearity and independent variable: how many in family earned money

24. Normality of independent variable: how many in family earned money

25. Normality of independent variable: how many in family earned money

26. Normality of independent variable: how many in family earned money

27. Transformation for how many in family earned money
The independent variable, how many in family earned money, had a linear relationship to the dependent variable, total family income.
The logarithmic transformation improves the normality of "how many in family earned money" [earnrs] without a reduction in the strength of the relationship to "total family income" [income98]. We will substitute the logarithmic transformation of how many in family earned money in the regression analysis.

28. Normality of independent variable: respondent's income

29. Normality of independent variable: respondent's income

30. Linearity and independent variable: respondent's income

31. Linearity and independent variable: respondent's income

32. Homoscedasticity: sex

33. Homoscedasticity: sex

34. Adding a transformed variable

35. The transformed variable in the data editor

36. The regression to identify outliers

37. Saving the measures of outliers

38. The variables for identifying outliers

39. Computing the probability for Mahalanobis D²

40. Formula for probability for Mahalanobis D²

41. Multivariate outliers

42. Univariate outliers

43. Omitting the outliers

44. Specifying the condition to omit outliers

45. The formula for omitting outliers

46. Completing the request for the selection

47. The omitted multivariate outlier

48. Running the regression without outliers

49. Opening the save options dialog

50. Clearing the request to save outlier data

51. Opening the statistics options dialog

52. Requesting descriptive statistics

53. Requesting the output

54. Sample size requirement

55. Significance of regression relationship

56. Increase in proportion of variance

57. Problem 2
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic? Assume that there is no problem with missing data. Use a level of significance of 0.05 for the regression analysis. Use a level of significance of 0.01 for evaluating assumptions.
The research question requires us to examine the relationship of "age" [age], "highest year of school completed" [educ], and "sex" [sex] to the dependent variable "occupational prestige score" [prestg80].
After substituting transformed variables to satisfy regression assumptions and removing outliers, the proportion of variance explained by the regression analysis increased by 3.6%.
  1. True
  2. True with caution
  3. False
  4. Inappropriate application of a statistic

58. Dissecting problem 2 - 1

59. Dissecting problem 2 - 2

60. Dissecting problem 2 - 3

61. R² before transformations or removing outliers

62. R² before transformations or removing outliers

63. Normality of the dependent variable

64. Normality of the dependent variable

65. Normality of independent variable: Age

66. Normality of independent variable: Age

67. Linearity and independent variable: Age

68. Linearity and independent variable: Age

69. Transformation for Age
The independent variable age satisfied the criteria for normality.
The independent variable age did not have a linear relationship to the dependent variable occupational prestige. However, none of the transformations linearized the relationship.
No transformation will be used: it would not help linearity and is not needed for normality.

70. Linearity and independent variable: Highest year of school completed

71. Linearity and independent variable: Highest year of school completed

72. Normality of independent variable: Highest year of school completed

73. Normality of independent variable: Highest year of school completed

74. Transformation for highest year of school
The independent variable, highest year of school, had a linear relationship to the dependent variable, occupational prestige.
The independent variable, highest year of school, did not satisfy the criteria for normality. None of the transformations for normalizing the distribution of "highest year of school completed" [educ] were effective.
No transformation will be used: it would not help normality and is not needed for linearity. A caution should be added to any findings.

75. Homoscedasticity: sex

76. Homoscedasticity: sex

77. Adding a transformed variable

78. The transformed variable in the data editor

79. The regression to identify outliers

80. Saving the measures of outliers

81. The variables for identifying outliers

82. Computing the probability for Mahalanobis D²

83. Formula for probability for Mahalanobis D²

84. The multivariate outlier

85. The univariate outlier

86. Omitting the outliers

87. Specifying the condition to omit outliers

88. The formula for omitting outliers

89. Completing the request for the selection

90. The omitted multivariate outlier

91. Running the regression without outliers

92. Opening the save options dialog

93. Clearing the request to save outlier data

94. Opening the statistics options dialog

95. Requesting descriptive statistics

96. Requesting the output

97. Sample size requirement

98. Significance of regression relationship

99. Increase in proportion of variance

100. Impact of assumptions and outliers - 1

101. Impact of assumptions and outliers - 2

102. Impact of assumptions and outliers - 3

103. Impact of assumptions and outliers - 4

104. Impact of assumptions and outliers - 5

