
Quantitative Methods in Social Sciences (E774)

Quantitative Methods in Social Sciences (E774). Sudip Ranjan Basu, Ph.D. 27 November 2009. Model selection procedures: selecting explanatory variables for a model by maximum R2, backward elimination (retain only significant coefficients), forward selection (adding variables), and stepwise regression.


Presentation Transcript


  1. Quantitative Methods in Social Sciences (E774) Sudip Ranjan Basu, Ph.D. 27 November 2009

  2. Model Selection Procedures Selecting explanatory variables for a model: maximum R2; backward elimination (retain only significant coefficients); forward selection (add variables one at a time); stepwise regression (drop variables if they lose their significance as other variables are added; a minimal sketch of forward selection follows below). Exploratory vs. explanatory research. Part 1 Lecture 11-Sudip R. Basu 2
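The slide names the procedures without showing one. Below is a minimal forward-selection sketch in Python with statsmodels, run on synthetic data; the 0.05 entry threshold and the p-value entry criterion are illustrative assumptions, not settings prescribed by the course.

```python
# Forward selection: repeatedly add the candidate predictor with the
# smallest p-value, stopping when no candidate is significant at 5%.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 4))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(size=n)

selected, remaining = [], list(range(X.shape[1]))
while remaining:
    pvals = {}
    for j in remaining:
        cols = selected + [j]
        fit = sm.OLS(y, sm.add_constant(X[:, cols])).fit()
        pvals[j] = fit.pvalues[-1]      # p-value of the newest variable
    best = min(pvals, key=pvals.get)
    if pvals[best] > 0.05:              # entry threshold (assumption)
        break
    selected.append(best)
    remaining.remove(best)

print("selected columns:", selected)   # expect columns 0 and 2
```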

  3. Regression Diagnostics Assumptions to check: the regression function follows a linear relationship; the conditional distribution of Y (the dependent variable) is normal; homoscedasticity: the conditional distribution of Y has constant standard deviation throughout the range of values of the explanatory variables; the sample is randomly selected. Lecture 11-Sudip R. Basu 3

  4. Checking Residuals Examine the residuals: plot residuals against the explanatory variables. Lecture 11-Sudip R. Basu 4

  5. Heteroskedasticity Estimate the regression model and obtain the residuals. Plot the residuals against the fitted values or one or more explanatory variables. Breusch-Pagan/Cook-Weisberg test, H0: constant variance. A p-value (chi-square test) of 0.00 rejects the constant-variance hypothesis. Significant heteroskedasticity implies that the standard errors and hypothesis tests may be invalid (a sketch of the test follows below). Lecture 11-Sudip R. Basu 5
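The slide reports Stata's Breusch-Pagan/Cook-Weisberg output; the same test is available in statsmodels. A minimal sketch on synthetic data whose error variance grows with x (the data-generating process is invented for illustration):

```python
# Breusch-Pagan test: regress the squared residuals on the regressors
# and test whether they explain any of the residual variance.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=300)
y = 2 + 0.5 * x + rng.normal(scale=0.3 * x)   # error spread grows with x

X = sm.add_constant(x)
res = sm.OLS(y, X).fit()
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(res.resid, X)
print(f"LM statistic = {lm_stat:.2f}, p-value = {lm_pvalue:.4f}")
# A small p-value rejects H0 (constant variance): heteroskedasticity present.
```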

  6. Outliers: Influential Observations? Remove outliers? Leverage is a nonnegative statistic: the larger its value, the greater the weight that observation receives in determining its own fitted value. DFFIT: the effect on the fit of deleting an observation; the larger its absolute value, the greater the influence that observation has on the fitted values. DFBETA: the effect on the model parameter estimates of removing an observation from the dataset; the larger the absolute value, the greater the influence of the observation on the parameter estimates. Cook's distance: the effect that observation i has on all the predicted values. (These statistics are computed together in the sketch below.) Lecture 11-Sudip R. Basu 6
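All four influence measures named on the slide are available from a single statsmodels object. A minimal sketch on synthetic data with one planted outlier:

```python
# Influence diagnostics for one OLS fit: leverage, DFFITS, DFBETAS,
# and Cook's distance, via statsmodels' OLSInfluence.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.normal(size=50)
y = 1 + 2 * x + rng.normal(size=50)
y[0] += 8                                   # plant one outlier

res = sm.OLS(y, sm.add_constant(x)).fit()
infl = res.get_influence()

leverage = infl.hat_matrix_diag             # weight of each obs in its own fit
dffits, _ = infl.dffits                     # change in fitted value if dropped
dfbetas = infl.dfbetas                      # change in each coefficient if dropped
cooks_d, _ = infl.cooks_distance            # overall effect on all fitted values

print("most influential observation:", int(np.argmax(cooks_d)))
```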

  7. Multicollinearity Explanatory variables 'overlap' considerably, giving high R2 values. Multicollinearity inflates standard errors. Variance inflation factor: the multiplicative increase in the variance (squared standard error) of the estimator due to xj being correlated with the other predictors (computed in the sketch below). Lecture 11-Sudip R. Basu 7
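A minimal sketch of the VIF on synthetic data, using statsmodels' variance_inflation_factor; the near-collinear pair x1, x2 is constructed deliberately:

```python
# Variance inflation factor: VIF_j = 1 / (1 - R_j^2), where R_j^2 comes
# from regressing x_j on the other predictors.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)   # nearly collinear with x1
x3 = rng.normal(size=200)

X = sm.add_constant(np.column_stack([x1, x2, x3]))
for j, name in enumerate(["x1", "x2", "x3"], start=1):  # skip the constant
    print(name, "VIF =", round(variance_inflation_factor(X, j), 1))
# x1 and x2 show very large VIFs (>> 10); x3 stays near 1.
```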

  8. Generalised Linear Model The response variable (y) is non-normal. If y is a discrete binary variable (success or failure): logistic regression. If y is a discrete count variable (e.g. number of children): Poisson or negative binomial distribution. If y is a continuous non-normal variable: gamma distribution. Part 2: NOT PART OF FINAL EXAM on 7th December Lecture 11-Sudip R. Basu 8

  9. Link function The link function is g(μ), relating the mean of y to the explanatory variables: identity link, log link, logit link (see the sketch below). Lecture 11-Sudip R. Basu 9
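A minimal sketch of two GLMs in statsmodels, on synthetic data: Poisson counts with the (default) log link and a binary outcome with the (default) logit link. The coefficients used to generate the data are arbitrary.

```python
# GLM: the link g relates the mean to the linear predictor, g(mu) = a + b*x.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.uniform(0, 2, size=300)
X = sm.add_constant(x)

# Poisson counts with a log link: log E(y) = a + b*x
counts = rng.poisson(np.exp(0.5 + 0.8 * x))
pois = sm.GLM(counts, X, family=sm.families.Poisson()).fit()

# Binary outcome with a logit link: logit P(y=1) = a + b*x
p = 1 / (1 + np.exp(-(-1 + 1.5 * x)))
binary = rng.binomial(1, p)
logit = sm.GLM(binary, X, family=sm.families.Binomial()).fit()

print(pois.params, logit.params)   # estimates near the generating values
```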

  10. OLS as a special case of GLM In a GLM, y can have a distribution other than the normal; the GLM models a function of the mean; the GLM does not need to transform the data. Maximum likelihood is applied to fit the GLM: choose the most appropriate probability distribution for the y variable. Lecture 11-Sudip R. Basu 10

  11. Nonlinear relationship If the data are nonlinear, an OLS straight line may misestimate the relationship, and its predictions may poorly approximate the true regression curve. Two approaches to handle this: polynomial regression and loglinear regression. Lecture 11-Sudip R. Basu 11

  12. Quadratic regression models A polynomial regression function, with y the response and x the explanatory variable. Quadratic regression model: E(y) = α + β1x + β2x². Cubic regression model: E(y) = α + β1x + β2x² + β3x³. (Fitted by OLS in the sketch below.) Lecture 11-Sudip R. Basu 12
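A minimal sketch on synthetic data: the quadratic model is still linear in its parameters, so adding x² as a column and running OLS is all that is needed.

```python
# Quadratic regression via OLS with an added x^2 column.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
x = rng.uniform(-3, 3, size=200)
y = 1 + 0.5 * x - 0.8 * x**2 + rng.normal(size=200)

X = sm.add_constant(np.column_stack([x, x**2]))
res = sm.OLS(y, X).fit()
print(res.params)   # estimates of alpha, beta1, beta2
```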

  13. Nonparametric regression Fit a model without assuming a particular functional form: no (or fewer) assumptions about the functional form and the distribution of y. Plot the fitted nonparametric regression to learn about (overall) trends in the data (see the LOWESS sketch below). Generalised additive models generalise this idea; the GLM is the special case in which these functions are linear. Lecture 11-Sudip R. Basu 13
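One common nonparametric smoother is LOWESS, available in statsmodels; a minimal sketch on synthetic data with a sine-shaped trend (the choice of smoother and of frac are illustrative assumptions):

```python
# LOWESS: a nonparametric smoother that fits the trend locally,
# with no global functional form assumed.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
x = np.sort(rng.uniform(0, 10, size=300))
y = np.sin(x) + rng.normal(scale=0.3, size=300)

# frac is the share of the data used in each local fit (a tuning choice)
smoothed = sm.nonparametric.lowess(y, x, frac=0.3)
print(smoothed[:5])   # columns: sorted x, smoothed estimate of E(y|x)
```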

  14. Exponential regression E(y) is an exponential function of x; taking the logarithm of the exponential function gives a linear model. The coefficients are interpreted as 'multiplicative', not linear: E(y) changes by the same percentage for each unit increase in x (see the sketch below). Lecture 11-Sudip R. Basu 14
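A minimal sketch of the log-transform approach on synthetic data; the multiplicative error here is an assumption made so that taking logs yields an additive OLS model:

```python
# Exponential regression E(y) = alpha * exp(beta*x): taking logs gives a
# linear model, and exp(beta) is the multiplicative effect of one unit of x.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
x = rng.uniform(0, 5, size=200)
y = 2.0 * np.exp(0.4 * x) * rng.lognormal(sigma=0.2, size=200)

res = sm.OLS(np.log(y), sm.add_constant(x)).fit()
b = res.params[1]
print(f"each unit of x multiplies E(y) by about {np.exp(b):.2f} "
      f"({100 * (np.exp(b) - 1):.0f}% change)")
```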

  15. Logistic Regression Y is a categorical variable with two possible outcomes: a binary response variable (1 or 0), modelling P(y=1). The counts follow a binomial distribution. Linear probability model with a single explanatory variable. Lecture 11-Sudip R. Basu 15

  16. Binary response y variable The relationship is curvilinear. Odds ratio; logistic transformation; logistic regression model. For β>0, P(y=1) increases as x increases; for β<0, P(y=1) decreases as x increases; for β=0, P(y=1) does not change as x increases. Logistic regression for probabilities (see the sketch below). Lecture 11-Sudip R. Basu 16
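A minimal sketch of bivariate logistic regression in statsmodels, on synthetic data; the generating coefficients are arbitrary:

```python
# Bivariate logistic regression: logit P(y=1) = alpha + beta*x.
# exp(beta) is the multiplicative change in the odds per unit of x.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
x = rng.normal(size=500)
p = 1 / (1 + np.exp(-(-0.5 + 1.2 * x)))
y = rng.binomial(1, p)

res = sm.Logit(y, sm.add_constant(x)).fit(disp=False)
alpha, beta = res.params
print("odds ratio per unit of x:", round(np.exp(beta), 2))
# Fitted probability at x = 1:
print("P(y=1 | x=1):", round(1 / (1 + np.exp(-(alpha + beta))), 2))
```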

  17. Binomial probability distribution Categorical data, discrete variable. Each observation falls into one of 2 categories; the probabilities of the two categories are the same for each observation (category 1: π, category 2: 1−π); the outcomes of successive observations are independent. Lecture 11-Sudip R. Basu 17

  18. Properties of binomial distributions The binomial distribution is perfectly symmetric if π=0.50, otherwise skewed; skewness increases as π gets closer to 0 or 1. The sample proportion has mean π and standard deviation √(π(1−π)/n). Binomial test (see the sketch below). Lecture 11-Sudip R. Basu 18
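A minimal sketch of the binomial test using scipy's binomtest; the counts (36 successes in 50 trials) are invented for illustration:

```python
# Binomial test: is the observed proportion consistent with pi = 0.50?
import math
from scipy.stats import binomtest

k, n = 36, 50                                # illustrative counts
result = binomtest(k=k, n=n, p=0.50)
print("sample proportion:", k / n)
print("sd of the proportion under H0:", round(math.sqrt(0.5 * 0.5 / n), 3))
print("p-value:", round(result.pvalue, 4))   # small -> reject pi = 0.50
```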

  19. Multiple Logistic Regression The logistic regression model with multiple predictors; LRM probabilities for 2 predictors. Odds ratio: the log of the odds is linear in the predictors, so effects are multiplicative on the odds scale. Probabilities show the impact of the predictors. Lecture 11-Sudip R. Basu 19

  20. Inference for LRM Bivariate logistic regression, H0: β=0, x has no effect on P(y=1). Use the z-distribution, except for small samples. Wald statistic: chi-square distribution, df=1. Likelihood-ratio test: the extra parameters in the full model are equal to 0; LR test statistic: −2(log L0 − log L1). For large samples, the Wald and LR tests give similar results; for small samples, use the LR test results (see the sketch below). Lecture 11-Sudip R. Basu 20
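A minimal sketch of the likelihood-ratio test for a logistic model, comparing an intercept-only fit against the full fit on a small synthetic sample:

```python
# Likelihood-ratio test: LR = -2(logL0 - logL1), chi-square distributed
# with df = number of extra parameters in the full model.
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(8)
x = rng.normal(size=60)                     # small sample on purpose
y = rng.binomial(1, 1 / (1 + np.exp(-0.8 * x)))

full = sm.Logit(y, sm.add_constant(x)).fit(disp=False)
null = sm.Logit(y, np.ones_like(x)).fit(disp=False)   # intercept only

lr = -2 * (null.llf - full.llf)
print("LR =", round(lr, 2), " p =", round(chi2.sf(lr, df=1), 4))
```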

  21. Probit Regression Y is a categorical variable, modelled through the standard normal probability distribution. Probit score/index: a one-unit increase in x increases the probit score by b standard deviations. The probit model converts scores into probabilities (see the sketch below). Lecture 11-Sudip R. Basu 21
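A minimal sketch of probit regression in statsmodels, on synthetic data; the generating coefficients are arbitrary:

```python
# Probit regression: P(y=1) = Phi(a + b*x), with Phi the standard normal
# CDF; a one-unit rise in x shifts the probit score by b.
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

rng = np.random.default_rng(9)
x = rng.normal(size=500)
y = rng.binomial(1, norm.cdf(-0.3 + 0.9 * x))

res = sm.Probit(y, sm.add_constant(x)).fit(disp=False)
a, b = res.params
print("probit score at x=1:", round(a + b, 2))
print("P(y=1 | x=1):", round(norm.cdf(a + b), 2))
```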

  22. Wrap-up: Week 8 to Week 12 Lecture 11-Sudip R. Basu

  23. Correlation Theory Correlation shows the association between two or more variables: the degree of relationship, the degree of covariability. 3 key issues: does a relationship exist, and how can it be measured; testing its significance; exploring the cause-and-effect relation. Correlation: positive or negative, depending on the direction of change of the variables; if both variables vary in the same direction, the correlation is positive. Simple, partial and multiple: simple if two variables are studied; partial/multiple if three or more variables are studied. Linear and non-linear: depends on the constancy of the ratio of change between the variables; if the amount of change in one variable tends to bear a constant ratio to the amount of change in the other variable, the relationship is linear. Scatter diagram: a way to see if two variables are related, represented by a dot chart. Graphic method: the observations on the two variables are plotted and examined for their direction and closeness. Lecture 11-Sudip R. Basu

  24. Correlation coefficient The correlation coefficient (Pearson's r) describes the degree of correlation between two variables. Interpreting r (−1 ≤ r ≤ +1): if r=+1, perfect positive relationship; if r=−1, perfect negative relationship; if r=0, no relationship. Coefficient of determination: explained variation/total variation (a low value indicates a poor fit). Rank correlation: uses the orders/ranks of the observations rather than the actual observations on the two variables. Steps to compute r: compute the deviations of the observations on the two variables from their respective means; square these deviations and obtain the sums of squared deviations; multiply the deviations of the two variables pairwise and obtain the total; substitute these values in the formula r = Σ(x−x̄)(y−ȳ) / √(Σ(x−x̄)² Σ(y−ȳ)²). (These steps are carried out in the sketch below.) Lecture 11-Sudip R. Basu
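A minimal sketch that follows the slide's computation steps on synthetic data and checks the result against scipy's built-in:

```python
# Pearson's r from the deviation sums, checked against scipy.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(10)
x = rng.normal(size=100)
y = 0.7 * x + rng.normal(scale=0.5, size=100)

dx, dy = x - x.mean(), y - y.mean()          # deviations from the means
r = (dx * dy).sum() / np.sqrt((dx**2).sum() * (dy**2).sum())

r_scipy, _ = pearsonr(x, y)
print("r by hand:", round(r, 3))
print("r (scipy):", round(r_scipy, 3))
print("coefficient of determination r^2:", round(r**2, 3))
```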

  25. Linear Regression Model The regression function is a mathematical function that describes how the mean of Y changes according to the value of X. β is the regression coefficient; σ is the conditional standard deviation. Estimate R2 of the predicted equation. H0: β=0, the variables are statistically independent; test statistic: t = b/se(b), using the standard error of b. Multiple regression function: a slope describes the effect of an explanatory variable while controlling for the effects of the other explanatory variables in the model; β1 and β2 are partial regression coefficients. R-squared lies in (0,1) and is the coefficient of multiple determination: R2=1 means the model explains all the variation in y; R2=0 means it explains none. Testing the collective influence of the Xi: the alternative hypothesis is that at least one slope is nonzero; the test statistic follows an F distribution. A small P-value for H0: β=0 means the regression line has a nonzero slope. (See the sketch below.) Lecture 11-Sudip R. Basu
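A minimal sketch in statsmodels tying these pieces together on synthetic data: partial slopes, their standard errors and t statistics, R2, and the overall F test:

```python
# Multiple regression: partial slopes, R^2, t tests for each beta, and
# the overall F test of H0: all slopes are zero.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
X = rng.normal(size=(200, 2))
y = 1 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(size=200)

res = sm.OLS(y, sm.add_constant(X)).fit()
print("partial slopes:", res.params[1:])
print("standard errors:", res.bse[1:])
print("t statistics:", res.tvalues[1:])
print("R-squared:", round(res.rsquared, 3))
print("F statistic:", round(res.fvalue, 1), " p =", res.f_pvalue)
```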

  26. Regression Diagnostics Model selection procedures: selecting explanatory variables for a model by maximum R2; backward elimination (all significant coefficients); forward selection (adding variables); stepwise regression (drop variables if they lose their significance as other variables are added). Exploratory vs. explanatory research. Examine the residuals; plot residuals against the explanatory variables. Heteroskedasticity. Lecture 11-Sudip R. Basu

  27. Detecting Influential (outlier) Observations Remove outliers? Leverage is a nonnegative statistic: the larger its value, the greater the weight that observation receives in determining its own fitted value. DFFIT: the effect on the fit of deleting an observation; the larger its absolute value, the greater the influence that observation has on the fitted values. DFBETA: the effect on the model parameter estimates of removing an observation from the dataset; the larger the absolute value, the greater the influence of the observation on the parameter estimates. Cook's distance: the effect that observation i has on all the predicted values. Lecture 11-Sudip R. Basu

  28. Effects of multicollinearity Multicollinearity: explanatory variables 'overlap' considerably, giving high R2 values. Multicollinearity inflates standard errors. Variance inflation factor: the multiplicative increase in the variance (squared standard error) of the estimator due to xj being correlated with the other predictors. The VIF ranges from 1.0 to infinity; VIFs greater than 10.0 are generally seen as indicative of severe multicollinearity. Tolerance (1/VIF) ranges from 0.0 to 1.0, with 1.0 indicating the absence of multicollinearity. Lecture 11-Sudip R. Basu

  29. Wish you all the very best!! • Presentation 1: 4 December, 8.15am-10am • Group # 1-8 • 15 minutes maximum per group @ AJF-Villa Barton • Presentation 2: 4 December, 4.15pm-6pm • Group # 9-16 • 15 minutes maximum per group @ AJF-Villa Barton • MDEV Exam: 7 December, 4.30pm-5.30pm • @ E1+E2/Bungener-Rothschild Be happy and enjoy numbers in life…. Lecture 11-Sudip R. Basu

  30. Lecture 2-Sudip R. Basu
