# Quantitative Methods in Social Sciences (E774)

Quantitative Methods in Social Sciences (E774). Sudip Ranjan BASU, Ph.D. 27 November 2009. Model Selection Procedures: selecting explanatory variables for a model (maximum R²), backward elimination (all significant coefficients), forward selection (adding variables), stepwise regression.

### Quantitative Methods in Social Sciences (E774)

Sudip Ranjan BASU, Ph.D.

27 November 2009

Selecting explanatory variables for a model

Maximum R²

Backward elimination

all significant coefficients

Forward selection

adding variables

Stepwise regression

drop variables if they lose their significance as other variables are added

Exploratory vs. Explanatory Research
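The backward-elimination idea above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the data, the function names (`invert`, `ols`, `backward_elimination`) and the rule of thumb |t| >= 2 as the significance cut-off are all hypothetical choices, not taken from the lecture. A statistical package would use exact P-values instead.

```python
import math

def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def ols(columns, y):
    """Fit y on an intercept plus the given predictor columns.
    Returns (coefficients, t statistics), intercept first."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)] for a in range(p)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(p)) for a in range(p)]
    resid = [y[i] - sum(X[i][a] * beta[a] for a in range(p)) for i in range(n)]
    s2 = sum(e * e for e in resid) / (n - p)          # residual variance
    t = [beta[a] / math.sqrt(s2 * inv[a][a]) for a in range(p)]
    return beta, t

def backward_elimination(predictors, y, t_cutoff=2.0):
    """Start with all predictors; repeatedly drop the one with the smallest |t|
    until every remaining coefficient has |t| >= t_cutoff (assumed cut-off)."""
    kept = dict(predictors)
    while kept:
        names = list(kept)
        _, t = ols([kept[name] for name in names], y)
        weakest = min(range(len(names)), key=lambda j: abs(t[j + 1]))
        if abs(t[weakest + 1]) >= t_cutoff:
            break
        del kept[names[weakest]]
    return list(kept)

# Hypothetical data: y depends on x1; x2 is unrelated noise.
predictors = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "x2": [3, 1, 4, 1, 5, 9, 2, 6, 5, 3],
}
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.1, 13.9, 16.2, 18.0, 19.9]
print(backward_elimination(predictors, y))
```

Forward selection works in the opposite direction (start empty, add the most significant variable at each step), and stepwise regression combines the two by re-checking earlier variables after each addition.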

Part 1

Lecture 11-Sudip R. Basu

Regression function follows a linear relationship

Conditional distribution of Y (dependent variable) follows a normal distribution

Homoscedasticity: Conditional distribution of Y has constant standard deviation throughout the range of values of the explanatory variables.

Sample is randomly selected

Examine the residuals

Plotting Residuals against Explanatory variables

Estimation of regression model

Obtain Residuals

Plot residuals against fitted values or against one or more explanatory variables

Breusch-Pagan/Cook-Weisberg test

H0: constant variance

P-value (chi-square test) = 0.00

Reject the constant-variance hypothesis

Significant heteroskedasticity implies that standard errors and hypothesis tests might be invalid
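As a rough illustration of the test above, the sketch below computes a Breusch-Pagan-style statistic for a single explanatory variable: fit the regression, regress the squared residuals on x, and compare n·R² from that auxiliary regression with a chi-square distribution with 1 degree of freedom. The data and function names are hypothetical, and this is the studentized (n·R²) variant, so it will not necessarily reproduce the figure a statistical package reports for the Breusch-Pagan/Cook-Weisberg test.

```python
import math

def ols_simple(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def r_squared(x, y):
    """R-squared of the simple regression of y on x."""
    a, b = ols_simple(x, y)
    my = sum(y) / len(y)
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - my) ** 2 for yi in y)
    return 1 - sse / tss

def breusch_pagan(x, y):
    """LM statistic n*R^2 from regressing squared residuals on x, with its
    chi-square (df = 1) P-value. H0: constant variance."""
    a, b = ols_simple(x, y)
    sq_resid = [(yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)]
    lm = max(len(x) * r_squared(x, sq_resid), 0.0)   # guard against rounding below 0
    p_value = math.erfc(math.sqrt(lm / 2))           # upper tail of chi-square, df = 1
    return lm, p_value

# Hypothetical data in which the spread of y grows with x.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [1.1, 1.9, 3.2, 3.7, 5.6, 5.1, 8.0, 6.5, 10.9, 8.2]
lm, p_value = breusch_pagan(x, y)
print(f"LM = {lm:.3f}, P-value = {p_value:.3f}")
```

A small P-value leads to rejecting the constant-variance hypothesis, as on the slide.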

Remove Outliers

Leverage is a nonnegative statistic such that the larger its value, the greater the weight that observation receives in determining the fitted values

DFFIT: effect on the fit of deleting the observation

The larger its absolute value, the greater the influence that observation has on the fitted values

DFBETA: effect on the model parameter estimates of removing the observation from the dataset

The larger the absolute value, the greater the influence of the observation on the parameter estimates

Cook’s distance: effect that observation i has on all the predicted values
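For a regression with one explanatory variable, leverage and Cook's distance have simple closed forms, sketched below. The data and the function name `influence_measures` are hypothetical; the formulas are the standard ones for simple linear regression (two estimated parameters), not something shown on the slide.

```python
def influence_measures(x, y):
    """Leverage h_i and Cook's distance D_i for the simple regression y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    p = 2                                        # parameters: intercept and slope
    s2 = sum(e * e for e in resid) / (n - p)     # residual variance
    # Leverage grows with the distance of x_i from the mean of x.
    leverage = [1 / n + (xi - mx) ** 2 / sxx for xi in x]
    # Cook's distance combines residual size and leverage.
    cooks = [e * e / (p * s2) * h / (1 - h) ** 2 for e, h in zip(resid, leverage)]
    return leverage, cooks

# Hypothetical data: the last observation lies far from the others in x.
x = [1, 2, 3, 4, 5, 6, 20]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 20.0]
leverage, cooks = influence_measures(x, y)
for i, (h, d) in enumerate(zip(leverage, cooks), start=1):
    print(f"obs {i}: leverage = {h:.3f}, Cook's D = {d:.3f}")
```

Observations that stand out on both measures are the ones worth inspecting before deciding whether to remove them.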

Multicollinearity:

Explanatory variables ‘overlap’ considerably, with high R² values among them

Multicollinearity inflates standard errors

Variance inflation factor:

multiplicative increase in the variance (squared standard error) of the estimator due to x_j being correlated with the other predictors

Response variable (y) is non-normal

y is a discrete binary variable (success or failure)

Logistic regression

y is a discrete count variable (number of children)

Poisson and negative binomial distributions

y is a continuous non-normal variable

Gamma distribution

Part 2: NOT PART OF FINAL EXAM on 7th December

Use of explanatory variables

Y can have a distribution other than the normal

GLM can model a function of the mean

GLM does not need the data to be transformed

Maximum likelihood is applied to fit a GLM

Choose the most appropriate probability distribution for the y variable
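In symbols, the statement that a GLM models a function of the mean can be sketched as follows. The notation (link function g, mean μ = E(y), k explanatory variables) is the usual textbook one and is an assumption here, since no formula survives in the transcript.

```latex
g(\mu) = \alpha + \beta_1 x_1 + \dots + \beta_k x_k, \qquad \mu = E(y)

% Common link functions (illustrative examples):
%   identity link: g(\mu) = \mu                      (normal y, ordinary regression)
%   log link:      g(\mu) = \log \mu                 (count y, Poisson regression)
%   logit link:    g(\mu) = \log\frac{\mu}{1-\mu}    (binary y, logistic regression)
```

The data themselves are left untransformed; only the mean is passed through g.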

If the relationship is nonlinear, then OLS may underestimate the association

Prediction may poorly approximate the true regression curve

Two approaches to handle

Polynomial regression

Loglinear regression

A polynomial regression function, with response variable y and explanatory variable x

Cubic regression model
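The formulas for this slide are missing from the transcript. In the usual textbook notation (assumed here), a polynomial regression function of degree k and its cubic special case can be written as:

```latex
% Polynomial regression function of degree k
E(y) = \alpha + \beta_1 x + \beta_2 x^2 + \dots + \beta_k x^k

% Cubic regression model (k = 3)
E(y) = \alpha + \beta_1 x + \beta_2 x^2 + \beta_3 x^3
```

The model is still linear in the parameters, so it can be fitted by ordinary least squares with x, x² and x³ entered as separate explanatory variables.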

Fitting model without assuming particular functional forms

No (or fewer) assumptions about the functional form and the distribution of y

Plot of a fitted nonparametric regression model to learn about (overall) trends in the data

Generalised additive model: the GLM is the special case in which these functions are linear

Y is an exponential function of x

Taking the logarithm of the exponential function

Interpreting coefficients as ‘multiplicative’, not linear

E(y) changes by the same percentage for each unit increase in x
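A short derivation makes the 'same percentage' statement concrete. The parametrisation E(y) = αβ^x is the common textbook form and is assumed here, as the slide's own formula is not in the transcript.

```latex
% Exponential regression function
E(y) = \alpha \beta^{x}

% Taking logarithms gives a function that is linear in x
\log E(y) = \log\alpha + (\log\beta)\, x

% Effect of a one-unit increase in x: the mean is multiplied by \beta
\frac{E(y \mid x + 1)}{E(y \mid x)} = \frac{\alpha\beta^{x+1}}{\alpha\beta^{x}} = \beta
```

For example, β = 1.10 would mean that E(y) rises by 10% for every one-unit increase in x.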

Y: categorical variable with two possible outcomes

Binary response variable (1 or 0), P(y=1)

Binomial distribution

Linear probability model

Single explanatory variable

Binary response y variable

Curvilinear relationship

Odds ratio

Logistic transformation

Logistic regression model

For β>0, P(y=1) increases as X increases

For β<0, P(y=1) decreases as X increases

For β=0, P(y=1) does not change as X increases

Logistic regression for probabilities
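The behaviour described above can be checked numerically with a minimal sketch. The parameter values and function names are hypothetical; the formula is the standard logistic regression curve P(y=1) = 1 / (1 + e^-(α + βx)).

```python
import math

def logistic_probability(alpha, beta, x):
    """P(y = 1) under a logistic regression model with one explanatory variable."""
    return 1 / (1 + math.exp(-(alpha + beta * x)))

def odds(p):
    """Odds of success, p / (1 - p)."""
    return p / (1 - p)

alpha = -1.0                       # hypothetical intercept
for beta in (0.5, -0.5, 0.0):      # positive, negative and zero slope
    probs = [logistic_probability(alpha, beta, x) for x in range(5)]
    print(f"beta = {beta:+.1f}:", [round(p, 3) for p in probs])

# A one-unit increase in x multiplies the odds by exp(beta).
p0 = logistic_probability(alpha, 0.5, 2)
p1 = logistic_probability(alpha, 0.5, 3)
print(odds(p1) / odds(p0), math.exp(0.5))
```

The probabilities always stay between 0 and 1, which is the main advantage over the linear probability model.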

Categorical data, discrete variable

Each observation falls into one of 2 categories

Probabilities for the two categories are the same for each observation

Category 1: π, Category 2: 1 - π

Outcomes of successive observations are independent

Binomial distribution (BD) is perfectly symmetric if π = 0.50, otherwise skewed

Skewness increases as π gets closer to 0 or 1

Sample proportion

Mean, standard deviation

Binomial test
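The quantities listed above can be computed directly from the binomial formula. The sketch below is illustrative only: the function names and the numbers are hypothetical, and the test shown is a one-sided (upper-tail) exact binomial test, which is one of several possible versions.

```python
import math

def binomial_pmf(k, n, pi):
    """P(X = k) for n independent observations, each with success probability pi."""
    return math.comb(n, k) * pi ** k * (1 - pi) ** (n - k)

def proportion_mean_sd(pi, n):
    """Mean and standard deviation of the sample proportion."""
    return pi, math.sqrt(pi * (1 - pi) / n)

def upper_tail_p_value(k, n, pi0):
    """One-sided exact P-value: P(X >= k) when H0: pi = pi0 is true."""
    return sum(binomial_pmf(j, n, pi0) for j in range(k, n + 1))

# Symmetric when pi = 0.5, skewed when pi is close to 0 or 1.
print([round(binomial_pmf(k, 4, 0.5), 4) for k in range(5)])
print([round(binomial_pmf(k, 4, 0.1), 4) for k in range(5)])

# Hypothetical example: 8 successes in 10 trials, H0: pi = 0.5.
print(proportion_mean_sd(0.5, 10))
print(upper_tail_p_value(8, 10, 0.5))
```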

Logistic regression model (LRM) with multiple predictors

LRM probabilities

For 2 predictors

Odds ratio: effects are additive on the log of odds, multiplicative on the odds

Impact of predictors on probabilities

Bivariate logistic regression

H0: β = 0, x has no effect on P(y=1)

Use the z-distribution, except for small samples

Wald statistic: chi-square distribution, df = 1

Likelihood-ratio test: H0 states that the extra parameters in the full model are equal to 0

LR test statistic:

For large samples, the Wald (W) and LR tests give similar results

For small samples, use the LR test results
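The test statistics referred to above are not reproduced in the transcript. In standard notation (assumed here: β̂ is the estimated coefficient, se its standard error, ℓ0 and ℓ1 the maximised likelihoods of the reduced and full models), they can be written as:

```latex
% z statistic and Wald statistic for H_0: \beta = 0
z = \frac{\hat{\beta}}{se}, \qquad W = z^2 \sim \chi^2_{1}

% Likelihood-ratio test statistic
LR = -2\left(\log \ell_0 - \log \ell_1\right)
% approximately chi-square, with df = number of extra parameters in the full model
```

With a single extra parameter, both W and LR are compared with the chi-square distribution with df = 1.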

Y: categorical variable, modelled using the standard normal probability distribution

Probit score/index

A one-unit increase in x leads to increasing the probit score by ‘b’ standard deviations.

Probit model

Probabilities
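A probit model turns the probit score a + bx into a probability through the standard normal cumulative distribution function. The sketch below uses hypothetical coefficients and a hypothetical function name; it only illustrates the mapping, not how the coefficients are estimated.

```python
from statistics import NormalDist

def probit_probability(a, b, x):
    """P(y = 1) = Phi(a + b*x), where Phi is the standard normal CDF
    and a + b*x is the probit score (index)."""
    return NormalDist().cdf(a + b * x)

a, b = -1.0, 0.5                   # hypothetical coefficients
for x in range(0, 6):
    score = a + b * x              # rises by b for each one-unit increase in x
    print(x, round(score, 2), round(probit_probability(a, b, x), 3))
```

Each one-unit increase in x moves the score by b standard deviations, but the change in probability depends on where on the curve the observation starts.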

Week 8- Week 12

Correlation shows the association between two or more variables

Degree of relationship

Degree of covariability

3 key issues

Does a relationship exist, and how is it measured?

Testing the significance

Exploring cause and effect relation

Correlation:

Positive or negative

Depends on direction of change of variables. If both variables are varying in same direction, then positive correlation.

Simple, partial and multiple

Simple if two variables are studied

Partial/multiple if three or more variables are studied

Linear and non-linear

Depends on the constancy of the ratio of change between the variables. If the amount of change in one variable tends to bear a constant ratio to the amount of change in the other variable, then the relationship is linear.

Scatter diagram

A way to see if two variables are related, represented by a dot chart

Graphic method

Observations of the two variables are plotted, and their direction and closeness are examined

Correlation coefficient (Pearson’s r)

Describes degree of correlation between two variables

Interpreting r (-1 ≤ r ≤ +1)

If r = +1: perfect positive relationship

If r = -1: perfect negative relationship

If r = 0: no linear relationship

Coefficient of determination:

explained variation/total variation (a lower value indicates a poorer fit)

Rank Correlation:

Using orders/ranks of observations rather than actual observations between two variables

Steps to compute r:

Compute deviations of observations of two variables from their respective mean values

Square these deviations and obtain sum of squared deviations

Multiply the deviations of observations of two variables and obtain total

Substitute these values in the formula above:
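The formula itself is missing from the transcript; the standard definition is r = Σ(x - x̄)(y - ȳ) / √[Σ(x - x̄)² · Σ(y - ȳ)²]. The steps listed above map onto it as in the following sketch, where the data and the function name are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient, computed step by step."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Step 1: deviations of the observations from their respective means.
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    # Step 2: sums of squared deviations.
    sum_dx2 = sum(d * d for d in dx)
    sum_dy2 = sum(d * d for d in dy)
    # Step 3: sum of the products of the deviations.
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    # Step 4: substitute into the formula.
    return sum_dxdy / math.sqrt(sum_dx2 * sum_dy2)

# Hypothetical observations of two variables.
hours = [2, 4, 5, 7, 9]
score = [50, 58, 65, 70, 82]
r = pearson_r(hours, score)
print(f"r = {r:.3f}, coefficient of determination = {r * r:.3f}")
```

Squaring r gives the coefficient of determination for the two-variable case.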

A regression function is a mathematical function that describes how the mean of Y changes according to the value of X

β is the regression coefficient

σ is the conditional standard deviation

Estimate

R² of the prediction equation

H0: β = 0 (variables are statistically independent)

Test statistic:

Standard error of b:

Multiple regression function

The slope in the multiple regression function describes the effect of an explanatory variable while controlling for the effects of the other explanatory variables in the model

β1 and β2 are partial regression coefficients

R-squared (between 0 and 1): coefficient of multiple determination

If R² = 1

If R² = 0

Testing collective influence of Xi

Alternative hypothesis

Test statistic-F distribution:

A small P-value for H0: β = 0 suggests that the regression line has a nonzero slope
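The test statistics named in this review appear without their formulas in the transcript. In the usual notation (assumed here: b is the estimated slope, se its standard error, n the sample size, k the number of explanatory variables), they are:

```latex
% Test of H_0: \beta = 0 for a single slope (bivariate regression)
t = \frac{b}{se}, \qquad df = n - 2

% Test of the collective influence of x_1, \dots, x_k
% H_0: \beta_1 = \dots = \beta_k = 0 \quad \text{vs.} \quad H_a: \text{at least one } \beta_i \neq 0
F = \frac{R^2 / k}{(1 - R^2) / [\,n - (k + 1)\,]}, \qquad df_1 = k, \; df_2 = n - (k + 1)
```

A larger R² or a larger sample gives a larger F and therefore a smaller P-value.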

Model Selection Procedures

Selecting explanatory variables for a model: maximum R²

Backward elimination: all significant coefficients

Stepwise regression: drop variables if they lose their significance as other variables are added

Exploratory vs. Explanatory Research

Examine the residuals

Plotting Residuals against Explanatory variables

Heteroskedasticity

Remove Outliers

Leverage is a nonnegative statistic such that the larger its value, the greater the weight that observation receives in determining the fitted values

DFFIT: effect on the fit of deleting the observation

The larger its absolute value, the greater the influence that observation has on the fitted values

DFBETA: effect on the model parameter estimates of removing the observation from the dataset

The larger the absolute value, the greater the influence of the observation on the parameter estimates

Cook’s distance: effect that observation i has on all the predicted values

Multicollinearity: explanatory variables ‘overlap’ considerably, with high R² values among them

Multicollinearity inflates standard errors

Variance inflation factor: multiplicative increase in the variance (squared standard error) of the estimator due to x_j being correlated with the other predictors.

The VIF ranges from 1.0 to infinity. VIFs greater than 10.0 are generally seen as indicative of severe multicollinearity.

Tolerance (1/VIF) ranges from 0.0 to 1.0, with 1.0 being the absence of multicollinearity.
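With only two predictors, the VIF reduces to 1 / (1 - r²), where r is the correlation between them; with more predictors, r² is replaced by the R² from regressing x_j on all the others. The sketch below covers the two-predictor case only, using hypothetical data and function names.

```python
import math

def pearson_r(u, v):
    """Pearson correlation between two lists of equal length."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

def vif_two_predictors(x1, x2):
    """VIF and tolerance when the model has exactly two predictors.
    Undefined (division by zero) if the predictors are perfectly correlated."""
    r = pearson_r(x1, x2)
    vif = 1 / (1 - r * r)
    return vif, 1 / vif

# Hypothetical predictors that overlap almost completely.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]
vif, tolerance = vif_two_predictors(x1, x2)
print(f"VIF = {vif:.1f}, tolerance = {tolerance:.4f}")
if vif > 10:
    print("VIF above 10: severe multicollinearity by the rule of thumb")
```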

• Presentation 1: 4 December 8.15am-10am

• Group # 1-8

• 15 minutes maximum per group @ AJF-Villa Barton

• Presentation 2: 4 December 4.15pm-6pm

• Group # 9-16

• 15 minutes maximum per group @ AJF-Villa Barton

• MDEV Exam: 7 December 4.30pm-5.30pm

• @E1+E2/Bungener- Rothschild

Be happy and enjoy numbers in life…
