
Quantitative Methods in Social Sciences (E774)



Quantitative Methods in Social Sciences (E774). Sudip Ranjan BASU, Ph.D. 27 November 2009. Model Selection Procedures. Selecting explanatory variables for a model: maximum R², backward elimination (all significant coefficients), forward selection (adding variables), stepwise regression.


Quantitative Methods in Social Sciences (E774)

Sudip Ranjan BASU, Ph.D

27 November 2009


Model Selection Procedures

Selecting explanatory variables for a model

Maximum R²

Backward elimination

all significant coefficients

Forward selection

adding variables

Stepwise regression

drop variables if they lose their significance as other variables are added
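A minimal sketch of forward selection, using only the Python standard library. It is an illustration under stated assumptions, not the procedure used in the lecture: variables are added by the gain in R² rather than by significance tests, the stopping threshold `min_gain` is hypothetical, and the data are made up.

```python
def solve(a, b):
    """Solve a*z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


def r_squared(columns, y):
    """R-squared of the least-squares fit of y on the columns plus an intercept."""
    n = len(y)
    design = [[1.0] * n] + columns
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in design] for ci in design]
    xty = [sum(u * yi for u, yi in zip(ci, y)) for ci in design]
    coef = solve(xtx, xty)
    fitted = [sum(c * col[i] for c, col in zip(coef, design)) for i in range(n)]
    mean_y = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    tss = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - sse / tss


def forward_selection(predictors, y, min_gain=0.01):
    """Add one variable at a time while R-squared rises by at least min_gain."""
    selected, best_r2 = [], 0.0
    remaining = dict(predictors)
    while remaining:
        scores = {name: r_squared([predictors[s] for s in selected] + [col], y)
                  for name, col in remaining.items()}
        name = max(scores, key=scores.get)
        if scores[name] - best_r2 < min_gain:
            break
        selected.append(name)
        best_r2 = scores[name]
        del remaining[name]
    return selected, best_r2


# Hypothetical data: y depends on x1 only; x2 and x3 are candidate extras.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 7]
x3 = [5, 3, 6, 2, 7, 1, 8, 4]
noise = [0.1, -0.2, 0.15, -0.05, 0.2, -0.1, -0.15, 0.05]
y = [3 + 2 * a + e for a, e in zip(x1, noise)]

selected, best_r2 = forward_selection({"x1": x1, "x2": x2, "x3": x3}, y)
```

Backward elimination would run the same loop in reverse, starting from all variables and dropping the one that contributes least.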

Exploratory vs. Explanatory Research

Part 1

Lecture 11-Sudip R. Basu

Regression Diagnostics

Regression function follows a linear relationship

Conditional distribution of Y (dependent variable) follows a normal distribution

Homoscedasticity: Conditional distribution of Y has constant standard deviation throughout the range of values of the explanatory variables.

Sample is randomly selected

Checking Residuals

Examine the residuals

Plotting Residuals against Explanatory variables

Heteroskedasticity

Estimation of regression model

Obtain Residuals

Plotting Residuals against Fitted values or one or more Explanatory variables

Breusch-Pagan/Cook-Weisberg test

H0: constant variance

P-value (chi-square test) = 0.00

Reject the constant variance hypothesis

Significant heteroskedasticity implies that the standard errors (SE) and tests of H0 might be invalid
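The sequence on this slide (estimate the model, obtain residuals, test H0 of constant variance) can be sketched for a single explanatory variable. This is an illustration under assumptions: it uses Koenker's studentized form of the Breusch-Pagan statistic (n times the R² from regressing the squared residuals on x), which may differ from the variant used for the lecture output, and both data sets are made up.

```python
import math


def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def squared_correlation(x, y):
    """R-squared of a one-predictor regression (the squared Pearson r)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy * sxy / (sxx * syy)


def breusch_pagan(x, y):
    """Return (LM statistic, p-value) for H0: constant error variance."""
    a, b = fit_line(x, y)
    resid_sq = [(yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)]
    lm = len(x) * squared_correlation(x, resid_sq)
    # Upper tail of a chi-square distribution with df = 1.
    p_value = math.erfc(math.sqrt(lm / 2))
    return lm, p_value


x = list(range(1, 21))
# Hypothetical data: the error size grows with x (non-constant variance).
y_hetero = [2 + 3 * xi + (-1) ** xi * 0.5 * xi for xi in x]
# Hypothetical data: errors of constant size.
y_homo = [2 + 3 * xi + (-1) ** xi * 2 for xi in x]

lm_hetero, p_hetero = breusch_pagan(x, y_hetero)
lm_homo, p_homo = breusch_pagan(x, y_homo)
```

A small p-value, as for `y_hetero`, leads to rejecting constant variance; a large one, as for `y_homo`, does not.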

Outliers: Influential Observations?

Remove Outliers

Leverage is a nonnegative statistic: the larger its value, the greater the weight that observation receives in determining the predicted values

DFFIT: effect on the fit of deleting the observation

The larger its absolute value, the greater the influence that observation has on the fitted values

DFBETA: effect on the model parameter estimates of removing observation from dataset

The larger the absolute value, the greater the influence of the observations on the parameter estimates

Cook’s distance: effect that observation i has on all the predicted values
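Two of these diagnostics, leverage and Cook's distance, can be sketched for a regression with one explanatory variable. The formulas are the standard textbook ones (leverage h = 1/n + (x - mean)²/Sxx; Cook's D = e²h / (p s² (1 - h)²) with p = 2 parameters); the data are hypothetical, with the last point placed far from the others.

```python
def influence_simple(x, y):
    """Leverage and Cook's distance for the fit y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    p = 2  # parameters estimated: intercept and slope
    s2 = sum(e * e for e in resid) / (n - p)  # residual variance
    leverage = [1 / n + (xi - mx) ** 2 / sxx for xi in x]
    cooks = [e * e * h / (p * s2 * (1 - h) ** 2)
             for e, h in zip(resid, leverage)]
    return leverage, cooks


# Hypothetical data: the last observation is far out in x and off the trend.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.8, 16.1, 18.0, 25.0]

leverage, cooks = influence_simple(x, y)
most_influential = max(range(len(x)), key=lambda i: cooks[i])
```

The leverages always sum to the number of parameters, so values well above p/n flag observations worth checking before deciding whether to remove them.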

Multicollinearity:

Explanatory variables ‘overlap’ considerably, with higher R² values

Multicollinearity inflates standard errors

Variance inflation factor:

multiplicative increase in the variance (squared se) of the estimator due to xj being correlated with other predictors

Generalised Linear Model

Response variable (y) is non-normal

y is a discrete binary variable (success or failure)

Logistic regression

y is a discrete count variable (e.g. number of children)

Poisson and negative binomial distributions

y is a continuous non-normal variable

Gamma distribution

Part 2: NOT PART OF FINAL EXAM on 7th December

Link function

Link function is g(μ)

Identity link

Log link

Logistic link

Use of explanatory variables
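The three links named on this slide can be written as short functions of the mean μ. This is a sketch: the function names are illustrative, and "logistic link" is taken to mean the logit, log(μ / (1 - μ)). In a GLM the explanatory variables enter through g(μ) = α + βx.

```python
import math


def identity_link(mu):
    """g(mu) = mu, the link behind ordinary regression."""
    return mu


def log_link(mu):
    """g(mu) = log(mu), used for counts (mu > 0)."""
    return math.log(mu)


def logit_link(mu):
    """g(mu) = log(mu / (1 - mu)), used for probabilities (0 < mu < 1)."""
    return math.log(mu / (1 - mu))


def inverse_logit(eta):
    """Map a linear predictor eta = alpha + beta*x back to a probability."""
    return 1 / (1 + math.exp(-eta))
```

For example, `logit_link(0.5)` is 0, and `inverse_logit(logit_link(0.8))` returns 0.8 up to rounding.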

OLS as special case of GLM

Y can have a distribution other than the normal

GLM can model a function of the mean

With a GLM there is no need to transform the data

Maximum likelihood applied to GLM

Choose most appropriate probability distribution of y variable

Nonlinear relationship

If the relationship is nonlinear, a straight-line OLS fit may underestimate it

Prediction may poorly approximate the true regression curve

Two approaches to handle this

Polynomial regression

Loglinear regression

Quadratic regression models

A polynomial regression function, with y as the response and x as the explanatory variable

Quadratic regression model

Cubic regression model
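A sketch of fitting a polynomial regression function by least squares, assuming the usual forms E(y) = α + β1x + β2x² (quadratic) and E(y) = α + β1x + β2x² + β3x³ (cubic). The helper names and the data are illustrative; the solver uses the normal equations, which is adequate for small examples like this one.

```python
def solve(a, b):
    """Solve a*z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


def polyfit(x, y, degree):
    """Least-squares coefficients [alpha, beta1, ..., beta_degree]."""
    cols = [[xi ** k for xi in x] for k in range(degree + 1)]
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * yi for u, yi in zip(ci, y)) for ci in cols]
    return solve(xtx, xty)


# Hypothetical data lying exactly on y = 1 + 2x - 0.5x^2.
x = [0, 1, 2, 3, 4, 5]
y = [1 + 2 * xi - 0.5 * xi ** 2 for xi in x]

quadratic = polyfit(x, y, 2)  # recovers alpha, beta1, beta2
cubic = polyfit(x, y, 3)      # the extra cubic term comes out near zero
```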

Nonparametric regression

Fitting model without assuming particular functional forms

No (or fewer) assumptions about the functional form and the distribution of y

Plot of a fitted nonparametric regression model to learn about (overall) trends in data

Generalised additive model: the GLM is the special case in which these functions are linear

Exponential regression

Y is an exponential function

Taking logarithm of exponential function

Coefficients are interpreted as ‘multiplicative’, not linear, effects

E(y) changes by the same percentage for each unit increase in X
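A sketch of the log transformation described here, assuming the exponential form E(y) = α·β^x, so that log E(y) = log α + (log β)·x. Fitting a straight line to log(y) is a simplification of the GLM approach with a log link, and the data are hypothetical (5% growth per unit of x).

```python
import math


def fit_exponential(x, y):
    """Fit y = alpha * beta**x by least squares on log(y); return (alpha, beta)."""
    logs = [math.log(yi) for yi in y]
    n = len(x)
    mx, ml = sum(x) / n, sum(logs) / n
    slope = (sum((xi - mx) * (li - ml) for xi, li in zip(x, logs))
             / sum((xi - mx) ** 2 for xi in x))
    intercept = ml - slope * mx
    # Back-transform from the log scale to the multiplicative scale.
    return math.exp(intercept), math.exp(slope)


# Hypothetical data: y starts at 100 and grows by 5% per unit of x.
x = [0, 1, 2, 3, 4]
y = [100 * 1.05 ** xi for xi in x]

alpha, beta = fit_exponential(x, y)
percent_change = (beta - 1) * 100  # percentage change in E(y) per unit of x
```

Here β is the factor by which E(y) is multiplied for each one-unit increase in x, which is why the change is the same percentage everywhere.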

Logistic Regression

Y: categorical variable with two possible outcomes

Binary response variable (1 or 0), P(y=1)

Binomial distribution

Linear probability model

Single explanatory variable

Binary response y variable

Curvilinear relationship

Odds ratio

Logistic transformation

Logistic regression model

For β>0, P(y=1) increases as X increases

For β<0, P(y=1) decreases as X increases

For β=0, P(y=1) does not change as X increases

Logistic regression for probabilities
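The three cases for the sign of β can be checked numerically, assuming the standard logistic regression form P(y=1) = exp(α + βx) / (1 + exp(α + βx)). The parameter values below are hypothetical.

```python
import math


def logistic_probability(alpha, beta, x):
    """P(y=1) under the logistic regression model logit(P) = alpha + beta*x."""
    return 1 / (1 + math.exp(-(alpha + beta * x)))


xs = [0, 1, 2, 3]
rising = [logistic_probability(-1.0, 0.8, x) for x in xs]    # beta > 0
falling = [logistic_probability(-1.0, -0.8, x) for x in xs]  # beta < 0
flat = [logistic_probability(-1.0, 0.0, x) for x in xs]      # beta = 0
```

Unlike the linear probability model, these values stay between 0 and 1 for any x, which is the reason for the curvilinear form.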

Binomial probability distribution

Categorical data, discrete variable

Each observation falls into 2 categories

Probabilities for the two categories are the same for each observation

Category 1: π, Category 2: 1 − π

Outcomes of successive observations are independent

Properties of binomial distributions

The binomial distribution (BD) is perfectly symmetric if π = 0.50, otherwise skewed

Skewness increases as π gets closer to 0 or 1

Sample proportion

Mean, standard deviation

Binomial test
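These properties can be verified directly from the binomial probabilities. The sketch below works with the count of observations in category 1 out of n, for which the mean is nπ and the standard deviation is √(nπ(1 − π)); n = 10 is an arbitrary choice for illustration.

```python
import math


def binomial_pmf(n, pi):
    """Probabilities of 0, 1, ..., n observations falling in category 1."""
    return [math.comb(n, k) * pi ** k * (1 - pi) ** (n - k) for k in range(n + 1)]


def binomial_moments(n, pi):
    """Mean, standard deviation and skewness computed from the probabilities."""
    pmf = binomial_pmf(n, pi)
    mean = sum(k * p for k, p in enumerate(pmf))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
    skew = sum((k - mean) ** 3 * p for k, p in enumerate(pmf)) / var ** 1.5
    return mean, math.sqrt(var), skew


mean_50, sd_50, skew_50 = binomial_moments(10, 0.5)
mean_30, sd_30, skew_30 = binomial_moments(10, 0.3)
mean_10, sd_10, skew_10 = binomial_moments(10, 0.1)
```

The skewness is zero at π = 0.5 and grows as π moves toward 0, matching the first two bullets.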

Multiple Logistic Regression

Logistic regression model (LRM) with multiple predictors

LRM probabilities

For 2 predictors

Odds ratio: log of odds, multiplicative effects

Impact of predictors on the probabilities
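A sketch of the multiplicative interpretation for two predictors, assuming the standard form logit[P(y=1)] = α + β1x1 + β2x2. The coefficient values are hypothetical. Raising x1 by one unit while holding x2 fixed multiplies the odds by exp(β1).

```python
import math


def probability(alpha, betas, xs):
    """P(y=1) for a logistic regression model with several predictors."""
    eta = alpha + sum(b * x for b, x in zip(betas, xs))
    return 1 / (1 + math.exp(-eta))


def odds(p):
    """Odds of the outcome: P(y=1) / P(y=0)."""
    return p / (1 - p)


alpha, betas = -2.0, [0.7, -0.3]  # hypothetical coefficients

p_before = probability(alpha, betas, [1.0, 2.0])
p_after = probability(alpha, betas, [2.0, 2.0])  # x1 raised by one unit
odds_ratio = odds(p_after) / odds(p_before)      # equals exp(beta1)
```

The odds ratio is the same wherever the one-unit step is taken, whereas the change in the probability itself depends on the starting values of x1 and x2.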

Inference for LRM

Bivariate logistic regression

H0: β = 0, x has no effect on P(y=1)

Use z-distribution, except for small samples

Wald statistic, chi-square distribution, df=1

Likelihood-ratio test: H0 is that the extra parameters in the full model equal 0

LR Test statistic:

For large samples, the Wald (W) and LR tests give similar results

For small samples, use the LR test results
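A sketch of both tests for the bivariate case, assuming the usual definitions: Wald z = b / se with z² referred to a chi-square distribution with df = 1, and LR statistic = -2(log L0 - log L1), where L0 and L1 are the maximized likelihoods without and with x. The estimate, standard error and log-likelihoods are hypothetical inputs, not results from the lecture.

```python
import math


def chi2_df1_pvalue(stat):
    """Upper-tail probability of a chi-square distribution with df = 1."""
    return math.erfc(math.sqrt(stat / 2))


def wald_test(b, se):
    """Return (z, Wald chi-square statistic, p-value) for H0: beta = 0."""
    z = b / se
    return z, z * z, chi2_df1_pvalue(z * z)


def likelihood_ratio_test(loglik_null, loglik_full):
    """Return (LR statistic, p-value), assuming one extra parameter (df = 1)."""
    stat = -2 * (loglik_null - loglik_full)
    return stat, chi2_df1_pvalue(stat)


# Hypothetical fitted values.
z, wald_stat, wald_p = wald_test(b=0.8, se=0.4)
lr_stat, lr_p = likelihood_ratio_test(loglik_null=-50.0, loglik_full=-47.5)
```

With more than one extra parameter in the full model, the LR statistic would be compared with a chi-square distribution whose df equals the number of extra parameters, which this helper does not cover.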

Probit Regression

Y: categorical variable, with probabilities modelled through the standard normal probability distribution

Probit score/index

A one-unit increase in x leads to increasing the probit score by ‘b’ standard deviations.

Probit model

Probabilities
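A sketch of how a probit score turns into a probability, assuming the standard form P(y=1) = Φ(a + b·x), where Φ is the standard normal cumulative distribution function. The coefficients are hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def probit_score(a, b, x):
    """Probit index a + b*x, measured in standard deviations."""
    return a + b * x


def probit_probability(a, b, x):
    """P(y=1) = Phi(a + b*x) under the probit model."""
    return STANDARD_NORMAL.cdf(probit_score(a, b, x))


a, b = -1.0, 0.5  # hypothetical coefficients
score_step = probit_score(a, b, 3.0) - probit_score(a, b, 2.0)  # equals b
p_at_2 = probit_probability(a, b, 2.0)  # score 0, so probability 0.5
p_at_3 = probit_probability(a, b, 3.0)
```

The score rises by b for every unit of x, but the resulting change in probability depends on where on the normal curve the score sits.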


Wrap-up

Week 8 to Week 12

Correlation Theory

Correlation shows the association between two or more variables

Degree of relationship

Degree of covariability

3 key issues

Does a relationship exist, and how is it measured?

Testing the significance

Exploring cause and effect relation

Correlation:

Positive or negative

Depends on direction of change of variables. If both variables are varying in same direction, then positive correlation.

Simple, partial and multiple

Simple if two variables are studied

Partial/multiple if three or more variables are studied

Linear and non-linear

Depends on the constancy of the ratio of change between the variables. If the amount of change in one variable tends to bear a constant ratio to the amount of change in the other variable, then the relationship is linear.

Scatter diagram

A way to see if two variables are related, represented by a dot chart

Graphic method

Observations of the two variables are plotted and judged by their direction and closeness

Correlation coefficient

Correlation coefficient (Pearson’s r)

Describes degree of correlation between two variables

Interpreting r (−1 ≤ r ≤ +1)

If r=+1: perfect positive relationship

If r=-1: perfect negative relationship

If r=0: no relationship

Coefficient of determination:

explained variation/total variation (a low value is not good!)

Rank Correlation:

Using orders/ranks of observations rather than actual observations between two variables

Steps to compute r:

Compute deviations of observations of two variables from their respective mean values

Square these deviations and obtain sum of squared deviations

Multiply the deviations of observations of two variables and obtain total

Substitute these values in the formula above:
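The steps above can be followed line by line in a short function, assuming the usual deviation form of Pearson's formula, r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²). The data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson's r computed from deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Step 1: deviations of each variable from its mean.
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    # Step 2: sums of squared deviations.
    sum_dx2 = sum(d * d for d in dx)
    sum_dy2 = sum(d * d for d in dy)
    # Step 3: sum of the products of the deviations.
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    # Step 4: substitute into the formula.
    return sum_dxdy / math.sqrt(sum_dx2 * sum_dy2)


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
r_squared = r * r  # coefficient of determination
```

For these numbers the sum of products is 6 and the sums of squares are 10 and 6, so r = 6 / √60.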

Linear Regression Model

Regression function is a mathematical function that describes how mean of Y changes according to the value of X

β is the regression coefficient

σ is the conditional standard deviation

Estimate

R² of the prediction equation

H0: β = 0, the variables are statistically independent

Test statistic:

Standard error of b:

Multiple regression function

The slope in the multiple regression function describes the effect of an explanatory variable while controlling for the effects of the other explanatory variables in the model

β1 and β2 are partial regression coefficients

R² (between 0 and 1): coefficient of multiple determination

If R² = 1

If R² = 0

Testing collective influence of Xi

Alternative hypothesis

Test statistic (F distribution):

A small P-value for H0: β = 0 suggests that the regression line has a nonzero slope
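A sketch of the quantities on this slide for a single explanatory variable, assuming the standard formulas: slope b = Sxy / Sxx, estimated conditional standard deviation s = √(SSE / (n − 2)), standard error se = s / √Sxx, and test statistic t = b / se. The data are hypothetical, and the p-value is left out because it needs the t distribution with n − 2 degrees of freedom, which the standard library does not provide.

```python
import math


def slope_test(x, y):
    """Return (a, b, se_b, t, r2) for the least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - my) ** 2 for yi in y)
    s = math.sqrt(sse / (n - 2))  # estimate of the conditional std. deviation
    se_b = s / math.sqrt(sxx)     # standard error of the slope
    t = b / se_b                  # test statistic for H0: beta = 0
    r2 = 1 - sse / tss            # proportion of variation explained
    return a, b, se_b, t, r2


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b, se_b, t, r2 = slope_test(x, y)
```

With one explanatory variable, the F statistic for the collective test equals t², so the two tests agree.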

Regression Diagnostics

Model Selection Procedures

Selecting explanatory variables for a model: maximum R²

Backward elimination: all significant coefficients

Forward selection: adding variables

Stepwise regression: drop variables if they lose their significance as other variables are added

Exploratory vs. Explanatory Research

Examine the residuals

Plotting Residuals against Explanatory variables

Heteroskedasticity

Detecting Influential (outlier) Observations

Remove Outliers

Leverage is a nonnegative statistic: the larger its value, the greater the weight that observation receives in determining the predicted values

DFFIT: effect on the fit of deleting the observation

The larger its absolute value, the greater the influence that observation has on the fitted values

DFBETA: effect on the model parameter estimates of removing the observation from the dataset

The larger the absolute value, the greater the influence of the observation on the parameter estimates

Cook’s distance: effect that observation i has on all the predicted values

Effects of multicollinearity

Multicollinearity: explanatory variables ‘overlap’ considerably, with higher R² values

Multicollinearity inflates standard errors

Variance inflation factor (VIF): multiplicative increase in the variance (squared se) of the estimator due to xj being correlated with other predictors.

The VIF ranges from 1.0 to infinity. VIFs greater than 10.0 are generally seen as indicative of severe multicollinearity.

Tolerance (1/VIF) ranges from 0.0 to 1.0, with 1.0 indicating the absence of multicollinearity.
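A sketch of the VIF and tolerance for a model with two predictors, assuming the standard definition VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing x_j on the other predictors. With only two predictors, R_j² is simply the squared correlation between them. The data are hypothetical.

```python
def squared_correlation(x1, x2):
    """Squared Pearson correlation between two predictors."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    return s12 * s12 / (s11 * s22)


def vif_two_predictors(x1, x2):
    """Return (VIF, tolerance); both predictors share the same values."""
    r2 = squared_correlation(x1, x2)
    vif = 1 / (1 - r2)
    return vif, 1 / vif


x1 = [1, 2, 3, 4, 5]
x2 = [2, 4, 5, 4, 5]

vif, tolerance = vif_two_predictors(x1, x2)
```

Here the squared correlation is 0.6, so the VIF is 2.5 and the tolerance 0.4, well below the threshold of 10 quoted above.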

Wish you all the very best!!

  • Presentation 1: 4 December 8.15am-10am

    • Group # 1-8

    • 15 minutes maximum per group @ AJF-Villa Barton

  • Presentation 2: 4 December 4.15pm-6pm

    • Group # 9-16

    • 15 minutes maximum per group @ AJF-Villa Barton

  • MDEV Exam: 7 December 4.30pm-5.30pm

    • @E1+E2/Bungener- Rothschild

Be happy and enjoy numbers in life…
