
Advanced Models and Methods in Behavioral Research

  • Chris Snijders

  • [email protected]

  • 3 ects

  • http://www.chrissnijders.com/ammbr (=studyguide)

  • literature: Field book + separate course material

  • laptop exam (+ assignments)

ToDo (if not done yet): enroll in 0a611

Advanced Methods and Models in Behavioral Research –



The methods package

  • MMBR (6 ects)

    • Blumberg: questions, reliability, validity, research design

    • Field: SPSS: factor analysis, multiple regression, ANcOVA, sample size, etc.

  • AMMBR (3 ects)

    - Field (1 chapter): logistic regression

    - literature through website: conjoint analysis, multi-level regression


Models and methods: topics

  • t-test, Cronbach's alpha, etc

  • multiple regression, analysis of (co)variance and factor analysis

  • logistic regression

  • conjoint analysis / repeated measures

    • Stata next to SPSS

    • “Finding new questions”

    • Some data collection

      In the background:

      “now you should be able to deal with data on your own”


Methods in brief (1)

  • Logistic regression: target Y, predictors Xi.

    Y is a binary variable (0/1).

    - Why not just multiple regression?

    - Interpretation is more difficult

    - goodness of fit is non-standard

    - ...

    (and it is a chapter in Field)


Methods in brief (2)

  • Conjoint analysis

    Underlying assumption: for each user, the "utility" of an offer can be written as

    U(x1, x2, ..., xn) = c0 + c1 x1 + ... + cn xn

  • 10 Euro p/m

  • 2 years fixed

  • free phone

  • ...

  How attractive is this offer to you?
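A minimal sketch of the additive utility assumption, applied to the phone offer above. The weights (the c's), the intercept, and the variable names are hypothetical and only illustrate the arithmetic; in conjoint analysis they would be estimated from respondents' ratings.

```python
def utility(offer, weights, intercept=0.0):
    """Additive utility U(x1..xn) = c0 + c1*x1 + ... + cn*xn."""
    return intercept + sum(weights[name] * value for name, value in offer.items())

# Hypothetical part-worths (c1..cn); not taken from the slides.
weights = {"price_eur_per_month": -0.2, "contract_years": -0.5, "free_phone": 1.5}

# The offer on the slide, and a made-up alternative for comparison.
offer_a = {"price_eur_per_month": 10, "contract_years": 2, "free_phone": 1}
offer_b = {"price_eur_per_month": 10, "contract_years": 1, "free_phone": 0}

print("offer A:", utility(offer_a, weights, intercept=5.0))
print("offer B:", utility(offer_b, weights, intercept=5.0))
```

With these made-up weights the free phone outweighs the extra contract year, so offer A comes out as the more attractive one.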


Conjoint analysis as an “in between method”

Between

Which phone do you like and why?

What would your favorite phone be?

And:

Let’s keep track of what people buy.

We have:


Local Master Thesis example:

Fiber to the home

Speed: really fast

Price: sort of high

Installation: free!

Your neighbors: are in!

How attractive is this to you?

(Roel Schuring)


Coming up with new ideas (3)

“More research is necessary”

But on what?

YOU: come up with sensible new ideas, given previous research


Stata next to SPSS

  • It’s just better (faster, better written, more possibilities, better programmable …)

  • Multi-level regression is much easier than in SPSS

  • It’s good to be exposed to more than just a single statistics package (your knowledge should not be based on “where to click” arguments)

  • More stable

  • BTW Supports OSX as well… (anybody?)


Every advantage has a disadvantage

  • Output less “polished”

  • It takes some extra work to get you started

  • The Logistic Regression chapter in the Field book uses SPSS (but still readable for the larger part)

  • (and it’s not campus software, but subfaculty software)

  • Installation …


If on Windows, try downloading

  • www.chrissnijders.com/ammbr/TUeStata12-zip.exe


Logistic Regression Analysis

That is: your Y variable is 0/1: Now what?

Credit where credit is due:

slides adapted from Gerrit Rooks



The main points

  • Why do we have to know and sometimes use logistic regression?

  • What is the underlying model? What is maximum likelihood estimation?

  • Logistics of logistic regression analysis

    • Estimate coefficients

    • Assess model fit

    • Interpret coefficients

    • Check residuals

  • An SPSS example



Suppose we have 100 observations with information about an individual's age and whether or not this individual had some kind of heart disease (CHD)


A graphic representation of the data

[Scatter plot: CHD against Age]



Let’s just try regression analysis

pr(CHD|age) = -.54 +.022*Age



... linear regression is not a suitable model for probabilities

pr(CHD|age) = -.54 +.0218107*Age
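To see why the fitted line fails as a probability model, one can plug a few ages into the equation from the previous slide. The sketch below uses the rounded coefficients -.54 and .022 from that slide; the ages are arbitrary illustration values.

```python
def linear_probability(age, intercept=-0.54, slope=0.022):
    """Predicted 'probability' of CHD from the linear regression on the slide."""
    return intercept + slope * age

for age in (20, 40, 60, 80):
    print(age, round(linear_probability(age), 2))
```

For young people the prediction drops below 0 and for old people it rises above 1, which cannot be a probability.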



In this graph for 8 age groups, I plotted the probability of having a heart disease (proportion)


A nonlinear model is probably better here

Something like this

This is the logistic regression model


Predicted probabilities are always between 0 and 1

similar to classic regression analysis
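The formula on this slide was an image and did not survive as text. In its standard form for a single predictor X, with b0 and b1 as the coefficients (this notation is an assumption, chosen to match the later slides), the logistic regression model reads:

```latex
P(Y = 1 \mid X) = \frac{1}{1 + e^{-(b_0 + b_1 X)}}
```

The exponent b0 + b1 X is the part that looks like classic regression analysis; the transformation around it keeps the predicted value between 0 and 1.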



Side note: this is similar to MMBR …

Suppose Y is a percentage (so between 0 and 1).

Then consider

…which will ensure that the estimated Y will vary between 0 and 1

and after some rearranging this is the same as
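The two formulas referred to on this slide were images. A reconstruction that agrees with the regression on log(Y/(1-Y)) mentioned on the continuation slide, assuming a single predictor X with coefficients b0 and b1:

```latex
Y = \frac{e^{b_0 + b_1 X}}{1 + e^{b_0 + b_1 X}}
\quad\Longleftrightarrow\quad
\frac{Y}{1 - Y} = e^{b_0 + b_1 X}
\quad\Longleftrightarrow\quad
\log\!\left(\frac{Y}{1 - Y}\right) = b_0 + b_1 X
```

The middle step uses 1 - Y = 1 / (1 + e^{b0 + b1 X}), so the ratio Y / (1 - Y) leaves only the exponential.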


… (continued)

  • And one “solution” might be:

    • Change all Y values that are 0 to 0.001

    • Change all Y values that are 1 to 0.999

  • Now run regression on log(Y/(1-Y)) …

  • … but that really is sort of higgledy-piggledy …


Logistics of logistic regression

  • How do we estimate the coefficients?

  • How do we assess model fit?

  • How do we interpret coefficients?

  • How do we check regression assumptions?



Kinds of estimation in regression

  • Ordinary Least Squares (we fit a line through a cloud of dots)

  • Maximum likelihood (we find the parameters that are the most likely, given our data)

    We never bothered to consider maximum likelihood in standard multiple regression, because you can show that there the two methods lead to exactly the same estimator (in MR, that is; normally they differ).

    Actually, maximum likelihood has superior statistical properties (efficiency, consistency, invariance, …)


Maximum likelihood estimation

  • Method of maximum likelihood yields values for the unknown parameters that maximize the probability of obtaining the observed set of data

Unknown parameters



Maximum likelihood estimation

  • First we have to construct the “likelihood function” (probability of obtaining the observed set of data).

Likelihood = pr(obs1)*pr(obs2)*pr(obs3)…*pr(obsn)

Assuming that observations are independent



Log-likelihood

  • For technical reasons the likelihood is transformed into the log-likelihood (then you just maximize the sum of the logged probabilities)

LL= ln[pr(obs1)]+ln[pr(obs2)]+ln[pr(obs3)]…+ln[pr(obsn)]
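A small sketch of both quantities for a logistic model with one predictor. The ten observations and the trial coefficients (-5.0 and 0.11) are made up for illustration; pr(obs) is the predicted probability p when Y = 1 and 1 - p when Y = 0.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(b0, b1, xs, ys):
    """Sum of ln[pr(obs_i)] over all observations."""
    ll = 0.0
    for x, y in zip(xs, ys):
        p = logistic(b0 + b1 * x)
        ll += math.log(p) if y == 1 else math.log(1.0 - p)
    return ll

# Hypothetical toy data: age and CHD (0/1).
ages = [25, 30, 35, 40, 45, 50, 55, 60, 65, 70]
chd = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

# The likelihood itself: the product of the individual probabilities.
likelihood = 1.0
for x, y in zip(ages, chd):
    p = logistic(-5.0 + 0.11 * x)
    likelihood *= p if y == 1 else 1.0 - p

print(math.log(likelihood), log_likelihood(-5.0, 0.11, ages, chd))
```

Both printed numbers agree; with many observations the product becomes extremely small, which is one of the technical reasons to work with the sum of logs.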



Some subtleties

  • In OLS, we did not need stochastic assumptions to be able to calculate a best-fitting line (only for the estimates of the confidence intervals we need that). With maximum likelihood estimation we need this from the start

    (and let us not be bothered at this point by how the confidence intervals are calculated in maximum likelihood)


Note: optimizing log-likelihoods is difficult

  • It’s iterative (“searching the landscape”)

    → it might not converge

    → it might converge to the wrong answer
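What "searching the landscape" can look like in practice: a Newton-Raphson sketch for one predictor. This is one common iterative scheme, not necessarily the one SPSS or Stata uses internally, and the data are hypothetical. The function reports whether the search converged within the allowed number of steps.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(b0, b1, xs, ys):
    return sum(
        math.log(logistic(b0 + b1 * x)) if y == 1
        else math.log(1.0 - logistic(b0 + b1 * x))
        for x, y in zip(xs, ys)
    )

def fit_logistic(xs, ys, max_iter=25, tol=1e-8):
    """Newton-Raphson for one predictor; returns (b0, b1, converged)."""
    b0, b1 = 0.0, 0.0  # starting point of the search
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = logistic(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p            # slope of LL in the b0 direction
            g1 += (y - p) * x      # slope of LL in the b1 direction
            h00 += w               # curvature terms
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det <= 0.0:
            return b0, b1, False   # flat landscape: no usable step
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) < tol and abs(step1) < tol:
            return b0, b1, True
    return b0, b1, False           # ran out of iterations

# Hypothetical toy data: age and CHD (0/1).
ages = [25, 30, 35, 40, 45, 50, 55, 60, 65, 70]
chd = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

b0, b1, converged = fit_logistic(ages, chd)
print(b0, b1, converged, log_likelihood(b0, b1, ages, chd))
```

If the 0s and 1s were perfectly separated by age, the coefficients would keep growing and the search would stop without converging.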


Nasty implication: extreme cases should be left out

(some handwaving here)


SPSS output


Estimation of coefficients: SPSS Results



This function fits best: other values of b0 and b1 give worse results (that is, other values have a smaller likelihood value)



Illustration 1: suppose we chose .05X instead of .11X



Illustration 2: suppose we chose .40X instead of .11X



Logistics of logistic regression

  • Estimate the coefficients (and their conf.int.)

  • Assess model fit

    • Between model comparisons

    • Pseudo R2 (similar to multiple regression)

    • Predictive accuracy

  • Interpret coefficients

  • Check regression assumptions



Model fit: comparisons between models

The log-likelihood ratio test statistic can be used to test the fit of a model: it compares the full model with the reduced model.

The test statistic has a chi-square distribution.

NOTE: This is sort of similar to the variance decomposition tables you see in MR!
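The formula on this slide was an image. In its usual textbook form (the symbols LL for log-likelihood and k for the number of estimated parameters are notation assumed here), the likelihood ratio statistic is:

```latex
\chi^2 = 2\,\bigl[\, LL(\text{full model}) - LL(\text{reduced model}) \,\bigr]
       = \bigl(-2LL(\text{reduced model})\bigr) - \bigl(-2LL(\text{full model})\bigr),
\qquad df = k_{\text{full}} - k_{\text{reduced}}
```

The second form is convenient because statistical packages typically report -2LL for each model, so the statistic is simply the difference between the two reported values.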



Between model comparisons: the likelihood ratio test

full model

reduced model

The model including only an intercept is often called the empty model. SPSS uses this model as a default.



Between model comparison: SPSS output

This is the test statistic, and its associated significance


Overall model fit: pseudo R2

Just like in multiple regression, pseudo R2 ranges from 0.0 to 1.0

  • Cox and Snell: cannot theoretically reach 1

  • Nagelkerke: adjusted so that it can reach 1

Based on: the log-likelihood of the model that you want to test, and the log-likelihood of the model before any predictors were entered

NOTE: R2 in logistic regression tends to be (even) smaller than in multiple regression
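A sketch of three common pseudo R2 measures, computed from the two log-likelihoods mentioned above. These are the standard textbook definitions, which may differ from the exact formula shown on the original slide; the log-likelihood values and sample size in the example are hypothetical.

```python
import math

def pseudo_r2(ll_null, ll_model, n):
    """Pseudo R2 measures from the empty-model and fitted-model log-likelihoods."""
    mcfadden = 1.0 - ll_model / ll_null
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    # Largest value Cox and Snell can take, reached when ll_model = 0.
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    nagelkerke = cox_snell / max_cox_snell
    return {"mcfadden": mcfadden, "cox_snell": cox_snell, "nagelkerke": nagelkerke}

# Hypothetical values: empty model LL = -70, fitted model LL = -55, n = 100.
r2 = pseudo_r2(ll_null=-70.0, ll_model=-55.0, n=100)
for name, value in r2.items():
    print(name, round(value, 3))
```

Dividing by the maximum attainable value is what makes Nagelkerke's measure larger than Cox and Snell's and lets it reach 1.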


Overall model fit: Classification table

We predict 74% correctly

14 cases had a CHD while according to our model this shouldn't have happened

12 cases didn't have a CHD while according to our model this should have happened
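A sketch of how such a classification table can be built from predicted probabilities, assuming the usual cutoff of .5 (predict "1" when the predicted probability is at least .5). The eight observations are made up and are not the CHD data.

```python
def classification_table(ys, probs, cutoff=0.5):
    """Cross-tabulate observed outcomes against predicted outcomes."""
    table = {"true_neg": 0, "false_pos": 0, "false_neg": 0, "true_pos": 0}
    for y, p in zip(ys, probs):
        predicted = 1 if p >= cutoff else 0
        if y == 1 and predicted == 1:
            table["true_pos"] += 1
        elif y == 1 and predicted == 0:
            table["false_neg"] += 1   # had the event, model said no
        elif y == 0 and predicted == 1:
            table["false_pos"] += 1   # no event, model said yes
        else:
            table["true_neg"] += 1
    return table

def percent_correct(table):
    total = sum(table.values())
    return 100.0 * (table["true_pos"] + table["true_neg"]) / total

# Hypothetical observed outcomes and predicted probabilities.
outcomes = [0, 0, 1, 1, 0, 1, 1, 0]
predicted_probs = [0.1, 0.6, 0.8, 0.4, 0.3, 0.9, 0.7, 0.2]

table = classification_table(outcomes, predicted_probs)
print(table, percent_correct(table))
```

In the terms of the slides, the 14 cases are the "false_neg" cell and the 12 cases are the "false_pos" cell.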



Logistics of logistic regression

  • Estimate the coefficients

  • Assess model fit

  • Interpret coefficients

    • Direction

    • Significance

    • Magnitude

  • Check regression assumptions


The Odds Ratio

We had:

And after some rearranging we can get
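The formulas on this slide were images. A reconstruction for a single predictor X, assuming coefficients named b0 and b1:

```latex
P(Y=1 \mid X) = \frac{1}{1 + e^{-(b_0 + b_1 X)}}
\;\;\Longrightarrow\;\;
\text{odds}(X) = \frac{P(Y=1 \mid X)}{1 - P(Y=1 \mid X)} = e^{b_0 + b_1 X},
\qquad
\frac{\text{odds}(X+1)}{\text{odds}(X)} = e^{b_1}
```

So a one-unit increase in X multiplies the odds by exp(b1), whatever the starting value of X; this ratio of odds is the odds ratio.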



Magnitude of association: Percentage change in odds



Interpreting coefficients: direction

original b reflects changes in logit: b>0 implies positive relationship

exponentiated b reflects the “changes in odds”: exp(b) > 1 implies a positive relationship


3. Interpreting coefficients: magnitude

The slope coefficient (b) is interpreted as the rate of change in the "log odds" as X changes … not very useful.

exp(b) is the effect of the independent variable on the odds, more useful for calculating the size of an effect



Magnitude of association

For the age variable:

Percentage change in odds = (exponentiated coefficient - 1) * 100 = 12%, or “the odds times 1.117”

A one-unit increase in age results in a 12% increase in the odds that the person will have a CHD

So if a soccer player is one year older, the odds that (s)he will have CHD are 12% higher
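The arithmetic behind the percentage, as a small sketch. It uses the rounded age coefficient .11 from the earlier slides, so the multiplier comes out slightly below the unrounded 1.117 quoted above.

```python
import math

def percent_change_in_odds(b):
    """(exp(b) - 1) * 100: percentage change in the odds per one-unit increase in X."""
    return (math.exp(b) - 1.0) * 100.0

b_age = 0.11  # rounded coefficient for age from the slides
print("odds multiplier:", round(math.exp(b_age), 3))
print("percentage change in odds:", round(percent_change_in_odds(b_age)))
```

Note that the 12% applies to the odds, not to the probability of having a CHD.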


Ref=1

Ref=0



Another way to get an idea of the size of effects: Calculating predicted probabilities

For somebody of 20 years old, the predicted probability is .04

For somebody of 70 years old, the predicted probability is .91


But this gets more complicated when you have more than a single X-variable

(see blackboard)

Conclusion: if you consider the effect of a variable on the predicted probability, the size of the effect of X1 depends on the value of X2! (yuck!)
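A small numerical illustration of this conclusion, standing in for the blackboard. The coefficients are hypothetical and there is no interaction term in the model; the effect of X1 on the predicted probability still depends on X2, while its effect on the odds does not.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients for a model with two predictors.
b0, b1, b2 = -3.0, 1.0, 2.0

def prob(x1, x2):
    return logistic(b0 + b1 * x1 + b2 * x2)

def odds(x1, x2):
    p = prob(x1, x2)
    return p / (1.0 - p)

# Effect of raising X1 from 0 to 1 on the predicted probability...
effect_when_x2_is_0 = prob(1, 0) - prob(0, 0)
effect_when_x2_is_1 = prob(1, 1) - prob(0, 1)
print(round(effect_when_x2_is_0, 3), round(effect_when_x2_is_1, 3))

# ...versus its effect on the odds, which is exp(b1) in both cases.
print(odds(1, 0) / odds(0, 0), odds(1, 1) / odds(0, 1))
```

This is why effect sizes are usually reported on the odds scale, and predicted probabilities only for specific, stated values of the other variables.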


Testing significance of coefficients

  • In linear regression analysis this statistic is used to test significance: t = estimate / standard error of estimate, which follows a t-distribution

  • In logistic regression something similar exists

  • however, when b is large, the standard error tends to become inflated, so the statistic is underestimated (Type II errors are more likely)

Note: This is not the Wald statistic SPSS presents!!!



Interpreting coefficients: significance

SPSS presents

While Andy Field thinks SPSS presents this (at least in the 2nd version of the book):
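The two formulas on this slide were images. A likely reconstruction, to be checked against the SPSS output and the book (b is the estimated coefficient and SE_b its standard error; both symbols are assumed notation):

```latex
\text{SPSS: } \mathrm{Wald} = \left(\frac{b}{SE_b}\right)^{2} \sim \chi^2_{1}
\qquad\text{versus}\qquad
\text{Field: } \mathrm{Wald} = \frac{b}{SE_b}
```

The two versions lead to the same significance test; the squared version is compared with a chi-square distribution with 1 degree of freedom instead of a normal distribution.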



Logistics of logistic regression

  • Estimate the coefficients

  • Assess model fit

  • Interpret coefficients

  • Check regression assumptions



Checking assumptions

  • Influential data points & Residuals

    • Follow Samantha's tips

  • Hosmer & Lemeshow

    • Divides sample in subgroups

    • Checks whether there are differences between observed and predicted between subgroups

    • Test should not be significant, if so: indication of lack of fit



Hosmer & Lemeshow

Test divides sample in subgroups, checks whether difference between observed and predicted is about equal in these groups

Test should not be significant (indicating no difference)
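A sketch of how the Hosmer & Lemeshow statistic can be computed: sort the cases by predicted probability, split them into groups, and compare the observed with the expected number of events in each group. The grouping rule below (equal-sized groups after sorting) is a simplifying assumption and may differ from what SPSS does; the data are hypothetical, and the p-value (from a chi-square distribution with groups - 2 degrees of freedom) is left out.

```python
def hosmer_lemeshow_statistic(ys, probs, groups=10):
    """Sum over groups of (observed - expected)^2 / (n_g * mean_p * (1 - mean_p))."""
    pairs = sorted(zip(probs, ys))  # order cases by predicted probability
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        n_g = len(chunk)
        observed = sum(y for _, y in chunk)   # events that actually happened
        expected = sum(p for p, _ in chunk)   # events the model expects
        mean_p = expected / n_g
        stat += (observed - expected) ** 2 / (n_g * mean_p * (1.0 - mean_p))
    return stat

# Hypothetical predicted probabilities with two sets of outcomes.
probs = [0.2] * 5 + [0.8] * 5
well_calibrated = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]   # 1 of 5 and 4 of 5 events
poorly_calibrated = [0] * 10                        # no events at all

print(hosmer_lemeshow_statistic(well_calibrated, probs, groups=2))
print(hosmer_lemeshow_statistic(poorly_calibrated, probs, groups=2))
```

A statistic close to zero means observed and predicted counts agree in every group; a large value (a significant test) indicates lack of fit.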



Examining residuals in logistic regression

  • Isolate points for which the model fits poorly

  • Isolate influential data points



Residual statistics: Field’s rules of thumb



Time for a summary …


Logistic regression

  • Y = 0/1

  • Multiple regression (or ANcOVA) is not right

  • You consider either the odds or the log(odds)

  • It is estimated through “maximum likelihood”

  • Interpretation is a bit more complicated than normal

  • Assumption testing is a bit more concrete than in multiple regression


Advanced Methods and Models in Behavioral Research

  • Make sure to

    • enroll in studyweb (0a611)

    • read the Field chapter on logistic regression

    • go through the slides as well

  • Bring your laptop next time: we’ll go through a logistic regression in Stata

Advanced Methods and Models in Behavioral Research – 2008/2009


Illustration with SPSS (without the outlier part)

  • Penalty kicks data, variables:

    • Scored: outcome variable,

      • 0 = penalty missed, and 1 = penalty scored

    • Pswq: degree to which a player worries

    • Previous: percentage of penalties scored by a particular player in their career


SPSS OUTPUT Logistic Regression

Tells you something about the number of observations and missings


Block 0: Beginning Block

this table is based on the empty model, i.e. only the constant in the model

these variables will be entered in the model later on


Block 1: Method = Enter

Block is useful to check significance of individual coefficients, see Field

this is the test statistic

Note: Nagelkerke is larger than Cox

after dividing by -2

New model


Block 1: Method = Enter (Continued)

Predictive accuracy has improved (was 53%)

significance based on Wald statistic

estimates

change in odds

standard error estimates


How is the classification table constructed?

# cases not predicted correctly

