### Analysis of Categorical Data

Nick Jackson

University of Southern California

Department of Psychology

10/11/2013

Overview

- Data Types
- Contingency Tables
- Logit Models
  - Binomial
  - Ordinal
  - Nominal

Things not covered (but still fit into the topic)

- Matched pairs/repeated measures
  - McNemar’s Chi-Square
- Reliability
  - Cohen’s Kappa
  - ROC
- Poisson (Count) models
- Categorical SEM
  - Tetrachoric Correlation
- Bernoulli Trials

Data Types (Levels of Measurement)

Discrete/Categorical/Qualitative

Continuous/Quantitative

Nominal/Multinomial:

- Properties:
  - Values arbitrary (no magnitude)
  - No direction (no ordering)
- Example:
  - Race: 1=AA, 2=Ca, 3=As
- Measures:
  - Mode, relative frequency

Rank Order/Ordinal:

- Properties:
  - Values semi-arbitrary (no magnitude?)
  - Have direction (ordering)
- Example:
  - Likert Scales (pronounced LICK-URT):
    - 1-5, Strongly Disagree to Strongly Agree
- Measures:
  - Mode, relative frequency, median
  - Mean?

Binary/Dichotomous/Binomial:

- Properties:
  - 2 Levels
  - Special case of Ordinal or Multinomial
- Examples:
  - Gender (Multinomial)
  - Disease (Y/N)
- Measures:
  - Mode, relative frequency
  - Mean?

Contingency Tables

- Often called Two-way tables or Cross-Tab
- Have dimensions I x J
- Can be used to test hypotheses of association between categorical variables

Contingency Tables: Test of Independence

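- Chi-Square Test of Independence (χ2)
  - Calculate χ2
  - Determine DF: (I-1) * (J-1)
  - Compare to χ2 critical value for given DF.

$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$

Where: $O_i$ = Observed Freq, $E_i$ = Expected Freq, $n$ = number of cells in table

Example table margins (cell counts not shown): row totals R1=156, R2=664, N=820; column totals C1=265, C2=331, C3=264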
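The calculation steps above can be sketched in a few lines of Python. This is a minimal illustration, assuming the usual expected count under independence (row total × column total / N); the 2 x 3 table and the function name `pearson_chi_square` are made up for the example and are not the slide's data.

```python
def pearson_chi_square(observed):
    """Return (chi2, df, expected) for an I x J table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count under independence: (row total * column total) / N
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected


# Hypothetical 2 x 3 table of counts (illustration only)
table = [[30, 50, 20], [60, 80, 60]]
chi2, df, expected = pearson_chi_square(table)
print(f"chi2 = {chi2:.3f}, df = {df}")
```

The resulting statistic would then be compared with the χ2 critical value for the computed DF.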

Contingency Tables: Test of Independence

- Pearson Chi-Square Test of Independence (χ2)
  - H0: No Association
  - HA: Association… where, how?
- Not appropriate when any expected cell frequency (Ei) is < 5
  - Use Fisher’s Exact Test instead
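For small expected counts, the exact alternative for a 2x2 table can be sketched as follows. This is a hedged illustration of a two-sided Fisher exact p-value built from hypergeometric probabilities with the margins held fixed; the function name and the example counts are assumptions for the sketch, not taken from the slides.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every possible table that is no more probable than the observed one
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Hypothetical small table (illustration only)
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(f"two-sided p = {p_value:.4f}")
```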

Contingency Tables

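- 2x2

| | Disorder (Outcome): Yes | Disorder (Outcome): No | Total |
|---|---|---|---|
| Risk Factor/Exposure: Yes | a | b | a+b |
| Risk Factor/Exposure: No | c | d | c+d |
| Total | a+c | b+d | a+b+c+d |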

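Contingency Tables: Measures of Association

| | Depression: Yes | Depression: No | Total |
|---|---|---|---|
| Alcohol Use: Yes | a = 25 | b = 10 | 35 |
| Alcohol Use: No | c = 20 | d = 45 | 65 |
| Total | 45 | 55 | 100 |

Probability: $P(\text{Dep} \mid \text{Alcohol}) = a/(a+b) = 25/35 = 0.714$ and $P(\text{Dep} \mid \text{No Alcohol}) = c/(c+d) = 20/65 = 0.308$

Contrasting Probability (Relative Risk): $RR = (25/35)/(20/65) = 2.32$

Individuals who used alcohol were 2.32 times as likely to have depression as those who did not use alcohol.

Odds: $a/b = 25/10 = 2.5$ and $c/d = 20/45 = 0.444$

Contrasting Odds (Odds Ratio): $OR = (25/10)/(20/45) = 5.625$

The odds of depression were 5.62 times as great in alcohol users as in nonusers.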
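The two contrasts can be checked directly from the cell counts. A minimal sketch, using the a, b, c, d values from the alcohol use by depression table; the function names are illustrative.

```python
def relative_risk(a, b, c, d):
    """Ratio of P(outcome | exposed) to P(outcome | unexposed)."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Ratio of the odds of the outcome in exposed vs. unexposed."""
    return (a / b) / (c / d)


# Cell counts from the alcohol use x depression table
a, b, c, d = 25, 10, 20, 45
rr = relative_risk(a, b, c, d)
or_ = odds_ratio(a, b, c, d)
print(f"RR = {rr:.2f}, OR = {or_:.3f}")
```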

Why Odds Ratios?

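Multiply the "No depression" column by a factor i, for i = 1 to 45:

| | Depression: Yes | Depression: No | Total |
|---|---|---|---|
| Alcohol Use: Yes | a = 25 | b = 10*i | 25 + 10*i |
| Alcohol Use: No | c = 20 | d = 45*i | 20 + 45*i |
| Total | 45 | 55*i | 45 + 55*i |

The odds ratio, (25 × 45i)/(10i × 20) = 5.625, does not depend on i, whereas the relative risk changes as i changes.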
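A short loop makes the point concrete: scaling the non-depressed column by i leaves the odds ratio fixed while the relative risk drifts. This is a sketch using the counts from the table above; the choice of i values to print is arbitrary.

```python
def rr_and_or(a, b, c, d):
    """Return (relative risk, odds ratio) for a 2x2 table."""
    rr = (a / (a + b)) / (c / (c + d))
    odds_ratio = (a / b) / (c / d)
    return rr, odds_ratio


results = {}
for i in (1, 5, 45):
    # Only the "No depression" column is multiplied by i
    results[i] = rr_and_or(25, 10 * i, 20, 45 * i)
    rr, odds_ratio = results[i]
    print(f"i={i:2d}  RR={rr:.3f}  OR={odds_ratio:.3f}")
```

As the outcome becomes rarer (larger i), the relative risk moves toward the odds ratio, which stays at 5.625 throughout.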

The Generalized Linear Model

- General Linear Model (LM)
  - Continuous Outcomes (DV)
  - Linear Regression, t-test, Pearson correlation, ANOVA, ANCOVA
- Generalized Linear Model (GLM)
  - John Nelder and Robert Wedderburn
  - Maximum Likelihood Estimation
  - Continuous, Categorical, and Count outcomes.
  - Distribution Family and Link Functions
    - Error distributions that are not normal

Logistic Regression

- “This is the most important model for categorical response data” –Agresti (Categorical Data Analysis, 2nd Ed.)
- Binary Response
- Predicting Probability (related to the Probit model)
- Assume (the usual):
  - Independence
  - NOT Homoscedasticity or Normal Errors
  - Linearity (in the Log Odds)
  - Also… adequate cell sizes.

Logistic Regression

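- The Model
  - In terms of probability of success π(x): $\pi(x) = \dfrac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}}$
  - In terms of Logits (Log Odds): $\operatorname{logit}[\pi(x)] = \ln\dfrac{\pi(x)}{1 - \pi(x)} = \alpha + \beta x$
  - Logit transform gives us a linear equation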
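To show how such a model is estimated by maximum likelihood, here is a minimal Newton-Raphson sketch for a single predictor. It is not the routine any particular statistics package uses; the function name `fit_logistic` and the fixed iteration count are assumptions. The data are the alcohol use (x) by depression (y) counts from the measures-of-association slide, expanded to one row per person.

```python
import math


def fit_logistic(xs, ys, iterations=25):
    """Newton-Raphson maximum likelihood for logit[pi(x)] = alpha + beta*x."""
    alpha, beta = 0.0, 0.0
    for _ in range(iterations):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(alpha + beta * x)))
            w = p * (1.0 - p)
            g_a += y - p          # score for alpha
            g_b += (y - p) * x    # score for beta
            h_aa += w             # information matrix entries
            h_ab += w * x
            h_bb += w * x * x
        det = h_aa * h_bb - h_ab ** 2
        # Newton step: parameters += inverse(information) * score
        alpha += (h_bb * g_a - h_ab * g_b) / det
        beta += (h_aa * g_b - h_ab * g_a) / det
    return alpha, beta


# x = 1 for alcohol users (25 depressed, 10 not), x = 0 otherwise (20, 45)
xs = [1] * 35 + [0] * 65
ys = [1] * 25 + [0] * 10 + [1] * 20 + [0] * 45
alpha, beta = fit_logistic(xs, ys)
print(f"alpha = {alpha:.4f}, beta = {beta:.4f}, exp(beta) = {math.exp(beta):.3f}")
```

With a single binary predictor, exp(beta) reproduces the odds ratio from the 2x2 table, and alpha is the log odds in the reference group.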

Logistic Regression: Example

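The Output as Logits

- Logits: H0: β=0

| | Freq. | Percent |
|---|---|---|
| Not Depressed | 672 | 81.95 |
| Depressed | 148 | 18.05 |

- Conversion to Probability: $e^{-1.51}/(1 + e^{-1.51}) = 0.18$
- What does H0: β=0 mean? A logit of 0 corresponds to odds of 1, i.e. a probability of 0.5.
- Conversion to Odds: $e^{-1.51} = 0.22$
  - Also = 0.1805/0.8195 = 0.22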

Logistic Regression: Example

- The Output as ORs
  - Odds Ratios: H0: OR=1 (equivalent to β=0 on the logit scale)
  - Conversion to Probability: 0.220/(1 + 0.220) = 0.18
  - Conversion to Logit (log odds!)
    - Ln(OR) = logit
    - Ln(0.220) = -1.51
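The three scales are interchangeable, which a few lines of Python can confirm. This sketch starts from the depression frequencies in the table (148 depressed, 672 not depressed); the variable names are illustrative.

```python
import math

depressed, not_depressed = 148, 672

# Probability -> odds -> logit
p = depressed / (depressed + not_depressed)
odds = p / (1 - p)
logit = math.log(odds)

# And back again: logit -> odds -> probability
odds_back = math.exp(logit)
p_back = odds_back / (1 + odds_back)

print(f"p = {p:.4f}, odds = {odds:.3f}, logit = {logit:.2f}")
```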

Logistic Regression: Example

Logistic Regression w/ Single Continuous Predictor:

AS LOGITS:

Interpretation:

A 1 unit increase in age results in a 0.013 increase in the log-odds of depression.

Hmmmm… I have no concept of what a log-odds is. Interpret as something else.

Logit > 0, so as age increases the risk of depression increases.

OR = e^0.013 = 1.013

For a 1 unit increase in age, the odds of depression are multiplied by 1.013.

We could also say: For a 1 unit increase in age there is a 1.3% increase in the odds of depression [(OR-1)*100 = % change].
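The conversion from a logit coefficient to an odds ratio and a percent change can be sketched as below. The 0.013 coefficient is the one quoted above; the 10-unit change is an added illustration, and the helper names are assumptions.

```python
import math


def odds_ratio_for_change(beta, delta=1.0):
    """Odds ratio for a change of `delta` units in the predictor."""
    return math.exp(beta * delta)


def percent_change_in_odds(odds_ratio):
    """Percent change in the odds implied by an odds ratio: (OR - 1) * 100."""
    return (odds_ratio - 1.0) * 100.0


beta_age = 0.013
or_1 = odds_ratio_for_change(beta_age)
or_10 = odds_ratio_for_change(beta_age, delta=10)  # illustrative 10-unit change
pct_1 = percent_change_in_odds(or_1)
print(f"OR per unit = {or_1:.3f} ({pct_1:.1f}% change), OR per 10 units = {or_10:.3f}")
```

Because the model is linear in the log-odds, the effect of a larger change is obtained by multiplying the coefficient before exponentiating, not by multiplying the odds ratio.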

Logistic Regression: GOF

- Overall Model Likelihood-Ratio Chi-Square
  - Omnibus test for the model
  - Overall model fit?
    - Relative to other models
  - Compares specified model with Null model (no predictors)
  - χ2 = -2*(LL0-LL1), DF = K parameters estimated
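The likelihood-ratio statistic can be sketched from two log-likelihoods. Here the null (intercept-only) log-likelihood is computed from the depression frequencies (148 of 820); the fitted-model log-likelihood `ll1` is a hypothetical placeholder, since the transcript does not report it.

```python
import math


def null_log_likelihood(events, non_events):
    """Log-likelihood of an intercept-only model for a binary outcome."""
    p = events / (events + non_events)
    return events * math.log(p) + non_events * math.log(1 - p)


def lr_chi_square(ll_null, ll_model):
    """Likelihood-ratio chi-square: -2 * (LL0 - LL1)."""
    return -2.0 * (ll_null - ll_model)


ll0 = null_log_likelihood(148, 672)
ll1 = ll0 + 1.5  # hypothetical fitted log-likelihood, for illustration only
lr = lr_chi_square(ll0, ll1)
print(f"LL0 = {ll0:.2f}, LR chi-square = {lr:.2f}")
```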

Logistic Regression: GOF (Summary Measures)

- Pseudo-R2
  - Not the same meaning as linear regression.
  - There are many of them (Cox and Snell/McFadden)
  - Only comparable within nested models of the same outcome.
- Hosmer-Lemeshow
  - Models with Continuous Predictors
  - Do the predicted probabilities agree with the observed outcomes? χ2
  - H0: Good Fit for Data, so we want p>0.05
  - Order the predicted probabilities, group them (g=10) by quantiles, then compute the Chi-Square of Group * Outcome using observed vs. expected counts. Df=g-2
  - Conservative (rarely rejects the null)
- Pearson Chi-Square
  - Models with categorical predictors
  - Similar to Hosmer-Lemeshow
- ROC-Area Under the Curve
  - Predictive accuracy/Classification
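The Hosmer-Lemeshow grouping procedure described above can be sketched as follows. This is a simplified illustration, assuming roughly equal-sized groups of ordered predictions, at least as many observations as groups, and predicted probabilities strictly between 0 and 1; statistical packages may handle ties and group boundaries differently. The toy data are invented.

```python
def hosmer_lemeshow(probs, outcomes, groups=10):
    """Return (statistic, df) from predicted probabilities and 0/1 outcomes."""
    pairs = sorted(zip(probs, outcomes))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        # Roughly equal-sized groups of the ordered predictions
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        observed = sum(y for _, y in chunk)
        expected = sum(p for p, _ in chunk)
        # Events and non-events both contribute to the statistic
        stat += (observed - expected) ** 2 / expected
        stat += ((size - observed) - (size - expected)) ** 2 / (size - expected)
    return stat, groups - 2


# Toy data: two groups of five predictions (illustration only)
probs = [0.2] * 5 + [0.8] * 5
well_calibrated = [1, 0, 0, 0, 0, 1, 1, 1, 1, 0]
poorly_calibrated = [1, 1, 0, 0, 0, 1, 1, 1, 1, 0]
stat_good, _ = hosmer_lemeshow(probs, well_calibrated, groups=2)
stat_bad, _ = hosmer_lemeshow(probs, poorly_calibrated, groups=2)
print(f"well calibrated: {stat_good:.2f}, poorly calibrated: {stat_bad:.2f}")
```

A larger statistic means a bigger gap between observed and predicted event counts, which is why a non-significant result is the desired outcome.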

Logistic Regression: GOF(Diagnostic Measures)

- Outliers in Y (Outcome)
  - Pearson Residuals
    - Square root of the contribution to the Pearson χ2
  - Deviance Residuals
    - Square root of the contribution to the likelihood-ratio test statistic of a saturated model vs fitted model.
- Outliers in X (Predictors)
  - Leverage (Hat Matrix/Projection Matrix)
    - Maps the influence of observed on fitted values
- Influential Observations
  - Pregibon’s Delta-Beta influence statistic
    - Similar to Cook’s-D in linear regression
- Detecting Problems
  - Residuals vs Predictors
  - Leverage vs Residuals
  - Boxplot of Delta-Beta
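The two residual types can be written down for a single binary observation. This is a sketch assuming ungrouped 0/1 data with a fitted probability p strictly between 0 and 1; the function names and example values are illustrative.

```python
import math


def pearson_residual(y, p):
    """(observed - fitted) scaled by the binomial standard deviation."""
    return (y - p) / math.sqrt(p * (1 - p))


def deviance_residual(y, p):
    """Signed square root of the observation's contribution to the deviance."""
    log_lik = y * math.log(p) + (1 - y) * math.log(1 - p)
    return math.copysign(math.sqrt(-2.0 * log_lik), y - p)


# An event that the model considered unlikely (fitted probability 0.2)
r_pearson = pearson_residual(1, 0.2)
r_deviance = deviance_residual(1, 0.2)
print(f"Pearson = {r_pearson:.3f}, deviance = {r_deviance:.3f}")
```

Squaring and summing the Pearson residuals gives the Pearson χ2; doing the same with the deviance residuals gives the deviance.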

Logistic Regression: GOF

- L-R χ2 (df=1): 2.47, p=0.1162
- H-L GOF:
  - Number of Groups: 10
  - H-L Chi2: 7.12
  - DF: 8
  - P: 0.5233
- McFadden’s R2: 0.0030
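The p-values in this output can be approximately reproduced from the reported statistics using only the standard library. The sketch below computes the upper-tail probability of a chi-square distribution through the standard recurrence for the regularized incomplete gamma function; the function name `chi2_sf` is an assumption, and small differences from the reported values are expected because the statistics shown are rounded.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square with integer df >= 1."""
    h = x / 2.0
    if df % 2:
        a, q = 0.5, math.erfc(math.sqrt(h))   # df = 1 starting point
    else:
        a, q = 1.0, math.exp(-h)              # df = 2 starting point
    # Each step adds two degrees of freedom
    while a < df / 2.0:
        q += h ** a * math.exp(-h) / math.gamma(a + 1.0)
        a += 1.0
    return q


p_lr = chi2_sf(2.47, 1)   # likelihood-ratio test
p_hl = chi2_sf(7.12, 8)   # Hosmer-Lemeshow test
print(f"L-R p = {p_lr:.4f}, H-L p = {p_hl:.4f}")
```

Read together with the output: the model is not significantly better than the null model, yet the Hosmer-Lemeshow test gives no evidence of poor fit, and McFadden's R2 is very small.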

Logistic Regression: Diagnostics

- Linearity in the Log-Odds
  - Use a lowess (loess) plot
  - Depressed vs Age

Logistic Regression: Example

Logistic Regression w/ Single Categorical Predictor:

AS OR:

Interpretation:

The odds of depression for males are 0.299 times the odds for females.

We could also say: The odds of depression are (1-0.299=0.701) 70.1% lower in males compared to females.

Or… why not just make males the reference so the OR is greater than 1? Or we could just take the inverse and accomplish the same thing: 1/0.299 = 3.34.

Ordinal Logistic Regression

- Also called Ordered Logistic or Proportional Odds Model
- Extension of Binary Logistic Model
- >2 Ordered responses
- New Assumption!
  - Proportional Odds
    - BMI3GRP (1=Normal Weight, 2=Overweight, 3=Obese)
    - The predictor’s effect on the outcome is the same across levels of the outcome.
    - Bmi3grp (1 vs 2,3) = B(age)
    - Bmi3grp (1,2 vs 3) = B(age)

Ordinal Logistic Regression

- The Model
  - A latent variable model (Y*)
  - j = number of levels - 1
  - From the equation we can see that the odds ratio is assumed to be independent of the category j
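The independence of the odds ratio from j can be illustrated numerically. The sketch below assumes one common parameterization, logit P(Y > j | x) = βx − κ_j, in which a positive β means higher categories become more likely; other texts and packages use different sign conventions. The values of β and the cutpoints κ_j are hypothetical.

```python
import math


def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))


def prob_above(x, beta, cutpoints):
    """P(Y > j | x) for each cutpoint, assuming logit P(Y > j | x) = beta*x - kappa_j."""
    return [inv_logit(beta * x - kappa) for kappa in cutpoints]


def odds(p):
    return p / (1.0 - p)


beta = 0.5               # hypothetical slope
cutpoints = [-1.0, 1.5]  # hypothetical kappa_1 < kappa_2 for a 3-level outcome

# Odds ratio for a 1-unit increase in x, computed separately at each cutpoint
ratios = [
    odds(hi) / odds(lo)
    for lo, hi in zip(prob_above(2.0, beta, cutpoints), prob_above(3.0, beta, cutpoints))
]
print([round(r, 4) for r in ratios], round(math.exp(beta), 4))
```

Both cumulative splits (1 vs 2,3 and 1,2 vs 3) give the same odds ratio, exp(β), which is exactly what the proportional odds assumption states.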

Ordinal Logistic Regression Example

AS LOGITS:

For a 1 unit increase in Blood Pressure there is a 0.012 increase in the log-odds of being in a higher BMI category.

AS OR:

For a 1 unit increase in Blood Pressure the odds of being in a higher BMI category are multiplied by 1.012.

Ordinal Logistic Regression: GOF

- Assessing Proportional Odds Assumptions
  - Brant Test of Parallel Regression
    - H0: Proportional Odds, thus want p >0.05
    - Tests each predictor separately and overall
  - Score Test of Parallel Regression
    - H0: Proportional Odds, thus want p >0.05
  - Approx Likelihood-ratio test
    - H0: Proportional Odds, thus want p >0.05

Ordinal Logistic Regression: GOF

- Pseudo R2
- Diagnostics Measures
- Performed on the j-1 binomial logistic regressions

Multinomial Logistic Regression

- Also called multinomial logit/polytomous logistic regression.
- Same assumptions as the binary logistic model
- >2 non-ordered responses
  - Or you’ve failed to meet the proportional odds assumption of the Ordinal Logistic model

Multinomial Logistic Regression

- The Model: $\ln\dfrac{\pi_j(x)}{\pi_J(x)} = \alpha_j + \beta_j x$, where $\pi_j(x) = P(Y = j \mid x)$
  - j = levels for the outcome
  - J = reference level
  - where x is a fixed setting of an explanatory variable
- Notice how it appears we are estimating a Relative Risk and not an Odds Ratio. It’s actually an OR.
- Similar to conducting separate binary logistic models, but with better type 1 error control
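A small sketch shows how category probabilities follow from baseline-category logits and why exp(β_j) acts as an odds ratio relative to the reference level. The (alpha_j, beta_j) values below are hypothetical, and placing the reference category last is an assumption of this example.

```python
import math


def category_probs(x, params):
    """Probabilities for each category; the reference category is listed last.

    params holds one hypothetical (alpha_j, beta_j) pair per non-reference category.
    """
    scores = [math.exp(alpha + beta * x) for alpha, beta in params] + [1.0]
    total = sum(scores)
    return [s / total for s in scores]


params = [(0.2, 0.2), (-0.5, -0.1)]  # hypothetical coefficients
p_at_1 = category_probs(1.0, params)
p_at_2 = category_probs(2.0, params)

# Change in pi_j / pi_J for category 1 when x increases by one unit
ratio = (p_at_2[0] / p_at_2[-1]) / (p_at_1[0] / p_at_1[-1])
print(f"ratio = {ratio:.4f}, exp(beta_1) = {math.exp(0.2):.4f}")
```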

Multinomial Logistic Regression Example

Does degree of supernatural belief indicate a religious preference?

AS OR:

For a 1 unit increase in supernatural belief, there is a (OR-1 = % change) 21.8% increase in the odds of being an Evangelical rather than a Catholic.

Multinomial Logistic Regression GOF

- Limited GOF tests.
  - Look at LR Chi-square and compare nested models.
  - “Essentially, all models are wrong, but some are useful” –George E.P. Box
- Pseudo R2
- Similar to Ordinal
  - Perform tests on the j-1 binomial logistic regressions

Resources

“Categorical Data Analysis” by Alan Agresti

UCLA Stat Computing:

http://www.ats.ucla.edu/stat/
