Outline of Today’s Discussion • Introduction to Discriminant Analysis • Assumptions for Discriminant Analysis • Discriminant Analysis in SPSS

Part 1 Introduction To Discriminant Analysis

Definition • Discriminant Analysis - A procedure for (1) determining whether two or more mutually exclusive groups can be distinguished from each other based on linear combinations of predictor variables, and (2) determining which variables contribute to that separation. • Generic Linear Discriminant Analysis Equation: Score = aX1 + bX2 + cX3 + dX4 + intercept • Why is the equation above said to be linear?

Definition • Generic Linear Discriminant Analysis Equation: Score = aX1 + bX2 + cX3 + dX4 + intercept • To the extent that the scores resulting from the Linear Discriminant Analysis (LDA) form two or more distinct (mutually exclusive) groups, the LDA is successful. • Example: Scores ranging from… 10 to 20 → Group 1; 60 to 70 → Group 2
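The scoring idea can be sketched in a few lines of Python. This is only an illustration: the weights, intercept, predictor values, and the cutoff of 40 (the midpoint between the two example score ranges) are all made-up assumptions, not values from a fitted model.

```python
def discriminant_score(predictors, weights, intercept):
    """Score = aX1 + bX2 + cX3 + dX4 + intercept (a weighted sum)."""
    return sum(w * x for w, x in zip(weights, predictors)) + intercept

def assign_group(score, cutoff=40.0):
    """Hypothetical rule: scores below the cutoff go to Group 1, others to Group 2."""
    return 1 if score < cutoff else 2

# Hypothetical coefficients a, b, c, d and intercept.
weights = [2.0, 1.0, 0.5, 3.0]
intercept = 1.0

# Two hypothetical cases, each with values on X1..X4.
case_a = [1.0, 2.0, 4.0, 3.0]
case_b = [10.0, 8.0, 6.0, 11.0]

score_a = discriminant_score(case_a, weights, intercept)
score_b = discriminant_score(case_b, weights, intercept)
print(score_a, assign_group(score_a))
print(score_b, assign_group(score_b))
```

The equation is linear because each predictor enters only as a weighted term that is added to the others; no predictor is squared, multiplied by another predictor, or otherwise transformed.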

Definition • Additional LDAs could be used to sort scores on separate, ORTHOGONAL, dimensions of variability! • Dim 2 Score = eX5 + fX6 + gX7 + hX8 + intercept • Dim 3 Score = iX9 + jX10 + kX11 + lX12 + intercept • Dim 4 Score = mX13 + nX14 + oX15 + pX16 + intercept

Questions Addressed by LDA • What are the linear combinations of predictors whose average values best separate the groups? (On average, which combos work best?) • Are the linear combinations useful in actually classifying cases into groups? (What’s the accuracy?) • Is there a smaller subset of predictors that captures the differences between the groups? (What’s the parsimony?)

Sample LDA Research Questions • Predict which universities will fail financially, based on endowment size, debt, operating budget, and alumni support. • Predict which students should skip PSYC 100, based on SAT scores, grades in HS psyc classes, HS class rank, and Psyc AP score. • Predict which clients will relapse, based on education level, income, number of mental health hospitalizations, family stability, frequency of contact with drug / alcohol abusers. • Predict which children on the autistic spectrum will benefit from Applied Behavior Analysis (ABA), based on family SES, parental attachment scores, IQ, parental motivation, number of siblings.

Relation to Other Stats • LDA is similar to the ANOVA family of stats because… • It is a linear model. • It is based on means (it is parametric). • It can be univariate (ANOVA) or multivariate (MANOVA).

Relation to Other Stats • In the special case of prediction for exactly two groups, LDA is similar to multiple regression. • In the two-group case, the LDA coefficients and multiple regression coefficients will be proportional to each other (will differ from each other by only a simple multiplicative scaling factor) if the MR is computed with “Group” as the DV (criterion). This proportionality is only true in the 2-group case. • Informally, LDA is the “reverse” of MR and (M)ANOVA. In MR and (M)ANOVA, we typically use group as an IV, not a DV. In LDA, one or more DVs are used to predict group membership, a typical IV!
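The two-group proportionality can be checked numerically. The sketch below uses a small made-up data set with two predictors (all values are hypothetical) and plain Python arithmetic. It computes the LDA weights as the pooled within-group scatter matrix inverse times the difference in group means, then computes ordinary least-squares slopes with the group code (1 or 2) as the DV, and compares the two.

```python
# Hypothetical data: (X1, X2) for each case in two known groups.
group1 = [(2.0, 3.0), (3.0, 5.0), (4.0, 4.0), (5.0, 6.0)]
group2 = [(6.0, 4.0), (7.0, 7.0), (8.0, 5.0), (9.0, 8.0)]

def mean_vector(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]

def scatter(rows, center):
    """2x2 sums of squares and cross-products around `center`."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - center[0], r[1] - center[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def solve_2x2(m, v):
    """Solve m * w = v for w using the closed-form 2x2 inverse."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [(m[1][1] * v[0] - m[0][1] * v[1]) / det,
            (m[0][0] * v[1] - m[1][0] * v[0]) / det]

# LDA weights: pooled within-group scatter and the mean difference.
m1, m2 = mean_vector(group1), mean_vector(group2)
s1, s2 = scatter(group1, m1), scatter(group2, m2)
within = [[s1[i][j] + s2[i][j] for j in range(2)] for i in range(2)]
mean_diff = [m2[0] - m1[0], m2[1] - m1[1]]
lda_weights = solve_2x2(within, mean_diff)

# Multiple regression slopes with "Group" (coded 1 / 2) as the DV.
rows = group1 + group2
codes = [1.0] * len(group1) + [2.0] * len(group2)
grand = mean_vector(rows)
code_mean = sum(codes) / len(codes)
total = scatter(rows, grand)
xy = [sum((r[j] - grand[j]) * (c - code_mean) for r, c in zip(rows, codes))
      for j in range(2)]
mr_slopes = solve_2x2(total, xy)

# If the two sets of coefficients are proportional, both ratios match.
ratios = [b / w for b, w in zip(mr_slopes, lda_weights)]
print(lda_weights, mr_slopes, ratios)
```

Both ratios come out the same, which is the "simple multiplicative scaling factor" described above. The scaling of LDA coefficients reported by a particular statistics package may differ from this raw version, so only the proportionality should be expected to carry over.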

Relation to Other Stats • LDA is also similar to Factor Analysis in that both procedures use linear combinations of measured variables to “extract” (latent) ORTHOGONAL trends. • However, in LDA the researcher *must* explicitly identify variables as DVs or IVs. • By contrast, in Factor Analysis, variables are just variables; researchers do not identify DVs or IVs as such!

Part 2 Assumptions for Discriminant Analysis

Discriminant Analysis Assumptions • Perhaps counter-intuitively, a discriminant analysis requires a data set for which each participant’s group membership is already known. • So why bother “predicting” group membership if it’s already known? Answer: We use the current known memberships to derive LDA equations for classifying future cases whose group membership is NOT known. (We’re building a model!)
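The "build on known cases, then classify unknown ones" workflow can be sketched with a deliberately simplified example. It assumes a single predictor, equal group variances, and equal prior probabilities, in which case the linear discriminant rule reduces to assigning a case to the group whose mean is nearest. The scores are made up for illustration.

```python
# Hypothetical training data: scores for cases whose group is already known.
known_scores = {
    1: [10.0, 12.0, 14.0, 16.0],
    2: [20.0, 22.0, 24.0, 26.0],
}

# "Fit" the model: with one predictor, equal variances, and equal priors,
# all we need from the known cases is each group's mean.
group_means = {g: sum(v) / len(v) for g, v in known_scores.items()}

def classify(score):
    """Assign a new case to the group with the nearest mean."""
    return min(group_means, key=lambda g: abs(score - group_means[g]))

# Classify future cases whose membership is NOT known.
new_cases = [15.0, 21.0]
predictions = [classify(s) for s in new_cases]
print(group_means, predictions)
```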

Discriminant Analysis Assumptions • If the cases in your data set are not already assigned to groups, you can’t use LDA. • Note: There is an alternate multivariate procedure called Cluster Analysis, which discerns groups (“clusters”) in data sets where group membership is NOT known. • We won’t do a cluster analysis (nor its mighty big-sibling, hierarchical cluster analysis) this semester.

Discriminant Analysis Assumptions • LDA’s assumption set is similar to that of many other linear models in the ANOVA family. • Assumptions: (1) Normality, (2) Equal Variance, (3) Independence. • Note: LDA and other ANOVA-like linear models can be robust to violations of the normality assumption to the extent that the groups have equal n. No such luck with the other two assumptions.

Discriminant Analysis Assumptions • If your LDA data set does not meet the normality or equal variance assumptions, there is another statistical procedure that might help... • Logistic Regression (LR) can predict binary group membership, like LDA, but without assuming either normality or equal variance. • Note: If the assumptions underlying LDA are met, LDA is often more sensitive (more likely to produce significance) than LR is. We won’t learn LR this semester.

Part 3 Discriminant Analysis In SPSS

Discriminant Analysis in SPSS • Assume we would like to build an LDA to address the issue of criminal recidivism. • We will use a data set to classify 20 current prison inmates based on whether they had or had not been previously incarcerated. • In our SPSS Variable View, we will use a “1” for the group that had NOT been previously incarcerated, and a “2” for prior incarceration. • We will use the values on the following variables for the prediction: (1) an Alcohol Abuse Test, (2) a Drug Abuse Test, (3) an Anger Test.

Discriminant Analysis in SPSS • Click Analyze, Classify, and Discriminant. • Click your group variable and move it to the Grouping Variable box. • Click Define Range, and enter your minimum & maximum values. • Click Continue. • Click on the variables to be used for prediction; move them to the Independents box. • Click Statistics and click Means and Univariate ANOVAs. • Click all four options in the Matrices box. • Click Continue and Classify. • Click All Groups Equal (if equal n, or group size proportional to N). Else, click “Compute from Group Sizes”. • Click Combined Groups in the Plots box and Within Groups in the Covariance Matrix box. • Click Summary Table in the Display box and Continue. • Click Save and Predicted Group Membership. • Click Continue and OK.
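The two prior-probability choices in the Classify dialog can be illustrated with a small Python sketch. The function name and the group sizes below are hypothetical; the point is only that "all groups equal" gives each of k groups a prior of 1/k, while computing from group sizes gives each group a prior of n_group / N.

```python
def prior_probabilities(group_sizes, equal=True):
    """Return a prior probability for each group.

    group_sizes: dict mapping group code -> number of cases in that group.
    equal=True  -> every group gets 1/k (the "all groups equal" choice).
    equal=False -> each group gets n_group / N (computed from group sizes).
    """
    k = len(group_sizes)
    n_total = sum(group_sizes.values())
    if equal:
        return {g: 1 / k for g in group_sizes}
    return {g: size / n_total for g, size in group_sizes.items()}

# Hypothetical split of 20 cases into unequal groups.
sizes = {1: 15, 2: 5}
print(prior_probabilities(sizes, equal=True))
print(prior_probabilities(sizes, equal=False))
```

With equal group sizes the two options give identical priors, so the choice only matters when the groups are unbalanced.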

Discriminant Analysis in SPSS First of Three Helpful Output Boxes (occurs in the “Discriminant” Section) Here, you might informally ask yourself, “based on the means, which variables distinguish the groups?”

Discriminant Analysis in SPSS Second of Three Helpful Output Boxes (occurs under “Analysis 1. Summary of Canonical Discriminant Functions”) Wilks’ Lambda also occurs in MANOVA. It reflects the proportion of variance NOT explained by the model. Lower Wilks’ Lambda values are better. Our model’s predictions are significantly better than chance! (p = 0.021)
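To make "proportion of variance NOT explained" concrete, here is a minimal sketch for the simplest case of a single predictor, where Wilks’ Lambda reduces to the within-group sum of squares divided by the total sum of squares. (With several predictors it is the ratio of the determinants of the within-group and total scatter matrices.) The scores are made up and are not the inmate data.

```python
# Hypothetical scores on one predictor for two known groups.
groups = {
    1: [10.0, 12.0, 14.0, 16.0],
    2: [20.0, 22.0, 24.0, 26.0],
}

def sum_of_squares(values, center):
    return sum((v - center) ** 2 for v in values)

all_scores = [v for scores in groups.values() for v in scores]
grand_mean = sum(all_scores) / len(all_scores)

# Variability around the grand mean (total) and around each group's own mean (within).
ss_total = sum_of_squares(all_scores, grand_mean)
ss_within = sum(sum_of_squares(s, sum(s) / len(s)) for s in groups.values())

# Single-predictor Wilks' Lambda: share of variability left unexplained by group.
wilks_lambda = ss_within / ss_total
print(ss_within, ss_total, wilks_lambda)
```

A value near 0 means group membership accounts for almost all of the variability; a value near 1 means the groups barely differ.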

Discriminant Analysis in SPSS Third of Three Helpful Output Boxes (occurs under “Classification Statistics”) Our model has 80% accuracy…beats a coin flip!

Discriminant Analysis in SPSS Hey Look! A New Variable Appears Automatically In Our Data View! (Better than Christmas!) The new “Dis_1” variable shows us what group (1 vs 2) was predicted for each case. Our Linear Discriminant Analysis misclassified cases 6, 8, 10, and 20.
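The 80% accuracy figure follows directly from the misclassification count: 4 of the 20 inmates were placed in the wrong group. A quick check in Python (the variable names are just for illustration):

```python
n_cases = 20                          # inmates in the data set
misclassified_cases = [6, 8, 10, 20]  # case numbers the LDA got wrong

n_correct = n_cases - len(misclassified_cases)
accuracy = n_correct / n_cases
print(f"{n_correct} of {n_cases} correct = {accuracy:.0%}")
```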

Discriminant Analysis in SPSS • To summarize, we used a discriminant analysis to assess how well three variables (alcohol score, drug score, anger score) could predict whether an inmate had been previously incarcerated. • The Wilks’ Lambda was significant, meaning that our discriminant function produced a significant difference between the two target groups. • Our linear classifier (our LDA) “grouped” the inmates with an accuracy of 80%. • More generally, we can use LDA to predict membership in a wide range of groups…including clinically relevant DSM groups! (And there was much rejoicing!)
