
### Detecting Faking on Noncognitive Assessments Using Decision Trees

Stanford University

October 13, 2006

Acknowledgements

- Based on a draft paper that is joint work with Eric Heggestad, Patrick Kyllonen, and Richard Roberts

Overview

- The decision tree method and its applications to faking
- Evaluating decision tree performance
- Three studies evaluating the method
  - Study 1: Low-stakes noncognitive assessments
  - Study 2: Experimental data
  - Study 3: Real-world selection
- Implications and conclusions

What are decision trees?

- A technique from machine learning for predicting an outcome variable from (a possibly large number of) predictor variables
- Outcome variable can be categorical (classification tree) or continuous (regression tree)
- Algorithm builds the decision tree based on empirical data

[Figure: example decision tree built from a training set — two yes/no splits (on snow, then "Is it raining?"); "yes" branches lead to "drive", the final "no" branch to "walk"]
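The commute tree from the slide can be written out directly as code. A minimal Python sketch — the slide's actual training days are not shown, so the three training cases below are illustrative:

```python
# Hand-coded version of the slides' toy decision tree (structure taken from
# the slide figure: check snow first, then rain; "yes" branches drive).
def commute_decision(snowing: bool, raining: bool) -> str:
    """Classify a day as 'drive' or 'walk' from two yes/no predictors."""
    if snowing:
        return "drive"
    if raining:
        return "drive"
    return "walk"

# Illustrative training set: (snowing, raining) -> observed decision.
training_set = [
    ((True, False), "drive"),
    ((False, True), "drive"),
    ((False, False), "walk"),
]

# Count how many training cases the tree classifies correctly.
correct = sum(commute_decision(*x) == y for x, y in training_set)
print(correct, "of", len(training_set), "training cases classified correctly")
```

A tree-building algorithm would learn these splits from the data rather than having them written by hand.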

What are decision trees? (training set)

[Figure: the example commute decision tree ("Is it raining?" → drive / walk) applied to its training set]

- Not all cases are accounted for correctly
- Wrong decision on Day 4
- Need to choose variables predictive enough of the outcome

What are decision trees? (test set)

[Figure: the example commute decision tree ("Is it raining?" → drive / walk) applied to a test set]

- Not all cases are predicted correctly
- Maybe the decision to drive or walk is determined by more than just the snow and rain?

Advantages of decision trees

- Ease of interpretation
- Simplicity of use
- Flexibility in variable selection
- Functionality to build decision trees readily available in software (e.g., the R statistical package)

Application to faking: Outcome variables and training sets

- Outcome variable = faking status (“faking” or “honest”)
  - Training set = an experimental data set where some participants were instructed to fake
  - Training set = a data set where some respondents are known to have faked
- Outcome variable = lie scale score
  - Training set = a data set where the target lie scale was administered to some subjects

Application to faking: Predictor variables

- So far, we have used individual item responses only
- Other possibilities:
  - Variance of item responses
  - Number of item responses in the highest (or lowest) category
  - Modal item response
- The decision tree method permits some sloppiness in variable selection
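The derived predictor variables listed above are straightforward to compute from raw item responses. A minimal Python sketch (the function and feature names are illustrative, not from the paper):

```python
from collections import Counter
from statistics import pvariance

def response_features(item_responses):
    """Derive candidate predictor variables from one respondent's 1-5 item responses."""
    return {
        # Variance of item responses
        "variance": pvariance(item_responses),
        # Number of responses in the highest / lowest category
        "n_highest": item_responses.count(5),
        "n_lowest": item_responses.count(1),
        # Modal (most frequent) item response
        "modal": Counter(item_responses).most_common(1)[0][0],
    }

feats = response_features([5, 5, 4, 5, 1, 5, 5])
print(feats)
```

Each feature then becomes one column in the data set the tree algorithm splits on.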

Evaluating decision tree performance: Metrics

- Classification trees (dichotomous outcome case, e.g., predicting faking or not faking)
  - Accuracy rate
  - False positive rate
  - Hit rate
- Regression trees (continuous outcome case, e.g., predicting a lie scale score)
  - Average absolute error
  - Correlation between actual and predicted scores
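Taking “hit rate” as the fraction of actual fakers flagged as faking and “false positive rate” as the fraction of honest respondents wrongly flagged — plausible readings, though the slides do not define the terms — the classification metrics can be computed as:

```python
def classification_metrics(actual, predicted, positive="faking"):
    """Accuracy, false positive rate, and hit rate for faking/honest labels.

    Definitions assumed here: hit rate = flagged fakers / actual fakers;
    false positive rate = flagged honest respondents / actual honest.
    """
    pairs = list(zip(actual, predicted))
    accuracy = sum(a == p for a, p in pairs) / len(pairs)
    fakers = [p for a, p in pairs if a == positive]
    honest = [p for a, p in pairs if a != positive]
    hit_rate = sum(p == positive for p in fakers) / len(fakers)
    false_positive_rate = sum(p == positive for p in honest) / len(honest)
    return accuracy, false_positive_rate, hit_rate

actual    = ["faking", "faking", "honest", "honest", "faking", "honest"]
predicted = ["faking", "honest", "honest", "faking", "faking", "honest"]
print(classification_metrics(actual, predicted))
```

Note the trade-off the three numbers capture: a tree can raise its hit rate simply by flagging more respondents, at the cost of a higher false positive rate.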

Evaluating decision tree performance: Overfitting

- The algorithm can “overfit” to the training data, so performance metrics computed on the training data are not indicative of future performance
- Thus we will often partition the data:
  - Training set (data used to build the tree)
  - Test set (data used to compute performance metrics)

Evaluating decision tree performance: Cross-validation

- A single training/test split leaves a lot to the chance selection of the training and test sets
- Instead, partition the data into k equal subsets
  - Use each subset as a test set for the tree trained on the rest of the data
  - Average the resulting performance metrics to get better estimates of performance on new data
- Here we will report cross-validation estimates
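The k-fold procedure above can be sketched in a few lines of Python (function names and the shuffling scheme are illustrative):

```python
import random

def k_fold_indices(n, k, seed=0):
    """Partition indices 0..n-1 into k disjoint, nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # fixed seed for reproducibility
    return [idx[i::k] for i in range(k)]

def cross_validate(data, k, train_fn, metric_fn):
    """Average a performance metric over k train/test splits."""
    folds = k_fold_indices(len(data), k)
    scores = []
    for i, test_idx in enumerate(folds):
        test = [data[j] for j in test_idx]
        # Train on the other k-1 folds, evaluate on the held-out fold.
        train = [data[j] for f in folds[:i] + folds[i + 1:] for j in f]
        model = train_fn(train)
        scores.append(metric_fn(model, test))
    return sum(scores) / k
```

Here `train_fn` would build a tree on the k-1 training folds and `metric_fn` would compute, say, the accuracy rate on the held-out fold.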

Study 1

- Data sets
  - Two sets of students (N = 431 and N = 824) who took a battery of noncognitive assessments as well as two lie scales as part of a larger study
- Measures
  - Predictor variables
    - IPIP (“Big Five” personality measure) items
    - Social Judgment Scale items
  - Outcomes (lie scales)
    - Overclaiming Questionnaire
    - Balanced Inventory of Desirable Responding
- Method
  - Build regression trees to predict scores on each lie scale based on students’ item responses

Study 1: Results

- Varying performance, depending on the items used for prediction and the lie scale used as the outcome
- Correlations between actual lie scale scores and predicted scores ranged from -.02 to .49
- Average prediction errors ranged from .74 to .95 SD

Study 1: Limitations

- Low-stakes setting: how much faking was there to detect?
- Nonexperimental data set: students with high scores on the lie scales may or may not have actually been faking

Study 2

- Data set
  - An experimental data set of N = 590 students in two conditions (“honest” and “faking”)
- Measures
  - Predictor variables
    - IPIP (“Big Five” personality assessment) items
- Method
  - Build decision trees to classify students as honest or faking based on their personality test item responses

Study 2: Results

- Decision trees correctly classified students into experimental condition with varying success
- Accuracy rates of 56% to 71%
- False positive rates of 25% to 41%
- Hit rates of 52% to 68%

Study 2: An example

- Two items on a 1-5 scale form a decision tree:
- Item 19: “I always get right to work”
- Item 107: “Do things at the last minute” (reversed)

- Extreme values of either one are indicative of faking
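The slide names the two items but not the tree's split points, so the cutoffs in this Python sketch are hypothetical; it only illustrates the "extreme responses flag faking" logic:

```python
# Illustrative only: Item 19 and Item 107 come from the slide, but the
# cutoff of 5 (the most extreme response on the 1-5 scale) is hypothetical.
def classify_respondent(item_19: int, item_107: int) -> str:
    """Flag faking from two 1-5 items with hypothetical cutoffs.

    Item 19: "I always get right to work".
    Item 107: "Do things at the last minute" (reverse scored).
    An extreme (highest) response on either item is treated as faking.
    """
    if item_19 == 5 or item_107 == 5:
        return "faking"
    return "honest"

print(classify_respondent(5, 3))  # extreme response on Item 19
print(classify_respondent(3, 2))  # no extreme responses
```

This mirrors the finding that many of the successful trees needed only a handful of item responses.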

Study 2: Discussion

- Many successful trees utilized few item responses
- Range of tree performance
- Laboratory—not real-world—data
- Although this was an experimental study, we still don’t know:
- If students in the faking condition really faked
- If the degree to which they faked is indicative of how people fake in an operational setting
- If any of the students in the honest condition faked

Study 3

- Data set
  - N = 264 applicants for a job
- Measures
  - Predictor variables
    - Achievement striving, assertiveness, dependability, extroversion, and stress tolerance items of the revised KeyPoint Job Fit Assessment
  - Outcome (lie scale)
    - Candidness scale of the revised KeyPoint Job Fit Assessment
- Method
  - Build decision trees predicting the candidness (lie scale) score from the other item responses

Study 3: Results

- Correlations between actual and predicted candidness (lie scale) scores ranged from .26 to .58
- Average prediction errors ranged from .61 to .78 SD

Study 3: An example

- Items are on a 1-5 scale, where 5 indicates the highest level of Achievement Striving
- Note that most tests are for extreme item responses

Study 3: Discussion

- Similar methodology to Study 1, but better results (e.g., stronger correlations)
- Difference in results likely due to the fact that motivation to fake was higher in this real-world, high-stakes setting

General discussion

- Wide variation in decision tree quality across groups of predictor variables (e.g., conscientiousness scale vs. openness scale)
- Examining trees can give insight into the structure of the assessment

Detecting faking in an operational setting

- Some decision trees in each study used only a small number of items and achieved a moderate level of accuracy
- Use decision trees for real-time faking detection on computer-administered noncognitive assessments
  - Real-time “warning” system
  - Need to study how this changes the psychometric properties of the assessment

Future work

- Address whether decision trees can be effective in an operational setting—are current decision trees accurate enough to reduce faking?
- Comparisons of decision tree faking/honest classification with classifications from IRT mixture models
- Develop additional features to be used as predictor variables
- Explore other machine learning techniques
