
Reliability, Validity, & Scaling



  1. Reliability, Validity, & Scaling

  2. Reliability • Repeatedly measure unchanged things. • Do you get the same measurements? • Charles Spearman developed Classical Measurement Theory. • If measurement were perfectly reliable, the correlation between true scores and measurements would be +1. • r < 1 because of random error. • Error is symmetrically distributed about 0.

  3. True Scores and Measurements • Reliability is the squared correlation between true scores and measurement scores. • Reliability is the proportion of the variance in the measurement scores that is due to differences in the true scores rather than due to random error.
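
In classical-test-theory notation (a standard formulation of the ideas above, where each measurement X is a true score T plus random error E):

```latex
X = T + E, \qquad
\text{reliability} \;=\; \rho_{XT}^{2}
  \;=\; \frac{\sigma_T^2}{\sigma_X^2}
  \;=\; \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}
```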

  4. Systematic Error • Systematic error is not random. • The instrument is measuring something else, in addition to the construct of interest. • Reliability cannot be known exactly; it can only be estimated.

  5. Test-Retest Reliability • Measure subjects at two points in time. • Correlate (r) the two sets of measurements, as in the sketch below. • r of about .70 is OK for research instruments. • It needs to be higher for practical applications and important decisions. • M and SD usually should not vary much from Time 1 to Time 2.
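
A minimal sketch of the test-retest computation in Python, using hypothetical scores (the data below are illustrative, not from the slides):

```python
import numpy as np

# Hypothetical data: the same 8 subjects measured on two occasions.
time1 = np.array([12, 15, 9, 20, 14, 18, 11, 16])
time2 = np.array([13, 14, 10, 19, 15, 17, 12, 15])

# Test-retest reliability estimate: Pearson r between the two occasions.
r = np.corrcoef(time1, time2)[0, 1]
print(f"test-retest r = {r:.3f}")   # want roughly .70+ for research use

# M and SD should not differ much between Time 1 and Time 2.
print(time1.mean(), time1.std(ddof=1))
print(time2.mean(), time2.std(ddof=1))
```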

  6. Alternate/Parallel Forms • Estimate reliability with the r between forms. • M and SD should be the same for both forms. • The pattern of correlations with other variables should be the same for both forms.

  7. Split-Half Reliability • Divide the items into two random halves. • Score each half. • Correlate the half scores to get the half-test reliability coefficient, r_hh. • Correct it with the Spearman-Brown formula: r_sb = 2 r_hh / (1 + r_hh).
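
A sketch of split-half reliability with the Spearman-Brown correction, assuming a subjects-by-items score matrix (the function name and the single random split are illustrative choices):

```python
import numpy as np

def split_half(items, seed=None):
    """One random split-half reliability estimate, Spearman-Brown corrected.
    items: subjects x items array of scores."""
    items = np.asarray(items, dtype=float)
    rng = np.random.default_rng(seed)
    idx = rng.permutation(items.shape[1])          # one arbitrary random split
    half1 = items[:, idx[: items.shape[1] // 2]].sum(axis=1)
    half2 = items[:, idx[items.shape[1] // 2 :]].sum(axis=1)
    r_hh = np.corrcoef(half1, half2)[0, 1]         # half-test reliability
    r_sb = 2 * r_hh / (1 + r_hh)                   # Spearman-Brown full-test estimate
    return r_hh, r_sb
```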

  8. Cronbach’s Coefficient Alpha • The obtained value of r_sb depends on how you split the items into halves. • Conceptually: find r_sb for all possible split halves and compute the mean of these. • But you don’t really compute it this way; see the sketch below for the usual variance-based formula. • Alpha is a lower bound for the true reliability. • That is, it underestimates true reliability.
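
The usual computational formula uses item and total-score variances rather than actual splits. A sketch (the function name is mine):

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's coefficient alpha for a subjects x items score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the total score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)
```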

  9. Maximized Lambda4 • This is the best estimator of reliability. • Compute r_sb for all possible pairs of split halves. • The largest r_sb is the estimated reliability. • With more than a few items this is unreasonably tedious, but there are ways to estimate it.
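
A brute-force sketch of maximized lambda4, feasible only for small scales (it reuses the Spearman-Brown correction above; with k items the number of splits grows on the order of 2^k):

```python
import numpy as np
from itertools import combinations

def max_lambda4(items):
    """Largest Spearman-Brown-corrected split-half coefficient over all
    splits into two (near-)equal halves. Brute force: small k only."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    best = -1.0
    for half in combinations(range(k), k // 2):
        other = [j for j in range(k) if j not in half]
        s1 = items[:, list(half)].sum(axis=1)
        s2 = items[:, other].sum(axis=1)
        r_hh = np.corrcoef(s1, s2)[0, 1]
        best = max(best, 2 * r_hh / (1 + r_hh))
    return best
```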

  10. Construct Validity • To what extent are we really measuring/manipulating the construct of interest? • Face Validity – do others agree that it sounds valid?

  11. Content Validity • Detail the population of things (behaviors, attitudes, etc.) that are of interest. • Consider your operationalization of the construct (the details of how you proposed to measure it) as a sample of that population. • Is your sample representative of the population – ask experts.

  12. Criterion-Related Validity • Established by demonstrating that your operationalization has the expected pattern of correlations with other variables. • Concurrent Validity – demonstrate the expected correlation with other variables measured at the same time. • Predictive Validity – demonstrate the expected correlation with other variables measured later in time.

  13. Convergent Validity – demonstrate the expected correlation with measures of related constructs. • Discriminant Validity – demonstrate the expected lack of correlation with measures of theoretically unrelated constructs.

  14. Scaling • Scaling = the construction of instruments for measuring abstract constructs. • I shall discuss the creation of a Likert scale, my favorite type of scale.

  15. Likert Scales • Define the concept. • Generate potential items – about 100 statements. • On some, agreement indicates being high on the measured attribute. • On others, agreement indicates being low on the measured attribute.

  16. Likert Response Scale • Use a multi-point response scale, as on this item: “People should make certain that their actions never intentionally harm others even to a small degree.” • Response options: Strongly Disagree / Disagree / Neutral / Agree / Strongly Agree.

  17. Evaluate the Potential Items • Get judges to evaluate each item on a 5-point scale • 1 – Agreement = very low on attribute • 2 – Agreement = low on attribute • 3 – Agreement tells you nothing • 4 – Agreement = high on attribute • 5 – Agreement = very high on attribute • Select items with very high or very low means and little variability among the judges.
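
A sketch of that selection rule, with hypothetical judge ratings and arbitrary example cutoffs (none of these numbers come from the slides):

```python
import numpy as np

# Hypothetical ratings: rows are judges, columns are items, each rating on
# the 1-5 scale above (1 = agreement indicates very low on the attribute).
ratings = np.array([[5, 3, 1, 4, 5],
                    [5, 3, 2, 4, 4],
                    [4, 2, 1, 5, 5],
                    [5, 3, 1, 4, 5]])

means = ratings.mean(axis=0)
sds = ratings.std(axis=0, ddof=1)

# Keep items with a mean near either extreme and little variability
# among the judges; the cutoffs here are illustrative only.
keep = ((means < 2) | (means > 4)) & (sds < 0.75)
print(np.where(keep)[0])   # indices of the retained items
```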

  18. Alternate Method of Item Evaluation • Ask some judges to respond to the items the way they think someone high in the attribute would respond. • Ask other judges to respond as someone low in the attribute would. • Prefer the items that best discriminate between these two groups. • Also ask the judges to identify items that are unclear or confusing.

  19. Pilot Test the Items • Administer them to a sample of persons from the population of interest. • Conduct an item analysis (more on this later). • Prefer items that have high item-total correlations, as in the sketch below. • Consider conducting a factor analysis (more on this later).
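
A sketch of the item-total correlations used in that item analysis, in the "corrected" form (each item against the sum of the other items, so an item is not correlated with itself; the function name is mine):

```python
import numpy as np

def corrected_item_total(items):
    """Corrected item-total correlation for each column of a
    subjects x items score matrix."""
    items = np.asarray(items, dtype=float)
    total = items.sum(axis=1)
    return np.array([
        np.corrcoef(items[:, j], total - items[:, j])[0, 1]
        for j in range(items.shape[1])
    ])
```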

  20. Administer the Final Scale • On each item, the response indicating the least amount of the attribute is scored 1. • The response indicating the next least amount is scored 2, and so on. • A respondent’s total score = the sum of the item scores or the mean of the item scores. • Decide how to deal with nonresponses on some items. • Reflect (reverse-score) items on which agreement indicates a low amount of the attribute; see the sketch below.
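
A sketch of that scoring scheme, including reverse scoring and one common (but not the only) policy for nonresponses; the names and the nonresponse policy are my assumptions:

```python
import numpy as np

def score_scale(responses, reversed_items, n_points=5):
    """Score a Likert scale.
    responses: subjects x items array, with np.nan marking a nonresponse.
    reversed_items: indices of items to reflect (reverse-score)."""
    r = np.array(responses, dtype=float)
    # Reflect: on a 1..n_points scale, new = (n_points + 1) - old.
    r[:, reversed_items] = (n_points + 1) - r[:, reversed_items]
    # Nonresponse policy used here: mean of the answered items
    # (multiply by the number of items for a sum-metric score).
    return np.nanmean(r, axis=1)
```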

  21. Item Analysis • You believe the scale is unidimensional – that each item measures the same thing. • If so, the item scores should be well correlated. • Evaluate this belief with an item analysis: Is the scale internally consistent? If so, it is also reliable. Are there items that do not correlate well with the others?

  22. Item Analysis of Idealism Scale • Bring KJ-Idealism.sav into PASW. • Available at http://core.ecu.edu/psyc/wuenschk/SPSS/SPSS-Data.htm

  23. Click Analyze, Scale, Reliability Analysis.

  24. Select all ten items and scoot them to the Items box on the right. • Click the Statistics box.

  25. Check “Scale if item deleted” and then click Continue.

  26. Back on the initial window, click OK. • Look at the output. • The Cronbach alpha is .744, which is acceptable.

  27. Item-Total Statistics
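
For readers without PASW/SPSS, a sketch of the same analysis in Python. It assumes the third-party pyreadstat package, the KJ-Idealism.sav file from the URL above, and the cronbach_alpha function sketched earlier; it is an equivalent computation, not the procedure shown on the slides:

```python
import numpy as np
import pyreadstat   # third-party reader for SPSS .sav files

df, meta = pyreadstat.read_sav("KJ-Idealism.sav")   # the ten idealism items
items = df.to_numpy(dtype=float)

print(f"alpha = {cronbach_alpha(items):.3f}")       # slide reports .744

# "Cronbach's Alpha if Item Deleted," as in the Item-Total Statistics table.
for j in range(items.shape[1]):
    rest = np.delete(items, j, axis=1)
    print(f"item {j + 1}: alpha if deleted = {cronbach_alpha(rest):.3f}")
```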

  28. Troublesome Items • Items 7 and 10 are troublesome. • Deleting them would increase alpha. • But not by much, so I retained them. • Item 7 stats are especially distressing: • “Deciding whether or not to perform an act by balancing the positive consequences of the act against the negative consequences of the act is immoral.”

  29. What Next? • I should attempt to rewrite item 7 to make it clearer that it applies to ethical decisions, not to other cost-benefit analyses. • But this is not my scale, • And who has the time?

  30. Scale Might Not Be Unidimensional • If the items are measuring two or more different things, alpha may well be low. • You need to split the scale into two or more subscales. • Factor analysis can be helpful here (but no promises).
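
A quick screening sketch for unidimensionality, using the eigenvalues of the item correlation matrix (a common heuristic, not a full factor analysis; the function name is mine):

```python
import numpy as np

def eigen_check(items):
    """Eigenvalues of the item correlation matrix, largest first.
    One dominant eigenvalue suggests a single factor; several large
    ones suggest splitting the scale into subscales."""
    R = np.corrcoef(np.asarray(items, dtype=float), rowvar=False)
    return np.sort(np.linalg.eigvalsh(R))[::-1]
```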
