psychometrics n.
Psychometrics. PSYC 325 Paul Jose Thought for the day: “Studies at Jikei University in Tokyo found that people who employed 7 rules for good health (e.g., adequate sleep, no smoking) had about 6% higher blood pressure than people who were not so concerned about their health.”.

This is a big topic
  • Could spend a whole semester on this topic, but we’ll devote a few days.
  • Psychometrics are important to know about because most of the work in psychology involves measures of some kind for which you need to know whether they are reliable and valid. Quick review on this.
Scale reliability
  • Internal reliability:
    • Cronbach’s alpha; Kuder-Richardson 20; split-half; etc.
    • Want to know that all of your items are highly intercorrelated. If not, then you have noise or heterogeneity in your measure. You may think that you’re measuring depression, let’s say, but you’re really measuring depression and anxiety.
    • Let’s take a look at some examples. I mixed depression items with coping items on the following page.

Item-total Statistics

Item      Scale Mean        Scale Variance    Corrected Item-      Alpha
          if Item Deleted   if Item Deleted   Total Correlation    if Item Deleted
CDI08     8.1450            11.0675            .4146               .6002
CDI09     8.0341            10.4920            .5248               .5768
CDI10     8.1938            10.8881            .4864               .5898
CDI11     7.9242            10.6614            .3930               .5967
CDI12     8.2340            11.9727            .2101               .6328
CDI07     8.0916            10.6471            .4964               .5833
COPE08    6.3352             9.6316            .3088               .6201
COPE11    5.6796            11.4301           -.0004               .7260
COPE15    6.6628             8.2480            .5157               .5485
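
The columns in the table above can be computed in a few lines of code. What follows is a minimal sketch in plain Python, not the statistics package that produced the slide output. The function names and the toy data are hypothetical: three items agree perfectly and a fourth is unrelated noise, much as COPE11 behaves among the CDI items.

```python
from statistics import mean, variance

def cronbach_alpha(rows):
    """Cronbach's alpha; rows = respondents, each a list of item scores."""
    k = len(rows[0])
    item_vars = [variance(col) for col in zip(*rows)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def item_total_stats(rows):
    """Corrected item-total correlation and alpha-if-deleted for each item."""
    stats = []
    for j in range(len(rows[0])):
        rest = [r[:j] + r[j + 1:] for r in rows]   # drop item j
        rest_totals = [sum(r) for r in rest]       # total of remaining items
        item = [r[j] for r in rows]
        stats.append({
            "corrected_item_total_r": pearson_r(item, rest_totals),
            "alpha_if_deleted": cronbach_alpha(rest),
        })
    return stats

# Hypothetical toy data: items 1-3 move together, item 4 is noise.
toy = [
    [1, 1, 1, 2],
    [2, 2, 2, 1],
    [3, 3, 3, 2],
    [4, 4, 4, 1],
]

print("alpha =", round(cronbach_alpha(toy), 4))
for number, s in enumerate(item_total_stats(toy), start=1):
    print(number, round(s["corrected_item_total_r"], 4),
          round(s["alpha_if_deleted"], 4))
```

Reading the output the same way as the slide table: an item whose corrected item-total correlation is near zero or negative, and whose removal raises alpha, is a candidate for deletion.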

Criteria for Cronbach’s alpha
  • The minimum acceptable level is .70. You would like to have the alpha be in the .80s, and you’re ecstatic if it goes into the .90s.
  • Another example. A colleague and I have written a new scale to measure parental facilitation of literacy and numeracy in preschool children (PFLNS), and we sought to compare it with a pre-existing measure (Home Literacy Environment; HLE).
Some critical facts
  • HLE: composed of 9 items and the purported Cronbach’s alpha is .74. Has been shown by the authors to predict reading scores on two commonly used tests of reading: PPVT and PIAT-R.
  • PFLNS: composed of 42 items, and we didn’t know what the reliability would be. No validation yet.
  • Research plan: Collect data from 200 parents on the HLE and PFLNS in Chicago and Wellington, and individually test their children (4- and 5-year-olds) on the TEMA and TERA.
  • By so doing, we could examine the internal reliability, test-retest reliability, and validity data in one fell swoop. Let’s see how it turned out. Next page is alpha for the HLE.

Item-total Statistics (HLE)

Item       Scale Mean        Scale Variance    Corrected Item-      Alpha
           if Item Deleted   if Item Deleted   Total Correlation    if Item Deleted
TELLY       9.8764           5.3247            -.0525               .5345
CHECKS     11.1437           4.9534             .0673               .5124
NEWSPAP    10.8764           4.0394             .3439               .4127
MAGADULT   10.5833           3.7250             .3360               .4117
MAGCHILD   11.2730           4.6025             .1802               .4781
MOTHREAD    9.9770           4.3222             .3020               .4343
FATHREAD   10.0201           4.2157             .2905               .4362
CHILREAD    9.9195           4.6794             .2367               .4608
NUMBOOKS    9.7557           5.0497             .2037               .4776

Reliability Coefficients

N of Cases = 348.0    N of Items = 9

Alpha = .4955

  • PFLNS: Cronbach’s alpha = .866 for 42 items.


  • Inescapable conclusion: the PFLNS is internally reliable and the HLE is definitely not. Doesn’t usually turn out to be quite so clean.
  • Something to remember: the more items you have (if they are similar), the higher your alpha. A 9-item scale must be very coherent in order to have a good alpha. We have 42 items, and they have 9.
  • Okay, we’ve demonstrated good internal reliability, are we done yet?
  • No, because we don’t know if the scales have good reliability over time, usually called “test-retest reliability”. One simply correlates the same individuals’ scores from two administrations separated by a relatively short period of time (a few weeks to a month).
  • What are they for the HLE and PFLNS? Answer: We don’t know yet. We have the data, but have not entered them yet. I would guess that the PFLNS would be better, again because of the larger number of items in it.
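
The link between the number of items and alpha can be made concrete with the standardized-alpha formula, alpha = k * r / (1 + (k - 1) * r), where k is the number of items and r is the mean inter-item correlation. The sketch below is a back-of-envelope illustration only: the alphas reported above are raw alphas, which match the standardized form only when item variances are equal, and both function names are hypothetical.

```python
def standardized_alpha(k, mean_r):
    """Standardized alpha for k items with mean inter-item correlation mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)

def implied_mean_r(alpha, k):
    """Invert the formula: mean inter-item correlation implied by alpha and k."""
    return alpha / (k - alpha * (k - 1))

# Alphas and item counts taken from the slides.
scales = {"HLE": (9, 0.4955), "PFLNS": (42, 0.866)}

for name, (k, alpha) in scales.items():
    print(name, "implied mean inter-item r =", round(implied_mean_r(alpha, k), 3))

# What would a 9-item scale reach with the PFLNS's implied mean correlation?
pflns_r = implied_mean_r(0.866, 42)
print("9 items at that r:", round(standardized_alpha(9, pflns_r), 3))
```

Under this approximation the implied average inter-item correlations are both low, roughly .10 for the HLE and .13 for the PFLNS, so a good part of the gap in alpha reflects scale length rather than item coherence alone.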
Reliability over time
  • Why is this important? Because you want to know that whatever you’re measuring is relatively stable over time.
  • But is that true for all measures? In the case of parental practices, the answer is yes. But in the case of rapidly changing variables, such as mood, you would not expect stability over time. So think about this before you gather the data and check it.
  • There are four kinds of validity. Let’s review them.
    • Face validity: do the items look like they measure what they’re supposed to measure?
    • Convergent validity: does the measure correlate with similar measures? Its counterpart, usually called discriminant validity, asks whether it fails to correlate with dissimilar measures.
    • Criterion validity: does the measure predict something that it is supposed to predict?
    • Construct validity: the degree to which the measure accurately measures the hypothetical construct it is designed to measure.
Validity for the PFLNS
  • So what kind of validity should we consider?
    • Face validity: we created items that measured the degree to which parents did educationally enriching activities.
    • Convergent validity: does our scale correlate with the HLE? We could also have included a measure of something unrelated, such as anxiety, to show that our scale does not correlate with it.
    • Criterion validity: does the scale predict scores from standardized tests of literacy and numeracy? This is the most important goal.
    • Construct validity: does the scale predict the hypothetical construct of “parental behaviours that facilitate academic skills”? This would be the long-term goal of a number of data collections.
Face validity
  • HLE:
    • Approximately how many books does your child own?
    • How many hours of television does your child watch each week?
  • PFLNS:
    • Use maths in home routines, e.g., measuring ingredients for cooking.
    • Do alphabet workbooks or worksheets.
Convergent validity
  • Correlation between the HLE and PFLNS: r(322) = .245, p < .001.

  • Correlation between the HLE and the PFLNS-Reading sub-score: r(322) = .259, p < .001
  • Correlation between the HLE and the PFLNS-Maths sub-score: r(322) = .190, p < .001
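
For readers who want to check how a correlation and its degrees of freedom translate into a significance level, here is a minimal sketch using only the standard library. It converts r to a t statistic and then uses a normal approximation for the two-tailed p value, which is an assumption that is reasonable only because df is large here; an exact p would need the t distribution. The function names are hypothetical. The same calculation would apply to a test-retest correlation.

```python
import math

def t_from_r(r, df):
    """t statistic for a Pearson correlation with df = N - 2."""
    return r * math.sqrt(df / (1 - r ** 2))

def approx_two_tailed_p(t):
    """Two-tailed p from the normal approximation (adequate for large df)."""
    return math.erfc(abs(t) / math.sqrt(2))

# Correlations reported on the slide, all with df = 322.
reported = {
    "HLE vs PFLNS": 0.245,
    "HLE vs PFLNS-Reading": 0.259,
    "HLE vs PFLNS-Maths": 0.190,
}

for label, r in reported.items():
    t = t_from_r(r, 322)
    print(label, "t =", round(t, 2), "approx p =", approx_two_tailed_p(t))
```

Note that a significant correlation of this size still means the two scales share only a small proportion of variance (r squared is about .06 for the full scales).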
Criterion validity
  • Correlation of the HLE with:
    • Reading: b = .017, R² = .001, p = .82.
    • Mathematics: b = .047, R² = .002, p = .35.
  • Correlation of the PFLNS with:
    • Reading: b = .238, R² = .06, p = .001.
    • Mathematics: b = .158, R² = .03, p = .003.
  • Conclusion? The PFLNS seems to do a better job of predicting maths and literacy scores than the HLE.
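
The b and R² values above look like output from single-predictor regressions, although the slide does not say so, nor whether b is standardized. Assuming both, the standardized slope equals the Pearson correlation and R² equals its square, which is roughly what the PFLNS figures show (.238 squared is about .057). The sketch below illustrates that relationship on hypothetical toy data; the function name and the numbers are made up for illustration.

```python
from statistics import mean

def simple_regression(x, y):
    """Return (standardized slope, R squared) for y regressed on a single x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)

    slope = sxy / sxx                 # unstandardized slope
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))

    beta = sxy / (sxx * syy) ** 0.5   # standardized slope (equals Pearson r)
    r_squared = 1 - ss_res / syy      # proportion of variance explained
    return beta, r_squared

# Hypothetical scores: a parent-report scale (x) and a child's test score (y).
scale_scores = [1, 2, 3, 4, 5]
test_scores = [2, 1, 4, 3, 5]

beta, r_squared = simple_regression(scale_scores, test_scores)
print("beta =", round(beta, 3), "R^2 =", round(r_squared, 3))
```

With more than one predictor in the model this shortcut no longer holds, so it should not be applied to multiple-regression output.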
Construct validity
  • This is not easily demonstrated. One needs to have the results from a variety of studies, all of which show that the new scale is a good predictor/correlate of related constructs.
  • Other than the HLE, no pre-existing measure of parental behaviours exists with which we can correlate our new measure. One really needs to have 3-5 other measures to “triangulate” in on the hypothetical construct.
  • One cannot measure a hypothetical construct directly, but one can use structural equation modeling to model it as a latent variable indicated by several observed measures.
Example with a variety of similar tests

[Figure: diagram showing the New test alongside Test A, Test B, Test C, and Test D]

Test B is not a good example, but the new test is a good example.

Other topics in psychometrics
  • There are two high-powered techniques for choosing items for scales:
    • Item response theory (IRT);
    • The Rasch model.
  • I suspect that I’ll use one or the other to examine the specific items of our new scale to determine whether 42 items are truly necessary.
  • Also, there is the issue of whether certain items are relevant for specific ages of children. I need to examine whether parents change what they are doing over the preschool age span (I know they do).
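
To give a flavour of what these techniques do, here is a minimal sketch of the dichotomous Rasch model, often described as the simplest IRT model. The slides give no formula, so this is the standard textbook form rather than anything specific to the PFLNS, whose response format is not stated and may need a polytomous extension. The function name and the example values are hypothetical.

```python
import math

def rasch_probability(theta, difficulty):
    """Probability of endorsing an item under the dichotomous Rasch model.

    theta      : the respondent's level on the latent trait
    difficulty : the item's location on the same scale
    """
    return 1 / (1 + math.exp(-(theta - difficulty)))

# Hypothetical items ordered from easy to hard to endorse.
item_difficulties = [-1.5, 0.0, 1.5]

for theta in (-1.0, 0.0, 1.0):
    probs = [round(rasch_probability(theta, d), 2) for d in item_difficulties]
    print("theta =", theta, "->", probs)
```

Items whose difficulties sit very close together tell you much the same thing about a respondent, which is one way such models can help decide whether all 42 items are needed.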
Low level of psychometric knowledge out there
  • My chief complaint about researchers who propose new measures is that they are not systematic and complete in doing all of the things that need to be done.
  • They don’t report internal and test-retest reliabilities, don’t report validity data, and don’t factor analyze their data properly.
  • This last item is our next concern.