Research Methods

Research Methods Previous Comps Questions August 2010

Statistics (Fall 2008) 1) A consultant developed a battery of personnel selection tests (including 9 subscales) to replace a current test (including 2 subscales) used by an organization. To examine the validity of the new test vis-à-vis the existing one, the consultant administered the two tests to a group of employees in the organization (N=120). He then conducted multiple linear regression analyses to compare the tests. One regression model was examined for each test. Supervisor ratings were used as the criterion for the analyses. Results are shown in Table 1. Suppose you are asked to evaluate the results. Interpret the values in Table 1. Specifically, explain the meanings of the values in columns 3 - 7 to an HR manager of the organization. Based on the results, what is your conclusion about the tests? Which one would you recommend to the organization for selecting their employees?

Multiple R: correlation between the linear combination of the IVs and the DV.
R2: proportion of variance in the DV that the IVs explain.
Adjusted R2: R2 taking into account the number of predictors (and the sample size). Will always be less than or equal to R2. Attempts to take into account the phenomenon of statistical shrinkage.
Cross-Validity: correlation between the DV and the linear combination of IVs created using the sample-based regression coefficients, when that equation is applied in the population.
Population Validity: population value of multiple R of the regression model.

The new test has a higher R square, but when the number of predictors is taken into account, its adjusted R square is lower than the old test's. The new test's population validity and population cross-validity are also lower than the old test's (see definitions above). Also, because the old test has fewer predictors, it would be less time-consuming and less costly to administer. I would therefore recommend keeping the old test.
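
The shrinkage argument can be sketched numerically. The snippet below uses the standard adjusted R2 formula with N = 120 and the two subscale counts from the question; the R2 values themselves (.30 and .27) are made up for illustration, since Table 1 is not reproduced in these notes.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 for a model with k predictors fitted on n cases."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

N = 120  # sample size given in the question

# Hypothetical R^2 values (NOT from Table 1), chosen to show the pattern.
new_test = adjusted_r2(0.30, N, 9)  # new battery, 9 subscales
old_test = adjusted_r2(0.27, N, 2)  # current test, 2 subscales

print(f"new test adjusted R2: {new_test:.3f}")
print(f"old test adjusted R2: {old_test:.3f}")
```

With these assumed numbers the new test wins on raw R2 but loses once the 9 predictors are penalized, which is the pattern the answer describes.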

Statistics (Fall 2008) 2) Critics of meta-analysis methods often argue that it is better to conduct a single study with very large sample size than a meta-analysis. This is based on the belief that a major advantage of meta-analysis is that it helps address the problem of low power in primary studies with limited sample sizes. (A) How would you respond to this critique? (B) List the pros and cons of each type of study (i.e., meta-analysis, primary study with large sample size) as part of your response.

3) One of the assumptions of ANCOVA is that the covariate does not interact with any of the independent (categorical) variables. Why is this important? In other words, what happens when a covariate is included in ANCOVA and it does interact with one of the categorical IV’s?

4) Under what circumstances does the inclusion of a control variable in a multiple regression analysis help the researcher to avoid (A) falsely concluding that a relationship exists between their IV (e.g., experience) and their DV (e.g., performance), or (B) falsely concluding that a relationship does not exist between their IV and their DV, and how/why?

Answer to 2):
A. Misguided, assuming that the study we’re comparing the meta-analysis to is an experimental design.
B. Pros of one large study: more control over exactly which variables are in your study, whereas in a meta-analysis your data are limited to the work that other authors have already done; if the area you’re looking at isn’t well-researched or if it’s a relatively new construct, there won’t be enough studies available to do a meta-analysis.
Cons of a large sample study: limited because the measures are more prone to error, whereas in meta-analysis the errors balance out; relies on significance testing as opposed to confidence intervals; no knowledge of the artifacts that you could otherwise tease out (sampling error, measurement error, range restriction).
Pros of meta-analysis: confidence intervals (a more precise band within which the effect size lies, without comparing it to an arbitrary cutoff such as .05); statistical control; can look at potential moderators across studies that can’t be looked at in one study; synthesizes research findings.
Cons of meta-analysis: garbage in, garbage out; file drawer effect if unpublished studies aren’t looked at; apples and oranges. It is not a panacea for research, and it requires judgment on how to code the variables of interest.

RM2 5) A researcher is interested in examining the effectiveness of a training program aiming at improving diversity awareness in the workplace. She designed a study using a sample of college undergraduate students. The researcher planned to randomly assign the students into two groups. One group will be put through the training program; the other serves as the control group. She developed a measure including three items to measure the construct diversity awareness. The internal consistency of the measure (coefficient alpha) is estimated to be .70. (A) What type of study is described in question 5? (B) Discuss three of the most worrisome threats to validity in this situation (note: explicitly discuss which types of validities are affected by the threats). (C) What can be done to alleviate concern about each of these threats?

A. Experimental study: posttest-only control group design.
B. Internal validity: Selection threat: the two groups may differ on something related to the DV before the training starts. In addition, because the sample consists of college students, the extent to which you can generalize to employees in the workplace is limited (strictly speaking, this part is an external validity concern).
Statistical validity: Reliability of measures: low reliability attenuates validity; measures of low reliability cannot be depended on to register true change. Control for this by using longer tests with carefully selected items that are highly inter-correlated.
Construct validity: mono-operation of the construct: only one measure of the construct is used.
C. Internal: test the stayers of the groups on some variable collected at the beginning of the study. If the two groups do not differ on a variable that IS related to the DV, then there is no evidence of a selection threat.
Statistical: to increase reliability, add items.
Construct: add another measure to fix mono-operation of the construct.

6) (A) How would you test the following two hypotheses? (B) What effects would demonstrate that both were supported? (C) What effects would demonstrate support for Hypothesis One but not Hypothesis Two? (D) Graphically represent the pattern of results you would expect to see if both were concurrently supported. Finally, (E) graph the pattern of results you would expect to see if only Hypothesis One but not Two was supported.

Hypothesis One: After accounting for the positive influence of applicant education, whites will receive significantly higher ratings (1-5 Likert-type scale) than blacks and Hispanics but this difference will be greater when the interviews are conducted face-to-face than when the interviews are conducted by phone. (Kim’s tip: because it says AFTER, it implies ANCOVA since it puts the other IVs AFTER the covariate, whereas regression puts them in simultaneously.)

Hypothesis Two: Blacks and Hispanics will receive higher ratings (1-5 Likert-type scale) if they are interviewed by phone than they will if they are interviewed face-to-face.

A. H1: use ANCOVA. The control variable (covariate) is applicant education, the IV is race, the DV is ratings of interview performance, and the moderator is interview method. ANCOVA is practically easier because you don’t have to create the interaction variables yourself; in a regression (e.g., in SPSS) you would have to create them by multiplying the dummy-coded race variables by interview method. You can say regression as long as you say that you MUST dummy code. You have to look at the interaction effect AS WELL as the simple effects to see if Whites are rated higher than Blacks/Hispanics at a specific level of interview type. In this hypothesis, we’re testing simple effects because we are looking at the specific difference between Whites as compared to Blacks and Hispanics, rather than just looking to see if there is a more general race effect.
H2: simple effects. Test one simple effect of mode for Blacks and another simple effect of mode for Hispanics. We’re not testing whether the two groups differ from each other; the hypothesis just says that the effect is significant for both of them. If it had said “minorities,” then there would be one simple effect of interview type on minorities.
B. 1. Significant interaction between mode and race. 2. Significant simple effect for race in the right direction for both modes. 3. The simple effect for race (Whites are higher than Blacks/Hispanics) in f2f is bigger than in phone (i.e., comparison of simple effects). 4. Significant simple effect of interview mode in the correct direction, looking only at the minorities.
C. Everything would be the same except that you wouldn’t have a significant simple effect of interview mode in the correct direction when looking only at the minorities (answered with Kim’s help). A main effect would be: everyone does better on f2f than on phone. This doesn’t preclude an interaction. It could be that everyone does better AND whites are the most positively affected.

[Figure: two line graphs of Ratings (y-axis) by interview mode (x-axis: Phone, F2F); Whites shown as dashed lines, Blacks/Hispanics as solid lines.]

Fall 2009: Research Methods I: In the past several decades, quantitative research reviews based on meta-analysis have mostly replaced qualitative reviews in psychological research. List three major limitations of qualitative review of research findings and explain how meta-analysis can address these limitations.

Limitations of qualitative reviews: 1. they don’t account for file drawer studies (most meta-analyses take into account published vs. unpublished articles); 2. they are subjective; 3. they cannot process a large number of studies; 4. they can’t make sense of conflicting findings because no statistical analyses are done on the data; 5. they can’t look at moderators.

Meta-analysis addresses these by: looking at published as well as unpublished studies; processing a large number of studies; using statistical analyses to control for study biases as well as to look at moderator effects. Statistics are not subjective: the outcome is data driven and calculations are made with the help of statistical analyses, not just judgment. Meta-analyses have more power (due to the large number of studies they can pull from) and accommodate the varying effect sizes that can be found in primary studies.

RM1, Fall 2009: What are the major differences between experimental design, quasi-experimental design, and correlational design? Some people argued that only experimental design can provide evidence of causality. What is your reaction to this argument?

Experimental study: random assignment, control of extraneous variables and manipulation of variable(s); can determine causation.
Quasi-experimental study: everything except random assignment.
Correlational design: no controls, no causation.
You can still control for different variables in quasi-experimental designs. A longitudinal correlational design can provide some evidence of causality as well.

Research Methods II, Fall 2009: Suppose you hypothesize that test X is biased toward the minority subgroup, such that (a) test X is less related to the criterion of interest (job performance) for the minority subgroup than it is in the majority subgroup, and (b) given a same score on test X, a minority member is likely to have higher true job performance than it is predicted by the test (that is the test underestimates job performance of members of minority subgroup), whereas the test correctly predicts performance of members of the majority subgroup. Design a study that allows you to test this hypothesis. Describe steps, analysis, considerations. Suppose your hypothesis is confirmed. Illustrate the result graphically.

A. Test the interaction term for race*test to see if race moderates the relationship between the test and job performance. In this moderated regression, a significant interaction indicates different slopes for the two groups (part a of the hypothesis), and a significant effect of race indicates different intercepts (part b).
B. Alternatively, run two separate regressions and see if there is a difference in adjusted R2 between the two equations. Collect data on the criterion and the predictor for both groups and regress the criterion onto the predictor separately for each group. Then compare the variance explained for the two groups and see if they differ. You can also look at differences in the beta weights and their significance. Ex.: regress GPA on SAT scores for the majority subgroup, then regress GPA on SAT for the minority subgroup, and look at the adjusted R square. You would expect the R square to be larger for the majority than for the minority subgroup. If the hypothesis is supported, accurate prediction would call for different regression lines for the two groups, but using separate lines by race in selection is illegal.

Research Methods II, Fall 2009: Recent research has shown that personality factors are related to employee turnover. Suppose you hypothesize that job attitudes (i.e., job satisfaction and organizational commitment) mediate the relationship between personality and turnover. Design a study to test this hypothesis. Specifically, describe your research design, choice of analysis method, (describe all steps involved), justification of your choice (why the analysis method is selected instead of alternative ones), and expected results which would confirm your hypothesis.

Logistic regression, because you have a dichotomous DV. Longitudinal design: collect personality data upon hiring. Collect attitude measures after about 6 months on the job. About a year after hiring, collect turnover data. Baron & Kenny (1986): test IV to DV, then IV to Mediator, then put them all together in one model. If the relationship between the IV and DV decreases but remains significant, there is partial mediation. If the relationship disappears, the mediator completely mediates the relationship between the IV and DV.

Spring 2010, RM1: Assume you were asked to evaluate the effectiveness of two training methods designed to improve job performance. You have a sample of employees in an organization which you can use for your study. For this group of employees, you also have access to their scores on an employment test (the Wonderlic Personnel Test). Describe how you would conduct a study to answer the question about the effectiveness. Specifically: A) Describe your study design B) Explain how would you obtain the variables in your analysis and why you select these variables C) Describe the analysis you plan to conduct; explain why you select this kind of analysis.

IV: two training methods. DV: job performance. Covariate: Wonderlic score (continuous variable).
This is a post-test experimental design (because all we have are post-test scores) where you would obtain effectiveness scores after implementing the training. How would we obtain the variables? Obtain job performance scores (for effectiveness); you could do this by collecting supervisor ratings, or by collecting performance data from more sources to reduce error. Administer the employment test before the training.
Analysis: ANCOVA, with a 2-level categorical IV, a covariate and a DV (as opposed to multiple regression, which you would use if the IV had more than 2 levels).

Spring 2010, RM II: An organization is considering adopting a battery of four tests for personnel selection purpose. You are hired to evaluate the validity of the test battery. Based on a sample of current employees of the organization, you used multiple linear regression analysis to examine how the four tests predict job performance. Tables 1 and 2 below show results of the analysis. A) Explain to the organization about the validity of these tests (specifically, explain all the values shown in Table 1). B) Advise the organization how to use the tests (i.e., how to combine scores on the test and use the resulting composite for selection purpose).

RM I: A researcher is interested in examining the effectiveness of two intervention methods (A and B) on a learning outcome. She conducted two studies based on two independent samples of college students. In Study 1, subjects receiving Intervention A were compared to those in a control group. In Study 2, subjects receiving Intervention B were compared to those in another control group. Using t-tests, the researcher found a significant result in Study 1 (p<.05) but not in Study 2 (p=.08). Accordingly, she concluded that Intervention A was more effective than Intervention B. A) Do you agree with the researcher’s conclusion? Explain. B) If you were the researcher, what would you do? (i.e., what analysis would you conduct?) C) Assume that later it was revealed that the means of learning outcome are actually the same for both Interventions A and B across the studies. Can you think of any explanation for the researcher’s earlier findings (that is, significant result for A but not for B).

A. No, we don’t agree with the researcher. Possible problems:
• Differential attrition: I would gather a measure beforehand for both groups, and look at the stayers in both groups, because then you can determine if there was differential attrition.
• Power threat: there could be a larger sample size in one study than in the other, or the variance in one group could be larger.
• Family-wise error: instead of using 2 t-tests, she could have run an ANOVA. This would tell you if there is a difference between groups. You would then do a post-hoc analysis to check where the difference lies.
B. Can use ANCOVA or multiple regression with a control variable. I would look to see if there are any control variables that could be used, because that would increase your power (you can add this in because the question doesn’t specifically say that there aren’t covariates).
C. It would be a better test if both interventions were compared against the same control group; there could be something about the individual control groups that leads to finding an effect for one intervention and not the other. Power: one study could have had more within-group variance or a different sample size; more error variance increases the Type 2 error rate.
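
The power explanation in part C can be sketched with a small calculation. The snippet assumes, purely for illustration, that both interventions have the same standardized effect (d = 0.5) and differ only in sample size (40 vs. 25 per group); it also uses a normal approximation to the t distribution to stay within the standard library, so the p-values are approximate.

```python
from math import sqrt
from statistics import NormalDist

def approx_two_sided_p(d: float, n_per_group: int) -> float:
    """Approximate p for a two-group comparison with equal group sizes.

    d is the standardized mean difference; the test statistic is
    d * sqrt(n / 2). A normal approximation stands in for the t
    distribution, which is an assumption of this sketch.
    """
    t = d * sqrt(n_per_group / 2.0)
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))

# Hypothetical values: identical effect size, different sample sizes.
p_study1 = approx_two_sided_p(0.5, 40)  # Intervention A vs. its control
p_study2 = approx_two_sided_p(0.5, 25)  # Intervention B vs. its control

print(f"Study 1: p = {p_study1:.3f}")
print(f"Study 2: p = {p_study2:.3f}")
```

Under these assumptions Study 1 falls below .05 and Study 2 does not, even though the two interventions are equally effective, so "significant vs. not significant" says nothing about whether A beats B.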

An organization is considering adopting a battery of four tests for personnel selection purpose. You are hired to evaluate the validity of the test battery. Based on a sample of current employees of the organization, you used multiple linear regression analysis to examine how the four tests predict job performance. Tables 1 and 2 below show results of the analysis. A) Explain to the organization about the validity of these tests (specifically, explain all the values shown in Table 1). B) Advise the organization how to use the tests (i.e., how to combine scores on the test and use the resulting composite for selection purpose).
• (Multiple) R is equivalent to the correlation coefficient r, so this is the strength of the relationship between the set of tests and job performance.
• R2: the proportion of variability in the outcome variable that the predictors (the tests) explain.
• Adjusted R2: this value takes into account the sample size as well as the number of predictors that are used.
• Population Validity = estimated correlation if you were to take it to the entire population.
• Population Cross Validity = estimated correlation if you were to apply the sample-based equation to a new sample from the population.
• Beta: multiply each test score by its regression weight, add the products up, add the constant, and that is the composite score.
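
The composite described in the last bullet can be sketched as below. The weights, constant and applicant scores are hypothetical (Tables 1 and 2 are not reproduced in these notes), and the sketch assumes unstandardized weights applied to raw test scores.

```python
def composite_score(scores, weights, constant):
    """Predicted criterion score: constant + sum of weight * test score."""
    if len(scores) != len(weights):
        raise ValueError("need exactly one weight per test")
    return constant + sum(w * x for w, x in zip(weights, scores))

# Hypothetical regression output for the four tests (not from Table 2).
weights = [0.5, 0.2, 0.1, 0.3]
constant = 1.0

# Hypothetical raw scores for two applicants on the four tests.
applicants = {"applicant_1": [10, 20, 30, 40], "applicant_2": [12, 18, 25, 30]}

# Rank applicants top-down on the composite.
ranked = sorted(applicants,
                key=lambda a: composite_score(applicants[a], weights, constant),
                reverse=True)
print(ranked)
```

Applicants would then be selected top-down on the composite, or compared against a cutoff set on the composite.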

Spring 2010, RM II: Compare and contrast moderating and mediating effects. Specifically: A) define, B) provide examples, and C) describe procedures to test them.

Moderation: A. Explains WHEN the relationship between two variables becomes stronger or weaker. B. Example: in the job characteristics model, growth need strength moderates the relationship between the job characteristics and motivation. C. Use multiple regression; include the IVs as well as another term where you multiply the IV by the moderator. If this interaction term is significant, then there is a significant moderating effect. Which one of the IVs is treated as the moderator depends on theory.

Mediation: A. The relationship between predictor and criterion exists because of a third variable that the predictor influences and that in turn causes the criterion. B. Example: job satisfaction mediates the relationship between positive affect and turnover intentions. That is, people that are higher in positive affectivity tend to have higher job satisfaction, and in turn, people that have higher job satisfaction have fewer intentions of turning over. C. Use the Baron & Kenny (1986) method: determine if the IV predicts the DV; determine if the IV predicts the Mediator; (alternate step) determine if the Mediator predicts the DV; then put all the variables in the regression model. If the IV is no longer a significant predictor but the Mediator is, there is full mediation. If the IV is still a significant predictor but the beta weight decreases, then you have partial mediation. You can follow up with the Sobel test to determine whether the indirect effect through the mediator is significant.
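
As a rough illustration of the Sobel follow-up, the function below computes the usual Sobel z from the two path estimates and their standard errors. The path names (a: IV to Mediator, b: Mediator to DV controlling for the IV) and the numeric values are assumptions made for this sketch, not results from any study in these notes.

```python
from math import sqrt

def sobel_z(a: float, se_a: float, b: float, se_b: float) -> float:
    """Sobel z for the indirect effect a * b.

    a, se_a: coefficient and standard error for IV -> Mediator.
    b, se_b: coefficient and standard error for Mediator -> DV,
             estimated with the IV also in the model.
    """
    return (a * b) / sqrt(b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2)

# Hypothetical regression estimates.
z = sobel_z(a=0.5, se_a=0.1, b=0.4, se_b=0.1)
print(f"Sobel z = {z:.2f}")  # compare against +/-1.96 for a two-sided .05 test
```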

ARM 4) Under what circumstances does the inclusion of a control variable in a multiple regression analysis help the researcher to avoid (A) falsely concluding that a relationship exists between their IV (e.g., experience) and their DV (e.g., performance), or (B) falsely concluding that a relationship does not exist between their IV and their DV, and how/why?

Type 1: if the control variable is related to the IV as well as the DV, it keeps researchers from erroneously concluding that it was the IV that caused a change in the DV when it was actually the control variable. Type 1 errors always have to do with systematic bias (e.g., confounds). For example, learning goal oriented individuals could seek opportunities to master tasks as well as having higher job performance, so if this variable is not controlled for, one would conclude that experience predicts job performance when in fact it is LGO.

Type 2: if the control variable is related to the DV but not the IV, then the control variable serves to explain some of the variance in the DV, thereby making it easier to find an effect between the IV and DV. For example, cognitive ability has been known to be the best predictor of job performance, so if one were to regress performance onto experience while including cognitive ability, you would explain a lot of variance in job performance, thereby having more power to find an effect between experience and performance.
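
Both cases can be sketched with the first-order partial correlation formula, used here as a stand-in for what the control variable does inside the regression. All correlations below are hypothetical values picked to make the two patterns visible.

```python
from math import sqrt

def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """Correlation between x and y with z partialled out of both."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Type 1 case: z (e.g., LGO) is related to BOTH experience (x) and
# performance (y), so the observed r_xy = .30 is entirely spurious.
confounded = partial_corr(r_xy=0.30, r_xz=0.60, r_yz=0.50)

# Type 2 case: z (e.g., cognitive ability) is related to performance only;
# removing its variance makes the experience effect stand out more.
sharpened = partial_corr(r_xy=0.30, r_xz=0.00, r_yz=0.60)

print(f"confounded case: {confounded:.3f}")
print(f"noise-reduction case: {sharpened:.3f}")
```

In the first case the relationship vanishes once the confound is controlled; in the second it grows from .30 to .375 because less unexplained DV variance is left over.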

ARM 3) One of the assumptions of ANCOVA is that the covariate does not interact with any of the independent (categorical) variables. Why is this important? In other words, what happens when a covariate is included in ANCOVA and it does interact with one of the categorical IV’s?
• One of the assumptions of ANCOVA is that the covariate does not interact with the IV.
• If the covariate does interact with the IV, it will misadjust the means, because the covariate is put in first to adjust the group means on the DV (in multiple regression, all of the variables are entered simultaneously).
• Using a covariate tries to find one regression line (a common slope) that fits all levels of the IV, and if there is an interaction, the slopes differ across groups, which violates ANCOVA’s assumption of homogeneity of regression slopes.

Applied Research Methods, Fall 2009: Suppose you have conducted an experiment in an org setting, and you are concerned about ruling out differential attrition from treatment and control groups. A. What data would you collect and how would you analyze it? B. What pattern of results would indicate that it is present and what would indicate that it’s not present?

A. You would collect individual difference variables at the beginning, preferably more than one.
B. Compare the two groups on the pre-test variables, using only the STAYERS. If there is a difference between the two groups on the individual difference variables, then there is differential attrition. If there isn’t a difference on the individual difference variables between the stayers in the two groups, then there isn’t differential attrition.
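
A minimal sketch of the stayer comparison, assuming a single pre-test variable and using Welch's t statistic as one reasonable choice of test. The scores are invented for illustration, and the statistic would still need to be compared against a t distribution, which is left out here.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(group_a, group_b) -> float:
    """Welch's t statistic for the difference between two group means."""
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se

# Hypothetical pre-test scores, keeping only people who finished the study.
treatment_stayers = [52, 55, 49, 61, 58]
control_stayers = [51, 56, 50, 60, 57]

t_stat = welch_t(treatment_stayers, control_stayers)
print(f"t = {t_stat:.2f}")  # a t near 0 gives no sign of differential attrition
```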

Applied Research Methods, Fall 2009: A. Under what conditions (how?) can a MANOVA help prevent a researcher from falsely concluding that a manipulation had no significant effect when in fact it did? B. Under what conditions (how?) can MANOVA help prevent a researcher from finding a significant manipulation effect when in fact there was none?

The DVs must be moderately correlated and conceptually related. If they are not related, you would use separate ANOVAs. If they are too highly related, you collapse them.
Type 2: Type 2 is always about error. Since you’re collecting multiple DVs of a similar construct, together they are likely to have less error in them than a single measure of the construct. One measure would probably not capture the entire construct (criterion deficiency) and would be less reliable, which attenuates the relationship between the variables.
Type 1: Type 1 is always about systematic bias. Also, running multiple ANOVAs on the same data would increase family-wise error, which would make it easier to find an effect that isn’t really there; a single MANOVA avoids this.
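
The family-wise error point can be made concrete with the standard formula 1 - (1 - alpha)^k. The sketch assumes the k separate tests are independent, which correlated DVs would not strictly satisfy, so treat the numbers as a rough guide.

```python
def familywise_error(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests

for k in (1, 3, 5):
    print(f"{k} separate ANOVAs at alpha=.05 -> {familywise_error(0.05, k):.3f}")
```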

ARM, Spring 2010: Describe, compare, and contrast the techniques of stratified random sampling and quota sampling. Be sure to specify how/why these techniques differ both in terms of the processes involved and the statement that can be made about the data collected.

Quota sampling: slice the population based on a particular variable (make sure that it’s related to the DV), but when you go about selecting participants, you use convenience sampling in an effort to fill pre-determined quotas of what you want to end up with in your sample. For example, if we want 40 males and 50 females, we’ll sample the population until we get those numbers and then stop. You can’t be sure the sample is representative of the population, so you can only generalize to groups that are similar to the ones that you sampled.

Stratified random sampling: slice the population based on a particular variable (make sure that it’s related to the DV) and then randomly sample from those slices. For example, if we want males and females, we split the population into two categories (gender) and then randomly select from those two categories. You can do either proportionate or disproportionate sampling, depending on what you want your sample to end up like. You can create a confidence interval and generalize to the entire population you sampled from.
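
The procedural difference can be sketched in a few lines. Everything here is illustrative: the population is a made-up list of (id, gender) pairs, the quotas are scaled down from the 40/50 example to 4 and 5, and "convenience" is modeled simply as taking people in the order they show up.

```python
import random

def stratified_sample(population, key, n_per_stratum, rng):
    """Randomly draw a fixed number of people from every stratum."""
    strata = {}
    for person in population:
        strata.setdefault(key(person), []).append(person)
    sample = []
    for label, members in strata.items():
        sample.extend(rng.sample(members, n_per_stratum[label]))
    return sample

def quota_sample(arrivals, key, quotas):
    """Take whoever shows up first until every quota is filled."""
    counts = {label: 0 for label in quotas}
    sample = []
    for person in arrivals:
        label = key(person)
        if counts.get(label, 0) < quotas.get(label, 0):
            sample.append(person)
            counts[label] += 1
        if counts == quotas:
            break
    return sample

def gender(person):
    return person[1]

# Hypothetical population of 100 people: (id, gender).
population = [(i, "M" if i % 2 == 0 else "F") for i in range(100)]
targets = {"M": 4, "F": 5}

random_draw = stratified_sample(population, gender, targets, random.Random(0))
first_come = quota_sample(population, gender, targets)
print(sorted(random_draw))
print(first_come)
```

Both samples end up with the same gender split, but only the stratified one gives every member of each stratum a known chance of selection, which is what licenses the confidence interval and the generalization to the population.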

ARM, SP2010: A) Describe Hierarchical Linear Modeling (HLM) at a conceptual level. Be sure to include in your answer when it is recommended for use, what types of data can be used as predictors, what types of data can be used as DV’s. B) Under what circumstances (what pattern of true relationships) would the use of HLM, and more specifically the ability to account for nested variables, enable a researcher to avoid a false positive decision with respect to his/her hypothesis?, or C) to avoid a false negative decision with respect to his/her hypothesis?

A. Used when there are variables nested within other variables (e.g., individuals within a team, people within a department). Accounting for the nesting is similar to adding a control variable. The DV has to be individual-level data and must be continuous. The IV can be any kind of variable: categorical, dichotomous or continuous. There may be no conceptual reason for a difference between groups, but something about belonging to the group can still be related to the DV.
B. Type 1: partials out variance accounted for by group membership (or whatever the nesting variable is); theoretically it’s the same as a within-subjects design because anything that is specific to the group gets accounted for.
C. Type 2: increases the power because it reduces the amount of variance in the DV that must be explained.

Fall 2008, RM3: 10) Produced below is a portion of an SPSS Windows output for an exploratory factor analysis. Variables consisted of the answers from 1,783 respondents to eight “yes”/”no” questions (“no” = 0; “yes” = 1), asking respondents to indicate the reasons why they, as television viewers, stay with a show into its second season (respondents were asked to mark all reasons that applied). Summarize and interpret the results from this output that should be included in an APA-style results section. You do not need to write the APA-style results section. Instead tell us what you would need to include in such a write-up.
• Four steps of factor analysis.
• Extraction: deals with choosing either PAF or PCA. PCA is used when there is very little error variance because it models all of the variance (error, specific and common). PAF is used when there is a lot of error variance, because it models only the common variance and leaves the error variance out. In this example, the participants were asked yes/no questions about the TV shows they watched, and although attitudinal measures usually have a lot of error variance, the yes/no format of the questions leads us to assume that there is not a lot of error variance, which is why we would choose PCA.
• Truncation: scree plot, eigenvalues and a pre-determined number of factors are the methods that we can use for truncation. We looked at the eigenvalues and there were only two factors that had eigenvalues above 1 (4.65 and 2.2). We can also look at the scree plot, where we choose the number of components based on where the plot straightens out. Using the scree plot, we would also choose 2 factors. The problem with the scree plot is that it is subjective, and the problem with eigenvalues is that they do not provide the whole picture. Go into detail about what percentage of variance each factor accounted for, etc.
• Rotation: varimax, equimax and quartimax (orthogonal), direct oblimin and promax (oblique). Orthogonal rotations allow for greater ease of interpretation, so those are the methods that we chose from. Varimax simplifies the columns of the factor matrix: it maximizes the variance of the squared loadings within each factor, which minimizes the number of variables that have high loadings on any given factor and yields a solution in which each variable can be identified with a single factor. Quartimax simplifies the rows: it maximizes the variance of the squared loadings within each variable, so that each variable loads highly on as few factors as possible. Equimax tries to do both but it does so badly. Varimax is the most common. We chose to do quartimax.
• Interpretation: after the quartimax rotation, we had two factors. After looking at the items within each factor, we named one Outside Influences including items such as … and the second factor we named Original Format including items such as … because it had to do with items based on the format of the TV show.
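
The truncation step and the "percentage of variance" detail can be sketched from the two eigenvalues quoted above. The sketch assumes a PCA on the correlation matrix of the eight items, in which case each variable contributes one unit of variance and a component's share is its eigenvalue divided by 8. The six remaining eigenvalues are not given in these notes, so they are left out; they must sum to 8 - 6.85 = 1.15.

```python
N_VARIABLES = 8                     # eight yes/no items
reported_eigenvalues = [4.65, 2.2]  # the two eigenvalues quoted in the answer

def kaiser_retained(eigenvalues, cutoff=1.0):
    """Kaiser criterion: keep components whose eigenvalue exceeds the cutoff."""
    return [ev for ev in eigenvalues if ev > cutoff]

def percent_variance(eigenvalue, n_variables):
    """Share of total variance for one component (correlation-matrix PCA)."""
    return 100.0 * eigenvalue / n_variables

retained = kaiser_retained(reported_eigenvalues)
shares = [percent_variance(ev, N_VARIABLES) for ev in retained]
for ev, share in zip(retained, shares):
    print(f"eigenvalue {ev}: {share:.1f}% of variance")
print(f"cumulative: {sum(shares):.1f}%")
```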

Fall 2009 RM3: In exploratory factor analysis (EFA), the researcher has to make choices regarding a number of methodological approaches. One of these is which method of extraction to use. Compare and contrast (at least) two common methods for the extraction in exploratory factor analysis. Discuss each method’s theoretical underpinnings, and describe the decision process and criteria a researcher should use when deciding on the extraction method to use.
• Two common extraction methods are PCA (principal components analysis) and PAF (principal axis factoring). They differ in how they model reliable vs. unreliable (error) variance. Reliable variance includes common and unique variance. PCA models all of the variance (error, unique and common). PAF models only common variance. Because PCA models ALL of the variance, it just involves reformulation of the data, but it models too much variance (because it includes error variance). Because PAF models only the common variance, it involves reformulation of the data AND data loss, so it models too little variance.

Spring 2010 RM3: One of the four steps of Exploratory Factor Analysis (EFA) is “Step 3 – Rotation.” This was not always the case; in fact, ground-breaking work by Stanford-Binet on intelligence testing used principal components analysis for extracting the first two principal components from the intelligence tests, and then called the first one general intelligence “g,” and the second one specific intelligence “s.” However, Stanford-Binet never rotated any of the results. A) Explain what rotation allows the researcher to do. B) Discuss why it may be very important to rotate the results obtained from an initial extraction and truncation, such as Principal Components Analysis (PCA), followed by Kaiser-criterion truncation. C) Briefly describe the two major categories of rotation techniques.
• A. The purpose of rotation is to approximate simple structure. It allows a cleaner picture of the data. It also involves just a reformulation of the data; there is no loss of information.
• B. When you don’t rotate, the sum of squared loadings is maximized, but the solution is harder to interpret because items load on more than one factor. PCA is a method of extraction that models all of the variance, both reliable and error. The Kaiser criterion is a truncation rule that retains components with eigenvalues greater than 1, so it is based on the amount of variance each component explains as well as the number of variables.
• C. Varimax, equimax and quartimax (orthogonal), direct oblimin and promax (oblique). Orthogonal rotations don’t allow the factors to be correlated, while oblique rotations do allow correlations between the factors. Orthogonal rotations are easier to interpret but less realistic. Oblique rotations are harder to interpret but more realistic because they allow factors to be correlated. Oblique rotations should be used when the factor correlations are between .3 and .7. If they are lower, force the solution into orthogonality, and if they are higher, then the measure is probably uni-dimensional.

Fall 2008 RM3: Structural Equation Modeling 7) (A) Discuss the influence of sample size N on structural equation modeling (SEM). (B) Discuss minimum sample size, statistical power, and n-to-k ratio all within the context of SEM. (C) What, if any, relationship is there between sample size and degrees of freedom in SEM?
• A. SEM is a large-sample data analysis method. Minimum N is greater than N and acceptable is greater than 200. Chi-square tests are sensitive to sample size and to the covariances. Rule of thumb: N should be 50 more than 8 times the number of variables in the model. Covariances are less stable when small samples are used, and SEM is based on covariances. Velicer & Fava (1998) found that the size of the factor loadings, the number of variables, and the size of the sample were important elements in obtaining a good factor model; this was found in factor analysis but can be generalized to SEM (Tabachnick & Fidell, 2007).
• B. What drives the sample size is the number of variables (the N to k ratio). Authors vary in their recommendations regarding the N to k ratio, from 5 to 1 up to 35 to 1 (Tabachnick & Fidell, 2007). If your sample size is too small, you are less able to find an effect (Type II error) because the covariances are more unstable and therefore contain more error. The stronger the correlations, the more power SEM has to detect an incorrect model. When correlations are low, the researcher may lack the power to reject the model.
• C. There is NO relationship between sample size and degrees of freedom. The only thing that drives degrees of freedom is the degree of over-identification (more knowns than unknowns → greater degrees of freedom).
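Part C can be made concrete with the usual counting rule: the knowns are the unique variances and covariances among the observed variables, p(p + 1)/2, and the unknowns are the freely estimated parameters. The sketch below is illustrative only; the function names and the example model (6 indicators, 13 free parameters) are assumptions.

```python
def sem_degrees_of_freedom(n_observed, n_free_params):
    """df = knowns - unknowns; note that sample size N does not appear."""
    knowns = n_observed * (n_observed + 1) // 2  # unique variances and covariances
    return knowns - n_free_params

def rule_of_thumb_n(n_variables):
    """Rule of thumb from the notes: 50 more than 8 times the number of variables."""
    return 50 + 8 * n_variables

# Hypothetical two-factor CFA: 6 indicators, 13 freely estimated parameters.
print(sem_degrees_of_freedom(6, 13))  # knowns = 21, unknowns = 13
print(rule_of_thumb_n(6))
```

Doubling the sample changes nothing in the first function, which is the point of part C; sample size matters for the stability of the covariances and for power, not for df.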

Fall 2009, RM3: In Structural Equation Modeling (SEM), situations occur when it is of benefit to fix path coefficients, rather than to estimate them freely. One of these situations is a model that is locally under-identified. In such a case, if you had to fix path coefficients to increase the model’s degrees of freedom (dfs), how would you go about it? Describe how you would determine (a) which paths to fix, (b) how to practically fix them, and (c) what two conceptual alternatives for determining how to fix the path coefficients would be.
• A. Fix the path from the manifest to the latent variable (do not overthink the question!)
• B. Set it to a particular value and enter it as such into the model.
• C. Use a specific value, or a reliability or validity estimate obtained from a meta-analysis.

Spring 2010, RM3: A) Describe and discuss the two major categories of structural equation modeling (SEM) techniques, i.e., Confirmatory Factor Analysis (CFA) and SEM of Latent-Variable Effects Models. B) How do the models used in CFA vs. SEM of Latent-Variable Effects Models differ? C) If you choose to use the two stage approach where you first use CFA and then SEM, will you always get the same path coefficients? Why or why not?
• Florian’s answer key
• The problem of the two-stage approach with respect to measurement invariance has to do with the degree to which the factor loadings from latent constructs to the manifest variables change from the measurement model to the hypothesized model (3 points).
• Student should summarize the key points from: “Given an acceptable measurement model, one should observe only slight changes in the factor loadings as hybrid models with alternative structural models are tested” (Kline, page 252) (2 points).
• Student should summarize the key points from: “If the factor loadings change markedly …, then the measurement model is not invariant” (Kline, page 252) (2 points).
• Potential alternatives (all not very good) are (a) to observe the changes in the factor loadings and make interpretations on the basis of change; (b) to fix the factor loadings from one step to another; or (c) to create a “fake” measurement model using a calculated correlation matrix among the latent constructs (3 points).
• Both of them are based on covariances.
• Notes from ARM III Class with Florian:
• Two Stage Process
  o First: CFA of the measurement model
    - Ultimate question: are the manifest variables and groupings reasonable/“well-fitting”?
    - Compare one-factor to two-factor models, etc.
    - Can these variables be appropriately mapped against related constructs?
    - Focus on the indicator variables: are they good indicators, can we distinguish reasonably among them, and is a comparison model not better fitting? (The correlations between the constructs themselves are not of much interest in this step.)
  o Second: SEM of the structural/effects/path model
    - Are the relationships between/among the model latent constructs reasonable/well-fitting as specified in the structural model?
    - Focus on the causal paths (less focus on the indicator paths)
  o Steps 1 and 2 are independent. You may get different path coefficients in the measurement model than you would for the structural model; theoretically, a variable that loaded positively on C1 in the measurement model may turn up as negatively loaded on C1 in the structural model. Yes, it is possible to get different values. The keyword here is measurement invariance (better measures should be better measures and worse measures should be worse measures, regardless of whether we are testing the structural model or the measurement model).
  o We do not have to wait for the results of step 1 to run step 2.

Fall 2008, Psychometrics: 8) Describe, compare and contrast Guttman and Thurstone attitude scaling. Illustrate your discussion with appropriate examples.
• Describe:
• Guttman: “ideal answer pattern” or “perfect scale”. Examinees are ordered with respect to their ability and items are ordered with respect to their difficulty. A person who passes an item of a given difficulty passes all easier items; an item passed by a person with a certain level of ability will also be passed by higher-ability people.
• Thurstone: 10-20 items (that have small SDs and varied medians); the score is the average scale value of the endorsed items. Superior to the Likert scale for people who hold extreme attitudinal positions. Basic assumption: people with a particular position on the attitude dimension will agree only with items that express opinions near their own and will disagree with items that differ in either direction (Barb’s psychometrics slides).
• Compare:
• Both attempt to use items to get a person’s standing on an attitude construct.
• Contrast:
• Guttman: assumes that you will endorse all items below a certain level and none above it. Monotonic (cumulative). Does not take into account both ends of an attitude spectrum.
• Thurstone: takes the mean of all of the items that you do endorse. Non-monotonic. Takes into account both extremes of an attitude.
• Examples:
• Guttman:
  1. I don’t mind listening to music when it comes on the radio.
  2. I like listening to music in my spare time.
  3. I listen to music in my spare time and when I’m studying.
  4. I love music and incorporate it into as many aspects of my life as I can.
  5. Music is what makes life worthwhile and it’s infused in every aspect of my life.
• Thurstone:
  1. Music is jarring to my ears.
  2. Music is a medium through which I can express myself artistically.
  3. I hate going to music concerts.
  4. Music is just noise.
  5. I don’t really see the point of music.
  6. Music is an integral part of my life.
  7. Listening to music relaxes me.
  8. Music sucks.
  9. I would be depressed if I couldn’t listen to music again.
  10. Listening to music makes me feel good.
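The two scoring rules can be contrasted in a few lines. This is a sketch under stated assumptions: the scale values below are hypothetical judge medians on an 11-point favorability scale (they are not from the notes), and the function names are illustrative.

```python
from statistics import mean

# Hypothetical judge-derived scale values (1 = very unfavorable, 11 = very favorable)
# for the ten Thurstone items listed above, keyed by item number.
scale_values = {1: 2.0, 2: 8.0, 3: 2.5, 4: 1.5, 5: 4.0,
                6: 10.0, 7: 7.5, 8: 1.0, 9: 10.5, 10: 9.0}

def thurstone_score(endorsed_items):
    """Thurstone: average the scale values of only the endorsed items."""
    return mean(scale_values[i] for i in endorsed_items)

def guttman_score(responses):
    """Guttman: items are cumulative, so the score is the count endorsed."""
    return sum(responses)

print(thurstone_score([6, 9, 10]))      # a strongly favorable respondent
print(thurstone_score([1, 4, 8]))       # a strongly unfavorable respondent
print(guttman_score([1, 1, 1, 0, 0]))   # endorses the three mildest items
```

The Thurstone score locates a respondent anywhere along the continuum, including the unfavorable end, while the Guttman score only says how far up the cumulative ladder the respondent climbed.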

Fall 2008, Psychometrics: A psychological test publishing company is disappointed with the personality measure that they market for use in HR selection. In particular, they cite poor validity coefficients with job performance criteria and response distortion in job applicant samples as big problems with the measure. In an effort to address these issues, one of their internal consultants has developed another measure of the five-factor model that can be completed for an applicant by their prior supervisor. The company wishes to conduct validation research that will accomplish the following three objectives: determine whether the new measure indeed taps the big five personality dimensions, determine whether the new measure predicts job performance better than the old measure, and determine whether the new measure results in less response distortion than the old measure. Describe the manner in which you would accomplish these three objectives. In other words, describe the validation research you would conduct.
• 1. Determine whether the new measure indeed taps the big five personality dimensions: administer a well-developed measure of personality, the new measure (the ratings by the prior supervisor), and the current measure that we’re concerned about. To assess discriminant (divergent) validity, we would also include a measure of a construct that should not be related to personality, such as cognitive ability (g). Do an MTMM matrix. The cells for the same personality trait measured with different methods (MTHM) should be larger than the cells for different traits, different methods (HTHM) and different traits, same method (HTMM). The latter two concern discriminant validity and the former concerns convergent validity.
• 2. Determine whether the new measure predicts job performance better than the old measure: run a hierarchical linear regression model where the old measure is entered first and the new measure is entered second. If the new measure does not account for variance in job performance beyond the old measure, then the new measure does not have utility beyond the old one and it doesn’t make sense to replace the old one with the new one.
• 3. Determine whether the new measure results in less response distortion than the old measure: look at central tendency and leniency errors in the new measure in order to determine whether response distortion is going on.
• See Florian’s answer
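The hierarchical regression in objective 2 comes down to the change in R² when the new measure is added after the old one. A minimal sketch follows, assuming each measure is reduced to a single composite score so that the two-predictor R² can be computed from correlations; the three correlations are hypothetical.

```python
def r2_two_predictors(r_y1, r_y2, r_12):
    """R-squared for a criterion regressed on two predictors, from correlations."""
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)

# Hypothetical correlations with job performance and between the two measures.
r_perf_old = 0.20   # old measure with performance
r_perf_new = 0.35   # new measure with performance
r_old_new = 0.40    # old measure with new measure

r2_step1 = r_perf_old ** 2                                    # step 1: old measure only
r2_step2 = r2_two_predictors(r_perf_old, r_perf_new, r_old_new)  # step 2: add new measure
delta_r2 = r2_step2 - r2_step1                                # incremental validity

print(round(r2_step1, 3), round(r2_step2, 3), round(delta_r2, 3))
```

A delta R² near zero would mean the new measure adds nothing beyond the old one; in a real study the change would also be tested for significance with the raw data.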

Fall 2009: Compare and contrast the main tenets of Classical Test Theory (CTT) and Item Response Theory (IRT). What do the two theories have in common, and what are the differences between them? For each difference, discuss whether you consider the difference a strength of CTT, of IRT, or of neither.
• CTT: the observed score equals the true score plus error. Measures are based on linear combinations of items. Discrimination and difficulty are confounded with each other.
• IRT: focuses on measuring a latent construct that is believed to underlie the responses to a given test.
• Compare:
• Both attempt to measure a person’s standing on a trait.
• Both measurement methods look at item parameters.
• Contrast:
• CTT: based on parallel forms of tests; item parameters are sample- and test-dependent; item properties are not directly linked to a trait but instead are normed and tested against a group of people (you learn how you score against a group, not your standing on a trait); person and item parameters are on different scales; mathematical expressions are easier to interpret (strength); doesn’t require as large a sample size (strength); the true score is defined for the measure (the score on the entire domain of items) and not for the construct (which IRT does do).
• IRT: not based on parallel forms of tests (strength, because you don’t have to norm it); item parameters are not sample dependent, so they are the same across populations and computed independently of the sample (strength, because you can compare individuals based on their standing on items); items represent an individual’s standing on a trait instead of how they compare to a norm (strength, because it allows you to know a person’s actual standing on a trait instead of just their score on an indefinite number of items).
• See Florian’s answer
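The sample-dependence contrast can be illustrated with a two-parameter logistic (2PL) item response function. The 2PL model is an assumption chosen for illustration (the notes do not name a specific IRT model), and the item parameters and ability values are hypothetical.

```python
import math

def irf_2pl(theta, a, b):
    """2PL model: probability of a correct/endorsed response at ability theta,
    with discrimination a and difficulty b as separate item parameters."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def expected_p_value(thetas, a, b):
    """CTT-style item difficulty (proportion correct) expected in a given sample."""
    return sum(irf_2pl(t, a, b) for t in thetas) / len(thetas)

a, b = 1.5, 0.0                       # one hypothetical item
low_ability_sample = [-2.0, -1.0, 0.0]
high_ability_sample = [0.0, 1.0, 2.0]

print(round(expected_p_value(low_ability_sample, a, b), 2))
print(round(expected_p_value(high_ability_sample, a, b), 2))
```

The same item, with the same a and b, looks hard in the low-ability sample and easy in the high-ability sample when summarized by the CTT proportion correct, which is the sense in which CTT item statistics are sample dependent.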

Fall 2009: Campbell and Fiske (1959) proposed the multi-trait, multi-method matrix as a way of assessing convergent and discriminant validity. How does the MTMM matrix accomplish this, and how do convergent and discriminant validity relate to the concept of construct validity? What are potential pitfalls of using MTMM matrices to assess the construct validity of a test? In your answer, make sure to first describe, in detail, what the components are of an MTMM matrix.
• The MTMM matrix looks at different traits measured by different and the same methods, as well as the same traits measured by different and the same methods.
• Construct validity involves specifying what a trait is related to as well as what it is not related to. Convergent and discriminant validity are necessary but not sufficient conditions for construct validity. Convergent validity tries to determine whether a measure of a trait is related to other measures of the same or similar traits. Discriminant validity tries to determine whether a particular trait is unrelated to traits that it should not be related to.
• MTMM: same trait, same method; this is the reliability of the measure. This should be high.
• MTHM: same trait, different methods; this is the validity diagonal (VD), a measure of convergent validity. This should be large.
• HTHM: different trait, different method; this is a measure of discriminant validity. These values should be lower than the VD.
• HTMM: different trait, same method; this is also a measure of discriminant validity. These values should be lower than the VD.
• Potential pitfalls:
• Judgments are qualitative because there are no quantifying rules for the criteria (i.e., how high or how low exactly should the validity values be?). Doesn’t use an outside criterion to determine whether the construct that we’re looking at is indeed what we’re saying it is.
• Doesn’t take random error into account.
• Incomplete because it assumes:
  o No trait-method correlations
  o All traits are equally influenced by method factors
  o Method factors are uncorrelated
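The four kinds of cells and the basic comparison can be sketched as follows. This is a simplified check, assuming two hypothetical traits (A, B), two hypothetical methods (self, peer), and made-up correlations; it compares the smallest validity-diagonal value against the largest heterotrait value, which is stricter and cruder than Campbell and Fiske's row-by-row comparisons.

```python
def classify(pair):
    """Label a cell by whether the two measures share a trait and/or a method."""
    (trait1, method1), (trait2, method2) = pair
    if trait1 == trait2 and method1 == method2:
        return "MTMM"  # reliability diagonal
    if trait1 == trait2:
        return "MTHM"  # validity diagonal (convergent)
    if method1 == method2:
        return "HTMM"  # discriminant, same method
    return "HTHM"      # discriminant, different method

# Hypothetical correlations between (trait, method) measures.
correlations = {
    (("A", "self"), ("A", "peer")): 0.60,
    (("B", "self"), ("B", "peer")): 0.55,
    (("A", "self"), ("B", "self")): 0.30,
    (("A", "peer"), ("B", "peer")): 0.25,
    (("A", "self"), ("B", "peer")): 0.10,
    (("B", "self"), ("A", "peer")): 0.12,
}

validity = [r for pair, r in correlations.items() if classify(pair) == "MTHM"]
heterotrait = [r for pair, r in correlations.items()
               if classify(pair) in ("HTMM", "HTHM")]

# Simplified criterion: every validity value exceeds every heterotrait value.
pattern_supported = min(validity) > max(heterotrait)
print(pattern_supported)
```

The check still leaves the pitfalls above untouched: it gives no rule for how much larger the validity values must be and ignores random error.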

Fall 2010: When developing a new test, it is insufficient to simply identify a construct and write items that form homogeneous scales. Explain, in detail, according to Cronbach and Meehl (1955), what should be done to justify the development of a new test.
• Look at the original article
• The test should add something over the existing measures (if any exist); there should be an improvement somewhere, e.g.:
  o Measures the construct better
  o Is more valid
  o Expands the nomological network

Fall 2010: Guttman scaling is sometimes called a “perfect scale.” List the assumptions underlying Guttman scaling. How can you tell if you have produced a “perfect” Guttman scale? Assume that you are interested in attitudes toward statistics. Write 4 items that are likely to form a perfect Guttman scale on attitudes toward statistics.
• “Ideal answer pattern” or “perfect scale”. Examinees are ordered with respect to their ability and items are ordered with respect to their difficulty. A person who passes an item of a given difficulty passes all easier items; an item passed by a person with a certain level of ability will also be passed by higher-ability people.
• Assumes that you will endorse all items below a certain level and none above it. Monotonic (cumulative). Does not take into account both ends of an attitude spectrum.
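One way to tell whether a scale is "perfect" is to count how many responses deviate from the ideal cumulative pattern and compute a coefficient of reproducibility, 1 minus errors divided by total responses. The sketch below uses one common error-counting convention (compare each response pattern with the ideal pattern for the same total score); the response data are hypothetical, and items are assumed to be ordered from easiest to hardest to endorse.

```python
def ideal_pattern(total, n_items):
    """Ideal Guttman pattern for a given total: endorse the easiest items only."""
    return [1] * total + [0] * (n_items - total)

def reproducibility(patterns):
    """Coefficient of reproducibility: 1 - errors / (respondents * items)."""
    n_items = len(patterns[0])
    errors = sum(
        sum(obs != exp for obs, exp in zip(p, ideal_pattern(sum(p), n_items)))
        for p in patterns
    )
    return 1 - errors / (len(patterns) * n_items)

# Hypothetical responses to 4 items ordered from easiest to hardest to endorse.
perfect = [[1, 1, 0, 0], [1, 1, 1, 0], [1, 0, 0, 0]]
imperfect = [[1, 1, 0, 0], [1, 0, 1, 0]]

print(reproducibility(perfect))
print(reproducibility(imperfect))
```

A perfect Guttman scale yields a coefficient of 1.0; by convention, values of about .90 or higher are usually treated as acceptable.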
