
Introduction to Statistical Models and Factoring

Topics: dependent and independent models; “traditional” and possible applications of independent models; the 3-way sampling problem; basic steps in factor analyses.


Presentation Transcript


  1. Introduction to Statistical Models and Factoring
     • dependent and independent models
     • “traditional” and possible applications of independent models
     • 3-way sampling problem
     • basic steps in factor analyses

  2. Dependent multivariate models
     Dependent models are used when we divide our variables into “criterion” and “predictor” variables
     • the value of the criterion(ia) is “dependent” on the value of the predictor(s) -- statistically / causally
     • simple reg: $y' = bx + a$
     • multiple reg: $y' = b_1 x_1 + b_2 x_2 + b_3 x_3 + a$
     • canonical reg: $a + b_1 y_1 + b_2 y_2 = b_1 x_1 + b_2 x_2 + b_3 x_3 + a$
     • x & y vars are quantitative, binary, coded, or interaction terms
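The simple regression form above, y' = bx + a, can be illustrated with a small least-squares sketch. The data and the function name `fit_simple_regression` are made up for illustration; nothing here comes from the original slides beyond the form of the equation.

```python
# Ordinary least-squares fit of y' = b*x + a (simple regression).
# The data below are invented purely for illustration.

def fit_simple_regression(x, y):
    """Return (b, a): the slope and intercept that minimize squared error."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = sum of cross-products / sum of squares of the predictor.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return b, a

predictor = [1.0, 2.0, 3.0, 4.0]
criterion = [3.0, 5.0, 7.0, 9.0]   # constructed as exactly 2*x + 1

b, a = fit_simple_regression(predictor, criterion)
y_hat = [b * xi + a for xi in predictor]   # estimated criterion values y'
print(b, a)
```

Because the criterion values were constructed from a known line, the fit recovers that line; with real data the estimates would only approximate the criterion.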

  3. Dependent multivariate models, cont.
     Dependent models …
     • these are the General Linear Models from “multivariate class”
     • research questions/hypotheses are about which predictors (with what weightings) are useful for estimating what criteria

  4. Independent multivariate models
     Independent models are used when there is no “predictor vs. criterion” distinction among our variables.
     • The independent models we will examine are …
        • Factor Analysis
        • Cluster Analysis
        • Multidimensional Scaling
     • Research questions are about the number and identity (interpretation) of groupings among the “things” being analyzed

  5. Independent multivariate models: “Traditional” Uses
     • Factor variables to find the number and identity of the different kinds of information
        • e.g., factor the 25 questions from clients’ intake interviews
     • Cluster people to find the number and identity of the different kinds of characteristic profiles
        • e.g., cluster 250 students who took a standardized test
     • Scale stimuli to find the “rules of stimulus similarity and dissimilarity”
        • e.g., MDscale 24 shape stimuli
     Before we get into the “alternative” uses of these models ...

  6. 3-way sampling problem
     • from a statistical or research design perspective, “sampling” usually refers to the selection of some set of people from which data will be collected, for the purposes of representing what the results would be if data were collected from the entire population of people in which the researcher is interested
     • from a psychometric perspective, “sampling” is a broader issue, with three dimensions
        • sampling respondents to represent the desired population of individuals
        • sampling attributes to represent some desired domain of characteristics
        • sampling stimuli (things or people) to represent the desired category(ies) of objects

  7. 3-way sampling
     Let’s look at how “people”, “attributes” and “stimuli” are used...
     [Figure: 3-way data array with axes labeled “People”, “Stimuli” and “Attributes”]
     Examples
     • 20 patients each rate the complexity, meaningfulness and pleasantness of the 10 Rorschach cards
     • 3 co-managers judge the efficiency, effectiveness, efficacy and elegance of the 15 workers they share
     • 10 psychologists rate each of 30 clients on their amenability to treatment, dangerousness and treatment progress
     • 200 respondents complete a 50 item self-report personality measure

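  8. From the examples...

     Example   People            Stimuli        Attributes
     #1        20 patients       10 cards       cmp, ples, mng (complexity, pleasantness, meaningfulness)
     #2        3 co-man          15 workers     e, e, e & e (efficiency, effectiveness, efficacy, elegance)
     #3        10 psychists      30 clients     amen, dang, tp (amenability, dangerousness, treatment progress)
     #4        200 responds      1 -- “self”    50 items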

  9. So, why is it called the 3-way sampling “problem” ???
     • one “problem” is that most data analysis models (both dependent and independent) start from a 2-way data set, most commonly a “Cases” × “Variables” table
     • So, the 3-way data must be “prepared” for analysis, by either ...
        • limited collection (only collect 2-way data -- only one person, one stimulus, or one attribute involved)
        • selection (only use one 2-way “layer” from the 3-way sample)
        • aggregation (combine across one “way” of the 3-way sample to get a 2-way layer)
     Let’s look at an example of each ...

  10. Examples of data prep...
     • limited collection
        • only collect 2-way data -- only one person, one stimulus, or one attribute involved
     • Example -- 200 respondents complete a 50 item self-report personality measure
        • only one stimulus (“self” or “I”) -- so only a 2-way sampling (people x attributes)
        • 2-way data table would look like: 200 Respondents × 50 Items

  11. Examples of data prep...
     • selection
        • only use one 2-way layer from the 3-way sample
     • Example -- 20 patients each rate the complexity, meaningfulness and pleasantness of the 10 Rorschach cards
        • here’s what the 3-way data array would look like: 20 Patients × 10 Cards × 3 attributes (comp, mean, plesnt)
     • Imagine the researcher were interested in only the meaningfulness data
        • only those data would be selected

  12. Example of selection, cont.
     • The resulting 2-way table would look like: 20 Patients × 10 Cards
     • All data are meaningfulness ratings
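Selection can be sketched as slicing one attribute out of a 3-way array. The nested-list layout `ratings[patient][card][attribute]`, the helper name `select_layer`, and the placeholder values are assumptions made for illustration only.

```python
# Hypothetical 3-way array: ratings[patient][card][attribute].
ATTRIBUTES = ["complexity", "meaningfulness", "pleasantness"]
N_PATIENTS, N_CARDS = 20, 10

# Placeholder ratings (zeros) stand in for real data.
ratings = [[[0 for _ in ATTRIBUTES] for _ in range(N_CARDS)]
           for _ in range(N_PATIENTS)]
ratings[0][0][1] = 5  # patient 0 rates the meaningfulness of card 0 as 5

def select_layer(array3, attribute_index):
    """Keep one attribute, returning a 2-way patients x cards table."""
    return [[cell[attribute_index] for cell in row] for row in array3]

meaning = select_layer(ratings, ATTRIBUTES.index("meaningfulness"))
print(len(meaning), len(meaning[0]))
```

The other two attributes are simply left out of the analysis, which is what distinguishes selection from aggregation.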

  13. Examples of data prep...
     • aggregation
        • combine across one “way” of the 3-way sample to get a 2-way layer
     • Example -- 3 co-managers judge the efficiency, effectiveness, efficacy and elegance of the 15 workers they share
        • here’s what the 3-way data array would look like: 15 workers × 3 co-managers × 4 attributes (ef, ef, ef, el)
     • Imagine the researcher was interested in how the workers differed in terms of the attributes

  14. Example of aggregation, cont.
     • In this case, the co-manager ratings would be considered “replications” of each other -- existing primarily to get more stable data (than one manager’s rating)
     • So, we would aggregate (take the mean) across the three co-managers for each attribute of each worker
     • The resulting 2-way table would look like: 15 workers × 4 attributes (ef, ef, ef, el)
     • All data are average ratings

  15. A second example of aggregation
     • Imagine the researcher was interested in how the workers differed in the ratings given by the three co-managers
     • In this case, the attributes would be considered “replications” of each other -- existing primarily to get more stable data (than using one attribute)
     • So, we would aggregate (take the mean) across the four attribute ratings from each co-manager, for each worker
     • The resulting 2-way table would look like: 15 workers × 3 co-managers (co#1, co#2, co#3)
     • All data are average ratings
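The two aggregation examples can be sketched side by side. The array layout `ratings[manager][worker][attribute]` and the rating values are invented for illustration; only the 3 × 15 × 4 shape and the use of means come from the slides.

```python
N_MANAGERS, N_WORKERS, N_ATTRS = 3, 15, 4

# Hypothetical 3-way array: ratings[manager][worker][attribute].
# Values are arbitrary but deterministic so the means are easy to check.
ratings = [[[m + w + a for a in range(N_ATTRS)]
            for w in range(N_WORKERS)]
           for m in range(N_MANAGERS)]

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# First example: average across co-managers -> 15 workers x 4 attributes.
by_attribute = [[mean(ratings[m][w][a] for m in range(N_MANAGERS))
                 for a in range(N_ATTRS)]
                for w in range(N_WORKERS)]

# Second example: average across attributes -> 15 workers x 3 co-managers.
by_manager = [[mean(ratings[m][w][a] for a in range(N_ATTRS))
               for m in range(N_MANAGERS)]
              for w in range(N_WORKERS)]

print(by_attribute[0][0], by_manager[0][0])
```

The same 3-way data yield two different 2-way tables depending on which “way” is treated as replication.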

  16. Different ways of treating data for the different models
     • The 2-way data table we have been discussing is often labeled the “X” matrix
        • starting with “X”, different things are done to prepare the data for different models
     • Let’s look at these ...
     • Remember, we’ll start with the “traditional” uses of the different models, and then look at the different ways they can be used

  17. Factor variables to find the number and identity of the different kinds of information
     [Figure: X (“Cases” × “Variables”) → R (“Variables” × “Variables”) → S (“Variables” × “Factors”)]
     “R” captures the relationships among the variables, which are summarized in the “S” (Structure) matrix, which provides the basis for deciding how many and what are the kinds of information the variables carry
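The first step of this pipeline, going from “X” to “R”, can be sketched as a matrix of Pearson correlations among the columns of X. The toy data and the function names are assumptions for illustration; extracting the “S” matrix from “R” (the factoring itself) is not shown.

```python
import math

def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    ss_u = sum((a - mean_u) ** 2 for a in u)
    ss_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(ss_u * ss_v)

def correlation_matrix(X):
    """R matrix: correlations among the columns (variables) of X."""
    columns = list(zip(*X))
    return [[pearson(ci, cj) for cj in columns] for ci in columns]

# Toy X matrix: 4 cases (rows) x 3 variables (columns), invented values.
X = [
    [1.0, 2.0, 5.0],
    [2.0, 4.0, 3.0],
    [3.0, 6.0, 4.0],
    [4.0, 8.0, 1.0],
]

R = correlation_matrix(X)
for row in R:
    print([round(r, 2) for r in row])
```

In this toy example the first two variables are perfectly correlated, so a factoring of R would be expected to treat them as carrying the same kind of information.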

  18. Cluster people to find the number and identity of the different kinds of characteristic profiles
     [Figure: X (“Cases” × “Variables”) → D (“Cases” × “Cases”) → C (a “Cluster” assignment for each case), with cluster profiles plotted across the “Variables” on a scale from -2 to 2]
     “D” captures the similarities and differences among the cases, which are summarized in “C” (Cluster membership), which provides the basis for deciding how many and what are the “sets” of people

  19. Scale stimuli to find the “rules of stimulus similarity and dissimilarity”
     [Figure: D (“Stimuli” × “Stimuli”) → Map of stimuli 1-8 in a space with one dimension running from simple to complex and another from symmetrical to asymmetrical]
     “D” captures the patterns of similarities and dissimilarities among the stimuli (can be from direct ratings or derived from “X”, more later), which are summarized in the “Map”, which provides the basis for deciding how many and what are the “rules” (dimensions) underlying the patterns of stimulus similarities and dissimilarities.

  20. Independent multivariate models: “Alternative” Uses
     • As you might imagine, we are not limited to factoring variables, clustering people, and scaling stimuli
     • Any combination of “interest”, “data” and “model” is possible
     • So, there are really nine possible combinations (* = “traditional” use)

                    Factoring   Clustering   Scaling
        Variables       *
        People                      *
        Stimuli                                  *

  21. Independent multivariate models: “Alternative” Uses of Factoring
     • Factoring provides a geometric (spatial) model of a pattern of intercorrelations
        • number of underlying dimensions and interpretation of each
     • The two most common types of factoring are…
        • R-type factoring -- based on inter-variable correlations
           • factoring variables
           • number & kinds of variables with “similar information”
        • Q-type factoring -- based on inter-person correlations
           • factoring people
           • number & kinds of persons with “similar characteristics”

  22. Factor people to find sets of people that have “similar characteristics”
     [Figure: X (“Cases” × “Variables”) → Q (“Cases” × “Cases”) → S (“Cases” × “Factors”)]
     “Q” captures the relationships among the cases, which are summarized in the “S” (Structure) matrix, which provides the basis for deciding how many and what are the kinds of persons with similar characteristics
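A “Q” matrix can be sketched as the correlations among the rows (cases) of X rather than its columns. The three profiles below are invented, and the helper names are hypothetical; the factoring of Q into “S” is not shown.

```python
import math

def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    ss_u = sum((a - mean_u) ** 2 for a in u)
    ss_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(ss_u * ss_v)

def q_matrix(X):
    """Q matrix: correlations among the rows (cases) of X."""
    return [[pearson(ri, rj) for rj in X] for ri in X]

# Toy X matrix: 3 cases (rows) x 4 variables (columns), invented values.
X = [
    [1.0, 2.0, 3.0, 4.0],   # case 0: rising profile
    [2.0, 4.0, 6.0, 8.0],   # case 1: same shape, higher level
    [4.0, 3.0, 2.0, 1.0],   # case 2: reversed profile
]

Q = q_matrix(X)
for row in Q:
    print([round(q, 2) for q in row])
```

One design point worth noting: because correlations ignore level and spread, cases 0 and 1 come out as identical in Q even though their raw scores differ, which is a property of using correlations here rather than something stated in the slides.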

  23. Independent multivariate models: “Alternative” Uses of Clustering
     • Clustering provides a non-geometric (non-spatial) model of similarities and differences
        • number of groups and description of each
     • The three most common types of clustering are…
        • clustering people -- what we’ve looked at
        • clustering variables -- alternative to factoring
        • clustering stimuli -- alternative to MDScaling

  24. Cluster variables to find sets of variables that have “similar characteristics”
     [Figure: X (“Cases” × “Variables”) → D (“Variables” × “Variables”) → C (a “Cluster” assignment for each variable)]
     “D” captures the similarities and differences among the variables, which are summarized in “C” (Cluster membership), which provides the basis for deciding how many and what are the “sets” of variables

  25. Cluster stimuli to find sets of stimuli that have “similar characteristics”
     [Figure: X (“Stimuli” × “Variables”) → D (“Stimuli” × “Stimuli”) → C (a “Cluster” assignment for each stimulus)]
     “D” captures the similarities and differences among the stimuli, which are summarized in “C” (Cluster membership), which provides the basis for deciding how many and what are the “sets” of stimuli

  26. Independent multivariate models: “Alternative” Uses of MDScaling
     • Scaling provides a geometric (spatial) model of a pattern of similarities and dissimilarities
        • number of underlying dimensions and interpretation of each
     • The three types of scaling are…
        • scaling stimuli -- what we’ve looked at
        • scaling variables -- an alternative to factoring
        • scaling people -- an alternative to clustering

  27. Scale variables to find the “groups of variables”
     [Figure: D (“Items” × “Items”) → Map of items 1-8]
     “D” captures the patterns of similarities and dissimilarities among the items (can be from direct ratings or derived from “X”, more later), which are summarized in the “Map”, which provides the basis for deciding how many and what are the “rules” (dimensions) underlying the patterns of variable similarities and dissimilarities.

  28. Scale people to find the “dimensions of person similarities and dissimilarities”
     [Figure: D (“Cases” × “Cases”) → Map of cases 1-8]
     “D” captures the patterns of similarities and dissimilarities among the people (can be from direct ratings or derived from “X”, more later), which are summarized in the “Map”, which provides the basis for deciding how many and what are the “rules” (dimensions) underlying the patterns of person similarities and dissimilarities.

  29. Intro to MDScaling
     • Short History
     • Purpose & Uses of MDS
     • Steps in MDS Research
     • Types of MDS Models/Analyses

  30. Short History
     Classical Psychophysics
     • Began as the search for the relationships between the physical world and the “inner life”
     • Got boring quickly -- in the name of scientific rigor
        • Φ (physical attribute) by Ψ (psychological attributes) plots
     • Pretty sure this wasn’t the way to go ...
        • Unidimensional research of the multivariate world
        • Assumes we know the physical attributes and the psychological attributes that are important

  31. Short History
     • MDS was an “extension/rebellion” of Classical Psychophysics
     • Sought some way to collect data to “capture” . . .
        • the attributes/dimensions that underlie “thought”
        • the values of stimuli (objects) on the dimensions/attributes
     • What might be the “basic data” for this process ??
        • similarities/dissimilarities among the stimuli
     • A “solution” that can represent the information in those similarities would capture the “rules” underlying the decisions represented in those similarities
        • a kind of data reduction
        • but one that “reveals underlying structure”

  32. How MDScaling “Works”
     • The result of an MDS analysis is a “map” that positions the stimuli in a k-space based on the set of stimulus similarities
     • Ever see a map ??? Remember that “triangle” in the corner ???

     Inter-City Distances

            a    b    c    d    e    f    g
        b   55
        c   35   35
        d   45   75   40
        e   10   60   40   45
        f   35   80   55   30   30
        g   45   30   10   45   50   60
        h   80   40   40   75   85   95   35

     [Figure: Map of 8 Cities, showing the positions of cities a-h]

  33. MDS is Map-making, only backwards
     • Start with the pairwise distances (dissimilarities)
     • Assume some “k” (# of dimensions; 1-6 in SPSS)
     • Determine the positions of the stimuli in that k-space that best match the pairwise dissimilarities -- the map
     • Assess how well the solution represents the dissimilarities
        • R² -- for the dissimilarities and the solution distances
           • larger R² indicates a “better” solution
        • Stress -- family of “badness of fit” indices
           • smaller stress indicates a “better” solution
     • If the solution is good, the stimulus positions “reveal” the structure/rules underlying the original dissimilarities
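The fit-assessment step can be sketched by comparing map distances with the original dissimilarities. This sketch assumes one particular member of the stress family (a Kruskal-style Stress-1 computed on raw dissimilarities, with no monotone transformation); the indices reported by any given package may be defined differently. The three-stimulus data and both candidate maps are invented, and finding the positions in the first place is not shown.

```python
import math

def pairwise_distances(points):
    """Euclidean distance for every pair i < j of k-dimensional points."""
    n = len(points)
    return [math.dist(points[i], points[j])
            for i in range(n) for j in range(i + 1, n)]

def stress1(dissimilarities, distances):
    """Stress-1 style badness of fit: 0 means the map reproduces the data."""
    num = sum((d - delta) ** 2 for d, delta in zip(distances, dissimilarities))
    den = sum(d ** 2 for d in distances)
    return math.sqrt(num / den)

def r_squared(u, v):
    """Squared Pearson correlation between two sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    ss_u = sum((a - mean_u) ** 2 for a in u)
    ss_v = sum((b - mean_v) ** 2 for b in v)
    return cov ** 2 / (ss_u * ss_v)

# Invented dissimilarities for stimuli a, b, c, in pair order ab, ac, bc.
dissim = [3.0, 4.0, 5.0]

# Two candidate maps: a 2-dimensional one and a 1-dimensional one.
map_2d = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
map_1d = [(0.0,), (3.0,), (-2.0,)]

for name, positions in [("k=2", map_2d), ("k=1", map_1d)]:
    dist = pairwise_distances(positions)
    print(name, round(stress1(dissim, dist), 3), round(r_squared(dissim, dist), 3))
```

Here the 2-dimensional map reproduces the dissimilarities exactly, while the 1-dimensional map cannot, so its stress is larger and its R² smaller.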

  34. Purpose and Uses of MDS
     • The purpose is to provide a spatial representation of the pairwise dissimilarities among a set of stimuli
        • the assumption is that by interpreting this “space” we can understand the bases (rules, attributes) of the similarities
     • Used to ask how people “think about” things
        • similarities and differences among stimuli may be the most basic process of “thinking about”
     • MDS has been used to reveal the “rules” or “bases of judgement” for a wide variety of domains and populations
        • can be used for composite, group comparison or individual differences types of analyses (more later)

  35. Purpose and Uses of MDS, cont.
     • One very important use of MDS is to “check-up” on stimuli you are planning to use, especially if the differences among the stimuli are the intended IV manipulation
     • E.g., vignette studies -- often the IV is manipulated as the differences among the stories, cases, records
     • you are counting on (assuming) that you can anticipate…
        • what differences between the stories will influence the DV (those you build in as the IVs)
        • what differences between stories will not influence the DV (“unimportant” differences you built in to give the stories some “character” or “so they’re not all the same”)

  36. Purpose and Uses of MDS, cont.
     • Remember in Factoring we noted that sometimes we learn as much about the variables as about the factors?
     • The same is true in MDScaling
     • In addition to learning about the “underlying dimensions” of thought and comparisons, we often acquire unexpected information about individual stimuli
        • they often get positioned in unexpected places
     • e.g., “influence”
        • was the “root word” of 82 synonyms/antonyms
        • wasn’t anywhere near the centroid of the space
     • e.g., “felt”
        • “tactually flat” but “visually fluffy”

  37. Steps in an MDS Study
     • Select the stimulus domain & stimuli
        • be sure to “cover” the domain
        • want 8-10 stimuli per expected dimension (rules vary !!)
     • Select the population(s) of interest
     • Select a data collection procedure
        • direct scaling -- pairwise dissimilarities collected “first hand”
        • indirect scaling -- pairwise dissimilarities computed from a set of attribute ratings (assumes you know the attributes being used !!!)
        • often use both -- direct scaling data for the MDSolution and the indirect data for helping to interpret the “map”
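Indirect scaling can be sketched as deriving a dissimilarity matrix from a stimuli × attributes table. The slides do not say which distance measure is used; Euclidean distance is assumed here purely for illustration, and the stimulus names and ratings are invented.

```python
import math

# Hypothetical attribute ratings: one row of ratings per stimulus.
attribute_ratings = {
    "A": [1.0, 1.0],
    "B": [4.0, 5.0],
    "C": [1.0, 5.0],
}

def dissimilarity_matrix(ratings):
    """Pairwise Euclidean distances between the stimuli's rating profiles."""
    names = list(ratings)
    return {
        (s, t): math.dist(ratings[s], ratings[t])
        for i, s in enumerate(names)
        for t in names[i + 1:]
    }

D = dissimilarity_matrix(attribute_ratings)
for pair, value in D.items():
    print(pair, value)
```

The resulting pairwise values play the same role as directly collected dissimilarity judgments, but they can only reflect the attributes that were rated, which is the caveat the slide flags.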

  38. Steps in an MDS Study, cont.
     • Determine the # dimensions for the “best map”
        • “scree-like” plots using R² and Stress indices
        • stability & replicability
        • interpretability
     • Interpret the “map”
        • visual inspection
        • dimensional interpretation -- placing attribute vectors using multiple regression
        • neighborhood interpretation -- identifying clusters of stimuli
     • Plan the next study
        • what will be the result of deleting &/or adding stimuli ?
        • what will be the result of a different population ?

  39. Types of MDS models & analyses
     There are four major types of MDS analyses
     • Composite Scaling (ALSCAL)
        • Assumes everybody uses the same “rules” to form dissimilarities among the stimuli and that everybody quantifies stimuli the same way using those rules
     • Individual Differences Scaling (INDSCAL)
        • Assumes everybody uses the same “rules” (or a subset of them) but provides for individual differences in the relative importance of those rules across people

  40. Types of MDS models & analyses, cont.
     There are four major types of MDS analyses
     • Group Comparison Analyses (need a priori groups)
        • Form a composite solution for each “group” and compare them -- looking for similarities and differences in “rules” or “quantification” across the groups
     • Group Identification Analyses (“like” clustering)
        • Identify groups of participants that have similar pairwise dissimilarity data -- these folks use similar rules/quantifications -- then identify the group
