**A Hierarchical Bayesian Look at Some Debates in Category Learning** Michael Lee (UCI) Wolf Vanpaemel (KU Leuven)

**Bayesian Statistics** • There are (at least) two ways Bayesian statistics can be applied to understand a cognitive modeling problem like the nature of categories • Use Bayes as a statistician would, to analyze models and data • Use Bayes as a theoretician would, as a working hypothesis about how the mind solves inference problems • This talk takes the first approach • How can hierarchical Bayesian methods help us relate models to data in better ways, and further our understanding of category representations?

**Debates and Models** • We are going to focus on two specific debates in the cognitive modeling literature • Exemplar vs prototype representations: What is the role and extent of abstraction in forming category representations? • Similarity vs rules: What is the role and extent of similarity in constraining category representations? • We are going to focus on two existing models • The Generalized Context Model (GCM: Nosofsky 1986) as an account of how a representation produces category learning behavior • The Varying Abstraction Model (VAM: Vanpaemel et al 2005) as an account of the types of possible category representations

**Hierarchical Bayesian Contributions** • Our basic goals are to show how hierarchical Bayesian analysis • Encourages theorizing at different levels of psychological abstraction • Can convert (hard and limited) model selection problems into (easier and more informative) parameter estimation problems • Yields useful additional information, especially when analyzing data from multiple experiments or tasks simultaneously • Gives one approach to developing theoretically satisfying priors for different models

**VAM Representations** • The VAM assumes categories are represented by merging all, some, or none of the original stimuli
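
The space of representations described on this slide can be sketched by enumerating every way of grouping one category's stimuli. This is a minimal illustration, not the authors' code: it assumes that merging happens within a single category and that a merged group is represented by the average of its members. The names `partitions` and `representing_points` are hypothetical.

```python
def partitions(items):
    """Yield every way of splitting `items` into non-empty groups."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in partitions(rest):
        # Option 1: `first` stays unmerged, in a group of its own.
        yield [[first]] + smaller
        # Option 2: `first` is merged into one of the existing groups.
        for k in range(len(smaller)):
            yield smaller[:k] + [[first] + smaller[k]] + smaller[k + 1:]


def representing_points(partition):
    """Assumption: each merged group is represented by its average point."""
    points = []
    for group in partition:
        n_dims = len(group[0])
        points.append(tuple(sum(s[d] for s in group) / len(group)
                            for d in range(n_dims)))
    return points


# Four hypothetical two-dimensional stimuli from one category.
stimuli = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
all_representations = list(partitions(stimuli))
# Four stimuli give 15 groupings, from exemplar (no merges) to prototype (one point).
```

The number of groupings grows quickly with the number of stimuli, which is one reason why choosing between representations by exhaustive model selection is computationally demanding.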

**Generalized Context Model** • Calculate the distances between the point representing the presented stimulus, and the points representing the two categories • Use a generalization function to calculate similarities from these distances • Determine the probability of responding with each category decision according to the similarities
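
The three steps on this slide can be sketched as follows. This is a hedged illustration rather than a full GCM: the weighted Minkowski distance, the exponential generalization function, the summing of similarities over a category's representing points, and the parameter names `sensitivity`, `r`, and `weights` are assumptions in the spirit of common GCM formulations, and response bias is omitted.

```python
import math


def gcm_response_probabilities(stimulus, category_points,
                               sensitivity=1.0, r=2, weights=None):
    """Return the probability of each category response for one stimulus.

    category_points maps a category label to the list of points that
    represent that category (exemplars, prototypes, or anything between).
    """
    n_dims = len(stimulus)
    if weights is None:
        # Assumption: equal attention to every stimulus dimension.
        weights = [1.0 / n_dims] * n_dims

    summed_similarity = {}
    for label, points in category_points.items():
        total = 0.0
        for point in points:
            # Step 1: distance between the stimulus and a representing point.
            distance = sum(w * abs(s - p) ** r
                           for w, s, p in zip(weights, stimulus, point)) ** (1.0 / r)
            # Step 2: generalization function turns distance into similarity.
            total += math.exp(-sensitivity * distance)
        summed_similarity[label] = total

    # Step 3: response probability is proportional to summed similarity.
    norm = sum(summed_similarity.values())
    return {label: s / norm for label, s in summed_similarity.items()}


# Hypothetical example: category A kept as two exemplars, category B as one prototype.
representation = {"A": [(0.0, 0.0), (0.0, 1.0)], "B": [(1.0, 0.5)]}
probs = gcm_response_probabilities((0.2, 0.4), representation, sensitivity=2.0)
```

Because the function only sees representing points, the same code applies to any VAM representation, which is what allows the two models to be combined.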

**Combining the VAM and GCM** • The first uses of the VAM combined it with the GCM, and made inferences about which representation people were using from their categorization behavior • Implicitly assumes a uniform prior over all the representations • The model selection problem of choosing between representations involves significant computation • No formal notion of the relationships between different representations • No account of where the representations came from

**Interpreting and Relating VAM Representations** • Blue and yellow are more similar than blue and red, and green is more sensible than gray

**Hierarchical Bayesian Extension** • Our hierarchical Bayesian extension adds an account, called the Merge Process, of how the VAM category representations are generated • The Merge Process is driven by two parameters • Theta controls the degree of abstraction • Gamma controls the emphasis on similarity

**Merge Process** • (1) Start with the exemplar representation • (2) Do another merge with probability theta, otherwise finish • (3) Calculate the similarity between the current representing points • (4) Calculate the probability that each pair of representing points will be the ones to be merged, and merge a pair chosen with those probabilities • (5) Return to step 2
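
The loop above can be sketched as a small simulation. The slide does not give the functional forms, so several choices here are assumptions made only for illustration: similarity is taken as `exp(-distance)` between group averages, the probability of merging a pair is taken as proportional to `similarity ** gamma`, and the process handles the stimuli of a single category. The function name `merge_process` is hypothetical.

```python
import math
import random


def centroid(group):
    """Average point of a group of stimuli (its representing point)."""
    n_dims = len(group[0])
    return tuple(sum(s[d] for s in group) / len(group) for d in range(n_dims))


def merge_process(stimuli, theta, gamma, rng=None):
    """Sample one representation as a list of groups of stimuli."""
    rng = rng or random.Random()
    # Start with the exemplar representation: every stimulus on its own.
    groups = [[tuple(s)] for s in stimuli]
    # Do another merge with probability theta, otherwise finish.
    while len(groups) > 1 and rng.random() < theta:
        points = [centroid(g) for g in groups]
        pairs, merge_weights = [], []
        for i in range(len(points)):
            for j in range(i + 1, len(points)):
                # Similarity between current representing points (assumed form).
                similarity = math.exp(-math.dist(points[i], points[j]))
                pairs.append((i, j))
                # Unnormalized merge probability (assumed form): gamma = 0
                # makes all pairs equally likely, large gamma favors the
                # most similar pair.
                merge_weights.append(similarity ** gamma)
        # Choose the pair to merge; choices() normalizes the weights.
        i, j = rng.choices(pairs, weights=merge_weights, k=1)[0]
        merged = groups[i] + groups[j]
        groups = [g for k, g in enumerate(groups) if k not in (i, j)] + [merged]
        # Loop back to the "another merge?" decision.
    return groups


# Hypothetical stimuli forming two tight pairs.
stimuli = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0)]
sample = merge_process(stimuli, theta=0.5, gamma=5.0, rng=random.Random(1))
```

Running the sketch many times for fixed theta and gamma and counting how often each representation appears gives a Monte Carlo picture of the inductive bias those parameter values induce.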

**Indexing Representations** • High theta values encourage merging, and result in more abstracted (prototype-like) representations • High gamma values emphasize similarity in merging, so nearby stimuli are joined

**Priors for VAM Representations** • The hierarchical model automatically makes a prior prediction (or “inductive bias”) over each of the VAM representations • The inductive bias shown comes from the priors on theta and gamma

**Hierarchical Bayesian Solutions** • Previous shortcomings are all addressed in some way by the hierarchical extension • “Implicitly assumes a uniform prior over all the representations” • Now have sensible prior coming from the merge process and the priors on its parameters • “The model selection problem of choosing between representations involves significant computation” • Now a problem of parameter estimation for theta and gamma at a higher level of abstraction • “No formal notion of the relationships between different representations” • Similar values of (theta, gamma) index similar representations • “No account of where the representations came from” • The Merge Process provides one

**Thirty Previously Studied Data Sets**

**Exemplar vs Prototype Representation** • Data sets show a range of inferred representations, spanning the exemplar (5, 12, 16, 23, 24) to prototype (4) spectrum, and theta captures this spectrum

**Uncertainty About Exemplar vs Prototype** • For a few data sets, more than one VAM representation had significant posterior mass • The model is uncertain about the degree of abstraction, and this is reflected in the posterior for theta

**Time Course of Representation** • There are two groups of three related data sets, measuring the beginning, middle and end of categorization behavior on the same task • There is a shift from more abstract to less abstract category representations in both cases • Captured by the specific representations in each case, but also (commensurately across experiments) by the change in theta

**Role of Similarity** • Some of the data sets relate to category learning tasks where subsets of subjects were asked/encouraged to use rules to form categories • These rules did not group similar stimuli, and so the gamma parameter detects the lack of emphasis on similarity in abstraction

**Other Issues** • Data sets 3 and 4 relate to two different subjects doing the same task, and suggest individual differences • This would be expressed naturally by including an additional level in our hierarchical model • Data set 13 suggests a “prototype plus outlier” representation, which the Merge Process indexes, but only by looking at the joint posterior for (theta, gamma), because you need high values of both • Data sets 1 and 2 suggest an alternative or extension to the Merge Process that allows for the deletion of stimulus points in representing categories

**Conclusions** • Demonstrated one way of doing a hierarchical Bayesian analysis of an existing model of category representation (the VAM) and of category learning (the GCM) • Can convert (hard and limited) model selection problems into (easier and more informative) parameter estimation problems • Gives one approach to developing theoretically satisfying priors for different models • Yields useful additional information, especially when analyzing data from multiple experiments or tasks simultaneously • Encourages theorizing at different levels of psychological abstraction

**Thanks!** Questions?