
Approaches to Modeling and Learning User Preferences



Presentation Transcript


  1. Approaches to Modeling and Learning User Preferences Marie desJardins University of Maryland Baltimore County Presented at SRI International AI Center March 10, 2008 Joint work with Fusun Yaman, Michael Littman, and Kiri Wagstaff

  2. Overview • Representing Preferences • Learning Planning Preferences • Preferences over Sets • Directions / Conclusions

  3. Representing Preferences

  4. What is a Preference? • (Partial) ordering over outcomes • Feature vector representation of “outcomes” (a.k.a. “objects”) • Example: Taking a vacation. Features: • Who (alone / family) • Where (Orlando / Paris) • Flight type (nonstop / one-stop / multi-stop) • Cost (low / medium / high) • … • Languages: • Weighted utility function • CP-net • Lexicographic ordering

  5. Weighted Utility Functions • Each value vij of feature fi has an associated utility uij • The utility Uj of object oj = ⟨v1j, v2j, …, vkj⟩ is Uj = ∑i wi uij • Commonly used in preference elicitation • Easy to model • Independence of features is convenient • Flight example: • U(flight) = 0.8·u(Who) + 0.8·u(Cost) + 0.6·u(Where) + 0.4·u(Flight Type) …
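
To make the additive model concrete, here is a minimal Python sketch of the flight/vacation example; the weights and per-value utilities are illustrative assumptions, not values from the talk.

```python
# Minimal sketch of an additive (weighted) utility function for the
# vacation/flight example.  Weights and per-value utilities are made up.

FEATURE_WEIGHTS = {"who": 0.8, "cost": 0.8, "where": 0.6, "flight_type": 0.4}

VALUE_UTILITIES = {
    "who": {"family": 1.0, "alone": 0.3},
    "cost": {"low": 1.0, "medium": 0.5, "high": 0.1},
    "where": {"Orlando": 0.9, "Paris": 0.7},
    "flight_type": {"nonstop": 1.0, "one-stop": 0.6, "multi-stop": 0.2},
}

def additive_utility(outcome: dict) -> float:
    """U(o) = sum_i w_i * u_i(v_i), assuming the features are independent."""
    return sum(FEATURE_WEIGHTS[f] * VALUE_UTILITIES[f][v] for f, v in outcome.items())

trip = {"who": "family", "where": "Orlando", "cost": "low", "flight_type": "nonstop"}
print(additive_utility(trip))
```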

  6. CP-Nets • Conditional Preference Network • Intuitive, graphical representation of conditional preferences under a ceteris paribus (“all else being equal”) assumption • Example statements: I prefer to take a vacation with my family, rather than going alone; if I am with my family, I prefer Orlando to Paris; if I am alone, I prefer Paris to Orlando • [CP-net diagram: node Who with table “family > alone”; child node Where with table “family: Orlando > Paris; alone: Paris > Orlando”]

  7. Induced Preference Graph • Every CP-net induces a preference graph on outcomes • The partial ordering of outcomes is given by the transitive closure of the preference graph • [Diagram: the induced graph for the vacation CP-net over the outcomes ⟨family, Orlando⟩, ⟨family, Paris⟩, ⟨alone, Paris⟩, ⟨alone, Orlando⟩]
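
The following Python sketch reconstructs the vacation CP-net from these two slides and enumerates its induced preference graph via single-variable “worsening flips”, then takes the transitive closure; it is an illustrative reconstruction, not the authors' code.

```python
# Sketch: the vacation CP-net and its induced preference graph.
from itertools import product

DOMAINS = {"who": ["family", "alone"], "where": ["Orlando", "Paris"]}

def cpt(var, outcome):
    """Preferred-to-less-preferred value order for var, given its parents."""
    if var == "who":
        return ["family", "alone"]              # family > alone
    if outcome["who"] == "family":
        return ["Orlando", "Paris"]             # family: Orlando > Paris
    return ["Paris", "Orlando"]                 # alone:  Paris > Orlando

outcomes = [dict(zip(DOMAINS, vals)) for vals in product(*DOMAINS.values())]

# One edge per "worsening flip": change one variable to a less preferred
# value, all else being equal.
better_than = set()
for o in outcomes:
    for var in DOMAINS:
        order = cpt(var, o)
        for worse in order[order.index(o[var]) + 1:]:
            worse_outcome = dict(o, **{var: worse})
            better_than.add((tuple(o.items()), tuple(worse_outcome.items())))

# Transitive closure gives the induced partial order on outcomes.
changed = True
while changed:
    changed = False
    for a, b in list(better_than):
        for c, d in list(better_than):
            if b == c and (a, d) not in better_than:
                better_than.add((a, d))
                changed = True

for a, b in sorted(better_than):
    print(dict(a), ">", dict(b))
```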

  8. Lexicographic Orderings • Features are prioritized with a total ordering f1, …, fk • The values of each feature fi are prioritized with a total ordering vi1 > … > vim • To compare o1 and o2: • Find the first feature in the feature ordering on which o1 and o2 differ • Choose the outcome with the preferred value for that feature • Travel example: • Who > Where > Cost > Flight-Type > … • Family > Alone • Orlando > Paris … • Cheap > Expensive
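
A minimal sketch of lexicographic comparison for the travel example; the feature priority and per-feature value orders below are illustrative assumptions.

```python
# Lexicographic comparison: the first differing feature (in priority
# order) decides which outcome is preferred.

FEATURE_ORDER = ["who", "where", "cost", "flight_type"]
VALUE_ORDER = {
    "who": ["family", "alone"],
    "where": ["Orlando", "Paris"],
    "cost": ["cheap", "expensive"],
    "flight_type": ["nonstop", "one-stop", "multi-stop"],
}

def lex_compare(o1: dict, o2: dict) -> str:
    """Return '>', '<', or '=' comparing o1 to o2 lexicographically."""
    for f in FEATURE_ORDER:
        if o1[f] != o2[f]:
            ranks = VALUE_ORDER[f]
            return ">" if ranks.index(o1[f]) < ranks.index(o2[f]) else "<"
    return "="

a = {"who": "family", "where": "Paris", "cost": "cheap", "flight_type": "nonstop"}
b = {"who": "family", "where": "Orlando", "cost": "expensive", "flight_type": "multi-stop"}
print(lex_compare(a, b))   # '<': both are family trips, so 'where' decides
```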

  9. Representation Tradeoffs • Each representation has some limitations • Additive utility functions can’t capture conditional preferences, and can’t easily represent “hard” constraints or preferences • CP-nets, in general, only give a partial ordering, can’t model integer/real features easily, and can’t capture tradeoffs • Lexicographic preferences can’t capture tradeoffs, and can’t represent conditional preferences

  10. Learning Planning Preferences

  11. Planning Algorithms • Domain-independent: • Inputs: initial state, goal state, possible actions • Domain-independent but not efficient • Domain-specific: • Works for only one domain • (Near-)optimal reasoning • Very fast • Domain-configurable: • Uses additional planning knowledge to customize the search automatically • Broadly applicable and efficient

  12. Domain Knowledge for Planning • Provides search control information: • Hierarchy of abstract actions (HTN operators) • Logical formulas (e.g., temporal logic) • Experts must provide planning knowledge: • May not be readily available • Difficult to express knowledge declaratively

  13. Learning Planning Knowledge • Alternative: learn planning knowledge by observation (i.e., from example plans) • Possibly even learn from a single complex example (DARPA’s Integrated Learning Program) • Our focus: learn preferences at various decision points • CHARM: Charming Hybrid Adaptive Ranking Model • Currently: learns preferences over variable bindings • Future: learn goal and operator preferences

  14. HTN: Hierarchical Task Network • Objectives are specified as high-level tasks to be accomplished • Methods describe how high-level tasks are decomposed down to primitive tasks • [Diagram: the high-level task travel(X,Y) decomposes via a short-distance-travel method into getTaxi(X), rideTaxi(X,Y), payDriver, and via a long-distance-travel method into travel(X,Ax), buyTicket(Ax,Ay), fly(Ax,Ay), travel(Ay,Y)]
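
As a rough illustration of the diagram, the sketch below encodes the two travel methods as Python data; the encoding (and the airport placeholders Ax, Ay) is an assumption made for illustration only, not the planner's actual representation.

```python
# Illustrative encoding of the travel HTN: two methods decompose the
# abstract task travel(X, Y) into primitive actions or further subtasks.

HTN_METHODS = {
    "travel": [
        {   # short-distance travel
            "name": "short-distance-travel",
            "subtasks": lambda x, y: [("getTaxi", x), ("rideTaxi", x, y), ("payDriver",)],
        },
        {   # long-distance travel via airports Ax (near X) and Ay (near Y)
            "name": "long-distance-travel",
            "subtasks": lambda x, y, ax="Ax", ay="Ay": [
                ("travel", x, ax), ("buyTicket", ax, ay), ("fly", ax, ay), ("travel", ay, y),
            ],
        },
    ],
}

# Decomposing travel(home, Orlando) with the short-distance method:
method = HTN_METHODS["travel"][0]
print(method["subtasks"]("home", "Orlando"))
```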

  15. CHARM: Charming Hybrid Adaptive Ranking Model • Learns preferences in HTN methods: • Which objects to choose when using a particular method? (Which flight to take? Which airport to choose?) • Which goal to select next during planning? • Which method to choose to achieve a task? (By plane or by train?) • Preferences are expressed as lexicographic orderings • A natural choice for many (not all) planning domains

  16. Summary of CHARM • CHARM learns a preference rule for each method • Given: an HTN, an initial state, and the plan tree • Find: an ordering on variable values for each decision point (planning context) • CHARM has two modes: • Gather training data for each method (e.g., Orlando = (tropical, family-oriented, expensive) is preferred to Boise = (cold, outdoors-oriented, cheap)) • Learn a preference rule for each method

  17. Preference Rules • A preference rule is a function that returns <, =, or >, given two objects represented as vectors of attributes • Assumption: preference rules are lexicographic • For every attribute there is a preferred value • There is a total order on the attributes representing the order of importance • Example: a warm destination is preferred to a cold one; among destinations of the same climate, an inexpensive one is better than an expensive one …

  18. Learning Lexicographic Preference Models • Existing algorithms return one of many models consistent with the data • The worst-case performance of such algorithms is worse than random selection • Higher probability of poor performance if there are fewer training observations • A novel democratic approach: Variable Voting • Sample the possible consistent models • Implicit sampling: models that satisfy certain properties are permitted to vote • The preference decision is based on the majority of votes

  19. Variable Voting • Given a partial order < on the attributes and two objects, A and B: • D = { attributes on which A and B differ } • D* = { most salient attributes in D with respect to < } • The object with the larger number of preferred values for the attributes in D* is the preferred object
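
A hedged sketch of the Variable Voting decision rule as described on this slide; the attribute ranks and preferred values below are made-up examples.

```python
# Variable Voting: the most salient differing attributes each vote for
# the object that holds their preferred value; majority wins.

RANK = {"climate": 1, "cost": 1, "crowds": 2}          # smaller = more salient
PREFERRED = {"climate": "warm", "cost": "cheap", "crowds": "quiet"}

def variable_vote(a: dict, b: dict) -> str:
    """Return 'A', 'B', or 'tie' according to the majority vote on D*."""
    diff = [attr for attr in a if a[attr] != b[attr]]          # D
    if not diff:
        return "tie"
    top = min(RANK[attr] for attr in diff)
    salient = [attr for attr in diff if RANK[attr] == top]     # D*
    votes_a = sum(1 for attr in salient if a[attr] == PREFERRED[attr])
    votes_b = sum(1 for attr in salient if b[attr] == PREFERRED[attr])
    if votes_a == votes_b:
        return "tie"
    return "A" if votes_a > votes_b else "B"

orlando = {"climate": "warm", "cost": "expensive", "crowds": "busy"}
boise = {"climate": "cold", "cost": "cheap", "crowds": "quiet"}
print(variable_vote(orlando, boise))   # 'tie': the two rank-1 attributes split their votes
```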

  20. Learning Variable Ranks • Initially, all attributes are equally important • Loop until the ranks converge: • Given two objects, predict a winner using the current beliefs • If the prediction was wrong, decrease the importance of the attribute values that led to the wrong prediction • The importance of an attribute never goes beyond its actual place in the order of attributes • Mistake-bound algorithm: it learns from its mistakes • The mistake bound is O(n²), where n is the number of attributes
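
The loop on this slide might look roughly like the sketch below, which demotes the salient attributes that voted for the wrong winner; the names and the exact demotion step are assumptions for illustration, not the CHARM implementation.

```python
# Rank learning sketch: start with all attributes equally important,
# predict with variable voting, demote attributes that caused mistakes.

def variable_vote_with(a, b, rank, preferred):
    diff = [x for x in a if a[x] != b[x]]
    if not diff:
        return "tie"
    top = min(rank[x] for x in diff)
    salient = [x for x in diff if rank[x] == top]
    va = sum(a[x] == preferred[x] for x in salient)
    vb = sum(b[x] == preferred[x] for x in salient)
    return "tie" if va == vb else ("A" if va > vb else "B")

def learn_ranks(examples, preferred, attrs):
    """examples: iterable of (a, b, label) triples, label 'A' or 'B'."""
    rank = {attr: 1 for attr in attrs}            # 1 = most salient
    # (a single pass shown; in practice, iterate until the ranks stop changing)
    for a, b, label in examples:
        if variable_vote_with(a, b, rank, preferred) == label:
            continue                              # correct prediction: no update
        winner = a if label == "A" else b
        diff = [x for x in attrs if a[x] != b[x]]
        if not diff:
            continue
        top = min(rank[x] for x in diff)
        for x in diff:
            # Demote salient attributes whose preferred value sat on the losing side.
            if rank[x] == top and winner[x] != preferred[x]:
                rank[x] += 1
    return rank
```

In CHARM the training pairs come from the observed plan trees; here they would simply be (object, object, preferred-label) triples.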

  21. Democracy vs. Autocracy • [Figure: experimental results for Variable Voting]

  22. Preferences Over Sets

  23. Preferences over Sets • [Figure: example item combinations illustrating complementarity and redundancy] • Subset selection applications: remote sensing, sports teams, music playlists, planning • Ranking, as in a search engine? Doesn’t capture dependencies between items • Goal: encode, apply, and learn set-based preferences

  24. User Preferences • Depth: utility function (desirable values) • Example: prefer images with more rock than sky [two images: Rock 25%, Soil 75%, Sky 0% vs. Rock 10%, Soil 50%, Sky 40%] • Diversity: variety and coverage • Example: a geologist wants both near and far views (context)

  25. Encoding User Preferences • DD-PREF: a language for expressing preferred depth and diversity for sets • [Figure: per-feature utility curves for Sky, Soil, and Rock (depth), and contrasting image sets illustrating diversity]

  26. Finding the Best Subset • Maximize the subset valuation of s, which combines: • Depth: the utility of subset s, computed from per-item utilities • Diversity: the diversity value of s, computed from per-feature diversity (1 − skew) • The depth/diversity trade-off is specified by the subset preference
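
The sketch below gives a rough, DD-PREF-flavored valuation that combines per-feature depth and diversity, plus a greedy selector; the exact DD-PREF definitions are not reproduced here, and the weights, utility functions, diversity stand-in, and trade-off parameter are all assumptions for illustration.

```python
# Rough sketch: score a set by a weighted mix of per-feature depth
# (average per-item utility) and per-feature diversity, then greedily
# grow a subset of size k that maximizes the score.
from statistics import mean, pstdev

FEATURES = ["rock", "soil", "sky"]
WEIGHTS = {"rock": 0.6, "soil": 0.3, "sky": 0.1}        # per-feature importance
UTILITY = {
    "rock": lambda v: v,            # more rock is better
    "soil": lambda v: v,
    "sky":  lambda v: 1.0 - v,      # less sky is better
}
ALPHA = 0.7                          # depth vs. diversity trade-off

def valuation(subset):
    """Score a set of items, each a dict of feature -> value in [0, 1]."""
    total = 0.0
    for f in FEATURES:
        values = [item[f] for item in subset]
        depth = mean(UTILITY[f](v) for v in values)
        diversity = 1.0 - 2.0 * pstdev(values)           # crude "1 - skew" stand-in
        total += WEIGHTS[f] * (ALPHA * depth + (1 - ALPHA) * diversity)
    return total

def greedy_select(items, k):
    """Greedily grow a subset of size k that maximizes the valuation."""
    chosen = []
    for _ in range(k):
        best = max((i for i in items if i not in chosen),
                   key=lambda i: valuation(chosen + [i]))
        chosen.append(best)
    return chosen

images = [{"rock": 0.25, "soil": 0.75, "sky": 0.00},
          {"rock": 0.10, "soil": 0.50, "sky": 0.40},
          {"rock": 0.60, "soil": 0.30, "sky": 0.10}]
print(greedy_select(images, 2))
```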

  27. Learning Preferences from Examples • Hard for users to specify quantitative values (especially with more general quality functions) • Instead, adopt a machine learning approach • Users provide example sets with high valuation • System infers: • Utility functions • Desired diversity • Feature weights • Once trained, the system can select subsets of new data (blocks, images, songs, food)

  28. Learning a Preference Model • Depth: utility functions, estimated by probability density estimation — KDE (kernel density estimation) [Duda et al., 01] • [Figure: estimated utility curves over % Sky and % Rock] • Diversity: average of the observed diversities • Feature weights: minimize the difference between the computed valuation and the true valuation • BFGS bounded optimization [Gill et al., 81]
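
For the depth step, a per-feature utility can be estimated from the feature values of items the user selected; the sketch below uses scipy's gaussian_kde as a stand-in for the KDE method cited on the slide, with made-up sample data.

```python
# Estimate a per-feature utility function by kernel density estimation
# over the feature values of user-selected items.
import numpy as np
from scipy.stats import gaussian_kde

# Fraction of rock in the images a user picked (illustrative values).
selected_rock = np.array([0.20, 0.25, 0.30, 0.35, 0.28])

kde = gaussian_kde(selected_rock)

def rock_utility(value: float) -> float:
    """Use the (normalized) estimated density as a per-item utility."""
    grid = np.linspace(0.0, 1.0, 101)
    return float(kde(value)[0] / kde(grid).max())

print(rock_utility(0.27))   # near the selected examples -> utility close to 1
print(rock_utility(0.90))   # far from them -> utility near 0
```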

  29. Results: Blocks World • Compute the valuation of sets chosen by the true preference, the learned preference, and random selection (a lower baseline) • As more training sets become available, performance increases (the learned preference approximates the true one) • [Figure: results for the Mosaic and Tower preferences]

  30. Rover Image Experiments • Methodology: • Six users: 2 geologists, 4 computer scientists • Five sets of 20 images each • Each user selects a subset of 5 images from each set • Evaluation: • Learn preferences on (up to 4) example sets, then select a new subset from a held-out set • Metrics: • Valuation of the selected subset • Functional similarity between learned preferences

  31. Learned Preferences • [Figure: a subset of 5 images, chosen by a geologist, from 20 total, alongside the learned utility functions for Sky, Soil, and Rock]

  32. Subset Selection • [Figure: 5 images chosen from 20 new images using greedy DD-Select and the learned preferences, compared with the 5 images chosen by the same geologist from the same 20 images]

  33. Future Directions

  34. Future Directions • Hybrid preference representation • Decision tree with lexicographic orderings at the leaves • Permits conditional preferences • How to learn the “splits” in the tree? • Support operator and goal orderings for planning • Incorporate the concept of set-based preferences into planning domains • Questions?
