
Cheap User Modeling for Adaptive Systems


Presentation Transcript


1. Cheap User Modeling for Adaptive Systems Presented by: Frank Hines Topics in CS Spring 2011 Primary reference: Orwant, J. (1996). For want of a bit the user was lost: Cheap user modeling. IBM Systems Journal, 35(3,4), 398-416.

2. Limitless Information • 100s of channels

3. One Size Fits All? “People have limited cognitive time they want to spend on picking a movie.” - Reed Hastings, CEO of Netflix

4. Information Overload! • Paradox of choice • Increased dissatisfaction • Increased fatigue • Increased anxiety • Lowered productivity • Lowered concentration • Lowered quality

5. Can We Limit to the Most Relevant Info? Learning Toolbox • User modeling is NOT strictly content filtering! • Timing/performance • Prioritization • Formatting [Diagram: Doppelgänger pipeline — sensors feed user models {U1, U2}; processing & filtering selects from content {a,b,c,d,e,f} to yield per-user presentations, e.g. U1 → {a,c,d,f}, U2 → {b,c,e}]

6. Overview • What is meant by adaptation? • What is a user model? • What can we predict? • Just how predictable are we?

7. Adaptation • Adaptation is a sign of intelligence • Adaptation in nature • Usability vs. personalization [Diagram: commonalities and differences with current software]

8. Adaptation in Software “One of the worst software design blunders in the annals of computing” – Smithsonian Magazine [Screenshots: Newsmap, Jadeite]

  9. Adaptation in Conversation Human-Human interaction (discourse) • Human-Computer interaction • Vocabulary (age) • Speech volume (noise) • Speech rate (time pressure) • Syntactic structure (cultural affiliation) • Topic (interests, knowledge)

  10. Models “The sciences do not try to explain, they hardly even try to interpret, they mainly make models.” - John von Neumann

11. User Model • A framework to “simulate” a user and predict that user’s actions • A mathematical relationship among variables • NOT necessarily a cognitive representation • Models typically include: knowledge, beliefs, goals, plans, schedules, behaviors, abilities, preferences • Example: GRUNDY (Rich, 1979) recommended books from inferred personality traits
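A user model in this sense is just a predictive data structure. A minimal sketch in Python (every field name here is hypothetical, not taken from Doppelgänger):

```python
from dataclasses import dataclass, field

@dataclass
class UserModel:
    """Illustrative per-user prediction bundle; all field names are hypothetical."""
    interests: dict = field(default_factory=dict)       # topic -> estimated interest in [0, 1]
    location_probs: dict = field(default_factory=dict)  # place -> probability of being there
    schedule: dict = field(default_factory=dict)        # event -> predicted time

    def predicted_interest(self, topic: str) -> float:
        # Fall back to a neutral prior for topics never observed.
        return self.interests.get(topic, 0.5)

u = UserModel(interests={"technology": 0.9, "sports": 0.4})
print(u.predicted_interest("technology"))  # 0.9
print(u.predicted_interest("weather"))     # 0.5 (neutral prior)
```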

12. What can we predict? • Events • Interest • Location • Behavior

13. How can we predict an event? Predict the next observation as a function of past ones: s_n = f(s_{n-1}); s_n = f(s_{n-1}, s_{n-2}); in general, s_n = f(s_{n-1}, s_{n-2}, …, s_{n-j})

14. Linear Prediction • Discrete time series • Predicts future values from a linear function of past values • Canonical example: tidal activity • Other examples: sunspots, speech processing, stock prices, branch prediction, oil detection

15. Linear Prediction 1. Compute the autocorrelation vector R 2. Compute the predictor coefficients a_k from R 3. Compute the next observation: ŝ_n = Σ_k a_k s_{n-k}
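A minimal sketch of those three steps, following the standard autocorrelation formulation in Makhoul (1975); the toy signal and variable names are illustrative:

```python
import numpy as np

def linear_predict(signal, order):
    """Predict the next sample of a discrete time series via linear prediction."""
    s = np.asarray(signal, dtype=float)
    n = len(s)
    # 1. Autocorrelation vector R[0..order].
    R = np.array([np.dot(s[:n - k], s[k:]) for k in range(order + 1)])
    # 2. Predictor coefficients a_k: solve the Toeplitz normal equations.
    T = np.array([[R[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(T, R[1:])
    # 3. Predicted next observation: weighted sum of the last `order` samples.
    return float(np.dot(a, s[:-order - 1:-1]))

# A noisy sinusoid: the predictor extrapolates the oscillation.
t = np.arange(200)
sig = np.sin(0.3 * t) + 0.05 * np.random.randn(200)
print(linear_predict(sig, order=4), np.sin(0.3 * 200))  # prediction vs. true next value
```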

16. Correlation [Figure: autocorrelation of the signal with no shift, shifted by one observation, by two observations, …, by n observations]

17. Use in Doppelgänger • Relevant news chosen and collated beforehand • Tailored to the length of time the user has available • Can determine when the user is expected to read email • Problems: confidence decreases as predictions advance into the future [Plots: inter-arrival time, session duration]

18. How can we predict interest? • Sports articles: 4 out of 10 ‘Likes’ • Technology articles: 9 out of 10 ‘Likes’

  19. News Topic Interest by Section

20. Beta Distribution • Describes uncertainty about a probability • Based on hits & misses • Normalized so the area under the curve = 1 • Yields a mean (the rating), a variance, and a confidence
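A minimal sketch of the hit/miss rating, assuming the common Beta(H+1, M+1) parameterization (a uniform prior; the exact priors used in Doppelgänger may differ):

```python
def beta_rating(hits: int, misses: int):
    """Mean rating and variance of a Beta(hits + 1, misses + 1) interest estimate."""
    a, b = hits + 1, misses + 1                   # +1 encodes a uniform prior (assumption)
    mean = a / (a + b)                            # the estimated interest (rating)
    var = a * b / ((a + b) ** 2 * (a + b + 1))    # shrinks as observations accumulate
    return mean, var

# From slide 18: technology 9 of 10 'Likes', sports 4 of 10.
print(beta_rating(9, 1))   # high mean, small variance: confidently interested
print(beta_rating(4, 6))   # lower mean
print(beta_rating(0, 0))   # no data: mean 0.5, maximal uncertainty
```

With few observations the distribution is wide (low confidence); as hits and misses accumulate, the variance shrinks even when the mean barely moves.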

21. Rating and Confidence [Plots: Beta densities for H=1, M=1; H=2, M=2; H=5, M=5; H=10, M=10] As observations increase, confidence (height) increases and variance (width) decreases. [Plots: H=5, M=25 vs. H=25, M=5] The rating skews relative to the hit/miss distribution.

22. Use in Doppelgänger • Measuring topical interest • Problems: • Equal weight on ratings over time • Binary classification of topics • Credit assignment when an article has multiple classifications • Binary yes/no feedback

  23. How can we keep track of location/state? • We can use Markov Models

24. Markov Models • Directed graph: set of states, initial probabilities, transition probabilities • For each discrete time step, the state advances • Stationary random process • Markov property: no memory of past states traversed [Diagram: four states (0-3) with labeled transition probabilities]
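A minimal sketch of stepping such a chain; the transition matrix below is hypothetical, since the slide's diagram did not survive transcription:

```python
import random

# Hypothetical transition matrix over four states 0..3; each row sums to 1.
P = [
    [0.0, 0.6, 0.3, 0.1],
    [0.5, 0.0, 0.4, 0.1],
    [0.2, 0.2, 0.2, 0.4],
    [0.9, 0.0, 0.1, 0.0],
]

def step(state: int) -> int:
    """Advance one discrete time step using the transition probabilities."""
    return random.choices(range(4), weights=P[state])[0]

# Simulate: by the Markov property, each step depends only on the
# current state, not on the path taken to reach it.
state, path = 0, [0]
for _ in range(10):
    state = step(state)
    path.append(state)
print(path)
```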

25. Modeling a Student [Table: probability transition matrix over the student states E, H, ST, T, SL]

26. Uses in Doppelgänger • Physical location tracking • Printing priority • Phone call routing • Pre-fetching content • Website page navigation [Map: Media Lab locations]

27. What if we cannot observe the underlying states? Can we infer state from observable output? • Yes: we can use “Hidden” Markov Models! • We can use this technique to infer behavior

28. Hidden Markov Models [Diagram: hidden states emitting output symbols with symbol emission probabilities]

  29. Extremely Useful Technique • Speech Recognition • Part of Speech Tagging • DNA Sequencing • Biological Particle Identification • Too many other areas to list!

30. Questions We Can Ask • What is the probability of a symbol sequence? → Forward algorithm (evaluation) • What is the most likely state sequence to have generated a symbol sequence? → Viterbi algorithm (decoding) • What transition/emission probabilities maximize the likelihood of a symbol sequence? → Baum-Welch algorithm (learning)

31. Forward Algorithm • Naively there is an exponential number of state sequences • How do we solve it in polynomial time? Dynamic programming: the forward algorithm [Trellis: states s1-s3 over output symbols x1-x4]
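A minimal sketch over a hypothetical 3-state, 4-symbol HMM (all probabilities illustrative). The vector alpha holds, for each state, the probability of having emitted the symbols so far and ending in that state, which is what collapses the exponential number of paths into one pass per symbol:

```python
import numpy as np

def forward(pi, A, B, obs):
    """P(symbol sequence) via dynamic programming (the forward algorithm).

    pi:  initial state probabilities, shape (S,)
    A:   transitions, A[i, j] = P(next state j | state i)
    B:   emissions, B[i, k] = P(symbol k | state i)
    obs: sequence of symbol indices
    """
    alpha = pi * B[:, obs[0]]          # prob. of the first symbol, per ending state
    for o in obs[1:]:
        # One matrix product sums over all predecessor states at once.
        alpha = (alpha @ A) * B[:, o]
    return float(alpha.sum())

pi = np.array([0.5, 0.3, 0.2])
A = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.5, 0.2],
              [0.2, 0.3, 0.5]])
B = np.array([[0.5, 0.2, 0.2, 0.1],
              [0.1, 0.4, 0.4, 0.1],
              [0.2, 0.2, 0.1, 0.5]])
print(forward(pi, A, B, [0, 1, 2, 3]))   # P(x1 x2 x3 x4)
```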

32. Viterbi Algorithm • Via dynamic programming (similar to Forward) • Instead of summing over all previous paths, only the maximum probability is stored • A backpointer is stored at each step for path reconstruction [Trellis: states s1-s3 over symbols x1-x4; most probable state sequence: s2, s1, s3, s2]
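A minimal Viterbi sketch: the same trellis with max in place of sum, plus backpointers (reusing the hypothetical HMM from the forward sketch):

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Most likely hidden state sequence for an observed symbol sequence."""
    S, T = len(pi), len(obs)
    delta = np.log(pi) + np.log(B[:, obs[0]])    # log-probs avoid underflow
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + np.log(A)      # scores[i, j]: best path into j via i
        back[t] = scores.argmax(axis=0)          # remember the best predecessor of j
        delta = scores.max(axis=0) + np.log(B[:, obs[t]])
    path = [int(delta.argmax())]                 # best final state
    for t in range(T - 1, 0, -1):                # follow backpointers in reverse
        path.append(int(back[t][path[-1]]))
    return path[::-1]

# Reusing the hypothetical HMM from the forward-algorithm sketch:
pi = np.array([0.5, 0.3, 0.2])
A = np.array([[0.7, 0.2, 0.1], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]])
B = np.array([[0.5, 0.2, 0.2, 0.1], [0.1, 0.4, 0.4, 0.1], [0.2, 0.2, 0.1, 0.5]])
print(viterbi(pi, A, B, [0, 1, 2, 3]))           # most probable hidden state sequence
```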

33. Use in Doppelgänger • Determine the “working” (i.e., psychological) state • Class of task being performed • More importantly, how much attention is demanded [Diagram: HMM with hidden states such as “Hacking” emitting observable output symbols]

34. What do we do if we do not have enough data about a particular user? • Substitute a small amount of information from many other users

35. Cluster Analysis • More computationally expensive than the previous tools • But doesn’t change as often • Useful when there is little or no information about a user • Based on correlations between users • Constructs communities by gathering a few bits from many people • Similar to popular “collaborative filtering” techniques

  36. K-Means Clustering
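Since the slide's figure did not survive, here is a minimal sketch of Lloyd's k-means over hypothetical topic-interest vectors, the kind of clustering that lets a sparse user borrow predictions from their community:

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Lloyd's algorithm: alternate assigning users and moving centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # random initial centers
    for _ in range(iters):
        # Assign each user to the nearest centroid (Euclidean distance).
        labels = np.linalg.norm(X[:, None] - centroids[None], axis=2).argmin(axis=1)
        # Move each centroid to the mean of its community.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

# Rows are users; columns are interest in (sports, technology), as on slide 18.
X = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8], [0.15, 0.85]])
labels, centroids = kmeans(X, k=2)
print(labels)      # which community each user belongs to
print(centroids)   # each community's average interest profile
```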

37. Prediction Toolbox • Linear prediction → events • Beta distribution → interest • Markov model → location • Hidden Markov model → behavior • Cluster analysis → when all else fails

38. Just how predictable are we? • Netflix Prize (2006): improve the recommendation algorithm (Cinematch) by 10% for $1,000,000 • Winner: BellKor’s Pragmatic Chaos • Solution: independent convergence, fusing 107 independent algorithmic predictions

  39. The ‘Napoleon Dynamite’ Effect “Human beings are very quirky and individualistic, and wonderfully idiosyncratic. And while I love that about human beings, it makes it hard to figure out what they like.” - Reed Hastings, CEO of Netflix

40. Criticisms of Primary Article • Where is the empirical evaluation of the techniques? • vs. other techniques? • vs. other cheap or expensive approaches? • vs. non-adaptive systems? • Concessions: • Orwant’s motivation was to galvanize cheap user modeling techniques • The techniques have been validated in other realms and in industry

41. References • Orwant, J. (1996). For want of a bit the user was lost: Cheap user modeling. IBM Systems Journal, 35(3,4), 398-416. • Makhoul, J. (1975). Linear prediction: A tutorial review. Proceedings of the IEEE, 63(4), 561-580. • Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2), 257-286. • Singh, V., Marinescu, D. C., & Baker, T. S. (2004). Image segmentation for automatic particle identification in electron micrographs based on hidden Markov random field models and expectation maximization. Journal of Structural Biology, 145, 123-141. • Many other references not shown here • If interested, email me at frankHines@knights.ucf.edu

  42. Jon Orwant • Ph.D. • C.T.O. • Engineering Mgr.

  43. Sharing Standards & Privacy • Protocol development • User Markup Language • Passive sensors as an invasion of privacy • Informed consent • Access to personal data • Accessor keywords • Access Control Lists
