Curious Characters: From Virtual Worlds to Sentient Homes

Dr. Kathryn Merrick
University of New South Wales, Australian Defence Force Academy
School of Engineering and Information Technology
k.merrick@adfa.edu.au
November 2009
Machine Learning and Developmental Robotics Research Group

Overview • Motivated reinforcement learning • Curious characters in multiuser games • Curious robots • Motivated supervised learning • Curious places • Motivated reflex agents • Curious network security agents • Future directions

Curious Characters in Multiuser Games with Prof. Mary Lou Maher, National Science Foundation
World of Warcraft (Blizzard)

Motivation Theories
Alderfer, C., 1972. Existence, relatedness and growth. Free Press, New York.
Atkinson, J.W. and Feather, N.T., 1966. A theory of achievement motivation. Wiley, New York.
Berlyne, D.E., 1966. Exploration and curiosity. Science, 153: 25-33.
Berlyne, D.E., 1970. Novelty, complexity and hedonic value. Perception and Psychophysics, 8: 279-286.
Bindra, D., 1974. A unified account of classical conditioning and operant training. In: A. Black and W. Prokasy (Editors), Classical conditioning - current theory and research. Appleton-Century-Crofts, New York, USA, pp. 453-481.
Csikszentmihalyi, M., 1996. Creativity: Flow and the Psychology of Discovery and Invention. HarperCollins Publisher, New York, NY.
Deci, E. and Ryan, R., 1985. Intrinsic motivation and self-determination in human behaviour. Plenum Press, New York.
Easterbrook, J.A., 1959. The effect of emotion on cue utilisation and the organisation of behaviour. Psychological Review, 66: 183-201.
Geen, R.G., Beatty, W.W. and Arkin, R.M., 1984. Human motivation: physiological, behavioural and social approaches. Allyn and Bacon, Inc, Massachusetts.
Heider, F., 1958. The psychology of interpersonal relations. Wiley, New York. [Attribution theory]
Hull, C.L., 1952. A behaviour system: an introduction to behaviour theory concerning the individual organism. Yale University Press, New Haven. [Drive theory]
Hunt, J.M., 1975. Implications of sequential order and hierarchy in early psychological development. Exceptional Infant, 3.
Kandel, E.R., Schwartz, J.H. and Jessell, T.M., 1995. Essentials of neural science and behaviour. Appleton and Lange, Norwalk.
Maslow, A., 1954. Motivation and personality. Harper, New York. [Maslow’s hierarchy of needs]
McFarland, D., 1995. Animal behavior. Longman, England. [Motivational state theory]
Mook, D.G., 1987. Motivation: the organisation of action. W. W. Norton and Company, Inc, New York.
Raynor, J.O., 1969. Future orientation and motivation of immediate activity: an elaboration of the theory of achievement motivation. Psychological Review, 76: 606-610.
Sperber, D. and Wilson, D., 1995. Relevance: communication and cognition. Blackwell Publishing.
Tolman, E.C., 1932. Purposive behaviour in animals and men. Century, New York.
White, R.W., 1959. Motivation reconsidered: The concept of competence. Psychological Review, 66: 297-333.
Wundt, W., 1910. Principles of physiological psychology. Macmillan, New York.

“Problem Finding” plus Problem Solving (Saunders, 2001)

Motivated Reinforcement Learning • Learning from trial-and-error and intrinsic reward • Structures: • S(t) – Raw sensor data • O(t) – Observation of state • E(t) – Event (change) • M(t) – Motivated reward value • B(t) – Learned policy • A(t) – Action to execute
[Diagram: AGENT pipeline Sensors → S(t) → Sensation → O(t), E(t) → Motivation → M(t) → Learning → B(t) → Activation → A(t) → Effectors, acting on the ENVIRONMENT]
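The structures listed on this slide can be sketched as a small learning loop. The following is only an illustrative sketch, not the model used in the talk: it assumes tabular Q-learning as the learner, a count-based novelty reward as the motivation function, and a hypothetical one-dimensional toy world. All class and function names are invented for the example.

```python
import random
from collections import defaultdict


def sense(raw):
    """Sensation: turn raw sensor data S(t) into a hashable observation O(t)."""
    return tuple(raw)


def event(prev_obs, obs):
    """Event E(t): the change between two successive observations."""
    return tuple(b - a for a, b in zip(prev_obs, obs))


class NoveltyMotivation:
    """Motivation M(t). Assumed form: rarely seen events earn more reward."""

    def __init__(self):
        self.counts = defaultdict(int)

    def reward(self, e):
        self.counts[e] += 1
        return 1.0 / self.counts[e]


class QLearner:
    """Learning B(t): tabular Q-learning driven by the intrinsic reward."""

    def __init__(self, actions, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
        self.q = defaultdict(float)
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def act(self, obs):
        """Activation A(t): epsilon-greedy choice from the learned policy."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(obs, a)])

    def update(self, obs, action, reward, next_obs):
        best_next = max(self.q[(next_obs, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(obs, action)] += self.alpha * (target - self.q[(obs, action)])


def toy_environment(position, action):
    """Hypothetical 1-D world: the position is clipped to the range 0..4."""
    return max(0, min(4, position + action))


def run(steps=200):
    learner = QLearner(actions=[-1, 0, 1])
    motivation = NoveltyMotivation()
    position = 2
    obs = sense([position])
    total_reward = 0.0
    for _ in range(steps):
        action = learner.act(obs)
        position = toy_environment(position, action)
        next_obs = sense([position])
        m = motivation.reward(event(obs, next_obs))  # intrinsic reward only
        learner.update(obs, action, m, next_obs)
        obs = next_obs
        total_reward += m
    return learner, total_reward
```

Note that the environment supplies no reward of its own here; the only learning signal is the motivation value computed from events.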

Motivation as Curiosity SOM, K-means, SART network, etc. C(t) = (Wundt, 1910; Berlyne, 1960; 1966; Stanley, 1976; Schmidhuber, 1991; Marsland et al., 2000; Saunders, 2001)
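The expression for C(t) is not preserved in this transcript. As a hedged illustration only, one common way to turn a novelty score into a curiosity value in the spirit of the Wundt curve is to subtract a "punishment" sigmoid from a "reward" sigmoid, so that moderate novelty scores highest. The function names, midpoints and slope below are assumptions chosen for the example, not values from the talk; the novelty input is assumed to come from a clusterer such as a SOM or K-means and to lie in [0, 1].

```python
import math


def sigmoid(x, midpoint, slope):
    """Logistic curve centred on `midpoint`."""
    return 1.0 / (1.0 + math.exp(-slope * (x - midpoint)))


def wundt_curiosity(novelty, reward_mid=0.3, punish_mid=0.7, slope=15.0):
    """Assumed curiosity model: reward for novelty minus punishment for
    too much novelty. Peaks for moderately novel stimuli."""
    reward = sigmoid(novelty, reward_mid, slope)
    punishment = sigmoid(novelty, punish_mid, slope)
    return reward - punishment


if __name__ == "__main__":
    for n in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"novelty={n:.2f} curiosity={wundt_curiosity(n):.3f}")
```

With these parameters, very familiar and very unfamiliar stimuli both score close to zero, while stimuli of intermediate novelty score close to one.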

A Game Using MRL

Statistical Evaluation • Identifying learned tasks K as repeated events or observations • Behavioural variety: • Number of learned tasks • Behavioural complexity • Number of actions to complete task cv(K) =
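The definition of cv(K) is not preserved in this transcript. The sketch below assumes it is a coefficient of variation (standard deviation divided by mean) of the number of actions between repetitions of an event, and treats an event as a learned task when it repeats often enough with a low coefficient of variation. The thresholds and function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def learned_tasks(event_log, min_repeats=3, cv_threshold=0.2):
    """Return {event: mean actions per repetition} for events that repeat
    regularly. `event_log` holds one event per executed action."""
    times = defaultdict(list)
    for t, e in enumerate(event_log):
        times[e].append(t)

    tasks = {}
    for e, ts in times.items():
        gaps = [b - a for a, b in zip(ts, ts[1:])]
        if len(gaps) < min_repeats:
            continue  # not repeated often enough to count as learned
        cv = pstdev(gaps) / mean(gaps)  # assumed meaning of cv(K)
        if cv <= cv_threshold:
            tasks[e] = mean(gaps)
    return tasks


def behavioural_variety(tasks):
    """Number of learned tasks."""
    return len(tasks)


def behavioural_complexity(tasks):
    """Mean number of actions needed to complete a learned task."""
    return mean(list(tasks.values())) if tasks else 0.0
```

For example, a log that cycles through three events five times yields three learned tasks, each taking three actions to complete, while an event seen only once is ignored.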

Curious Reconfigurable Robots with A/Prof. Elanor Huntington, Tom Scully, UNSW@ADFA • Function approximation and MRL • Neural networks • Adaptive resonance theory networks • Modelling behaviour cycles for: • Motivation • Evaluation • A toy that ‘comes alive’ as it is being constructed

Motivation and Behaviour Cycles (Ahlgren and Halberg, 1990) • Biological • Cognitive • Social Kolb learning cycle (Marsland et al., 2000) Socio-demographic cycles

Evaluation and Visualisation S(226) = (tacho:139.0, mov:100.0, red:0.0, green:80.0, blue:0.0) Merrick, K.: (2009) Evaluating Intrinsically Motivated Robots using Affordances and Point-Cloud Matrices, The Ninth International Conference on Epigenetic Robotics (EpiRob09), Venice, Italy, pp 105-112

Numerical Evaluation • Behavioural • Variety • Complexity • Stability Merrick, K.: (2010) Modeling Behavior Cycles as a Value System for Developmental Robots, Adaptive Behavior (to appear)

Curious Places with Prof. Mary Lou Maher and Dr. Rob Saunders, University of Sydney • Intelligent environments that adapt to support and enhance human activities by being curious about, and learning about, those activities • Consider the space as an immobile robot

Motivated Supervised Learning • Learning intrinsically motivated tasks by mimicking • Structures: • S(t) – Raw sensor data • O(t) – Observation of state • E(t) – Event (change) • X(t) – Example (state+action) • B(t) – Learned policy • A(t) – Action to execute
[Diagram: AGENT in its ENVIRONMENT, with Sensors, Sensation, Motivation, Learning, Activation and Effectors connected by S(t), O(t), X(t), E(t), B(t) and A(t)]
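Learning by mimicking can be sketched as follows. This is a minimal illustration under stated assumptions, not the agent model from the talk: the motivation gate (`is_interesting`) simply accepts any example that involves a change, the policy is a majority vote over stored examples, and the observation layout (room occupied, light on) is hypothetical.

```python
from collections import Counter, defaultdict


def is_interesting(event):
    """Hypothetical motivation gate: keep examples that involve a change."""
    return any(delta != 0 for delta in event)


class MimicAgent:
    """Stores examples X(t) = (state, action) and replays the majority action."""

    def __init__(self):
        self.policy = defaultdict(Counter)  # B(t): state -> action votes

    def watch(self, prev_obs, obs, human_action):
        """Observe a human acting; learn the example only if it is interesting."""
        event = tuple(b - a for a, b in zip(prev_obs, obs))  # E(t)
        if is_interesting(event):
            self.policy[prev_obs][human_action] += 1

    def act(self, obs):
        """A(t): the most frequently demonstrated action, or None if unknown."""
        votes = self.policy.get(obs)
        return votes.most_common(1)[0][0] if votes else None


if __name__ == "__main__":
    agent = MimicAgent()
    # Observation layout (assumed): (room_occupied, light_on)
    agent.watch((1, 0), (1, 1), "light_on")  # human turns the light on
    agent.watch((0, 0), (0, 0), "none")      # nothing changed: not learned
    print(agent.act((1, 0)))
    print(agent.act((0, 0)))
```

The design choice worth noting is that motivation filters what is learned, rather than supplying a reward signal as in the reinforcement learning variant.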

Evaluation and Case Studies [Figure labels: Mon, Tues, Wed, Thur, Fri] Merrick, K., Shafi, K.: (2009) Agent Models for Self-Motivated Home Assistant Bots, International Symposium on Computational Models for Life Sciences, Sofia, Bulgaria [invited paper] (to appear).

Curious Network Security Agents with Dr. Kamran Shafi, UNSW@ADFA • Curious agents combine three measures to analyse stimuli (network data): • Similarity: clustering layer • Recency: habituating layer • Frequency: interest layer • Online, single-pass learners: • Potential for real-time operation • Unsupervised learners: • Potential to adapt to changes in network usage • Don’t require labelled data
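The three layers can be sketched as a single online, single-pass, unsupervised detector. Everything specific here is an assumption made for illustration: the similarity layer is a threshold-based nearest-centroid clusterer, the habituating layer halves a cluster's novelty each time it fires and lets other clusters recover slowly, and the interest layer scales novelty by how rare the cluster has been so far. The class name and parameters are hypothetical.

```python
import math


class CuriousDetector:
    """Assumed three-layer curious agent for numeric feature vectors."""

    def __init__(self, radius=1.0, decay=0.5, recovery=0.05):
        self.radius = radius        # max distance to join an existing cluster
        self.decay = decay          # habituation factor when a cluster fires
        self.recovery = recovery    # novelty regained by idle clusters
        self.centroids = []
        self.counts = []
        self.novelty = []
        self.seen = 0

    def observe(self, x):
        """Process one stimulus and return its interest score in [0, 1]."""
        self.seen += 1

        # Similarity: find the nearest cluster, or create a new one.
        best, best_dist = None, float("inf")
        for i, c in enumerate(self.centroids):
            d = math.dist(c, x)
            if d < best_dist:
                best, best_dist = i, d
        if best is None or best_dist > self.radius:
            self.centroids.append(list(x))
            self.counts.append(0)
            self.novelty.append(1.0)
            best = len(self.centroids) - 1
        self.counts[best] += 1
        n = self.counts[best]
        self.centroids[best] = [
            c + (xi - c) / n for c, xi in zip(self.centroids[best], x)
        ]

        # Recency: the winning cluster habituates, the others recover.
        stimulus_novelty = self.novelty[best]
        for i in range(len(self.novelty)):
            if i == best:
                self.novelty[i] *= self.decay
            else:
                self.novelty[i] = min(1.0, self.novelty[i] + self.recovery)

        # Frequency: clusters that made up most past traffic are less interesting.
        rarity = 1.0 - (self.counts[best] - 1) / self.seen
        return stimulus_novelty * rarity
```

Under these assumptions, a stream of near-identical records quickly stops being interesting, while the first record from an unseen region of feature space scores the maximum interest of 1.0.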

Motivated Reflex Agents • Triggering intrinsically motivated reflexes • Structures: • S(t) – Raw sensor data • O(t) – Observation of state • E(t) – Event (change) • M(t) – Motivation value • A(t) – Action to execute
[Diagram: AGENT pipeline Sensors → S(t) → Sensation → O(t), E(t) → Motivation → M(t) → Activation → A(t) → Effectors, acting on the ENVIRONMENT]
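A motivated reflex agent has no learned policy: the motivation value triggers a fixed response directly. The sketch below is illustrative only. It assumes a count-based novelty measure for M(t), a fixed threshold, and two hypothetical reflex actions named "approach" and "ignore".

```python
from collections import defaultdict


class MotivatedReflexAgent:
    """Maps sensor data straight to a reflex via a motivation value."""

    def __init__(self, threshold=0.6):
        self.threshold = threshold
        self.event_counts = defaultdict(int)
        self.prev_obs = None

    def motivation(self, event):
        """M(t). Assumed form: events seen less often are more motivating."""
        self.event_counts[event] += 1
        return 1.0 / self.event_counts[event]

    def step(self, raw):
        """One sense-act cycle: S(t) -> O(t), E(t) -> M(t) -> A(t)."""
        obs = tuple(raw)                       # O(t)
        if self.prev_obs is None:
            self.prev_obs = obs                # first reading: no change yet
        event = tuple(b - a for a, b in zip(self.prev_obs, obs))  # E(t)
        self.prev_obs = obs
        m = self.motivation(event)             # M(t)
        return "approach" if m >= self.threshold else "ignore"   # A(t)
```

A usage example: feeding the agent an unchanging reading twice yields "approach" and then "ignore", and a sudden change in the reading yields "approach" again.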

Domain Specific Evaluation

A Curious Tour Guide Robot with Dayne Schmidt, UNSW@ADFA • Identifies interesting ‘artworks’ • Reflexively moves towards them while avoiding obstacles Machine Learning and Developmental Robotics Lab, UNSW@ADFA (Saunders, 2001)

Future Directions • Modelling motivation • Other individual models: biological, cognitive, social… • Unified models • Learning models for use with motivation • Recall and reuse: hierarchical models • Other forms of learning, planning… • Combined models • Evaluating intrinsically motivated behaviour

Questions and Discussion Dr. Kathryn Merrick k.merrick@adfa.edu.au http://www.itee.adfa.edu.au/~s3229187 Lecturer in Information Systems University of New South Wales Australian Defence Force Academy School of Engineering and Information Technology
