
Artificial Companions: Explorations in machine personality and dialogue



Presentation Transcript


  1. Artificial Companions: Explorations in machine personality and dialogue Yorick Wilks Computer Science, University of Sheffield and Oxford Internet Institute MLMI04, Martigny CH, June 2004

  2. What the talk contains: • Two natural language technologies I work within: • Human dialogue modelling • Information extraction from the web • What drives NLP dialogue models: ML, speech? • Conversational agents as essential for • personalizing the web • making it tractable • Companions for the non-technical as a cosier kind of persistent agent • For niche groups, some of them non-technical, or handicapped • As an interface to the web • As an interface to their stored lives

  3. Machine dialogue: problems with available “theory” • Dialogue is the Cinderella of NLP • It can be vacuous: ‘dialogues are systems of turn-taking’ • Speech act analysis initially led to implausibly deep levels of reasoning--you don’t need plans to sell an air ticket. • For some researchers, dialogue theory is still a question of how best to deploy logic • Much conversation is not task orientated at all, nor does it have plausible info-states.

  4. Important historical systems have all the modern traits and functionalities in miniature • Colby’s PARRY (Stanford, 1971) • Winograd’s SHRDLU (MIT 1971) • Perrault, Cohen, Allen’s speech act system (Toronto, 1979)

  5. Colby’s PARRY • Perhaps the best performance ever: many users, robust, but not a normal subject (i.e. a paranoid) • Primitive individual models, some control of dialogue process; but it had lots to say! • Primitive simulation of intentionality, and emotion in output choice • Not syntax analysis but fast pattern matching • Far, far better than ELIZA

  6. PARRY conversation • Have you been hospitalized before? • THIS IS THE FIRST TIME • How long have you been there? • ABOUT TWO WEEKS • Any headaches? • MY HEALTH IS FINE • Are you having memory difficulties? • JUST A FEW
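Slide 5’s “fast pattern matching” can be illustrated with a minimal sketch. This is purely hypothetical code, not Colby’s actual rules: keyword patterns are tried in order against the raw input, each mapped to a canned reply, with a fallback that keeps the system robust when nothing matches (the robustness PARRY was noted for).

```python
# Minimal PARRY-style responder sketch (illustrative, not Colby's code):
# no syntactic analysis, only shallow keyword patterns mapped to
# canned replies, mirroring the conversation on slide 6.
import re

# (pattern, reply) pairs, tried in order; first match wins.
PATTERNS = [
    (r"\bhospital", "THIS IS THE FIRST TIME"),
    (r"\bhow long\b", "ABOUT TWO WEEKS"),
    (r"\bheadache", "MY HEALTH IS FINE"),
    (r"\bmemory\b", "JUST A FEW"),
]

DEFAULT_REPLY = "WHY DO YOU ASK?"  # fallback: always have something to say

def respond(utterance: str) -> str:
    """Return the first canned reply whose pattern matches the input."""
    text = utterance.lower()
    for pattern, reply in PATTERNS:
        if re.search(pattern, text):
            return reply
    return DEFAULT_REPLY
```

The fallback is the crucial design point: unlike the reasoning systems of slide 8, a pattern matcher never fails to answer, which is why only this family of systems “performed at all”.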

  7. Does the little block that the hatched pyramid’s support supports support anything black?

  8. Perrault, Cohen & Allen at Toronto • System has domain knowledge and reasoning power • was the first to assign speech act labels to dialogue items • User must have one of two goals, meeting or catching a train • Passenger/User: Do you know when the Windsor train arrives? • This is labelled as a REQUEST not a REQUEST-INFORM (Y/N) because the system knows the user knows it knows! • but again virtually no performance

  9. 1970s division of approaches to machine conversation • Domain-dependent systems with coded world knowledge and some parsing and reasoning • VERSUS • Wide shallow systems with little knowledge and high performance • Published AI academic systems all in the first group • Only the second group (i.e. PARRY) performed at all

  10. Academic systems have moved towards performance • Best is Traum’s TRAINS system--descendant of Allen’s work (Toronto-Rochester tradition) • Semi-empiricist: uses corpora but retains reasoning • Also TRINDIKIT at Gothenburg/Edinburgh; uses a large rule base • Pressure from ASR advances • Pressure from the Loebner competition to perform?

  11. AI people joined the performers: the Loebner Competition • Well-designed competition over the last ten years---essentially a modified Turing test. • Domain-dependent entries till 1998, then domain free. • The game is for journalists to separate and rank human and machine partners • Search ‘Loebner’ on the web for full transcripts

  12. Sheffield University/Intelligent Systems won in 1998! • The CONVERSE program pretended to be Catherine, a 34-year-old British journalist living in New York (to explain any usage problems). • She could talk about 80 appropriate topics • (80 = domain free???) clothes, shopping, food, abortion, fitness, sex… etc.

  13. Loebner Competition 1998---Sheffield’s winning entry!
  PROGRAM[10:18:04] Hi, my name is Catherine, what’s yours?
  JUDGE04[10:18:32] My name is Elliott.
  PROGRAM[10:18:34] Well, it’s nice to meet you, Elliott.
  PROGRAM[10:18:48] Did you see that story on CNN last night about the lesbian
  PROGRAM[10:19:04] couple who came out at a White House party on Sunday?
  JUDGE04[10:19:55] Yes, I did. I think it may all be a publicity stunt for Ellen.
  PROGRAM[10:20:01] Ellen Degeneres was one of them – she was kissing her lover
  PROGRAM[10:20:11] in front of Bill Clinton. Do you think he was very
  PROGRAM[10:20:25] embarrassed?
  JUDGE04[10:21:00] I think he want the gay vote. He will do anything to get it and keep it.
  JUDGE04[10:21:15] He does not have a conservative base.
  PROGRAM[10:21:21] Are you still a Friend of Bill Clinton?

  14. The CONVERSE prototype 1998 • Push-me-pull-you architecture • strong driving top-down scripts (80+) in a re-enterable network with complex output functions • bottom-up parsing of user input adapted from statistical prose parser • minimal models of individuals • contained Wordnet and Collins Proper Names • some machine learning from past Loebners + BNC • It owed more to PARRY than to Toronto!

  15. Sheffield dialogue circa 2002 • Empirical corpus-based stochastic dialogue “grammar” that maps utterances directly to dialogue acts and uses IE to match concepts with templates to provide semantic content. • A better virtual machine for script-like (DAF) objects encapsulating both the domain moves and conversational strategy (cf. PARRY and Grosz) to maintain the push-pull (alias mixed-initiative) approach. • The Dialogue Action Frames provide domain context, and the stack provides topic change and re-access to partially fulfilled DAFs

  16. Resources vs. highest-level structure • Need for resources to build belief-system representations and quasi-linguistic models of dialogue structure, scripts etc., and to provide a base for learning optimal Dialogue Act assignments • A model of speakers, incrementally reaching VIEWGEN-style ascription-of-belief procedures to give dialogue act & reasoning functionality • Cf. A. Ballim & Y. Wilks (1991), Artificial Believers, Erlbaum.

  17. How this research is funded • AMITIES is an EU-US cooperative R & D project (2001-2005) to automate call centers. • University of Sheffield (EU prime) • SUNY Albany (US prime) • Duke U. (US) • LIMSI Paris (Fr) • IBM (US) • COMIC is an EU R & D project (2001-2005) to model multimodal dialogue • Max Planck Inst (Nijmegen) (Coordinator) • University of Edinburgh • Max Planck Inst (Tuebingen) • KUL Nijmegen • University of Sheffield • ViSoft GmbH

  18. COMIC • Three-year project • Focussed on Multi Modal Dialogue • Speech and pen input/output • Bathroom Design Application • Helps the customer to make bathroom design decisions • Will be based on existing bathroom design software • Spoken output is done with a talking head which includes facial expressions etc.

  19. Design of a Dialogue Action Manager • General-purpose DAM where domain-dependent features are separated from the control structure. • The domain-dependent features are stored as Dialogue Action Frames (DAFs), which are similar to Augmented Transition Networks (ATNs) • The DAFs represent general-purpose dialogue manoeuvres as well as application-specific knowledge. • The control mechanism is based on a basic stack structure where DAFs are pushed and popped during the course of a user session. • The control mechanism together with the DAFs provides a flexible means for guiding the user through the system goals (allowing for topic change and barge-in where needed). • “User push” is given by the ability to suspend and stack a new DAF at any point (for a topic or any user manoeuvre) • “System push” is given by the pre-stacked DAFs corresponding to what the system wants to show or elicit. • Research question of how many of the stack’s unpopped DAFs can/should be re-accessed (cf. Grosz limits on reachability).

  20. Dialogue Management • DAFs model the individual topics and conversational manoeuvres in the application domain. • The stack structure will be preloaded with those DAFs which are necessary for the COMIC bathroom design task, and the dialogue ends when the Goodbye DAF is popped. • DAFs and stack interpreters together control the flow of the dialogue • (Stack contents: Greeting DAF, Room measurement DAF, Style DAF, …, Good-bye DAF)
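The push/pop control of slides 19-20 can be sketched as follows. This is a minimal illustration under assumptions, not the COMIC code; the class and frame names are invented. Task DAFs are preloaded (“system push”), a user topic change suspends the current DAF by pushing a new one on top (“user push”), and the dialogue ends when the Goodbye DAF is popped.

```python
# Sketch of a stack-based Dialogue Action Manager (illustrative only).

class DAF:
    """A Dialogue Action Frame: one topic or conversational manoeuvre."""
    def __init__(self, name: str):
        self.name = name

class DialogueManager:
    def __init__(self, preloaded):
        # "System push": preload the task DAFs in order; the last of
        # them (Goodbye) sits at the bottom and is popped last.
        self.stack = list(reversed(preloaded))
        self.finished = False

    def current(self):
        """Name of the DAF currently in control, or None."""
        return self.stack[-1].name if self.stack else None

    def push_topic(self, name: str):
        # "User push": suspend the current DAF under a new user topic.
        self.stack.append(DAF(name))

    def pop_topic(self):
        """Pop the finished DAF; the dialogue ends on Goodbye."""
        daf = self.stack.pop()
        if daf.name == "Goodbye":
            self.finished = True
        return daf.name
```

The open research question on slide 19 corresponds to what a real interpreter would add here: a policy for when a suspended (unpopped) DAF lower in the stack may be re-accessed rather than simply resumed in strict LIFO order.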

  21. DAF example

  22. Current work: Learning to segment the dialogue corpora • Segmenting the corpora we have with a range of tiling-style and MDL algorithms (by topic and by strategic manoeuvre) • To segment it plausibly, hopefully into segments that correspond to structures for DM (i.e. Dialogue Action Frames) • Being done on the annotated corpus (i.e. a corpus word model) and on the corpus annotated by Information Extraction semantic tags (a semantic model of the corpus)

  23. AMITIÉS Objectives • Call Center/Customer Access Automation • multilingual access to customer information and services. • Now: speech over the telephone (call centers); • Later: speech, text and pointing over the Internet (e-service) • Multilingual natural language dialogue • unscripted, spontaneous conversation • models derived from real call center data • tested and verified in real call center environment • Showcase applications at real call centers • financial services centers (English, French, German) • expand into public service & gov. applications (US & EC)

  24. Corpora • GE Financial call centres • 1k English calls (transcribed, annotated) • 1k French calls (transcribed, annotated) • IBM software support call centre • 5k English calls (transcribed) • 5k French calls (transcribed) • AGF insurance claim call centre • 5k French calls (recording) • VIEL et CIE • 100 French calls (transcribed, annotated)

  25. AMITIÉS System • Data-driven dialogue strategy • Similar to Colorado’s Communicator system • Statistical – a dialogue transition graph is derived from a large body of transcribed, annotated conversations • Task and ID identification • Task identification: automatically trained vector-based approach (Chu-Carroll & Carpenter 1999)

  26. Sheffield does the post ASR fusion in AMITIES • Language Understanding • Use of ANNIE IE for robust extraction • Partial matching (creates list of possible entities) • Dialogue Act Classifier • Recognise domain-independent dialogue acts • Works well (~86% accuracy) for subset of Dialogue Act labels

  27. Evaluation • 10 native speakers of English • Each made 9 calls to the system, following scenarios they were given • Overall call success was 70% • Compare this to Communicator scores of 56% • Similar number of concepts/scenario (~9) • Word Error Rates: • 17% for successful calls • 22% for failed calls

  28. Evaluation: Interesting Numbers • Avg. num turns/dialogue: 18.28 • Avg. num words/user turn: 6.89 • High in comparison to Communicator scores, reflecting • Lengthier responses to open prompts • Responses to requests for multiple attributes • Greater user initiative • Avg. user satisfaction score: 20.45 • (range 5-25)

  29. Learning to tag for Dialogue Acts: initial work • Samuel et al. (1998): TBL learning on n-gram DA cues, Verbmobil corpus (75%) • Stolcke et al. (2000): full language modelling (including DA sequences), more complex Switchboard corpus (71%)

  30. Starting with a naive classifier for DAs • Direct predictivity of DAs by n-grams as a preprocess to any ML algorithm. • Get P(d|n) for all 1-4 word n-grams and the DA set over the Switchboard corpus, and take the DA indicated by the n-gram with highest predictivity (threshold for probability levels) • Do 10-fold cross-validation (which lowers scores) • Gives a best cross-validated score of around 63% over Switchboard, but using only some of the data Stolcke needed. • Single highest score currently 71.2% - higher than that reported in Stolcke • Up to 86% with a small (~6) DA set
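The naive classifier of slide 30 can be sketched as: estimate P(d|n) for every 1-4 word n-gram in a labelled corpus, then tag a new utterance with the dialogue act indicated by its most predictive n-gram. The tiny corpus and act labels in the test are invented for illustration, not Switchboard, and this sketch omits the probability thresholding and cross-validation the slide mentions.

```python
# Toy sketch of the naive n-gram dialogue-act classifier (illustrative).
from collections import Counter, defaultdict

def ngrams(words, max_n=4):
    """Yield all 1..max_n word n-grams of a token list."""
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            yield tuple(words[i:i + n])

def train(corpus):
    """corpus: list of (utterance, dialogue_act) pairs.
    Returns a table: n-gram -> (argmax_d P(d|n-gram), that probability)."""
    counts = defaultdict(Counter)          # n-gram -> Counter over acts
    for utterance, act in corpus:
        for gram in ngrams(utterance.lower().split()):
            counts[gram][act] += 1
    table = {}
    for gram, acts in counts.items():
        act, freq = acts.most_common(1)[0]
        table[gram] = (act, freq / sum(acts.values()))
    return table

def classify(table, utterance, default="statement"):
    """Tag with the act of the single most predictive n-gram seen."""
    best_act, best_p = default, 0.0
    for gram in ngrams(utterance.lower().split()):
        if gram in table:
            act, p = table[gram]
            if p > best_p:
                best_act, best_p = act, p
    return best_act
```

The point of the sketch is how cheap the baseline is: a single lookup table already captures most of what heavier language-model approaches buy, which is why the slide treats it as a preprocess for any further ML algorithm.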

  31. Extending the pretagging with TBL • Gives 66% (Stolcke’s 71%) over the Switchboard data, but only 3% is due to TBL (rather than the naive classifier). • Samuel was unable to see what TBL was doing for him. • This is just a base for a range of more complex ML algorithms (e.g. WEKA).

  32. Dialogue Research Challenges • Will a Dialogue manager raise the DA 75%/85% ceiling top-down? • Multimodal dialogue managers. Are they completely independent of the modality? Are they really language independent? • What is the best virtual machine for running a dialogue engine? Do DAFs+stack provide a robust and efficient mechanism for doing Dialogue Management e.g. topic change? (vs. simple rule systems) • Will they offer any interesting discoveries on stack access to, and discarding, incomplete topics (cf. Stacks and syntax). • Applying machine learning to transcripts so as to determine the content of dialogue management, i.e. the scope and content of candidate DAFs. • Can the state set of DAFs and a stack be trained with reinforcement learning (like a Finite State matrix)? • Can we add a strong belief/planning component to this and populate it empirically? • Fusion with QA functionality

  33. What is the most structure that might be needed and how much of it can be learned? • Steve Young (Cambridge) says learn all modules, with no need for rich a priori structures (cf. MT history and Jelinek at IBM) • Availability of data (dialogue is unlike MT)? • Learning to partition the data into structures. • Learning the semantic + speech act interpretation of inputs alone has now reached a (low) ceiling (75%/85%).

  34. Young’s strategy not quite like Jelinek’s MT strategy of 1989! • Which was non/anti-linguistic, with no intermediate representations hypothesised • Young assumes roughly the same intermediate objects as we do, but in very simplified forms. • The aim is to obtain training data for all of them so the whole process becomes a single partially observable Markov model. • It remains unclear how to train complex state models that may not represent tasks, let alone belief and intention models.

  35. There are now four, not two, competing approaches to machine dialogue in NLP: • Logic-based systems with reasoning (traditional, and still unvalidated by performance) • Extensions of speech engineering methods: machine learning and no structure (new) • Simple hand-coded finite-state systems in VoiceXML (chatbots and commercial systems) • Rational hybrids based on structure and machine learning.

  36. Modes of dialogue with machine agents • Current mode of phone/multimodal interactions at terminals. • The internet (possibly becoming the semantic web) will be for machine agents that understand its content, and with which users dialogue: e.g. ‘Find me the best camera under £500.’ • Interaction with mobile phone agents (more or less monomodal) • Some or all of these services as part of the function of persistent, more personal, cosy, lifelong Companion agents.

  37. The Companions: a new economic and social goal for dialogue systems

  38. An idea for integrating the dialogue research agenda in a new style of application... • That meets social and economic needs • That is not simply a product, but something everyone will want if it succeeds • That cannot be done now but could be in a few years, by a series of staged prototypes • That modularises easily for large project management, and whose modules cover the research issues. • Whose speech and language technology components are now basically available

  39. A series of intelligent and sociable COMPANIONS • The SeniorCompanion • The EU will have more and more old people who find technological life hard to handle, but who will have access to funds • The SC will sit beside you on the sofa but be easy to carry about--like a furry handbag--not a robot • It will explain the plots of TV programs and help choose them for you • It will know you and what you like and don’t • It will send your messages, make calls and summon emergency help • It will debrief your life.

  40. Other COMPANIONS • The JuniorCompanion • Teaches and advises, maybe from a backpack • Warns of dangerous situations • Helps with homework and web search • Helps with languages • Always knows where the child is • Explains ambient signals and information • It’s what e-learning might really mean!

  41. The Senior Companion is a major technical and social challenge • It could represent old people as their agents and help in difficult situations, e.g. with landlords, or guess when to summon human assistance • It could debrief an elderly user about events and memories in their lives • It could aid them to organise their life-memories (this is now hard!) (see Lifelog and Memories for Life) • It would be a repository for relatives later • Has “Loebner chat aspects” as well as information--it is to divert, like a pet, not just inform • It is a persistent and personal social agent interfacing with Semantic Web agents

  42. Other issues for Companions we can hardly begin to formulate: • Companion identity as an issue that can be settled many ways--- • like that of the owner’s web identity---- now a hot issue? • Responsibilities of Companion agents--who to? • Communications between agents and our access to them • Are simulations of emotional behaviour or politeness desirable in a Companion? • Protection of the vulnerable (young and old here) • What happens to your Companion when you are gone?

  43. Companions and the Web • A new kind of agent as the answer to a passive web • The web/internet must become more personal to be tractable, as it gets bigger (and more structured or unstructured?) • Personal agents will need to be autonomous and trusted (like space craft on missions) • But also personal and persistent, particularly for large sections of populations now largely excluded from the web. • The semantic web is a start to structure the web for comprehension and activity, but web agents are currently abstract and transitory. • The old are a good group to start with (growing and with funds).

  44. The technologies for a Companion are all there already • ASR for a single user (who may be dysarthric) • Ascribing personality? Remember Tamagotchi? • Quite intelligent people rushed home to feed one (and later Furby) even though they knew it was a simple empty mechanism. • And Tamagotchi could not even talk! • People with pets live longer. • Wouldn’t you like a warm pet to remind you what happened in the last episode of your favourite TV soap? • No? OK, but perhaps millions of your compatriots would?!

  45. This isn’t just about furry talking handbags on sofas, but any persistent and personalised entity that will interface to information sources: phones above all, and for dealing with the web in a more personal manner. ‘..claim the internet is killing their trade because customers…seem to prefer an electronic serf with limitless memory and no conversation.’ (Guardian 8.11.03)

  46. Conclusions • Companions are a plausible binding concept for exploring and evaluating a richer concept of human-machine interaction (useful too!!): • Interactions beyond simple task-driven dialogues. • That require more interesting theories underpinning them, even ones we cannot immediately see how to reinforce/learn. • Interactions with persistent personality, affect, emotion, interesting beliefs and goals • Above all, we need a more sophisticated and generally accepted evaluation regime
