
Spoken Language Understanding, the Research/Industry Chasm



Presentation Transcript


  1. Spoken Language Understanding, the Research/Industry Chasm
     Roberto Pieraccini, IBM T.J. Watson Research Center, rpieracc@us.ibm.com

  2. A Brief History of SLU
     RESEARCH: Phrase Structure, ATN, Semantic Grammars (mainly for understanding text); Robust Parsing for Spontaneous Speech; Statistical Parsing; Call Routing / Sentence Classification; FSM-based SLU
     INDUSTRY: FSM-based SLU; Call Routing
     Timeline (1970–2000): ARPA SUR; DARPA RESOURCE MANAGEMENT; ATIS; COMMUNICATOR; VoiceXML, SRGS

  3. Commercial SLU vs. SLU research
     COMMERCIAL SLU: mostly directed dialog; 100s of deployed systems; lots of proprietary data; customer-driven tasks; task-completion evaluation; revenue based on license or per-minute
     SLU RESEARCH: open NL understanding; few deployed systems; little data available; artificial tasks; lack of an evaluation paradigm; little funding for SLU research

  4. EuroSpeech 2003 – Paper Breakdown
     BASIC: Signal Processing, Speech Modeling, Acoustics, Speech Enhancement, Prosody, Emotions, Speech Coding, Corpora, Phonetics
     NOISE: Noise Robustness, Robust ASR
     ENABLING: Speech Recognition, Synthesis, Language Modeling, Speaker/Language ID, Speaker Verification
     APP: non-dialog applications
     DIALOG: Dialog and Multimodal Systems
     NL: Summarization, Title Extraction, Topic Detection, NE Recognition, ...
     TRANS: Speech-to-Speech Translation
     UND: Spoken Language Understanding
     Spoken Language Understanding papers: 24/800 = 3% (14 academic, 10 industrial)

  5. SLU is difficult

  6. SLU is difficult to evaluate
     • End-to-end evaluation: based on task-completion measures; needs the full conversational system; needs real, motivated users
     • Semantic evaluation: based on semantic annotation; costly; subjective; needs interpretation principles; highly domain/application dependent

  7. The ATIS evaluation
     Example query: "I am flying between New York and Washington tomorrow, early in the afternoon"
     • Systems evaluated on the basis of the data retrieved from the relational database
     • Reference min and max answers

  8. The ATIS evaluation (cont.)
     • Evaluation regulated by Principles of Interpretation (PofI)
     • Edited by the PofI committee over the 5 years of the project
     • Regular weekly meetings
     • About 100 principles

  9. The ATIS evaluation (cont.) – example principle
     2.2.1 A flight "between X and Y" means a flight "from X to Y".

  10. The ATIS evaluation (cont.) – example principle
      2.2.8 The location of a departure, stop, or arrival should always be taken to be an airport.

  11. The ATIS evaluation (cont.) – example principle
      2.2.3.3 "Stopovers" will mean "stops" unless the context clearly indicates that the subject intended "stopover", as in "Can I make a two day stopover on that flight?". In that case the query is answered using the stopover column of the restrictions table.

  12. The ATIS evaluation (cont.) – example principle
      2.2.6 A "red-eye" flight is one that leaves between 9 P.M. and 3 A.M. and arrives between 5 A.M. and 12 noon.

  13. The ATIS evaluation (cont.) – time-of-day definitions (24-hour clock)
      morning         0000–1200
      afternoon       1200–1800
      evening         1800–2200
      day             0600–1800
      night           1800–0600
      early morning   0000–0800
      mid-morning     0800–1000

  14. The ATIS evaluation (cont.) – do time expressions include their endpoints?
      TERM                 INCLUDES ENDPOINTS?
      before T1            No
      after T1             No
      between T1 and T2    Yes
      arriving by T1       Yes
      departing by T1      Yes
      periods of the day   Yes
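The min/max reference-answer scheme used in ATIS can be sketched as a simple set comparison: a hypothesis answer counts as correct if it contains everything in the minimal reference answer and nothing outside the maximal one. This is only a toy illustration (the flight identifiers are made up; the real comparator handled typed database tuples and scalar answers):

```python
def atis_correct(hypothesis, ref_min, ref_max):
    """Score a database answer against ATIS-style min/max references.

    A hypothesis is correct when it covers every item the Principles of
    Interpretation require (ref_min) and adds nothing beyond what they
    allow (ref_max).
    """
    hyp = set(hypothesis)
    return set(ref_min) <= hyp <= set(ref_max)

# Hypothetical flights matching "between New York and Washington tomorrow,
# early in the afternoon" under the Principles of Interpretation.
ref_min = {"AA101", "DL202"}           # must be returned
ref_max = {"AA101", "DL202", "UA303"}  # may also be returned

print(atis_correct({"AA101", "DL202"}, ref_min, ref_max))           # True
print(atis_correct({"AA101", "DL202", "UA303"}, ref_min, ref_max))  # True
print(atis_correct({"AA101"}, ref_min, ref_max))                    # False (misses DL202)
print(atis_correct({"AA101", "DL202", "SW404"}, ref_min, ref_max))  # False (extra flight)
```

The min/max distinction is what makes the Principles of Interpretation necessary: they pin down exactly which tuples are required and which are merely allowed for each query.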

  15. Do we need SLU in commercial applications?
      • Disfluencies = self-corrections, false starts, repetitions, filled pauses
      • The disfluency rate more than doubles going from constrained to unconstrained interactions
      • The disfluency rate grows linearly with utterance length
      S. Oviatt, "Predicting spoken disfluencies during human-computer interaction," Computer Speech and Language, 1995, 9:19–35.

  16. Do we need SLU in commercial applications?
      • Applications with one-time users do poorly with open-prompt systems
      • Applications with repeat users do almost as well with open prompts as with directed dialog
      OPEN: "What would you like to do?"
      DIRECTED: "...choose from the following options: web password reset, course enrollment, direct deposit or benefits."
      S.M. Witt, J.D. Williams, "Two Studies of Open vs. Directed Dialog Strategies in Spoken Dialog Systems," Proc. of EUROSPEECH 2003, Geneva, CH, September 2003.

  17. Do we need SLU in commercial applications?
      Average number of words per sentence (chart values, in slide order: 2.9, 6.0, 2.0) across three corpora:
      • Human-to-human: AMEX (SRI), 2,082 utterances
      • DARPA Communicator data, Dec 2000, 11,168 utterances
      • SpeechWorks deployed applications, directed dialog + NL, 136,447 utterances

  18. Do we need SLU in commercial applications? Yes, but it depends on the application

  19. Do we need SLU in commercial applications?
      Applications plotted along two axes: dialog structure (system initiative vs. mixed initiative) and sentence structure ("natural-language-ness": simple phrases vs. natural language). The four quadrants are simple phrases / system initiative, simple phrases / mixed initiative, natural language / system initiative, and natural language / mixed initiative. Example applications placed on the chart: call routing, pizza ordering, flight status, stock trading, banking, and (toward the natural-language, mixed-initiative corner) help desk and problem solving.

  20. An architectural chasm?

  21. Research Conversational Architecture
      SPEECH RECOGNIZER (driven by a Language Model) → NATURAL LANGUAGE UNDERSTANDING (driven by a Semantic Model) → DIALOG MANAGER
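The three-stage research architecture above can be sketched as a simple pipeline. Everything below is a stand-in: the routing labels, keywords, and utterance are hypothetical, and each stub replaces what would really be a statistical model.

```python
# Minimal sketch of the recognizer -> understanding -> dialog-manager
# pipeline; each component is a stub, not a real model.

ROUTES = {  # hypothetical call-routing "semantic model"
    "billing": ["bill", "charge", "payment"],
    "tech_support": ["broken", "error", "reset"],
}

def speech_recognizer(audio):
    # Stand-in for decoding audio with acoustic + language models.
    return "there is a wrong charge on my bill"

def natural_language_understanding(text):
    # Stand-in for a trained sentence classifier: score each routing
    # destination by keyword overlap and pick the best.
    words = set(text.split())
    scores = {dest: len(words & set(kws)) for dest, kws in ROUTES.items()}
    return max(scores, key=scores.get)

def dialog_manager(destination):
    # Map the understood meaning to a system action.
    return f"transfer({destination})"

action = dialog_manager(natural_language_understanding(speech_recognizer(None)))
print(action)  # transfer(billing)
```

The point of the architecture is the clean separation: the recognizer and the understanding module are each driven by their own trained model, and the dialog manager only ever sees the semantic result.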

  22. Commercial Conversational Architecture
      Application Server (holds Grammar 1 ... Grammar 5) → VoiceXML Browser (activates the grammars for the current state) → ASR result returned to the Application Server

  23. Current industrial SLU
      Grammar (BNF-style rules):
          $ROOT      = $ITINERARY;
          $ITINERARY = $FROM $TO;
          $FROM      = from $AIRPORT;
          $TO        = to $AIRPORT;
          $NY        = (new york) | (J F K) | kennedy;
          $BOS       = boston | logan;
          $AIRPORT   = ($NY | $BOS) [airport];
      Semantic attachments: $NY sets airport = "JFK"; $BOS sets airport = "BOS"; $FROM sets origin = airport; $TO sets destination = airport. The $ROOT rule then computes a direction:
          direction = "";
          if (origin == "JFK" && destination == "BOS") {
              direction = "north";
          }
          else if (origin == "BOS" && destination == "JFK") {
              direction = "south";
          }
      Example: "From Boston to New York" → origin = "BOS", destination = "JFK", direction = "south".
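The grammar and its semantic attachments can be emulated in a few lines of ordinary code. This sketch hard-codes the airport lexicon and the direction rule from the slide and handles only the fixed "from X to Y" pattern:

```python
# Toy emulation of the slide's semantic grammar: map surface forms to
# airport codes, fill origin/destination, then compute a direction.

AIRPORTS = {
    "new york": "JFK", "j f k": "JFK", "kennedy": "JFK",
    "boston": "BOS", "logan": "BOS",
}

def interpret(utterance):
    """Parse 'from <airport> to <airport>' and attach semantics."""
    words = utterance.lower().split()
    i, j = words.index("from"), words.index("to")
    origin = AIRPORTS[" ".join(words[i + 1:j])]
    destination = AIRPORTS[" ".join(words[j + 1:])]

    # Semantic attachment from the $ROOT rule.
    direction = ""
    if origin == "JFK" and destination == "BOS":
        direction = "north"
    elif origin == "BOS" and destination == "JFK":
        direction = "south"
    return {"origin": origin, "destination": destination, "direction": direction}

print(interpret("From Boston to New York"))
# {'origin': 'BOS', 'destination': 'JFK', 'direction': 'south'}
```

This brittleness is the point of the slide: the grammar only covers the exact phrasings its author enumerated, which is why such systems are paired with directed-dialog prompts.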

  24. SRGS: Standard for grammars
      <?xml version="1.0" encoding="ISO-8859-1"?>
      <grammar version="1.0" xml:lang="en-us" root="ROOT">
        <rule id="ROOT" scope="public">
          <ruleref uri="#ITINERARY"/>
          <tag>
            direction = '';
            if (ITINERARY.origin == 'JFK' &amp;&amp; ITINERARY.destination == 'BOS') {
              direction = 'north';
            } else if (ITINERARY.origin == 'BOS' &amp;&amp; ITINERARY.destination == 'JFK') {
              direction = 'south';
            }
          </tag>
        </rule>
        <rule id="ITINERARY" scope="public">
          <ruleref uri="#FROM"/> <tag>origin = FROM.airport;</tag>
          <ruleref uri="#TO"/> <tag>destination = TO.airport;</tag>
        </rule>
        <rule id="FROM">
          <item>from</item>
          <ruleref uri="#AIRPORT"/> <tag>airport = AIRPORT.airport;</tag>
        </rule>
        <rule id="NY">
          <one-of>
            <item>new york</item>
            <item>JFK</item>
            <item>kennedy</item>
          </one-of>
          <tag>airport = 'JFK';</tag>
        </rule>
        <!-- remaining rules ($BOS, $AIRPORT, $TO) omitted on the slide -->
      </grammar>

  25. Difficult problems for commercial systems
      • No data for training in the design/development phase
        – System development with no data
        – Tools for fast grammar handcrafting
        – Tools for content-word normalization/speech-ification
      • Oodles of data after deployment
        – Tools for automatic or semi-automatic adaptation/learning

  26. The problem of content words
      Sentence-structure variation:
      • I need to go to Phoenix from New York leaving on February 4th
      • On February 4th leaving from New York and going to Phoenix
      • I need to go from New York to Phoenix on February 4th
      Content-word variation: Newark, Boston, Denver, Dallas, Baltimore, San Francisco, Los Angeles, Philadelphia, ...

  27. The problem of content words
      • Large lists of content words need to have priors
        – How to estimate priors with no data (or even if you have data)? e.g. airport names, flight numbers, street names
      • Large lists of content words often come from proprietary databases
        – Spelling-to-phonemes
        – Acronym expansion
        – Word normalization: 14" display w/ anti-glr scrn → "A fourteen inches display with anti-glare screen"
        – Synonym/paraphrase generation: "A display of fourteen inches size with an anti-reflection screen"
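A normalization step like the one needed for the catalog entry above can be sketched as dictionary-driven expansion. The abbreviation table here is hypothetical and tiny; a real product catalog would need a much larger, domain-specific one, plus number-to-word conversion:

```python
import re

# Hypothetical expansion table for catalog abbreviations.
ABBREVIATIONS = {
    "w/": "with",
    "anti-glr": "anti-glare",
    "scrn": "screen",
}

def speechify(text):
    """Normalize catalog text into a speakable, grammar-ready form."""
    # Expand the inch mark into a word.
    text = re.sub(r'(\d+)"', r"\1 inch", text)
    # Expand known abbreviations token by token.
    return " ".join(ABBREVIATIONS.get(tok, tok) for tok in text.split())

print(speechify('14" display w/ anti-glr scrn'))
# 14 inch display with anti-glare screen
```

The hard part in practice is not the lookup but building the table: the abbreviations come from proprietary databases and must be discovered and maintained per application.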

  28. Exploiting real data
      Example semantic tree for a flight-status application (Flight 679): Day (Mon, Tue, Wed, Thu, ...), Area Code (1, 2, 3, ...), Time (7 AM, 8 AM, 9 AM, 10 AM, ...)
      TRAINING: 2.8 M utterances; TEST: 1,485 utterances
      Wai, C., Pieraccini, R., Meng, H., "A Dynamic Semantic Model for Rescoring Recognition Hypothesis," Proceedings of the IEEE Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2001.
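The rescoring idea can be illustrated with a toy example: interpolate each hypothesis's recognizer score with a prior estimated from deployment data. All numbers, the slot values, and the interpolation weight below are made up for illustration; the cited paper's dynamic semantic model is considerably more sophisticated than this sketch.

```python
import math

# Hypothetical slot priors estimated from deployed-application logs,
# e.g. callers ask about 7 AM flights far more often than 10 AM ones.
TIME_PRIOR = {"7 AM": 0.5, "8 AM": 0.3, "9 AM": 0.15, "10 AM": 0.05}

def rescore(nbest, weight=0.3):
    """Pick the best hypothesis after mixing ASR log-scores with
    log-priors from the semantic model."""
    rescored = []
    for hyp, asr_logprob in nbest:
        total = (1 - weight) * asr_logprob + weight * math.log(TIME_PRIOR[hyp])
        rescored.append((hyp, total))
    return max(rescored, key=lambda pair: pair[1])[0]

# The recognizer slightly prefers "10 AM", but the prior flips the choice.
nbest = [("10 AM", -1.0), ("7 AM", -1.2)]
print(rescore(nbest))  # 7 AM
```

This is the kind of adaptation that only becomes possible after deployment, when millions of real utterances are available to estimate the priors.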

  29. Conclusions
      • There is very little research in SLU today: lack of data, funding, and motivation
      • SLU is difficult, and difficult to evaluate: semantic vs. task-completion evaluation
      • Certain speech-based applications do not need SLU; others do: be aware of competing technologies, even if they are less advanced
      • There are difficult problems in commercial SLU that are not addressed by the research community: a realignment of academic and industrial research is needed

  30. Advertising Campaign for SLU on Google
