Spoken Interactive Open Domain Question Answering System: SPIQA • Chiori Hori, Takaaki Hori, Hajime Tsukada and Hideki Isozaki • Speech Open Lab. and Intelligent Communication Lab., NTT Communication Science Laboratories

Humanoid Robot • I can walk! • I can see! • I can dance! • I can hear! • I can speak! • Let’s have a conversation freely.

Domain and DB Structure for QA System • Target domain: specific (SDQA) or open (ODQA) • Data structure: knowledge DB (table-lookup) or unstructured corpus • Natural language input: text input or speech input, each w/o addition or w/ addition (addition: additional information requirement) • Systems placed in the chart: CHAT-80, SAIQA, FALCON, MYCIN, VAQA, Harpy, Hearsay-II, JUPITER, SPIQA

QA System for Open Domain through Speech Interactions • User: Which country won the World Cup? • SPIQA: I’m going to request additional information to disambiguate the user’s question. Additional information, please! Which World Cup? What kind of World Cup? When was the World Cup held? • User: Soccer! 2002. • SPIQA: Got it! Brazil won the World Cup of soccer in 2002.

Spoken Interactive Open Domain QA System: SPIQA • Components: ASR system (SOLON), question reconstructor, ODQA engine (SAIQA), DDQ generator, answer sentence generator, TTS system (FinalFluet) • Data passed between components: the first question, additional information, reconstructed question, answer hypotheses, answers, DDQ sentence, disambiguous question and additional information, question and answer • Decision: Answer derived? yes: the answers go to the answer sentence generator; no: the DDQ generator produces a DDQ sentence

ODQA task • A target of the Text REtrieval Conference (TREC) by DARPA/NIST • Open Domain QA (ODQA) • Gives specific answers from a large, unannotated text corpus rather than a ranked list of documents, in response to a question written in natural language • Question words: {who, where, when, what, why, which, whom, how}

ODQA approach • User’s intention classification • Interrogative {CLASS of named entity (NE)} • Who {PERSON} • Where {LOCATION} • Relevant document retrieval • all documents related to each phrase in questions are retrieved • NE extraction according to users’ intention • Detected class {NE} • PERSON {Bush, Clinton, Gore} • COUNTRY {Japan, America, Italy}
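The intention classification and NE extraction steps above can be sketched as follows. This is a minimal illustration, not the SAIQA implementation: the function names are hypothetical, the "when" to DATE mapping is an added assumption (the slide only lists Who and Where), and the NE-tagged input is assumed to come from an external tagger.

```python
from typing import Dict, List, Tuple

# Hypothetical interrogative -> NE class table.
# "who" and "where" follow the slide; "when" -> DATE is an assumption.
INTERROGATIVE_TO_NE: Dict[str, str] = {
    "who": "PERSON",
    "where": "LOCATION",
    "when": "DATE",
}


def classify_intention(question: str) -> str:
    """Return the NE class implied by the first known interrogative."""
    for token in question.lower().replace("?", " ").split():
        if token in INTERROGATIVE_TO_NE:
            return INTERROGATIVE_TO_NE[token]
    return "UNKNOWN"


def extract_candidates(tagged: List[Tuple[str, str]], ne_class: str) -> List[str]:
    """Keep entities tagged with the detected class, in order, without duplicates."""
    seen: List[str] = []
    for entity, tag in tagged:
        if tag == ne_class and entity not in seen:
            seen.append(entity)
    return seen
```

For example, a "Who ...?" question is classified as PERSON, so only PERSON entities such as Bush or Clinton from the retrieved documents remain as answer candidates.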

ODQA evaluation • Multiple answer hypotheses extracted: • 1. Bush • 2. Koizumi ←Correct answer • 3. Clinton • 4. Obuchi • 5. Gore • Mean Reciprocal Rank (MRR) • Reciprocal rank = 1/2 (the correct answer is ranked 2nd)
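The reciprocal rank in this example can be computed with a short sketch. The function names are illustrative, and scoring a question as 0 when the correct answer is absent from the list is the usual convention, assumed here rather than stated on the slide.

```python
from typing import List, Sequence, Tuple


def reciprocal_rank(hypotheses: Sequence[str], correct: str) -> float:
    """Return 1/rank of the first correct hypothesis, or 0.0 if it is absent."""
    for rank, hypothesis in enumerate(hypotheses, start=1):
        if hypothesis == correct:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: List[Tuple[Sequence[str], str]]) -> float:
    """Average the reciprocal rank over (hypotheses, correct answer) pairs."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(h, c) for h, c in runs) / len(runs)
```

With the ranked list Bush, Koizumi, Clinton, Obuchi, Gore and the correct answer Koizumi, `reciprocal_rank` returns 0.5, matching the slide.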

Problems in Spoken Interactive ODQA • Speech recognition for open domain • QA for open domain • Interaction approach for ODQA

Problems in Spoken ODQA • Recognition errors • Incomplete sentences and word fragments in spontaneous speech • Enormous size of vocabularies: 1,800,000 (1.8M) morphemes, each entry being morpheme+pronunciation+POS/NE, e.g. Koizumi+ko-i-zu-mi+PERSON • Out-of-vocabulary

Problems in ODQA Ambiguous questions input by users • Necessity of interactions between human and machine • Asks questions of its own to resolve ambiguity in the user’s question • Improving QA performance by the user’s answer in response to system queries

Problems in Interactive ODQA • Unable to prepare dialogue scenarios in system designs • system queries for additional information • optimum interaction strategies for answer extraction

New Interaction Approach for Open Domain

Very large vocabulary task • Experiment conditions • Acoustic model: read speech (ATR+ASJ+JNAS, about 20 hours), gender-dependent (female) model, 3000 states, 16 mixtures • Vocabulary size: 20K, 65K, 200K, 1M, 1.85M • Language model: n-gram, 10 years of newspaper text + questions for QA (other than test sets) • Decoder: SOLON, approximation in on-the-fly composition [Hori 2004] • Test sets: 1 female speaker, questions for QA, 11,419 utterances: 2,000 questions with 20 morphemes, 2,000 questions with 5 morphemes, 7,419 isolated words

Test-set OOV rate & Perplexity

[Figure: word accuracy vs. beam width (score-histogram)]

[Figure: character accuracy vs. beam width (score-histogram)]

[Figure: decoding speed (real time factor) vs. beam width (score-histogram); CPU: Opteron 246 2GHz]

Weighted Finite-State Transducer: WFST • Morphological analysis [Pereira 1994] • Machine translation [Oncina 1994] • Syntactic analysis [Alshawi 1996] • Speech recognition [Mohri 1997, Willett 2000] • [Figure: example WFST with states 0, 1, 2 and final state 3/1.1; state transitions are labeled <input>:<output>/weight: a:x/0.8, b:y/2.5, c:z/0.3, a:x/1.0, a:e/1.1, b:v/0]

WFSTs in Speech Recognition • Advantages • Yield a unified framework for describing models • Integrate different models into a single model via composition operations • Improve search efficiency via optimization algorithms • Problems • Composition of complex models generates a huge WFST • Search space increases, and huge memory is required • Solution • Efficient algorithm using on-the-fly composition

WFST-based speech recognition (Mohri 1997~) • Models: HMM, triphone network, lexicon, 3-gram • Sequences: feature vector seq., triphone seq., phone seq., word seq. • Composition & optimization combine the models into a single WFST for the decoder

On-the-fly composition • Models: HMM, triphone network, lexicon, 3-gram • Composition & optimization group the models into WFST A and WFST B • WFST A and WFST B are composed during decoding • Memory is saved, but search efficiency decreases.
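The idea of composing two WFSTs only while decoding can be sketched as below. This is a simplified illustration and not the SOLON algorithm: it assumes transducers without empty (epsilon) labels, weights that add along a path with the minimum taken across paths, and a dictionary-based representation whose names (`LazyComposition`, `best_path_weight`) are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple

Arc = Tuple[str, str, float, int]          # (input, output, weight, next state)
Wfst = Dict[int, List[Arc]]                # state -> outgoing arcs
PairState = Tuple[int, int]                # (state in A, state in B)
PairArc = Tuple[str, str, float, PairState]


class LazyComposition:
    """Expand arcs of the composed machine only for state pairs that are visited."""

    def __init__(self, a: Wfst, b: Wfst) -> None:
        self.a, self.b = a, b
        self.cache: Dict[PairState, List[PairArc]] = {}

    def arcs(self, state: PairState) -> List[PairArc]:
        if state not in self.cache:
            qa, qb = state
            expanded: List[PairArc] = []
            for in_a, out_a, w_a, next_a in self.a.get(qa, []):
                for in_b, out_b, w_b, next_b in self.b.get(qb, []):
                    if out_a == in_b:  # output of A must match input of B
                        expanded.append((in_a, out_b, w_a + w_b, (next_a, next_b)))
            self.cache[state] = expanded
        return self.cache[state]


def best_path_weight(comp: LazyComposition, start: PairState,
                     finals: Set[PairState], inputs: Sequence[str]) -> float:
    """Time-synchronous search: lowest total weight that consumes all inputs."""
    frontier: Dict[PairState, float] = {start: 0.0}
    for symbol in inputs:
        next_frontier: Dict[PairState, float] = {}
        for state, weight in frontier.items():
            for in_sym, _out, arc_w, nxt in comp.arcs(state):
                if in_sym == symbol:
                    cand = weight + arc_w
                    if cand < next_frontier.get(nxt, float("inf")):
                        next_frontier[nxt] = cand
        frontier = next_frontier
    reached = [w for s, w in frontier.items() if s in finals]
    return min(reached) if reached else float("inf")
```

Only the state pairs reached by the search are ever expanded and cached, which is where the memory saving comes from; the extra matching work inside `arcs` at decoding time is the efficiency cost noted on the slide.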

Standard on-the-fly composition • [Figure: hypotheses of the first WFST (states 0 to 8, arcs s1:e, s2:A, s3:B, s4:e, s5:e, s6:C, s7:C, s8:e, s9:e) and the corresponding hypotheses in on-the-fly composition, drawn as state pairs such as 0,0 and 1,0 along the time axis]

Approximation in on-the-fly composition • [Figure: the same hypotheses of the first WFST; in the on-the-fly composition, arcs are merged into s4,s6 : C, s4,s7 : C, s5,s6 : C and s5,s7 : C, giving fewer state pairs along the time axis]

Proposed on-the-fly composition • [Figure: the same hypotheses of the first WFST; an on-the-fly rescoring pass over the output labels A, B and C with state pairs 0,0, 2,1, 3,2, 5,3, 6,3, 5,4 and 6,4 along the time axis]

Results of the CSJ task • CSJ Benchmark test 1 (10 academic presentations) CPU: Xeon 3.0GHz

Results of the very large vocabulary task • 2,000 utterances in spoken interactive QA domain • Vocabulary size: 65K, 200K, 1M, 1.8M CPU: Opteron 246 2GHz

Distinguishing among Multiple Hypotheses • Suppose documents related to keywords, “World Cup,” include the following information: • Additional information regarding GAMES, COUNTRY, DATE can assist in clarifying the choice of answers.

Disambiguating Ambiguous Questions • User’s question: Which country won the World Cup? • Question with feature slots filled: Which country won the World Cup of soccer held in Japan and Korea in 2002? • Indispensable information is not always present in the user’s question. • The missing information consists of modifiers of phrases in the user’s question.

Deriving Disambiguating Query: DDQ • Detecting ambiguous phrase - Needs more additional information • Generating interrogative sentence - Combining interrogatives and ambiguous phrase • Selecting the most appropriate disambiguating query - linguistic appropriateness

Ambiguous Phrase Detection • An ambiguous phrase needs more additional information. • Structural ambiguity in users’ questions: phrases with fewer modifiers • General ambiguity in the retrieved target: phrases appearing more frequently in the corpus

Example question: Which country in South America won in the World Cup? • Generality ambiguity: the unigram probability of w based on the retrieved corpus is used to calculate a generality ambiguity score (cont: content words). • Structural ambiguity: the dependency probability is used to calculate a structural ambiguity score. D(Pi, Pn) is the probability that phrase Pn will be modified by phrase Pi, which can be calculated using Stochastic Dependency Context Free Grammar (SDCFG). • Example sentences containing “World Cup”: - Ronaldo scores twice to give Brazil a 2-0 victory over Germany in the World Cup final. - Anand, Xu Yuhua Retain Titles at World Cup Chess Championship. - Renate Goetschl and Hermann Maier are the overall champions after the World Cup alpine finals.
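The exact generality ambiguity formula is not preserved in this transcript. The sketch below only illustrates the stated idea that content words with a high unigram probability in the retrieved corpus make a phrase more general; averaging the probabilities over the content words is an assumption, and all names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def unigram_probs(corpus_tokens: Iterable[str]) -> Dict[str, float]:
    """Relative frequency of each token in the retrieved corpus."""
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def generality_score(content_words: List[str], probs: Dict[str, float]) -> float:
    """ASSUMED scoring: mean unigram probability of the phrase's content words."""
    if not content_words:
        return 0.0
    return sum(probs.get(w, 0.0) for w in content_words) / len(content_words)
```

Under this assumption, a phrase such as "World Cup" that occurs across soccer, chess and skiing documents scores higher than a rarer phrase, so it would be selected first as the phrase to ask about.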

Generating DQs • Combining ambiguous phrases in users’ question with templates of all possible interrogative sentences • Ambiguous phrase: World Cup • Templates of interrogative sentences: What kind of * ? What year was * held ? • DQ candidate 1: What kind of World Cup? • DQ candidate 2: What year was the World Cup held?
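Candidate generation by template filling can be sketched in a few lines. The `*` slot marker, the template strings and the function name are illustrative assumptions; the real system's templates and phrase extraction are not shown in the slides.

```python
from typing import List

# Illustrative templates; "*" marks where the ambiguous phrase is inserted.
TEMPLATES: List[str] = [
    "What kind of *?",
    "What year was * held?",
]


def generate_dq_candidates(phrases: List[str], templates: List[str]) -> List[str]:
    """Return every template filled with every phrase (templates x phrases)."""
    return [t.replace("*", p) for t in templates for p in phrases]
```

Calling `generate_dq_candidates(["the World Cup"], TEMPLATES)` yields "What kind of the World Cup?" and "What year was the World Cup held?". Ungrammatical candidates like the first one are expected at this stage, and are meant to be filtered by the later selection step.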

Linguistic Appropriateness of Interrogative Sentences • The n-gram likelihood for interrogative sentences • Newspaper text, with feature slots in brackets: Brazil[COUNTRY] won the World Cup of soccer[SPORTS] held in Japan[COUNTRY] and Korea[COUNTRY] in 2002[DATE]. • Quasi interrogative sentences are generated using grammar rules: • Which country[COUNTRY] won the World Cup? • The World Cup of what sport[SPORTS]? • When[DATE] was the World Cup held? • Where[COUNTRY] was the World Cup held?

Frequency of Feature Slots • The n-gram likelihood for interrogative sentences • The frequency of feature slots - Feature slots that appear in the retrieved target are given a high score.

Approach for Generating DQs • Templates of interrogative sentences: who, where, when, how, what, … • Let Smn be a DQ generated by inserting the n-th phrase into the m-th template. • What type of + World Cup ? • What year was + the World Cup + held ? • Candidates = templates × (nouns + noun-phrases) • DQ score H(Smn) is defined as follows:

Indispensable Information Extraction from Recognition Results • exclude words with recognition error • extract indispensable information • compensate for indispensable but misrecognized words • [Figure: numbered words of the recognition results and the subset extracted from them]

Screening Filter for Recognition Errors • A meaningful set of words is extracted from the original speech, excluding recognition errors, through automatic speech summarization. • Important words are sometimes dropped during summarization. • Indispensable information for extracting answers should be supplemented by users. • [Figure: numbered words of the recognition result and of the screened result]

Evaluation Experiments • Our ASR system using finite state transducers, SOLON (20k vocabulary size), transcribed 69 questions read aloud by seven male speakers. • - 19 morphemes on average in each question • - The sentences were grammatically correct and formally structured. • - The mean word recognition accuracy for the questions was 76%. • The recognition results were screened through a speech summarization technique. • Answers for the questions reconstructed using additional information queried by the DDQ module were given by the ODQA engine, SAIQA.

Evaluation Results • MRR w/o recognition errors: 0.43 • Conditions compared: recognition results; screened recognition results (removing recognition errors); reconstructed questions using the screened questions and additional information obtained through only one interaction • Speakers: 7 males • Questions: 69 sentences • Word recognition accuracy: 76%

Conclusion • The DDQ (deriving disambiguating queries) module automatically generates queries for indispensable information using ambiguous phrases and templates of interrogative sentences. • Experimental results revealed the DQ’s potential to compensate for missing indispensable information to extract answers. • Future work will include an evaluation of the dialogue strategy in a spoken interactive ODQA system to assess how fast answers are extracted and how exact the answers are.
