Spoken Interactive Open Domain Question Answering System: SPIQA Chiori Hori, Takaaki Hori, Hajime Tsukada and Hideki

Presentation Transcript

  1. Spoken Interactive Open Domain Question Answering System: SPIQA Chiori Hori, Takaaki Hori, Hajime Tsukada and Hideki Isozaki Speech Open Lab. and Intelligent Communication Lab. NTT Communication Science Laboratories

  2. Humanoid Robot I can walk ! I can see ! I can dance! I can hear ! Let’s have a conversation freely. I can speak !

  3. Domain and DB Structure for QA System specific (SDQA) target domain open (ODQA) unstructured corpus data structure knowledge DB table-lookup natural language input text input w/o addition CHAT-80 SAIQA FALCON w/ addition MYCIN SPIQA VAQA speech input w/o addition Harpy Hearsay-II SPIQA w/ addition JUPITER addition: additional information requirement

  4. QA System for Open Domain through Speech Interactions 2002 Soccer! Which country won the World Cup? I’m going to request additional information to disambiguate users’ question. Got it !!! Additional information, please !!! Which World Cup ??? Brazil won the World Cup of soccer in 2002. What kind of world cup? When was the World Cup held? SPIQA

  5. Spoken Interactive Open Domain QA System: SPIQA Question reconstructor additional information reconstructed question User ODQA engine SAIQA ASR system SOLON answer hypotheses the first question Answer derived? yes answers Answer sentence generator TTS system FinalFluet DDQ sentence no DDQ generator disambiguous question and additional information question and answer

  6. ODQA task • A target of Text REtrieval Conference(TREC) • by DARPA/NIST • Open Domain QA (ODQA) • Gives specific answers from a large, unannotated text corpus rather than a ranked list of documents • In response to a question written in natural language • Question Word Question: {who, where, when, what, why, which, whom, how}

  7. ODQA approach • User’s intention classification • Interrogative {CLASS of named entity (NE)} • Who {PERSON} • Where {LOCATION} • Relevant document retrieval • all documents related to each phrase in questions are retrieved • NE extraction according to users’ intention • Detected class {NE} • PERSON {Bush, Clinton, Gore} • COUNTRY {Japan, America, Italy}
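The intention-classification and NE-extraction steps on this slide can be sketched as follows. This is a minimal illustration, not SAIQA's implementation: only the "who" and "where" mappings come from the slide, while the other mappings, the function names, and the pre-tagged entity list are assumptions made for the example.

```python
# Hypothetical interrogative -> named-entity class table.
# "who" and "where" are from the slide; the other rows are assumptions.
INTERROGATIVE_TO_NE = {
    "who": "PERSON",
    "where": "LOCATION",
    "when": "DATE",
    "which country": "COUNTRY",
}


def classify_intention(question):
    """Return the NE class the question asks for, or None if unknown."""
    q = question.lower()
    # Try longer interrogatives first so "which country" wins over shorter keys.
    for key in sorted(INTERROGATIVE_TO_NE, key=len, reverse=True):
        if q.startswith(key):
            return INTERROGATIVE_TO_NE[key]
    return None


def extract_candidates(ne_class, tagged_entities):
    """Keep only entities whose class matches the detected intention.

    tagged_entities is a list of (text, ne_class) pairs, assumed to come
    from an NE tagger run over the retrieved documents.
    """
    return [text for text, cls in tagged_entities if cls == ne_class]


entities = [("Bush", "PERSON"), ("Japan", "COUNTRY"), ("Clinton", "PERSON")]
print(extract_candidates(classify_intention("Who won the election?"), entities))
```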

  8. ODQA evaluation • Multiple answer hypotheses extracted: • 1. Bush • 2. Koizumi ←Correct answer • 3. Clinton • 4. Obuchi • 5. Gore • Mean Reciprocal Ranking • Reciprocal Ranking = 1/2
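The reciprocal-rank figure on this slide can be reproduced with a short sketch. The function names are illustrative, and scoring a question as 0 when the correct answer is absent from the list is an assumption (a common convention that the slide does not state).

```python
def reciprocal_rank(hypotheses, correct):
    """1/rank of the correct answer in the ranked list, 0.0 if it is missing."""
    for rank, hypothesis in enumerate(hypotheses, start=1):
        if hypothesis == correct:
            return 1.0 / rank
    return 0.0  # assumed convention for "correct answer not returned"


def mean_reciprocal_rank(runs):
    """Average the reciprocal rank over (hypotheses, correct_answer) pairs."""
    runs = list(runs)
    return sum(reciprocal_rank(h, c) for h, c in runs) / len(runs)


# The slide's example: the correct answer "Koizumi" is ranked second.
slide_hypotheses = ["Bush", "Koizumi", "Clinton", "Obuchi", "Gore"]
print(reciprocal_rank(slide_hypotheses, "Koizumi"))  # 0.5
```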

  9. Problems in Spoken Interactive ODQA • Speech recognition for open domain • QA for open domain • Interaction approach for ODQA

  10. Problems in Spoken ODQA • Recognition errors • Incomplete sentences and word fragments in spontaneous speech • Enormous vocabulary size: 1,800,000 (1.8M) morphemes • Entry format: morpheme+pronunciation+POS/NE, e.g. Koizumi+ko-i-zu-mi+PERSON • Out-of-vocabulary words

  11. Problems in ODQA • Ambiguous questions input by users • Necessity of interactions between human and machine • The system asks questions of its own to resolve ambiguity in the user’s question • QA performance is improved by the user’s answers to system queries

  12. Problems in Interactive ODQA • Unable to prepare dialogue scenarios in system designs • System queries for additional information • Optimum interaction strategies for answer extraction

  13. New Interaction Approach for Open Domain

  14. Very large vocabulary task • Experiment conditions • Acoustic model: read speech (ATR+ASJ+JNAS, about 20 hours), gender-dependent (female) model, 3000 states, 16 mixtures • Vocabulary size: 20K, 65K, 200K, 1M, 1.85M • Language model: n-gram, 10 years of newspaper text + questions for QA (other than test sets) • Decoder: SOLON, approximation in on-the-fly composition [Hori 2004] • Test sets: 1 female speaker, questions for QA, 11,419 utterances (2,000 questions with 20 morphemes, 2,000 questions with 5 morphemes, 7,419 isolated words)

  15. Test-set OOV rate & Perplexity

  16. Word Accuracy Beam width (score-histogram)

  17. Character Accuracy Beam width (score-histogram)

  18. Decoding speed(Real Time Factor) Beam width (score-histogram) CPU: Opteron 246 2GHz

  19. Weighted Finite-State Transducer: WFST • Morphological analysis [Pereira 1994] • Machine translation [Oncina 1994] • Syntactic analysis [Alshawi 1996] • Speech recognition [Mohri 1997, Willett 2000] [Figure: example WFST with states 0 to 3, final state 3/1.1, and state transitions labeled <input>:<output>/weight (a:x/0.8, a:x/1.0, a:e/1.1, b:y/2.5, b:v/0, c:z/0.3)]
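A WFST of this kind can be sketched as a small data structure. The arc labels and the final weight below are taken from the slide's figure, but the topology (which arc connects which states) is hypothetical because the figure did not survive extraction. Summing weights along a path and choosing the lowest total are also assumptions (the slide does not name a semiring), and epsilon labels are not handled.

```python
from dataclasses import dataclass, field


@dataclass
class WFST:
    start: int
    finals: dict  # state -> final weight
    arcs: dict = field(default_factory=dict)  # state -> [(input, output, weight, next_state)]

    def add_arc(self, src, inp, out, weight, dst):
        self.arcs.setdefault(src, []).append((inp, out, weight, dst))

    def transduce(self, inputs):
        """Return (outputs, weight) of the lowest-weight accepting path, or None."""
        # Each frontier entry is (state, outputs so far, accumulated weight).
        frontier = [(self.start, (), 0.0)]
        for symbol in inputs:
            frontier = [
                (dst, outs + (out,), w + arc_w)
                for state, outs, w in frontier
                for inp, out, arc_w, dst in self.arcs.get(state, [])
                if inp == symbol
            ]
        accepted = [(outs, w + self.finals[s]) for s, outs, w in frontier if s in self.finals]
        return min(accepted, key=lambda path: path[1]) if accepted else None


def build_example():
    # Hypothetical topology; labels and weights follow the slide's figure.
    t = WFST(start=0, finals={3: 1.1})
    t.add_arc(0, "a", "x", 0.8, 1)
    t.add_arc(1, "b", "y", 2.5, 3)
    t.add_arc(0, "a", "x", 1.0, 2)
    t.add_arc(2, "c", "z", 0.3, 3)
    return t


print(build_example().transduce(["a", "c"]))
```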

  20. WFSTs in Speech Recognition • Advantages • Yield a unified framework for describing models • Integrate different models into a single model via composition operations • Improve search efficiency via optimization algorithms • Problems • Composition of complex models generates a huge WFST • Search space increases, and huge memory is required • Solution • Efficient algorithm using on-the-fly composition

  21. WFST-based speech recognition (Mohri 1997~) • HMM: feature vector seq. to triphone seq. • Triphone network: triphone seq. to phone seq. • Lexicon: phone seq. to word seq. • 3-gram: word seq. to word seq. • The models are combined by composition & optimization into one WFST, which the decoder searches

  22. On-the-fly composition • The models (HMM, triphone network, lexicon, 3-gram) are composed & optimized into two WFSTs, A and B • Composition of the two WFSTs is performed during decoding • Memory is saved, but search efficiency decreases.
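The idea of composing during decoding can be sketched as lazy expansion of composed states: a pair state (qa, qb) is expanded only when the search reaches it, so the full product WFST is never built. This is a simplified illustration, not SOLON's algorithm. It assumes epsilon-free transducers, additive weights with the lowest total winning, and arc dictionaries of the form state -> [(input, output, weight, next_state)], all of which are choices made for the example.

```python
def compose_arcs(arcs_a, arcs_b, state):
    """Expand one composed state by matching A's output with B's input."""
    qa, qb = state
    for inp, mid, wa, next_a in arcs_a.get(qa, []):
        for mid_b, out, wb, next_b in arcs_b.get(qb, []):
            if mid == mid_b:
                yield inp, out, wa + wb, (next_a, next_b)


def lazy_compose_decode(arcs_a, arcs_b, finals_a, finals_b, inputs, start=(0, 0)):
    """Viterbi search over the composition, expanding pair states on demand.

    Returns (best, expanded): best is (weight, outputs) or None, and expanded
    is the set of composed states that were actually built.
    """
    frontier = {start: (0.0, ())}
    expanded = set()
    for symbol in inputs:
        next_frontier = {}
        for state, (weight, outs) in frontier.items():
            expanded.add(state)
            for inp, out, arc_w, dst in compose_arcs(arcs_a, arcs_b, state):
                if inp != symbol:
                    continue
                candidate = (weight + arc_w, outs + (out,))
                if dst not in next_frontier or candidate[0] < next_frontier[dst][0]:
                    next_frontier[dst] = candidate
        frontier = next_frontier
    accepted = [v for (qa, qb), v in frontier.items() if qa in finals_a and qb in finals_b]
    best = min(accepted, key=lambda v: v[0]) if accepted else None
    return best, expanded


# Toy models (hypothetical): A maps state symbols to words, B scores words.
arcs_a = {0: [("s1", "A", 0.5, 1), ("s2", "B", 0.7, 1)]}
arcs_b = {0: [("A", "A", 1.0, 1), ("B", "B", 0.2, 1)]}
print(lazy_compose_decode(arcs_a, arcs_b, {1}, {1}, ["s2"]))
```

Only the pair states visited for the given input are expanded, which is where the memory saving comes from; the price is that matching arcs is repeated at search time.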

  23. A pair of WFSTs used in on-the-fly composition [Figure: first WFST (HMM states to word sequence) and second WFST (language model)]

  24. Standard on-the-fly composition [Figure: hypotheses of the first WFST and the corresponding hypotheses in on-the-fly composition, plotted over time]

  25. Approximation in on-the-fly composition [Figure: hypotheses of the first WFST and the corresponding hypotheses in on-the-fly composition, plotted over time]

  26. Proposed on-the-fly composition [Figure: hypotheses of the first WFST and an on-the-fly rescoring pass, plotted over time]

  27. Results of the CSJ task • CSJ Benchmark test 1 (10 academic presentations) CPU: Xeon 3.0GHz

  28. Results of the very large vocabulary task • 2,000 utterances in spoken interactive QA domain • Vocabulary size: 65K, 200K, 1M, 1.8M CPU: Opteron 246 2GHz

  29. Distinguishing among Multiple Hypotheses • Suppose documents related to keywords, “World Cup,” include the following information: • Additional information regarding GAMES, COUNTRY, DATE can assist in clarifying the choice of answers.

  30. Disambiguating Ambiguous Questions • User’s question: Which country won the World Cup? • With feature slots filled: Which country won the World Cup of soccer held in Japan and Korea in 2002? • Indispensable information is not always present in the user’s question. • The missing information consists of modifiers of phrases in the user’s question.

  31. Deriving Disambiguating Query: DDQ • Detecting the ambiguous phrase - Needs more additional information • Generating interrogative sentences - Combining interrogatives and the ambiguous phrase • Selecting the most appropriate disambiguating query - Linguistic appropriateness

  32. Ambiguous Phrase Detection • An ambiguous phrase needs more additional information. • Structural ambiguity in users’ questions: phrases with fewer modifiers • Generality ambiguity in the retrieved target: phrases appearing more frequently in the corpus

  33. Which country in South America won in the World Cup? • Generality ambiguity: the unigram probability of w based on the retrieved corpus is used to calculate a generality ambiguity score (cont: content words). • Structural ambiguity: the dependency probability is used to calculate a structural ambiguity score. D(Pi, Pn) is the probability that phrase Pn will be modified by phrase Pi, which can be calculated using Stochastic Dependency Context Free Grammar (SDCFG). • Example sentences: - Ronaldo scores twice to give Brazil a 2-0 victory over Germany in the World Cup final. - Anand, Xu Yuhua Retain Titles at World Cup Chess Championship. - Renate Goetschl and Hermann Maier are the overall champions after the World Cup alpine finals.
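The generality side of this slide can be illustrated with a unigram estimate over the retrieved sentences. The slide's actual scoring formula is not in the transcript, so averaging the unigram probabilities of a phrase's words is purely an assumption here, as are the whitespace tokenization and the function names. No stop-word filtering for content words is attempted.

```python
from collections import Counter


def unigram_probs(sentences):
    """Relative frequency of each lowercased token in the retrieved sentences."""
    counts = Counter(word for s in sentences for word in s.lower().split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def generality_score(phrase, probs):
    """Assumed score: mean unigram probability of the phrase's words.

    A higher value means the phrase is more common in the retrieved corpus
    and therefore more general.
    """
    words = phrase.lower().split()
    return sum(probs.get(word, 0.0) for word in words) / len(words)


# Toy stand-ins for retrieved sentences (not the slide's examples).
retrieved = ["world cup final", "world cup chess", "alpine final"]
probs = unigram_probs(retrieved)
print(generality_score("world cup", probs), generality_score("chess", probs))
```

In this toy corpus "world cup" scores higher than "chess", matching the slide's intuition that a phrase appearing across many retrieved documents is the one that needs disambiguating.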

  34. Generating DQs • Combining ambiguous phrases in the user’s question with templates of all possible interrogative sentences • Ambiguous phrase: World Cup • Templates of interrogative sentences: What kind of * ? What year was * held ? • DQ candidate 1: What kind of World Cup? • DQ candidate 2: What year was the World Cup held?
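Candidate generation from templates can be sketched as a cross product of templates and ambiguous phrases. The two templates mirror the slide's examples; writing them with `{}` placeholders, placing the article "the" inside the second template, and the function name are assumptions made for illustration.

```python
# Templates mirror the slide's two examples; "{}" marks the phrase slot.
TEMPLATES = [
    "What kind of {}?",
    "What year was the {} held?",  # article placed in the template (assumption)
]


def generate_dq_candidates(phrases, templates=TEMPLATES):
    """Insert every ambiguous phrase into every interrogative template."""
    return [template.format(phrase) for template in templates for phrase in phrases]


for candidate in generate_dq_candidates(["World Cup"]):
    print(candidate)
```

Every candidate produced this way would still need to be ranked, since most template and phrase pairs do not form a sensible question.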

  35. Linguistic Appropriateness of Interrogative Sentences • The n-gram likelihood for interrogative sentences • Newspaper text with feature slots: Brazil[COUNTRY] won the World Cup of soccer[SPORTS] held in Japan[COUNTRY] and Korea[COUNTRY] in 2002[DATE]. • Quasi interrogative sentences are generated using grammar rules: Which country[COUNTRY] won the World Cup? The World Cup of what sport[SPORTS]? When[DATE] was the World Cup held? Where[COUNTRY] was the World Cup held?

  36. Frequency of Feature Slots • The n-gram likelihood for interrogative sentences • The frequency of feature slots - Feature slots appearing in the retrieved target are given a high score.

  37. Approach for Generating DQs - Templates of interrogative sentences: who, where, when, how, what, … - Let Smn be a DQ generated by inserting the n-th phrase into the m-th template. - What type of + World Cup ? - What year was + the World Cup + held ? - Candidates = templates × (nouns + noun-phrases) - DQ score H(Smn) is defined as follows:

  38. Indispensable Information Extraction from Recognition Results • Exclude words with recognition errors • Extract indispensable information • Compensate for indispensable but misrecognized words [Figure: numbered words of the recognition results]

  39. Screening Filter for Recognition Errors • A meaningful set of words is extracted from the original speech, excluding recognition errors, through automatic speech summarization. • Important words are sometimes dropped during summarization. • Indispensable information for extracting answers should be supplemented by users. [Figure: recognition result and screened result]

  40. Evaluation Experiments • Our ASR system using finite state transducers, SOLON (20k vocabulary size), transcribed 69 questions read aloud by seven male speakers. - 19 morphemes on average in each question - The sentences were grammatically correct and formally structured. - The mean word recognition accuracy for the questions was 76%. • The recognition results were screened using a speech summarization technique. • The questions were reconstructed using additional information queried by the DDQ module, and answers to them were given by the ODQA engine, SAIQA.

  41. Evaluation Results • Speakers: 7 males • Questions: 69 sentences • Word recognition accuracy: 76% • MRR w/o recognition errors: 0.43 [Figure: results for recognition results, screened recognition results (recognition errors removed), and questions reconstructed from the screened questions and additional information obtained through a single interaction]

  42. Conclusion • The DDQ (deriving disambiguating queries) module automatically generates queries for indispensable information using ambiguous phrases and templates of interrogative sentences. • Experimental results revealed the DQ’s potential to compensate for missing indispensable information to extract answers. • Future work will include an evaluation of the dialogue strategy in a spoken interactive ODQA system to assess how fast answers are extracted and how exact the answers are.