
Spoken Interactive Open Domain Question Answering System: SPIQA

Chiori Hori, Takaaki Hori, Hajime Tsukada and Hideki Isozaki

Speech Open Lab. and Intelligent Communication Lab.

NTT Communication Science Laboratories


Humanoid Robot

"I can walk!"  "I can see!"  "I can dance!"  "I can hear!"  "I can speak!"

"Let's have a conversation freely."


Domain and DB Structure for QA System

| input  | additional info | target domain: specific (SDQA) | target domain: open (ODQA)             |
|        | requirement     | knowledge DB (table-lookup)    | unstructured corpus (natural language) |
| text   | w/o addition    | CHAT-80                        | SAIQA, FALCON                          |
| text   | w/ addition     | MYCIN                          | VAQA                                   |
| speech | w/o addition    | Harpy, Hearsay-II              |                                        |
| speech | w/ addition     | JUPITER                        | SPIQA                                  |

addition: additional information requirement


QA System for Open Domain through Speech Interactions

User: "Which country won the World Cup?"
SPIQA (internally): Which World Cup??? I'm going to request additional information to disambiguate the user's question. Additional information, please!
SPIQA: "What kind of world cup?"
User: "Soccer!"
SPIQA: "When was the World Cup held?"
User: "2002"
SPIQA (internally): Got it!
SPIQA: "Brazil won the World Cup of soccer in 2002."


Spoken Interactive Open Domain QA System: SPIQA

System flow:
1. The user's first question is recognized by the ASR system, SOLON.
2. The recognition result goes to the ODQA engine, SAIQA, which returns answer hypotheses.
3. Answer derived? If yes, the answer sentence generator produces the answers, and the TTS system, FinalFluet, speaks the question and answer to the user.
4. If no, the DDQ generator produces a DDQ sentence (a disambiguating question). The user supplies additional information, the question reconstructor builds a reconstructed question from it, and the reconstructed question is sent back to the ODQA engine.

ODQA task

  • A target of the Text REtrieval Conference (TREC) by DARPA/NIST

  • Open Domain QA (ODQA)

    • Gives specific answers from a large, unannotated text corpus rather than a ranked list of documents

    • In response to a question written in natural language

    • Question-word questions: {who, where, when, what, why, which, whom, how}


ODQA approach

  • User’s intention classification (see the sketch after this list)
    • Interrogative → {CLASS of named entity (NE)}
      • Who → {PERSON}
      • Where → {LOCATION}
  • Relevant document retrieval
    • All documents related to each phrase in the question are retrieved.
  • NE extraction according to the user’s intention
    • Detected class → {NEs}
      • PERSON → {Bush, Clinton, Gore}
      • COUNTRY → {Japan, America, Italy}
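
A minimal sketch of this answer-type matching step (the mapping and class names here are illustrative, not SAIQA's actual inventory):

```python
# Sketch of ODQA answer-type matching: map the question word to an
# expected NE class, then keep only candidates of that class.
# The mapping below is hypothetical, for illustration only.
QUESTION_CLASS = {
    "who": "PERSON",
    "where": "LOCATION",
    "when": "DATE",
    "which country": "COUNTRY",
}

def filter_candidates(question_word, candidates):
    """Keep candidate entities whose NE class matches the question type."""
    expected = QUESTION_CLASS.get(question_word.lower())
    return [entity for entity, ne_class in candidates if ne_class == expected]

# Example: "Which country won the World Cup?"
candidates = [("Bush", "PERSON"), ("Japan", "COUNTRY"), ("Italy", "COUNTRY")]
print(filter_candidates("which country", candidates))  # ['Japan', 'Italy']
```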


ODQA evaluation

  • Multiple answer hypotheses are extracted and ranked:
    1. Bush
    2. Koizumi ← correct answer
    3. Clinton
    4. Obuchi
    5. Gore
  • Mean Reciprocal Rank (MRR) (a computation sketch follows below)
    • The correct answer appears at rank 2, so its reciprocal rank = 1/2; MRR is this value averaged over all test questions.
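
A minimal, self-contained sketch of the MRR computation, using the slide's example:

```python
# Mean Reciprocal Rank: each question contributes 1/rank of its first
# correct answer (and 0 if no correct answer is returned).

def mean_reciprocal_rank(ranked_lists, gold_answers):
    total = 0.0
    for hypotheses, gold in zip(ranked_lists, gold_answers):
        for rank, answer in enumerate(hypotheses, start=1):
            if answer == gold:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)

# The slide's example: "Koizumi" at rank 2 gives a reciprocal rank of 0.5.
hyps = [["Bush", "Koizumi", "Clinton", "Obuchi", "Gore"]]
print(mean_reciprocal_rank(hyps, ["Koizumi"]))  # 0.5
```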


Problems in Spoken Interactive ODQA

  • Speech recognition for open domain

  • QA for open domain

  • Interaction approach for ODQA


Problems in Spoken ODQA

  • Recognition errors

  • Incomplete sentences and word fragments in spontaneous speech

  • Enormous vocabulary size:

    1,800,000 (1.8M) morphemes

    Each entry is morpheme + pronunciation + POS/NE,

    e.g. Koizumi + ko-i-zu-mi + PERSON

  • Out-of-vocabulary words


Problems in ODQA

Ambiguous questions input by users

  • Interactions between human and machine are necessary.
  • The system asks questions of its own to resolve ambiguity in the user's question.
  • QA performance is improved by the user's answers to the system's queries.


Problems in Interactive ODQA

  • Dialogue scenarios cannot be prepared in advance at system-design time:
    • system queries for additional information
    • optimum interaction strategies for answer extraction



Very large vocabulary task

  • Experiment conditions
    • Acoustic model: read speech (ATR + ASJ + JNAS, about 20 hours), gender-dependent (female) model, 3000 states, 16 mixtures
    • Vocabulary size: 20K, 65K, 200K, 1M, 1.85M
    • Language model: n-gram trained on 10 years of newspaper text + questions for QA (other than the test sets)
    • Decoder: SOLON, approximation in on-the-fly composition [Hori 2004]
    • Test sets: 1 female speaker, questions for QA, 11,419 utterances
      • 2,000 questions with 20 morphemes
      • 2,000 questions with 5 morphemes
      • 7,419 isolated words



Word Accuracy

[Chart: word accuracy vs. beam width (score-histogram)]


Character Accuracy

[Chart: character accuracy vs. beam width (score-histogram)]


Decoding Speed (Real Time Factor)

[Chart: real time factor vs. beam width (score-histogram)]

CPU: Opteron 246 2GHz


Weighted Finite-State Transducer: WFST

  • Morphological analysis [Pereira 1994]
  • Machine translation [Oncina 1994]
  • Syntactic analysis [Alshawi 1996]
  • Speech recognition [Mohri 1997, Willett 2000]

[Diagram: a small example WFST with states 0, 1, 2 and final state 3/1.1; each state transition is labeled <input>:<output>/weight, e.g. a:x/0.8, b:y/2.5, c:z/0.3, a:x/1.0, a:e/1.1, b:v/0]

WFSTs in Speech Recognition

  • Advantages

    • Yield a unified framework for describing models

    • Integrate different models into a single model via composition operations (see the toy sketch after this list)

    • Improve search efficiency via optimization algorithms

  • Problems

    • Composition of complex models generates a huge WFST

    • Search space increases, and huge memory is required

  • Solution

    • Efficient algorithm using on-the-fly composition


WFST-based Speech Recognition

[Diagram: the recognition cascade maps a feature vector sequence to a word sequence through component WFSTs: HMM, triphone network, lexicon, and 3-gram LM (feature vectors → triphone seq. → phone seq. → word seq. → word seq.). The components are combined by composition and optimization into a single transducer used by the decoder (Mohri 1997~).]

On-the-fly Composition

[Diagram: the HMM, triphone network, and lexicon are composed and optimized offline into WFST A; the 3-gram LM is kept separate as WFST B; A and B are composed during decoding.]

Composition during decoding: memory is saved, but search efficiency decreases (a toy sketch of the lazy expansion follows below).
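
A toy sketch of the lazy-expansion idea behind on-the-fly composition (same simplified arc representation as above; not SOLON's implementation): composite states (a, b) are created only when the search reaches them, instead of materializing the full product machine up front.

```python
# Expand arcs of the composed machine A o B from one (a, b) state on
# demand. A decoder would call this only for states its beam reaches,
# caching results, so the full product is never built.

def lazy_arcs(state, arcs_a, arcs_b):
    a, b = state
    for (pa, qa, i, x, wa) in arcs_a:
        if pa != a:
            continue
        for (pb, qb, y, o, wb) in arcs_b:
            if pb == b and x == y:
                yield ((qa, qb), i, o, wa + wb)

# Only the composite start state (0, 0) is expanded here:
A = [(0, 1, "k", "cat", 0.5), (0, 2, "d", "dog", 0.7)]
B = [(0, 1, "cat", "cat", 1.2), (0, 1, "dog", "dog", 0.9)]
for nxt in lazy_arcs((0, 0), A, B):
    print(nxt)  # ((1, 1), 'k', 'cat', 1.7) and ((2, 1), 'd', 'dog', 1.6)
```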


A Pair of WFSTs Used in On-the-fly Composition

[Diagram: the first WFST maps HMM state sequences (s1 ... s11) to word labels A, B, C; the second WFST is the language model, with arcs weighted by n-gram probabilities such as A/P(A), C/P(C|A), C/P(C|AC), B/P(B|CC).]


Standard On-the-fly Composition

[Diagram: hypotheses of the first WFST expanded over time, and the corresponding hypotheses in on-the-fly composition; each composite state pairs a first-WFST state with a language-model state, e.g. (2,1), (6,3), (5,4), so hypotheses multiply with every reachable state pair.]

Approximation in On-the-fly Composition

[Diagram: as above, but first-WFST hypotheses that carry the same output word are merged in the composed search, e.g. s4,s6 : C and s5,s7 : C, so fewer composite hypotheses are expanded.]

Proposed On-the-fly Composition

[Diagram: the decoder searches the first WFST and attaches language-model scores along an on-the-fly rescoring pass; composite states such as (2,1), (5,3), (6,4) are created only on that pass rather than for every search hypothesis.]

Results of the CSJ Task

  • CSJ Benchmark test 1 (10 academic presentations)

[Results chart]

CPU: Xeon 3.0GHz


Results of the Very Large Vocabulary Task

  • 2,000 utterances in the spoken interactive QA domain

  • Vocabulary size: 65K, 200K, 1M, 1.8M

[Results chart]

CPU: Opteron 246 2GHz


Distinguishing among Multiple Hypotheses

  • Suppose documents related to the keyword "World Cup" include the following information:

[Table of retrieved facts omitted from the transcript]

  • Additional information regarding GAMES, COUNTRY, and DATE can assist in clarifying the choice of answers.


Disambiguating Ambiguous Questions

Fully specified question: "Which country won the World Cup of soccer held in Japan and Korea in 2002?"

User's question: "Which country won the World Cup?"

  • Indispensable information is not always present in the user's question.
  • The missing information consists of modifiers of phrases in the user's question (the feature slots).


Deriving Disambiguating Query: DDQ

  • Detecting the ambiguous phrase
    • the phrase that needs more additional information
  • Generating interrogative sentences
    • combining interrogatives with the ambiguous phrase
  • Selecting the most appropriate disambiguating query
    • by linguistic appropriateness

Ambiguous Phrase Detection

An ambiguous phrase needs more additional information.

  • Structural ambiguity in the user's question: phrases with fewer modifying phrases
  • General ambiguity in the retrieval target: phrases appearing more frequently in the corpus

Example: "Which country in South America won in the World Cup?"

  • Generality ambiguity: the unigram probability of each content word (cont) w, estimated on the retrieved corpus, is used to calculate a generality ambiguity score.
  • Structural ambiguity: the dependency probability is used to calculate a structural ambiguity score. D(Pi, Pn) is the probability that phrase Pn will be modified by phrase Pi, which can be calculated using a Stochastic Dependency Context-Free Grammar (SDCFG).

Retrieved sentences for "World Cup" span several senses:
  - Ronaldo scores twice to give Brazil a 2-0 victory over Germany in the World Cup final.
  - Anand, Xu Yuhua retain titles at World Cup Chess Championship.
  - Renate Goetschl and Hermann Maier are the overall champions after the World Cup alpine finals.
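
The formulas themselves did not survive the transcript. A plausible formalization consistent with the descriptions above (the normalization and the interpolation weight λ are assumptions, not SPIQA's published definitions):

```latex
% Generality ambiguity of phrase P_n: high when its content words (cont)
% are frequent, i.e. generic, in the retrieved corpus.
\mathrm{Gen}(P_n) = \frac{1}{|P_n|} \sum_{w \in P_n \cap \mathrm{cont}} P_{\mathrm{uni}}(w)

% Structural ambiguity: high when P_n is unlikely to be modified by the
% other phrases of the question; D(P_i, P_n) is the SDCFG probability
% that P_i modifies P_n.
\mathrm{Str}(P_n) = 1 - \sum_{i \neq n} D(P_i, P_n)

% Combined ambiguity score with an assumed interpolation weight:
A(P_n) = \lambda \, \mathrm{Gen}(P_n) + (1 - \lambda) \, \mathrm{Str}(P_n)
```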


Generating DQs

Combine ambiguous phrases in the user's question with templates of all possible interrogative sentences (a sketch of this combination step follows below).

  Ambiguous phrase: "World Cup"
  Templates of interrogative sentences: "What kind of ___?" / "What year was ___ held?"

  DQ candidate 1: "What kind of World Cup?"
  DQ candidate 2: "What year was the World Cup held?"
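
A minimal sketch of the combination step (the template strings and slot marker are illustrative, not SPIQA's actual inventory):

```python
# Every ambiguous phrase is inserted into every interrogative template;
# the slot is shown as "{}". Unnatural combinations are expected: the
# later selection step (linguistic appropriateness) filters them out.

TEMPLATES = [
    "What kind of {}?",
    "What year was {} held?",
    "Where was {} held?",
]

def generate_dq_candidates(ambiguous_phrases):
    return [t.format(p) for p in ambiguous_phrases for t in TEMPLATES]

for dq in generate_dq_candidates(["the World Cup"]):
    print(dq)
# What kind of the World Cup?      <- awkward; selection should reject it
# What year was the World Cup held?
# Where was the World Cup held?
```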


Linguistic Appropriateness of Interrogative Sentences

Candidates are ranked by the n-gram likelihood of the interrogative sentence.

Newspaper text:
  "Brazil[COUNTRY] won the World Cup of soccer[SPORTS] held in Japan[COUNTRY] and Korea[COUNTRY] in 2002[DATE]."

Quasi-interrogative sentences are generated from such text using grammar rules:
  "Which country[COUNTRY] won the World Cup?"
  "The World Cup of what sport[SPORTS]?"
  "When[DATE] was the World Cup held?"
  "Where[COUNTRY] was the World Cup held?"

The bracketed tags are the feature slots.
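
A minimal sketch of the likelihood computation (a bigram model with made-up probabilities; SPIQA's actual model would be trained on the quasi-interrogative sentences described above):

```python
# Score a candidate DQ by its log-likelihood under a bigram model.
import math

def bigram_log_likelihood(sentence, bigram_prob):
    words = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log(bigram_prob.get((w1, w2), 1e-6))  # tiny floor for unseen pairs
        for w1, w2 in zip(words, words[1:])
    )

# Illustrative probabilities: fluent candidates accumulate higher scores.
probs = {("<s>", "What"): 0.3, ("What", "year"): 0.1, ("year", "was"): 0.4}
print(bigram_log_likelihood("What year was", probs))
```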


Frequency of Feature Slots

  • The n-gram likelihood for interrogative sentences
  • The frequency of feature slots
    • Feature slots that appear frequently in the retrieval target are given a high score.


Approach for Generating DQs

  - Templates of interrogative sentences: who, where, when, how, what, ...
  - Let Smn be the DQ generated by inserting the n-th phrase into the m-th template:
    - "What type of" + "World Cup" + "?"
    - "What year was" + "the World Cup" + "held?"
  - Candidates = templates × (nouns + noun phrases)
  - The DQ score H(Smn) is defined as follows:
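
The definition itself is missing from the transcript. Given the preceding slides, a plausible form combines the ambiguity of the inserted phrase, the n-gram appropriateness of the sentence, and the feature-slot frequency (the weights α, β, γ and the exact combination are assumptions):

```latex
% Hypothetical DQ score: A(P_n) is the ambiguity of the inserted phrase,
% P_ngram the likelihood of the candidate under the model of
% quasi-interrogative sentences, and F(slot) the feature-slot frequency
% in the retrieval target.
H(S_{mn}) = \alpha \, A(P_n) + \beta \, \log P_{\mathrm{ngram}}(S_{mn}) + \gamma \, F(\mathrm{slot}(S_{mn}))
```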


Indispensable Information Extraction from Recognition Results

[Diagram: a numbered word sequence from the recognition results, with recognition errors and indispensable words marked]

  • Exclude words with recognition errors.
  • Extract indispensable information.
  • Compensate for indispensable but misrecognized words.


Screening Filter for Recognition Errors

A meaningful set of words is extracted from the original speech, excluding recognition errors, through automatic speech summarization.

[Diagram: the recognition result (words 1-10) is screened to the subset 1, 2, 3, 4, 5, 8, 9, 10]

Important words are sometimes dropped during summarization, so indispensable information for extracting answers should be supplemented by users.


Evaluation Experiments

  • Our WFST-based ASR system, SOLON (20K vocabulary), transcribed 69 questions read aloud by seven male speakers.
    - 19 morphemes per question on average
    - The sentences were grammatically correct and formally structured.
    - The mean word recognition accuracy for the questions was 76%.
  • The recognition results were screened with the speech summarization technique.
  • Answers for the questions, reconstructed using the additional information queried by the DDQ module, were given by the ODQA engine, SAIQA.


Evaluation Results

Conditions: 7 male speakers, 69 questions, word recognition accuracy 76%.

Inputs compared:
  • raw recognition results
  • screened recognition results (recognition errors removed)
  • reconstructed questions, built from the screened questions plus additional information obtained through a single interaction

Reference: MRR without recognition errors = 0.43

[Chart: MRR for each input condition]


Conclusion

  • The DDQ (deriving disambiguating queries) module automatically generates queries for indispensable information using ambiguous phrases and templates of interrogative sentences.

  • Experimental results revealed the DQs' potential to compensate for missing indispensable information when extracting answers.

  • Future work will include an evaluation of the dialogue strategy in a spoken interactive ODQA system, to assess how quickly answers are extracted and how accurate they are.

