
Spoken Interactive Open Domain Question Answering System: SPIQA

Chiori Hori, Takaaki Hori, Hajime Tsukada and Hideki Isozaki

Speech Open Lab. and Intelligent Communication Lab.

NTT Communication Science Laboratories

Humanoid Robot

I can walk! I can see! I can dance! I can hear! I can speak!

Let's have a conversation freely.

Domain and DB Structure for QA System

Target domain: specific (SDQA) vs. open (ODQA)
Data structure: knowledge DB (table-lookup) vs. unstructured corpus (natural language)

| input                      | specific (SDQA): knowledge DB (table-lookup) | open (ODQA): unstructured corpus (natural language) |
| text input, w/o addition   | CHAT-80                                      | SAIQA, FALCON                                       |
| text input, w/ addition    | MYCIN                                        | VAQA                                                |
| speech input, w/o addition | Harpy, Hearsay-II                            | SPIQA                                               |
| speech input, w/ addition  | JUPITER                                      | SPIQA                                               |

addition: additional information requirement

QA System for Open Domain through Speech Interactions

SPIQA requests additional information to disambiguate the user's question:

User:  Which country won the World Cup?
SPIQA: Additional information, please! Which World Cup? What kind of world cup? When was the World Cup held?
User:  Soccer! 2002!
SPIQA: Got it! Brazil won the World Cup of soccer in 2002.

Spoken Interactive Open Domain QA System: SPIQA

[System diagram] The user's first question is transcribed by the ASR system (SOLON) and passed to the ODQA engine (SAIQA), which returns answer hypotheses. If an answer is derived, the answer sentence generator produces the answers, and the TTS system (FinalFluet) speaks the question and answer back to the user. If not, the DDQ generator produces a disambiguating query (DDQ sentence); the question reconstructor then combines the user's additional information with the original question into a reconstructed question, which is fed back to the ODQA engine.

ODQA Task
  • A target of the Text REtrieval Conference (TREC) run by DARPA/NIST
  • Open Domain QA (ODQA)
      • Gives specific answers from a large, unannotated text corpus rather than a ranked list of documents
      • In response to a question written in natural language
      • Question-word questions: {who, where, when, what, why, which, whom, how}
ODQA Approach
  • User's intention classification (interrogative → CLASS of named entity (NE)):
    • Who → {PERSON}
    • Where → {LOCATION}
  • Relevant document retrieval
    • All documents related to each phrase in the question are retrieved
  • NE extraction according to the user's intention (detected class → {NE}):
    • PERSON → {Bush, Clinton, Gore}
    • COUNTRY → {Japan, America, Italy}
  • A sketch of this pipeline appears below.
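A minimal sketch of the pipeline above, assuming a hand-written interrogative-to-NE-class table; the mapping, entity list, and function names are illustrative, not SAIQA's actual interface:

```python
# Classify the user's intention from the interrogative, then keep
# only named entities of the matching class.

INTERROGATIVE_TO_NE_CLASS = {
    "who": "PERSON",
    "where": "LOCATION",
    "when": "DATE",
    "which country": "COUNTRY",
}

def classify_intention(question: str) -> str:
    """Map the question word onto the expected named-entity class."""
    q = question.lower()
    # Try longer patterns first so "which country" wins over "who"/"which".
    for pattern, ne_class in sorted(INTERROGATIVE_TO_NE_CLASS.items(),
                                    key=lambda kv: -len(kv[0])):
        if pattern in q:
            return ne_class
    return "UNKNOWN"

def extract_answers(question, named_entities):
    """Keep entities whose NE class matches the detected intention."""
    target = classify_intention(question)
    return [text for text, ne_class in named_entities if ne_class == target]

# Entities as (surface form, NE class) pairs from retrieved documents.
entities = [("Bush", "PERSON"), ("Japan", "COUNTRY"), ("Koizumi", "PERSON")]
print(extract_answers("Who is the prime minister of Japan?", entities))
# -> ['Bush', 'Koizumi']
```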
ODQA Evaluation
  • Multiple answer hypotheses are extracted, e.g.:
  • 1. Bush
  • 2. Koizumi ← correct answer
  • 3. Clinton
  • 4. Obuchi
  • 5. Gore
  • Mean Reciprocal Rank (MRR): the average, over all questions, of the reciprocal rank of the first correct answer
    • Here the correct answer is ranked second, so the reciprocal rank = 1/2 (see the sketch below)
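A short sketch of the metric, reproducing the slide's example:

```python
# Mean reciprocal rank: each question scores 1/r, where r is the rank
# of the first correct answer hypothesis (0 if none is correct), and
# MRR averages this over all questions.

def reciprocal_rank(hypotheses, correct):
    for rank, hyp in enumerate(hypotheses, start=1):
        if hyp in correct:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(results):
    """results: list of (ranked hypotheses, set of correct answers)."""
    return sum(reciprocal_rank(h, c) for h, c in results) / len(results)

# The slide's example: "Koizumi" is correct and ranked second -> RR = 1/2.
hyps = ["Bush", "Koizumi", "Clinton", "Obuchi", "Gore"]
print(reciprocal_rank(hyps, {"Koizumi"}))   # 0.5
```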
problems in spoken interactive odqa
Problems in Spoken Interactive ODQA
  • Speech recognition for open domain
  • QA for open domain
  • Interaction approach for ODQA
Problems in Spoken ODQA
  • Recognition errors
  • Incomplete sentences and word fragments in spontaneous speech
  • Enormous vocabulary: 1,800,000 (1.8M) morphemes, where each entry bundles morpheme+pronunciation+POS/NE, e.g. Koizumi+ko-i-zu-mi+PERSON (parsed in the sketch below)
  • Out-of-vocabulary words
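A small sketch of splitting such a vocabulary entry back into its fields; the `LexiconEntry` type is illustrative:

```python
# Each of the 1.8M entries packs morpheme, pronunciation, and POS/NE
# tag into one token joined with "+", e.g. "Koizumi+ko-i-zu-mi+PERSON".

from typing import NamedTuple

class LexiconEntry(NamedTuple):
    morpheme: str
    pronunciation: str
    tag: str  # POS or named-entity class

def parse_entry(token: str) -> LexiconEntry:
    morpheme, pronunciation, tag = token.split("+")
    return LexiconEntry(morpheme, pronunciation, tag)

print(parse_entry("Koizumi+ko-i-zu-mi+PERSON"))
# LexiconEntry(morpheme='Koizumi', pronunciation='ko-i-zu-mi', tag='PERSON')
```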
Problems in ODQA

Ambiguous questions input by users:

  • Interactions between human and machine become necessary
  • The system asks questions of its own to resolve ambiguity in the user's question
  • QA performance improves with the user's answers to the system's queries
Problems in Interactive ODQA
  • Dialogue scenarios cannot be prepared in advance when designing the system:
      • system queries for additional information
      • optimum interaction strategies for answer extraction
Very Large Vocabulary Task
  • Experimental conditions
    • Acoustic model: read speech (ATR+ASJ+JNAS, about 20 hours), gender-dependent (female) model, 3000 states, 16 mixtures
    • Vocabulary size: 20K, 65K, 200K, 1M, 1.85M
    • Language model: n-gram trained on 10 years of newspaper text plus QA questions (excluding the test sets)
    • Decoder: SOLON, using approximation in on-the-fly composition [Hori 2004]
    • Test sets: 1 female speaker, 11,419 QA-question utterances:
      • 2,000 questions with 20 morphemes
      • 2,000 questions with 5 morphemes
      • 7,419 isolated words

Word Accuracy

[Plot: word accuracy vs. beam width (score-histogram)]

Character Accuracy

[Plot: character accuracy vs. beam width (score-histogram)]

Decoding Speed (Real Time Factor)

[Plot: real-time factor vs. beam width (score-histogram); CPU: Opteron 246 2GHz]

Weighted Finite-State Transducer: WFST

  • Morphological analysis [Pereira 1994]
  • Machine translation [Oncina 1994]
  • Syntactic analysis [Alshawi 1996]
  • Speech recognition [Mohri 1997, Willett 2000]

[Diagram: example WFST with states 0, 1, 2 and final state 3 (exit weight 1.1); state transitions are labeled <input>:<output>/weight, e.g. a:x/0.8, b:y/2.5, c:z/0.3, a:x/1.0, a:e/1.1, b:v/0. A minimal data-structure sketch follows.]
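As a data structure, a WFST is just states plus labeled, weighted arcs. A minimal sketch; the arc labels are taken from the diagram, but the connectivity here is an assumption for illustration:

```python
# A WFST as plain data: arcs labeled <input>:<output>/weight plus
# final states with exit weights.

from collections import defaultdict

class WFST:
    def __init__(self):
        self.arcs = defaultdict(list)   # state -> [(in, out, weight, next_state)]
        self.finals = {}                # state -> final (exit) weight

    def add_arc(self, src, ilabel, olabel, weight, dst):
        self.arcs[src].append((ilabel, olabel, weight, dst))

    def set_final(self, state, weight):
        self.finals[state] = weight

t = WFST()
t.add_arc(0, "a", "x", 0.8, 1)   # a:x/0.8
t.add_arc(0, "b", "y", 2.5, 2)   # b:y/2.5
t.add_arc(1, "c", "z", 0.3, 3)   # c:z/0.3
t.add_arc(2, "a", "x", 1.0, 3)   # a:x/1.0
t.set_final(3, 1.1)              # final state 3 with exit weight 1.1
```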

WFSTs in Speech Recognition
  • Advantages
    • Yield a unified framework for describing models
    • Integrate different models into a single model via composition operations
    • Improve search efficiency via optimization algorithms
  • Problems
    • Composing complex models generates a huge WFST
    • The search space grows, and huge amounts of memory are required
  • Solution
    • An efficient algorithm using on-the-fly composition (the composition operation itself is sketched below)
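A textbook sketch of the composition operation, reusing the `WFST` class above; it ignores epsilon transitions and assumes the tropical semiring (weights add), whereas real toolkits also handle epsilon filters and other semirings:

```python
# Compose two WFSTs A and B (epsilon-free case): a composed state is a
# pair (a, b); for arcs i:m/w1 in A and m:o/w2 in B the result gets an
# arc i:o/(w1+w2). Only pair states reachable from the start are built.

from collections import deque

def compose(A, B, start=(0, 0)):
    result = WFST()                 # the WFST class sketched above
    seen, queue = {start}, deque([start])
    while queue:
        a, b = queue.popleft()
        for i, m, w1, na in A.arcs.get(a, []):
            for m2, o, w2, nb in B.arcs.get(b, []):
                if m != m2:
                    continue        # output of A must match input of B
                result.add_arc((a, b), i, o, w1 + w2, (na, nb))
                if (na, nb) not in seen:
                    seen.add((na, nb))
                    queue.append((na, nb))
        if a in A.finals and b in B.finals:
            result.set_final((a, b), A.finals[a] + B.finals[b])
    return result
```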
WFST-Based Speech Recognition

[Diagram: feature vector sequence → HMM → triphone sequence → triphone network → phone sequence → lexicon → word sequence → 3-gram → word sequence. All component WFSTs are composed and optimized offline into a single transducer used by the decoder (Mohri 1997~).]

On-the-Fly Composition

[Diagram: the HMM, triphone network, and lexicon are composed and optimized offline into WFST A, while the 3-gram language model is kept as a separate WFST B.]

Composition is performed during decoding: memory is saved, but search efficiency decreases. A lazy-expansion sketch follows.
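The trade-off can be made concrete with a lazy variant of the `compose` sketch above: composed arcs are generated only when the decoder asks for them, so only the visited part of A ∘ B ever exists in memory. This is illustrative, not SOLON's actual algorithm:

```python
# On-the-fly (lazy) composition sketch: nothing is precomputed; the
# decoder requests the outgoing arcs of a composed state (a, b) only
# when a hypothesis reaches it.

def lazy_arcs(A, B, state):
    """Generate composed arcs out of pair state (a, b) on demand."""
    a, b = state
    for i, m, w1, na in A.arcs.get(a, []):
        for m2, o, w2, nb in B.arcs.get(b, []):
            if m == m2:
                yield i, o, w1 + w2, (na, nb)

# A beam-search decoder would call lazy_arcs(A, B, s) for each active
# hypothesis s, saving memory at the cost of repeated arc matching --
# the efficiency loss the slide refers to.
```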

A Pair of WFSTs Used in On-the-Fly Composition

[Diagram: the first WFST maps HMM state sequences to word sequences via arcs such as s2:A, s3:B, s6:C, s7:C plus epsilon-output arcs (s1:e, s4:e, s5:e, s8:e, s9:e, s10:e, s11:e); the second WFST is the language model, whose arcs carry n-gram probabilities such as A/P(A), B/P(B), C/P(C|A), C/P(C|B), B/P(B|AC), C/P(C|AC), B/P(B|CC), C/P(C|CC), C/P(C|CB), C/P(C|BC).]

Standard On-the-Fly Composition

[Diagram: as hypotheses of the first WFST are expanded over time, each is paired with a language-model state, producing composed states such as (1,0), (2,1), (3,2), (5,3), (6,4); states of the first WFST are duplicated for every reachable state of the second WFST, enlarging the search space.]

Approximation in On-the-Fly Composition

[Diagram: hypotheses sharing states of the first WFST are merged across language-model states (e.g. the merged arcs s4,s6:C, s4,s7:C, s5,s6:C, s5,s7:C), so fewer composed hypotheses are expanded than in the standard method.]

Proposed On-the-Fly Composition

[Diagram: the decoder expands hypotheses on the first WFST only, and an on-the-fly rescoring pass applies the second WFST's weights to the word labels (A, B, C) along the surviving composed states, e.g. (0,0), (2,1), (3,2), (5,3), (6,3), (5,4), (6,4).]

Results of the CSJ Task
  • CSJ Benchmark test 1 (10 academic presentations)

CPU: Xeon 3.0GHz

Results of the Very Large Vocabulary Task
  • 2,000 utterances in spoken interactive QA domain
  • Vocabulary size: 65K, 200K, 1M, 1.8M

CPU: Opteron 246 2GHz

Distinguishing among Multiple Hypotheses
  • Suppose the documents retrieved for the keywords "World Cup" include the following information: [table not preserved in the transcript]
  • Additional information regarding GAMES, COUNTRY, or DATE can assist in narrowing down the choice of answers.
Disambiguating Ambiguous Questions

Fully specified question (feature slots: SPORTS, COUNTRY, DATE):
Which country won the World Cup of soccer held in Japan and Korea in 2002?

User's question:
Which country won the World Cup?

  • Indispensable information is not always present in the user's question.
  • The missing information consists of modifiers of phrases in the user's question.

Deriving Disambiguating Query: DDQ
  • Detecting ambiguous phrases
      • phrases that need more additional information
  • Generating interrogative sentences
      • combining interrogatives with the ambiguous phrase
  • Selecting the most appropriate disambiguating query
      • by linguistic appropriateness
Ambiguous Phrase Detection

An ambiguous phrase is one that needs more additional information.

  • Structural ambiguity in the user's question: phrases with fewer modifying phrases
  • Generality ambiguity in the retrieval target: phrases appearing more frequently in the corpus

Example question: Which country in South America won in the World Cup?

Structural ambiguity: the dependency probability is used to calculate a structural ambiguity score. D(Pi, Pn) is the probability that phrase Pn is modified by phrase Pi, which can be calculated using a Stochastic Dependency Context-Free Grammar (SDCFG).

Generality ambiguity: the unigram probability of a content word w ("cont"), estimated on the retrieved corpus, is used to calculate a generality ambiguity score.

Example retrieved sentences:
- Ronaldo scores twice to give Brazil a 2-0 victory over Germany in the World Cup final.
- Anand, Xu Yuhua retain titles at World Cup Chess Championship.
- Renate Goetschl and Hermann Maier are the overall champions after the World Cup alpine finals.

Both scores are sketched below.
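A hedged sketch of the two scores; the SDCFG model itself is not reproduced, `dep_prob` is a stand-in for it, and the exact normalizations in the original system may differ:

```python
# Generality ambiguity: mean unigram probability of a phrase's content
# words in the retrieved corpus (frequent phrases like "World Cup" are
# general, hence ambiguous). Structural ambiguity: a phrase with little
# modifier mass under the dependency model is underspecified.

from collections import Counter

def generality_score(content_words, corpus_tokens):
    """Higher score = more general, hence more ambiguous."""
    counts, total = Counter(corpus_tokens), len(corpus_tokens)
    probs = [counts[w] / total for w in content_words]
    return sum(probs) / len(probs)

def structural_score(phrase, other_phrases, dep_prob):
    """dep_prob(Pi, Pn): probability that Pn is modified by Pi.
    A small total means few modifiers, i.e. more ambiguity."""
    return sum(dep_prob(p, phrase) for p in other_phrases)
```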

Generating DQs

Ambiguous phrases in the user's question are combined with templates of all possible interrogative sentences (see the sketch below).

Ambiguous phrase: "World Cup"
Templates of interrogative sentences: "What kind of ___?", "What year was ___ held?"

DQ candidate 1: What kind of World Cup?
DQ candidate 2: What year was the World Cup held?
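A minimal sketch of the combination step; the templates are simplified English stand-ins with a `{}` slot. Ill-formed candidates are expected at this stage and are down-weighted later by the linguistic appropriateness score:

```python
# Every ambiguous phrase is inserted into every interrogative
# template; all combinations become DQ candidates for scoring.

TEMPLATES = [
    "What kind of {}?",
    "What year was {} held?",
]

def generate_dq_candidates(ambiguous_phrases):
    return [t.format(p) for p in ambiguous_phrases for t in TEMPLATES]

print(generate_dq_candidates(["the World Cup"]))
# ['What kind of the World Cup?', 'What year was the World Cup held?']
```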

Linguistic Appropriateness of Interrogative Sentences

The n-gram likelihood of an interrogative sentence is used as its appropriateness score. The n-gram model is trained on quasi-interrogative sentences generated from newspaper text using grammar rules.

Newspaper text:
Brazil[COUNTRY] won the World Cup of soccer[SPORTS] held in Japan[COUNTRY] and Korea[COUNTRY] in 2002[DATE].

Generated quasi-interrogative sentences (feature slots in brackets):
- Which country[COUNTRY] won the World Cup?
- The World Cup of what sport[SPORTS]?
- When[DATE] was the World Cup held?
- Where[COUNTRY] was the World Cup held?

Frequency of Feature Slots
  • The n-gram likelihood for interrogative sentences
  • The frequency of feature slots
    • Feature slots that appear frequently in the retrieval target are given a high score.

Approach for Generating DQs

- Templates of interrogative sentences: who, where, when, how, what, ...
- Let Smn be the DQ generated by inserting the n-th phrase into the m-th template:
  - "What type of" + "World Cup" + "?"
  - "What year was" + "the World Cup" + "held?"
- Candidates = templates × (nouns + noun phrases)
- The DQ score H(Smn) is defined as follows: [equation not preserved in the transcript; a sketch of the combination appears below]
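Since the definition of H(Smn) did not survive the transcript, the sketch below only combines the three signals the slides name, with invented weights and a linear form; it is not the paper's actual formula:

```python
# Score a DQ candidate from (1) its n-gram log-likelihood, (2) the
# ambiguity score of the inserted phrase, and (3) the log-frequency of
# the template's feature slot in the retrieval target. The combination
# and the weights w are assumptions for illustration.

def dq_score(ngram_loglik, phrase_ambiguity, slot_logfreq,
             w=(1.0, 1.0, 1.0)):
    return w[0] * ngram_loglik + w[1] * phrase_ambiguity + w[2] * slot_logfreq

def best_dq(scored_candidates):
    """scored_candidates: list of (sentence, score) pairs."""
    return max(scored_candidates, key=lambda c: c[1])[0]
```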

Indispensable Information Extraction from Recognition Results

[Diagram: word positions 1-10 in the recognition result, with a subset (e.g. 1, 2, 5, 8, 9) carried forward]

  • Exclude words with recognition errors
  • Extract indispensable information
  • Compensate for indispensable but misrecognized words
Screening Filter for Recognition Errors

A meaningful set of words is extracted from the original speech, excluding recognition errors, through automatic speech summarization.

[Diagram: word positions 1-10 in the recognition result and the smaller screened result]

Important words are sometimes dropped during summarization, so indispensable information for extracting answers must be supplemented by the user.

Evaluation Experiments

  • Our WFST-based ASR system, SOLON (20k vocabulary), transcribed 69 questions read aloud by seven male speakers.
    • 19 morphemes per question on average
    • The sentences were grammatically correct and formally structured.
    • The mean word recognition accuracy for the questions was 76%.
  • The recognition results were screened using the speech summarization technique.
  • Answers to the questions, reconstructed using the additional information queried by the DDQ module, were produced by the ODQA engine SAIQA.
Evaluation Results

MRR without recognition errors: 0.43

[Chart: MRR for (1) the raw recognition results, (2) the screened recognition results (recognition errors removed), and (3) the questions reconstructed from the screened questions plus additional information obtained through a single interaction]

Speakers: 7 males
Questions: 69 sentences
Word recognition accuracy: 76%

Conclusion

  • The DDQ (deriving disambiguating queries) module automatically generates queries for indispensable information from ambiguous phrases and templates of interrogative sentences.
  • Experimental results revealed the DQs' potential to compensate for missing information that is indispensable for extracting answers.
  • Future work will include an evaluation of the dialogue strategy in a spoken interactive ODQA system, assessing how quickly answers are extracted and how accurate they are.