Alyssa at TREC 2006: Statistically-Inspired Q&A

Andreas Merkel¹, Dan Shen¹, Jochen L. Leidner¹,², Dietrich Klakow¹

[email protected]

1) Spoken Language Systems, Saarland University, 66123 Saarbrücken, Germany.

2) Linguit Ltd.

  • Query Construction

  • Expands the query by repeating the topic multiple times

  • Repeating the topic twice is sufficient
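The query-expansion step above can be sketched as follows. This is a minimal illustration of repeating the series topic in the query; the function name and example terms are hypothetical, not from the poster.

```python
def build_query(question_terms, topic, repeats=2):
    """Expand a question's terms by appending the series topic.

    The poster reports that repeating the topic twice is sufficient;
    further repetitions add no benefit. (Illustrative sketch only.)
    """
    return question_terms + [topic] * repeats

# Example: a TREC-style series question about a topic.
query = build_query(["when", "founded"], "Harvard University", repeats=2)
# query == ["when", "founded", "Harvard University", "Harvard University"]
```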

  • Introduction

  • An information-theoretically well-founded approach to open-domain QA

  • Objective: long-term experimental platform

    • flexibility

    • modularity

  • Uses a cascade of:

    • LM-based document retrieval

    • LM-based sentence retrieval

    • maximum-entropy-based answer extraction over a dependency-relation representation

  • Document Retrieval

  • Integrates the Lemur IR toolkit

  • Uses Language Model based approach:

    • Bayesian smoothing with Dirichlet priors

    • with stemming and no stop-word removal

  • Retrieves the top 60 documents, which contain about 90% of the answers
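The retrieval model above (query likelihood with Bayesian smoothing via Dirichlet priors) can be sketched like this. The value mu=2000 is a common default in the literature, not a value reported on the poster, and the toy documents are invented for illustration.

```python
import math
from collections import Counter

def dirichlet_score(query, doc, collection_lm, mu=2000.0):
    """Query-likelihood score with Dirichlet-prior smoothing:
        p(w|D) = (c(w, D) + mu * p(w|C)) / (|D| + mu)
    collection_lm maps words to collection-level probabilities p(w|C).
    """
    tf = Counter(doc)
    dlen = len(doc)
    score = 0.0
    for w in query:
        p = (tf[w] + mu * collection_lm.get(w, 1e-9)) / (dlen + mu)
        score += math.log(p)
    return score

# Rank two toy documents for a query (hypothetical data).
coll = {"saarland": 0.001, "university": 0.01, "qa": 0.002}
d1 = ["saarland", "university", "qa", "system"]
d2 = ["university", "press", "release"]
q = ["saarland", "qa"]
ranked = sorted([("d1", d1), ("d2", d2)],
                key=lambda nd: dirichlet_score(q, nd[1], coll),
                reverse=True)
# ranked[0][0] == "d1" (the document containing both query words)
```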

  • Query Analysis

  • Topic Resolution

    • a simple anaphora resolution strategy

  • Expected Answer Type Extraction

    • NE taxonomy1

    • 6 coarse and 50 fine-grained NE types

  • Question Pattern Matching

    • Maps highly frequent questions to classes

    • 92 questions were assigned a question class

  • Dependency Relation Path Extraction

    • Dependency Parsing

    • Extracts dependency relation paths

      • <question words, path, all question key chunks >
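The triple representation above pairs a question word with the dependency-relation path connecting it to a question key chunk. A minimal sketch of extracting such a path from a parse, treating the dependency graph as undirected for the search; the toy parse and function names are illustrative, not the system's actual implementation.

```python
from collections import deque

def relation_path(edges, src, dst):
    """Return the list of relation labels on the path from src to dst
    in a dependency graph given as (head, relation, dependent) triples,
    or None if no path exists.
    """
    adj = {}
    for head, rel, dep in edges:
        adj.setdefault(head, []).append((rel, dep))
        adj.setdefault(dep, []).append((rel, head))
    queue = deque([(src, [])])
    seen = {src}
    while queue:
        node, path = queue.popleft()
        if node == dst:
            return path
        for rel, nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [rel]))
    return None

# Toy parse of "Who founded Microsoft?"
parse = [("founded", "nsubj", "Who"), ("founded", "dobj", "Microsoft")]
triple = ("Who", relation_path(parse, "Who", "Microsoft"), "Microsoft")
# triple == ("Who", ["nsubj", "dobj"], "Microsoft")
```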

  • Sentence Retrieval

  • Integrates the LSVLM language modeling toolkit

  • Uses Bayesian smoothing with Dirichlet priors

    • with stemming and dynamic stop-word reduction

    • query and document expansion according to query type

    • removal of query words

  • Results:

[Figure: system architecture. Labeled components include Factoid, List, and Definition question streams, QT, QP1, QP2, Query Construction, Document Retrieval over the AQUAINT corpus, sentence retrieval stages SR 1 and SR 2, Sentence Annotation, Wiki Passage Retrieval over Wikipedia, Coreference Resolution, Definition QA, Google, and the final Answer.]

  • Question Classification & Typing

  • Uses the question types and data of Li and Roth2

  • Employs a Bayes classifier

  • Estimates probabilities using a language model toolkit

  • Explored different smoothing and back-off techniques

  • Results:
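The classifier described above can be sketched as a Naive Bayes model with class-conditional unigram language models. Add-one smoothing stands in for the smoothing schemes the poster says were explored; the training examples and class labels are toy data, not the Li and Roth training set.

```python
import math
from collections import Counter, defaultdict

class BayesQuestionClassifier:
    """Naive Bayes over question words with class-conditional unigram
    language models and add-one smoothing (illustrative sketch)."""

    def fit(self, questions, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for words, label in zip(questions, labels):
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def predict(self, words):
        best, best_lp = None, float("-inf")
        total = sum(self.class_counts.values())
        v = len(self.vocab)
        for cls, n in self.class_counts.items():
            lp = math.log(n / total)          # class prior
            wc = self.word_counts[cls]
            denom = sum(wc.values()) + v      # add-one smoothing
            for w in words:
                lp += math.log((wc[w] + 1) / denom)
            if lp > best_lp:
                best, best_lp = cls, lp
        return best

clf = BayesQuestionClassifier().fit(
    [["who", "founded"], ["where", "is"], ["who", "wrote"]],
    ["HUMAN", "LOCATION", "HUMAN"])
label = clf.predict(["who", "invented"])
# label == "HUMAN"
```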

  • Answer Extraction & Fusion

  • Answer Extraction

    • Linguistic analysis -- NER, Chunking, Dependency Parsing

    • Chunk-based Surface Text Pattern Matching

    • Maximum Entropy-based Ranking Model

      • Correlation of Dependency Relation Path

  • Answer Fusion

    • Converts rank to probability using a Mandelbrot distribution

    • Fuses results from sentence retrieval and Wikipedia
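The fusion step above can be sketched as follows: each candidate's rank is mapped to a probability via a (Zipf-)Mandelbrot distribution, p(r) ∝ (r + β)^(−α), and probabilities for the same answer string are summed across sources. The α and β values here are illustrative, not those used by the system, and the candidate lists are invented.

```python
def mandelbrot_prob(rank, n, alpha=1.0, beta=2.7):
    """Map a 1-based rank to a probability under a (Zipf-)Mandelbrot
    distribution p(r) ∝ (r + beta)^(-alpha), normalized over n ranks."""
    weights = [(r + beta) ** -alpha for r in range(1, n + 1)]
    return weights[rank - 1] / sum(weights)

def fuse(ranked_lists):
    """Fuse ranked candidate lists (e.g. sentence retrieval and
    Wikipedia) by summing rank-derived probabilities per answer."""
    scores = {}
    for answers in ranked_lists:
        n = len(answers)
        for r, a in enumerate(answers, start=1):
            scores[a] = scores.get(a, 0.0) + mandelbrot_prob(r, n)
    return max(scores, key=scores.get)

best = fuse([["1969", "1971", "1968"],   # sentence retrieval
             ["1969", "1971"]])          # Wikipedia
# best == "1969" (top-ranked in both sources)
```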

[Figure: answer extraction and fusion pipeline. Labeled components include Dependency AE, Pattern AE, List QA, AQUAINT candidates, Wiki candidates, Web candidates, Answer Validation (NIL judgment), and the final Answer.]

  • Results

  • Performs better than the median of participants:

  • Flexibility of the system architecture: allows experimenting with different approaches systematically

  • Data-driven approach: bottlenecks can be improved at any point in time

  • No increase in unsupported answers, unlike other participants

  • Future Work

  • Study the impact of the substitution of pronouns

  • Develop a more fine-grained named entity recognizer

  • Investigate more sophisticated learning algorithms for evidence fusion

  • Use answer extraction patterns as precise Web validation patterns

  • Error analysis

References:

1) http://I2r.cs.uiuc.edu/~cogcomp/Data/QA/QC

2) X. Li and D. Roth. Learning Question Classifiers. In Proceedings of COLING (2002).

3) D. Zhang and W. Lee. Question Classification using Support Vector Machines. In Proceedings of SIGIR (2003).

