Alyssa at TREC 2006: Statistically-Inspired Q&A

Andreas Merkel 1), Dan Shen 1), Jochen L. Leidner 1,2), Dietrich Klakow 1)

[email protected]

1) Spoken Language Systems, Saarland University, 66123 Saarbrücken, Germany
2) Linguit Ltd.

  • Query Construction
  • Expands the query by repeating the topic terms
  • Including the topic twice proved sufficient
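As a sketch, the expansion step might look like the following (the `construct_query` helper and the term-list representation are assumptions, not the authors' code):

```python
def construct_query(question_terms, topic_terms, topic_repeats=2):
    """Expand a retrieval query by appending the target-topic terms
    `topic_repeats` times; per the poster, twice is sufficient."""
    return question_terms + topic_terms * topic_repeats

# e.g. the topic "google" appears twice in the expanded query
construct_query(["when", "founded"], ["google"])
```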
  • Introduction
  • Information-theoretically well-founded approach to open-domain QA
  • Objective: a long-term experimental platform offering
    • flexibility
    • modularity
  • Uses a cascade of:
    • LM-based document retrieval
    • LM-based sentence retrieval
    • maximum-entropy-based answer extraction over a dependency-relation representation
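The modularity goal suggests each cascade stage is a swappable component; a minimal pipeline sketch under that assumption (all names hypothetical):

```python
def qa_cascade(question, corpus, doc_retriever, sent_retriever, extractor):
    """Cascade of narrowing stages: LM document retrieval, then LM
    sentence retrieval over those documents, then answer extraction.
    Each stage is a callable, so components can be swapped out for
    systematic experiments."""
    docs = doc_retriever(question, corpus)        # candidate documents
    sentences = sent_retriever(question, docs)    # candidate sentences
    return extractor(question, sentences)         # ranked answers
```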
  • Document Retrieval
  • Integrates the Lemur IR toolkit
  • Uses a language-model-based approach:
    • Bayesian smoothing with Dirichlet priors
    • with stemming, but no stop-word removal
  • Retrieves the top 60 documents, which cover about 90% of the answers
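The Dirichlet-prior smoothing named above has a standard query-likelihood form; a toy scorer under that formulation (mu = 2000 is a common default, not necessarily the value used in the system):

```python
import math
from collections import Counter

def dirichlet_score(query, doc, collection, mu=2000.0):
    """Log query likelihood with Bayesian smoothing via a Dirichlet prior:
    p(w|d) = (c(w,d) + mu * p(w|C)) / (|d| + mu),
    where p(w|C) is the background collection model."""
    doc_counts, coll_counts = Counter(doc), Counter(collection)
    coll_len = len(collection)
    score = 0.0
    for w in query:
        p_wc = coll_counts[w] / coll_len                      # background p(w|C)
        p_wd = (doc_counts[w] + mu * p_wc) / (len(doc) + mu)  # smoothed p(w|d)
        if p_wd == 0.0:        # w unseen in the entire collection
            return float("-inf")
        score += math.log(p_wd)
    return score
```

Documents would be ranked by this score and the top 60 kept.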
  • Query Analysis
  • Topic Resolution
    • a simple anaphora resolution strategy
  • Expected Answer Type Extraction
    • NE taxonomy 1)
    • 6 coarse-grained and 50 fine-grained NE types
  • Question Pattern Matching
    • Maps high-frequency questions to classes
    • 92 questions were assigned a question class
  • Dependency Relation Path Extraction
    • Dependency parsing
    • Extracts dependency relation paths:
      • <question words, path, all question key chunks>
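One way to realize the path extraction (a simplified sketch; the token-keyed parse representation is an assumption): walk from the question word up to the lowest common ancestor, then down to a key chunk, collecting relation labels.

```python
def ancestors(deps, tok):
    """Chain of (token, relation-to-head) pairs from tok up to the root;
    deps maps token -> (head, relation), with the root's head = None."""
    chain = []
    while deps[tok][0] is not None:
        head, rel = deps[tok]
        chain.append((tok, rel))
        tok = head
    chain.append((tok, None))
    return chain

def relation_path(deps, src, dst):
    """Dependency relation labels on the path from src to dst."""
    up, down = ancestors(deps, src), ancestors(deps, dst)
    up_nodes = [n for n, _ in up]
    for i, (node, _) in enumerate(down):
        if node in up_nodes:                           # lowest common ancestor
            j = up_nodes.index(node)
            rise = [rel for _, rel in up[:j]]          # src -> ancestor
            fall = [rel for _, rel in down[:i]][::-1]  # ancestor -> dst
            return rise + fall
    return None

# "Who founded Google?" -- path between the wh-word and a key chunk
parse = {"Who": ("founded", "nsubj"),
         "founded": (None, "root"),
         "Google": ("founded", "dobj")}
```

For this parse the extracted triple would be ("Who", ["nsubj", "dobj"], "Google").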
  • Sentence Retrieval
  • Integrates the LSVLM language modeling toolkit
  • Uses Bayesian smoothing with Dirichlet priors
    • with stemming and dynamic stop-word reduction
    • query and document expansion according to query type
    • removal of query words
  • Results: (table in the original poster; not preserved in this transcript)

[System architecture diagram (not preserved). Node labels: Factoid, List, Definition, QT, QP1, QP2, Query Construction, AQUAINT Corpus, Document Retrieval, SR 1, SR 2, Sentence Annotation, Wikipedia, Wiki Passage Retrieval, Coref. Resolution, Def QA, Google, Answer.]

  • Question Classification & Typing
  • Uses the question types and data of Li and Roth 2)
  • Employs a Bayes classifier
  • Estimates probabilities using a language model toolkit
  • Explored different smoothing and backing-off techniques
  • Results: (table in the original poster; not preserved in this transcript)
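A Bayes classifier whose class-conditional term probabilities come from smoothed unigram language models might look like this; interpolation with a global background model stands in for whichever smoothing/backoff scheme was actually chosen:

```python
import math
from collections import Counter, defaultdict

class LMQuestionClassifier:
    """Bayes classifier over question types: per-type unigram LMs,
    linearly interpolated with a global background model."""

    def __init__(self, lam=0.8):
        self.lam = lam  # weight on the type-specific model

    def fit(self, questions, types):
        self.type_counts = Counter(types)
        self.total = len(types)
        self.word_counts = defaultdict(Counter)
        self.global_counts = Counter()
        for q, t in zip(questions, types):
            self.word_counts[t].update(q)
            self.global_counts.update(q)
        self.global_len = sum(self.global_counts.values())
        return self

    def classify(self, question):
        best, best_lp = None, float("-inf")
        for t, n in self.type_counts.items():
            lp = math.log(n / self.total)  # prior P(type)
            type_len = sum(self.word_counts[t].values())
            for w in question:
                # add-one guard so unseen words keep nonzero probability
                p_bg = (self.global_counts[w] + 1) / (self.global_len + 1)
                p_t = self.word_counts[t][w] / type_len if type_len else 0.0
                lp += math.log(self.lam * p_t + (1 - self.lam) * p_bg)
            if lp > best_lp:
                best, best_lp = t, lp
        return best
```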
  • Answer Extraction & Fusion
  • Answer Extraction
    • Linguistic analysis: NER, chunking, dependency parsing
    • Chunk-based surface text pattern matching
    • Maximum-entropy-based ranking model
      • Correlation of dependency relation paths
  • Answer Fusion
    • Converts ranks to probabilities using a Mandelbrot distribution
    • Fuses results from sentence retrieval and Wikipedia
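The rank-to-probability step follows a Zipf-Mandelbrot law, P(r) ∝ (r + b)^(−a); a toy fusion sketch with illustrative (not tuned) parameter values:

```python
from collections import Counter

def mandelbrot_probs(ranked, a=1.0, b=2.7):
    """Map a ranked answer list to a probability distribution via a
    Zipf-Mandelbrot law, P(r) proportional to (r + b) ** -a."""
    weights = [(r + b) ** -a for r in range(1, len(ranked) + 1)]
    z = sum(weights)
    return {ans: w / z for ans, w in zip(ranked, weights)}

def fuse(*streams):
    """Fuse candidate streams (e.g. sentence retrieval and Wikipedia)
    by summing each answer's probability across streams; answers
    supported by several streams rise to the top."""
    merged = Counter()
    for probs in streams:
        merged.update(probs)
    return merged.most_common()
```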

[Answer extraction/fusion diagram (not preserved). Node labels: Pattern AE, Dep. AE, List QA, AQUAINT candidates, Wiki candidates, Web candidates, Answer Validation (NIL judgment), Answer.]

  • Results
  • Performs better than the median of participants
  • Flexibility of the system architecture: different approaches can be tested systematically
  • Data-driven approach: bottlenecks can be improved at any point in time
  • No increase in unsupported answers, unlike other participants
  • Future Work
  • Study the impact of substituting pronouns
  • Develop a more fine-grained named entity recognizer
  • Investigate more sophisticated learning algorithms for evidence fusion
  • Use answer extraction patterns as precise Web validation patterns
  • Error analysis

References:
1) http://l2r.cs.uiuc.edu/~cogcomp/Data/QA/QC
2) X. Li and D. Roth. Learning Question Classifiers. In Proceedings of COLING (2002).
3) D. Zhang and W. Lee. Question Classification Using Support Vector Machines. In Proceedings of SIGIR (2003).
