
Using Information Extraction for Question Answering

Done by Rani Qumsiyeh


Problem

  • More information is added to the web every day.

  • Search engines exist, but they return documents that match keywords rather than direct answers.

  • This calls for a different kind of search engine.

History of QA

  • QA dates back to the 1960s.

  • Two common approaches to designing QA systems:

    • Information Extraction

    • Information Retrieval

  • Two conferences evaluate QA systems:

    • TREC (Text REtrieval Conference)

    • MUC (Message Understanding Conference)

Common Issues with QA Systems

  • Information retrieval deals with keywords.

  • Information extraction interprets the question itself.

  • A question can be phrased in many different ways, which means:

    • Easier for IR, but the results are broader

    • Harder for IE, but the results are more exact

Message Understanding Conference (MUC)

  • Sponsored by the Defense Advanced Research Projects Agency (DARPA) 1991-1998.

  • Developed methods for formal evaluation of IE systems

  • In the form of a competition, where the participants compare their results with each other and against human annotators' key templates.

  • Short system preparation time to stimulate portability to new extraction problems. Only 1 month to adapt the system to the new scenario before the formal run.

Evaluation Metrics

  • Precision and recall:

    • Precision: correct answers/answers produced

    • Recall: correct answers/total possible answers

  • F-measure:

    • F = ((β² + 1) · P · R) / (β² · P + R), where β is a parameter representing the relative importance of P and R

    • E.g., β = 1 gives P and R equal weight; β = 0 gives only P

  • Current state of the art: the F = 0.60 barrier.
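
A minimal sketch of these metrics in Python (the function names, the β default, and the example counts are illustrative, not from the slides):

    def precision(correct, produced):
        # correct answers / answers produced
        return correct / produced if produced else 0.0

    def recall(correct, possible):
        # correct answers / total possible answers
        return correct / possible if possible else 0.0

    def f_measure(p, r, beta=1.0):
        # F = ((beta^2 + 1) * P * R) / (beta^2 * P + R)
        # beta = 1 weighs P and R equally; beta = 0 reduces F to P
        denom = beta * beta * p + r
        return (beta * beta + 1) * p * r / denom if denom else 0.0

    # Example: 30 correct answers out of 40 produced, with 60 possible answers
    p, r = precision(30, 40), recall(30, 60)
    print(p, r, f_measure(p, r))   # 0.75 0.5 0.6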

MUC Extraction Tasks

  • Named Entity task (NE)

  • Template Element task (TE)

  • Template Relation task (TR)

  • Scenario Template task (ST)

  • Coreference task (CO)

Named Entity Task (NE)

  • Mark each string in the text that represents a person, organization, or location name, or a date, time, currency amount, or percentage figure.

Template Element Task (TE)

  • Extract basic information related to organization, person, and artifact entities, drawing evidence from everywhere in the text.

Template Relation task (TR)

  • Extract relational information on employee_of, manufacture_of, location_of relations etc. (TR expresses domain independent relationships between entities identified by TE)

Scenario Template task (ST)

  • Extract prespecified event information and relate the event information to particular organization, person, or artifact entities (ST identifies domain and task specific entities and relations)

Coreference task (CO)

  • Capture information on coreferring expressions, i.e., all mentions of a given entity, including those marked in NE and TE (nouns, noun phrases, pronouns).

An Example

  • The shiny red rocket was fired on Tuesday. It is the brainchild of Dr. Big Head. Dr. Head is a staff scientist at We Build Rockets Inc.

  • NE: entities are rocket, Tuesday, Dr. Head and We Build Rockets

  • CO: it refers to the rocket; Dr. Head and Dr. Big Head are the same

  • TE: the rocket is shiny red and Head's brainchild

  • TR: Dr. Head works for We Build Rockets Inc.

  • ST: a rocket launching event occurred with the various participants.
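
To make the relationship between the tasks concrete, here is the same example rendered as simple Python structures; this rendering is illustrative only and is not the actual MUC template format:

    # Illustrative only: the tasks' output for the rocket example,
    # written as plain Python structures rather than real MUC templates.
    ne = {"rocket": "ARTIFACT", "Tuesday": "DATE",
          "Dr. Head": "PERSON", "We Build Rockets Inc.": "ORGANIZATION"}

    co = [["the shiny red rocket", "It"],        # "It" refers to the rocket
          ["Dr. Big Head", "Dr. Head"]]          # two mentions of the same person

    te = {"rocket": {"attributes": ["shiny", "red"], "brainchild_of": "Dr. Head"}}

    tr = [("Dr. Head", "employee_of", "We Build Rockets Inc.")]

    st = {"event": "rocket_launch", "artifact": "rocket", "date": "Tuesday",
          "participants": ["Dr. Head", "We Build Rockets Inc."]}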

Scoring templates

  • Templates are compared on a slot-by-slot basis

    • Correct: response = key

    • Partial: response ≈ key

    • Incorrect: response != key

    • Spurious: key is blank

      • overgeneration = spurious / actual

    • Missing: response is blank
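
As a rough illustration of slot-by-slot scoring, the categories above could be counted as follows; the substring test for partial matches and the example templates are simplifications of my own, not the official MUC scorer:

    # Sketch of slot-by-slot template scoring; the substring check for
    # "partial" is a stand-in for the real MUC alignment rules.
    def score_slots(response, key):
        counts = {"correct": 0, "partial": 0, "incorrect": 0,
                  "spurious": 0, "missing": 0}
        for slot in set(response) | set(key):
            r, k = response.get(slot), key.get(slot)
            if k is None:
                counts["spurious"] += 1                  # key is blank
            elif r is None:
                counts["missing"] += 1                   # response is blank
            elif r == k:
                counts["correct"] += 1                   # response = key
            elif r in k or k in r:
                counts["partial"] += 1                   # response ~ key
            else:
                counts["incorrect"] += 1                 # response != key
        actual = sum(v for c, v in counts.items() if c != "missing")
        counts["overgeneration"] = counts["spurious"] / actual if actual else 0.0
        return counts

    key      = {"person": "Dr. Big Head", "org": "We Build Rockets Inc."}
    response = {"person": "Big Head", "org": "We Build Rockets Inc.", "date": "Tuesday"}
    print(score_slots(response, key))
    # {'correct': 1, 'partial': 1, 'incorrect': 0, 'spurious': 1, 'missing': 0, 'overgeneration': 0.333...}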

Maximum Results Reported

KnowItAll, TextRunner, KnowItNow

  • They differ in implementation, but all do the same thing: extract large numbers of relational facts (tuples) from the web.

Using them as QA systems

  • Able to handle questions that map to a single relation:

    • Who is the president of the US? “can handle”

    • Who was the president of the US in 1998? “fails”

  • They produce a huge number of facts that the user still has to sift through.
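
A toy sketch of why single-relation questions are easy and qualified ones are not: the question is reduced to a (subject, relation, object) pattern and matched against extracted tuples. The fact store and the matching rule below are invented for illustration and are not how TextRunner is actually queried.

    # Toy store of (arg1, relation, arg2) tuples, of the kind an open-IE
    # system might extract; contents and matching rule invented for illustration.
    facts = [
        ("Barack Obama", "is the president of", "the US"),
        ("Bill Clinton", "was the president of", "the US"),   # no year attached
    ]

    def who(relation, obj):
        # A question that maps to one relation becomes a single lookup.
        return [a1 for a1, rel, a2 in facts if relation in rel and a2 == obj]

    print(who("president of", "the US"))
    # Both names come back.  Without a year on the tuples, "Who was the
    # president of the US in 1998?" cannot be answered, and the user is
    # left to sift through the candidate facts by hand.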


  • The aim is to resolve ambiguity in text by introducing more named entities.

  • What is Julian Werver Hill's wife's telephone number?

    • equivalent to: What is Polly's telephone number?

  • Where is Werver Hill's affiliated company located?

    • equivalent to: Where is Microsoft located?

Proposed System

  • Determine what named entity we are looking for, using Textract.

  • Use part-of-speech tagging.

  • Use TextRunner as the basis for search.

  • Use WordNet to find synonyms.

  • Use extra entities in the text as "constraints".
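
A rough outline of how these steps could fit together is sketched below; every helper named here (lookup_entity_type, pos_tag, parse_question, wordnet_synonyms, query_textrunner, apply_constraints) is a hypothetical placeholder, not a real Textract, TextRunner, or WordNet API.

    # Hypothetical outline of the proposed pipeline; all helpers are placeholders.
    def answer_question(question):
        target_type = lookup_entity_type(question)        # 1. Textract-style NE typing
        tokens = pos_tag(question)                        # 2. part-of-speech tagging
        predicate, argument, constraints = parse_question(tokens)
        queries = [(predicate, argument)]
        queries += [(syn, argument)                       # 4. WordNet synonyms
                    for syn in wordnet_synonyms(predicate)]
        facts = [f for q in queries
                 for f in query_textrunner(*q)]           # 3. TextRunner as the search basis
        return apply_constraints(facts, constraints, target_type)   # 5. constraints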



  • (WP who) (VBD was) (DT the) (JJ first) (NN man) (TO to) (VB land) (IN on) (DT the) (NN moon)

  • The verb (VB) is treated as the argument.

  • The noun (NN) is treated as the predicate

  • We make sure that word order (position) is maintained.

  • We keep prepositions if they connect two nouns (e.g., "president of the US").

  • Other non-stop words are constraints, e.g., "first".
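
A small sketch of those rules applied to the tagged question above; the set of stop tags and the filtering are guesses at the slide's intent, kept in the slide's own terminology (verb as argument, nouns as predicate):

    # Sketch of the question-parsing rules above; the stop-tag set is assumed.
    STOP_TAGS = {"WP", "VBD", "DT", "TO", "IN"}

    tagged = [("who", "WP"), ("was", "VBD"), ("the", "DT"), ("first", "JJ"),
              ("man", "NN"), ("to", "TO"), ("land", "VB"), ("on", "IN"),
              ("the", "DT"), ("moon", "NN")]

    argument    = [w for w, t in tagged if t == "VB"]    # verb -> argument: ["land"]
    predicate   = [w for w, t in tagged if t == "NN"]    # nouns -> predicate: ["man", "moon"]
    constraints = [w for w, t in tagged                  # remaining non-stop words: ["first"]
                   if t not in STOP_TAGS and t not in ("VB", "NN")]
    # Prepositions joining two nouns would be kept with the nouns, preserving
    # word order, so a phrase like "president of the US" stays together.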


Anaphora Resolution

  • Use anaphora resolution to determine which verb an entity mention is actually associated with (e.g., with "wrote" rather than "landed").

Use Synonyms

  • We use WordNet to find possible synonyms for verbs and nouns, to produce more facts.

  • We consider only three synonyms, since each additional synonym means more fact retrievals and more time.
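
The lookup might look roughly like this with NLTK's WordNet interface; NLTK itself is an assumption on my part (the slides only say WordNet is used and that three synonyms are kept):

    # Sketch using NLTK's WordNet corpus; NLTK is assumed, the cap of three
    # synonyms comes from the slide.
    from nltk.corpus import wordnet as wn

    def top_synonyms(word, pos, limit=3):
        syns = []
        for synset in wn.synsets(word, pos=pos):
            for lemma in synset.lemma_names():
                name = lemma.replace("_", " ")
                if name != word and name not in syns:
                    syns.append(name)
        return syns[:limit]                # keep only 3 to limit fact retrievals

    print(top_synonyms("land", wn.VERB))   # e.g. ['set down', 'shore', 'put down']
    print(top_synonyms("man", wn.NOUN))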

Using constraints


Results

  • Works well with Who, When, and Where questions, since the target named entity is easily determined.

    • Achieves about 90% accuracy on these.

  • Works less well with What and How questions.

    • Achieves about 70% accuracy

  • Takes about 13 seconds to answer a question.

Future Work

  • Build an ontology to determine the named entity type and parse the question (faster).

  • Handle combinations of questions, e.g.:

    • When and where did the Holocaust happen?
