
Question Answering

Group Members:

Satadru Biswas (05005021)

Tanmay Khirwadkar (05005016)

Arun Karthikeyan Karra (05d05020)

CS 626-460 Course Seminar, Group-2


Outline
  • Introduction
  • Why Question Answering?
  • AskMSR
  • FALCON
  • Conclusion
Introduction
  • Question Answering (QA) is the task of automatically answering a question posed in natural language.
  • To find the answer to a question, a QA computer program may use either a pre-structured database or a collection of natural language documents (a text corpus such as the World Wide Web or some local collection).
A few sample questions
  • Q: Who shot President Abraham Lincoln?
  • A: John Wilkes Booth
  • Q: How many lives were lost in the Pan Am crash in Lockerbie?
  • A: 270
  • Q: How long does it take to travel from London to Paris through the Channel?
  • A: three hours 45 minutes
  • Q: Which Atlantic hurricane had the highest recorded wind speed?
  • A: Gilbert (200 mph)
Why Question Answering?
  • Google – Query driven search
    • Answers to a query are documents
  • Question Answering – Answer driven search
    • Answers to a query are phrases
  • Question classification
  • Finding entailed answer type
  • Use of WordNet
  • High-quality document search
Question Classes
  • Class 1
    • A: single datum or list of items
    • C: who, when, where, how (old, much, large)
    • Example: Who shot President Abraham Lincoln?
    • Answer: John Wilkes Booth
  • Class 2
    • A: multi-sentence
    • C: extract from multiple sentences
    • Example: Who was Picasso?
    • Answer: Picasso was a great Spanish painter
  • Class 3
    • A: across several texts
    • C: comparative/contrastive
    • Example: What are the Valdez Principles?
Question Classes (contd…)
  • Class 4
    • A: an analysis of retrieved information
    • C: synthesized coherently from several retrieved fragments
    • Example: Which Atlantic hurricane had the highest recorded wind speed?
    • Answer: Gilbert (200 mph)
  • Class 5
    • A: result of reasoning
    • C: world/domain knowledge and common sense reasoning
    • Example: What did Richard Feynman say upon hearing he would receive the Nobel Prize in Physics?
Types of QA
  • Closed-domain question answering deals with questions under a specific domain, and can be seen as an easier task because NLP systems can exploit domain-specific knowledge frequently formalized in ontologies.
  • Open-domain question answering deals with questions about nearly everything, and can only rely on general ontologies and world knowledge. On the other hand, these systems usually have much more data available from which to extract the answer.
QA - Concepts
  • Question Classes: Different types of questions require the use of different strategies to find the answer.
  • Question Processing: A semantic model of question understanding and processing is needed, one that would recognize equivalent questions, regardless of the speech act or of the words, syntactic inter-relations or idiomatic forms.
  • Context and QA: Questions are usually asked within a context and answers are provided within that specific context.
  • Data sources for QA: Before a question can be answered, it must be known what knowledge sources are available.
  • Answer Extraction: Answer extraction depends on the complexity of the question, on the answer type provided by question processing, on the actual data where the answer is searched, on the search method and on the question focus and context.
  • Answer Formulation: The result of a QA system should be presented in a way as natural as possible.
  • Real time question answering: There is a need for developing Q&A systems that are capable of extracting answers from large data sets in several seconds, regardless of the complexity of the question, the size and multitude of the data sources or the ambiguity of the question.
  • Multi-lingual QA: The ability to answer a question posed in one language using an answer corpus in another language (or even several).
  • Interactive QA: Often the questioner might want not only to reformulate the question, but (s)he might want to have a dialogue with the system.
  • Advanced reasoning for QA: More sophisticated questioners expect answers which are outside the scope of written texts or structured databases.
  • User profiling for QA: The user profile captures data about the questioner, comprising context data, domain of interest, reasoning schemes frequently used by the questioner, common ground established within different dialogues between the system and the user etc.
Issues with traditional QA Systems
  • Retrieval is performed against a small set of documents
  • Extensive use of linguistic resources
    • POS tagging, Named Entity Tagging, WordNet etc.
  • Difficult to recognize answers that do not match question syntax
    • E.g. Q: Who shot President Abraham Lincoln?
      • A: John Wilkes Booth is perhaps America’s most infamous assassin, having fired the bullet that killed Abraham Lincoln.
The Web can help!
  • Web – A gigantic data repository with extensive data redundancy
  • Factoids likely to be expressed in hundreds of different ways
  • At least a few will match the way the question was asked
    • E.g. Q: Who shot President Abraham Lincoln?
      • A: John Wilkes Booth shot President Abraham Lincoln.
AskMSR
  • Based on data redundancy of the Web
  • Process the question
    • Form a web-search engine query
    • Recognize the answer-type
  • Rank answers on the basis of frequency
  • Project the answers onto the TREC corpus
1. Query Reformulation
  • Question is often syntactically close to answer
    • E.g. Where is the Louvre Museum located?
      • The Louvre Museum is located in Paris
    • Who created the character of Scrooge?
      • Charles Dickens created the character of Scrooge.
1. Query Reformulation
  • Classify the query into 7 categories
    • Who, When, Where …
  • Hand-crafted category-specific rewrite rules
    • [String, L/R/-, Weight]
  • Weight – preference for a query
    • “Abraham Lincoln born on” preferred to “Abraham” “Lincoln” “born”
  • String – Simple String Manipulations
1. Query Reformulation
  • E.g. for ‘where’ and ‘what’ questions, move ‘is’ to all possible locations:
    • Q: What is relative humidity?
      • [“is relative humidity”, LEFT, 5]
      • [“relative is humidity”, RIGHT, 5]
      • [“relative humidity is”, RIGHT, 5]
      • [“relative humidity”, NULL, 2]
      • [“relative” AND “humidity”, NULL, 1]
  • Some rewrites may be nonsensical
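The ‘is’-movement rule above can be sketched in a few lines of Python. This is only an illustration: the function name `rewrite_is_question` is hypothetical, the first token is assumed to be the question word, and the LEFT/RIGHT assignment and the weights 5, 2, 1 are copied from the example on the slide rather than from the AskMSR implementation.

```python
def rewrite_is_question(question):
    """Return (string, side, weight) rewrites made by moving 'is' through the question."""
    words = question.rstrip("?").split()
    body = words[1:]  # assumption: the first token is the question word
    if "is" not in body:
        return []
    rest = [w for w in body if w != "is"]
    rewrites = []
    for pos in range(len(rest) + 1):
        phrase = " ".join(rest[:pos] + ["is"] + rest[pos:])
        # Convention taken from the slide example: 'is' first -> answer on the LEFT.
        side = "LEFT" if pos == 0 else "RIGHT"
        rewrites.append(('"%s"' % phrase, side, 5))
    # Back-off rewrites: the exact phrase without 'is', then a conjunction of terms.
    exact = " ".join(rest)
    rewrites.append(('"%s"' % exact, None, 2))  # None stands for NULL on the slide
    rewrites.append((" AND ".join('"%s"' % w for w in rest), None, 1))
    return rewrites


for rewrite in rewrite_is_question("What is relative humidity?"):
    print(rewrite)
```

For the sample question this yields the five rewrites listed on the slide, including the nonsensical “relative is humidity”. Such rewrites are harmless because they simply retrieve few or no matching snippets.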
2. Query Search Engine
  • Send all rewrites to a Web search engine
  • Retrieve top N results (100-200)
  • For speed, rely just on search engine’s “snippets”, not the full text of the actual document
3. N-gram Harvesting
  • Process each snippet to retrieve the string to the left/right of the query
  • Enumerate all n-grams (n = 1, 2 and 3)
  • Score of n-gram:
    • Occurrence frequency weighted by the ‘weight’ of the rewrite rule that fetched the summary
    • Formula:
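A minimal sketch of the harvesting step, under stated assumptions: the function name `harvest_ngrams` is hypothetical, each snippet is assumed to be already trimmed to the text on the relevant side of the query, and an n-gram is counted once per snippet. The score of an n-gram is then the sum of the rewrite weights of the snippets that contain it, which is one reading of the description above and not necessarily the exact formula from the slide.

```python
from collections import Counter


def harvest_ngrams(snippets, max_n=3):
    """Score 1- to max_n-grams; snippets is a list of (text, rewrite_weight) pairs."""
    scores = Counter()
    for text, weight in snippets:
        tokens = text.lower().split()
        seen = set()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                seen.add(" ".join(tokens[i:i + n]))
        # Assumption: each snippet adds its rewrite weight once per distinct n-gram.
        for gram in seen:
            scores[gram] += weight
    return scores


scores = harvest_ngrams([("John Wilkes Booth", 5), ("Booth", 2)])
print(scores.most_common(3))
```

Because unigrams appear in more snippets than the longer n-grams that contain them, short n-grams tend to accumulate the highest scores.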
4. Filtering Answers
  • Apply filters based on question-types of queries
    • Regular Expressions
    • Natural Language Analysis
      • E.g. “Genghis Khan”, “Benedict XVI”
  • Boost score of answer when it matches expected answer-type
  • Remove answers from candidate list
    • When set of answers is a closed set
      • “Which country …” , “How many times …”
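The boosting part of this step could look roughly like the sketch below. The two regular expressions, the boost factor of 2.0 and the name `apply_type_filter` are all illustrative assumptions; the slides do not list the actual filters used by AskMSR.

```python
import re

# Hypothetical answer-type patterns keyed by question prefix (not from the slides).
TYPE_PATTERNS = {
    "how many": re.compile(r"^\d[\d,]*$"),                   # a bare number
    "who": re.compile(r"^[A-Z][a-z]+(?: [A-Z][A-Za-z]+)*$"),  # capitalized words
}


def apply_type_filter(question, candidates, boost=2.0):
    """Boost the score of candidates that match the expected answer type."""
    q = question.lower()
    for prefix, pattern in TYPE_PATTERNS.items():
        if q.startswith(prefix):
            return {c: (s * boost if pattern.match(c) else s)
                    for c, s in candidates.items()}
    return dict(candidates)  # unknown question type: leave scores unchanged


print(apply_type_filter("Who shot President Abraham Lincoln?",
                        {"John Wilkes Booth": 10, "the bullet": 8}))
```

A name-like pattern of this kind also accepts candidates such as “Genghis Khan” and “Benedict XVI”. Removing candidates for closed answer sets would need a list of the allowed values and is not shown here.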
5. Answer Tiling
  • Problem: shorter n-grams have higher weights
    • Solution: Perform tiling
  • Combine overlapping shorter n-grams into longer n-grams
  • Score = maximum(constituent n-grams)
  • E.g.
    • Pierre Baron (5)
    • Baron de Coubertin (20)
    • de Coubertin (10)
    • Tiled answer: Pierre Baron de Coubertin (20)
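A greedy version of tiling can be sketched as follows. The function names and the merge order (highest-scoring candidates first) are assumptions made for illustration; only the rule that a tile takes the maximum score of its constituents comes from the slide.

```python
def tile_pair(a, b):
    """Tile token lists a and b if b lies inside a or a's end overlaps b's start."""
    for i in range(len(a) - len(b) + 1):
        if a[i:i + len(b)] == b:
            return a  # b is contained in a
    for k in range(min(len(a), len(b)), 0, -1):
        if a[-k:] == b[:k]:
            return a + b[k:]  # suffix of a overlaps prefix of b
    return None


def tile_answers(scored):
    """Greedily merge overlapping n-grams; a tile scores the max of its parts."""
    items = [(gram.split(), score) for gram, score in scored.items()]
    merged = True
    while merged:
        merged = False
        items.sort(key=lambda item: -item[1])
        for i in range(len(items)):
            for j in range(len(items)):
                if i == j:
                    continue
                tiled = tile_pair(items[i][0], items[j][0])
                if tiled is not None:
                    score = max(items[i][1], items[j][1])
                    items = [it for k, it in enumerate(items) if k not in (i, j)]
                    items.append((tiled, score))
                    merged = True
                    break
            if merged:
                break
    return {" ".join(tokens): score for tokens, score in items}


print(tile_answers({"Pierre Baron": 5, "Baron de Coubertin": 20, "de Coubertin": 10}))
```

On the three n-grams from the example this produces the single tiled answer “Pierre Baron de Coubertin” with score 20.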

6. Answer Projection
  • Retrieve supporting documents from the document collection for each answer
  • Use a standard IR system
    • IR Query : Web-Query + Candidate Answer
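As a rough illustration of projection, the sketch below builds the IR query from the web query plus the candidate answer and ranks a small local collection by term overlap. Term overlap is a deliberately simple stand-in for the "standard IR system" mentioned above, and the name `project_answer` and the sample documents are hypothetical.

```python
def project_answer(web_query, candidate, documents, top_k=1):
    """Rank local documents by term overlap with the IR query (web query + candidate)."""
    query_terms = set((web_query + " " + candidate).lower().split())
    ranked = []
    for doc_id, text in documents.items():
        overlap = len(query_terms & set(text.lower().split()))
        if overlap:
            ranked.append((overlap, doc_id))
    # Highest overlap first; ties broken by document id for determinism.
    ranked.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in ranked[:top_k]]


docs = {
    "d1": "the louvre museum is located in paris",
    "d2": "paris is the capital of france",
}
print(project_answer("louvre museum is located", "Paris", docs))
```

A real system would use proper tokenization and a weighted ranking function, but the query construction is the same.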

FALCON (Boosting Knowledge for QA systems)

Arun Karthikeyan Karra


FALCON Introduction
  • Another QA system
  • Integrates syntactic, semantic and pragmatic knowledge for achieving better performance
  • Handles question reformulations, incorporates the WordNet semantic net, and performs unification on semantic forms to extract answers
Architecture of FALCON

Source: FALCON: Boosting Knowledge for Answer Engines, Harabagiu et al.

Working of FALCON: A gist

Source: FALCON: Boosting Knowledge for Answer Engines, Harabagiu et al.

Question Reformulations (1)

Source: FALCON: Boosting Knowledge for Answer Engines, Harabagiu et al.

Question Reformulations (2)

Source: FALCON: Boosting Knowledge for Answer Engines, Harabagiu et al.

Expected Answer Type (2)

Source: FALCON: Boosting Knowledge for Answer Engines, Harabagiu et al.

Semantic Knowledge

Source: FALCON: Boosting Knowledge for Answer Engines, Harabagiu et al.

Key words and Alternations
  • Morphological Alternations
  • Lexical Alternations
    • Who killed Martin Luther King?
    • How far is the moon?
  • Semantic Alternations
Results Reported
  • 692 Questions
  • Key word alternations used for 89 questions
  • TREC-9 (Text Retrieval Conference)

Source: FALCON: Boosting Knowledge for Answer Engines, Harabagiu et al.

Conclusion
  • Question Answering requires more complex NLP techniques than other forms of Information Retrieval
  • Two main approaches:
    • Data Redundancy – AskMSR
    • Boosting Knowledge Base – FALCON
  • Ultimate goal: a system which we can ‘talk’ to
  • There is a long way to go ... and a lot more money to come
References
  • Data-Intensive Question Answering, Eric Brill et al., TREC-10, 2001.
  • An Analysis of the AskMSR Question-Answering System, Eric Brill et al., Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics, Philadelphia, July 2002, pp. 257-264.
  • FALCON: Boosting Knowledge for Answer Engines, Sanda Harabagiu, Dan Moldovan et al., Southern Methodist University, TREC-9, 2000.
  • Wikipedia