
Question Answering Final Stage Presentation By Deepa Paranjpe



  1. Question Answering Final Stage Presentation By Deepa Paranjpe

  2. Talk Outline • What is QA? • Why QA? • Current QA methodologies • Our Goal • Our Approach • Key techniques • System Performance • Failure Analysis • Future Work: proposed solutions • Conclusion

  3. Problem definition Input: a factual natural language question Output: a ranked list of passages marked with the answer Input resource: a corpus Question: Who was the first Russian astronaut to do a space walk ? Output: …the Soyuz commander. <ans>Leonov</ans> was the first person to walk in space and is chief of cosmonaut training in Moscow… Question: Name the volcano that destroyed the ancient city of Pompeii ? Output: …are considered better preserved than those at Pompeii, which was also destroyed by <ans>Mount Vesuvius</ans> eruption…

  4. Why QA ? • Is keyword search sufficient ? • Can search engines bridge the lexical chasm ? • Can search engines bridge the semantic chasm ? • Questioning a corpus is more natural than querying it • The corpus contains more information than can be extracted by the syntactic matching done in search engines • Who established the Theosophical society ? → "Dr. Annie Besant, the founder of the Theosophical society, …" • Who killed Brutus ? ✗ "Brutus killed Caesar" (a keyword match, but the wrong relation)

  5. QA vs Keyword Search • QA is not the same as keyword search • QA requires a shift of focus from Document Retrieval to Information Retrieval • (table comparing Search and QA on their Input, Processing and Output)

  6. Existing approaches • A priori, fixed question classification: Who question → look out for a person; Where question → look for a place • Pipeline: Question → Question Classification → Question-to-Query Transformation → Search engine → Ranked list of documents → Passage Retrieval (using lexical resources) → Passage containing the answer → Answer selection → Answer • Typical scoring: Score = Keyword Match + (0.65 * Expanded Keyword Match) • Matching an upper-cased term adds a 60% bonus • Matching a WordNet synonym discounts by 10% • Lower-case term matches after Porter stemming are discounted 30% • Conclusion: current systems use hand-crafted, fine-tuned resources and magical formulas and parameters for scoring
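
To make the criticism concrete, here is a rough sketch of this kind of hand-tuned scoring. The 0.65, 60%, 10% and 30% figures are the ones quoted on the slide; how the per-match bonuses and discounts combine with the expansion weight is an assumption for illustration only.

```python
# Sketch of a hand-tuned passage score using the weights quoted on the slide.
# How per-match bonuses/discounts interact with the 0.65 expansion weight is
# an assumption; real systems differ in exactly these "magical" details.
MATCH_WEIGHTS = {
    "exact": 1.0,
    "upper_case": 1.6,       # upper-cased term match: 60% bonus
    "wordnet_synonym": 0.9,  # WordNet synonym match: 10% discount
    "porter_stemmed": 0.7,   # lower-case match after Porter stemming: 30% discount
}

def passage_score(keyword_matches, expanded_keyword_matches):
    # Score = Keyword Match + (0.65 * Expanded Keyword Match)
    kw = sum(MATCH_WEIGHTS[m] for m in keyword_matches)
    exp = sum(MATCH_WEIGHTS[m] for m in expanded_keyword_matches)
    return kw + 0.65 * exp

print(passage_score(["exact", "upper_case"], ["wordnet_synonym", "porter_stemmed"]))
```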

  7. Our Goal • Provide a generic and clean recipe for QA • Provide a declarative treatment rather than an operational one • Avoid expert tuning and magical scoring functions • Avoid hand-crafted rule sets and taxonomies • Build a factual QA system based on these principles • Evaluate the system performance on the TREC corpus and old QA pairs that are provided

  8. Our Approach • Noisy simulation of the question as an SQL query • Tokyo is the capital of which country ? → select country from world(country, capital, ...) where capital = 'Tokyo' • Identification of useful question tokens • Those that appear unchanged in the passage (selectors) • Those that specialize to answer tokens (said to provide the atype (answer type)) • Those that act as attributes of the atype in the question and of the answer in the passage

  9. Relational view (diagram relating question words to the answer passage) • Atype clues act as the attribute or column name → locate which column to read • Selectors give a direct syntactic match → limit the search to certain rows • The atype is related to the answer's entity class by IS-A • The question words point to the "landing zone" in the answer passage

  10. Key Techniques • Identify structure from the question • Identify atype clues • Easy: who, when, where, how many, how tall… • Harder: What… and which… • Identify selectors • Learn to identify the structure from QA pairs • Learn to identify selectors and atype clues • Use selectors for getting candidate passages using an IR system • Re-rank the passages using a learnt re-ranking function

  11. Getting the Atype clue • Atype of a question is represented as a WordNet (WN) synset • "what" and "which" are under-constrained and do not give the atype by themselves • Get the atype from the focus of the question • Heuristic: focus = head of the NP • Example: "What is the capital of Japan ?" parses as (WP What) (VP (VBZ is) (NP (NP (DT the) (NN capital)) (PP (IN of) (NP (NNP Japan))))); the focus is the head noun "capital"
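
A minimal sketch of this focus heuristic, assuming the question has already been parsed into a Penn-Treebank-style bracketing (the slides do not say which parser was used); it returns the head noun of the first base NP.

```python
from nltk import Tree

def focus_of(parse_str):
    """Heuristic from the slide: the focus is the head noun of the NP."""
    tree = Tree.fromstring(parse_str)
    # Scan base NPs (no nested NP) left to right; the rightmost noun of the
    # first one is taken as the head, i.e. the focus word.
    for np in tree.subtrees(lambda t: t.label() == "NP"):
        if not any(isinstance(c, Tree) and c.label() == "NP" for c in np):
            nouns = [tok for tok, tag in np.pos() if tag.startswith("NN")]
            if nouns:
                return nouns[-1]
    return None

parse = ("(SBARQ (WHNP (WP What)) (SQ (VBZ is) (NP (NP (DT the) (NN capital)) "
         "(PP (IN of) (NP (NNP Japan))))))")
print(focus_of(parse))  # -> capital
```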

  12. Atype through cue phrase • Who, where, when provide the atype clue • A cue-phrase-to-WN-synset mapping is required • Discover mappings from QA pairs • (figure: example mappings discovered from QA pairs — cue phrases such as "how many", "how far", "how fast", "how rich", "what city", "what achievement", "what university" mapped to WN synsets such as measure#n#3, definite_quantity#n#1, mile#n#3, linear_unit#n#1, rate#n#2, magnitude_relation#n#1, physical_phenomenon#n#1, paper_money#n#1, currency#n#1, area#n#1, geographical_area#n#1, abstraction#n#6, entity#n#1, living_thing#n#1)
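
A minimal sketch of how such mappings could be mined from QA pairs. The hypernym-voting scheme and the (cue phrase, answer word) training format below are assumptions; the slides only say that the mappings are discovered from QA pairs.

```python
from collections import Counter, defaultdict
from nltk.corpus import wordnet as wn

def discover_mappings(qa_pairs, top_k=2):
    """qa_pairs: (cue phrase, answer word) tuples, e.g. ("how far", "mile")."""
    votes = defaultdict(Counter)
    for cue, answer in qa_pairs:
        # Every hypernym of every noun sense of the answer gets a vote.
        for syn in wn.synsets(answer, pos=wn.NOUN):
            for path in syn.hypernym_paths():
                for hypernym in path:
                    votes[cue][hypernym.name()] += 1
    # The most frequently shared hypernyms become the cue phrase's atype synsets.
    return {cue: [s for s, _ in c.most_common(top_k)] for cue, c in votes.items()}

pairs = [("how far", "mile"), ("how far", "kilometer"), ("what city", "Tokyo")]
print(discover_mappings(pairs))
```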

  13. Connecting Atype with the answer • Answer should be a specialization of the atype of the question • Measure of connection between atype and answer: overlap of the nodes on the paths from the atype and the answer up to the noun roots • HyperPath Similarity = |Ha ∩ Ht| / |Ha ∪ Ht| • Ha = set of atype hypernymy synsets • Ht = set of answer hypernymy synsets • Example: What is the capital of Japan ? … the conference was held in the city of Tokyo, the headquarters of Japan …
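
A minimal sketch of the HyperPath similarity using NLTK's WordNet interface, assuming the atype and the candidate answer token have already been mapped to noun synsets (the sense choices below are illustrative).

```python
from nltk.corpus import wordnet as wn

def hyperpath_sim(atype_synset, answer_synset):
    # Ha and Ht: all synsets on the hypernym paths up to the noun roots.
    ha = {s for path in atype_synset.hypernym_paths() for s in path}
    ht = {s for path in answer_synset.hypernym_paths() for s in path}
    return len(ha & ht) / len(ha | ht)

capital = wn.synsets("capital", pos=wn.NOUN)[0]   # sense choice is illustrative
tokyo = wn.synsets("Tokyo", pos=wn.NOUN)[0]
print(hyperpath_sim(capital, tokyo))
```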

  14. Customized WordNet Similarity Measures • Glosses • Descriptive glosses: as given by WordNet • E.g. first noun sense of volcano: a fissure in the earth's crust (or in the surface of some other planet) through which molten lava and gases erupt • Hypernymy glosses: using the hypernymy chain of a word • E.g. gloss for volcano: {volcano, mountain, mount, natural elevation, elevation, geological formation, formation, natural object, object, physical object, entity} • Similarity between two words w1#pos#sense and w2#pos#sense • Get the required gloss entries for the two words • Use either of the similarity metrics • Cosine Similarity • Jaccard Similarity
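
A minimal sketch of the gloss-based similarity for two senses given as word#pos#sense, using the descriptive gloss and cosine similarity over term counts; swapping the gloss source for the lemma names along the hypernymy chain gives the hypernymy-gloss variant.

```python
import math
from collections import Counter
from nltk.corpus import wordnet as wn

def descriptive_gloss(sense_key):
    # "volcano#n#1" -> bag of words from WordNet's definition of volcano.n.01
    word, pos, num = sense_key.split("#")
    return Counter(wn.synset(f"{word}.{pos}.{int(num):02d}").definition().lower().split())

def cosine_sim(key1, key2):
    a, b = descriptive_gloss(key1), descriptive_gloss(key2)
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

print(cosine_sim("volcano#n#1", "mountain#n#1"))
```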

  15. Learning to detect question focus • Training data: Questions tagged with focus words • Features to capture the intuition • Focus words are part of NPs • Generally the head of the NPs • Need to capture neighborhood phrase information • Train a classifier with the generated instances

  16. Phrases as instances vs. tokens as instances • Token-as-instance features: • Is the word under consideration the head of a phrase (0/1) • The tightly enclosing phrase to which this word belongs; in a phrase such as (ADJP (JJ married) (PP (TO to))), the tightly enclosing phrase for "to" is PP while for "married" it is ADJP • POS of this token • The phrase that lies to the left of the identified phrase • The phrase that lies to the right of the identified phrase • Phrase-as-instance features: • Depth of the phrase • POS of the head of the phrase • Type of the phrase, e.g. the phrase type for (PP (IN of) (NP (NNP Japan))) is PP • The phrase type of the phrase that lies to the left of this phrase, e.g. for (PP (IN of) (NP (NNP Japan))) the left phrase is (NP (DT the) (NN capital)) • The phrase type of the phrase that lies to the right of this phrase • Trained on TREC 2002, evaluated on TREC 2000: accuracies of 81.58% and 79.62% with a Logistic Regressor for the two instance representations
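
A minimal sketch of the token-as-instance classifier; the feature encoding and the toy training rows below are illustrative stand-ins for the real TREC-derived instances.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def token_features(tok):
    # Each token instance carries its POS, its tightly enclosing phrase and
    # the phrase types to its left and right, as listed on the slide.
    return {
        "is_head_of_phrase": tok["is_head"],
        "pos": tok["pos"],
        "enclosing_phrase": tok["phrase"],
        "left_phrase": tok["left"],
        "right_phrase": tok["right"],
    }

train = [  # toy instances: label 1 = focus word ("capital"), 0 = not
    ({"is_head": 1, "pos": "NN", "phrase": "NP", "left": "WHNP", "right": "PP"}, 1),
    ({"is_head": 0, "pos": "DT", "phrase": "NP", "left": "WHNP", "right": "PP"}, 0),
    ({"is_head": 1, "pos": "NNP", "phrase": "NP", "left": "PP", "right": "NONE"}, 0),
]
clf = make_pipeline(DictVectorizer(), LogisticRegression())
clf.fit([token_features(t) for t, _ in train], [y for _, y in train])
print(clf.predict([token_features(train[0][0])]))
```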

  17. Learning to detect selectors • Which question words are likely to appear (almost) unchanged in an answer passage? • Features • Local: POS of word, POS of adjacent words, case info, proximity to wh-word • Global: #senses of the word (from WN), #docs containing the word • DecisionTree gives accuracy of 81%
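
A corresponding sketch for the selector detector, combining the local and global features above in a decision tree; the feature values and toy instances are illustrative.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier
from sklearn.pipeline import make_pipeline

def selector_features(word_info):
    return {
        "pos": word_info["pos"],                      # local features
        "prev_pos": word_info["prev_pos"],
        "next_pos": word_info["next_pos"],
        "is_capitalised": word_info["word"][0].isupper(),
        "dist_to_wh_word": word_info["dist_to_wh"],
        "wn_senses": word_info["n_senses"],           # global features
        "doc_frequency": word_info["df"],
    }

train = [  # label 1 = word is a selector, 0 = not
    ({"word": "Pompeii", "pos": "NNP", "prev_pos": "IN", "next_pos": ".",
      "dist_to_wh": 9, "n_senses": 1, "df": 120}, 1),
    ({"word": "destroyed", "pos": "VBD", "prev_pos": "WDT", "next_pos": "DT",
      "dist_to_wh": 3, "n_senses": 4, "df": 15000}, 0),
]
clf = make_pipeline(DictVectorizer(), DecisionTreeClassifier())
clf.fit([selector_features(w) for w, _ in train], [y for _, y in train])
```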

  18. System Tasks • Offline tasks • Data preparation: cleaning the corpus, sentence splitting, creating passages, POS tagging, Named Entity tagging and indexing • Train the selector learner, the focus word learner and the re-ranker • Discover the mappings from cue phrases to WN synsets • Online tasks • Identify selectors from the given question • Use the selectors as a query to a search engine (Lucene) • Retrieve the top 100–300 passages from the hits • Re-rank the passages using the re-ranker
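
A minimal sketch of the online stage; `passage_index`, `selector_clf` and `reranker` are hypothetical stand-ins for the Lucene index and the learnt components built in the offline stage.

```python
def answer_question(question, passage_index, selector_clf, reranker, k=300):
    tokens = question.rstrip(" ?").split()
    # 1. Identify the selectors and use them as the keyword query.
    selectors = [t for t in tokens if selector_clf.is_selector(t, question)]
    # 2. Retrieve the top 100-300 candidate passages from the index.
    candidates = passage_index.search(" ".join(selectors), limit=k)
    # 3. Re-rank the candidates with the learnt re-ranking function.
    return sorted(candidates, key=lambda p: reranker.score(question, p), reverse=True)
```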

  19. Learning to re-rank the passages • One instance for every QA pair • A passage that contains the answer typically: • has high similarity between some tokens in the passage (the answer tokens) and the atype clue → feature: max similarity of the atype with passage tokens • has high linear proximity between the identified answer tokens and the selectors in the question → features: min, avg and max distance between selectors and likely answer tokens • contains tokens that exactly match the identified selectors → feature: whether the selectors match • is highly ranked by Lucene → feature: Lucene rank
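
A minimal sketch of the feature vector computed for one (question, passage) pair; `atype_sim` is assumed to be a WordNet similarity such as the HyperPath measure above, and the 0.5 threshold for marking likely answer tokens is an illustrative assumption.

```python
def rerank_features(selectors, atype_sim, passage_tokens, lucene_rank):
    sims = [atype_sim(tok) for tok in passage_tokens]
    likely_answers = [i for i, s in enumerate(sims) if s > 0.5]           # assumed threshold
    selector_hits = [i for i, t in enumerate(passage_tokens) if t in selectors]
    dists = [abs(a - s) for a in likely_answers for s in selector_hits] or [len(passage_tokens)]
    return {
        "max_atype_sim": max(sims, default=0.0),       # similarity of atype with passage tokens
        "min_dist": min(dists),                        # proximity between selectors and
        "avg_dist": sum(dists) / len(dists),           # likely answer tokens
        "max_dist": max(dists),
        "selectors_match": int(bool(selector_hits)),   # do selectors match exactly?
        "lucene_rank": lucene_rank,                    # rank given by Lucene
    }
```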

  20. (worked example) Question: How many inhabitants live in the town of Albany ? Passage: Albany, a town of about 30,000 people is the capital of New York state and … — selector matches ("Albany", "town"), a <NUMEX TYPE="ORDINAL"> entity tag and WN similarity connect the passage tokens to the question's atype

  21. Putting the system together • (system diagram) The question passes through a Tokenizer, POS Tagger and Shallow Parser; from the tagged question with noun and verb markers, the Atype Extractor produces the atype clues and the Selector Learner feeds the keyword query generator • The keyword query is run against a passage index built offline by a sentence splitter / passage indexer over the corpus • Each candidate passage is run through a Tokenizer, POS Tagger and Entity Extractor; a WEKA Logistic Regression model decides "is this a QA pair?" and outputs the re-ranked passages • Learning to re-rank passages, sample features: • Do selectors match? • Is some passage token a WordNet hyponym of the question's atype clue? • Does a non-selector passage token have large WordNet similarity with a selector in the question? • Min, avg, max linear token distance between selectors and various matches

  22. System Performance • MRR of over 0.7 • All question types benefit from re-ranking • Benefits differ by question type • “what” and “which” questions benefit a lot

  23. Failure Analysis • TREC 2000: questions for which no answer passage is retrieved in the top 100: 21.78% (151 out of 693) • TREC 2002: questions for which no answer passage is retrieved in the top 100: 35.27% (176 out of 499) • High tail • Data preparation problems • Failure question: Where does dew come from ? • Reason: "dew" is not present in the answer passage but is present in other passages of the document • Proper noun mismatch • Failure question: Where did David Ogden Stiers get his under-graduate degree ? • Reason: "David Ogden Stiers" is not present in the corpus in this form

  24. Failure Analysis (cont'd) • Ambiguous questions • When did Muhammad live? • Question type not supported • How do you say "French fries" in French ? • Non-conclusive atypes • What do penguins eat ? • What lays blue eggs ? Atype = entity • Difficult to form a pertinent query • What age should one be to get married in South California ? • What hair color can I use to just cover a little gray? • Inadequate NE tagging • What group sang the song "Happy Together" ?

  25. Future Directions • Query processing • Refining the selector idea (soft and hard selectors) • Identification of syntactic variants of hard selectors • Expanding soft selectors to synonyms and hypernyms • Using an expanded query • Processed selectors + focus words • What business was the source of John D. Rockefeller's fortune ? • Query = (source OR origin) AND (fortune OR treasure) AND (John D. Rockefeller OR John Rockefeller) OR (business OR occupation) • Using backoff: back off to the next variant query if #hits is too low or too high
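
A minimal sketch of the proposed backoff strategy; `search` is a hypothetical callable returning hits for a boolean query string, and both the hit-count thresholds and the ordering of variant queries are illustrative assumptions.

```python
def backoff_search(variant_queries, search, min_hits=5, max_hits=500):
    # Try the most constrained variant first; back off whenever the number of
    # hits is too low or too high.
    for query in variant_queries:
        hits = search(query)
        if min_hits <= len(hits) <= max_hits:
            return hits
    return search(variant_queries[-1])   # loosest variant as a last resort

variants = [
    '(source OR origin) AND (fortune OR treasure) AND ("John D. Rockefeller" OR "John Rockefeller") AND (business OR occupation)',
    '(source OR origin) AND fortune AND Rockefeller',
    'fortune AND Rockefeller',
]
```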

  26. Using the Web as a supplementary corpus • Query the web using Google and retrieve passages for the query • Supplement the Lucene-retrieved TREC passages with the Web passages • Using the Web for creating knowledge resources • Definitions repository • is-a relation hierarchy • Synonyms repository

  27. Conclusion • A new view of QA as a "noisy simulation" of a structured query • Recover structure info from the question • Match the extracted structure to the passage • A system that gives competitive accuracy with minimal domain expertise or manual intervention • A generic framework adaptable to new corpora, ranking algorithms and new languages! • Future work involves improving the accuracy at the ranking step and using a supplementary corpus

  28. References [1] Is Factoid Question Answering an Acquired Skill? Ganesh Ramakrishnan, Soumen Chakrabarti, Deepa Paranjpe, Pushpak Bhattacharyya. To appear in Proceedings of the Thirteenth International World Wide Web Conference (WWW2004), New York City, 17-22 May 2004. [2] Learning search engine specific query transformations for question answering. E. Agichtein, S. Lawrence, and L. Gravano. In Proceedings of the 10th World Wide Web Conference (WWW10), pages 169-178, 2001. [3] Apache Software Group. Jakarta Lucene text search engine. GPL library, 2002. http://jakarta.apache.org/lucene/. [4] Scaling question answering to the Web. C. Kwok, O. Etzioni, and D. S. Weld. In WWW, volume 10, pages 150-161, Hong Kong, May 2001. IW3C2 and ACM. http://www10.org/cdrom/papers/120/. [5] Web question answering: Is more always better? S. Dumais, M. Banko, E. Brill, J. Lin, and A. Ng. In SIGIR, pages 291-298, Aug 2002.

  29. Thank You !

  30. Re-ranking results • Categorical and numeric attributes • Used Logistic Regression • Good precision, poor recall • Rank of first correct passage shifts down substantially • TREC 2000 Acc = 84.88%, TREC 2002 Acc = 89.46%

  31. Learning to tag with Named Entities • Question: Where does the vice president live when in office? • Answer passage: … airspace over the U.S. Naval Observatory, the official residence of the vice president, Shumann said … • Problem: "U.S. Naval Observatory" is not known to be a "place" • Off-the-shelf NE taggers rely on hand-crafted grammar rules, gazetteers and dictionaries • They are difficult to customize for the QA task • Solution: build an NE tagger that learns to tag from a tagged corpus

  32. Learning Setup • Input: text marked with NE tags • Processing: • Every sentence in the text is parsed • An instance is created for every phrase of the parse • A decision tree classifier is trained on these instances • Features: • Whether the first word in the phrase starts with a capital letter, is all caps, contains any numeric quantity or contains any single-character token (one feature for each) • POS tag of each token in the phrase: one feature for every word • POS of the last word of the phrase at -1 distance from the current phrase • POS of the first word of the phrase at +1 distance from the current phrase • Whether it is the start of a sentence • Type of the phrase • What fraction of the tagged chunk is formed by this phrase (on a scale from 0 to 1) • Output: a decision tree that can tag with named entities • Tag set used during training: COUNT, DATE:DAY, DATE, YEAR, ORGANIZATION, FEMALE:PERSON, MALE:PERSON, PERSON, LOCATION, CURRENCY:MEASUREMENT, LENGTH:MEASUREMENT, TITLE, TIME • Training accuracy = 89.77%
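
A minimal sketch of the phrase-level feature extraction for the NE learner; the phrase representation (a dict holding (word, POS) pairs, a phrase type and its neighbours) is an assumed input format, not the system's actual data structure.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier
from sklearn.pipeline import make_pipeline

def phrase_features(phrase, left_phrase, right_phrase, sentence_start, chunk_overlap):
    words = [w for w, _ in phrase["tokens"]]
    feats = {
        "first_starts_cap": words[0][0].isupper(),
        "first_all_caps": words[0].isupper(),
        "has_number": any(c.isdigit() for w in words for c in w),
        "has_single_char": any(len(w) == 1 for w in words),
        "phrase_type": phrase["type"],
        "left_last_pos": left_phrase["tokens"][-1][1] if left_phrase else "NONE",
        "right_first_pos": right_phrase["tokens"][0][1] if right_phrase else "NONE",
        "sentence_start": sentence_start,
        "chunk_overlap": chunk_overlap,     # fraction of the tagged chunk covered, 0..1
    }
    for i, (_, pos) in enumerate(phrase["tokens"]):
        feats[f"pos_{i}"] = pos             # one POS feature per word in the phrase
    return feats

ne_tagger = make_pipeline(DictVectorizer(), DecisionTreeClassifier())
```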
