
Use of Patterns for Detection of Answer Strings



Presentation Transcript


  1. Use of Patterns for Detection of Answer Strings Soubbotin and Soubbotin

  2. Essentials of Approach • A certain shift from deep text analysis and NLP methods to surface techniques • Use of formulas describing the structure of strings likely bearing certain semantic information

  3. Example • FBI Director Louis Freeh • A person represented by his/her first/last names • A person occupies a post in an organization

  4. The formula • A word composed of capital letters • An item from a list of posts in an organization • An item from a list of first names • A capitalized word
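The formula on this slide can be sketched as a regular expression whose parts are filled in from word lists. This is only an illustrative sketch: the lists `POSTS` and `FIRST_NAMES`, the helper `alternation`, and the function `find_person_post` are hypothetical names, and the list contents are placeholders, since the slides do not show the authors' actual lists or notation.

```python
import re

# Hypothetical word lists; the real lists are not given in the slides.
POSTS = ["Director", "President", "Chairman", "Secretary"]
FIRST_NAMES = ["Louis", "John", "Mary", "George"]


def alternation(items):
    """Turn a word list into a regex alternation, longest items first."""
    ordered = sorted(items, key=len, reverse=True)
    return "|".join(re.escape(word) for word in ordered)


# Capitalized-letters word + post from list + first name from list + capitalized word.
PERSON_POST = re.compile(
    r"\b(?P<org>[A-Z]{2,})\s+"
    r"(?P<post>" + alternation(POSTS) + r")\s+"
    r"(?P<first>" + alternation(FIRST_NAMES) + r")\s+"
    r"(?P<last>[A-Z][a-z]+)\b"
)


def find_person_post(text):
    """Return one dict per string in the text that fits the formula."""
    return [match.groupdict() for match in PERSON_POST.finditer(text)]


print(find_person_post("Yesterday FBI Director Louis Freeh spoke to reporters."))
```

Each named group corresponds to one element of the formula, so a match directly yields the person, post, and organization.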

  5. Patterns • Formulas of such kind were called “patterns” • First used at TREC-10 QA track • Each pattern is characterized by a certain generalized semantics

  6. Steps (Overview) • Identify strings corresponding to a formula • Identify the question terms (types) • Check for expressions negating the semantics of the found strings • Apply the set of formulas (for a particular question type) to match the strings in question-relevant passages

  7. A Surface Approach • No need to distinguish linguistic entities • Formulas for strings look like regular expressions • But patterns include elements referring to lists of predefined words/phrases

  8. Patterns and Question Types • Who is person X? • Who occupies post Y in organization Z? • A relationship is established between two or more entities: person, post, organization, etc. • Where-questions suggest geographical items as answers • Construct formulas like: an item from a list of cities/towns/counties or countries/states

  9. Examples • “In what year” questions: find strings with a sequence of 4 digits • Questions regarding length, area, weight, speed, etc.: digits plus units of measurement • “What is the area of Venezuela?” • 340,569 square miles (a simple pattern match)
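The two simple patterns on this slide might look as follows. This is a minimal sketch under assumptions: the unit list `UNITS` is a hypothetical placeholder, and `find_years` and `find_measures` are illustrative names, not part of the original system.

```python
import re

# Hypothetical list of units; the slides do not enumerate the real one.
UNITS = ["square miles", "square kilometers", "miles", "kilometers",
         "feet", "pounds", "mph"]
UNIT_ALT = "|".join(re.escape(u) for u in sorted(UNITS, key=len, reverse=True))

# "In what year" questions: a standalone sequence of 4 digits.
YEAR = re.compile(r"\b\d{4}\b")

# Measurement questions: digits (with optional commas or decimals) plus a unit.
MEASURE = re.compile(r"\b(\d[\d,]*(?:\.\d+)?)\s+(" + UNIT_ALT + r")\b")


def find_years(text):
    """Return all 4-digit strings in the text."""
    return YEAR.findall(text)


def find_measures(text):
    """Return (number, unit) pairs found in the text."""
    return MEASURE.findall(text)


print(find_years("Apollo 11 landed in 1969."))
print(find_measures("Venezuela has an area of 340,569 square miles."))
```

Sorting the units longest first keeps "square miles" from being cut down to "miles".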

  10. Complex Patterns • Strings expressing relationship between several semantic entities • The more complex a pattern is, the higher its reliability

  11. Names and Dates • Names of people • Items from a first-name list • Capitalized words • Specific name elements (bin, van, etc.) • Abbreviations like Sr. and Jr. • Dates • Prepositions, articles, digits, month names, commas, dashes, brackets, phrases like “early,” “in the period of,” “years ago,” “B.C.”
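As a rough illustration, the name elements listed on this slide can be combined into one pattern. The lists `FIRST_NAMES`, `PARTICLES`, and `SUFFIXES` and the function `find_names` are hypothetical; the sketch covers names only, not dates, and is not the authors' actual formula.

```python
import re

# Hypothetical lists standing in for the predefined word lists.
FIRST_NAMES = ["Osama", "Vincent", "Martin", "Louis"]
PARTICLES = ["bin", "van", "von", "de"]
SUFFIXES = ["Sr.", "Jr."]


def alt(items):
    """Build a regex alternation from a list of literal strings."""
    return "|".join(re.escape(item) for item in items)


NAME = re.compile(
    r"\b(?:" + alt(FIRST_NAMES) + r")"                    # item from first-name list
    r"(?:\s+(?:" + alt(PARTICLES) + r"))?"                # optional element such as "van"
    r"(?:\s+(?!(?:" + alt(SUFFIXES) + r"))[A-Z][a-z]+)+"  # capitalized words, not a suffix
    r"(?:,?\s+(?:" + alt(SUFFIXES) + r"))?"               # optional "Sr." or "Jr."
)


def find_names(text):
    """Return every substring that fits the name pattern."""
    return [match.group(0) for match in NAME.finditer(text)]


print(find_names("Vincent van Gogh painted it."))
print(find_names("Martin Luther King Jr. spoke in 1963."))
```

The negative lookahead stops "Jr." from being consumed as an ordinary capitalized word, so the suffix is kept with its period.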

  12. Pattern-Matching Strings and Question Semantics • Where question words are located relative to the pattern-matching string (distance, left/right side, position relative to other matching strings, etc.) • Simplicity of a pattern’s structure is compensated for by complexity of rules • Without applying heuristic rules, sufficiently reliable results cannot be ensured • A rank is assigned to question words/phrases and a score to candidate answers
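One way to picture such a heuristic is a score that grows when highly ranked question words sit close to the candidate string. The weighting below (rank divided by one plus the token distance) is an assumption made purely for illustration; the slides do not state the actual rules, and `proximity_score` is a hypothetical name.

```python
def proximity_score(tokens, answer_span, question_terms):
    """Score a candidate answer by how near the question terms are.

    tokens: passage as a list of words.
    answer_span: (start, end) token indices of the candidate, end exclusive.
    question_terms: dict mapping a lowercase question word to its rank (weight).
    """
    start, end = answer_span
    score = 0.0
    for term, rank in question_terms.items():
        distances = []
        for i, token in enumerate(tokens):
            if token.lower() != term:
                continue
            if i < start:
                distances.append(start - i)      # term is left of the candidate
            elif i >= end:
                distances.append(i - end + 1)    # term is right of the candidate
            else:
                distances.append(0)              # term lies inside the candidate
        if distances:
            # Assumed weighting: closer and higher-ranked terms add more.
            score += rank / (1 + min(distances))
    return score


tokens = "The area of Venezuela is 340,569 square miles".split()
terms = {"area": 1.0, "venezuela": 2.0}
print(proximity_score(tokens, (5, 8), terms))
```

A candidate with no question words nearby scores 0.0 under this sketch, which is one simple way a pattern match could be rejected as unrelated to the question.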

  13. QA Process • Define question types for all questions • Order the questions so that those with more reliable patterns come first • Form and rank queries from question terms • Modify queries (if the score is below a threshold) • Identify pattern-matching strings (apply complex patterns first, then simple ones) • Check correlation between patterns and question semantics • Identify exact answers and calculate their scores
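The step of applying complex patterns before simple ones could be sketched like this. The table `PATTERNS`, the question type label `"when"`, and the function `extract` are hypothetical, and the two date patterns are stand-ins chosen for illustration only.

```python
import re

MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")

# Hypothetical pattern table: for each question type, patterns are listed
# from most complex (assumed most reliable) to simplest.
PATTERNS = {
    "when": [
        ("complex", re.compile(r"\b(?:" + MONTHS + r")\s+\d{1,2},\s+\d{4}\b")),
        ("simple", re.compile(r"\b\d{4}\b")),
    ],
}


def extract(question_type, passages):
    """Return (pattern label, matches) from the first pattern that hits."""
    for label, pattern in PATTERNS.get(question_type, []):
        hits = [m.group(0) for p in passages for m in pattern.finditer(p)]
        if hits:
            return label, hits
    return None, []


print(extract("when", ["Independence was declared on July 4, 1776."]))
print(extract("when", ["The landing took place in 1969."]))
```

Falling back to the simple pattern only when the complex one finds nothing reflects the idea that more complex patterns are the more reliable ones.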

  14. Analysis of Results • TREC 2002: • confidence-weighted score = 0.691 • 271 right answers, 209 wrong answers, 148 “no answer” • First 29 correct answers belonged to question types with highly reliable patterns • Incorrectly identified answer strings = 13.6% (excluding NIL answers)
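For context, the TREC 2002 confidence-weighted score is computed over answers sorted from most to least confident: for each rank i, take the fraction of correct answers among the first i, then average these fractions over all questions. The sketch below implements that definition; the function name and the toy judgment lists are illustrative and are not the actual run data.

```python
def confidence_weighted_score(judgments):
    """Compute the confidence-weighted score.

    judgments: list of booleans, one per question, ordered from the answer
    the system is most confident in to the one it is least confident in.
    """
    if not judgments:
        return 0.0
    correct_so_far = 0
    total = 0.0
    for i, is_correct in enumerate(judgments, start=1):
        if is_correct:
            correct_so_far += 1
        total += correct_so_far / i  # precision over the first i answers
    return total / len(judgments)


# Same number of correct answers, different ordering.
print(confidence_weighted_score([True, True, False, False]))
print(confidence_weighted_score([False, False, True, True]))
```

Placing correct answers early raises the score, which is why a run whose first answers come from question types with highly reliable patterns benefits under this measure.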
