Discovery of Inference Rules for Question Answering


  1. Discovery of Inference Rules for Question Answering Dekang Lin and Patrick Pantel (Univ Alberta, CA) [ Lin now at Google, Pantel now at ISI ] as (mis-)interpreted by Peter Clark (Nov 2007)

  2. Preview • This is a most remarkable piece of work! • Extent: • Hasegawa paraphrase database: 4,600 rules • Sekine paraphrase database: 211,000 rules • DIRT: 12,000,000 rules • Quality: ~50% are sensible (personal opinion) • Content: not just paraphrases but world knowledge

  3. Overview • Problem: There are many ways of saying the same thing “Who is the author of the Star Spangled Banner?” …Francis Scott Key wrote the “Star Spangled Banner” in 1814. …comedian-actress Roseanne Barr sang her famous shrieking rendition of the “Star Spangled Banner” before a San Diego Padres-Cincinnati Reds Game.

  4. Some Manual “Paraphrases” (from ISI)

  5. Problem… there are lots of paraphrases! Can we learn them automatically?

  6. DIRT Method • Take a binary relation, e.g., X buys Y. • Collect examples of “X buys…”, and “…buys Y”

  7. Method • Take a binary relation, e.g., X buys Y. • Collect examples of “X buys…”, and “…buys Y” - build a “tuple” database The following can buy: things (116) people (37) companies (34) Chicago (33) buyers (18) groups (6) … The following can be bought: shares (145) managers (43) stakes (27) power (25) securities (17) people (13) amounts (12) …
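A minimal sketch of this collection step in Python, assuming a parser has already reduced each sentence to an (X, verb, Y) triple; the triples below are invented for illustration:

```python
from collections import defaultdict

# Slot-filler frequency counts for each relation, e.g.
# counts[("buy", "X")]["company"] = how often "company" fills the X slot of "X buys Y"
counts = defaultdict(lambda: defaultdict(int))

def add_triple(x, relation, y):
    """Record one parsed instance of 'X relation Y' in the tuple database."""
    counts[(relation, "X")][x] += 1
    counts[(relation, "Y")][y] += 1

# Hypothetical parser output, for illustration only
for x, rel, y in [("company", "buy", "shares"),
                  ("people", "buy", "shares"),
                  ("company", "purchase", "shares")]:
    add_triple(x, rel, y)

print(dict(counts[("buy", "X")]))   # {'company': 1, 'people': 1}
print(dict(counts[("buy", "Y")]))   # {'shares': 2}
```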

  8. Method • Take a binary relation, e.g., X buys Y. • Collect examples of “X buys…”, and “…buys Y” • Find other relations with a similar distribution The following can buy: things (116) people (37) companies (34) Chicago (33) buyers (18) groups (6) … The following can be bought: shares (145) managers (43) stakes (27) power (25) securities (17) people (13) amounts (12) … The following can purchase: things (97) companies (23) people (21) businesses (13) governments (5) … The following can be purchased: shares (56) companies (23) securities (21) power (20) dogs (14) Macs (10) firms (2) …

  9. Method • Take a binary relation, e.g., X buys Y. • Collect examples of “X buys…”, and “…buys Y” • Find other relations with a similar distribution The following can buy: things (116) people (37) companies (34) Chicago (33) buyers (18) groups (6) … The following can be bought: shares (145) managers (43) stakes (27) power (25) securities (17) people (13) amounts (12) … The following can purchase: things (97) companies (23) people (21) businesses (13) governments (5) … The following can be purchased: shares (56) companies (23) securities (21) power (20) dogs (14) Macs (10) firms (2) … • 4. Collect as rules: “X buys Y” ↔ “X purchases Y”
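A rough sketch of this last collection step, assuming a similarity function over patterns (the `path_similarity` name and the threshold value are placeholders; the actual measure is defined on the next few slides):

```python
def collect_rules(target, candidates, sim, threshold=0.3):
    """Pair the target pattern with every candidate pattern whose
    distributional similarity clears a (made-up) threshold, yielding
    bidirectional paraphrase/inference rules."""
    return [(f"X {target} Y", f"X {cand} Y")
            for cand in candidates if sim(target, cand) >= threshold]

# e.g. collect_rules("buy", ["purchase", "acquire", "eat"], sim=path_similarity)
# might yield [("X buy Y", "X purchase Y"), ("X buy Y", "X acquire Y")]
```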

  10. The Math • Can’t simply compare counts • as some words are more frequent than others The following can buy: things (116) people (37) companies (34) Chicago (33) buyers (18) groups (6) … The following can purchase: things (97) companies (23) people (21) businesses (13) governments (5) … “people” counts not very indicative of similarity, as “people” occurs with many relations

  11. The Math • Mutual information gives a better notion of “importance”: mi(x,y) = log [ p(x,y) / (p(x) p(y)) ], e.g., mi(“company buy”) = log [ p(“company buy”) / (p(“company”) p(“buy”)) ] • Can compute from frequency counts: mi(“company buy”) = log [ freq(“company buy *”) × freq(“* * *”) / (freq(“company * *”) × freq(“* buy *”)) ] The following can buy: things (116) 0.7 people (37) 0.5 companies (34) 8.4 Chicago (33) 8.1 buyers (18) 4.5 groups (6) 2.3 …
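A minimal sketch of that computation; the frequency counts below are placeholders rather than real corpus numbers, and natural log is assumed since the slide does not fix a base:

```python
import math

def mi(freq_word_rel, freq_word_any, freq_any_rel, freq_any_any):
    """Pointwise mutual information of a word filling a relation slot:
       mi("company buy") = log[ freq("company buy *") * freq("* * *")
                                / (freq("company * *") * freq("* buy *")) ]"""
    return math.log((freq_word_rel * freq_any_any) /
                    (freq_word_any * freq_any_rel))

# Placeholder counts, for illustration only
print(round(mi(34, 150, 1200, 1_000_000), 1))   # ≈ 5.2
```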

  12. The Math • Now use a similarity measure, and sum over all words… sim(“X buy”, “X purchase”) = Sum over Xs that can both “buy” and “purchase” of [ mi(“X buy”) + mi(“X purchase”) ] / [ Sum over Xs that can “buy” of mi(“X buy”) + Sum over Xs that can “purchase” of mi(“X purchase”) ] The following can buy: things (116) 0.7 people (37) 0.5 companies (34) 8.4 Chicago (33) 8.1 buyers (18) 4.5 groups (6) 2.3 … The following can purchase: things (97) 0.3 companies (23) 8.2 people (21) 0.2 businesses (13) 4.3 governments (5) 7.6 …

  13. The Math • Now use a similarity measure, and sum over all words… sim(“X buy”, “X purchase”) = Sum over Xs that can both “buy” and “purchase” of [ mi(“X buy”) + mi(“X purchase”) ] / [ Sum over Xs that can “buy” of mi(“X buy”) + Sum over Xs that can “purchase” of mi(“X purchase”) ] = [ (.7+.3) “things” + (8.4+8.2) “companies” + (.5+.2) “people” ] / [ (.7+.5+8.4+8.1+4.5+2.3) + (.3+8.2+.2+4.3+7.6) ] = 0.41 (ignoring other words not shown on this page) The following can buy: things (116) 0.7 people (37) 0.5 companies (34) 8.4 Chicago (33) 8.1 buyers (18) 4.5 groups (6) 2.3 … The following can purchase: things (97) 0.3 companies (23) 8.2 people (21) 0.2 businesses (13) 4.3 governments (5) 7.6 …
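A minimal sketch of that slot similarity, using only the mutual-information values shown on the slide, so it reproduces the ≈0.41 figure under the same simplification:

```python
def slot_sim(mi_a, mi_b):
    """Lin-style similarity between two slots, e.g. the X slot of 'buy'
    and the X slot of 'purchase'.  mi_a and mi_b map filler words to
    their mutual-information scores."""
    shared = set(mi_a) & set(mi_b)
    numer = sum(mi_a[w] + mi_b[w] for w in shared)
    denom = sum(mi_a.values()) + sum(mi_b.values())
    return numer / denom

mi_buy_X = {"things": 0.7, "people": 0.5, "companies": 8.4,
            "Chicago": 8.1, "buyers": 4.5, "groups": 2.3}
mi_purchase_X = {"things": 0.3, "companies": 8.2, "people": 0.2,
                 "businesses": 4.3, "governments": 7.6}

print(round(slot_sim(mi_buy_X, mi_purchase_X), 2))   # 0.41
```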

  14. The Math • Then combine the “X verb *” and “* verb Y” scores by taking their geometric average: sim(“X buy Y”, “X purchase Y”) = sqrt[ sim(“X buy”, “X purchase”) × sim(“buy Y”, “purchase Y”) ]
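A sketch of this combination step, assuming the two slot similarities have already been computed as above (the 0.55 Y-slot value is made up for illustration):

```python
import math

def path_sim(sim_x_slot, sim_y_slot):
    """Overall similarity between 'X buy Y' and 'X purchase Y':
    the geometric mean of the X-slot and Y-slot similarities."""
    return math.sqrt(sim_x_slot * sim_y_slot)

# X-slot similarity from the previous sketch, hypothetical Y-slot similarity
print(round(path_sim(0.41, 0.55), 2))   # 0.47
```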

  15. The Results

  16. The Results

  17. Use for Question-Answering …China provided Iran with decontamination materials… Question: “Iran received decontamination materials?” Answer: Yes! I have general knowledge (a DIRT paraphrase rule) that: IF X is provided with Y THEN X obtains Y. Here: X = Iran, Y = materials. Thus, here: We are told in T: Iran is provided with materials. Thus it follows that: Iran obtains materials. In addition, I know (from WordNet): “obtain” and “receive” mean roughly the same thing. Hence: Iran received decontamination materials.

  18. ;;; T: "William Doyle works for an auction house in Manhattan." ;;; H: "The auction house employs William Doyle?" ;;; DIRT rule "N:for:V<work>V:subj:N -> N:subj:V<hire>V:obj:N" fired Yes! I have general knowledge that: IF Y works for X THEN X hires Y Here: X = the house, Y = Doyle Thus, here: We are told in T: Doyle works for the house Thus it follows that: the house hires Doyle In addition, I know: "hire" and "employ" mean roughly the same thing Hence: The auction house employs William Doyle. SCORE!!!!

  19. ;;; T: "The technician cooled the room." ;;; H: "The technician was cooled by the room?" ;;; DIRT rule "N:obj:V<cool>V:subj:N -> N:subj:V<cool>V:obj:N" fired Yes! I have general knowledge that: IF Y cools X THEN X cools Y Here: X = the room, Y = the technician Thus, here: We are told in T: the technician cools the room Thus it follows that: the room cools the technician Hence: The technician was cooled by the room. Rats, WRONG again

  20. Comments: The good… • I simplified to just “X verb Y” patterns • In fact it handles general “X <syntactic path> Y” patterns • (uses the Minipar parser to find the paths) • Finds world knowledge, not just paraphrases • Also finds junk… • Amazingly large and (relatively) high quality • It helped us (a bit) in our question-answering task • In our (limited) system, on 800 test questions, DIRT rules were used 59 times (41 right + 18 wrong)

  21. Critique • Is limited to “X pattern1 Y” → “X pattern2 Y” rules • E.g., out of scope: • “John sold a car” → “John received money” • No word senses • “X turns Y” → “X transforms Y”, “X rolls Y”, “X sells Y” • No constraints on the classes (types) X and Y • “driven to X by Y” → “Y has a child in X”, “Y provides medical care for Y” • “The president was driven to the memorial by the general.” • Lin and Pantel point to this as a possible future direction • ~50% is noisy • Antonyms often found to correlate • “X loves Y” → “X hates Y” • No notion of time, tense, etc • “Fred bought a car” → “Fred owns a car” • “Fred sold a car” → “Fred owns a car”

  22. Summary and Comments • Is this The Answer to AI/Machine Reading? • Highly impressive knowledge collection • Can suggest implicit (unstated) knowledge in text • but: Learns only one inference pattern • (can imagine extending to others) • Noisy • (can imagine using more data for more reliability) • More significantly…. • Rules only tell us atomic things which might happen…

  23. Summary and Comments (cont) • Rules only tell us atomic things which might happen • Can reduce the noise, but in the end the rules are only possibilities: “Illinois Senator Barack Obama visited the state's most diverse high school to deliver a comprehensive K-12 education plan…”
  “Obama arrived in the high school”
  “Obama toured the high school”
  ? “Obama flew to the high school”
  ? “Obama told a reporter in the high school”
  !! “Obama inaugurated the high school”
  ?! “Obama headed a delegation to the high school”
  !! “Obama was arrested in the high school”
  ?! “Obama thanked the high school’s government”
  “Obama stopped in the high school”
  !! “Obama held a summit in the high school”
  !! “Obama was evacuated from the high school”
  “Obama was accompanied to the high school”
  …

  24. Summary and Comments (cont) • What is still needed: • Requires notions of: • coherence; • macro-sized structures of how things combine (e.g. “scripts”); • reasoning about the implications; • knowledge integration; • search; • plausibility assessment and uncertainty reasoning • i.e., constructing a “most coherent representation” (model) from the pieces which the knowledge base suggests.

  25. Summary and Comments (cont) The bottom line: • Rule learning from text: • A key bit of the puzzle of AI, overcomes part of the knowledge acquisition bottleneck • Provides much-needed bottom-up input to language understanding • BUT: only part of the solution: • provides low-level “knowledge fodder” but • not the mechanism for processing it into coherent and useful representations • Not the larger-level structures (scripts, typical scenarios) also required for understanding and “organizing the fodder”
