
Evaluating and Improving Inference Rules via Crowdsourcing – Naomi Zeichner






Presentation Transcript


1. Evaluating and Improving Inference Rules via Crowdsourcing. Naomi Zeichner. Supervisors: Prof. Ido Dagan & Dr. Meni Adler

2. Inference Rules – an important component in semantic applications
• QA: Q: Where was Reagan raised? A: Reagan was brought up in Dixon. Rule: X brought up in Y → X raised in Y
• IE (Hiring Event): Bob worked as an analyst for Dell. Rule: X work as Y → X hired as Y, extracting PERSON = Bob, ROLE = analyst
• RTE: deciding whether a Text entails a Hypothesis

3. Current State
• Example rules: X reside in Y → X live in Y; X reside in Y → X born in Y; X criticize Y → X attack Y
• Many algorithms exist for the automatic acquisition of inference rules
• Automatically acquired rules are often of poor quality
• We would like an indication of how likely a rule is to produce correct rule applications

4. Our Goal
• An efficient and reliable way to manually assess the validity of inference rules
• Useful for two purposes: a dataset for training and evaluation, and improving the rule base

5. Outline
1. Current state: Inference Rule-Base Evaluation
2. Our Framework: Crowdsourcing Rule Application Annotations
3. Use Cases: Evaluate & Improve the Inference Rule-Base


7. Evaluation – What are the options?
1. Impact on an end task (QA, IE, RTE). Pro: what interests an inference-system developer. Con: such systems have many components addressing multiple phenomena, so it is hard to assess the effect of a single resource.
2. Judge rule correctness directly (e.g., X reside in Y → X live in Y; X reside in Y → X born in Y; X criticize Y → X attack Y). Pro: theoretically the most intuitive. Con: hard to do in practice; often results in low inter-annotator agreement.
3. Instance-based evaluation (Szpektor et al. 2007, Bhagat et al. 2007). Pro: simulates the utility of rules in an application and yields high inter-annotator agreement.

  8. Sentence LHS Sentence RHS Instance Based Evaluation Kim acquired new abilities at school. Kim acquired new abilities at school. Kim acquired new abilities at school. X acquire Y X buy Y Dropbox acquired Audiogalaxy. Dropbox acquired Audiogalaxy. Find LHS Sentence Rule Dropbox buy Audiogalaxy. Kim acquired school. No Entailing Invalid Yes Kim acquired abilities Yes RHS meaningful? Generate RHS Yes Kim buy abilities. Dropbox buy Audiogalaxy. No No Kim buy abilities. Not Entailing Inference Rule-Base Evaluation Crowdsourcing Rule Application Annotations Evaluate & Improve Inference Rule-Base 6

  9. Sentence LHS Hard to Replicate Sentence RHS Instance Based Evaluation – Issues Complex Find LHS Sentence Rule No Entailing Invalid Szpektor reported 43% Yes Yes RHS meaningful? Generate RHS Yes Requires lengthy guidelines & training No No Not Entailing Inference Rule-Base Evaluation Crowdsourcing Rule Application Annotations Evaluate & Improve Inference Rule-Base 7

10. Crowdsourcing
• A recent trend of using crowdsourcing for annotation tasks
• Requires tasks to be: coherent, simple
• Does not allow for: long instructions, extensive training

11. Requirements Summary
• Replicable & reliable
• Rule applications: a good representation of rule use; coherent
• Annotation process: simple; communicates entailment without lengthy guidelines and training

12. Outline – next: Our Framework (Crowdsourcing Rule Application Annotations)

13. Overview
Pipeline: Rule Base → Generation → Rule Applications → Crowdsourcing → Annotated Rule Applications

14. Overview – Generation
The generation step turns the rule base into rule applications in two sub-steps: find a sentence matching the LHS, then generate the RHS.

15. Rule Application Generation
Rule: X shoot Y → X attack Y
Both sides are dependency templates: shoot:V with subj → X:N and obj → Y:N on the left, and attack:V with subj → X:N and obj → Y:N on the right.

16. Rule Application Generation – Sentence Extraction
The LHS template is matched against a parsed corpus sentence (a code sketch of the matching follows below).
Sentence: The bank manager shoots one of the robbers.
Parse (simplified): 4:shoot:V with subj → 3:manager:N (det → 1:The:Det, nn → 2:bank:N) and obj → 5:one:N (comp1 → 6:of:Prep, pcomp-n → 8:robber:N, det → 7:the:Det)
The match instantiates X = manager and Y = one.

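The matching step is easy to express in code. Below is a minimal, self-contained sketch of LHS template matching over a dependency parse; the tuple-based parse encoding and the function name are illustrative assumptions, not the system's actual implementation.

```python
# A minimal sketch of LHS template matching over a dependency parse.
# Node/edge encodings here are illustrative, not the system's data structures.

def match_lhs(template, parse):
    """Find nodes instantiating the template's X and Y slots.

    template: (lemma, pos, {dep_label: slot_name}), e.g.
              ("shoot", "V", {"subj": "X", "obj": "Y"})
    parse:    list of (head_index, dep_label, index, lemma, pos) edges
    Returns {"X": node_index, "Y": node_index} or None if no match.
    """
    lemma, pos, slots = template
    children, nodes = {}, {}
    for head, label, idx, lem, p in parse:
        children.setdefault(head, {})[label] = idx   # index children by label
        nodes[idx] = (lem, p)
    for idx, (lem, p) in nodes.items():
        if (lem, p) != (lemma, pos):
            continue                                  # not the template's verb
        kids = children.get(idx, {})
        if all(label in kids for label in slots):
            return {slot: kids[label] for label, slot in slots.items()}
    return None

# "The bank manager shoots one of the robbers."
parse = [
    (0, "root", 4, "shoot", "V"),
    (4, "subj", 3, "manager", "N"),
    (4, "obj", 5, "one", "N"),
    (3, "det", 1, "the", "Det"),
    (3, "nn", 2, "bank", "N"),
    (5, "comp1", 6, "of", "Prep"),
    (6, "pcomp-n", 8, "robber", "N"),
    (8, "det", 7, "the", "Det"),
]
print(match_lhs(("shoot", "V", {"subj": "X", "obj": "Y"}), parse))
# -> {'X': 3, 'Y': 5}  (manager, one)
```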

18. Rule Application Generation – RHS Phrase Generation (step 1)
The matched LHS node (shoot:V) is replaced by the RHS template. Sentence: The bank manager shoots one of the robbers. Phrase: X attack Y

19. RHS Phrase Generation (step 2)
The variables are first instantiated with the argument head words. Sentence: The bank manager shoots one of the robbers. Phrase: manager attack one

20. RHS Phrase Generation (step 3)
Finally, the full argument sub-trees are substituted for X and Y (see the sketch below). Sentence: The bank manager shoots one of the robbers. Phrase: The bank manager attack one of the robbers.
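To make steps 18–20 concrete, here is a hedged sketch of RHS phrase generation: substitute the matched arguments into the RHS template, either as bare head words (step 2) or as full argument sub-trees (step 3). The data layout and helper names are hypothetical.

```python
# A hedged sketch of RHS phrase generation by argument substitution.

def subtree_tokens(root, children, words):
    """Collect the yield of the sub-tree rooted at `root`, in word order."""
    span, stack = [root], [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            span.append(child)
            stack.append(child)
    return " ".join(words[i] for i in sorted(span))  # sentence order by index

def generate_rhs(rhs_template, args, children, words, full_subtrees=True):
    """rhs_template: e.g. "X attack Y"; args: {"X": 3, "Y": 5}."""
    phrase = rhs_template
    for slot, node in args.items():
        text = subtree_tokens(node, children, words) if full_subtrees else words[node]
        phrase = phrase.replace(slot, text)
    return phrase

words = {1: "The", 2: "bank", 3: "manager", 5: "one", 6: "of", 7: "the", 8: "robbers"}
children = {3: [1, 2], 5: [6], 6: [8], 8: [7]}
print(generate_rhs("X attack Y", {"X": 3, "Y": 5}, children, words, full_subtrees=False))
# -> "manager attack one"            (step 2: head words only)
print(generate_rhs("X attack Y", {"X": 3, "Y": 5}, children, words))
# -> "The bank manager attack one of the robbers"   (step 3: full sub-trees)
```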

  21. Generation Sentence LHS Rule Base Rule Applications Generate RHS Find LHS Sentence Bonus Filter out ungrammatical sentences Crowdsourcing Annotated Rule Applications Rule Application Generation Sentence Filtering Problem: ‘Left phrase not entailed by sentence’ Cause: Parsing Errors Solution: Verify sentence parsing fight:V Sentence: They were first used as fighting dogs. LHS extraction: they fight dogs subj obj 53% of sentences filtered out X:N Y:N Inference Rule-Base Evaluation Evaluate & Improve Inference Rule-Base Crowdsourcing Rule Application Annotations 16

  22. Sentence RHS Overview - Crowdsourcing Generation Rule Base Rule Applications Generate RHS Find LHS Sentence Crowdsourcing RHS meaningful? Yes Yes Annotated Rule Applications Entailing No No Not Entailing Inference Rule-Base Evaluation Evaluate & Improve Inference Rule-Base Crowdsourcing Rule Application Annotations 17

23. Crowdsourcing: Simplify the Process – Task 1: Is the phrase meaningful?
• Rule: X shoot Y → X attack Y. Sentence: The bank manager shoots one of the robbers. Phrase: The bank manager attack one of the robbers. (meaningful)
• Rule: X greet Y → X marry Y. Sentence: Mr. Monk visits her, and she greets him with real joy. Phrase: she marry him (meaningful)
• Rule: X acquire Y → X buy Y. Sentence: Kim acquired new abilities at school. Phrase: Kim buy abilities (not meaningful)

26. Crowdsourcing: Simplify the Process – Task 2: Judge whether a phrase is true given a sentence
Only phrases judged meaningful in Task 1 move on:
• Sentence: The bank manager shoots one of the robbers. Phrase: The bank manager attack one of the robbers. (true → entailing)
• Sentence: Mr. Monk visits her, and she greets him with real joy. Phrase: she marry him (not true → not entailing)

29. Crowdsourcing: Communicate Entailment
The gold standard consists of pre-annotated rule applications, used in two ways:
1. Educating: "confusing" examples are used as gold, with feedback shown if Turkers get them wrong.
2. Enforcing: unanimous examples are used as gold to estimate Turker reliability.
Example feedback item – Sentence: Michelle thinks like an artist. Phrase: Michelle behave like an artist. Feedback: No. It is quite possible for someone to think like an artist but not behave like an artist.
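A hedged sketch of how the gold standard might drive quality control: educating items print their feedback when answered wrong, while enforcing items estimate the worker's accuracy. The threshold and data layout are illustrative assumptions.

```python
# A minimal sketch of gold-based quality control, under assumed data shapes.

def check_gold_answers(answers, gold, feedback, min_accuracy=0.8):
    """answers: {item_id: label}; gold: {item_id: correct_label};
    feedback: {item_id: explanation for educating items}.
    Returns True if the worker is reliable enough to keep."""
    correct = 0
    for item, label in answers.items():
        if item not in gold:
            continue                  # a real task item, not a gold probe
        if label == gold[item]:
            correct += 1
        elif item in feedback:
            print(feedback[item])     # educating: explain the mistake
    scored = sum(1 for item in answers if item in gold)
    return scored == 0 or correct / scored >= min_accuracy  # enforcing
```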

30. Crowdsourcing: Aggregate Annotation
• Each rule application is evaluated by 3 Turkers
• Annotations are aggregated by: majority vote, or a bias-correction measure for non-expert annotators (Snow et al. 2008)

31. Crowdsourcing: Aggregate Annotation – Snow's Method
• $x_i$ – the "true" label of example $i$
• $y_i^w$ – the label provided by worker $w$ to example $i$
• $P(y_i^w \mid x_i)$ – an estimate of worker $w$'s probability of answering Y or N given the true label, calculated from the worker's performance on expert-annotated examples
• The aggregated label maximizes the posterior, with a uniform prior $P(x_i)$:
$$x_i^* = \arg\max_{x_i} \sum_w \log P(y_i^w \mid x_i) + \log P(x_i)$$
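The two aggregation options translate to a short script. Below is a minimal sketch assuming labels in {Y, N}: majority voting, per-worker confusion estimates from gold examples (Laplace smoothing is my assumption), and the arg-max above with the uniform prior dropped as a constant.

```python
# A minimal sketch of majority vote and Snow et al. (2008)-style
# bias-corrected aggregation, under a uniform prior over labels.
from collections import Counter
from math import log

def majority_vote(labels):
    return Counter(labels).most_common(1)[0][0]

def estimate_worker_model(gold_answers, smoothing=1.0):
    """gold_answers: list of (true_label, given_label) pairs for one worker.
    Returns P(given | true) with Laplace smoothing (an assumption)."""
    counts = {t: Counter() for t in "YN"}
    for true, given in gold_answers:
        counts[true][given] += 1
    return {
        (given, true): (counts[true][given] + smoothing)
                       / (sum(counts[true].values()) + 2 * smoothing)
        for true in "YN" for given in "YN"
    }

def snow_aggregate(worker_labels, worker_models):
    """worker_labels: {worker: label}; worker_models: {worker: P(given|true)}.
    Picks arg max over x of sum_w log P(y_w | x); the uniform prior is constant."""
    def posterior(x):
        return sum(log(worker_models[w][(y, x)]) for w, y in worker_labels.items())
    return max("YN", key=posterior)

models = {w: estimate_worker_model(g) for w, g in {
    "w1": [("Y", "Y"), ("N", "N"), ("Y", "Y")],
    "w2": [("Y", "N"), ("N", "N"), ("Y", "Y")],
    "w3": [("Y", "Y"), ("N", "Y"), ("N", "N")],
}.items()}
print(majority_vote(["Y", "N", "Y"]))                              # -> 'Y'
print(snow_aggregate({"w1": "Y", "w2": "N", "w3": "Y"}, models))
```

Summing log-probabilities rather than multiplying raw probabilities is a standard choice here to avoid numerical underflow with many annotators.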

32. Crowdsourcing: Evaluation – Agreement between Turker and Expert Annotations
[Agreement table not recoverable from the transcript.]
* The agreement is considerably higher than the 0.65 kappa reported by Szpektor et al. (2007).

33. Requirements Summary – How We Meet Them
• Replicable & reliable
• Rule applications: a good representation of rule use; coherent
• Annotation process: simple; communicates entailment without lengthy guidelines and training
Our framework addresses these via crowdsourcing: parsing validation, Simple Wikipedia & argument sub-trees, split tasks, and a gold standard.

34. Outline – next: Use Cases (Evaluate & Improve Inference Rule-Base)

35. Use Cases
Our goals: find an efficient and reliable way to manually assess the validity of inference rules, useful for two purposes:
• A dataset for training and evaluation → Use Case 1: Evaluating Rule Acquisition Methods
• Improving the rule base → Use Case 2: Improving the Accuracy Estimate of Automatically Acquired Inference Rules

36. Use Case 1: Data Set
• A supplementary study derived from this work (Zeichner et al. 2012)
• Generated rule applications using four inference-rule learning methods
• Annotated each rule application using our framework
• After some filtering, 6,567 rule applications remained

37. Use Case 1: Output
• Task 1: 1,012 meaningless phrases (labeled non-entailment); 5,555 meaningful phrases (passed to Task 2)
• Task 2: 2,447 positive entailment; 3,108 negative entailment
• Overall: 6,567 rule applications, annotated for $1,000 in about a week

38. Use Case 1: Algorithm Comparison
[Comparison chart of the four rule-learning methods; figures not recoverable from the transcript.]

39. Use Case 1: Results
• A large-scale dataset of rule-application annotations, produced quickly and at reasonable cost
• Allowed comparison between different inference-rule learning methods

40. Use Case 2: Setting
We follow the evaluation methodology of Szpektor et al. (2008) and implemented a naïve Information Extraction (IE) system (a toy sketch follows below). For a target event such as Attack, the system applies scored rules:
• X shoot Y → X attack Y (0.245773)
• X destroy Y → X attack Y (0.298797)
• X bomb Y → X attack Y (0.30322)
Example: "Banks was convicted of shooting and killing a 16-year-old at a park in 1980"
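To illustrate the setting, here is a deliberately naïve sketch of applying such scored rules; the keyword-style matching stands in for real template matching and is purely illustrative, as are the variable names.

```python
# A toy sketch of the naïve IE setting: report an Attack extraction whenever
# a rule's LHS verb appears in the sentence, together with the rule's score.

RULES = {"shoot": 0.245773, "destroy": 0.298797, "bomb": 0.30322}

def extract_attack_events(sentence):
    hits = []
    tokens = sentence.lower().split()
    for verb, score in RULES.items():
        if any(tok.startswith(verb) for tok in tokens):  # "shooting" matches "shoot"
            hits.append((verb, score))
    return hits

print(extract_attack_events(
    "Banks was convicted of shooting and killing a 16-year-old at a park in 1980"))
# -> [('shoot', 0.245773)]
```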

41. Use Case 2: Rule Re-Scoring Methods
• Original Score (baseline): the score produced by the rule-learning algorithm
• Crowd Score: the number of instantiations annotated as entailing out of those judged for the rule
• Combined Score: a linear combination of the Crowd Score and the rule-learning score
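The three scores reduce to a few lines. In this sketch, the interpolation weight in the combined score is an illustrative assumption; the slide does not specify how the linear combination is weighted.

```python
# A hedged sketch of the three scoring options.

def crowd_score(judgments):
    """Fraction of judged instantiations annotated as entailing (1 = entailing)."""
    return sum(judgments) / len(judgments) if judgments else 0.0

def combined_score(original, judgments, lam=0.5):
    """Linear combination of the learning algorithm's score and the crowd score;
    lam is an assumed weight."""
    return lam * original + (1 - lam) * crowd_score(judgments)

# X shoot Y -> X attack Y: original score 0.245773, 4 of 5 instances entailing
print(crowd_score([1, 1, 1, 0, 1]))              # -> 0.8
print(combined_score(0.245773, [1, 1, 1, 0, 1])) # -> 0.5228865
```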

42. Use Case 2: Context-Specific Instructions
• The context in which the rules will be used must be reflected in the crowdsourced annotations.
• Example (End-Position event): both X shoot Y → X fire Y and X dismiss Y → X fire Y map to the same RHS, but only the "dismiss" sense of fire fits the event.
• Annotation guidelines were adapted to consider context in the judgment.

43. Use Case 2: Evaluation
• Comparison with a manual expert ranking
• Performance on an Information Extraction (IE) task

44. Use Case 2: Evaluation – Manual Ranking (Sentence event)
Top-ranked rules by Original Score:
• convict X of Y → sentence Y to X
• convict X of Y → sentence X for Y
• X guilty of Y → sentence X for Y
• X order Y → X sentence Y
• convict X of Y → Y sentence X
Top-ranked rules by Crowd Score:
• convict X of Y → sentence X for Y
• condemn X to Y → sentence Y to X
• X serve Y → sentence Y to X
• convict X of Y → sentence Y to X
• convict X of Y → sentence Y in X
Mean Average Precision – Original Score: 0.47, Crowd Score: 0.80
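For reference, Mean Average Precision figures like those above can be computed as follows; the list-of-0/1 relevance encoding is an assumed data layout.

```python
# A minimal sketch of Mean Average Precision over ranked rule lists.

def average_precision(relevance):
    """relevance: 0/1 labels (correct rule or not) in ranked order."""
    hits, total = 0, 0.0
    for i, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / i          # precision at each correct rule
    return total / hits if hits else 0.0

def mean_average_precision(rankings):
    return sum(average_precision(r) for r in rankings) / len(rankings)

# e.g. one event where ranks 1, 2 and 4 hold correct rules:
print(average_precision([1, 1, 0, 1]))  # (1/1 + 2/2 + 3/4) / 3 ~= 0.9167
```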

45. Use Case 2: Evaluation – IE Performance
[Chart of Mean Average Precision across ranking settings; figures not recoverable from the transcript.]

46. Use Cases: Error Analysis – Crowdsourced Annotation Performance
• Ambiguity – Sentence: "members disagree with leadership"; Phrase: "members take exception to leadership". "Take exception" can mean either raise an objection or take offense.
• Entailment definition – Sentence: "A doctor claimed he died of stomach cancer"; Phrase: "he die of stomach cancer".

47. Use Cases: Error Analysis – Rule-Base Performance on the IE Task
• Corpora differences. Event: Arrest-Jail. Rule: X capture Y → X arrest Y
• From the IE corpus – Sentence: "American commandos captured a half brother of Saddam Hussein on Thursday"; Phrase: "commandos arrest half brother"
• From Simple Wikipedia – Sentence: "In 1622 AD, Nurhaci's armies captured Guang Ning."; Phrase: "Nurhaci's armies arrest Guang Ning"

48. Future Work
• A better corpus to use for rule-application generation
• Use the framework to determine rule context
• Look at rule-base ranking as a machine-learning problem ("learning to rank")

49. Conclusion – Thank You
• A replicable framework
• High-quality annotations, quickly and at reasonable cost
• Hopefully this will encourage the use of inference rules

50. Generation: Creating the RHS Instantiation – Template Linearization
The RHS dependency template (attack:V with obj → X:N and mod → in:Prep → Y:N) is linearized into the phrase "attack X in Y".
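A hedged sketch of template linearization: walk the template tree depth-first and emit surface forms. Real linearization must order children relation by relation (a subject, for instance, precedes the verb); the "children already stored in surface order" assumption below sidesteps that for illustration.

```python
# A toy sketch of dependency-template linearization.

def linearize(node, children):
    """Depth-first, left-to-right linearization of a dependency template.
    children: {node: [child, ...]} with children assumed in surface order."""
    parts = [node]
    for child in children.get(node, []):
        parts.append(linearize(child, children))
    return " ".join(parts)

# attack:V --obj--> X:N, --mod--> in:Prep --pcomp-n--> Y:N
children = {"attack": ["X", "in"], "in": ["Y"]}
print(linearize("attack", children))   # -> "attack X in Y"
```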
