Textual Entailment as Syntactic Graph Distance: a rule based and a SVM based approach
Fabio Massimo Zanzotto, Dipartimento Informatica Sistemistica e Comunicazione, Università di Milano Bicocca, Italy
Maria Teresa Pazienza and Marco Pennacchiotti, Department of Computer Science, Systems and Production, University of Roma “Tor Vergata”
Classifying Textual Entailment (TE): two dimensions
• Semantic dimension
  • paraphrasing (i.e., synonymy)
  • strict entailment
• Recognition dimension
  • semantic subsumption, e.g., T: "American Airlines will lay off..." H: "American Airlines will fire ..."
  • syntactic subsumption, e.g., T: "American Airlines began laying off hundreds of flight attendants on Tuesday" H: "American Airlines will fire hundreds of flight attendants"
  • direct implication, e.g., T: "American Airlines will fire flight attendants" H: "Hundreds of flight attendants will lose their jobs"
Recognizing Textual Entailment (TE): TE is a Graph Matching problem!
[Figure: graphs of T and H, annotated with semantic subsumption and syntactic subsumption]
Graph Matching (GM): GM is used, for instance, in image recognition. One problem: distortions in the input graphs!
Textual Entailment as Graph Matching (GM): known limitations
• distortions in the input syntactic/semantic graphs (errors in parsing, word sense disambiguation, etc.)
• matching nodes is more complex than simple label matching
• the matching should be invariant under syntactic transformations (nominalization, passivization, argument movement, ...)
• textual entailment is an asymmetric relation
[Box: Textual Entailment Measure]
What’s next
• Step 1: definition of the syntactic representation model (Extended Dependency Graph, XDG)
• Step 2, rule-based approach: definition of the graph matching measure for the textual entailment relation
• Step 3, SVM-based approach: using an SVM to estimate the parameters of the graph matching measure
• Step 4: preliminary analysis of the results on the development set
Extended Dependency Graph (XDG): XDG = (C, D), where
• C is the set of constituents; each constituent has
  • a syntactic head
  • a potential semantic governor
• D is the set of dependencies among constituents
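A minimal sketch of an XDG as a plain data structure, assuming constituents are identified by hypothetical ids (`c1`, `c2`, ...) and dependencies are labelled triples. The class names, field names and relation labels (`SUBJ`, `OBJ`) are illustrative assumptions, not the actual output format of the Chaos parser.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Constituent:
    """A node of the XDG (hypothetical representation)."""
    cid: str                        # hypothetical identifier, e.g. "c1"
    head: str                       # syntactic head (lemma)
    governor: Optional[str] = None  # potential semantic governor (lemma), if any


@dataclass(frozen=True)
class Dependency:
    """An edge of the XDG: a grammatical relation between two constituents."""
    label: str   # relation name; "SUBJ"/"OBJ" are hypothetical labels
    source: str  # cid of the governing constituent
    target: str  # cid of the dependent constituent


@dataclass
class XDG:
    """XDG = (C, D): constituents C indexed by id, dependencies D."""
    constituents: dict[str, Constituent] = field(default_factory=dict)
    dependencies: set[Dependency] = field(default_factory=set)

    def add_constituent(self, c: Constituent) -> None:
        self.constituents[c.cid] = c

    def add_dependency(self, label: str, source: str, target: str) -> None:
        # Every dependency must connect two constituents already in C.
        if source not in self.constituents or target not in self.constituents:
            raise ValueError(f"unknown constituent in {label}({source}, {target})")
        self.dependencies.add(Dependency(label, source, target))


# H: "American Airlines will fire hundreds of flight attendants"
h = XDG()
h.add_constituent(Constituent("c1", "airlines"))
h.add_constituent(Constituent("c2", "fire"))
h.add_constituent(Constituent("c3", "attendant"))
h.add_dependency("SUBJ", "c2", "c1")
h.add_dependency("OBJ", "c2", "c3")
```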
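GM on XDG: definitions (with XDG_H = (C_H, D_H) for H and XDG_T = (C_T, D_T) for T)
• Isomorphic subsumption: XDG_H is isomorphically subsumed by XDG_T if two bijective functions f_C: C_H → C_T and f_D: D_H → D_T exist that match every constituent and every dependency of H with one of T
• Subgraph isomorphic subsumption: XDG_H is subgraph-isomorphically subsumed by XDG_T if there exists a subgraph X ⊆ XDG_T such that XDG_H is isomorphically subsumed by X
• Maximal Common Subsumption Subgraph (MCSS): given XDG_H and XDG_T, a graph M is the MCSS if M is subgraph-isomorphically subsumed by both XDG_H and XDG_T and, for every graph M' with the same property, M' is no larger than M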
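The subsumption definitions can be checked by brute force on toy graphs. In this sketch a graph is a dict from node ids to lemmas plus a set of `(label, head_id, dependent_id)` triples. Two assumptions are made: f_D is induced from f_C (each dependency goes to the T dependency with the same label between the images of its endpoints), and a node "matches" if the lemmas are equal or listed in `SUBSUMES`, a hypothetical stand-in for lexical knowledge such as WordNet. The search is exponential and only illustrates the definitions.

```python
from itertools import permutations

# Hypothetical lexical table: (H word, T word) pairs in which the H word is
# taken to subsume the T word ("fire" is more general than "lay off").
SUBSUMES = {("fire", "lay_off")}


def node_match(h_word, t_word):
    return h_word == t_word or (h_word, t_word) in SUBSUMES


def is_isomorphic_subsumption(h_nodes, h_edges, t_nodes, t_edges, fc):
    """True if fc (H id -> T id) is a bijection C_H -> C_T whose induced
    edge map f_D is a bijection D_H -> D_T, with every node matched."""
    if set(fc) != set(h_nodes) or set(fc.values()) != set(t_nodes):
        return False
    if len(set(fc.values())) != len(fc):  # f_C is not injective
        return False
    if not all(node_match(h_nodes[h], t_nodes[t]) for h, t in fc.items()):
        return False
    # f_D maps (label, a, b) to (label, fc[a], fc[b]); its image must be D_T.
    image = {(label, fc[a], fc[b]) for label, a, b in h_edges}
    return image == set(t_edges) and len(image) == len(h_edges)


def find_subgraph_subsumption(h_nodes, h_edges, t_nodes, t_edges):
    """Search for a subgraph X of T such that H is isomorphically subsumed
    by X. Returns the node map f_C, or None if no such subgraph exists."""
    h_ids = list(h_nodes)
    for image in permutations(t_nodes, len(h_ids)):
        fc = dict(zip(h_ids, image))
        x_nodes = {t: t_nodes[t] for t in image}
        x_edges = {(label, fc[a], fc[b]) for label, a, b in h_edges}
        if x_edges <= set(t_edges) and is_isomorphic_subsumption(
                h_nodes, h_edges, x_nodes, x_edges, fc):
            return fc
    return None


# T: "American Airlines began laying off hundreds of flight attendants on Tuesday"
T_NODES = {"t1": "airlines", "t2": "lay_off", "t3": "attendant", "t4": "tuesday"}
T_EDGES = {("SUBJ", "t2", "t1"), ("OBJ", "t2", "t3"), ("MOD", "t2", "t4")}
# H: "American Airlines will fire hundreds of flight attendants"
H_NODES = {"h1": "airlines", "h2": "fire", "h3": "attendant"}
H_EDGES = {("SUBJ", "h2", "h1"), ("OBJ", "h2", "h3")}
```

On this pair H is not isomorphically subsumed by the whole of T (T has an extra constituent), but it is subgraph-isomorphically subsumed by T.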
Finding the bijective functions and evaluating the measure
• Step 1: constituent matching (f_C: C_H → C_T, bijective)
• Step 2: dependency matching (f_D: D_H → D_T, bijective)
• Step 3: define the MCSS using f_C and f_D
• Step 4: evaluate the similarity measure on the MCSS
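Steps 1 to 3 can be sketched with a greedy assignment. This is an assumption: the slides do not say how f_C is found, and a greedy one-to-one assignment does not guarantee a maximal common subgraph, so the result only approximates the MCSS. Here f_C is partial (H constituents with no positive-scoring partner are left unmatched). The `RELATED` table, the 0.5 partial score and the `MOD` label are hypothetical.

```python
# Hypothetical related-word table standing in for a WordNet lookup.
RELATED = {("fire", "lay_off")}


def constituent_score(h_word, t_word):
    """Hypothetical matching score: 1.0 identical, 0.5 related, 0.0 otherwise."""
    if h_word == t_word:
        return 1.0
    if (h_word, t_word) in RELATED:
        return 0.5
    return 0.0


def match_constituents(h_nodes, t_nodes):
    """Step 1: greedy one-to-one f_C (H id -> T id), best-scoring pairs first."""
    pairs = sorted(
        ((constituent_score(hw, tw), h, t)
         for h, hw in h_nodes.items() for t, tw in t_nodes.items()),
        key=lambda p: (-p[0], p[1], p[2]),  # deterministic tie-breaking
    )
    fc, used = {}, set()
    for score, h, t in pairs:
        if score > 0 and h not in fc and t not in used:
            fc[h] = t
            used.add(t)
    return fc


def match_dependencies(h_edges, t_edges, fc):
    """Step 2: f_D sends (label, a, b) to (label, f_C(a), f_C(b)) when T has it."""
    fd = {}
    for label, a, b in h_edges:
        if a in fc and b in fc and (label, fc[a], fc[b]) in t_edges:
            fd[(label, a, b)] = (label, fc[a], fc[b])
    return fd


def common_subgraph(h_nodes, h_edges, t_nodes, t_edges):
    """Step 3: the common subgraph defined by f_C and f_D (approximates the MCSS)."""
    fc = match_constituents(h_nodes, t_nodes)
    fd = match_dependencies(h_edges, t_edges, fc)
    return fc, fd


# T: "American Airlines began laying off hundreds of flight attendants on Tuesday"
T_NODES = {"t1": "airlines", "t2": "lay_off", "t3": "attendant", "t4": "tuesday"}
T_EDGES = {("SUBJ", "t2", "t1"), ("OBJ", "t2", "t3"), ("MOD", "t2", "t4")}
# H: "American Airlines will fire hundreds of flight attendants"
H_NODES = {"h1": "airlines", "h2": "fire", "h3": "attendant"}
H_EDGES = {("SUBJ", "h2", "h1"), ("OBJ", "h2", "h3")}

fc, fd = common_subgraph(H_NODES, H_EDGES, T_NODES, T_EDGES)
```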
Constituent Similarity
• Degree of similarity between a constituent of H and the constituent of T it is matched to by f_C
• Parameter: a
Dependency Similarity
• Degree of similarity between a dependency of H and the dependency of T it is matched to by f_D
• Parameter: a
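The slides give only the parameter a for these two similarities, so the functional forms below are assumptions for illustration. A constituent pair scores 1 for identical lemmas and a for lemmas listed in a hypothetical `RELATED` table (standing in for a WordNet lookup). A dependency pair scores the product of its endpoint similarities, further discounted by a when the relation labels differ.

```python
# Hypothetical symmetric table of related lemmas (stand-in for WordNet).
RELATED = {frozenset({"fire", "lay_off"})}


def constituent_similarity(h_lemma, t_lemma, a):
    """Hypothetical form: 1.0 for identical lemmas, a for related ones, else 0.0."""
    if h_lemma == t_lemma:
        return 1.0
    if frozenset((h_lemma, t_lemma)) in RELATED:
        return a
    return 0.0


def dependency_similarity(h_dep, t_dep, a):
    """h_dep, t_dep: (label, head_lemma, dependent_lemma) triples.
    Hypothetical form: product of the endpoint similarities, times a
    when the relation labels differ."""
    h_label, h_head, h_dependent = h_dep
    t_label, t_head, t_dependent = t_dep
    endpoints = (constituent_similarity(h_head, t_head, a)
                 * constituent_similarity(h_dependent, t_dependent, a))
    return endpoints if h_label == t_label else a * endpoints
```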
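Textual Entailment Measure
[Formula: combines the constituent similarities and the dependency similarities over the MCSS]
Finally, textual entailment holds if the measure exceeds the threshold t.
• Parameters: a, d, t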
Some more details
• Syntactic transformations
  • nominalization
  • passive form
• Other phenomena
  • be-sentences vs. appositions, e.g., "the president of XYZ is ..."
  • handling negation ("not")
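Passive normalization can be sketched as a rewrite of dependency labels before matching, so that a passive H can match an active T. The labels `SUBJ`, `OBJ` and `BY_AGENT` are hypothetical, and identifying clauses by their verb lemma is a simplification. Nominalization would additionally need a verb/noun lexicon and is not shown.

```python
def normalize_passive(deps, passive_verbs):
    """Rewrite passive-clause dependencies into their active-voice form.

    deps: set of (label, verb_lemma, argument_lemma) triples
          (hypothetical labels: "SUBJ", "OBJ", "BY_AGENT").
    passive_verbs: verb lemmas whose clause is in the passive voice.
    """
    out = set()
    for label, verb, arg in deps:
        if verb in passive_verbs:
            if label == "SUBJ":
                label = "OBJ"    # the surface subject is the logical object
            elif label == "BY_AGENT":
                label = "SUBJ"   # the by-agent is the logical subject
        out.add((label, verb, arg))
    return out
```

Applied to "Hundreds of flight attendants were fired by American Airlines", the result matches the dependencies of the active "American Airlines fired hundreds of flight attendants".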
Estimating Parameters with SVM
• Main idea: divide the graph matching measure into many subparts
• Assumptions
  • the hypothesis H is a simple S-V-O (subject-verb-object) sentence
  • the SVM must learn the parameters and thresholds
• A possibility: divide the feature space into three parts
  • subject-related features
  • main-verb-related features
  • object-related features
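One way to realize the three-part feature space under the S-V-O assumption is to compute the same small feature template once per slot, so the SVM can weight subject, verb and object evidence separately. The two features per slot below (does H's lemma occur anywhere in T; does it fill the same slot in T) are hypothetical examples, and the slot assignments for T are assumed to come from the parser.

```python
SLOTS = ("subj", "verb", "obj")


def svo_features(h_slots, t_slots, t_lemmas):
    """Two hypothetical features per S-V-O slot of H:
    [H lemma occurs anywhere in T, H lemma fills the same slot in T]."""
    t_lemma_set = set(t_lemmas)
    features = []
    for slot in SLOTS:
        lemma = h_slots.get(slot)
        features.append(1.0 if lemma is not None and lemma in t_lemma_set else 0.0)
        features.append(1.0 if lemma is not None and t_slots.get(slot) == lemma else 0.0)
    return features
```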
Feature Spaces
[Figure: example T and H graphs]
Feature Spaces
• Percentage of common tokens and lemmas
• Task
• Structural (graph) features
  • subgraph matching indicators
  • mean number of commonly anchored dependencies within constituents
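The percentage of common tokens can be computed as a simple overlap ratio. This sketch assumes lowercased whitespace tokenization and normalizes by the length of H (the fraction of H covered by T); lemma lists produced by a lemmatizer can be passed to the same function.

```python
def tokens(text):
    """Hypothetical tokenizer: lowercase, split on whitespace."""
    return text.lower().split()


def overlap_ratio(h_items, t_items):
    """Fraction of H items (tokens or lemmas) that also occur in T."""
    if not h_items:
        return 0.0
    t_set = set(t_items)
    return sum(1 for x in h_items if x in t_set) / len(h_items)
```

For T = "American Airlines began laying off hundreds of flight attendants on Tuesday" and H = "American Airlines will fire hundreds of flight attendants", 6 of the 8 H tokens occur in T, giving 0.75.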
Used Resources
• Chaos: a modular and lexicalised parser for English and Italian (Basili & Zanzotto, 1998, 2002), based on the Extended Dependency Graph (XDG) formalism
• WordNet
• SVMlight
Preliminary analysis (rule-based system)
[Figure: analysis of parameter a on dev1]
Based on this analysis, we chose a = 0.85, g = 0.85, d = 0.5.
Preliminary analysis (SVM-based system): the winning horse!
• Test bed: dev1 + dev2
• Test method: 3-fold cross-validation, repeated 10 times
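The evaluation protocol, 3-fold cross-validation repeated 10 times, can be sketched with the standard library. `train_and_score` is a hypothetical callback that would wrap SVM training and testing (for example with SVMlight) and return an accuracy; nothing here calls SVMlight itself. Each repetition reshuffles the examples with its own seed, so the 30 train/test splits differ.

```python
import random
from statistics import mean


def repeated_kfold(n, k=3, repeats=10, seed=0):
    """Yield (train_indices, test_indices): k folds for each repetition."""
    for r in range(repeats):
        idx = list(range(n))
        random.Random(seed + r).shuffle(idx)  # a different shuffle per repetition
        folds = [idx[i::k] for i in range(k)]
        for i in range(k):
            test = sorted(folds[i])
            train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
            yield train, test


def cross_validate(n, train_and_score, k=3, repeats=10, seed=0):
    """Average the score returned by the (hypothetical) train_and_score
    callback over all k * repeats splits."""
    scores = [train_and_score(train, test)
              for train, test in repeated_kfold(n, k, repeats, seed)]
    return mean(scores), scores
```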
... and back to real life!
T: "Comdex -- once among the world's largest trade shows, the launching pad for new computer and software products, and a Las Vegas fixture for 20 years -- has been canceled for this year."
H: "Las Vegas hosted the Comdex trade show for 20 years."