
Memory-based learning for noun phrase coreference resolution



Presentation Transcript


  1. Memory-based learning for noun phrase coreference resolution Veronique Hoste

  2. Outline • Noun phrase coreference resolution • Definition • Why? • Problems • A memory-based learning approach

  3. Definition (Hirst, 81) Anaphora is the device of making in discourse an abbreviated reference to some entity in the expectation that the perceiver will be able to disabbreviate the reference and thereby determine the identity of the entity.

  4. Definition (Hirst, 81) ANAPHOR Anaphora is the device of making in discourse an abbreviated reference to some entity in the expectation that the perceiver will be able to disabbreviate the reference and thereby determine the identity of the entity.

  5. Definition (Hirst, 81) ANTECEDENT or REFERENT Anaphora is the device of making in discourse an abbreviated reference to some entity in the expectation that the perceiver will be able to disabbreviate the reference and thereby determine the identity of the entity. ANAPHOR

  6. Definition (Hirst, 81) ANTECEDENT or REFERENT ANAPHOR Anaphora is the device of making in discourse an abbreviated reference to some entity in the expectation that the perceiver will be able to disabbreviate the reference and thereby determine the identity of the entity. RESOLUTION

  7. Example Kim Clijsters has won the Proximus Diamond Games in Antwerp. Belgium’s world number two secured her first title on home soil by making short work of defeating Italy’s Silvia Farina Elia. Clijsters broke Farina Elia’s second service game but her opponent broke back immediately and it wasn’t until the eighth game that the Belgian broke again to lead 5-3, from which she served out to take the set. It was Clijsters’s sixth victory over the Italian.


  12. Why? • Weakness in existing information extraction (IE) systems • IE template: Who: ….. What: ….. Where: ….. When: ….. How: …..

  13. Coreference resolution, a complex problem • Anaphora resolution draws on many knowledge sources: • morphological and lexical knowledge • syntactic knowledge • semantic knowledge • discourse knowledge • real-world knowledge

  14. Approaches • The past: mostly knowledge-based techniques (constraints and preferences) e.g. Lappin & Leass (1994), Baldwin (CogNIAC, 1996) • Recently: machine learning (C4.5) Redefine coreference resolution as a CLASSIFICATION task.

  15. A classification based approach • Given two entities in a text, NP1 and NP2, classify the pair as coreferent or not coreferent. • E.g. • [Clijsters] broke [[Farina Elia]’s second service game] but [[her] opponent] broke back immediately. [her opponent] - [Farina Elia’s second service game] coref? - [Farina Elia] coref? - [Clijsters] coref?

  16. Getting started • Pipeline: free text → tokenization → POS tagging → NP chunking → NER → nested NP extraction

  17. Learner ingredients • Starting point: corpora annotated with coreferential chains • “About one month ago <COREF ID=“1”>American Airlines</COREF> sent <COREF ID=“2”> a delegation</COREF> to Brussels. <COREF ID=“3” TYPE=“IDENT” REF=“1”> The large air plane company </COREF> was interested in DAT and wished to discuss this interest with <COREF ID=“4”>the prime minister</COREF>. But <COREF ID=“5” TYPE=“IDENT” REF=“4”>Guy Verhofstadt</COREF> refused to see <COREF ID=“6” REF=“2”>the delegation</COREF>.”

  18. Two data sets • ENGLISH: MUC-6 (2141/2091 corefs) and MUC-7 (2569/1728 corefs) • The only datasets which are publicly available • Extensively used for evaluation • Articles from WSJ and NYT • DUTCH: KNACK-2002 • First Dutch coreferentially annotated corpus • Articles from KNACK 2002 on different topics: politics, science, culture, …

  19. Learner ingredients (ctd) • Training data to train and validate the machine learner • Procedure: n-fold cross-validation • partition the training data into n parts • repeat n times: take each part in turn as test set and train on the remaining parts • Hold-out test data to test the resulting learner
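The n-fold cross-validation procedure above can be sketched as follows. This is a minimal illustration; `train_and_eval` is a hypothetical callback that trains a learner on one partition and returns a score on the other.

```python
import random

def n_fold_cross_validation(instances, n, train_and_eval, seed=0):
    """Partition the data into n parts; each part serves once as the test set
    while the learner is trained on the remaining parts."""
    instances = list(instances)
    random.Random(seed).shuffle(instances)
    folds = [instances[i::n] for i in range(n)]   # n roughly equal parts
    scores = []
    for i in range(n):
        test = folds[i]
        train = [inst for j, fold in enumerate(folds) if j != i for inst in fold]
        scores.append(train_and_eval(train, test))
    return sum(scores) / n                        # average validation score
```

With 10 instances and n = 5, each fold holds 2 instances for testing and 8 for training.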

  20. Learner ingredients (ctd) • Creating instances • One instance for each pair of NPs • At the end of the instance: the class value (the NPs are coreferential or not coreferential). E.g. [Clijsters] broke [[Farina Elia]’s second service game] but [[her] opponent] broke back immediately. [her opponent] - [Farina Elia’s second service game] not coreferential [her opponent] - [Farina Elia] coreferential [her opponent] - [Clijsters] not coreferential
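This instance creation can be sketched as follows, assuming the NPs are given in textual order and `chain` (a hypothetical name) maps each coreferring NP to the ID of its coreference chain:

```python
def make_instances(nps, chain):
    """Pair every NP (potential anaphor) with each of its preceding NPs
    (candidate antecedents), working from right to left, and label the pair
    coreferential iff both NPs belong to the same coreference chain."""
    instances = []
    for j in range(len(nps) - 1, 0, -1):        # rightmost NP first
        for i in range(j - 1, -1, -1):          # all preceding NPs
            same = chain.get(nps[i]) is not None and chain.get(nps[i]) == chain.get(nps[j])
            instances.append((nps[i], nps[j], "coreferential" if same else "not coreferential"))
    return instances
```

On the example above, `make_instances(["Clijsters", "Farina Elia's second service game", "Farina Elia", "her opponent"], {"Farina Elia": 1, "her opponent": 1})` reproduces the three labelled pairs for [her opponent], plus the pairs among the earlier NPs.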

  21. Learner ingredients (ctd) • Instance: describes the characteristics of two NPs and their context • Features per instance: • local context: words + POS • string matching features (complete match, partial match) E.g. president Bush, George W. Bush • grammatical: - pronoun, demonstrative, definite, proper noun

  22. Features (ctd) • grammatical (ctd): • number, gender • appositive • subject/object • semantic: • synonym, hypernym • alias • same named entity? • Distance in number of sentences and NPs

  23. Task Build a small instance base for the following sentences. • work from right to left • link every NP (the potential anaphor) to all its preceding NPs (the candidate antecedents) • build for each pair a vector with the following features • feature 1+2: gender • feature 3+4: number • feature 5: exact match (binary) • feature 6: partial match (binary) • feature 7+8: pronoun/demonstrative/definite/proper • feature 9: synonyms/hypernyms (binary)

  24. “About one month ago <COREF ID=“1”>American Airlines</COREF> sent <COREF ID=“2”> a delegation</COREF> to Brussels. <COREF ID=“3” TYPE=“IDENT” REF=“1”> The large air plane company </COREF> was interested in DAT and wished to discuss this interest with <COREF ID=“4”> prime minister Verhofstadt </COREF>. But <COREF ID=“5” TYPE=“IDENT” REF=“4”>Guy Verhofstadt</COREF> refused to see <COREF ID=“6” REF=“2”>the delegation</COREF>.”

  25. Resulting instance base • NP pairs: • the delegation - prime minister Verhofstadt • the delegation - this interest • the delegation - DAT • the delegation - the large airplane company • the delegation - Brussels • the delegation - a delegation • Guy Verhofstadt - prime minister Verhofstadt • (…) • Feature vectors: • Neutral, person, singular, singular, no, no, definite, proper, no → no coref • Neutral, person, singular, singular, yes, yes, definite, indefinite, yes → coref • Male, person, singular, singular, no, yes, proper, proper, yes → coref • (…) • Feed these instances to the learning algorithm
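As an illustration of the matching features in these vectors, here is a sketch of features 5 and 6 (exact and partial string match). The determiner stripping is an assumption about how "a delegation" matches "the delegation"; the gender, number and synonymy features would need lexical resources such as WordNet.

```python
def match_features(np1, np2):
    """Exact and partial string match between two NPs, ignoring determiners,
    so that 'a delegation' and 'the delegation' count as an exact match."""
    def content_words(np):
        determiners = {"the", "a", "an", "this", "that"}
        return [w for w in np.lower().split() if w not in determiners]
    w1, w2 = content_words(np1), content_words(np2)
    return {
        "exact_match": w1 == w2,                  # feature 5
        "partial_match": bool(set(w1) & set(w2))  # feature 6: any shared word
    }
```

For instance, "prime minister Verhofstadt" and "Guy Verhofstadt" share a word, so they match partially but not exactly.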

  26. Learning • TRAINING: • Input : set of training instances • Output: a coreference classifier • TESTING: • Input : new unseen instances • Output: classification

  27. Memory-based learning • Background: performance in real-world tasks is based on remembering past events rather than creating rules or generalizations • Lazy (vs. eager) : MBL keeps all training data in memory and only abstracts at classification time by extrapolating a class from the most similar items in memory to the new test item

  28. MBL components • memory-based learning component: During learning, the learning component adds new training instances to the memory without any abstraction or restructuring • similarity-based performance component: The classification of the most similar instance in memory is taken as classification for the new test instance

  29. In other words ... • Given (x1, y1), (x2, y2), (x3, y3), …. (xn, yn) • Task at classification time is to find the closest xi for a new data point xq

  30. Crucial components • A distance metric • The number of nearest neighbours to look at • A strategy of how to extrapolate from the nearest neighbours


  32. Distance metrics When presenting a new instance for classification to the MBL learner, the learner looks in its memory in order to find all instances whose attributes are similar to the newly presented test instance.

  33. Distance metrics • How far are xi and xq? • Most basic metric: Overlap Metric • Δ(xq, xi) = Σi=1..n δ(xqi, xii) • where δ(xqi, xii) = 0 if xqi = xii and δ(xqi, xii) = 1 if xqi ≠ xii
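The overlap metric translates directly into code: the distance between two instances is simply the number of features on which they disagree.

```python
def overlap_distance(xq, xi):
    """Overlap metric: count the features on which the two instances differ.
    Matching features contribute 0 to the distance, mismatches contribute 1."""
    assert len(xq) == len(xi), "instances must have the same number of features"
    return sum(1 for a, b in zip(xq, xi) if a != b)
```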

  34. Feature weighting • Problem: some features will be more informative for the prediction of the class label than others • Solution: feature selection or feature weighting • information gain weighting • gain ratio weighting • chi-squared weighting

  35. Information gain weighting • Expresses the average entropy reduction from a feature when its value is known • H(C) = -Σc∈C P(c) log2 P(c) • wi = H(C) - Σv∈Vi P(v) × H(C|v) • Problem: features with many possible values are favoured above features with fewer possible values

  36. Gain ratio weighting • Normalized version of information gain • = information gain divided by the entropy of the feature values • wi = (H(C) - Σv∈Vi P(v) × H(C|v)) / si(i) • si(i) = -Σv∈Vi P(v) log2 P(v)

  37. Chi-squared weighting • Given: contingency table consisting of all classes and feature values • Chi square: measures the difference between the expected values and the observed values in each of the cells of the table • χ² = Σij (Eij - Oij)² / Eij • Eij = (ni. × n.j) / n..
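The three weighting schemes can be sketched together. Here `values` holds one feature's value per training instance and `labels` the corresponding class labels (hypothetical names); each function returns that feature's weight.

```python
import math
from collections import Counter

def entropy(outcomes):
    """H = -sum p log2 p over the distribution of the outcomes."""
    n = len(outcomes)
    return -sum((c / n) * math.log2(c / n) for c in Counter(outcomes).values())

def information_gain(values, labels):
    """H(C) minus the expected entropy of the class after splitting on the feature."""
    n = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [l for x, l in zip(values, labels) if x == v]
        remainder += (len(subset) / n) * entropy(subset)
    return entropy(labels) - remainder

def gain_ratio(values, labels):
    """Information gain normalised by si(i), the entropy of the feature's values."""
    si = entropy(values)
    return information_gain(values, labels) / si if si > 0 else 0.0

def chi_squared(values, labels):
    """Sum of (Eij - Oij)^2 / Eij over the feature-value x class contingency table."""
    n = len(labels)
    val_counts, lab_counts = Counter(values), Counter(labels)
    observed = Counter(zip(values, labels))
    total = 0.0
    for v in val_counts:
        for c in lab_counts:
            expected = val_counts[v] * lab_counts[c] / n  # Eij = ni. x n.j / n..
            total += (expected - observed[(v, c)]) ** 2 / expected
    return total
```

A feature whose value perfectly predicts the class gets the maximal weight under all three schemes; a feature independent of the class gets weight 0.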

  38. Crucial components • A distance metric • The number of nearest neighbours to look at • A strategy of how to extrapolate from the nearest neighbours

  39. k • Nearest neighbours: the instances in memory which are near to the test item to be classified • The classification of these nearest neighbours is used as the classification for the new test instance • Expressed by k • k = 1 : the instances with the nearest distance to the test instance are used for classification
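A minimal sketch of nearest-neighbour classification over the instance memory, using the overlap metric and simple majority voting among the k nearest instances (here ties between equally distant instances are broken by sort order, which is a simplification of the slide's "all instances at the nearest distance"):

```python
from collections import Counter

def classify(memory, query, k=1):
    """memory: list of (feature_vector, label) training instances.
    Returns the majority label among the k instances nearest to query."""
    def overlap(a, b):
        return sum(1 for x, y in zip(a, b) if x != y)
    nearest = sorted(memory, key=lambda inst: overlap(inst[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Note that no model is built at training time; "training" is just storing the instances, and all the work happens at classification time.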

  40. Crucial components • A distance metric • The number of nearest neighbours to look at • A strategy of how to extrapolate from the nearest neighbours

  41. Extrapolation from the nearest neighbours • Goal: decide which will be the class of a new test item • Approaches: • Majority voting: all nearest neighbours receive equal weight • Distance weighted voting: link the choice of classification to the distance between the nearest neighbours and the test item
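The two extrapolation strategies can be contrasted in a few lines. Inverse-distance weighting is one common choice for distance-weighted voting (an assumption here; other decay functions exist):

```python
from collections import defaultdict

def majority_vote(neighbors):
    """neighbors: list of (distance, label); every neighbour weighs 1."""
    counts = defaultdict(int)
    for _, label in neighbors:
        counts[label] += 1
    return max(counts, key=counts.get)

def distance_weighted_vote(neighbors):
    """Closer neighbours get a larger vote: weight 1 / (distance + 1)."""
    weights = defaultdict(float)
    for dist, label in neighbors:
        weights[label] += 1.0 / (dist + 1.0)
    return max(weights, key=weights.get)
```

The two strategies can disagree: one neighbour at distance 0 can outvote two neighbours at distance 3 under distance weighting, while majority voting sides with the two.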

  42. Potential problems

  43. Noise How will MBL handle many uninformative features?

  44. Skewedness E.g. 10% coreferential instances and 90% noncoreferential instances Does MBL suffer from skewed class distributions?
