
CSA3180: Natural Language Processing

  1. CSA3180: Natural Language Processing. Information Extraction 2: Named Entities, Question Answering, Anaphora Resolution, Co-Reference

  2. Introduction • Slides partially based on a talk by Lucian Vlad Lita • Sheffield GATE Multilingual Extraction slides based on Diana Maynard's talks • Anaphora resolution slides based on Dan Cristea's slides, with additional input from Gabriela-Eugenia Dima, Oana Postolache and Georgiana Puşcaşu

  3. References • Fastus System Documentation • Robert Gaizauskas, "IE Perspective on Text Mining" • Daniel Bikel's "Nymble: A High Performance Learning Name Finder" • Helena Ahonen-Myka's notes on FSTs • Javelin system documentation • MUC-7 Overview & Results

  4. Named Entities • Person name: Colin Powell, Frodo • Location name: Middle East, Aiur • Organization: UN, DARPA • Domain-specific vs. open-domain extraction
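
As a minimal, open-domain illustration of the entity types listed above (not part of the original slides), the sketch below tags the example names with spaCy's pretrained English pipeline; it assumes spaCy and its en_core_web_sm model are installed, and any off-the-shelf NE tagger would serve equally well.

```python
import spacy

# Load spaCy's small pretrained English pipeline (assumes it is installed:
# pip install spacy && python -m spacy download en_core_web_sm).
nlp = spacy.load("en_core_web_sm")
doc = nlp("Colin Powell asked the UN and DARPA to report on the Middle East.")

for ent in doc.ents:
    # ent.label_ is one of spaCy's entity types, e.g. PERSON, ORG, GPE, LOC
    print(ent.text, ent.label_)
```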

  5. Anaphora Resolution workflow (diagram): unprocessed text is fed to an AR engine, producing AR-annotated text; an annotation tool is used to produce an AR gold standard; the two annotations are compared for evaluation, and the results drive fine-tuning of the engine.
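
The comparison & evaluation step of this workflow can be pictured as a small sketch (my illustration, with invented anaphor and antecedent ids): the engine's anaphor-to-antecedent choices are scored against the gold standard, and the resulting accuracy drives fine-tuning.

```python
# Toy sketch of the comparison & evaluation step: both the AR engine output
# and the gold standard map each anaphor id to the chosen antecedent id.
# The ids used here are invented for illustration.
def accuracy(engine: dict, gold: dict) -> float:
    correct = sum(1 for anaphor, antecedent in gold.items()
                  if engine.get(anaphor) == antecedent)
    return correct / len(gold) if gold else 0.0

gold_standard = {"he_1": "John_1", "it_2": "bicycle_1"}
engine_output = {"he_1": "John_1", "it_2": "Bill_1"}
print(accuracy(engine_output, gold_standard))   # 0.5
```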

  6. Anaphora Resolution • Text: • Nature of discourse • Anaphoric phenomena • Anaphora Resolution Engines: • Models • General AR frameworks • Knowledge sources

  7. Anaphora Resolution Anaphora represents the relation between a "proform" (called an "anaphor") and another term (called an "antecedent"), when the interpretation of the anaphor is in a certain way determined by the interpretation of the antecedent. Barbara Lust, Introduction to Studies in the Acquisition of Anaphora, D. Reidel, 1986

  8. Anaphora Example It was a bright cold day in April, and the clocks were striking thirteen. Winston Smith, his chin nuzzled into his breast in an effort to escape the vile wind, slipped quickly through the glass doors of Victory Mansions, though not quickly enough to prevent a swirl of gritty dust from entering along with him. (Orwell, 1984; the anaphors "his" and "him" point back to the antecedent "Winston Smith")

  9. Anaphora • pronouns (personal, demonstrative, ...) • full pronouns • clitics (RO: dă-mi-l, IT: dammelo) • nouns • definite • indefinite • adjectives, numerals (generally associated with an ellipsis) • In this the play is expressionist₁ in its approach to theme. • But it is also so₁ in its use of unfamiliar devices...

  10. Referential Expressions • mark the noun phrases • for each NP, ask a question about it • keep as REs those NPs that can be naturally referenced in the question The policeman got in the car in a hurry in order to catch the run-away thief.

  11. Referential Expressions a. John was going down the street looking for Bill's house. b. He found it at the first corner.

  12. Referential Expressions a. John was going down the street looking for Bill's house. b. He met him at the first corner.

  13. Referential Expressions The empty anaphor Gianni diede una mela a Michele. Più tardi, Ø gli diede un'arancia. [Not & Zancanara, 1996] John gave an apple to Michelle. Later on, Ø gave her an orange.

  14. Textual Ellipsis The functional (bridge) anaphora The state of the accumulator is indicated to the user. 30 minutes before the complete discharge, the computer signals for 5 seconds. [Strube & Hahn, 1996]

  15. Events, States, Descriptions He left without eating₁. Because of this₁, he was starving in the evening. But, he adds, Priestley is more interested in Johnson living than in Johnson dead₁. In this₁ the play is expressionist in its approach to theme. [Halliday & Hasan, 1976]

  16. Definite/Indefinite NPs Once upon a time, there was a king and a queen. And the king one day went hunting. Apollo took out his bow... Take the elevator to the 4th floor.

  17. Anaphora Resolution • State of the art in Anaphora Resolution: • Identity: 65-80% • Other: much less…

  18. What is so difficult? Nothing – everything is so simple! John₁ has just arrived. He₁ seems tired. The girl₁ leaves the trash on the table and wants to go away. The boy₂ tries to hold her₁ by the arm₃; she₁ escapes and runs; he₂ calls her₁ back. Caragiale, At the Mansion

  19. What is so difficult? Nothing indeed, but imagine letting the machine go wrong... There's a pile of inflammable trash next to your car. You'll have to get rid of it. If the baby does not thrive on the raw milk, boil it. [Hobbs, 1997]

  20. What is so difficult? Semantic restrictions Jeff₁ helped Dick₂ wash the car. He₁ washed the windows as Dick₂ waxed the car. He₁ soaped a pane. Jeff₁ helped Dick₂ wash the car. He₁ washed the windows as Dick₂ waxed the car. He₂ buffed the hood. [Walker, Joshi & Prince, 1997]

  21. What is so difficult? Semantic correlates An elephant₁ hit the car with the trunk. The animal₁ had to be taken away so as not to cause further damage. * An animal₁ hit the car with the trunk. The elephant₁ had to be taken away so as not to cause further damage.

  22. What is so difficult? Long distance recovery (pronominalization) • His re-entry into Hollywood came with the movie "Brainstorm", • but its completion and release has been delayed by the death of co-star Natalie Wood. • He plays Hugh Hefner of Playboy magazine in Bob Fosse's "Star 80." • It's about Dorothy Stratton, the Playboy Playmate who was killed by her husband. • He also stars in the movie "Class." Los Angeles Times, July 18, 1983, cited in [Fox, 1986]

  23. What is so difficult? Gender mismatches Mr. Chairman..., what is her position upon this issue? (political correctness!) Number mismatches The government discussed ... They ...

  24. What is so difficult? Distributed antecedents John₁ invited Mary₂ to the cinema. After the movie ended, they₃ = {1,2} went to a restaurant.

  25. What is so difficult? Empty/non-empty anaphors John gave an apple to Michelle. Later on, Ø gave her an orange. John gave an apple to Michelle. Later on, he gave her an orange. John gave an apple to Michelle. Later on, this one asks him for an orange.

  26. Semantics are Essential Police ... They Teacher ... She/He A car ... The automobile A Mercedes ... The car A lamp ... The bulb

  27. Semantics are not all • Pronouns have poor semantic features: he [+animate, +male, +singular], she [+animate, +female, +singular], it [+inanimate, +singular], they [+plural] • Gender in Romance languages (a gender mismatch with English): Ro. maşină = ea (feminine), Ro. automobil = el (masculine) • Anaphora resolution by concord rules (a gender match): Un camion a heurté une voiture. Celle-ci a été complètement détruite. (A truck hit a car. It was completely destroyed.)
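
A concord rule of this kind can be sketched as a simple agreement check; the feature names and values below are illustrative assumptions rather than the features of any particular system.

```python
# Hypothetical sketch of a concord (agreement) filter: a candidate antecedent
# survives only if its features are compatible with those of the pronoun.
# Feature names and values are illustrative, not from any particular system.
PRONOUN_FEATURES = {
    "he":   {"animate": True,  "gender": "masc", "number": "sg"},
    "she":  {"animate": True,  "gender": "fem",  "number": "sg"},
    "it":   {"animate": False, "gender": None,   "number": "sg"},
    "they": {"animate": None,  "gender": None,   "number": "pl"},
}

def agrees(pronoun: str, antecedent: dict) -> bool:
    """True if every feature the pronoun specifies matches the antecedent."""
    for feat, value in PRONOUN_FEATURES[pronoun].items():
        if value is None:                  # pronoun is underspecified here
            continue
        if antecedent.get(feat) not in (None, value):
            return False
    return True

print(agrees("she", {"animate": True, "gender": "fem", "number": "sg"}))  # True
print(agrees("he",  {"animate": True, "gender": "fem", "number": "sg"}))  # False
```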

  28. Anaphora Resolution [Charniak, 1972] In order to do AR, one has to be able to do everything else. Once everything else is done, AR comes for free.

  29. Anaphora Resolution Most current anaphora resolution systems implement a pipeline architecture with three modules (see the sketch below): • Collect: determines the List of Potential Antecedents (LPA): a1, a2, a3, … an • Filter: eliminates from the LPA the candidates that are incompatible with the referential expression under scrutiny. • Preference: determines the most likely antecedent on the basis of an ordering policy.
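
A schematic sketch of this Collect/Filter/Preference pipeline follows; the data structures and the deliberately simple compatibility and preference tests are my own illustrative assumptions.

```python
# Schematic sketch of the three-module AR pipeline (Collect -> Filter ->
# Preference). Candidates are plain dicts; the compatibility check and the
# ordering policy are simplistic stand-ins for illustration only.
def collect(sentences, window=2):
    """Collect: the List of Potential Antecedents (LPA) = all NPs in the
    current and the preceding `window` sentences."""
    return [np for sent in sentences[-(window + 1):] for np in sent]

def filter_lpa(anaphor, lpa):
    """Filter: drop candidates whose number/gender clash with the anaphor."""
    return [c for c in lpa
            if c["number"] == anaphor["number"]
            and anaphor["gender"] in (None, c["gender"])]

def prefer(candidates):
    """Preference: here simply the most recent compatible candidate."""
    return candidates[-1] if candidates else None

# Toy discourse: each sentence is a list of NP feature dicts, oldest first.
sentences = [
    [{"text": "John", "number": "sg", "gender": "masc"}],
    [{"text": "Mary", "number": "sg", "gender": "fem"},
     {"text": "the car", "number": "sg", "gender": None}],
]
he = {"text": "he", "number": "sg", "gender": "masc"}
print(prefer(filter_lpa(he, collect(sentences))))   # -> the 'John' candidate
```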

  30. Anaphora Resolution Models • [Hobbs, 1976] (pronominal anaphora) Naïve algorithm: • requires a surface parse tree • navigates the syntactic tree of the anaphor's sentence and those of the preceding sentences, in order of recency, traversing each tree in a left-to-right, breadth-first manner A semantic approach: • requires a semantic representation of the sentences (a logical expression) • a collection of semantic operations (inferences) • the type of pronoun is important
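
The search order of the naïve algorithm can be approximated with a breadth-first, left-to-right walk over parse trees taken in order of recency; the sketch below (using nltk's Tree class and omitting Hobbs' path constraints) only illustrates that traversal, not the full algorithm.

```python
# Simplified sketch of the search order in Hobbs' naive algorithm: walk the
# parse tree of the anaphor's sentence and then the preceding sentences in
# order of recency, each left-to-right and breadth-first, proposing NP nodes.
# This omits Hobbs' path constraints; it only illustrates the traversal.
from collections import deque
from nltk import Tree

def np_candidates(trees):
    """`trees` = parse trees, most recent first."""
    for tree in trees:
        queue = deque([tree])                 # breadth-first, left-to-right
        while queue:
            node = queue.popleft()
            if isinstance(node, Tree):
                if node.label() == "NP":
                    yield " ".join(node.leaves())
                queue.extend(node)            # children, in left-to-right order

current = Tree.fromstring("(S (NP (PRP He)) (VP (VBZ seems) (JJ tired)))")
previous = Tree.fromstring("(S (NP (NNP John)) (VP (VBZ has) (VP (VBN arrived))))")
print(list(np_candidates([current, previous])))   # ['He', 'John']
```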

  31. Anaphora Resolution Models • [Lappin & Leass, 1994] (pronominal anaphora) • syntactic structures • intrasentential syntactic filtering • morphological filter (person, number, gender) • detection of pleonastic pronouns • salience parameters (grammatical role, parallelism of grammatical roles, frequency of mention, proximity, sentence recency)
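
In the spirit of this salience-based ranking, the sketch below sums factor weights for a candidate and halves the total for each intervening sentence; the numeric values are assumptions for illustration, not the published weights.

```python
# Sketch in the spirit of Lappin & Leass's salience weighting: each candidate
# accumulates weights for the factors it satisfies, and the total is halved
# for every sentence boundary between candidate and pronoun. The numbers
# here are illustrative, not the published values.
SALIENCE_WEIGHTS = {
    "sentence_recency": 100,
    "subject": 80,
    "direct_object": 50,
    "indirect_object": 40,
    "head_noun": 80,
}

def salience(candidate_factors, sentence_distance):
    base = sum(SALIENCE_WEIGHTS[f] for f in candidate_factors)
    return base / (2 ** sentence_distance)    # degrade with distance

# A subject NP one sentence back vs. an object NP in the current sentence.
print(salience({"sentence_recency", "subject", "head_noun"}, 1))        # 130.0
print(salience({"sentence_recency", "direct_object", "head_noun"}, 0))  # 230
```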

  32. Anaphora Resolution Models • [Sidner, 1981], [Grosz & Sidner, 1986] • focus/attention-based • give more salience to those semantic entities that are in focus • define where to look for an antecedent in the semantic structure of the preceding text (a stack in G&S's model)

  33. AR Models: Centering • [Grosz, Joshi, Weinstein, 1983, 1995] • [Brennan, Friedman and Pollard, 1987] • Cf(u) = <e1, e2, ... ek> - the ordered list of forward-looking centers of utterance u • Cb(u) = ei - the backward-looking center • Cp(u) = e1 - the preferred center • Transition preference: CON > RET > SSH > ASH • Transitions: CON (Continue): Cb(u) = Cb(u-1) and Cb(u) = Cp(u); RET (Retain): Cb(u) = Cb(u-1) and Cb(u) ≠ Cp(u); SSH (Smooth Shift): Cb(u) ≠ Cb(u-1) and Cb(u) = Cp(u); ASH (Abrupt Shift): Cb(u) ≠ Cb(u-1) and Cb(u) ≠ Cp(u)
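
The transition table above maps directly onto a small classification function; the sketch below is a straightforward rendering of those four conditions, reusing the Jeff/Carl example from the next slides.

```python
# Classification of centering transitions from the table above
# (Brennan, Friedman & Pollard, 1987); Cb/Cp are discourse entities.
def transition(cb_current, cb_previous, cp_current):
    if cb_current == cb_previous:
        return "CONTINUE" if cb_current == cp_current else "RETAIN"
    return "SMOOTH-SHIFT" if cb_current == cp_current else "ABRUPT-SHIFT"

# Preference order used to rank competing interpretations of an anaphor.
ORDER = ["CONTINUE", "RETAIN", "SMOOTH-SHIFT", "ABRUPT-SHIFT"]

# Example from the slides: utterance (c) with he = Jeff vs. he = Carl.
print(transition(cb_current="Jeff", cb_previous="Jeff", cp_current="I"))  # RETAIN
print(transition(cb_current="Carl", cb_previous="Jeff", cp_current="I"))  # ABRUPT-SHIFT
```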

  34. AR Models: Centering a. I haven't seen Jeff for several days. b. Carl thinks he's studying for his exams. c. I think he? went to the Cape with Linda. [Grosz, Joshi & Weinstein, 1983] For (a): Cf = (I=[I], [Jeff]), Cb = [I]. For (b): Cf = ([Carl], he=[Jeff], [Jeff's exams]), Cb = [Jeff].

  35. AR Models: Centering b. Carl thinks he's studying for his exams. c. I think he? went to the Cape with Linda. For (b): Cf = ([Carl], he=[Jeff], [Jeff's exams]), Cb = [Jeff]. For (c), if he = Jeff: Cf = (I=[I], he=[Jeff], [the Cape], [Linda]), Cb = [Jeff], a RETAINING transition; if he = Carl: Cf = (I=[I], he=[Carl], [the Cape], [Linda]), Cb = [Carl], an ABRUPT SHIFT. Centering prefers the retain transition, so he is resolved to Jeff.

  36. Anaphora Resolution Models • [Mitkov, 1998] • knowledge-poor approach • POS tagger, noun phrase rules • search space: the 2 previous sentences • antecedent indicators: definiteness, givenness, lexical reiteration, section heading preference, distance, terms of the field, etc.
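
A knowledge-poor scorer of this kind can be sketched as a sum of boosting and impeding indicators over candidates in the last few sentences; the indicator names follow the slide, while the scores and the example NPs are illustrative assumptions.

```python
# Sketch of Mitkov-style knowledge-poor scoring: each candidate NP in the
# current and two previous sentences collects boosting/impeding scores from
# antecedent indicators; the highest-scoring candidate wins.
# Indicator names follow the slide; the numeric scores are illustrative.
INDICATORS = {
    "definiteness": 1,          # definite NPs are preferred
    "givenness": 1,             # first NP of a sentence (given information)
    "lexical_reiteration": 2,   # NP repeated in the text
    "section_heading": 1,       # NP also occurring in the section heading
    "term_of_the_field": 1,     # domain-term preference
    "prepositional_np": -1,     # NPs inside PPs are penalised
}

def score(candidate_indicators, sentence_distance):
    s = sum(INDICATORS[i] for i in candidate_indicators)
    return s - sentence_distance          # simple distance penalty

def best_candidate(candidates):
    """candidates: list of (text, indicator set, sentence distance) triples."""
    return max(candidates, key=lambda c: score(c[1], c[2]))[0]

print(best_candidate([
    ("the printer", {"definiteness", "lexical_reiteration", "term_of_the_field"}, 1),
    ("a page",      {"prepositional_np"}, 0),
]))   # -> 'the printer'
```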

  37. General Framework Build a framework capable of easily accommodating any of the existing AR models, of fine-tuning them, and of experimenting with them to enhance performance (learning), eventually obtaining a better model.

  38. General Framework (diagram): the input text is processed by an AR engine that can be configured with any of several AR models (AR-model1, AR-model2, AR-model3).

  39. Co-References • Halliday and Hasan: co-reference is a semantic relation, not a textual one • (Diagram: two expressions a and b on the text layer stand in a co-referential anaphoric relation; both evoke the same center_a on the semantic layer.)

  40. Time and Discourse • Discourse has a dynamic nature • (Diagram: three time axes (real time, discourse time, story time) on which the same events may be ordered differently.)

  41. Resolution Moment Police officer David Cheshire went to Dillard's home. Putting his ear next to Dillard's head, Cheshire heard the music also. [Tanaka, 1999] (Diagram: at the moment "his" is read, both Dillard and Cheshire are available as candidate antecedents.)

  42. Resolution Delay • Sanford and Garrod (1989) • initiation point • completion point • information is kept in a temporary location of memory

  43. Cataphora – What is there? • The element referred to is anticipated by the referring element • Theories: • scepticism • syntactic reality From the corner of the divan of Persian saddle-bags on which he was lying, smoking, as was his custom, innumerable cigarettes, Lord Henry Wotton could just catch the gleam of the honey-sweet and honey-coloured blossoms of a laburnum… Oscar Wilde, The Picture of Dorian Gray

  44. I taught Gabriel to read. = Ro. L-am învățat pe Gabriel să citească. • No rightward reference is needed in discourse processing: • introduction of an empty discourse entity • addition of new features as the discourse unfolds • pronoun anticipation in Romanian

  45. Unique directionality in interpretation (Diagram: in anaphora, "John … he", the pronoun's restriction structure (gender = masc, number = sg) is completed from the antecedent's full structure (gender = masc, number = sg, sem = person, name = John); in cataphora, "he … John", the pronoun first evokes an underspecified structure (gender = masc, number = sg, sem = ?) that is filled in only when "John" is read.)

  46. Automatic Interpretation • Necessity for an intermediate level • (Diagram: a referential expression a on the text layer projects a feature structure fs_a on the restriction layer, which evokes center_a on the semantic layer.)

  47. Three-Layer Approach to AR 1. John sold his bicycle 2. although Bill would have wanted it. (Diagram: on the text layer, "his bicycle" projects the restriction structure (no = sg, sem = bicycle, det = yes) and "it" projects (no = sg, sem = ¬human); both evoke the same center on the semantic layer (no = sg, sem = bicycle, det = yes).)
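
The projection/evoking mechanism of the three layers can be loosely illustrated as feature-structure unification; the sketch below is my own approximation, not the original implementation, showing how "it" ends up evoking the center already introduced by "his bicycle".

```python
# Loose illustration of the three-layer idea: referential expressions project
# small feature structures, and an anaphor evokes an existing center if its
# structure unifies with it. Not the original implementation, just a sketch.
def unify(fs1, fs2):
    """Merge two feature structures; None means the feature is unconstrained.
    Returns None if any shared feature clashes."""
    merged = dict(fs1)
    for feat, val in fs2.items():
        if merged.get(feat) is None:
            merged[feat] = val
        elif val is not None and merged[feat] != val:
            return None                    # clash -> no unification
    return merged

centers = []
# "his bicycle" projects its structure and evokes a new center:
centers.append({"no": "sg", "sem": "bicycle", "det": True})
# "it" projects an underspecified structure ...
it_fs = {"no": "sg", "sem": None, "human": False}
# ... and evokes the first existing center it unifies with:
resolved = next((c for c in centers if unify(it_fs, c)), None)
print(resolved)   # -> the 'his bicycle' center
```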

  48. Delayed Interpretation Police officer David Cheshire went to Dillard's home. Putting his ear next to Dillard's head, Cheshire heard the music also. (Diagram: across time points t0..t3, when "his" is read its restriction structure fs_his holds candidates = {fs_Dillard, fs_Cheshire}; the commitment to either the Dillard or the Cheshire center on the semantic layer is delayed until later material disambiguates.)

  49. Delayed Interpretation From the corner of the divan of Persian saddle-bags on which he was lying, smoking, as was his custom, innumerable cigarettes, Lord Henry Wotton could just catch the gleam of the honey-sweet and honey-coloured blossoms of a laburnum… (Diagram: at t0 "he" projects an underspecified restriction structure (gender = masc, number = sing, sem = person) and initiates the evoking of a semantic center; at t1 "his" adds to it; at t2 "Lord Henry Wotton" projects the full structure (gender = masc, number = sing, sem = person, name = Lord Henry Wotton) and completes the evoking.)

  50. The Case of Cataphora 1. Although Bill would have wanted it, 2. John sold his bicycle to somebody else. (Diagram: "it" projects (no = sg, sem = ¬human) and evokes an underspecified center (no = sg, sem = ¬human); when "his bicycle" is read, it projects (no = sg, sem = bicycle, det = yes) and completes the same center.)
