
807 - TEXT ANALYTICS

Massimo Poesio. Lecture 9: Relation extraction.


Presentation Transcript


  1. 807 - TEXT ANALYTICS • Massimo Poesio • Lecture 9: Relation extraction

  2. OTHER ASPECTS OF SEMANTIC INTERPRETATION • Identification of RELATIONS between entities mentioned • Focus of interest in modern CL since 1993 or so • Identification of TEMPORAL RELATIONS • From about 2003 on • QUALIFICATION of such relations (modality, epistemicity) • From about 2010 on

  3. TYPES OF RELATIONS • Predicate-argument structure (verbs and nouns) • John kicked the ball • Nominal relations • The red ball • Relations between events / temporal relations • John kicked the ball and scored a goal • Domain-dependent relations (MUC/ACE) • John works for IBM

  4. TYPES OF RELATIONS • Predicate-argument structure (verbs and nouns) • John kicked the ball • Nominal relations • The red ball • Relations between events / temporal relations • John kicked the ball and scored a goal • Domain-dependent relations (MUC/ACE) • John works for IBM

  5. PREDICATE/ARGUMENT STRUCTURE • Powell met Zhu Rongji • Powell and Zhu Rongji met • Powell met with Zhu Rongji • Powell and Zhu Rongji had a meeting • Proposition: meet(Powell, Zhu Rongji) • meet(Somebody1, Somebody2) • Similar verbs: battle, wrestle, join, debate, consult • When Powell met Zhu Rongji on Thursday they discussed the return of the spy plane. • meet(Powell, Zhu) • discuss([Powell, Zhu], return(X, plane))

  6. PREDICATE-ARGUMENT STRUCTURE • Linguistic Theories • Case Frames – Fillmore → FrameNet • Lexical Conceptual Structure – Jackendoff → LCS • Proto-Roles – Dowty → PropBank • English verb classes (diathesis alternations) – Levin → VerbNet • Talmy, Levin and Rappaport

  7. PROPBANK REPRESENTATION • a GM-Jaguar pact that would give the U.S. car maker an eventual 30% stake in the British company. • give(GM-J pact, US car maker, 30% stake) • Arg0: *T*-1 (the trace of "a GM-Jaguar pact") • Predicate: that would give • Arg2: the US car maker • Arg1: an eventual 30% stake in the British company

  8. ARGUMENTS IN PROPBANK • Arg0 = agent • Arg1 = direct object / theme / patient • Arg2 = indirect object / benefactive / instrument / attribute / end state • Arg3 = start point / benefactive / instrument / attribute • Arg4 = end point • Per word vs frame level – more general?
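
The numbered arguments above can be held in a very small data structure. The following is only a sketch: the `Proposition` class and its `role` method are hypothetical names introduced for illustration, not part of any PropBank tooling, and the fillers are taken from the "give" example on the previous slide.

```python
from dataclasses import dataclass, field


@dataclass
class Proposition:
    """One predicate with its numbered PropBank-style arguments (hypothetical class)."""
    predicate: str
    args: dict = field(default_factory=dict)  # e.g. {"Arg0": "..."}

    def role(self, label):
        """Return the filler of a numbered argument, or None if it is absent."""
        return self.args.get(label)


# The "give" example: Arg0 = giver, Arg1 = thing given, Arg2 = recipient.
give = Proposition(
    predicate="give",
    args={
        "Arg0": "a GM-Jaguar pact",
        "Arg1": "an eventual 30% stake in the British company",
        "Arg2": "the U.S. car maker",
    },
)

for label in sorted(give.args):
    print(label, "=", give.role(label))
```

Keeping the labels as dictionary keys makes it easy to compare propositions role by role, whatever surface order the arguments had in the sentence.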

  9. FROM PREDICATES TO FRAMES In one of its senses, the verb observe evokes a frame called Compliance: this frame concerns people’s responses to norms, rules or practices. The following sentences illustrate the use of the verb in the intended sense: • Our family observes the Jewish dietary laws. • You have to observe the rules or you’ll be penalized. • How do you observe Easter? • Please observe the illuminated signs.

  10. FrameNet FrameNet records information about English words in the general vocabulary in terms of • the frames (e.g. Compliance) that they evoke, • the frame elements (semantic roles) that make up the components of the frames (in Compliance, Norm is one such frame element), and • each word’s valence possibilities, the ways in which information about the frames is provided in the linguistic structures connected to them (with observe, Norm is typically the direct object). theta

  11. NOMINAL RELATIONS

  12. THE MUC AND ACE TASKS • Modern research in relation extraction was also kicked off by the Message Understanding Conference (MUC) campaigns, and continued through the Automatic Content Extraction (ACE) and Machine Reading follow-ups • MUC: NE, coreference, TEMPLATE FILLING • ACE: NE, coreference, relations

  13. TEMPLATE-FILLING

  14. EXAMPLE MUC: JOB POSTING

  15. THE ASSOCIATED TEMPLATE

  16. AUTOMATIC CONTENT EXTRACTION (ACE)

  17. ACE: THE DATA

  18. ACE: THE TASKS

  19. RELATION DETECTION AND RECOGNITION

  20. ACE: RELATION TYPES

  21. OTHER PRACTICAL VERSIONS OF RELATION EXTRACTION • Biomedical domain (BIONLP, BioCreative) • Chemistry • Cultural Heritage

  22. HISTORY OF RELATION EXTRACTION • Before 1993: Symbolic methods (using knowledge bases) • Since then: statistical / heuristic based methods • From 1995 to around 2005: mostly SUPERVISED • More recently: also quite a lot of UNSUPERVISED / SEMI SUPERVISED techniques

  23. SUPERVISED RE: RE AS A CLASSIFICATION TASK • Binary relations • Entities already manually/automatically recognized • Examples are generated for all sentences with at least 2 entities • Number of examples generated per sentence is C(N, 2) = N(N-1)/2: combinations of N distinct entities selected 2 at a time
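
The candidate generation step described above can be sketched in a few lines. This is an illustration only: the function name `candidate_pairs` and the example entity list are assumptions, and entity recognition is taken as already done.

```python
from itertools import combinations
from math import comb


def candidate_pairs(entities):
    """Return every unordered pair of distinct entity mentions in one sentence."""
    return list(combinations(entities, 2))


# Hypothetical sentence with three already-recognized entity mentions.
entities = ["John", "IBM", "New York"]
pairs = candidate_pairs(entities)

for pair in pairs:
    print(pair)

# The number of candidates is N choose 2.
assert len(pairs) == comb(len(entities), 2)
```

Each pair then becomes one example for the classifier, which decides whether a relation holds between the two entities.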

  24. GENERATING CANDIDATES TO CLASSIFY

  25. RE AS A BINARY CLASSIFICATION TASK

  26. NUMBER OF CANDIDATES TO CLASSIFY – SIMPLE MINDED VERSION

  27. THE SUPERVISED APPROACH TO RE • Most current approaches to RE are kernel-based • Different information is used • Sequences of words, e.g., through the GLOBAL CONTEXT / LOCAL CONTEXT kernels of Bunescu and Mooney / Giuliano, Lavelli & Romano • Syntactic information through the TREE KERNELS of Zelenko et al / Moschitti et al • Semantic information in recent work

  28. KERNEL METHODS: A REMINDER • Embedding the input data in a feature space • Using a linear algorithm for discovering non-linear patterns • Coordinates of images are not needed, only pairwise inner products • Pairwise inner products can be efficiently computed directly from X using a kernel function K: X × X → R
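
A small numerical check may make the last two points concrete. The sketch below uses the degree-2 polynomial kernel, chosen here purely as a familiar example (it is not one of the kernels discussed in the lecture); `phi`, `dot` and `poly2_kernel` are illustrative names.

```python
from itertools import product


def dot(u, v):
    """Ordinary inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def phi(x):
    """Explicit degree-2 feature map: all ordered products x_i * x_j."""
    return [a * b for a, b in product(x, repeat=2)]


def poly2_kernel(x, z):
    """K(x, z) = (x . z)^2, computed without building phi(x) or phi(z)."""
    return dot(x, z) ** 2


x = [1.0, 2.0, 3.0]
z = [0.5, -1.0, 2.0]

explicit = dot(phi(x), phi(z))   # inner product in the 9-dimensional feature space
implicit = poly2_kernel(x, z)    # same value, computed in the 3-dimensional input space
print(explicit, implicit)
```

The explicit map grows quadratically with the input dimension, while the kernel needs only one inner product in the original space.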

  29. THE WORD-SEQUENCE APPROACH • Shallow linguistic information: • tokenization • lemmatization • sentence splitting • PoS tagging • Claudio Giuliano, Alberto Lavelli, and Lorenza Romano (2007), FBK-IRST: Kernel methods for relation extraction, Proc. of SemEval-2007

  30. LINGUISTIC REALIZATION OF RELATIONS Bunescu & Mooney, NIPS 2005

  31. WORD-SEQUENCE KERNELS • Two families of “basic” kernels • Global Context • Local Context • Linear combination of kernels • Explicit computation • Extremely sparse input representation
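
The ideas on this slide (sparse feature vectors, explicit computation, a linear combination of basic kernels) can be sketched as follows. This is a deliberately simplified illustration and not the formulation of Giuliano, Lavelli and Romano: the "global" kernel here only counts words between the two entities, the "local" kernel only looks one token to either side of each entity, and all function names and example sentences are assumptions.

```python
from collections import Counter
from math import sqrt


def sparse_dot(u, v):
    """Inner product of two sparse vectors stored as Counters."""
    if len(u) > len(v):
        u, v = v, u
    return sum(count * v[key] for key, count in u.items())


def between_words(ex):
    """Bag of words strictly between the two entity mentions."""
    i, j = sorted((ex["e1"], ex["e2"]))
    return Counter(ex["tokens"][i + 1:j])


def window_words(ex, size=1):
    """Bag of (entity, offset, word) features in a window around each entity."""
    feats = Counter()
    for name in ("e1", "e2"):
        pos = ex[name]
        for off in range(-size, size + 1):
            if off != 0 and 0 <= pos + off < len(ex["tokens"]):
                feats[(name, off, ex["tokens"][pos + off])] += 1
    return feats


def global_kernel(a, b):
    return sparse_dot(between_words(a), between_words(b))


def local_kernel(a, b):
    return sparse_dot(window_words(a), window_words(b))


def normalized(kernel):
    """Wrap a kernel so that k(a, a) == 1 whenever k(a, a) is non-zero."""
    def k(a, b):
        denom = sqrt(kernel(a, a) * kernel(b, b))
        return kernel(a, b) / denom if denom else 0.0
    return k


def combined_kernel(a, b):
    """Unweighted sum of the two normalized basic kernels."""
    return normalized(global_kernel)(a, b) + normalized(local_kernel)(a, b)


# Hypothetical examples: token lists plus the positions of the two entities.
ex_a = {"tokens": ["John", "works", "for", "IBM"], "e1": 0, "e2": 3}
ex_b = {"tokens": ["Mary", "works", "for", "Google"], "e1": 0, "e2": 3}
ex_c = {"tokens": ["John", "met", "Mary"], "e1": 0, "e2": 2}

print(combined_kernel(ex_a, ex_b))  # identical contexts, different entities
print(combined_kernel(ex_a, ex_c))  # no shared context words
```

Because a sum of valid kernels is again a valid kernel, the combined function can be passed to any kernel-based learner that accepts a precomputed Gram matrix.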

  32. THE GLOBAL CONTEXT KERNEL

  33. THE GLOBAL CONTEXT KERNEL

  34. THE LOCAL CONTEXT KERNEL

  35. LOCAL CONTEXT KERNEL (2)

  36. KERNEL COMBINATION

  37. EXPERIMENTAL RESULTS • Biomedical data sets • AIMed • LLL • Newspaper articles • Roth and Yih • SEMEVAL 2007

  38. EVALUATION METHODOLOGIES

  39. EVALUATION (2)

  40. EVALUATION (3)

  41. EVALUATION (4)

  42. RESULTS ON AIMED

  43. NON-SUPERVISED METHODS FOR RELATION EXTRACTION • Unsupervised relation extraction: • Hearst • Other work on extracting hyponymy relations • Extracting other relations: Almuhareb and Poesio, Cimiano and Wenderoth • Semi-supervised methods • KNOW-IT-ALL

  44. HEARST 1992, 1998: USING PATTERNS TO EXTRACT ISA LINKS • Intuition: certain constructions typically used to express certain types of semantic relations • E.g., for ISA: • The seabass IS A fish • Swimming, running AND OTHER activities • Vehicles such as cars, trucks and bikes

  45. TEXT PATTERNS FOR HYPONYMY EXTRACTION • HEARST 1998: NP {, NP}* {,} or other NP • Example: bruises …… broken bones, and other INJURIES • HYPONYM(bruise, injury) • EVALUATION: 55.46% precision wrt WordNet
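
A pattern of this shape can be matched over a sentence that has already been split into noun-phrase chunks. The sketch below is an assumption-laden illustration, not Hearst's implementation: the chunked input format, the function name `hearst_and_other`, and the decision that the chunker leaves "other" outside the final NP are all hypothetical. The example sentence is the "and other" phrase from the previous slide.

```python
def hearst_and_other(chunks):
    """Apply 'NP {, NP}* {,} and/or other NP' to a chunked sentence.

    chunks is a list of (text, tag) pairs, with tag "NP" for noun phrases
    and "O" for everything else. Returns (hyponym, hypernym) pairs.
    """
    pairs = []
    for i in range(len(chunks) - 2):
        is_trigger = (
            chunks[i][0].lower() in ("and", "or")
            and chunks[i + 1][0].lower() == "other"
            and chunks[i + 2][1] == "NP"
        )
        if not is_trigger:
            continue
        hypernym = chunks[i + 2][0]
        found = []
        j = i - 1
        if j >= 0 and chunks[j][0] == ",":  # optional comma before and/or
            j -= 1
        # Walk left over the comma-separated list of NPs.
        while j >= 0 and chunks[j][1] == "NP":
            found.append((chunks[j][0], hypernym))
            j -= 1
            if j >= 0 and chunks[j][0] == ",":
                j -= 1
            else:
                break
        pairs.extend(reversed(found))  # restore left-to-right order
    return pairs


sentence = [
    ("Swimming", "NP"), (",", "O"), ("running", "NP"),
    ("and", "O"), ("other", "O"), ("activities", "NP"),
]
print(hearst_and_other(sentence))
```

Working on chunks rather than raw strings avoids pulling unrelated words into the hyponym, at the cost of depending on the quality of the chunker.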

  46. THE PRECISION / RECALL TRADEOFF • X and other Y: high precision, low recall • X isa Y: low precision, high recall
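
The tradeoff can be illustrated with a toy computation. Everything in this sketch is hypothetical: the gold pairs, the outputs attributed to a narrow and a broad pattern, and the function name `precision_recall` are made up for illustration and are not results from the lecture or from Hearst's work.

```python
def precision_recall(extracted, gold):
    """Precision and recall of a set of extracted pairs against a gold set."""
    extracted, gold = set(extracted), set(gold)
    correct = len(extracted & gold)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Toy gold standard of (hyponym, hypernym) pairs.
gold = {
    ("seabass", "fish"), ("car", "vehicle"),
    ("truck", "vehicle"), ("bike", "vehicle"),
}

# A narrow pattern: few matches, all of them correct.
narrow = {("car", "vehicle")}

# A broad pattern: finds every gold pair but also many wrong ones.
broad = gold | {
    ("john", "teacher"), ("problem", "money"),
    ("answer", "no"), ("result", "disaster"),
}

print("narrow:", precision_recall(narrow, gold))
print("broad: ", precision_recall(broad, gold))
```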

  47. HEARST'S REQUIREMENTS ON PATTERNS

  48. OTHER WORK ON EXTRACTING HYPONYMY • Caraballo ACL 1999 • Widdows & Dorow 2002 • Pantel & Ravichandran ACL 2004

  49. OTHER APPROACHES TO RE • Using syntactic information • Using lexical features

  50. Syntactic information for RE • Pros: • more structured information useful when dealing with long-distance relations • Cons: • not always robust • (and not available for all languages)
