Processing Semantic Relations Across Textual Genres

Bryan Rink

University of Texas at Dallas

December 13, 2013

Outline
  • Introduction
  • Supervised relation identification
  • Unsupervised relation discovery
  • Proposed work
  • Conclusions
Motivation
  • We think about our world in terms of:
    • Concepts (e.g., bank, afternoon, decision, nose)
    • Relations (e.g. Is-A, Part-Whole, Cause-Effect)
  • Powerful mental constructions for:
    • Representing knowledge about the world
    • Reasoning over that knowledge:
      • From Part-Whole(brain, Human) and Is-A(Socrates, Human)
      • We can reason that Part-Whole(brain, Socrates)
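The Part-Whole/Is-A example above can be sketched as a one-rule forward inference. This is a toy illustration only, not the RDF/OWL reasoners mentioned on the next slide:

```python
def infer_part_whole(facts):
    """Toy forward inference: from Part-Whole(p, C) and Is-A(x, C),
    derive Part-Whole(p, x)."""
    derived = set()
    for rel1, part, whole in facts:
        if rel1 != "Part-Whole":
            continue
        for rel2, instance, cls in facts:
            if rel2 == "Is-A" and cls == whole:
                derived.add(("Part-Whole", part, instance))
    return derived

facts = {("Part-Whole", "brain", "Human"), ("Is-A", "Socrates", "Human")}
# derives ("Part-Whole", "brain", "Socrates")
```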
Representation and Reasoning
  • Large general knowledge bases exist:
    • WordNet, Wikipedia/DBpedia/Yago, ConceptNet, OpenCyc
  • Some domain-specific knowledge bases exist:
    • Biomedical (UMLS)
    • Music (MusicBrainz)
    • Books (RAMEAU)
  • All of these are available in the standard RDF/OWL data model
  • Powerful reasoners exist for making inferences over data stored in RDF/OWL
  • Knowledge acquisition remains the most time-consuming and difficult of these steps
Relation Extraction from Text
  • Relations between concepts are encoded explicitly or implicitly in many textual resources:
    • Encyclopedias, news articles, emails, medical records, academic articles, web pages
  • For example:
    • “The report found Firestone made mistakes in the production of the tires.”

 → Product-Producer(tires, Firestone)

Outline
  • Introduction
  • Supervised relation identification
  • Unsupervised relation discovery
  • Proposed work
  • Conclusions
Supervised Relation Identification
  • SemEval-2010 Task 8 – “Multi-Way Classification of Semantic Relations Between Pairs of Nominals”
    • Given a sentence and two marked nominals
    • Determine the semantic relation and directionality of that relation between the nominals.
  • Example: A small piece of rock landed into the trunk
  • This contains an Entity-Destination(piece, trunk) relation:
    • The situation described in the sentence entails the fact that trunk is the destination of piece in the sense of piece moving (in a physical or abstract sense) toward trunk.
Observations
  • Three types of evidence useful for classifying relations:
  • Lexical/Contextual cues
    • “The seniors poured flour into wax paper and threw the items as projectiles on freshmen during a morning pep rally”
  • Knowledge of the typical role of one nominal
    • “The rootball was in a crate the size of a refrigerator, and some of the arms were over 12 feet tall.”
  • Knowledge of a pre-existing relation between the nominals
    • “The Ca content in the corn flour has also a strong dependence on the pericarp thickness.”
Approach
  • Use an SVM classifier to first determine the relation type
    • Each relation type then has its own SVM classifier to determine direction of the relation
  • All SVMs share the same set of 45 feature types, which fall into the following 8 categories:
    • Lexical/Contextual
    • Hypernyms from WordNet
    • Dependency parse
    • PropBank parse
    • FrameNet parse
    • Nominalization
    • Nominal similarity derived from Google N-Grams
    • TextRunner predicates
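The two-stage routing described above can be sketched structurally as follows. The stub scorers are placeholders for illustration only; the actual system trains SVMs over the 45 feature types:

```python
def classify(features, type_clf, direction_clfs):
    """Stage 1 picks the relation type; stage 2 routes to that type's
    own direction classifier. Stubs stand in for the trained SVMs."""
    rel_type = type_clf(features)
    direction = direction_clfs[rel_type](features)
    return rel_type, direction

# Hypothetical stub classifiers, for illustration only.
type_clf = lambda f: "Entity-Destination" if "into" in f["between"] else "Other"
direction_clfs = {
    "Entity-Destination": lambda f: "(e1,e2)",
    "Other": lambda f: "(e1,e2)",
}
```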
Lexical/Contextual Features
  • Words between the nominals are very important:
  • Number of tokens between the nominals is also helpful:
    • Product-Producer, Entity-Origin often have zero: “organ builder”, “Coconut oil”
  • Additional features for:
    • E1/E2 words, E1/E2 part of speech, Words before/after the nominals, Prefixes of words between
    • Sequence of word classes between the nominals:
      • Verb_Determiner, Preposition_Determiner, Preposition_Adjective_Adjective, etc.
Example Feature Values
  • Sentence: Forward [motion]E1 of the vehicle through the air caused a [suction]E2 on the road draft tube.
  • Feature values:
    • e1Word=motion, e2Word=suction
    • e1OrE2Word={motion, suction}
    • between={of, the, vehicle, through, the, air, caused, a}
    • posE1=NN, posE2=NN
    • posE1orE2=NN
    • posBetween=I_D_N_I_D_N_V_D
    • distance=8
    • wordsOutside={Forward, on}
    • prefix5Between={air, cause, a, of, the, vehic, throu, the}
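Several of these values can be recomputed directly. The function below is a hypothetical re-creation for illustration; it assumes single-token nominals that are neither sentence-initial nor sentence-final:

```python
def lexical_features(tokens, i1, i2):
    """Recompute a few of the lexical/contextual feature values from a
    token list and the indices of the two nominals."""
    between = tokens[i1 + 1:i2]
    return {
        "e1Word": tokens[i1],
        "e2Word": tokens[i2],
        "between": set(between),
        "distance": len(between),
        "wordsOutside": {tokens[i1 - 1], tokens[i2 + 1]},
        "prefix5Between": {w[:5] for w in between},
    }

tokens = ("Forward motion of the vehicle through the air "
          "caused a suction on the road draft tube").split()
feats = lexical_features(tokens, tokens.index("motion"), tokens.index("suction"))
```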
Parsing Features
  • Dependency Parse (Stanford parser)
    • Paths of length 1 from each nominal
    • Paths of length 2 between E1 and E2
  • PropBank SRL Parse (ASSERT)
    • Predicate associated with both nominals
      • Number of tokens in the predicate
      • Hypernyms of predicate
    • Argument types of nominals
  • FrameNet SRL Parse (LTH)
    • Lemmas of frame trigger words, with and without part of speech
  • Also make use of VerbNet to generalize verbs from dependency and PropBank parses
Example Feature Values
  • Sentence: Forward [motion]E1 of the vehicle through the air caused a [suction]E2 on the road draft tube.
  • Dependency
    • <E1>nsubjcauseddobj<E2>
    • <E1>nsubjvn:27dobj<E2>
      • VerbNet/Levin class 27 is the class of engender verbs such as: cause, spawn, generate, etc.
      • This feature value indicates that E1 is the subject of an engender verb, and the direct object is E2
  • PropBank
    • Hypernyms of the predicate: cause#v#1, create#v#1
Nominal Role Affiliation Features
  • Sometimes context is not enough and we must use background knowledge about the nominals
  • Consider the nominal: writer
    • Knowing that a writer is a person increases the likelihood that the nominal will act as a Producer or an Agency
      • Use WordNet hypernyms for the nominal’s sense determined by SenseLearner
    • Additionally, writer nominalizes the verb write, which is classified by Levin as a “Creation and Transformation” verb.
      • Most likely to act as a Producer
      • Use NomLex-Plus to determine the verb being nominalized and retrieve the Levin class from VerbNet
Google N-Grams for Nominal Role Affiliation
  • Semantically-similar nominals should participate in the same roles
    • They should also occur in similar contexts in a large corpus
  • Using Google 5-grams, the 1,000 most frequent words appearing in the context of a nominal are collected
  • Using Jaccard similarity on those context words, the 4 nearest neighbor nominals are determined, and used as a feature
    • Also, determine the role most frequently associated with those neighbors
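The neighbor computation can be sketched as below, with tiny toy context sets standing in for the top-1,000 context words collected from the Google 5-grams:

```python
def jaccard(a, b):
    """Jaccard similarity of two sets of context words."""
    return len(a & b) / len(a | b) if a | b else 0.0

def nearest_neighbors(target, contexts, k=4):
    """contexts: nominal -> set of its most frequent context words.
    Returns the k nominals with highest Jaccard overlap with the target."""
    others = [(n, jaccard(contexts[target], c))
              for n, c in contexts.items() if n != target]
    others.sort(key=lambda x: -x[1])
    return [n for n, _ in others[:k]]

# Toy data (illustrative, not the real Google N-gram contexts).
contexts = {
    "legion": {"of", "the", "foreign", "honor"},
    "army":   {"of", "the", "soldiers", "honor"},
    "pie":    {"apple", "slice"},
}
```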
Example Values for Google N-Grams Feature
  • Sentence 4739: As part of his wicked plan, Pete promotes Mickey and his pals into the [legion]E1 of [musketeers]E2 and assigns them to guard Minnie.
    • Member-Collection(E2 , E1)
  • E1 nearest neighbors: legion, army, heroes, soldiers, world
    • Most frequent role: Collection
  • E2 nearest neighbors: musketeers, admirals, sentries, swordsmen, larks
    • Most frequent role: Member
Pre-existing Relation Features
  • Sometimes the context gives few clues about the relation
    • Can use knowledge about a context-independent relation between the nominals
  • TextRunner
    • A queryable database of Noun-Verb-Noun triples from a large corpus of web text
    • Plug in E1 and E2 as the nouns and query for predicates that occur between them
Example Feature Values for TextRunner Features
  • Sentence: Forward [motion]E1 of the vehicle through the air caused a [suction]E2 on the road draft tube.
  • E1 ____ E2 : may result from, to contact, created, moves, applies, causes, fall below, corresponds to which
  • E2 ____ E1 : including, are moved under, will cause, according to, are effected by, repeats, can match
Learning Curve

[Learning-curve chart: F1 scores of 73.08, 77.02, 79.93, and 82.19 at increasing training sizes]
Ablation Tests
  • All 255 (= 2^8 − 1) combinations of the 8 feature sets were evaluated by 10-fold cross validation

Lexical is the single best feature set, Lexical+Hypernym is the best 2-feature set combination, etc.
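The enumeration itself is straightforward; a sketch, with feature-set names abbreviated from the 8 categories listed earlier:

```python
from itertools import combinations

feature_sets = ["Lexical", "Hypernym", "Dependency", "PropBank",
                "FrameNet", "Nominalization", "NGramSim", "TextRunner"]

# Every non-empty subset of the 8 feature sets: 2**8 - 1 = 255 ablation runs,
# each evaluated by 10-fold cross validation in the actual experiments.
ablation_runs = [subset
                 for r in range(1, len(feature_sets) + 1)
                 for subset in combinations(feature_sets, r)]
```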

Other Supervised Tasks

Causal relations between events – FLAIRS 2010

Causal Relations Between Events
  • Discovered graph patterns that were then used as features in a supervised classifier
  • Example pattern:
    • “Under the agreement”, “In the affidavits”, etc.
Resolving Coreference in Medical Records
  • i2b2 2011 and JAMIA 2012
  • Approach
    • Based on Stanford Multi-Pass Sieve method
    • Added supervised learning by introducing features to each pass
    • Showed that creating a first pass which identifies all the mentions of the patient provides a competitive baseline
Supervised Relations Conclusion
  • Identifying semantic relations requires going beyond contextual and lexical features
  • Use the fact that arguments sometimes have a high affinity for one of the semantic roles
  • Knowledge of pre-existing relations can aid classification when context is not enough
Outline
  • Introduction
  • Supervised relation identification
  • Unsupervised relation discovery
  • Proposed work
  • Conclusions
Relations in Electronic Medical Records
  • Medical records contain natural language narrative with very valuable information
    • Often in the form of a relation between medical treatments, tests, and problems
  • Example:
    • … with the [transfusion] and [IV Lasix] she did not go into [flash pulmonary edema]
    • Treatment-Improves-Problem relations:
      • (transfusion, flash pulmonary edema)
      • (IV Lasix, flash pulmonary edema)
Relations in Electronic Medical Records
  • Additional examples:
    • [Anemia] secondary to [blood loss].
      • A causal relationship between problems
    • On [exam], the patient looks well and lying down flat in her bed with no [acute distress].
      • Relationship between a medical test (“exam”) and what it revealed (“acute distress”).
      • We consider both positive and negative findings.
Relations in Electronic Medical Records
  • Utility
    • Detected relations can aid information retrieval
    • Automated systems which review patient records for unusual circumstances
      • Drugs prescribed despite previous allergy
      • Tests and treatments never performed despite recommendation
Relations in Electronic Medical Records
  • Unsupervised detection of relations
    • No need for large annotation efforts
    • Easily adaptable to new hospitals, doctors, medical domains
    • Does not require a pre-defined set of relation types
      • Discover relations actually present in the data, not what the annotator thinks is present
    • Relations can be informed by very large corpora
Unsupervised Relation Discovery
  • Assumptions:
    • Relations exist between entities in text
    • Those relations are often triggered by contextual words: trigger words
      • Secondary to, improved, revealed, caused
    • Entities in relations belong to a small set of semantic classes
      • Anemia, heart failure, edema: problems
      • Exam, CT scan, blood pressure: tests
    • Entities near each other in text are more likely to have a relation
Unsupervised Relation Discovery
  • Latent Dirichlet Allocation baseline
    • Assume entities have already been identified
    • Form pseudo-documents for every consecutive pair of entities:
      • Words from first entity
      • Words between the entities
      • Words from second entity
  • Example:
    • If she has evidence of [neuropathy] then we would consider a [nerve biopsy]
    • Pseudo-document: {neuropathy, then, we, would, consider, a, nerve, biopsy}
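The pseudo-document construction can be sketched as below. Entity spans are given here as token (start, end) offsets, an assumption of this sketch since the slides do not fix a representation:

```python
def pseudo_documents(tokens, entity_spans):
    """One pseudo-document per consecutive entity pair: words of the first
    entity, words between the entities, words of the second entity."""
    return [tokens[s1:e2]  # first entity + words between + second entity
            for (s1, _), (_, e2) in zip(entity_spans, entity_spans[1:])]

tokens = ("If she has evidence of neuropathy then we would "
          "consider a nerve biopsy").split()
docs = pseudo_documents(tokens, [(5, 6), (11, 13)])
# docs[0] == ['neuropathy', 'then', 'we', 'would', 'consider', 'a', 'nerve', 'biopsy']
```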
Unsupervised Relation Discovery
  • These pseudo-documents lead LDA to form clusters such as:
Unsupervised Relation Discovery
  • Clusters formed by LDA
    • Some good trigger words
    • Many stop words as well
    • No differentiation between:
      • Words in first argument
      • Words between the arguments
      • Words in second argument
  • Can do a better job
    • By better modeling the linguistic phenomenon
Relation Discovery Model (RDM)
  • Three observable variables:
    • w1 : Tokens from the first argument
    • wc : Context words (between the arguments)
    • w2 : Tokens from the second argument
  • Example:
    • Recent [chest x-ray] shows [resolving right lower lobe pneumonia].
    • w1: {chest, x-ray}
    • wc: {shows}
    • w2: {resolving, right, lower, lobe, pneumonia}
Relation Discovery Model (RDM)
  • In RDM:
    • A relation type (tr) is generated
    • Context words (wc) are generated from:
      • Relation type-specific word distribution (showed, secondary, etc.); or
      • General word distribution (she, patient, hospital)
    • Relation type-specific semantic classes for the arguments are generated
      • e.g. a problem-causes-problem relation would be unlikely to generate a test or a treatment class
    • Argument words (w1, w2) are generated from argument class-specific word distributions
      • “pneumonia”, “anemia”, “neuropathy” from a problem class
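The generative story can be sketched with uniform draws standing in for the learned Dirichlet-multinomial distributions; the toy vocabularies and the 50/50 trigger-vs-general mixing weight below are illustrative assumptions, not the model's learned parameters:

```python
import random

def generate_instance(rel_types, triggers, general, arg_classes, arg_words,
                      rng=None):
    """Sketch of the RDM generative story with uniform draws."""
    rng = rng or random.Random(0)
    tr = rng.choice(rel_types)                       # 1. relation type
    wc = [rng.choice(triggers[tr]) if rng.random() < 0.5 else rng.choice(general)
          for _ in range(2)]                         # 2. context words
    c1, c2 = rng.choice(arg_classes[tr])             # 3. per-type argument classes
    w1 = [rng.choice(arg_words[c1])]                 # 4. argument words
    w2 = [rng.choice(arg_words[c2])]
    return tr, wc, w1, w2

rel_types = ["Test-Reveals-Problem"]
triggers = {"Test-Reveals-Problem": ["showed", "revealed"]}
general = ["the", "patient"]
arg_classes = {"Test-Reveals-Problem": [("test", "problem")]}
arg_words = {"test": ["x-ray", "exam"], "problem": ["pneumonia", "anemia"]}
tr, wc, w1, w2 = generate_instance(rel_types, triggers, general, arg_classes, arg_words)
```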
Experimental Setup
  • Dataset
    • 349 medical records from 4 hospitals
    • Annotated with:
      • Entities: problems, treatments, tests
      • Relations: Used to evaluate our unsupervised approach
        • Treatment-Administered-For-Problem
        • Treatment-Causes-Problem
        • Treatment-Improves-Problem
        • Treatment-Worsens-Problem
        • Treatment-Not-Administered-Due-To-Problem
        • Test-Reveals-Problem
        • Test-Conducted-For-Problem
        • Problem-Indicates-Problem
Results
  • Trigger word clusters formed by the RDM:
Results
  • Instances of “connected problems”

Last example is actually a Treatment-Administered-For-Problem

Results
  • Instances of “Test showed”
Results
  • Instances of “prescription”
Results
  • Instances of “prescription 2”
Results
  • Discovered Argument Classes
Evaluation
  • Two versions of the data:
    • DS1: Consecutive pairs of entities which have a manually identified relation between them
    • DS2: All consecutive pairs of entities
  • Train/Test sets:
    • Train: 349 records, with 5,264 manually annotated relations
    • Test: 477 records, with 9,069 manually annotated relations
Evaluation
  • Evaluation metrics
    • NMI: Normalized Mutual Information
      • An information-theoretic measure of how well two clusterings match
    • F measure:
      • Computed based on the cluster precision and cluster recall
      • Each cluster is paired with the cluster which maximizes the score
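NMI can be implemented directly. The sketch below uses the sqrt-of-entropies normalization; the slides do not specify which normalization variant was used, so that choice is an assumption:

```python
from collections import Counter
from math import log, sqrt

def nmi(labels_a, labels_b):
    """Normalized mutual information between two clusterings of the same
    items: I(A;B) / sqrt(H(A) * H(B)); 1.0 for identical partitions."""
    n = len(labels_a)
    ca, cb = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mi = sum((c / n) * log((c / n) / ((ca[a] / n) * (cb[b] / n)))
             for (a, b), c in joint.items())

    def entropy(counts):
        return -sum((c / n) * log(c / n) for c in counts.values())

    denom = sqrt(entropy(ca) * entropy(cb))
    return mi / denom if denom else 1.0
```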
Evaluation

Results with 9 relation types, 15 general word classes, and 15 argument classes for RDM.

Unsupervised Relations Conclusion
  • Trigger words and argument classes are jointly modeled
  • RDM uses only entities and tokens
  • Relations are local to the context, rather than global
  • RDM outperforms several baselines
  • Discovered relations match well with manually chosen relations
  • Presented at EMNLP 2011
Additional Relation Tasks
  • Relational Similarity – SemEval 2012 Task 2
    • Define a relation through prototypes:
      • water:drop time:moment pie:slice
    • Decide which is most similar:
      • feet:inches country:city
  • Used a probabilistic approach to detect high precision patterns for the relations
  • Pattern precision was then used to rank word pairs occurring with that pattern
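That two-step idea can be sketched on toy data: compute each pattern's precision against the prototype pairs, then rank pairs by the best precision among their supporting patterns. The patterns and pairs below are illustrative, not from the task data:

```python
def pattern_precision(occurrences, prototype_pairs):
    """occurrences: pattern -> set of word pairs extracted with it.
    Precision = fraction of a pattern's pairs that are prototypes."""
    return {p: len(pairs & prototype_pairs) / len(pairs)
            for p, pairs in occurrences.items() if pairs}

def rank_pairs(occurrences, prototype_pairs):
    """Score each pair by the best precision among its supporting patterns."""
    precision = pattern_precision(occurrences, prototype_pairs)
    scores = {}
    for p, pairs in occurrences.items():
        for pair in pairs:
            scores[pair] = max(scores.get(pair, 0.0), precision[p])
    return sorted(scores, key=lambda pair: -scores[pair])

prototypes = {("water", "drop"), ("time", "moment")}
occ = {"X of Y": {("water", "drop"), ("time", "moment"), ("feet", "inches")},
       "X per Y": {("dollars", "euros")}}
```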
Relational Selectional Preferences
  • Submitted to IWCS 2013
  • Use LDA to induce latent semantic classes
Outline
  • Introduction
  • Supervised relation identification
  • Unsupervised relation discovery
  • Proposed work
  • Conclusions
Proposed work
  • Supervised vector representations
    • Initially: word representations
  • Most existing approaches create unsupervised word representations
    • Latent Semantic Analysis (Deerwester et al., 1990)
    • Latent Dirichlet Allocation (Blei et al., 2003)
    • Kernel Principal Components Analysis (Schölkopf et al., 1998)
  • More recent approaches allow for supervision
Existing supervised approaches
  • HDLR
    • “Structured Metric Learning for High Dimensional Problems”
    • Davis and Dhillon (KDD 2008)
  • S2Net
    • “Learning Discriminative Projections for Text Similarity Measures”
    • Yih, Toutanova, Platt, and Meek (CoNLL 2011)
    • Learns lower-dimensional representations of documents
    • Optimizes a cosine similarity metric in the lower-dimensional space for similar document retrieval
Supervised word representations
  • Relational selectional preferences:
    • Classify words according to their admissibility for filling the role of a relation:
    • report, article, thesis, poem are admissible for the Message role of a Message-Topic relation
    • Assume a (possibly very small) training set
Supervised word representations
  • Each word is represented by a high-dimensional context vector v over a large corpus
    • e.g., documents the word occurs in, other words it co-occurs with, or grammatical links
  • Learn a transformation matrix T which maps v to a much lower-dimensional vector w
    • subject to an objective that is maximized when words from the target set have high cosine similarity
    • Learning can be performed with L-BFGS optimization because the cosine similarity objective is twice differentiable
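The objective can be sketched with NumPy. The linear projection and the mean-pairwise-cosine objective below are illustrative assumptions about the proposal, not a fixed specification:

```python
import numpy as np

def mean_target_cosine(T, V_target):
    """Objective to maximize: average pairwise cosine similarity of the
    projected (w = T v) vectors of words filling the target role."""
    W = V_target @ T.T                      # project each row v to w
    W = W / np.linalg.norm(W, axis=1, keepdims=True)
    S = W @ W.T                             # pairwise cosine similarities
    n = len(W)
    return (S.sum() - n) / (n * (n - 1))    # mean over off-diagonal pairs
```

Identical context vectors project to identical w's and score 1.0; orthogonal ones score 0.0, so maximizing this pulls same-role words together in the low-dimensional space.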
Proposed application
  • Supervised word representations can be used for many supervised tasks which use words as features
    • Relation arguments
    • Contextual words
  • Not limited to words
    • arbitrary n-grams
    • syntactic features
  • We believe this approach could be useful for any high-dimensionality linguistic features (sparse features)
    • Benefit comes from both a larger corpus and the supervised learning of the representation
Additional evaluations
  • ACE 2004/2005 relation data
    • Relations between entities in newswire
      • e.g., Member-Of-Group – “an activist for Peace Now”
  • BioInfer 2007
    • Relations between biomedical concepts
      • e.g., locations, causality, part-whole, regulation
  • SemEval 2013 Task 4 and SemEval 2010 Task 9
    • Paraphrases for noun compounds
    • e.g., “flu virus” → “cause”, “spread”, “give”
Outline
  • Introduction
  • Supervised relation identification
  • Unsupervised relation discovery
  • Proposed work
  • Conclusions
Conclusions
  • State-of-the-art supervised relation extraction methods in both general-domain and medical texts
  • Identifying relations in text relies on more than just context
    • Semantic and background knowledge of arguments
    • Background knowledge about relations themselves
  • An unsupervised relation discovery model

Thank you!

Questions??