
Inductive Approaches to the Detection and Classification of Semantic Relation Mentions



  1. Inductive Approaches to the Detection and Classification of Semantic Relation Mentions Depth Report Examination Presentation Gabor Melli August 27, 2007 http://www.gabormelli.com/2007/2007_DepthReport_Melli_Presentation.ppt

  2. Overview • Introduction (~5 mins.) • Task Description (~5 mins.) • Predictive Features (~10 mins.) • Inductive Algorithms (~10 mins.) • Benchmark Tasks (~5 mins.) • Research Directions (~5 mins.)

  3. Simple examples of the “shallow” semantics sought • “E. coli is a bacteria.” • R_TypeOf(E. coli, bacteria) • “An organism has proteins.” • R_PartOf(proteins, organism) • “IBM is based in Armonk, NY.” • R_HeadquarterLocation(IBM, Armonk, NY)

  4. Motivations • Information Retrieval • Researchers could retrieve scientific papers based on relations • E.g. “all papers that report localization experiments on V. cholerae’s outer membrane proteins” • Judges could retrieve legal cases. • E.g. “all Supreme Court cases involving third party liability claims” • Information Fusion • Researchers could populate a database with the semantic relations found in research articles. • E.g. SubcellularLocalization(Organism, Protein, Location) • Activists could save resources when compiling statistics from newspaper reports. • Document Summarization, Question Answering, …

  5. State-of-the-Art • Current focus is to automatically induce predictive patterns/classifiers. • Can be more quickly applied to a new domain than an engineered solution. • Performance is approaching human levels of competency. • F-measure: • 76% on the ACE-2004 benchmark task (Zhou et al, 2007) • 75% on a protein/gene interaction task (Fundel et al, 2007) • 72% on the SemEval-2007 task (Beamer et al, 2007). • Though only under simplified conditions: • binary relations within a single sentence • perfectly classified entity mentions.

  6. Shallow semantic analysis is challenging • Many ways to say the same thing • O is based in L.; L-based O …; Headquartered in L, O …; From its L headquarters, O … • Many relations to disambiguate among.

  7. Next Section • Introduction • Task Description • Predictive Features • Inductive Algorithms • Benchmark Tasks • Research Directions

  8. Task Description • Documents, Tokens, Sentences • Entity Mentions: Detected and Classified • Semantic Relation Cases and Mentions • Performance Metrics • Comparison with Information Extraction Task • What name for the task? • General Pipelined Process • Subtask: Relation Case Generation • Subtask: Relation Case Labeling • Naïve Baseline Algorithms

  9. Documents, Tokens, Sentences

  10. Entity Mentions are pre-Detected (and pre-Classified)

  11. Semantic Relations • A relation with a fixed set of two or more arguments: Ri(Arg1, …, Arga) ⇒ {TRUE, FALSE} • Examples: • TypeOf(E.coli, Bacteria) ⇒ TRUE • OrgLocation(IBM, Jupiter) ⇒ FALSE • SCL(V.cholerae, TcpC, Extracellular) ⇒ TRUE

  12. Semantic Relation Cases • Some permutation of distinct entity mentions within the document. • D1: “E.coli1 is a bacteria2. As with all bacteria3, E.coli4 has a cytoplasm5” • e = number of entity mentions; amax = maximum number of arguments; c = number of relation cases • C(Ri, D1, E1, E2); C(Ri, D1, E2, E1); …; C(Rj, D1, E4, E3, E5); C(Rj, D1, E3, E4, E5)
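A minimal Python sketch of this case-generation subtask, using the mention labels from the D1 example above (illustrative only; real systems usually restrict candidates to mentions within a single sentence to keep the candidate count manageable):

```python
from itertools import permutations

def generate_relation_cases(entity_mentions, num_args):
    """Enumerate candidate relation cases: every ordered selection of
    num_args distinct entity mentions from a document. The count grows
    as e! / (e - a)!, which is why intra-sentential filtering matters."""
    return list(permutations(entity_mentions, num_args))

# D1: "E.coli1 is a bacteria2. As with all bacteria3, E.coli4 has a cytoplasm5"
mentions = ["E.coli_1", "bacteria_2", "bacteria_3", "E.coli_4", "cytoplasm_5"]
binary_cases = generate_relation_cases(mentions, 2)
print(len(binary_cases), binary_cases[0])  # 20 ('E.coli_1', 'bacteria_2')
```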

  13. Semantic Relation Detection vs. Classification • Given a relation case: C(R, Di, Ej, …, Ek) ⇒ ? • Relation Detection ⇒ {True, False}: predict whether this is a true mention of some semantic relation. • Relation Classification ⇒ {1, 2, …, r}: predict the semantic relation Rj associated with a relation mention.

  14. Test and Training Sets • Training: C(R1, D1, E1, E2) ⇒ F; C(R1, D1, E1, E3) ⇒ T; … ; C(Rr, Dd, E2, E3, E5) ⇒ F; C(Rr, Dd, E3, E4, E5) ⇒ F • Test: C(R?, Dd+1, E1, E2) ⇒ ?; … ; C(R?, Dd+k, Ex, …, Ey) ⇒ ?

  15. Performance Metrics • Precision (P): probability that a test case predicted to have label True is a true positive (tp). • Recall (R): probability that a True test case will be predicted True. • F-measure (F1): harmonic mean of the Precision and Recall estimates. • Accuracy: proportion of predictions with correct labels, whether True or False.
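A small sketch of how these metrics are computed from prediction counts (the counts below are invented for illustration):

```python
def precision_recall_f1(tp, fp, fn):
    """tp: True cases predicted True; fp: False cases predicted True;
    fn: True cases predicted False."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# Invented counts, for illustration only.
p, r, f = precision_recall_f1(tp=76, fp=24, fn=24)
print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")  # P=0.76 R=0.76 F1=0.76
```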

  16. Pipelined Process Framework

  17. Next Section • Introduction • Task Description • Predictive Features • Inductive Algorithms • Benchmark Tasks • Research Directions

  18. Predictive Feature Categories • Token-based • Entity Mention Argument-based • Chunking-based • Shallow Phrase-Structure Parse Tree-based • Phrase-Structure Parse Tree-based • Dependency Parse Tree-based • Semantic Role Label-based • Token-based • Entity Mention Argument-based • Chunking-based • Shallow Phrase-Structure Parse Tree-based • Phrase-Structure Parse Tree-based • Dependency Parse Tree-based • Semantic Role Label-based

  19. Vector of Feature Information • “Protein1 is a Location1 lipoprotein required for Location2 biogenesis.”

  20. Token-based Features • “Protein1 is a Location1 ...” • Token Distance • 2 intervening tokens • Token Sequence(s) • Unigrams • Bigrams
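A minimal sketch of extracting these token-based features for one candidate mention pair (tokenization is assumed to be given):

```python
def token_features(tokens, i, j):
    """Token-based features for entity mentions at token positions i < j:
    intervening-token distance plus unigram and bigram sequences."""
    between = tokens[i + 1:j]
    return {
        "distance": len(between),                  # number of intervening tokens
        "unigrams": list(between),
        "bigrams": list(zip(between, between[1:])),
    }

tokens = ["Protein1", "is", "a", "Location1", "lipoprotein"]
print(token_features(tokens, 0, 3))
# {'distance': 2, 'unigrams': ['is', 'a'], 'bigrams': [('is', 'a')]}
```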

  21. Token-based Features (cont.) • Stemmed Word Sequences • “banks ⇒ bank” • “scheduling ⇒ schedule” • Disambiguated Word-Sense (WordNet) • “bank” ⇒ river’s edge; financial inst.; row of objects • Token Part-of-Speech Role Sequences
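For stemming and word-sense features, one possible toolchain is NLTK (an assumption, not what the surveyed systems used; requires `pip install nltk` and `nltk.download("wordnet")`):

```python
from nltk.stem import PorterStemmer
from nltk.corpus import wordnet

stemmer = PorterStemmer()
print(stemmer.stem("banks"))       # bank
print(stemmer.stem("scheduling"))  # schedul  (Porter stems are not always dictionary words)

# Candidate senses of "bank" to disambiguate among:
for synset in wordnet.synsets("bank")[:3]:
    print(synset.name(), "-", synset.definition())
```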

  22. Entity Mention-based Features • Entity Mention Tokens • IBM ⇒ 1, Tierra del Fuego ⇒ 3, … • Entity Mention’s Semantic Type • Semantic Class • Organization • Location • Subclass • Company; University; Charity • Country; Province; Region; City

  23. Entity Mention Features (cont.) • Entity Mention Type • Name ⇐ John Doe, E. coli, periplasm, … • Nominal ⇐ the president, the country, … • Pronominal ⇐ he, she, they, it, … • Entity Mention’s Ontology Id • secreted; extracellular ⇒ GO:0005576 • E. coli; Escherichia coli ⇒ 571 (NCBI tax_id)

  24. Phrase-Structure Parse Tree http://lfg-demo.computing.dcu.ie/lfgparser.html

  25. Shortest-Path Enclosed Tree • Loss of context?

  26. Two types of subtrees proposed • Elementary subtrees • General subtrees • Both approaches lead to an exponential number of subtree features!

  27. Now we have a populated feature space

  28. Next Section • Introduction • Task Description • Predictive Features • Inductive Algorithms • Benchmark Tasks • Research Directions

  29. Inductive Approaches Available • Supervised Algorithms • Requires a training set • Semi-supervised Algorithms • Also accepts an unlabeled set • Unsupervised Algorithms • Does not use a training set • Most solutions restrict themselves to the task of detecting and classifying binary relation cases that are intra-sentential.

  30. Supervised Algorithms • Discriminative model • Feature-based (state of the art) • E.g. k-Nearest Neighbor, Logistic Regression, … • Kernel-based (state of the art) • E.g. Support Vector Machine • Generative model • E.g. Probabilistic Context Free Grammars, and Hidden Markov Models

  31. Feature-based Algorithms • Kambhatla, 2004 • An early proposal to use a broad set of features. • Liu et al, 2007 • Proposed the use of features previously found to be predictive for the task of Semantic Role Labeling. • Jiang and Zhai, 2007 • Used bigram and trigram PS parse tree subtree features (and dependency parse tree subtrees). • Adding trigram-based features produced only marginal improvement in performance, suggesting that higher-order subtrees would likewise add little.

  32. Kernel-based Induction • Zelenko et al, 2003; Culotta and Sorensen, 2004; Bunescu and Mooney, 2005; Zhao and Grishman, 2005; Zhang et al, 2006. • Require a kernel function, K(C1, C2) → [0, ∞), that maps any two feature vectors to a similarity score in some transformed space. • If the kernel is symmetric and positive definite, then comparisons between vectors can be performed efficiently even when the transformed space is high-dimensional. • If cases are separable in that space, then the kernel attains the benefit of the high-dimensional space without explicitly generating the feature vectors.
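For intuition, a minimal sketch of kernel-based classification using scikit-learn's precomputed-kernel interface; the linear kernel here is only a stand-in for the structural relation kernels cited above, and the data is invented:

```python
import numpy as np
from sklearn.svm import SVC

def kernel(X1, X2):
    """Stand-in for a relation kernel K(C1, C2): any symmetric,
    positive-definite similarity over relation cases can be used."""
    return X1 @ X2.T

X_train = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y_train = np.array([0, 1, 1])
X_test = np.array([[0.9, 0.1]])

clf = SVC(kernel="precomputed")
clf.fit(kernel(X_train, X_train), y_train)   # Gram matrix over training cases
print(clf.predict(kernel(X_test, X_train)))  # kernel of test vs. training cases
```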

  33. Kernel by Zhang et al, 2006 • Applies the Convolution Tree Kernel proposed in (Collins and Duffy, 2001; Haussler, 1999) • Number of common subtrees: KC(T1, T2) = Σn1∈N1 Σn2∈N2 Δ(n1, n2) • Nj is the set of parent (non-leaf) nodes in tree Tj • Δ(n1, n2) evaluates the common subtrees rooted at n1 and n2

  34. Kernel computed recursively in O(|N1| × |N2|) • Δ(n1, n2) = 0 if the productions at n1 and n2 differ • Δ(n1, n2) = 1 × λ if n1 and n2 are POS (pre-terminal) nodes • Otherwise, Δ(n1, n2) = λ × Π_{k=1..#ch(n1)} (1 + Δ(ch(n1, k), ch(n2, k))) • #ch(ni) is the number of children of node ni • ch(n, k) is the kth child of node n • λ (0 < λ < 1) is a decay factor
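A minimal Python sketch of this recursion, assuming parse trees are represented as (label, children) tuples (this follows the Δ definition above, not the authors' implementation):

```python
def production(node):
    """A node is (label, children); a pre-terminal's only child is a word string."""
    label, children = node
    return (label, tuple(c if isinstance(c, str) else c[0] for c in children))

def is_preterminal(node):
    return len(node[1]) == 1 and isinstance(node[1][0], str)

def nodes(tree):
    yield tree
    for child in tree[1]:
        if not isinstance(child, str):
            yield from nodes(child)

def delta(n1, n2, lam=0.4):
    if production(n1) != production(n2):
        return 0.0                      # productions differ
    if is_preterminal(n1):
        return lam                      # POS nodes: 1 * lambda
    score = lam                         # lambda * prod_k (1 + delta over children)
    for c1, c2 in zip(n1[1], n2[1]):
        score *= 1.0 + delta(c1, c2, lam)
    return score

def tree_kernel(t1, t2, lam=0.4):
    """K_C(T1, T2): sum of delta(n1, n2) over all node pairs."""
    return sum(delta(n1, n2, lam) for n1 in nodes(t1) for n2 in nodes(t2))

np_tree = ("NP", [("DT", ["a"]), ("NN", ["bacteria"])])
print(tree_kernel(np_tree, np_tree))  # 0.784 + 0.4 + 0.4 = 1.584
```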

  35. Generative Model Approaches • Earliest approach (Leek 1997; Miller 1998). • Instead of directly estimating model parameters for the conditional probability P(Y | X): • Estimate model parameters for P(X | Y) and P(Y) from the training set. • Then apply Bayes’ rule to decide which label has the highest posterior probability. • If the model fits the data, then the resulting likelihood ratio estimate is known to be optimal.
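A toy numeric illustration of the generative decision rule (all probabilities invented):

```python
# Pick argmax_y P(x | y) * P(y), i.e. the label with the highest posterior.
prior = {"True": 0.1, "False": 0.9}          # P(Y), from training-set label frequencies
likelihood = {"True": 0.30, "False": 0.02}   # P(x | Y) for one observed feature vector x
posterior_score = {y: likelihood[y] * prior[y] for y in prior}
print(max(posterior_score, key=posterior_score.get))  # True (0.03 > 0.018)
```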

  36. Two Approaches Surveyed • Probabilistic Context Free Grammars • Miller et al, 1998; Miller et al, 2000 • Hidden Markov Models • Leek, 1997 • McCallum et al, 2000 • Ray and Craven, 2001; Skounakis, Craven, and Ray, 2003

  37. PCFG-based Model

  38. Miller et al, 1998/2000 • From the augmented representation, learn a PCFG based on these trees. • Infer the maximum likelihood estimates of the probabilities from the frequencies in the training corpus, along with an interpolated adjustment of lower-order estimates to handle the (increased) challenge of data sparsity. • Parses of test cases that contain the semantic labels are predicted to be relation mentions.

  39. Semi-Supervised Approaches • (Brin, 1998; Agichtein and Gravano, 2000) • Use token-based features • Apply resampling with replacement • Assume that relations in the training set are redundantly present and restated in the test set. • (Shi et al, 2007) • Uses the (Miller et al, 1998/2000) approach. • Uses a naïve baseline to convert unlabeled cases to true training cases.

  40. Snowball’s Bootstrapping (Xia, 2006)
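A toy sketch of such a bootstrapping loop (the sentences and regular expressions are invented for illustration; real systems like Snowball also score pattern and tuple confidence rather than accepting every match):

```python
import re

def bootstrap(seeds, sentences, iterations=2):
    """Alternate between inducing string patterns from known (org, loc)
    pairs and applying those patterns to harvest new pairs."""
    known, patterns = set(seeds), set()
    for _ in range(iterations):
        # 1. Induce patterns from sentences that contain a known pair.
        for org, loc in list(known):
            for s in sentences:
                if org in s and loc in s:
                    p = re.escape(s)
                    p = p.replace(re.escape(org), r"(?P<org>[A-Z][\w.]*)")
                    p = p.replace(re.escape(loc), r"(?P<loc>[A-Z][\w.]*)")
                    patterns.add(p)
        # 2. Apply the induced patterns to harvest new pairs.
        for p in patterns:
            for s in sentences:
                m = re.search(p, s)
                if m:
                    known.add((m.group("org"), m.group("loc")))
    return known

sentences = ["IBM is based in Armonk.", "Microsoft is based in Redmond."]
print(bootstrap({("IBM", "Armonk")}, sentences))
# {('IBM', 'Armonk'), ('Microsoft', 'Redmond')}  (set order may vary)
```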

  41. Unsupervised Use of Lexico-Syntactic Patterns • Suggested initially by (Hearst, 1992). • Applied to relation detection by (Pantel et al, 2004; Etzioni et al, 2005) • Sample patterns: • <Class> such as <Member1>, …, <Memberi> • <Class> like <Member1> and <Member2> • <Member> is a <Class> • <Class>, including <Member> • Suited for the detection of TypeOf() subsumption relations over large corpora.
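A minimal regex rendering of two of these patterns (simplified: real implementations match POS-tagged NP chunks rather than single \w+ tokens):

```python
import re

PATTERNS = [
    re.compile(r"(?P<cls>\w+) such as (?P<members>\w+(?:, \w+)*(?: and \w+)?)"),
    re.compile(r"(?P<member>\w+) is a (?P<cls>\w+)"),
]

def type_of_mentions(sentence):
    mentions = []
    for pattern in PATTERNS:
        m = pattern.search(sentence)
        if m:
            groups = m.groupdict()
            members = re.split(r", and | and |, ", groups.get("members") or groups["member"])
            mentions += [(member, groups["cls"]) for member in members]
    return mentions

print(type_of_mentions("organisms such as bacteria, archaea and fungi"))
# [('bacteria', 'organisms'), ('archaea', 'organisms'), ('fungi', 'organisms')]
print(type_of_mentions("Tuberculosis is a disease"))
# [('Tuberculosis', 'disease')]
```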

  42. Next Section • Introduction • Task Description • Predictive Features • Inductive Algorithms • Benchmark Tasks • Research Directions

  43. Benchmark Tasks • Message Understanding Conference (MUC) • DARPA, (1989 – 1997), Newswire • TR task: Location_Of(ORG, LOC); Employee_of(PER, ORG); and Product_Of(ARTIFACT, ORG) • Automatic Content Extraction (ACE) • NIST, (2002 – …), Newswire • Relation Mention Detection: ~5 major and ~24 minor relation types • Physical(E1, E2); Social(Personx, Persony); Employ(Org, Person); … • Protein Localization Relation Extraction • SFU, (2006 – …) • SubcellularLocation(Organism, Protein, Location)

  44. Message Understanding Conference 1997 (Miller et al, 1998)

  45. ACE-2003

  46. Prokaryote Protein LocalizationRelation Extraction (PPLRE) Task

  47. Next Section • Introduction • Task Description • Predictive Features • Inductive Algorithms • Benchmark Tasks • Research Directions

  48. Research Directions • Additional Features/Knowledge • Inter-sentential Relation Cases • Relations with More than Two Arguments • Grounding Entity Mentions to an Ontology • Qualifying the Certainty of a Relation Case

  49. Additional Features/Knowledge • Expose additional features that can identify the more esoteric ways of expressing a relation. • Features from outside of the “shortest-path”. • Challenge: past open-ended attempts have reduced performance (Jiang and Zhai, 2007) • (Zhou et al, 2007) add heuristics for five common situations. • Use domain-specific background knowledge. • E.g. Gram-positive bacteria (such as M. tuberculosis) do not have a periplasm, so periplasm should never be predicted for them.

  50. Inter-sentential Relation Cases • Challenge: current approaches focus on syntactic features, which cannot be extended beyond the sentence boundary. • Idea: apply Centering Theory (Hirano et al, 2007) • Idea: create a text graph and apply graph mining. • Challenge: a significant increase in the proportion of false relation cases. • Idea: a threshold on the number of pairings any one entity mention can take.
