Bootstrapping Subjective Nouns for NLP Applications

Learning Subjective Nouns using Extraction Pattern Bootstrapping
Ellen Riloff, School of Computing, University of Utah
Janyce Wiebe and Theresa Wilson, Computing Science, University of Pittsburgh
CoNLL-03

Introduction (1/2)
• Many Natural Language Processing applications can benefit from being able to distinguish between factual and subjective information.
• Subjective remarks come in a variety of forms, including opinions, rants, allegations, accusations and speculation.
• Question answering (QA) systems should distinguish between factual and speculative answers.
• Multi-document summarization systems need to summarize different opinions and perspectives.
• Spam filtering systems must recognize rants and emotional tirades, among other things.

Introduction (2/2)
• In this paper, we use the Meta-Bootstrapping (Riloff and Jones 1999) and Basilisk (Thelen and Riloff 2002) algorithms to learn lists of subjective nouns.
• Both bootstrapping algorithms automatically generate extraction patterns to identify words belonging to a semantic category.
• We hypothesize that extraction patterns can also identify subjective words.
• The pattern “expressed <direct_object>” often extracts subjective nouns, such as “concern”, “hope” and “support”.
• Both bootstrapping algorithms require only a handful of seed words and unannotated texts for training; no annotated data is needed at all.

Annotation Scheme
• The goal of the annotation scheme is to identify and characterize expressions of private states in a sentence.
• Private state is a general covering term for opinions, evaluations, emotions and speculations.
• “The time has come, gentlemen, for Sharon, the assassin, to realize that injustice cannot last long” -> the writer expresses a negative evaluation.
• Annotators are also asked to judge the strength of each private state. A private state can have low, medium, high or extreme strength.

Corpus, Agreement Results
• Our data consists of English-language versions of foreign news documents from FBIS.
• The annotated corpus used to train and test our subjectivity classifiers (the experiment corpus) consists of 109 documents with a total of 2197 sentences.
• We use a separate, annotated tuning corpus to establish experiment parameters.

Extraction Pattern
• In the last few years, two bootstrapping algorithms have been developed to create semantic dictionaries by exploiting extraction patterns.
• Extraction patterns represent lexico-syntactic expressions that typically rely on shallow parsing and syntactic role assignment.
• Example pattern: “<subject> was hired”.
• A bootstrapping process looks for words that appear in the same extraction patterns as the seeds and hypothesizes that those words belong to the same semantic category.
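The core hypothesis on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `EXTRACTIONS` list is hypothetical toy data standing in for the output of a shallow parser, and `candidates_sharing_patterns` is a name invented for this example.

```python
from collections import defaultdict

# Hypothetical toy data: (extraction pattern, extracted noun) pairs, as a
# shallow parser plus syntactic templates might produce them.
EXTRACTIONS = [
    ("expressed <direct_object>", "concern"),
    ("expressed <direct_object>", "hope"),
    ("expressed <direct_object>", "support"),
    ("<subject> was hired", "engineer"),
    ("voiced <direct_object>", "concern"),
    ("voiced <direct_object>", "outrage"),
]


def candidates_sharing_patterns(extractions, seeds):
    """Return non-seed nouns that occur in at least one pattern that also
    extracts a seed word."""
    by_pattern = defaultdict(set)
    for pattern, noun in extractions:
        by_pattern[pattern].add(noun)

    candidates = set()
    for nouns in by_pattern.values():
        if nouns & seeds:  # the pattern extracts at least one seed
            candidates |= nouns - seeds
    return candidates


print(sorted(candidates_sharing_patterns(EXTRACTIONS, {"concern"})))
```

With the single seed “concern”, the nouns extracted by the “expressed” and “voiced” patterns become candidates, while “engineer” does not, because its only pattern never extracts a seed.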

Meta-Bootstrapping (1/2)
• The Meta-Bootstrapping process begins with a small set of seed words that represent a targeted semantic category (e.g., “seashore” is a location) and an unannotated corpus.
• Step 1: MetaBoot automatically creates a set of extraction patterns for the corpus by applying syntactic templates.
• Step 2: MetaBoot computes a score for each pattern based on the number of seed words among its extractions.
• The best pattern is saved and all of its extracted noun phrases are automatically labeled as the targeted semantic category.

Meta-Bootstrapping (2/2)
• MetaBoot then re-scores the extraction patterns, using the original seed words plus the newly labeled words, and the process repeats. (This inner loop is called mutual bootstrapping.)
• When the mutual bootstrapping process is finished, all nouns that were put into the semantic dictionary are re-evaluated.
• Each noun is assigned a score based on how many different patterns extracted it.
• Only the five best nouns are allowed to remain in the dictionary.
• The mutual bootstrapping process then starts over again using the revised semantic dictionary.
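The two-level loop described on the Meta-Bootstrapping slides can be sketched as follows. This is a simplified illustration under stated assumptions, not the published algorithm: patterns are scored by a raw count of known words among their extractions (the published scoring function is more refined), ties are broken alphabetically, and the toy data, function names and parameters (`max_patterns`, `keep`, `outer_iterations`) are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: (extraction pattern, extracted noun) pairs.
EXTRACTIONS = [
    ("expressed <direct_object>", "concern"),
    ("expressed <direct_object>", "hope"),
    ("expressed <direct_object>", "support"),
    ("voiced <direct_object>", "concern"),
    ("voiced <direct_object>", "outrage"),
    ("voiced <direct_object>", "hope"),
    ("<subject> was hired", "engineer"),
]


def mutual_bootstrap(by_pattern, dictionary, max_patterns):
    """Inner loop: repeatedly save the best pattern and label everything it
    extracts. Scoring here is a raw count of known words (a simplification)."""
    labeled = set(dictionary)
    used = set()
    for _ in range(max_patterns):
        scores = {p: len(nouns & labeled)
                  for p, nouns in by_pattern.items() if p not in used}
        scores = {p: s for p, s in scores.items() if s > 0}
        if not scores:
            break
        # Highest score wins; ties are broken alphabetically (an assumption).
        best = min(scores, key=lambda p: (-scores[p], p))
        used.add(best)
        labeled |= by_pattern[best]
    return labeled - set(dictionary)


def meta_bootstrap(extractions, seeds, outer_iterations=2, max_patterns=3, keep=5):
    """Outer loop: re-evaluate the nouns found by mutual bootstrapping and keep
    only the `keep` best, scored by how many different patterns extracted them."""
    by_pattern = defaultdict(set)
    by_noun = defaultdict(set)
    for pattern, noun in extractions:
        by_pattern[pattern].add(noun)
        by_noun[noun].add(pattern)

    dictionary = set(seeds)
    for _ in range(outer_iterations):
        new_nouns = mutual_bootstrap(by_pattern, dictionary, max_patterns)
        ranked = sorted(new_nouns, key=lambda n: (-len(by_noun[n]), n))
        dictionary |= set(ranked[:keep])
    return dictionary


print(sorted(meta_bootstrap(EXTRACTIONS, {"concern"})))
```

The `keep` filter is what distinguishes the outer loop from plain mutual bootstrapping: a single best pattern can extract unrelated nouns, and re-scoring by the number of distinct patterns limits how many of them survive into the next round.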

Basilisk (1/2)
• Step 1: Basilisk automatically creates a set of extraction patterns for the corpus and scores each pattern based on the number of seed words among its extractions. Basilisk puts the best patterns into a Pattern Pool.
• Step 2: All nouns extracted by a pattern in the Pattern Pool are put into a Candidate Word Pool.
• Basilisk scores each noun based on the set of patterns that extracted it and their collective association with the seed words.
• Step 3: The top 10 nouns are labeled as the targeted semantic class and are added to the dictionary.

Basilisk (2/2)
• The bootstrapping process then repeats, using the original seeds and the newly labeled words.
• The major difference between Basilisk and Meta-Bootstrapping:
• Basilisk scores each noun based on collective information gathered from all patterns that extracted it.
• Meta-Bootstrapping identifies a single best pattern and assumes that everything it extracts belongs to the same semantic category.
• In comparative experiments, Basilisk outperformed Meta-Bootstrapping.
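One Basilisk iteration, as outlined on the two slides above, might look like the sketch below. The slides do not give the scoring formula; the average of log2(F + 1) over a noun's patterns used here follows the AvgLog idea described by Thelen and Riloff (2002) as best recalled, so treat the exact formula as an assumption. The toy data, the function name and the `pool_size` and `words_per_iteration` parameters are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical toy data: (extraction pattern, extracted noun) pairs.
EXTRACTIONS = [
    ("expressed <direct_object>", "concern"),
    ("expressed <direct_object>", "hope"),
    ("expressed <direct_object>", "support"),
    ("voiced <direct_object>", "concern"),
    ("voiced <direct_object>", "support"),
    ("voiced <direct_object>", "outrage"),
    ("<subject> was hired", "engineer"),
    ("praised <direct_object>", "engineer"),
    ("praised <direct_object>", "hope"),
]


def basilisk_iteration(extractions, dictionary, pool_size=3, words_per_iteration=2):
    """Run one simplified Basilisk iteration and return the nouns to add."""
    by_pattern = defaultdict(set)
    by_noun = defaultdict(set)
    for pattern, noun in extractions:
        by_pattern[pattern].add(noun)
        by_noun[noun].add(pattern)

    # Step 1: score patterns by the number of known words they extract and
    # put the best ones into the pattern pool.
    hits = {p: len(nouns & dictionary) for p, nouns in by_pattern.items()}
    pool = sorted((p for p in hits if hits[p] > 0),
                  key=lambda p: (-hits[p], p))[:pool_size]

    # Step 2: every noun extracted by a pooled pattern becomes a candidate.
    candidates = set().union(*(by_pattern[p] for p in pool)) - dictionary

    # Score each candidate over ALL patterns that extracted it
    # (assumed AvgLog-style formula: mean of log2(hits + 1)).
    def avg_log(noun):
        patterns = by_noun[noun]
        return sum(math.log2(hits[p] + 1) for p in patterns) / len(patterns)

    # Step 3: the top-ranked candidates are added to the dictionary.
    ranked = sorted(candidates, key=lambda n: (-avg_log(n), n))
    return ranked[:words_per_iteration]


print(basilisk_iteration(EXTRACTIONS, {"concern", "hope"}))
```

In this toy run “engineer” enters the candidate pool through the “praised” pattern but ranks last, because its other pattern extracts no known words. That is the collective-evidence effect the slide contrasts with Meta-Bootstrapping's single best pattern.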

Experimental Results (1/2)
• We create the bootstrapping corpus by gathering 950 new texts from FBIS, and manually select 20 high-frequency words as seed words.
• We run each bootstrapping algorithm for 400 iterations, generating 5 words per iteration. Basilisk generates 2000 nouns and Meta-Bootstrapping generates 1996 nouns.

Experimental Results (2/2)
• Next, we manually review the 3996 words proposed by the algorithms and classify each word as StrongSubjective, WeakSubjective or Objective.
[Figure: X axis, the number of words generated; Y axis, the percentage of those words that were manually classified as subjective.]

Subjective Classifier (1/3)
• To evaluate the subjective nouns, we train a Naïve Bayes classifier using the nouns as features. We also incorporate previously established subjectivity clues, and add some new discourse features.
• Subjective Noun Features:
• We define four features, BA-Strong, BA-Weak, MB-Strong and MB-Weak, to represent the sets of subjective nouns produced by the bootstrapping algorithms.
• For each set, we create a three-valued feature based on the presence of 0, 1, or >=2 words from that set.
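The three-valued feature is simple to compute. In the sketch below, the word set `BA_STRONG` is a hypothetical stand-in for one of the four learned noun sets, and whitespace tokenization with lowercasing is an assumption made to keep the example short.

```python
# Hypothetical stand-in for one of the four learned noun sets.
BA_STRONG = {"outrage", "hope", "concern"}


def three_valued_feature(tokens, noun_set):
    """Return 0, 1 or 2, where 2 stands for 'two or more' words from noun_set."""
    count = sum(1 for token in tokens if token.lower() in noun_set)
    return min(count, 2)


print(three_valued_feature("They voiced outrage and concern".split(), BA_STRONG))
print(three_valued_feature("The engineer was hired".split(), BA_STRONG))
```

Capping the count at 2 keeps the feature discrete with three values, which suits a Naïve Bayes classifier over categorical features.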

Subjective Classifier (2/3)
• WBO Features:
• Features from Wiebe, Bruce and O’Hara (1999), a machine learning system that classifies subjective sentences.
• Manual Features:
• Levin (1993); Ballmer and Brennenstuhl (1981)
• Some FrameNet lemmas with frame element experiencer (Baker et al. 1998)
• Adjectives manually annotated for polarity (Hatzivassiloglou and McKeown 1997)
• Some subjective clues listed in (Wiebe 1990)

Subjective Classifier (3/3)
• Discourse Features:
• We use discourse features to capture the density of clues in the text surrounding a sentence.
• First, we compute the average number of subjective clues and objective clues per sentence.
• Next, we characterize the number of subjective and objective clues in the previous and next sentence as higher-than-expected (high), lower-than-expected (low), or expected (medium).
• We also define a feature for sentence length.
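A rough sketch of the density features follows. The slide does not say how far a count must be from the average to count as high or low, so the `tolerance` parameter below is purely an assumption, as are the function names and the choice to return `None` when a sentence has no previous or next neighbor.

```python
def density_label(count, average, tolerance=0.5):
    """Label a clue count relative to the per-sentence average.
    The tolerance band is an assumed parameter, not taken from the slides."""
    if count > average + tolerance:
        return "high"
    if count < average - tolerance:
        return "low"
    return "medium"


def discourse_features(clue_counts, index, tolerance=0.5):
    """Return (previous, next) density labels for the sentence at `index`.
    `clue_counts` holds one clue count per sentence of the text."""
    average = sum(clue_counts) / len(clue_counts)
    prev_label = (density_label(clue_counts[index - 1], average, tolerance)
                  if index > 0 else None)
    next_label = (density_label(clue_counts[index + 1], average, tolerance)
                  if index + 1 < len(clue_counts) else None)
    return prev_label, next_label


# Subjective clue counts for five consecutive sentences (toy numbers).
counts = [0, 3, 1, 1, 0]
print(discourse_features(counts, 2))
```

The same two functions would be called once with subjective clue counts and once with objective clue counts to produce both groups of features.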

Classification Result (1/3)
• We evaluate each classifier using 25-fold cross validation on the experiment corpus and use a paired t-test to measure significance at the 95% confidence level.
• We compute Accuracy (Acc) as the percentage of the classifier's decisions that match the gold standard, and Precision (Prec) and Recall (Rec) with respect to subjective sentences.
• Gold standard: a sentence is subjective if it contains at least one private-state expression of medium or higher strength.
• The objective class consists of everything else.
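The gold-standard rule and the three metrics can be written down directly. The sketch below uses the standard definitions of accuracy, precision and recall; the function names and the toy labels are hypothetical, and the strength ordering comes from the annotation scheme (low, medium, high, extreme).

```python
STRENGTHS = ["low", "medium", "high", "extreme"]  # ordered weakest to strongest


def gold_label(private_state_strengths):
    """A sentence is subjective if any private-state expression in it has
    medium or higher strength; otherwise it is objective."""
    threshold = STRENGTHS.index("medium")
    if any(STRENGTHS.index(s) >= threshold for s in private_state_strengths):
        return "subjective"
    return "objective"


def evaluate(gold, predicted, positive="subjective"):
    """Return (accuracy, precision, recall), with precision and recall
    computed with respect to the positive (subjective) class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    accuracy = sum(1 for g, p in pairs if g == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


gold = ["subjective", "subjective", "subjective", "objective", "objective"]
pred = ["subjective", "subjective", "objective", "subjective", "objective"]
print(evaluate(gold, pred))
```

A sentence whose only private-state expression has low strength is therefore counted as objective, which is why the objective class is described as everything else.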

Classification Result (2/3)
• We train a Naïve Bayes classifier using only the SubjNoun features. This classifier achieves good precision (77%) but only moderate recall (64%).
• We discover that the subjective nouns are good indicators when they appear, but not every subjective sentence contains a subjective noun.

Classification Result (3/3)
• There is a synergy between these feature sets: using both types of features achieves better performance than either one alone.
• In Table 8, Row 1, we use the WBO + SubjNoun + manual + discourse features. This classifier achieves 81.3% precision, 77.4% recall and 76.1% accuracy.

Conclusion
• We demonstrate that weakly supervised bootstrapping techniques can learn subjective terms from unannotated texts.
• Bootstrapping algorithms can learn not only general semantic categories, but any category for which words appear in similar linguistic phrases.
• The experiments suggest that reliable subjective classification requires a broad array of features.
