Learning subjective nouns using extraction pattern bootstrapping
Ellen Riloff
School of Computing, University of Utah

Janyce Wiebe, Theresa Wilson
Computing Science, University of Pittsburgh

CoNLL-03



Introduction (1/2)

  • Many Natural Language Processing applications can benefit from being able to distinguish between factual and subjective information.

    • Subjective remarks come in a variety of forms, including opinions, rants, allegations, accusations and speculation.

    • Question answering (QA) systems should distinguish between factual and speculative answers.

    • Multi-document summarization systems need to summarize different opinions and perspectives.

    • Spam filtering systems must recognize rants and emotional tirades, among other things.

Introduction (2/2)

  • In this paper, we use the Meta-Bootstrapping (Riloff and Jones 1999) and Basilisk (Thelen and Riloff 2002) algorithms to learn lists of subjective nouns:

    • Both bootstrapping algorithms automatically generate extraction patterns to identify words belonging to a semantic category.

    • We hypothesize that extraction patterns can also identify subjective words.

    • The pattern “expressed <direct_object>” often extracts subjective nouns, such as “concern”, “hope”, “support”.

    • Both bootstrapping algorithms require only a handful of seed words and unannotated texts for training; no annotated data is needed at all.

Annotation Scheme

  • The goal of the annotation scheme is to identify and characterize expressions of private states in a sentence.

    • Private state is a general covering term for opinions, evaluations, emotions and speculations.

    • “The time has come, gentlemen, for Sharon, the assassin, to realize that injustice cannot last long”: the writer expresses a negative evaluation.

  • Annotators are also asked to judge the strength of each private state. A private state can have low, medium, high or extreme strength.

Corpus, Agreement Results

  • Our data consist of English-language versions of foreign news documents from FBIS.

  • The annotated corpus used to train and test our subjective classifiers (the experiment corpus) consists of 109 documents with a total of 2197 sentences.

  • We use a separate, annotated tuning corpus to establish experiment parameters.

Extraction Pattern

  • In the last few years, two bootstrapping algorithms have been developed to create semantic dictionaries by exploiting extraction patterns.

  • Extraction patterns represent lexico-syntactic expressions that typically rely on shallow parsing and syntactic role assignment.

    • “<subject> was hired.”

  • A bootstrapping process looks for words that appear in the same extraction patterns as the seeds and hypothesizes that those words belong to the same semantic category.
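
The core hypothesis on this slide can be sketched in a few lines of Python. This is only an illustration: the (pattern, noun) pairs are made up, and the function name `candidates_from_seeds` is hypothetical, not something from the paper.

```python
from collections import defaultdict

# Hypothetical (pattern, extracted noun) pairs, as a shallow parser might produce.
extractions = [
    ("expressed <direct_object>", "concern"),
    ("expressed <direct_object>", "hope"),
    ("expressed <direct_object>", "support"),
    ("voiced <direct_object>", "concern"),
    ("voiced <direct_object>", "outrage"),
    ("<subject> was hired", "engineer"),
]

def candidates_from_seeds(extractions, seeds):
    """Return non-seed nouns that share at least one pattern with a seed."""
    by_pattern = defaultdict(set)
    for pattern, noun in extractions:
        by_pattern[pattern].add(noun)
    candidates = set()
    for nouns in by_pattern.values():
        if nouns & seeds:                # the pattern extracted a seed word
            candidates |= nouns - seeds  # so its other nouns become candidates
    return candidates

print(sorted(candidates_from_seeds(extractions, {"concern"})))
```

With the single seed “concern”, the nouns extracted by “expressed <direct_object>” and “voiced <direct_object>” become candidates, while “engineer” does not.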

Meta-Bootstrapping (1/2)

  • The Meta-Bootstrapping process begins with a small set of seed words that represent a targeted semantic category (e.g., “seashore” is a location) and an unannotated corpus.

  • Step 1: MetaBoot automatically creates a set of extraction patterns for the corpus by applying syntactic templates.

  • Step 2: MetaBoot computes a score for each pattern based on the number of seed words among its extractions.

    • The best pattern is saved and all of its extracted noun phrases are automatically labeled as the targeted semantic category.

Meta-Bootstrapping (2/2)

  • MetaBoot then re-scores the extraction patterns, using the original seed words plus the newly labeled words, and the process repeats. (Mutual Bootstrapping)

  • When the mutual bootstrapping process is finished, all nouns that were put into the semantic dictionary are re-evaluated.

    • Each noun is assigned a score based on how many different patterns extracted it.

    • Only the five best nouns are allowed to remain in the dictionary.

  • The mutual bootstrapping process then starts over again using the revised semantic dictionary.
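
The two nested loops described above can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: patterns are scored by a plain count of known words among their extractions, as the slide states, and all function names, parameters and example data are hypothetical.

```python
def mutual_bootstrapping(pattern_extractions, dictionary, iterations):
    """Inner loop: repeatedly pick the best unused pattern and adopt its nouns."""
    known = set(dictionary)
    used = set()
    for _ in range(iterations):
        best, best_score = None, 0
        for pattern in sorted(pattern_extractions):
            if pattern in used:
                continue
            # Score = number of already-known words among the pattern's extractions.
            score = len(pattern_extractions[pattern] & known)
            if score > best_score:
                best, best_score = pattern, score
        if best is None:
            break
        used.add(best)
        known |= pattern_extractions[best]  # label everything the pattern extracts
    return known - set(dictionary)

def meta_bootstrapping(pattern_extractions, seeds, outer_iterations,
                       inner_iterations, keep=5):
    """Outer loop: keep only the `keep` best new nouns, then restart."""
    def support(noun):
        # Number of different patterns that extracted the noun.
        return sum(noun in nouns for nouns in pattern_extractions.values())

    dictionary = set(seeds)
    for _ in range(outer_iterations):
        new_nouns = mutual_bootstrapping(pattern_extractions, dictionary,
                                         inner_iterations)
        ranked = sorted(new_nouns, key=lambda n: (-support(n), n))
        dictionary |= set(ranked[:keep])
    return dictionary - set(seeds)

# Hypothetical toy data.
patterns = {
    "expressed <dobj>": {"concern", "hope", "support"},
    "voiced <dobj>": {"concern", "outrage", "hope"},
    "<subject> was hired": {"engineer"},
}
print(sorted(meta_bootstrapping(patterns, {"concern"}, 1, 2, keep=2)))
```

In this toy run “hope” is kept because two patterns extracted it; the tie between “outrage” and “support” is broken alphabetically, which is an arbitrary choice made only for the sketch.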

Basilisk (1/2)

  • Step 1: Basilisk automatically creates a set of extraction patterns for the corpus and scores each pattern based on the number of seed words among its extractions.

    • Basilisk puts the best patterns into a Pattern Pool.

  • Step 2: All nouns extracted by a pattern in the Pattern Pool are put into a Candidate Word Pool.

    • Basilisk scores each noun based on the set of patterns that extracted it and their collective association with the seed words.

  • Step 3: The top 10 nouns are labeled as the targeted semantic class and are added to the dictionary.
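
One Basilisk iteration might look like the sketch below. The slides do not give the noun scoring formula, so the averaged log score used here is an assumption (it is modeled on the AvgLog score usually associated with Basilisk); the function name, parameters and toy data are likewise hypothetical.

```python
import math

def basilisk_iteration(pattern_extractions, dictionary, pool_size, words_per_iteration):
    """Run one iteration and return the nouns that would be added."""
    def known_count(pattern):
        return len(pattern_extractions[pattern] & dictionary)

    # Step 1: rank patterns by known words among their extractions; fill the pool.
    ranked_patterns = sorted(pattern_extractions, key=lambda p: (-known_count(p), p))
    pool = [p for p in ranked_patterns[:pool_size] if known_count(p) > 0]

    # Step 2: every noun extracted by a pooled pattern becomes a candidate.
    candidates = set()
    for pattern in pool:
        candidates |= pattern_extractions[pattern] - dictionary

    def avg_log(word):
        # ASSUMPTION: average of log2(known_count + 1) over ALL patterns
        # that extracted the word, not only the pooled ones.
        extracting = [p for p, nouns in pattern_extractions.items() if word in nouns]
        return sum(math.log2(known_count(p) + 1) for p in extracting) / len(extracting)

    # Step 3: keep the best-scoring candidates.
    ranked_words = sorted(candidates, key=lambda w: (-avg_log(w), w))
    return set(ranked_words[:words_per_iteration])

# Hypothetical toy data.
patterns = {
    "expressed <dobj>": {"concern", "hope", "support"},
    "voiced <dobj>": {"concern", "outrage", "hope"},
    "<subject> was announced": {"support", "plan"},
    "<subject> was hired": {"engineer"},
}
print(sorted(basilisk_iteration(patterns, {"concern"}, 2, 2)))
```

Here “support” scores lower than “hope” and “outrage” because one of the patterns that extracts it has no association with the seed word, which illustrates how evidence from all extracting patterns is pooled.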

Basilisk (2/2)

  • The bootstrapping process then repeats, using the original seeds and the newly labeled words.

  • The major difference between Basilisk and Meta-Bootstrapping:

    • Basilisk scores each noun based on collective information gathered from all patterns that extracted it.

    • Meta-Bootstrapping identifies a single best pattern and assumes that everything it extracts belongs to the same semantic category.

  • In comparative experiments, Basilisk outperformed Meta-Bootstrapping.

Experimental Results (1/2)

  • We create the bootstrapping corpus by gathering 950 new texts from FBIS, and we manually select 20 high-frequency words as seed words.

  • We run each bootstrapping algorithm for 400 iterations, generating 5 words per iteration. Basilisk generates 2000 nouns and Meta-Bootstrapping generates 1996 nouns.

Experimental Results (2/2)

  • Next, we manually review the 3996 words proposed by the algorithms and classify the words as StrongSubjective, WeakSubjective or Objective.

[Figure: X axis, the number of words generated; Y axis, the percentage of those words that were manually classified as subjective.]


Subjective Classifier (1/3)

  • To evaluate the subjective nouns, we train a Naïve Bayes classifier using the nouns as features. We also incorporate previously established subjectivity clues and add some new discourse features.

  • Subjective Noun Features:

    • We define four features, BA-Strong, BA-Weak, MB-Strong and MB-Weak, to represent the sets of subjective nouns produced by the bootstrapping algorithms.

    • For each set, we create a three-valued feature based on the presence of 0, 1, or >=2 words from that set.
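
A minimal sketch of the three-valued feature follows, assuming sentences arrive as token lists and that matching is a case-insensitive exact match. The contents of the four noun sets are invented for illustration, and the function names are hypothetical.

```python
def three_valued_feature(tokens, noun_set):
    """Return 0, 1 or 2, where 2 stands for '>=2 words from the set'."""
    count = sum(1 for token in tokens if token.lower() in noun_set)
    return min(count, 2)

def subjective_noun_features(tokens, noun_sets):
    """Compute one three-valued feature per noun set."""
    return {name: three_valued_feature(tokens, nouns)
            for name, nouns in noun_sets.items()}

# Hypothetical set contents; the real lists come from the bootstrapping runs.
noun_sets = {
    "BA-Strong": {"outrage"},
    "BA-Weak": {"concern", "hope"},
    "MB-Strong": {"outrage"},
    "MB-Weak": {"support"},
}
tokens = "They expressed concern and hope".split()
print(subjective_noun_features(tokens, noun_sets))
```

Capping the count at 2 keeps the feature space small, which suits a Naïve Bayes classifier trained on a corpus of about 2000 sentences.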

Subjective Classifier (2/3)

  • WBO Features:

    • Features from Wiebe, Bruce and O’Hara (1999), a machine learning system that classifies subjective sentences.

  • Manual Features:

    • Levin 1993; Ballmer and Brennenstuhl 1981

    • Some FrameNet lemmas with frame element experiencer (Baker et al. 1998)

    • Adjectives manually annotated for polarity (Hatzivassiloglou and McKeown 1997)

    • Some subjective clues listed in (Wiebe 1990)

Subjective Classifier (3/3)

  • Discourse Features:

    • We use discourse features to capture the density of clues in the text surrounding a sentence.

    • First, we compute the average number of subjective clues and objective clues per sentence.

    • Next, we characterize the number of subjective and objective clues in the previous and next sentence as:

      higher-than-expected (high), lower-than-expected (low), expected (medium).

  • We also define a feature for sentence length.
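
The density features could be computed along these lines. The slides do not say how far from the average a count must be to count as high or low, so the `tolerance` band below is purely an assumption, as are the function names and the choice to return `None` at document boundaries. The function would be applied once to subjective clue counts and once to objective clue counts.

```python
def density_label(count, average, tolerance=1.0):
    """Label a clue count relative to the per-sentence average."""
    # ASSUMPTION: 'expected' means within +/- tolerance of the average.
    if count > average + tolerance:
        return "high"
    if count < average - tolerance:
        return "low"
    return "medium"

def discourse_features(clue_counts, index, tolerance=1.0):
    """Label the clue density of the sentences before and after `index`."""
    average = sum(clue_counts) / len(clue_counts)
    features = {}
    for name, neighbor in (("prev", index - 1), ("next", index + 1)):
        if 0 <= neighbor < len(clue_counts):
            features[name] = density_label(clue_counts[neighbor], average, tolerance)
        else:
            features[name] = None  # no neighboring sentence at a document edge
    return features

# Hypothetical subjective-clue counts for four consecutive sentences.
counts = [0, 3, 1, 5]
print(discourse_features(counts, 2))
```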

Classification Result (1/3)

  • We evaluate each classifier using 25-fold cross validation on the experiment corpus and use paired t-tests to measure significance at the 95% confidence level.

  • We compute Accuracy (Acc) as the percentage of the classifier's decisions that match the gold standard, and Precision (Prec) and Recall (Rec) with respect to subjective sentences.

    • Gold standard: a sentence is subjective if it contains at least one private-state expression of medium or higher strength.

    • The objective class consists of everything else.
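
The gold-standard rule and the three measures can be written out as a short sketch. The function names and the example labels are hypothetical; returning 0.0 when a denominator is zero is a convention chosen here, not something the slides specify.

```python
STRENGTH_RANK = {"low": 0, "medium": 1, "high": 2, "extreme": 3}

def is_subjective(private_state_strengths):
    """Gold standard: at least one private state of medium or higher strength."""
    return any(STRENGTH_RANK[s] >= STRENGTH_RANK["medium"]
               for s in private_state_strengths)

def evaluate(gold, predicted):
    """Accuracy over all sentences; precision and recall for the subjective class."""
    tp = sum(g and p for g, p in zip(gold, predicted))
    fp = sum((not g) and p for g, p in zip(gold, predicted))
    fn = sum(g and (not p) for g, p in zip(gold, predicted))
    correct = sum(g == p for g, p in zip(gold, predicted))
    accuracy = correct / len(gold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

# Hypothetical labels: True means subjective.
gold = [True, True, False, True, False]
predicted = [True, False, False, True, True]
print(evaluate(gold, predicted))
```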

Classification Result (2/3)

  • We train a Naïve Bayes classifier using only the SubjNoun features. This classifier achieves good precision (77%) but only moderate recall (64%).

  • We discover that the subjective nouns are good indicators when they appear, but not every subjective sentence contains a subjective noun.

Classification Result (3/3)

  • There is a synergy between these feature sets: using both types of features achieves better performance than either one alone.

  • In Table 8, Row 1, we use the WBO + SubjNoun + manual + discourse features. This classifier achieves 81.3% precision, 77.4% recall and 76.1% accuracy.


Conclusions

  • We demonstrate that weakly supervised bootstrapping techniques can learn subjective terms from unannotated texts.

  • Bootstrapping algorithms can learn not only general semantic categories, but any category for which words appear in similar linguistic phrases.

  • The experiments suggest that reliable subjective classification requires a broad array of features.
