Presentation Transcript

Learning Subjective Nouns using Extraction Pattern Bootstrapping

Ellen Riloff

School of Computing University of Utah

Janyce Wiebe , Theresa Wilson

Computing Science University of Pittsburgh

CoNLL-03

Introduction (1/2)
  • Many Natural Language Processing applications can benefit from being able to distinguish between factual and subjective information.
    • Subjective remarks come in a variety of forms, including opinions, rants, allegations, accusations, and speculation.
    • Question answering (QA) systems should distinguish between factual and speculative answers.
    • Multi-document summarization systems need to summarize different opinions and perspectives.
    • Spam filtering systems must recognize rants and emotional tirades, among other things.
Introduction (2/2)
  • In this paper, we use the Meta-Bootstrapping (Riloff and Jones 1999) and Basilisk (Thelen and Riloff 2002) algorithms to learn lists of subjective nouns:
    • Both bootstrapping algorithms automatically generate extraction patterns to identify words belonging to a semantic category.
    • We hypothesize that extraction patterns can also identify subjective words.
    • For example, the pattern "expressed <direct_object>" often extracts subjective nouns such as "concern", "hope", and "support".
    • Both bootstrapping algorithms require only a handful of seed words and unannotated texts for training; no annotated data is needed at all.
Annotation Scheme
  • The goal of the annotation scheme is to identify and characterize expressions of private states in a sentence.
    • "Private state" is a general covering term for opinions, evaluations, emotions, and speculations.
    • "The time has come, gentlemen, for Sharon, the assassin, to realize that injustice cannot last long" -> the writer expresses a negative evaluation.
  • Annotators are also asked to judge the strength of each private state. A private state can have low, medium, high, or extreme strength.
Corpus , Agreement Results
  • Our data consist of English-language versions of foreign news documents from FBIS.
  • The annotated corpus used to train and test our subjectivity classifiers (the experiment corpus) consists of 109 documents with a total of 2197 sentences.
  • We use a separate, annotated tuning corpus to establish experiment parameters.
Extraction Patterns
  • In the last few years, two bootstrapping algorithms have been developed to create semantic dictionaries by exploiting extraction patterns.
  • Extraction patterns represent lexico-syntactic expressions that typically rely on shallow parsing and syntactic role assignment.
    • "<subject> was hired."
  • A bootstrapping process looks for words that appear in the same extraction patterns as the seeds and hypothesizes that those words belong to the same semantic category.
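The generic bootstrapping loop described above can be sketched as follows. This is an illustrative simplification, not the exact algorithm from the paper: patterns are represented simply as keys mapping to the sets of nouns they extract, and each pattern is scored by its overlap with the current lexicon.

```python
def bootstrap(corpus_extractions, seeds, iterations=3, words_per_iter=5):
    """Sketch of extraction-pattern bootstrapping.

    corpus_extractions: dict mapping a pattern string to the set of
        nouns that pattern extracts from the corpus.
    seeds: initial seed words for the targeted semantic category.
    """
    lexicon = set(seeds)
    used_patterns = set()
    for _ in range(iterations):
        # Score each unused pattern by how many known category words it extracts.
        pattern_scores = {p: len(nouns & lexicon)
                          for p, nouns in corpus_extractions.items()
                          if p not in used_patterns}
        if not pattern_scores:
            break
        best = max(pattern_scores, key=pattern_scores.get)
        used_patterns.add(best)
        # Hypothesize that other words the best pattern extracts
        # belong to the same semantic category.
        candidates = sorted(corpus_extractions[best] - lexicon)
        lexicon |= set(candidates[:words_per_iter])
    return lexicon
```

Starting from the seed "concern", a pattern like "expressed <direct_object>" would score highest and contribute the other nouns it extracts, such as "hope" and "support".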
Meta-Bootstrapping (1/2)
  • The Meta-Bootstrapping process begins with a small set of seed words that represent a targeted semantic category (e.g., "seashore" is a location) and an unannotated corpus.
  • Step 1: MetaBoot automatically creates a set of extraction patterns for the corpus by applying syntactic templates.
  • Step 2: MetaBoot computes a score for each pattern based on the number of seed words among its extractions.
    • The best pattern is saved, and all of its extracted noun phrases are automatically labeled as the targeted semantic category.
Meta-Bootstrapping (2/2)
  • MetaBoot then re-scores the extraction patterns using the original seed words plus the newly labeled words, and the process repeats. (Mutual Bootstrapping)
  • When the mutual bootstrapping process is finished, all nouns that were put into the semantic dictionary are re-evaluated.
    • Each noun is assigned a score based on how many different patterns extracted it.
    • Only the five best nouns are allowed to remain in the dictionary.
  • The mutual bootstrapping process then starts over using the revised semantic dictionary.
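The meta-level re-evaluation step can be sketched like this; `meta_rank` is a hypothetical helper name, and the score is simply the number of distinct patterns that extract each learned noun, as the slide above describes.

```python
def meta_rank(learned_nouns, corpus_extractions, keep=5):
    """Re-evaluate learned nouns after mutual bootstrapping finishes.

    Each noun is scored by how many different patterns extract it;
    only the `keep` best nouns remain in the dictionary.
    """
    counts = {n: sum(n in extracted for extracted in corpus_extractions.values())
              for n in learned_nouns}
    # Sort by descending pattern count, breaking ties alphabetically.
    return sorted(counts, key=lambda n: (-counts[n], n))[:keep]
```

A noun extracted by many different patterns is more likely to truly belong to the category, which is why this cross-pattern count is used as the filter between mutual-bootstrapping rounds.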
Basilisk (1/2)
  • Step 1: Basilisk automatically creates a set of extraction patterns for the corpus and scores each pattern based on the number of seed words among its extractions.
    • Basilisk puts the best patterns into a Pattern Pool.
  • Step 2: All nouns extracted by a pattern in the pattern pool are put into a Candidate Word Pool.
    • Basilisk scores each noun based on the set of patterns that extracted it and their collective association with the seed words.
  • Step 3: The top 10 nouns are labeled as the targeted semantic class and are added to the dictionary.
Basilisk (2/2)
  • The bootstrapping process then repeats, using the original seeds and the newly labeled words.
  • The major difference between Basilisk and Meta-Bootstrapping:
    • Basilisk scores each noun based on collective information gathered from all patterns that extracted it.
    • Meta-Bootstrapping identifies a single best pattern and assumes that everything it extracts belongs to the same semantic category.
  • In comparative experiments, Basilisk outperformed Meta-Bootstrapping.
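Basilisk's collective noun scoring can be illustrated with an AvgLog-style score in the spirit of Thelen and Riloff (2002): average, over the patterns that extract a candidate noun, the log of each pattern's seed overlap. The function and data-structure names here are illustrative, not the paper's exact implementation.

```python
import math

def avg_log_score(noun, pattern_extractions, seeds):
    """AvgLog-style candidate score.

    pattern_extractions: dict mapping a pattern to the set of nouns it extracts.
    The score averages log2(seed overlap + 1) over every pattern that
    extracts `noun`, so evidence from all patterns is pooled.
    """
    patterns = [p for p, nouns in pattern_extractions.items() if noun in nouns]
    if not patterns:
        return 0.0
    logs = [math.log2(len(pattern_extractions[p] & seeds) + 1) for p in patterns]
    return sum(logs) / len(logs)
```

This is the key contrast with Meta-Bootstrapping: a candidate's score here depends on every pattern that extracts it, not on a single best pattern.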
Experimental Results (1/2)
  • We create the bootstrapping corpus by gathering 950 new texts from FBIS and manually selecting 20 high-frequency words as seed words.
  • We run each bootstrapping algorithm for 400 iterations, generating 5 words per iteration. Basilisk generates 2000 nouns and Meta-Bootstrapping generates 1996 nouns.
Experimental Results (2/2)
  • Next, we manually review the 3996 words proposed by the algorithms and classify each word as StrongSubjective, WeakSubjective, or Objective.

[Figure: X = the number of words generated; Y = the percentage of those words that were manually classified as subjective]

Subjective Classifier (1/3)
  • To evaluate the subjective nouns, we train a Naïve Bayes classifier using the nouns as features. We also incorporate previously established subjectivity clues and add some new discourse features.
  • Subjective Noun Features:
    • We define four features, BA-Strong, BA-Weak, MB-Strong, and MB-Weak, to represent the sets of subjective nouns produced by the bootstrapping algorithms.
    • For each set, we create a three-valued feature based on the presence of 0, 1, or >=2 words from that set.
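The three-valued feature above can be sketched with a simple membership count; `subj_noun_feature` is a hypothetical helper name and the tokenization is assumed to be a plain word list.

```python
def subj_noun_feature(sentence_words, noun_set):
    """Three-valued feature for one subjective-noun set.

    Returns 0, 1, or 2 depending on whether the sentence contains
    zero, exactly one, or two-or-more words from the set.
    """
    count = sum(1 for w in sentence_words if w in noun_set)
    return min(count, 2)
```

One such feature would be computed per noun set (BA-Strong, BA-Weak, MB-Strong, MB-Weak), giving the classifier four three-valued inputs.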
Subjective Classifier (2/3)
  • WBO Features:
    • Features from Wiebe, Bruce, and O'Hara (1999), a machine learning system to classify subjective sentences.
  • Manual Features:
    • Verb lists from (Levin 1993; Ballmer and Brennenstuhl 1981)
    • FrameNet lemmas with the frame element "experiencer" (Baker et al. 1998)
    • Adjectives manually annotated for polarity (Hatzivassiloglou and McKeown 1997)
    • Some subjective clues listed in (Wiebe 1990)
Subjective Classifier (3/3)
  • Discourse Features:
    • We use discourse features to capture the density of clues in the text surrounding a sentence.
    • First, we compute the average number of subjective clues and objective clues per sentence.
    • Next, we characterize the number of subjective and objective clues in the previous and next sentences as higher-than-expected (high), lower-than-expected (low), or expected (medium).
  • We also define a feature for sentence length.
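The density binning can be sketched as below. The slides do not give the exact thresholds for "higher/lower than expected", so a simple comparison against the corpus average is assumed, and both function names are illustrative.

```python
def expected_clues_per_sentence(clue_counts):
    # Average number of clues per sentence, computed over the whole corpus.
    return sum(clue_counts) / len(clue_counts)

def density_feature(clue_count, expected):
    """Bin a neighboring sentence's clue count against the corpus average:
    above average -> "high", below -> "low", equal -> "medium"."""
    if clue_count > expected:
        return "high"
    if clue_count < expected:
        return "low"
    return "medium"
```

The same binning would be applied separately to subjective and objective clue counts in the previous and next sentences, yielding one discourse feature per combination.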
Classification Result (1/3)
  • We evaluate each classifier using 25-fold cross validation on the experiment corpus and use a paired t-test to measure significance at the 95% confidence level.
  • We compute Accuracy (Acc) as the percentage of sentences that match the gold standard, and Precision (Prec) and Recall (Rec) with respect to subjective sentences.
    • Gold standard: a sentence is subjective if it contains at least one private-state expression of medium or higher strength.
    • The objective class consists of everything else.
Classification Result (2/3)
  • We train a Naive Bayes classifier using only the SubjNoun features. This classifier achieves good precision (77%) but only moderate recall (64%).
  • We find that the subjective nouns are good indicators when they appear, but not every subjective sentence contains a subjective noun.
Classification Result (3/3)
  • There is a synergy between these feature sets: using both types of features achieves better performance than either one alone.
  • In Table 8, Row 1, we use the WBO + SubjNoun + manual + discourse features. This classifier achieves 81.3% precision, 77.4% recall, and 76.1% accuracy.
Conclusion
  • We demonstrate that weakly supervised bootstrapping techniques can learn subjective terms from unannotated texts.
  • Bootstrapping algorithms can learn not only general semantic categories, but any category for which words appear in similar linguistic phrases.
  • The experiments suggest that reliable subjectivity classification requires a broad array of features.