
Generating High-Coverage Semantic Orientation Lexicons From Overtly Marked Words and a Thesaurus

† Institute for Advanced Computer Studies and CLIP lab; ‡ Human-Computer Interaction Lab; Department of Computer Science, University of Maryland. * Human Language Technology Center of Excellence.


Presentation Transcript


  1. Generating High-Coverage Semantic Orientation Lexicons From Overtly Marked Words and a Thesaurus • Saif Mohammad†, Cody Dunne‡, and Bonnie Dorr†∗ • †Institute for Advanced Computer Studies and CLIP lab • ‡Human-Computer Interaction Lab • Department of Computer Science, University of Maryland • ∗Human Language Technology Center of Excellence

  2. Evaluative sentences Sony’s new digital camera is fabulous. The characters in the movie are flawed. Creative solutions are valued. Singapore has an immaculate transportation system. Our waters have never been more contaminated. Generating Semantic Orientation Lexicons. Mohammad, Dunne, Dorr.

  3. Evaluative sentences Sony’s new digital camera is fabulous. The characters in the movie are flawed. Creative solutions are valued. Singapore has an immaculate transportation system. Our waters have never been more contaminated.

  4. Semantic orientation • Positive semantic orientation (SO) (or polarity) • Term is often used to convey favorable sentiment or evaluation of the target. • E.g.: excellent, happy, honest, … • Negative semantic orientation • Term is often used to convey unfavorable sentiment or evaluation of the target. • E.g.: poor, sad, dishonest, …

  5. Applications • Automatic product recommendation systems (Tatemura, 2000; Terveen et al., 1997) • Question answering (Somasundaran et al., 2007; Lita et al., 2005) • Summarizing multiple viewpoints and opinions (Seki et al., 2004; Mohammad et al., 2008a) • Identifying flames (Spertus, 1997) • Appropriate ad placement (Jin et al., 2007)

  6. Manually created lexicons • General Inquirer (GI) (Stone et al., 1966) • http://www.wjh.harvard.edu/inquirer • has labels for only about 3,600 entries • Pittsburgh subjectivity lexicon (PSL) (Wilson et al., 2005) • http://www.cs.pitt.edu/mpqa • draws from the General Inquirer and other sources • has labels for only about 8,000 words

  7. Automatically created lexicons • Hatzivassiloglou and McKeown (1997) • A supervised algorithm to determine the semantic orientation of adjectives • Turney and Littman (2003) lexicon (TLL) • Exploits the tendency of a word to co-occur with a seed set • Needs very large corpora (100 billion words) • Esuli and Sebastiani (2006): SentiWordNet (SWN) • Attaches labels to WordNet synsets • Uses supervised classifiers • Needs significant manual annotation

  8. Semantic oppositeness scale • Many antonym pairs have opposite semantic orientation (one positive, one negative). • E.g.: good–bad, beautiful–ugly, honest–dishonest • The scale runs from not antonymous (e.g., big–large) to antonymous (e.g., big–small).

  9. Detecting word-pair antonymy: Mohammad, Dorr, Hirst (2008) • Use affix patterns to identify seed pairs of strong antonyms. • Use a Roget-like thesaurus to identify near-synonyms of seed words. • Mark pairs of words near-synonymous to seed pairs as contrasting. • The degree of antonymy is proportional to their tendency to co-occur. • Created a list of more than 3 million strongly antonymous word pairs.

  10. Our approach • Identify a seed set of positive and negative words: • From edicts of marking theory • Identify their synonyms: • Use a Roget-like thesaurus • Mark as negative: • words synonymous with a negative seed • Mark as positive: • words synonymous with a positive seed

  11. Step 1: Identify seed words • From marking theory: • Overtly marked words tend to be negative. • E.g., undo, unhappy, dishonest, immobile • Their unmarked counterparts tend to be positive. • E.g., do, happy, honest, mobile • Exceptions exist: • impartial–partial, unbiased–biased, unstuck–stuck

  12. Affix patterns (table not captured in the transcript)
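Step 1 can be illustrated with a minimal sketch. It assumes only the three prefixes visible in the slide 11 examples (un-, dis-, im-); the authors' full affix-pattern table is not in this transcript, and the function name `find_seed_pairs` and the toy vocabulary are hypothetical.

```python
# Assumption: only the prefixes seen in the slide's examples (undo, unhappy,
# dishonest, immobile). The real affix-pattern table is not reproduced here.
NEGATIVE_PREFIXES = ("un", "dis", "im")


def find_seed_pairs(vocabulary):
    """Return (unmarked, overtly_marked) pairs where both forms are in the vocabulary.

    Following marking theory as stated on the slide, the unmarked word is
    treated as the positive seed and the overtly marked word as the negative seed.
    """
    vocab = set(vocabulary)
    pairs = []
    for word in sorted(vocab):
        for prefix in NEGATIVE_PREFIXES:
            marked = prefix + word
            if marked in vocab:
                pairs.append((word, marked))
    return pairs


if __name__ == "__main__":
    toy_vocab = ["do", "undo", "happy", "unhappy", "honest", "dishonest",
                 "mobile", "immobile", "camera"]
    for positive, negative in find_seed_pairs(toy_vocab):
        print(f"{positive} (positive) / {negative} (negative)")
```

A purely string-based rule like this would also pick up the exceptions listed on slide 11 (for example partial and impartial) with the wrong orientation, which is one reason the later voting step matters.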

  13. Step 2: Identify synonyms of seed words • Take synonyms from a Roget-like thesaurus • We used the Macquarie Thesaurus • Has 98,000 word types

  14. Thesaurus categories • All words are classified into about 1,000 categories, e.g.: ability, absence, accept, accompanied, action, affect, affirm, agree, allow, approach, ask, assemble, attack, attitude, awareness, be, beautiful, beings, belief, better, big, blood, body, breath, calm, care for, careful, cause, certain, change, choice, clean, clear, collect, colors, comfort, concern, conflict, connect, continue, control, convex, correct, count, courtesy, …

  15. Example category entry • 369 HONESTY • noun paragraph: honesty, incorruptness, integrity, probity, sincerity, … • adj. paragraph: honest, above board, authentic, bona fide, legit, … • noun paragraph: bona fides, reliability, soundness, trueness, trustiness, … • adj. paragraph: reliable, sound, steadfast, trustworthy, trusty, …

  16. Step 2: Identify synonyms of seed words • Words in each paragraph are near-synonyms. • 369 HONESTY • noun paragraph: honesty, incorruptness, integrity, probity, sincerity, … • adj. paragraph: honest, above board, authentic, bona fide, legit, … • noun paragraph: bona fides, reliability, soundness, trueness, trustiness, … • adj. paragraph: reliable, sound, steadfast, trustworthy, trusty, …

  17. Step 3: Mark synonyms of positive seeds as positive • 369 HONESTY • Seed pair: honest (positive), dishonest (negative) • noun paragraph: honesty, incorruptness, integrity, probity, sincerity, … • adj. paragraph (each word marked +): honest, above board, authentic, bona fide, legit, … • Seed pair: reliable (positive), unreliable (negative) • noun paragraph: bona fides, reliability, soundness, trueness, trustiness, … • adj. paragraph (each word marked +): reliable, sound, steadfast, trustworthy, trusty, …

  18. Step 4: Mark synonyms of negative seeds as negative • 370 DISHONESTY • Seed pair: honest (positive), dishonest (negative) • noun paragraph: crookedness, dishonesty, fraudulence, improbity, trickery, … • adj. paragraph (each word marked -): crooked, dishonest, knavish, shady, unjust, … • …

  19. Majority voting • All words in a paragraph are assigned identical orientation. • If multiple seeds are in the same paragraph: • simple voting determines orientation. • 369 HONESTY • noun paragraph: honesty, incorruptness, integrity, probity, sincerity, … • Seed pairs and their votes: • honesty (positive), dishonesty (negative): + • corruptness (positive), incorruptness (negative): - • probity (positive), improbity (negative): + • sincerity (positive), insincerity (negative): +

  20. Majority voting • All words in a paragraph have identical orientation. • If multiple seeds are in the same paragraph: • simple voting determines orientation. • 369 HONESTY • noun paragraph (each word marked +): honesty, incorruptness, integrity, probity, sincerity, … • Positive orientation has the majority, so all words in the paragraph are marked positive.
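The voting on slides 19 and 20 can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function name `label_paragraph` is hypothetical, and leaving a paragraph unlabeled when it has no seeds or a tied vote is an assumption, since the slides do not say how those cases are handled.

```python
from collections import Counter


def label_paragraph(paragraph, seed_orientation):
    """Assign one orientation to every word in a thesaurus paragraph.

    paragraph: list of near-synonymous words.
    seed_orientation: dict mapping seed words to "positive" or "negative".
    Returns a dict word -> orientation, or an empty dict when there are no
    seeds or the vote is tied (assumption: such paragraphs stay unlabeled).
    """
    votes = Counter(seed_orientation[w] for w in paragraph if w in seed_orientation)
    positive, negative = votes["positive"], votes["negative"]
    if positive == negative:
        return {}
    winner = "positive" if positive > negative else "negative"
    return {word: winner for word in paragraph}


if __name__ == "__main__":
    # Seeds from slide 19: unmarked members are positive, overtly marked negative.
    seeds = {
        "honesty": "positive", "dishonesty": "negative",
        "corruptness": "positive", "incorruptness": "negative",
        "probity": "positive", "improbity": "negative",
        "sincerity": "positive", "insincerity": "negative",
    }
    noun_paragraph = ["honesty", "incorruptness", "integrity", "probity", "sincerity"]
    # Three positive votes against one negative vote, so every word is positive.
    print(label_paragraph(noun_paragraph, seeds))
```

In this example the affix rule mislabels incorruptness as negative, and the three other seeds in the paragraph outvote it.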

  21. Sense and word lexicons • Macquarie Semantic Orientation Lexicon (MSOL) • Assigns orientation to word–category combinations • Categories are coarse word senses • Most natural language text is not sense disambiguated • We create word lexicons from MSOL and SentiWordNet • By choosing for each word the orientation most common amongst its senses
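Collapsing a sense lexicon into a word lexicon, as described on slide 21, might look like the sketch below. The function name, the data layout (a dict keyed by word and category), and the toy category assignments are all hypothetical; dropping words whose senses are evenly split is an assumption, as the slide does not cover ties.

```python
from collections import Counter, defaultdict


def to_word_lexicon(sense_lexicon):
    """Pick, for each word, the orientation most common amongst its senses.

    sense_lexicon: dict mapping (word, category) -> "positive" or "negative".
    Returns a dict word -> orientation. Words with an even split are omitted
    (assumption: the slide does not specify tie handling).
    """
    counts = defaultdict(Counter)
    for (word, _category), orientation in sense_lexicon.items():
        counts[word][orientation] += 1

    word_lexicon = {}
    for word, tally in counts.items():
        positive, negative = tally["positive"], tally["negative"]
        if positive != negative:
            word_lexicon[word] = "positive" if positive > negative else "negative"
    return word_lexicon


if __name__ == "__main__":
    # Toy entries; the category names and labels are illustrative only.
    toy_sense_lexicon = {
        ("sound", "HONESTY"): "positive",
        ("sound", "HEALTH"): "positive",
        ("sound", "LOUDNESS"): "negative",
        ("shady", "DISHONESTY"): "negative",
    }
    print(to_word_lexicon(toy_sense_lexicon))
```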

  22. Size of lexicons • SentiWordNet (SWN) • 56,200 entries (85.1% positive and 14.9% negative) • Affix seeds lexicon (ASL) • 5,031 entries (47.3% positive and 52.7% negative) • MSOL(ASL) • 51,157 entries (66.8% positive and 33.2% negative) • 3,643 multi-word expressions • MSOL(ASL and GI) • Uses both affix pairs and GI entries as seeds • 76,400 entries (39.9% positive and 60.1% negative) • Available for download: http://www.umiacs.umd.edu/~saif/WebPages/ResearchInterests.html#SemanticOrientation

  23. Intrinsic evaluation: The percentage of GI entries that match those of the automatically generated lexicons. (Chart; axis: F-score)
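One way to read the matching percentage on slide 23 is sketched below. This is a guess at the measure, not the authors' evaluation code: the function name is hypothetical, and counting gold entries that are missing from the generated lexicon as non-matches is an assumption.

```python
def percent_matching(gold_lexicon, generated_lexicon):
    """Percentage of gold entries whose label agrees with the generated lexicon.

    Both arguments map words to "positive" or "negative".
    Assumption: a gold word absent from the generated lexicon is a non-match.
    """
    if not gold_lexicon:
        return 0.0
    matches = sum(
        1 for word, label in gold_lexicon.items()
        if generated_lexicon.get(word) == label
    )
    return 100.0 * matches / len(gold_lexicon)


if __name__ == "__main__":
    # Toy data only; these are not GI or MSOL entries.
    gold = {"good": "positive", "bad": "negative",
            "honest": "positive", "poor": "negative"}
    generated = {"good": "positive", "bad": "negative", "honest": "negative"}
    print(percent_matching(gold, generated))
```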

  24. Extrinsic evaluation • Gold standard of phrases manually annotated with semantic orientation: • MPQA corpus (version 1.1) • positive phrases (1726) and negative phrases (4485) • A simple algorithm to determine the polarity of a phrase: • If the target phrase has a negative word, then the phrase is marked negative. • If the target phrase has no negative word and has at least one positive word, then it is marked positive. • Otherwise, the classifier refrains from assigning a tag. • Even better accuracies are possible with supervised classifiers and more sophisticated context features (Choi and Cardie, 2008).
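The three rules on slide 24 translate almost directly into code. The sketch below assumes lowercased whitespace tokenization and single-word lexicon lookups, neither of which is stated on the slide; the function name `tag_phrase` and the toy lexicon are hypothetical.

```python
def tag_phrase(phrase, lexicon):
    """Tag a phrase as "negative", "positive", or None (no tag assigned).

    lexicon: dict mapping words to "positive" or "negative".
    Assumption: tokens are lowercased and split on whitespace; multi-word
    expressions in the lexicon are not matched by this sketch.
    """
    labels = {lexicon.get(token) for token in phrase.lower().split()}
    if "negative" in labels:
        return "negative"   # rule 1: any negative word makes the phrase negative
    if "positive" in labels:
        return "positive"   # rule 2: no negative word, at least one positive word
    return None             # rule 3: refrain from assigning a tag


if __name__ == "__main__":
    toy_lexicon = {"fabulous": "positive", "immaculate": "positive",
                   "flawed": "negative", "contaminated": "negative"}
    for text in ["an immaculate transportation system",
                 "never been more contaminated",
                 "new digital camera"]:
        print(text, "->", tag_phrase(text, toy_lexicon))
```

Because the tagger abstains when it finds no lexicon word, lexicon coverage directly affects how many phrases receive a tag at all.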

  25. Extrinsic evaluation: Performance of phrase polarity tagging. No semantic-orientation labeled data used. (Chart; axis: F-score)

  26. Extrinsic evaluation: Performance of phrase polarity tagging. Using GI labels. (Chart; axis: F-score)

  27. Orientation of thesaurus categories • Red: negative; Blue: positive; Size of node: intensity; Edge: oppositeness

  28. Pollyanna Hypothesis • People use positive expressions more frequently than negative expressions (Boucher and Osgood, 1969; Kelly, 2000). (Chart: percentage of entries; 5,031 entries)

  29. Pollyanna Hypothesis • People use positive expressions more frequently than negative expressions (Boucher and Osgood, 1969; Kelly, 2000). (Chart: percentage of entries; 5,031 entries and 51,157 entries)

  30. Summary • Created a high-coverage semantic orientation lexicon: • using only affix rules and a Roget-like thesaurus. • no manually annotated semantic orientation labels required. • The lexicon: • has about twenty times the number of entries in GI. • has entries for both single words and common multi-word expressions. • is more useful in phrase-polarity annotation than SentiWordNet, GI, or the Turney lexicon.

  31. Future work • Create even better semantic orientation lexicons by combining: • our approach (affix rules and thesaurus) • with the Turney–Littman 2003 method (co-occurrence statistics). • Create orientation lexicons for resource-poor languages: • use a bilingual dictionary • use an English thesaurus • use affix rules from both (multiple) languages.

