Identifying Comparative Sentences in Text Documents

Nitin Jindal and Bing Liu

University of Illinois

SIGIR 2006


Introduction

  • Comparison is one of the most convincing forms of evaluation.

  • Much of this information is available on the Web, in customer reviews, forum discussions, and blogs.

  • Useful for product manufacturers and potential customers (to make purchasing decisions).


Comparisons vs. Opinions

  • Comparisons can be either objective or subjective.

  • Comparative sentences have different language constructs from typical opinion sentences.

  • Comparative sentences may contain some indicators.

    Car X is much better than Car Y

    Car X is two feet longer than Car Y


Related Work

  • Linguistics: based on grammars (syntax and semantics) and logic (gradability), which is more for human consumption than for automatic identification.

  • Opinion tasks: opinion extraction and classification, which is a quite different problem from identifying comparisons.


Comparatives (Linguistic)

  • Comparatives are used to express explicit orderings between objects with respect to the degree or amount to which they possess some gradable property.

    John is taller than he was

    =>

    John is tall to degree d


Comparatives (Linguistic)

  • Two broad types:

    • Metalinguistic Comparatives: compare properties of one entity.

      Ronaldo is angrier than upset.

    • Propositional Comparatives: compare between two propositions. Three subcategories:


Comparatives (Propositional)

  • Nominal Comparatives: (two sets of entities)

    Paul ate more grapes than bananas.

  • Adjectival Comparatives: (than, as good as)

    Ford is cheaper than Volvo.

  • Adverbial Comparatives: (occur after a verb phrase)

    Tom ate more quickly than Jane.


Superlatives

  • Adjectival Superlatives:

    John is the tallest person.

  • Adverbial Superlatives:

    Jill did her homework most frequently.

  • Equality: conjunctions like and, or, …

    John and Sue both like sushi.


POS involved

  • NN: Noun

  • NNP: Proper Noun

  • VBZ: Verb, present tense, 3rd person singular

  • JJ: Adjective

  • RB: Adverb

  • JJR: Adjective, comparative

  • JJS: Adjective, superlative

  • RBR: Adverb, comparative

  • RBS: Adverb, superlative


Limitations of Linguistic Classification

  • Non-comparatives with comparative words: many non-comparatives contain comparative words.

    In the context of speed, faster means better.

    John has to try his best to win this game.

  • Limited coverage: many comparatives contain no comparative words.

    In market capital, Intel is way ahead of AMD.

    Nokia, Samsung, both cell phones perform badly on heat dissipation index.

    The M7500 earned a World bench score of 85, whereas Asus A3V posted a mark of 89.


Enhancements

  • First limitation: machine learning methods to distinguish comparatives and non-comparatives.

  • Second limitation:

    • User preferences:

      I prefer Intel to AMD = Intel is better than AMD

    • Implicit comparatives:

      Camera X has 2 MP, whereas camera Y has 5 MP.


Types of Comparatives

  • Non-Equal Gradable: greater or less than type, including user preferences.

  • Equative (Gradable): equal to type

  • Superlative (Gradable): greater or less than all others type

  • Non-Gradable:

    • A is similar to B; A has feature F1 while B has F2; A has feature F but B doesn’t


Tasks

  • Identifying comparative sentences from a given text data set.

  • Extracting comparative relations from sentences. (Mining comparative sentences and relations, AAAI 2006)


Class Sequential Rules with Multiple Minimum Supports

  • A class sequential rule (CSR) has a sequential pattern on the left and a class label on the right.

  • Keywords used to select patterns: POS tags (JJR, RBR, JJS, RBS), words (favor, prefer, win, beat, but…), and phrases (number one, up against).

  • Using keywords alone gives precision P = 32% and recall R = 94%.
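The keyword filter above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the word list is only the partial list shown on the slide, and the input is assumed to be already POS-tagged in `word/TAG` form.

```python
# Sketch of a keyword-based candidate filter (names are hypothetical).
COMPARATIVE_TAGS = {"JJR", "RBR", "JJS", "RBS"}
KEYWORD_WORDS = {"favor", "prefer", "win", "beat", "but"}  # partial list from the slide
KEYWORD_PHRASES = [("number", "one"), ("up", "against")]


def parse_tagged(sentence):
    """Split 'word/TAG word/TAG ...' into lowercase (word, tag) pairs."""
    pairs = []
    for token in sentence.split():
        word, _, tag = token.rpartition("/")
        pairs.append((word.lower(), tag))
    return pairs


def has_keyword(sentence):
    """Return True if the tagged sentence contains any comparative keyword."""
    pairs = parse_tagged(sentence)
    words = [w for w, _ in pairs]
    # 1) comparative or superlative POS tag
    if any(tag in COMPARATIVE_TAGS for _, tag in pairs):
        return True
    # 2) single keyword
    if any(w in KEYWORD_WORDS for w in words):
        return True
    # 3) multi-word keyword phrase
    for phrase in KEYWORD_PHRASES:
        n = len(phrase)
        for i in range(len(words) - n + 1):
            if tuple(words[i:i + n]) == phrase:
                return True
    return False
```

A filter like this fires on every sentence containing a word such as "but", which is consistent with the high recall and low precision reported for keywords alone.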


Support and Confidence

  • Using the minimum support of 20% and minimum confidence of 40%, one of the discovered CSRs is:


Building the Sequence DB

this/DT camera/NN has/VBZ significantly/RB more/JJR noise/NN at/IN iso/NN 100/CD than/IN the/DT nikon/NN 4500/CD

{NN}{VBZ}{RB}{moreJJR}{NN}{IN}{NN} -> comparative

  • Sequences that exceed the 60% confidence threshold become rules. Minimum support = 10%.

  • 13 manual rules cover conjunctions such as whereas/IN, but/CC, however/RB, while/IN, though/IN, although/IN, etc.
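The sequence construction and the rule thresholds can be sketched as below. Several points are assumptions rather than details stated on the slides: the window of 3 tokens on each side of the keyword is inferred from the example sequence, only POS-tag keywords are detected, each sequence element is stored as a single string, and a single minimum support is used instead of multiple minimum supports. All function names are hypothetical.

```python
# Sketch: build a sequence around a keyword and score a candidate rule.
COMPARATIVE_TAGS = {"JJR", "RBR", "JJS", "RBS"}


def build_sequence(tagged_sentence, radius=3):
    """Keep POS tags within `radius` tokens of the first keyword.

    The keyword itself is kept as word+tag (e.g. 'moreJJR').
    Returns [] when the sentence has no POS-tag keyword.
    """
    pairs = []
    for token in tagged_sentence.split():
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))
    for i, (_, tag) in enumerate(pairs):
        if tag in COMPARATIVE_TAGS:
            lo = max(0, i - radius)
            hi = min(len(pairs), i + radius + 1)
            return [w + t if j == i else t
                    for j, (w, t) in enumerate(pairs[lo:hi], start=lo)]
    return []


def contains(sequence, pattern):
    """True if `pattern` occurs in `sequence` as an ordered subsequence."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def support_and_confidence(database, pattern, label):
    """Score the rule `pattern -> label` over (sequence, class) pairs.

    support    = share of all sequences that contain the pattern and have `label`
    confidence = share of pattern-containing sequences that have `label`
    """
    matched = [cls for seq, cls in database if contains(seq, pattern)]
    hits = sum(1 for cls in matched if cls == label)
    support = hits / len(database) if database else 0.0
    confidence = hits / len(matched) if matched else 0.0
    return support, confidence
```

Under these assumptions, a pattern would be kept as a rule when its support is at least 0.10 and its confidence exceeds 0.60, matching the thresholds on the slide.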


Classification Learning

  • Machine learning methods:

    Feature Set =

    {X | X is the sequential pattern in CSR X → y} ∪

    {Z | Z is the pattern in a manual rule Z → y}
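One way to read this feature set is as a binary vector with one entry per rule pattern, set to 1 when the sentence's sequence contains that pattern. The sketch below assumes this encoding; the slide does not specify it, and the function names and example patterns are hypothetical.

```python
# Sketch: turn a sentence's sequence into binary features, one per rule pattern.
def contains(sequence, pattern):
    """True if `pattern` occurs in `sequence` as an ordered subsequence."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def to_features(sequence, rule_patterns):
    """Return a 0/1 list: entry k is 1 if rule pattern k matches the sequence."""
    return [1 if contains(sequence, p) else 0 for p in rule_patterns]


# Hypothetical left-hand sides of CSRs and manual rules, for illustration only.
example_patterns = [
    ["moreJJR", "IN"],   # stands in for a mined CSR pattern
    ["whereasIN"],       # stands in for a manual conjunction rule
    ["NN", "VBZ"],
]
example_vector = to_features(
    ["NN", "VBZ", "RB", "moreJJR", "NN", "IN", "NN"], example_patterns
)
```

Vectors of this form could then be passed to any standard classifier.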


Data Preparation

  • Consumer reviews on products such as digital cameras, DVD players, MP3 players and cellular phones.

  • Forum discussions on topics such as Intel vs. AMD, Coke vs. Pepsi, and Microsoft vs. Google.

  • News articles on topics such as automobiles, iPods, and soccer vs. football.




Experimental Results (2)

  • Reviews: low recall, high precision. Sentences are short, so patterns are hard to find.

  • Articles and forums: high recall, low precision. Sentences are long, so patterns match too easily or too many patterns are found.


Conclusion and Future Work

  • Identifying comparative sentences.

  • Analyzing different types of comparative sentences.

  • Studying how to automatically classify subjective and objective comparisons.

