1 / 27

Date: 09/01/09 Speaker: Hsu, Yu-Wen Advisor: Dr. Koh, Jia-Ling

Entity Discovery and Assignment for Opinion Mining Applications (ACM KDD 09’) Xiaowen Ding, Bing Liu, Lei Zhang. Date: 09/01/09 Speaker: Hsu, Yu-Wen Advisor: Dr. Koh, Jia-Ling. Outline. Introduction Problem Define Entity Discovery Entity Assignment Opinion Mining Empirical Evaluation

meir
Download Presentation

Date: 09/01/09 Speaker: Hsu, Yu-Wen Advisor: Dr. Koh, Jia-Ling

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Entity Discovery and Assignment for Opinion Mining Applications(ACM KDD 09’) Xiaowen Ding, Bing Liu, Lei Zhang Date: 09/01/09 Speaker: Hsu, Yu-Wen Advisor: Dr. Koh, Jia-Ling

  2. Outline • Introduction • Problem Define • Entity Discovery • Entity Assignment • Opinion Mining • Empirical Evaluation • Conclusion & Future Work

  3. Introduction • Most opinion mining researches are based on product reviews because a review usually focuses on a specificproduct or entity and contains little irrelevant information. • However, in forum discussions and blogs, the situation is verydifferent, where the authors often talk about multiple entities (e.g.,products), and compare them.

  4. Introduction*This raises two important issues: • (1) how to discover the entities that are talked about in a sentence • the named entity recognition (NER) problem • Over-capitalization • Under-capitalization

  5. (2) how to assign entities to each sentence because in many sentences entity names are not explicitly mentioned, but are implied. • similar to pronoun resolution in NLP • harder due to ungrammatical sentences, and missing or wrong punctuations.

  6. Example 1: “(1) I bought Camera-A yesterday. (2) I took some pictures in the evening in my living room. (3) The images are very clear. (4) They are definitely better than those from my old Camera-B. (5) The battery is very good too.” • Example 2: “(1) (2) (3) (4) (5) The pictures of that camera were blurring for night shots, but for day shots it was ok”

  7. sentiment consistency : which says that consecutive sentiment expressions should be consistent with each other. • It would be ambiguous if this consistency is not observed in writing.

  8. Opinion Mining: • Two tasks are necessary: • (1) for a comparative sentence, we need to identify which entity is superior • (2) for the subsequent sentence, we need to determine whether its first clause (sentence 5 of Example 2) is positive, negative, or neutral.

  9. Problem Definition • Thread: consists of a start post and a list of follow-up posts or replies. • : A thread thus can be modeled as a sequence of posts • is the start post. • : Each post consists of a sequence of sentences • : Each sentence describes something on a subset of entities

  10. Problem statement: Given a set of threads T in a particular domain, two tasks are performed in this paper: • 1. Entity discovery: discover the set of entities E discussed in the posts of the threads • 2. Entity assignment: assign the entities in E that each sentence of each post in talks about.

  11. Entity Discovery • Step 1 – data preparation for sequential pattern mining • <{JJ, mad}{NN, everyone}{NN, doesnt}{VBP, have}{DT,a}{CD, ENTITYXYZ}{NN, phone}{NN, fetish} {JJ, ducky}> • Step 2 – Sequential pattern mining • <{IN}, {DT}, {NNP, ENTITYXYZ }, {is}> • Step 3 – Pattern matching to extract candidate entities • a/DT Nokia/NNP 7390/CD at/IN • <{DT}, {NNP, ENTITYXYZ}, {CD}>  Nokia • <{DT}, {NNP}, {CD, ENTITYXYZ}, {IN}> 7390

  12. Step 4 – Candidate pruning • … with/IN all/PDT the/DT Sony/NNP Ericsson/NNP walkman/NN phone/NN accessories/CDNNS (pruning) • Step 5 – Pruning using brand and model relation and syntactic patterns • discover relationships from the entities • Nokia, 7390  Nokia: brand 7390: Model • remove those entities discover in step 4 that never appear together with a <Band> or a <Model>, or never appear with a candidate in the syntactic patterns.

  13. Entity Assignment* Comparatives and Superlatives • Comparative Sentences • Non-equal gradable: “greater or less” • Equative: “equal to” • Non-gradable: compare two or more entities • Superlative Sentences: –est

  14. Entity Assignment *Sentiment Consistency • If he/she wants to introduce a new entity e, he/she has to state the name of the entity explicitly in a sentence , which can be • (1) a normal, • : normal , : normal  e • : normal , : comparative  e & new entity

  15. (2) is a comparative • : normal • non-equal gradable :positive (respectively negative) sentiment the superior (or inferior) entity • equative  the previous entity before . • non-gradable  the previous entity before . • : comparative  the entities in • (3) is a superlative sentence. • : normal  the superlative entity in • : comparative  the entities in

  16. Opinion Mining • Opinion Indicators • Opinion words and phrases • opinion lexicon • orientations depend on contexts • Negations • “not” without “not only…but also” • But-clauses • The orientation before “but” is opposite to that after “but”. • not contain “but also”

  17. *Specification for Opinion Indicators • we propose a specification language to enable the user to specify indicators, which are • (1)opinion words and phrases, • (2) negation words and phrases, • (3)but-like words and phrases, • (4) non-opinion phrases involving sentiment words, • a good deal of • (5) non-negation phrases involving negation words, • not only • (6) non-but phrases involving but-like words. • but also

  18. *Two Type of Specification • Specification of Individual Words • ex: like [VB] => Po

  19. Specification for Phrases “great => Po” “a great[T] + deal + of => NEU”

  20. Opinion Mining • Step 1 – Part-of-speech tagging • Step 2 – Applying indicator word rules • The picture quality is not[Ng]good[Po], reaction is too slow[Neu],but[But]the battery life is long[Neu]. • Step 3 - Applying phrase rules • The picture quality is not[Ng] good[Po], reaction is too slow[NE], but[But] the battery life is long[Neu].

  21. Step 4 - Handling negations • The picture quality is not[Ng] good[Negative], reaction is too slow[NE], but[But] the battery life is long[Neu]. • Step 5 - Aggregating opinions • Opinion aggregation : • postive:1 negative: -1 • sum up >0:postive, =0: neutral, <0: nagative

  22. Opinion Mining of Comparisons • more/most + Pos → Positive • more/most + Neg → Negative • less/least + Pos → Negative • less/least + Neg → Positive • Non-standard words • “In term of battery life, Camera-X is superior to Camera-Y” depend on the meaning • Identify comparative and superlative sentences • Discover superior entities

  23. Empirical Evluation

  24. Experimental Results • Entity Discovery NET: Named Entity Tagger CRF: Conditional Random Fields Method

  25. Entity Assignment

  26. Conclusion • This paper presented two problem: mining entities discussed in a set of posts and assigning entities to each sentence. • Our experimental results show that the proposed techniques are effective.

More Related