
Aspect and Sentiment Unification Model for Online Review Analysis



  1. Aspect and Sentiment Unification Model for Online Review Analysis 2013.8.6 Shared by 何博伟

  2. Aspect and Sentiment Unification Model for Online Review Analysis • Research problem • Terminology / background • Related work • Aspect discovery • Domain adaptation of sentiment words • Unified models of topic and sentiment • The authors' models • Sentence-LDA • Aspect and Sentiment Unification Model • Experiments and results • Conclusion

  3. What the paper does • In this paper, we tackle the problem of automatically discovering what aspects are evaluated in reviews and how sentiments for different aspects are expressed. • That is: automatically discover which aspects an online review evaluates, and how the sentiment toward each of those aspects is expressed.

  4. The problems to solve • Most online reviews are unstructured free text (not a filled-in form), so the useful information has to be mined out of it. • Which aspect of the product is the user evaluating? • How are opinions and sentiments about the different aspects expressed? A user looking to buy a digital camera may want to know what a review says about the photo quality, brightness of lens, and shutter speed of a Panasonic Lumix, not just whether the review recommends the camera. Examples: "The phone's standby time is long." "The laptop screen is glary." "The restaurant's service is attentive and thoughtful."

  5. Terminology • Topic: a multinomial distribution over words that represents a coherent concept in text. • Aspect: a multinomial distribution over words that represents a more specific topic in reviews, for example, "lens" in camera reviews. • Senti-aspect: a multinomial distribution over words that represents a pair of an aspect and a sentiment, for example, "screen, positive" in a laptop review.
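To make the data structure concrete: each of these objects is just a probability distribution over the vocabulary. A toy illustration in Python (the words and probabilities are invented, not taken from the paper):

  # A senti-aspect such as ("screen", positive) is a multinomial distribution over words.
  # These words and probabilities are invented purely for illustration.
  screen_positive = {"screen": 0.18, "bright": 0.12, "sharp": 0.10, "display": 0.09, "colors": 0.07}
  # ... the remaining probability mass is spread over the rest of the vocabulary,
  # so that the full distribution sums to 1.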

  6. Related work: Aspect discovery • One line of work extracts frequently occurring noun phrases as candidate aspects and then applies filtering heuristics to keep the relevant ones. Drawback: the pipeline has many stages and is error-prone. • Another line of work applies topic models such as LDA (latent Dirichlet allocation). Drawback: sentence boundaries are ignored, so the fact that the same aspect may be expressed with very different vocabulary in different sentences is lost; the granularity of the topics is also an issue. Our models have a simpler and more intuitive generative process to discover evaluative aspects in reviews.

  7. Background: Topic Models • A topic model is a statistical model for discovering the abstract "topics" that occur in a collection of documents. • Intuitively, given a document about a particular topic, we expect certain words to appear in it more often: "dog" and "bone" appear more frequently in documents about dogs, "cat" and "meow" appear more frequently in documents about cats, while function words such as "the" and "is" appear with roughly the same frequency in both. • A document is usually a mixture of several topics, each accounting for some proportion. For example, an article might be 90% about dogs and 10% about cats, in which case dog-related words would appear about nine times as often as cat-related words.

  8. Background: Dirichlet Allocation • A document can be described by a distribution over topics: what fraction of it belongs to this topic, what fraction to that one. Different documents have different topic proportions, though some may be the same or similar. Looking across all documents and asking how these topic distributions are themselves distributed gives a distribution over distributions, a probability of probabilities; this is what the Dirichlet distribution models.
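A minimal sketch of this "distribution over distributions" idea using numpy's Dirichlet sampler; the number of topics, the concentration parameters, and the number of documents are arbitrary illustrative choices:

  import numpy as np

  # Each draw from a Dirichlet is itself a probability distribution over topics.
  alpha = [0.5, 0.5, 0.5]                               # symmetric concentration for 3 topics (illustrative)
  topic_mixtures = np.random.dirichlet(alpha, size=5)   # one topic distribution per "document"
  for i, theta in enumerate(topic_mixtures):
      print(f"document {i}: topic proportions = {np.round(theta, 2)}")  # each row sums to 1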

  9. Background: Latent Dirichlet Allocation • "Latent" means that the Dirichlet-distributed topic proportions above are not given explicitly; they are hidden and must be inferred. Using this idea, LDA describes how the words of a document are "generated". • In other words, the probabilistic model "writes" a document:
Choose parameter θ ~ p(θ); // draw the document's topic distribution from the Dirichlet prior
For each of the N words w_n: // the document has N words; for each of them:
  Choose a topic z_n ~ p(z|θ); // draw a topic from the topic distribution θ
  Choose a word w_n ~ p(w|z); // draw a word from that topic's word distribution
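The same generative story as a runnable toy sketch; the two topics, their word lists, and the hyperparameters are invented for illustration and are not from the paper:

  import numpy as np

  rng = np.random.default_rng(0)

  # Toy word distributions for two topics (invented for illustration).
  topics = [
      {"dog": 0.5, "bone": 0.3, "bark": 0.2},   # topic 0: dogs
      {"cat": 0.5, "meow": 0.3, "purr": 0.2},   # topic 1: cats
  ]

  def generate_document(n_words, alpha=(1.0, 1.0)):
      theta = rng.dirichlet(alpha)                  # choose a topic distribution θ ~ Dirichlet(α)
      words = []
      for _ in range(n_words):
          z = rng.choice(len(topics), p=theta)      # choose a topic z_n from θ
          vocab, probs = list(topics[z]), list(topics[z].values())
          words.append(str(rng.choice(vocab, p=probs)))   # choose a word w_n from topic z's word distribution
      return words

  print(generate_document(10))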

  10. Related work: Domain adaptation of sentiment words • Domain-to-domain adaptation aims to obtain sentiment words in one domain by utilizing a set of known sentiment words in another domain. • General-to-domain adaptation takes a set of known general sentiment words and learns domain-specific sentiment words. ASUM starts from a set of general sentiment seed words and finds the sentiment words that are associated with each specific aspect.

  11. Related work: Unified models of topic & sentiment • Topic Sentiment Mixture (TSM): TSM represents sentiment as a language model separate from the aspects; each word comes either from a topic or from a sentiment. Such a separation cannot capture the close association between a topic and a sentiment. In ASUM, by contrast, a <topic, sentiment> pair forms a single language-model unit, so a word is more naturally associated with a topic and a sentiment at the same time.

  12. Related work: Unified models of topic & sentiment • Multi-Aspect Sentiment (MAS): What distinguishes MAS from the other models is that it focuses on modeling topics that correspond to a set of predefined aspects, aspects that users have explicitly rated in their reviews. The model therefore assumes the product attributes are known in advance (for a camera: lens, flash, screen, ...). ASUM does not need any aspect-rating training data, which is usually expensive to obtain.

  13. Related work: Unified models of topic & sentiment • Joint Sentiment/Topic (JST): JST is the model most similar to the ones in this paper; sentiment is integrated with a topic into one language-model unit. The difference from ASUM is that JST does not constrain individual words, so the words of a sentence may come from different language models. ASUM, in contrast, constrains all words in a single sentence to come from the same language model, so each inferred language model focuses more on word co-occurrences within a local region of a document.

  14. Modeling: Sentence-LDA • In LDA, the position of each word is ignored, which is not appropriate here: "Words about an aspect tend to co-occur within close proximity to one another." The basic constraint of SLDA: all words in a single sentence come from one topic. This is not always true, but in practice the constraint works well. Our goal is to discover topics that match the aspects discussed in reviews.
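A toy sketch of SLDA's generative process under that constraint; the topics, word lists, and sizes are invented for illustration:

  import numpy as np

  rng = np.random.default_rng(1)

  # Toy per-aspect word distributions (invented for illustration).
  topics = [
      {"lens": 0.4, "zoom": 0.3, "focus": 0.3},
      {"battery": 0.5, "charge": 0.3, "life": 0.2},
  ]

  def generate_review_slda(n_sentences=3, words_per_sentence=4, alpha=(1.0, 1.0)):
      theta = rng.dirichlet(alpha)                  # per-review topic (aspect) distribution
      review = []
      for _ in range(n_sentences):
          z = rng.choice(len(topics), p=theta)      # ONE topic per sentence: the SLDA constraint
          vocab, probs = list(topics[z]), list(topics[z].values())
          review.append([str(rng.choice(vocab, p=probs)) for _ in range(words_per_sentence)])
      return review

  print(generate_review_slda())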

  15. Modeling: Sentence-LDA

  16. Modeling: Aspect and Sentiment Unification Model • ASUM is an extension of SLDA that incorporates both aspect and sentiment. The generative process of ASUM can be pictured as follows: a user decides to write a review of a restaurant with a sentiment distribution of 70% satisfied and 30% unsatisfied. For the positive part, the user decides to devote 50% to the service, 25% to how the food tastes, and the remaining 25% to the price. Then, in each sentence, the user expresses two things: what is being evaluated and how it feels (for example, that the service was friendly).
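The restaurant story above, written as a toy generative sketch; the sentiments, aspects, word distributions, and hyperparameters are invented for illustration and do not come from the paper:

  import numpy as np

  rng = np.random.default_rng(2)

  sentiments = ["positive", "negative"]
  aspects = ["service", "food", "price"]

  # One toy word distribution per (sentiment, aspect) pair, i.e. per senti-aspect.
  phi = {
      ("positive", "service"): {"friendly": 0.5, "helpful": 0.3, "staff": 0.2},
      ("positive", "food"):    {"delicious": 0.5, "fresh": 0.3, "tasty": 0.2},
      ("positive", "price"):   {"cheap": 0.5, "worth": 0.3, "value": 0.2},
      ("negative", "service"): {"rude": 0.5, "slow": 0.3, "staff": 0.2},
      ("negative", "food"):    {"bland": 0.5, "cold": 0.3, "stale": 0.2},
      ("negative", "price"):   {"overpriced": 0.5, "expensive": 0.3, "pricey": 0.2},
  }

  def generate_review(n_sentences=5, words_per_sentence=3, gamma=(1.0, 1.0), alpha=1.0):
      pi = rng.dirichlet(gamma)                                               # per-review sentiment distribution (e.g. 70% / 30%)
      theta = {s: rng.dirichlet([alpha] * len(aspects)) for s in sentiments}  # per-sentiment aspect distribution
      review = []
      for _ in range(n_sentences):
          s = sentiments[rng.choice(len(sentiments), p=pi)]                   # pick the sentence's sentiment
          a = aspects[rng.choice(len(aspects), p=theta[s])]                   # pick an aspect given that sentiment
          dist = phi[(s, a)]
          words = [str(rng.choice(list(dist), p=list(dist.values())))
                   for _ in range(words_per_sentence)]                        # every word in the sentence comes from the same senti-aspect
          review.append((s, a, words))
      return review

  for s, a, words in generate_review():
      print(s, a, words)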

  17. Modeling: Aspect and Sentiment Unification Model

  18. Modeling: Aspect and Sentiment Unification Model • The approximate probability of sentiment j in review d • The approximate probability of aspect k for sentiment j in review d • The approximate probability of word w in senti-aspect {k, j}
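The formulas themselves are not reproduced in this transcript. With Dirichlet priors γ (over sentiments), α (over aspects), and β (over words), estimates of this kind usually take the standard posterior-mean form shown below; the exact notation is an assumption and may differ in detail from the paper:

\[
\pi_{dj} \approx \frac{N_{dj} + \gamma_j}{N_d + \sum_{j'} \gamma_{j'}}, \qquad
\theta_{djk} \approx \frac{N_{djk} + \alpha}{N_{dj} + K\alpha}, \qquad
\phi_{jkw} \approx \frac{N_{jkw} + \beta_{jw}}{N_{jk} + \sum_{w'} \beta_{jw'}}
\]

where \(N_{dj}\) counts the sentences of review \(d\) assigned sentiment \(j\), \(N_{djk}\) those additionally assigned aspect \(k\), \(N_{jkw}\) the occurrences of word \(w\) in sentences assigned senti-aspect \((j,k)\), and \(K\) is the number of aspects.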

  19. Experiments and Results • The experiments consist of four parts: • discovering aspects with SLDA • discovering senti-aspects with ASUM • extracting aspect-specific sentiment words with ASUM • evaluating ASUM's sentiment classification performance • Data sets: • Amazon reviews of electronic products (ELECTRONICS below) • Yelp restaurant reviews (RESTAURANTS below)

  20. Experiments and Results: Aspect Discovery • The first experiment is to automatically discover aspects in reviews using SLDA. Three evaluation criteria: the discovered aspects should be coherent; they should be specific enough to capture the details users evaluate; and they should be the aspects most discussed in the reviews.

  21. Experiments and Results: Aspect Discovery • Discovered aspects regarding cameras for SLDA and LDA. Accordingly, the aspects discovered by SLDA tend to account for the local positions of the words, which is an appropriate property for our goal. In contrast, LDA has a broader view in which an aspect can be composed of any words in a review, regardless of intra-sentential word co-occurrences.

  22. Experiments and Results: Senti-Aspect Discovery • Our second experiment is to discover senti-aspects, aspects coupled with a sentiment (positive or negative), using ASUM together with its sentiment seed words.

  23. Experiments and Results: Senti-Aspect Discovery

  24. Experiments and Results: Aspect-Specific Sentiment Words • “We introduce a simple method for employing the result of ASUM to automatically distinguish between positive and negative sentiment words for the same aspect. This increases the utility of ASUM by providing an organized result that shows why people express sentiment toward an aspect and what words they use.” • Compute the cosine similarity between every pair of senti-aspects with opposite sentiments. • If the similarity is above a threshold, the two senti-aspects are regarded as describing the same aspect and are paired up. • If a word has a high probability in both senti-aspects of such a pair, it is a common word for that aspect (it carries no sentiment). • If a word has a high probability in only one senti-aspect of the pair, it is an aspect-specific sentiment word (it carries sentiment).
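A minimal sketch of this matching-and-splitting procedure; the similarity threshold, the probability cutoff, and the toy distributions are assumptions for illustration rather than the paper's settings:

  import numpy as np

  def cosine(p, q, vocab):
      # Cosine similarity between two word distributions over a shared vocabulary.
      u = np.array([p.get(w, 0.0) for w in vocab])
      v = np.array([q.get(w, 0.0) for w in vocab])
      return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

  # Toy senti-aspect word distributions for the "screen" aspect (invented for illustration).
  pos = {"screen": 0.3, "display": 0.2, "bright": 0.25, "sharp": 0.25}
  neg = {"screen": 0.3, "display": 0.2, "dim": 0.25, "glare": 0.25}
  vocab = sorted(set(pos) | set(neg))

  if cosine(pos, neg, vocab) > 0.3:                      # above a threshold: same aspect, opposite sentiments
      for w in vocab:
          p, q = pos.get(w, 0.0), neg.get(w, 0.0)
          if p > 0.1 and q > 0.1:
              print(w, "-> common aspect word")          # high in both: carries no sentiment
          elif p > 0.1:
              print(w, "-> positive sentiment word")     # high only in the positive senti-aspect
          elif q > 0.1:
              print(w, "-> negative sentiment word")     # high only in the negative senti-aspect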

  25. Experiments and Results: Aspect-Specific Sentiment Words

  26. Experiments and Results: Sentiment Classification • To determine the sentiment of a review (positive or negative), we use π, the probabilistic sentiment distribution of the review: a review is labeled positive if the positive sentiment has an equal or higher probability than the negative sentiment, and negative otherwise.
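The decision rule, written out as a tiny sketch (the variable name pi is assumed):

  def classify_review(pi):
      # pi maps each sentiment to its probability in the review, e.g. {"positive": 0.6, "negative": 0.4}.
      return "positive" if pi["positive"] >= pi["negative"] else "negative"

  print(classify_review({"positive": 0.6, "negative": 0.4}))   # -> positive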

  27. Experiments and Results: Sentiment Classification

  28. Conclusion • In the quantitative evaluation of sentiment classification, ASUM outperformed other generative models and came close to supervised classification methods. “For future work, our models may be utilized for aspect-based review summarization. We can apply the models to other types of data such as editorials and art critiques, or use different seed words to capture different dimensions than sentiment.”
