Opinion Mining : A Multifaceted Problem . Lei Zhang University of Illinois at Chicago . Some slides are based on Prof. Bing Liu’s presentation. Introduction .
Opinion Mining : A Multifaceted Problem
University of Illinois at Chicago
Some slides are based on Prof. Bing Liu’s presentation
Computational study of opinions, sentiments expressed in text
In the past.
People can express their opinions in reviews, forum discussions, blogs…
But this problem is NOTeasy.
“I bought an iPhone a few days ago. Itwas such a nice phone. The touch screen was really cool. The voice quality was clear too. Although the battery life was not long, that is ok for me. However, my mother was mad with me as I did not tell her before I bought the phone. She also thought the phone was too expensive, and wanted me to return it to the shop. …”
Opinions, targets of opinions, and opinion holders
(ej, fjk, soijkl, hi, tl),
Each document focuses on a single entity and contains opinions from a single opinion holder.
“I bought an iPhone a few days ago. It was such a nice phone. The touch screen was really cool. The voice quality was clear too. Although the battery life was not long, that is ok for me. However, my mother was mad with me as I did not tell her before I bought the phone. She also thought the phone was too expensive, and wanted me to return it to the shop. …”
Feature based summary:
Feature1: Touch screen
Feature2: battery life
Cell Phone 1
Cell Phone 2
“This past Saturday, I bought a Nokia phone and my girlfriend bought a Moto phone with Bluetooth. We called each other when we got home. The voice on my phone was not so clear, worse than my previous phone. The battery life was long. My girlfriend was quite happy with her phone. I wanted a phone with good sound quality.So my purchase was a real disappointment. I returned the phone yesterday.”
Nokia and Moto(Motorola) are entities.
Named entity recognition (NER)
Aims to identity entities such as names of persons, organizations and locations in natural language text.
Our problem is similar to NER problem, but with some differences.
The current dominant technique for addressing the NER problem
Hidden Markov Models (HMM)
Maximum Entropy Models (ME)
Support Vector Machines (SVM)
Conditional Random Field (CRF)
Rely on large sets of labeled examples. Labeling is labor-intensive and time-consuming.
Mainly clustering. Gathering named entities from clustered groups based on the similarity of context. The techniques rely on lexical resources (e.g., WordNet), on lexical patterns and on statistics computed on a large unannotated corpus.
low precision and recall for the result
Show promise for identifying and labeling entities. Starting with a set of seed entities, semi-supervised methods use either class specific patterns to populate an entity class or distributional similarity to find terms similar to the seeds.
Given positive set P and unlabelled set U, S-EM produces a Bayesian classifier C, which is used to classify each vector uU and to assign a probability p (+|u) to indicate the likelihood that u belongs to the positive class.
fs (d )=Md * log ( 1 + n )
Where n is the frequency count of candidate entity d in the corpus.
“The voice on my phone was not so clear, worse than my previous phone. The battery life was long”
(1) Dependency relation
“This camera takes great pictures”
Exploits the dependency relations of
Opinions and features to extract
Given a set of seed opinion words (no feature input), we can extract features and also opinion words iteratively.
(2) Part-whole relation pattern
A part-whole pattern indicates one object is part of another
object. It is a good indicator for features if the class concept
word (the “whole” part) is known.
(3) “No” pattern
a specific pattern for product review and forum posts. People often express their comments or opinions on features by this short pattern (e.g. no noise)
There is a mutual enforcement relation between opinion words, part-whole relation and “no” patterns and features. If an adjective modifies many correct features, it is highly possible to be a good opinion word. Similarly, if a feature candidate can be extracted by many opinion words, part-whole patterns, or “no” pattern, it is also highly likely to be a correct feature. The Web page ranking algorithm HITS is applicable.
(1)Extract features by dependency relation, part-whole pattern etc.
(2)Compute feature score using HITS without considering frequency.
(3)The final score function considering the feature frequency
S = S(f) * log (freq(f))
freq(f) is the frequency count of feature f. and S(f) is the authority score of feature f.
with better results. wi.o is the opinion orientation of wi. d(wi, f) is the distance from f to wi.
e.g. “ this cellphone is not good.”
(O1, O2, F, po, h, t),
where O1 and O2 are the object sets being compared based on their shared features F, po is the preferred object set of the opinion holder h, and t is the time when the comparative opinion is expressed.