An Effective Statistical Approach to Blog Post Opinion Retrieval Ben He, Craig Macdonald, Jiyin He, Iadh Ounis (CIKM 2008)
Introduction • Blogs have recently emerged as a new grassroots publishing medium. • A key feature that distinguishes blog content from other Web content is its subjective nature. • Bloggers tend to express opinions and comments about particular targets, such as persons, organizations or products.
Introduction • In the TREC opinion finding task, only a handful of groups achieved an improvement over their baseline, using techniques such as NLP or SVM classifiers. • These approaches either require considerable manual effort to collect evidence of opinions, or yield little improvement over a baseline that includes no opinion finding feature.
Introduction • This paper proposes a statistical, light-weight, automatic dictionary-based approach. • It also shows that, despite its apparent simplicity, the approach provides statistically significant improvements over robust baselines, including the best TREC baseline run, without any manual effort.
The Statistical Dictionary-based Approach to Opinion Retrieval • Automatically generates a dictionary from the collection without requiring manual effort. • Assigns a weight to each term in the dictionary, representing how opinionated the term is. • Assigns an opinion score to each document in the collection, using the top-weighted terms from the dictionary as a query. • Combines the opinion score with the initial relevance score produced by the retrieval baseline.
Dictionary Generation • To derive the dictionary, we filter out terms that are too frequent or too rare in the collection. • A term that appears too often carries too little information, and a term that appears too rarely carries information that is too specific; in either case it cannot be generalized across queries as an indicator of opinion.
Dictionary Generation • We first rank all terms in the collection by their within-collection frequency, in descending order. • Terms whose ranks fall in the range (s·#terms, u·#terms), where #terms is the number of distinct terms in the collection, are selected for the dictionary. • We use s = 0.00007 and u = 0.001.
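The rank-window selection can be sketched as follows. This is a minimal illustration, assuming the collection statistics are available as a mapping from each distinct term to its within-collection frequency (a hypothetical input format), and treating the range as open, as the parentheses suggest.

```python
def build_dictionary(collection_tf: dict[str, int],
                     s: float = 0.00007, u: float = 0.001) -> list[str]:
    """Keep terms whose frequency rank lies strictly inside (s*#terms, u*#terms).

    collection_tf maps each distinct term to its within-collection frequency
    (hypothetical input format). Rank 1 is the most frequent term.
    """
    # Sort by descending frequency; ties are broken alphabetically so the
    # result is deterministic.
    ranked = sorted(collection_tf, key=lambda t: (-collection_tf[t], t))
    n_terms = len(ranked)
    low, high = s * n_terms, u * n_terms
    # enumerate(..., start=1) yields 1-based ranks.
    return [t for rank, t in enumerate(ranked, start=1) if low < rank < high]
```

With the default values, a vocabulary of one million distinct terms would keep roughly the terms ranked between 70 and 1,000, discarding both the very frequent head and the long tail of rare terms.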
Term Weighting • D(Rel): the set of relevant documents. • D(opRel): the set of opinionated relevant documents. • For each term t in the opinion term dictionary, we measure w_opn(t), the divergence of the term's distribution in D(opRel) from its distribution in D(Rel). • This divergence measures how much a term stands out in the opinionated documents compared with all relevant documents. • The higher the divergence, the more opinionated the term.
Term Weighting • A commonly used measure for term weighting is the KL divergence from a term’s distribution in a document set to its distribution in the whole collection.
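For concreteness, the per-term KL weight commonly used in query expansion can be written as follows, taking D(opRel) as the document set and D(Rel) as the reference set. The slide gives no formula, so estimating each probability as a relative term frequency is an assumption, and $tf_X(t)$ (the frequency of $t$ in document set $X$) is hypothetical notation.

$$
w_{KL}(t) = P_{opRel}(t)\,\log_2 \frac{P_{opRel}(t)}{P_{Rel}(t)},
\qquad
P_{X}(t) = \frac{tf_{X}(t)}{\sum_{t'} tf_{X}(t')}
$$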
Term Weighting • The KL divergence considers only how one distribution diverges from the other, ignoring how frequently a term occurs in the opinionated documents. • As a result, the weights in the opinion dictionary may be biased towards terms that have high KL divergence values but carry little information in the opinionated document set D(opRel).
Term Weighting • Another method: the Bo1 term weighting model, which measures how informative a term is in D(opRel) compared with D(Rel), with $\lambda = tf_{rel} / N_{rel}$.
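Only λ survives on the slide. Assuming the standard Bo1 (Bose-Einstein) model from the divergence-from-randomness framework, the weight would read

$$
w_{opn}(t) = tf_x \cdot \log_2 \frac{1+\lambda}{\lambda} + \log_2(1+\lambda),
\qquad
\lambda = \frac{tf_{rel}}{N_{rel}}
$$

where, under the same assumption, $tf_x$ is the frequency of $t$ in D(opRel), $tf_{rel}$ its frequency in D(Rel), and $N_{rel}$ the number of documents in D(Rel). The sketch below implements this assumed form and ranks a few terms with made-up counts; the names `bo1_weight` and `rank_terms` and the input layout are hypothetical.

```python
import math


def bo1_weight(tf_x: int, tf_rel: int, n_rel: int) -> float:
    """Assumed Bo1 weight of a term in D(opRel), measured against D(Rel).

    tf_x:   frequency of the term in D(opRel)
    tf_rel: frequency of the term in D(Rel)
    n_rel:  number of documents in D(Rel)
    """
    if tf_rel == 0:
        # D(opRel) is a subset of D(Rel), so a term absent from D(Rel)
        # cannot occur in D(opRel); give it no weight.
        return 0.0
    lam = tf_rel / n_rel  # mean occurrences per document in D(Rel)
    return tf_x * math.log2((1 + lam) / lam) + math.log2(1 + lam)


def rank_terms(stats: dict[str, tuple[int, int]], n_rel: int) -> list[str]:
    """Return terms sorted by decreasing Bo1 weight.

    stats maps term -> (tf_x, tf_rel); this layout is hypothetical.
    """
    return sorted(stats, key=lambda t: bo1_weight(*stats[t], n_rel), reverse=True)


# Toy counts, invented for illustration (not data from the paper).
stats = {"amazing": (40, 60), "report": (30, 300), "lovely": (3, 3)}
print(rank_terms(stats, n_rel=100))
```

Because $tf_x$ multiplies the informativeness factor, a term that occurs often in D(opRel) is rewarded, which addresses the bias described for KL divergence.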
Generating the Opinion Score • We take the X top-weighted terms from the opinion dictionary (X = 100 in the experiments) and submit them to the retrieval system as a query Q_opn. • The retrieval system assigns a relevance score to each document in the collection. • This score reflects how informative the top-weighted opinionated terms are in the document, capturing the document's overall opinionated nature. • We call it the opinion score: Score(d, Q_opn).
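A minimal sketch of this step: `opinion_query` keeps the X highest-weighted terms, and `opinion_scores` scores every document against them. The paper's retrieval system uses its own weighting model; the plain tf-idf sum here is a stand-in chosen only to keep the example self-contained, and both function names and input formats are hypothetical.

```python
import math
from collections import Counter


def opinion_query(weights: dict[str, float], x: int = 100) -> list[str]:
    """Return the x terms with the highest opinion weight w_opn(t).

    Ties are broken alphabetically for a deterministic query.
    """
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:x]]


def opinion_scores(docs: dict[str, list[str]], query: list[str]) -> dict[str, float]:
    """Score(d, Q_opn) using a stand-in tf-idf model (NOT the paper's model).

    docs maps a document id to its list of (already preprocessed) tokens.
    """
    n_docs = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))  # document frequency of each term
    scores = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tokens)
        # Sum tf * idf over query terms present in the document.
        scores[doc_id] = sum(
            tf[t] * math.log((n_docs + 1) / (df[t] + 0.5))
            for t in query
            if tf[t]
        )
    return scores
```

A document with no query term scores 0, so documents that use many of the opinionated dictionary terms rise to the top of this ranking.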
Score Combination • Linear combination: • Log. combination:
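The combination formulas themselves are not preserved on this slide. Purely as an illustration, the sketch below shows one common linear interpolation, assuming both scores are first min-max normalised and the opinion score is weighted by a (the evaluation slide reports a = 0.25); this form is an assumption, not necessarily the paper's exact formula, and the logarithmic variant is not sketched. Function names are hypothetical.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1]; a constant score list maps to all zeros."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 0.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}


def linear_combination(rel: dict[str, float], opn: dict[str, float],
                       a: float = 0.25) -> dict[str, float]:
    """Assumed form: a * opinion + (1 - a) * relevance, on normalised scores.

    rel: baseline relevance scores Score(d, Q) for the retrieved documents.
    opn: opinion scores Score(d, Q_opn); missing documents contribute 0.
    """
    rel_n = min_max(rel)
    opn_n = min_max(opn) if opn else {}
    return {d: a * opn_n.get(d, 0.0) + (1 - a) * rel_n[d] for d in rel}
```

A larger a lets the opinion score overturn more of the baseline ranking; a smaller a keeps the result close to the relevance-only order.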
Experiment: Data • Dataset: the Blog06 collection. • We use the permalink documents, i.e. blog posts together with their associated comments. • Each term is stemmed using Porter's English stemmer, and standard English stopwords are removed.
Experiment: Baseline • The baseline uses the InLB document weighting model with b = 0.2337.
Experiment: External Opinion Dictionary • We also manually compile a dictionary from various external linguistic resources. • The dictionary contains approximately 12,000 English words, mostly adjectives, adverbs and nouns, that are assumed to be subjective. • In this paper, we refer to the manually compiled dictionary as the external dictionary, and to the automatically derived one as the internal dictionary.
Experiment: Evaluation • Use the Bo1 term weighting model. • Set a = 0.25 and k = 250.
Conclusions and Future Work • This paper has proposed an effective and practical approach to retrieving opinionated blog posts without the need for manual effort. • The automatically generated internal dictionary achieves retrieval performance as good as that obtained with an external dictionary manually compiled from various linguistic resources.
Conclusions and Future Work • In the future, we plan to: • Extend the work to detecting the polarity or orientation of the retrieved opinionated documents. • Study the connection between the opinion finding task and question answering, for example by extracting the opinionated sentences about a given target within a blog post.