A Lexicon Enhanced Method for Sentiment Classification

Yan Dang, Yulei Zhang, Hsinchun Chen

Outline

Introduction

Literature Review

Research Questions

Research Design

Experimental Study

Conclusions and Future Directions

Acknowledgement

Introduction

As an emerging communication platform, Web 2.0 has made the Internet increasingly interactive. People can express and share their opinions and concerns online.

As a result, more and more user-generated content containing rich opinion and sentiment information has appeared on the Internet. Understanding this information is increasingly important for both service/product providers and users.

However, opinion and sentiment information on the Internet is often unstructured or semi-structured. For example, online product reviews are often unstructured, subjective, and hard to digest in a short time.

Introduction (cont’d)

Sentiment classification aims to analyze direction-based text, i.e. text containing opinions and emotions, to determine whether a text is objective or subjective, or whether a subjective text contains positive or negative sentiments.

Sentiment classification techniques can be used to analyze the opinion and sentiment information on the Internet.

Previous literature shows that two types of approaches have been utilized for sentiment classification: machine learning and semantic orientation.

In the machine learning approach, each of the classifiers is trained on a collection of representative data.

In contrast, the semantic orientation approach does not require prior training; instead, it measures how far a word is inclined to be positive or negative.

Introduction (cont’d)

However, to the best of our knowledge, few studies have combined both approaches for sentiment classification.

In this study, we propose a lexicon enhanced method for sentiment classification which combines the two dominant approaches to sentiment classification.

In particular, we use the set of sentiment words gathered using the semantic orientation approach as features in the machine learning approach. We refer to such features as “sentiment features” in this study.

To demonstrate our proposed method, we conduct experimental studies using five sets of online product reviews.

We evaluate different feature sets consisting of content-free, content-specific, and sentiment features.

Literature Review

In general, sentiment analysis is concerned with analysis of direction-based text, i.e. text containing opinions and emotions (Abbasi, Chen and Salem, 2008).

Sentiment classification studies attempt to determine whether a text is objective or subjective, or whether a subjective text contains positive or negative sentiments.

The common two class problem involves classifying sentiments as positive or negative (Pang et al., 2002; Turney, 2002).

Additional variations include classifying sentiments as opinionated/subjective or factual/objective (Wiebe et al., 2001; Wiebe et al., 2004).

Some studies attempted to classify emotions, including happiness, sadness, anger, horror etc., instead of sentiments (Grefenstette et al., 2004; Mishne, 2005; Subasic & Huettner, 2001).

Sentiment Classification Methods

Two types of approaches have been utilized in previous sentiment classification studies: machine learning (e.g., Pang et al., 2002; Gamon, 2004; Mishne, 2005; Wilson et al., 2005; Abbasi et al., 2008) and semantic orientation (e.g., Turney, 2002; Riloff and Wiebe, 2003; Attardi and Simi, 2006; Devitt and Ahmad, 2007; Denecke, 2008).

The machine learning approach:

The machine learning approach involves text classification techniques. This approach treats the sentiment classification problem as a topic-based text classification problem (Liu, 2007). Any text classification algorithm can be employed, e.g., naïve Bayes, SVM, etc.

This approach was tested by Pang et al. (2002) to classify movie reviews into two classes: positive and negative. The study compared naïve Bayes, Maximum Entropy, and SVM. A testbed of 700 positive reviews and 700 negative reviews was used. The highest classification accuracy (82.9%) was achieved using SVM with 3-fold cross validation.

Sentiment Classification Methods (cont’d)

The semantic orientation approach:

The semantic orientation approach performs classification based on positive and negative sentiment words and phrases contained in each evaluation text (Liu, 2007).

It does not require prior training in order to mine the data.

Two types of techniques have been used in previous sentiment classification research using the semantic orientation approach: (1) corpus-based techniques, and (2) dictionary-based techniques (Liu, 2007).

Sentiment Classification Methods (cont’d)

(1) The corpus-based techniques:

Corpus-based techniques try to find co-occurrence patterns of words to determine their sentiments. Studies using this type of technique include: Turney, 2002; Riloff and Wiebe, 2003; Hatzivassiloglou and McKeown, 1997; Yu and Hatzivassiloglou, 2003; and Grefenstette et al., 2004.

Different studies used different strategies to determine sentiments.

Turney (2002) calculated a phrase’s semantic orientation to be the mutual information between the phrase and the word “excellent” (as positive polarity) minus the mutual information between the phrase and the word “poor” (as negative polarity). The overall polarity of an entire text was predicted as the average semantic orientation of all the phrases that contained adjectives or adverbs.
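Turney's calculation can be sketched in a few lines. This is a minimal illustration, not code from the study: the function names, the use of raw co-occurrence counts as inputs, and the `smoothing` constant are assumptions made for the example. Because the phrase's own count cancels when one mutual information value is subtracted from the other, only four counts are needed.

```python
import math


def semantic_orientation(hits_near_excellent, hits_near_poor,
                         hits_excellent, hits_poor, smoothing=0.01):
    """SO(phrase) = PMI(phrase, "excellent") - PMI(phrase, "poor").

    hits_near_*: how often the phrase co-occurs with the reference word.
    hits_*:      how often the reference word occurs on its own.
    smoothing:   small constant (an assumption here) that avoids division by zero.
    """
    numerator = (hits_near_excellent + smoothing) * hits_poor
    denominator = (hits_near_poor + smoothing) * hits_excellent
    return math.log2(numerator / denominator)


def classify_text(phrase_orientations):
    """Label a text by the average semantic orientation of its phrases."""
    if not phrase_orientations:
        raise ValueError("at least one phrase is required")
    average = sum(phrase_orientations) / len(phrase_orientations)
    return "positive" if average > 0 else "negative"
```

A phrase that co-occurs equally often with both reference words gets an orientation of zero; one that appears more often near “excellent” gets a positive value.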

Riloff and Wiebe (2003) used a bootstrapping process to learn linguistically rich patterns of subjective expressions in order to distinguish subjective expressions from objective expressions. Starting with a set of objective patterns adopted from previous literature, the process used a pattern extraction algorithm to learn potential subjective patterns. The learned patterns were used to decide whether an expression is subjective or not.

Sentiment Classification Methods (cont’d)

(2) The dictionary-based techniques:

Dictionary-based techniques use synonyms, antonyms and hierarchies in WordNet (or other lexicons with sentiment information) to determine word sentiments. Studies using this type of technique include: Hu and Liu, 2004; Valitutti et al., 2004; Kim and Hovy, 2004; and Andreevskaia and Bergler, 2006.

SentiWordNet (Esuli and Sebastiani, 2006) is a lexical resource for sentiment analysis. It is built upon WordNet and has more sentiment related features than WordNet. SentiWordNet assigns to each synset of WordNet three sentiment scores: positivity, negativity, and objectivity. It has been used as the lexicon in recent sentiment classification studies (e.g., Attardi and Simi, 2006; Devitt and Ahmad, 2007; Denecke, 2008; Fahrni and Klenner, 2008).

Sentiment Classification Methods (cont’d)

Each of the two approaches has its own pros and cons.

The machine learning approach tends to be more accurate than the semantic orientation approach (Chaovalit and Zhou, 2005; Turney and Littman, 2003).

However, a machine learning model is tuned to its training corpus; therefore, retraining is needed if it is applied elsewhere (Turney and Littman, 2003).

In contrast, the semantic orientation approach has better generality. But its classification accuracy is often not as high as that of the machine learning approach.

Chaovalit and Zhou (2005) compared the two approaches on sentiment classification.

They conducted experiments on movie reviews with two polarities: positive and negative.

Their experimental results confirmed that the machine learning approach is more accurate but requires more time to train the model; in comparison, the semantic orientation approach is less accurate but is more efficient for real-time applications.

They achieved the highest accuracy of 85.54% with 3-fold cross validation using the machine learning approach and the highest accuracy of 77% using the semantic orientation approach. These results are comparable to or even better than previous findings.

Sentiment Classification Features

There are mainly two categories of features that have been adopted in previous sentiment classification studies using the machine learning approach: (1) content-free features, and (2) content-specific features.

Content-free features include lexical features, syntactic features, and structural features. In this study, we treat content-free features as the baseline features.

Lexical features are character- or word-based statistical measures of lexical variation. They mainly include: character-based lexical features (Yule, 1938; Argamon et al., 2003; Gamon, 2004), vocabulary richness measures (Yule, 1944), and word-based lexical features (de Vel et al., 2001; Mishne, 2005; Zheng et al., 2006).

Syntactic features indicate the patterns used to form sentences. They mainly include: function words (Mosteller, 1964; Koppel et al., 2002; Koppel et al., 2006), punctuation (Baayen et al., 2002), and part-of-speech (Baayen et al., 2002; Argamon et al., 1998; Pang et al., 2002; Gamon, 2004; Nowson & Oberlander, 2006).

Structural features show the text organization and layout. Traditional structural features include greetings, signatures, the number of paragraphs and average paragraph length (de Vel et al. 2001; Zheng et al. 2006). Other structural features include technical features such as the use of various file extensions, fonts, sizes, and colors (Abbasi and Chen, 2005).

Content-specific features are comprised of important keywords and phrases on certain topics (Martindale & McKenzie, 1995; Zheng et al. 2006) such as word n-grams (Diederich et al. 2003; Abbasi and Chen, 2005; Nowson & Oberlander, 2006; Abbasi and Chen, 2008). Previous studies showed that content-specific features were helpful in improving text classification (Abbasi and Chen, 2005; Zheng et al. 2006), including sentiment classification (Abbasi and Chen, 2008).

Sentiment Classification Features (cont’d)

In previous sentiment classification studies using the semantic orientation approach, the overall sentiment of a text is determined by the sentiments of the words and/or phrases appearing in the text. Different categories of words/phrases have been used to determine the overall sentiment of a text.

For example, Hatzivassiloglou and Wiebe (2000) used different types of adjectives appearing in a text.

Hu and Liu (2004) also used adjectives.

Chaovalit and Zhou (2005) used both adjectives and adverbs.

Turney (2002) used all the two-word phrases that contained adjectives or adverbs in a given text.

Denecke (2008) used adjectives, verbs, and nouns.

As mentioned before, in this study, we refer to such sentiment words and/or phrases as “sentiment features,” and incorporate them into the machine learning approach as an additional feature dimension.

Feature Selection

In text classification studies, in order to reduce the complexity of the texts and make them easier to handle, the full text is often transformed to a vector of features which describes the contents of the text.

However, not all features are necessary or sufficient to learn the concept of interest. Many of them could be noisy or redundant. Using all the features often results in overfitting and poor predictions (Meiri and Zahavi, 2006).

Therefore, feature selection, which aims at identifying a minimal-sized subset of features relevant to the target concept, can be applied (Dash and Liu, 1997).

The objective of feature selection is threefold: improving the prediction accuracy, providing faster and more cost-effective prediction, and providing a better understanding of the underlying process that generated the data (Li, Su et al. 2007).

Feature Selection (cont’d)

A feature selection method generates different candidates from the feature space and assesses them based on some evaluation criterion to find the best feature subset (Li, Su et al. 2007).

Each feature selection method contains two parts: the evaluation criterion and the generation procedure of candidates.

The evaluation criterion is used to assess the goodness of features or feature subsets.

The generation procedure determines how to explore different candidates to find the optimal ones. Graph-based search algorithms are often used to find the optimal features (Li, Su et al. 2007).

Research Gaps

In previous sentiment classification research, most studies adopted either the machine learning approach or the semantic orientation approach.

To the best of our knowledge, few studies have combined the two approaches into one framework.

Accordingly, few studies have adopted both the content-free and content-specific features, which are often used in the machine learning approach, and the sentiment features, which are often used in the semantic orientation approach.

Research Questions

How can the machine learning approach and the semantic orientation approach be combined into one framework for sentiment classification?

In particular, will using the combination of content-free, content-specific, and sentiment features improve sentiment classification performance?

Will feature selection improve the sentiment classification performance of the proposed framework?

Data Acquisition

Data Collection

Spidering programs are developed to collect the textual data from online sources as HTML pages.

Data Parsing

Parsing programs are developed to parse out the information from the raw HTML pages and store it into a database.

Feature Generation

Content-Free and Content-Specific Feature Extraction

Content-free features (i.e., lexical features, syntactic features, and structural features) and content-specific features (e.g., word n-grams) are extracted from the textual data collection.

To extract sentiment features, we first need to conduct Part-of-Speech (POS) tagging on the data collection. Once we get the POS tag for each word, we can calculate the sentiment score of the word by looking up a sentiment-based lexicon.

Content-Free Features (F1)

Lexical features:

Lexical features are character- or word-based statistical measures of lexical variation.

Lexical features mainly include: character-based lexical features (Yule, 1938; Argamon et al., 2003; Gamon, 2004), vocabulary richness measures (Yule, 1944), and word-based lexical features (de Vel et al., 2001; Mishne, 2005; Zheng et al., 2006).

Examples of character-based lexical features are total number of characters, characters per sentence, characters per word, and the usage frequency of individual letters.

Examples of vocabulary richness measures include the number of words that occur once (hapax legomena) and twice (hapax dislegomena), and some other statistical measures defined by Yule (1944).

Examples of word-based lexical features are total number of words, words per sentence, word length distribution.

In this study, we adopt the character-based lexical features used in de Vel (2000), Forsyth and Holmes (1996), and Ledger and Merriam (1994); the vocabulary richness features used in Tweedie and Baayen (1998); and the word-length frequency features used in Mendenhall (1887) and de Vel et al. (2000).

In total, we use 87 lexical features.
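A handful of the lexical measures named above can be computed as follows. This is only a sketch: it covers six illustrative measures rather than the 87 used in the study, and the tokenization rules (a regular expression for words, sentence splitting on `.`, `!` and `?`) are simplifying assumptions.

```python
import re
from collections import Counter


def lexical_features(text):
    """Compute a few character-based, word-based and richness measures."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    counts = Counter(words)
    n_words = len(words)
    return {
        "total_chars": len(text),
        "total_words": n_words,
        "chars_per_word": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "words_per_sentence": n_words / len(sentences) if sentences else 0.0,
        # Vocabulary richness: words occurring exactly once / exactly twice.
        "hapax_legomena": sum(1 for c in counts.values() if c == 1),
        "hapax_dislegomena": sum(1 for c in counts.values() if c == 2),
    }
```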

Content-Free Features (F1) (cont’d)

Syntactic features:

Syntactic features indicate the patterns used to form sentences. Syntactic features are important because they can indicate people’s different habits of organizing sentences (Zheng et al., 2006).

Function words and punctuation are often used as syntactic features.

Function Words:

Different sets of function words, ranging from 12 to 122, have been tested in various studies (Baayen et al., 1996; Burrows, 1989; de Vel et al., 2001; Holmes & Forsyth, 1995; Tweedie & Baayen, 1998).

However, there is no generally accepted good set of function words for different applications.

Here, we adopt the large set of 150 function words used in Zheng et al. (2006). Zheng et al. (2006) also focused on online text (Web forum messages), although they performed authorship classification rather than sentiment classification.

Punctuation:

We adopt the 8 punctuation features suggested by Baayen et al. (1996).

Therefore, in total, we use 158 syntactic features.
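The syntactic features amount to one usage measure per function word and per punctuation mark. The sketch below shows the idea with stand-in lists: `FUNCTION_WORDS` and `PUNCTUATION` are hypothetical placeholders for the 150 function words and 8 punctuation features, which are not listed in the slides, and normalizing by text length is likewise an assumption.

```python
import re
from collections import Counter

# Hypothetical stand-ins for the 150 function words and 8 punctuation marks.
FUNCTION_WORDS = ["the", "a", "of", "and", "but", "not", "very", "it"]
PUNCTUATION = [".", ",", "!", "?", ":", ";", "'", '"']


def syntactic_features(text):
    """Relative frequency of each function word and punctuation mark."""
    words = re.findall(r"[a-z']+", text.lower())
    word_counts = Counter(words)
    total_words = len(words) or 1  # avoid division by zero on empty text
    total_chars = len(text) or 1
    features = {f"fw_{w}": word_counts[w] / total_words for w in FUNCTION_WORDS}
    features.update(
        {f"punct_{p}": text.count(p) / total_chars for p in PUNCTUATION}
    )
    return features
```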

Content-Free Features (F1) (cont’d)

Structural features:

Structural features show the text organization and layout. They are especially useful for online text (de Vel et al. 2001).

Traditional structural features include greetings, signatures, the number of paragraphs and average paragraph length (de Vel et al. 2001; Corney et al., 2002; Zheng et al. 2006).

For example, de Vel (2000) introduced several structural features specifically for e-mail.

Zheng et al. (2006) used 14 structural features in their authorship classification study for Web forum messages.

Other structural features include technical features such as the use of various file extensions, fonts, sizes, and colors (Abbasi and Chen, 2005).

For example, in their studies (e.g., Abbasi & Chen, 2005; Abbasi & Chen, 2008; Abbasi, Chen, & Nunamaker, 2008), Abbasi and his collaborators used 62 structural features focusing on word structure (e.g., number of paragraphs, sentences per paragraph) and technical structure (e.g., font colors, font sizes, use of images).

In this study, we adopt some of the structural features used in previous studies. We choose several of the most common features that can be applied to a broad range of general online review sites.

In total, we use 5 structural features:

Total number of sentences in a review,

Total number of paragraphs in a review,

Number of sentences per paragraph in a review,

Number of characters per paragraph in a review, and

Number of words per paragraph in a review.

In this study, we do not use a large number of structural features (e.g., font colors and font sizes) since some online review sites do not offer the related functions, such as allowing users to change font colors and font sizes.
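The five structural features listed above are simple counts and ratios. A minimal sketch, assuming paragraphs are separated by blank lines and sentences end in `.`, `!` or `?` (neither convention is stated in the slides):

```python
import re


def count_sentences(text):
    """Count sentences using a simple punctuation-based split."""
    return len([s for s in re.split(r"[.!?]+", text) if s.strip()])


def structural_features(review):
    """Compute the five structural features for one review."""
    paragraphs = [p for p in re.split(r"\n\s*\n", review) if p.strip()]
    n_paragraphs = len(paragraphs)
    divisor = n_paragraphs or 1  # avoid division by zero on empty reviews
    n_sentences = sum(count_sentences(p) for p in paragraphs)
    return {
        "total_sentences": n_sentences,
        "total_paragraphs": n_paragraphs,
        "sentences_per_paragraph": n_sentences / divisor,
        "chars_per_paragraph": sum(len(p) for p in paragraphs) / divisor,
        "words_per_paragraph": sum(len(p.split()) for p in paragraphs) / divisor,
    }
```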

Content-Specific Features (F2)

Content-specific features can be:

Either a relatively small number of features identified by manual selection (Zheng et al., 2006; Li et al., 2006), or

A large number of n-grams learned from the collection (Peng et al., 2003; Abbasi & Chen, 2005; Abbasi & Chen, 2008; Abbasi, Chen and Salem, 2008).

In this study, we use unigrams and bi-grams as content-specific features.

The unigrams and bi-grams are collected from the whole data collection.

After removing the semantically empty stop-words, we keep the unigrams and bi-grams appearing more than five times in the whole collection as the content-specific features.
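The n-gram selection rule can be sketched as below. The stop-word list is a hypothetical placeholder, tokens are split on whitespace only, and bigrams are formed after stop-word removal; the slides do not specify these details, so treat them as assumptions.

```python
from collections import Counter

# Hypothetical stop-word list; the study's actual list is not given.
STOP_WORDS = {"the", "a", "an", "is", "it", "and", "of", "to"}


def content_specific_features(documents, min_count=6):
    """Keep unigrams and bigrams that appear more than five times overall."""
    counts = Counter()
    for doc in documents:
        tokens = [t for t in doc.lower().split() if t not in STOP_WORDS]
        counts.update(tokens)  # unigrams
        counts.update(" ".join(pair) for pair in zip(tokens, tokens[1:]))  # bigrams
    return {gram for gram, count in counts.items() if count >= min_count}
```

With `min_count=6`, an n-gram seen exactly five times in the collection is dropped, which matches the "more than five times" threshold.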

Sentiment Features (F3)

The following diagram shows the process to extract sentiment features (F3).

The details are described in the following slides.

Sentiment Features (F3) (cont’d)

Extracting adjectives, adverbs and verbs

In order to extract the sentiment features, we first conduct Part-of-Speech (POS) tagging on the whole data collection.

We use the Stanford POS tagger (http://nlp.stanford.edu/software/tagger.shtml) to do the tagging.

Then we select all the adjectives, adverbs, and verbs as the sentiment features.

Adjectives are most widely used to denote semantic orientations; adverbs, verbs, and nouns have also been used to express sentiments (Nasukawa and Yi, 2003).

Many studies used adjectives as sentiment features (e.g., Hatzivassiloglou and Wiebe 2000; Hu and Liu, 2004; Fahrni and Klenner, 2008). Some studies have used both adjectives and adverbs (e.g., Chaovalit and Zhou 2005; Benamara et al. 2007). Other studies have also added verbs and nouns (e.g., Nasukawa and Yi, 2003; Bethard et al., 2004).

In this study, we choose to use adjectives, adverbs, and verbs as the sentiment features. We do not include nouns since they are more context dependent.
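Once the text is tagged, selecting the candidate words is a filter on the tag. The sketch below assumes the tagger output is already available as `(word, tag)` pairs with Penn Treebank tags; it does not call the Stanford tagger itself, and the function name is made up for the example.

```python
# Penn Treebank tag prefixes: JJ* = adjectives, RB* = adverbs, VB* = verbs.
POS_GROUPS = {"JJ": "adjective", "RB": "adverb", "VB": "verb"}


def select_sentiment_candidates(tagged_tokens):
    """Keep adjectives, adverbs and verbs; drop nouns and everything else."""
    selected = []
    for word, tag in tagged_tokens:
        group = POS_GROUPS.get(tag[:2])
        if group is not None:
            selected.append((word.lower(), group))
    return selected
```

Matching on the first two characters groups variants such as JJR or VBZ with their base category, while tags like NN or DT are ignored.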

Sentiment Features (F3) (cont’d)

Obtaining the prior-polarity sentiment scores of adjectives, adverbs and verbs

To determine the sentiment scores of the extracted adjectives, adverbs and verbs, we look up a sentiment-based lexicon.

We use SentiWordNet as our lexicon.

SentiWordNet (Esuli and Sebastiani, 2006) is a lexical resource for sentiment analysis. It assigns to each synset of WordNet three sentiment scores: positivity, negativity, and objectivity. It has been used as the lexicon in sentiment classification studies.

For example, Devitt and Ahmad (2007) used SentiWordNet for polarity detection in financial news.

Attardi and Simi (2006) used SentiWordNet to examine the opinion words (with orientation strength greater than 0.4) in order to retrieve opinionated documents.

Denecke (2008) used SentiWordNet to determine the polarities of adjectives, which then led to the document polarities for multilingual sentiment analysis.

Since a word in SentiWordNet can have multiple senses, we calculate the average polarity scores (i.e., positive, negative, and objective scores) for its adjective, adverb, and verb senses separately using the prior-polarity formula adopted by previous literature (Fahrni and Klenner, 2008; Denecke, 2008).

Sentiment Features (F3) (cont’d)

For example, the word “bad” has 14 synsets in the sense (i.e., POS) of adjective and 2 synsets in the sense of adverb. It does not have any synset in the sense of verb. Therefore, we get the following scores based on the above formula:

  • The prior-polarity formula (Fahrni and Klenner, 2008; Denecke, 2008):

where pos ∈ {adjective, adverb, verb}, i ∈ {positive, negative, objective}, and k denotes the synsets of a given word in a particular sense.
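The formula image is not preserved in this transcript. Going by the description (an average of the SentiWordNet scores over a word's synsets in one POS sense), it can plausibly be written as follows. The symbols K and score_i(k) are notation introduced here for illustration, not taken from the slides.

```latex
\mathrm{Score}(\mathit{word}{=}\mathit{pos})_i
  = \frac{1}{|K|} \sum_{k \in K} \mathrm{score}_i(k),
\qquad
K = \{\text{synsets of } \mathit{word} \text{ with part of speech } \mathit{pos}\}
```

Under this reading, the adjective score of “bad” averages over its 14 adjective synsets and the adverb score over its 2 adverb synsets.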

Sentiment Features (F3) (cont’d)

Calculating sentiment features of adjectives, adverbs and verbs

Based on the prior-polarity formula, for each word in a given POS sense (adjective, adverb or verb), we get three scores regarding its positivity, negativity, and objectivity respectively.

To determine its final score as a sentiment feature, we develop a strategy shown in the next slide.

The idea is that since each word in a given POS sense has an objective score, we use this score to first filter out all the less subjective words. Specifically, we use the midpoint of the 0-1 score scale (i.e., 0.5) to differentiate the subjective words (in certain POS senses) from the objective words (in certain POS senses). Only the words (in certain POS senses) with objective scores smaller than or equal to 0.5 are kept as sentiment features.

Then, we compare the positive score and the negative score (both ranging from 0 to 1) of each remaining word (in a given POS sense). If the positive score is greater than the negative score, we treat the word in the given sense as a positive sentiment feature and add it to our sentiment feature set with its positive sentiment score as the polarity value. Thus all values of positive sentiment features are greater than zero. If the negative score is greater, we treat the word in the given sense as a negative sentiment feature and add it to our sentiment feature set with the negation of its negative sentiment score as the polarity value. Thus all values of negative sentiment features are smaller than zero. In this way, we differentiate the positive sentiment features from the negative sentiment features.

In addition, words (in certain POS senses) whose positive and negative scores are equal are excluded from the sentiment features, since they do not show a clear polarity tendency.

Sentiment Features (F3) (cont’d)

Sentiment Feature Calculation Strategy:

For each word in a given POS sense (denoted as “word=pos”), we calculate the related sentiment feature using the following strategy.

If Score(word=pos)objective > 0.5
    We consider the given word in the particular POS sense to be objective. Therefore, we exclude it from our sentiment feature set.
Else
    If Score(word=pos)positive > Score(word=pos)negative
        We add (word=pos, |Score(word=pos)positive|) into our sentiment feature set.
    If Score(word=pos)positive < Score(word=pos)negative
        We add (word=pos, -|Score(word=pos)negative|) into our sentiment feature set.
    If Score(word=pos)positive = Score(word=pos)negative
        We exclude it from our sentiment feature set.

  • For example, according to the above strategy, we get the features: (“bad”=adjective, -0.64) and (“bad”=adverb, -0.56).
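The strategy translates directly into a small function. This is a sketch of the rule as stated on the slide; the function name and the tuple layout of the return value are choices made for the example.

```python
def sentiment_feature(word, pos, positive, negative, objective):
    """Return ((word, pos), polarity) or None when the word is excluded.

    positive, negative, objective: averaged prior-polarity scores in [0, 1].
    """
    if objective > 0.5:
        return None  # treated as objective, so not a sentiment feature
    if positive > negative:
        return ((word, pos), abs(positive))
    if positive < negative:
        return ((word, pos), -abs(negative))
    return None  # equal scores show no clear polarity tendency
```

For instance, with a negative score of 0.64 and hypothetical positive and objective scores of 0.05 and 0.31, `sentiment_feature("bad", "adjective", 0.05, 0.64, 0.31)` yields the pair `(("bad", "adjective"), -0.64)`.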

Feature Sets

Four feature sets are built based on different types of features in an incremental way:

Feature set F1,

Feature set F1+F2,

Feature set F1+F3 and

Feature set F1+F2+F3.

Feature Selection

Feature selection may improve the classification performance by selecting an optimal subset of features.

We use information gain heuristic due to its effectiveness reported in previous text classification studies (Koppel and Schler, 2003; Abbasi & Chen, 2008).

After applying feature selection to the large feature sets, we build the following two selected feature sets:

Selected feature set F1+F2, and

Selected feature set F1+F2+F3.
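Information gain scores a feature by how much knowing it reduces uncertainty about the class label. The sketch below treats each feature as simply present or absent in a review, which is a simplifying assumption; the slides do not say how feature values were discretized or what cutoff was used to keep features.

```python
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    result = 0.0
    for label in set(labels):
        p = labels.count(label) / total
        result -= p * math.log2(p)
    return result


def information_gain(feature_present, labels):
    """Entropy of the labels minus their entropy after splitting on the feature."""
    with_feature = [l for f, l in zip(feature_present, labels) if f]
    without_feature = [l for f, l in zip(feature_present, labels) if not f]
    total = len(labels)
    conditional = sum(
        len(part) / total * entropy(part)
        for part in (with_feature, without_feature)
        if part  # skip empty partitions
    )
    return entropy(labels) - conditional
```

A feature that perfectly separates positive from negative reviews scores 1 bit on a balanced set, while a feature unrelated to the label scores 0.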

Classification and Evaluation

In total, we generate 6 different feature sets:

Feature set F1,

Feature set F1+F2,

Feature set F1+F3,

Feature set F1+F2+F3,

Selected feature set F1+F2, and

Selected feature set F1+F2+F3.

We aim to examine which feature set can achieve the highest performance in sentiment classification.

We choose SVM as the classifier because it is often reported to perform best and has been adopted by many previous text classification studies (Abbasi & Chen, 2008; Abbasi, Chen and Salem, 2008; Zheng et al., 2006; Li et al., 2006).

For evaluation, we use the standard classification performance metrics used in previous information retrieval and text classification studies: accuracy, precision, recall, and F-measure.


Evaluation Metrics

  • Overall correctness:
  • Correctness for class i:

Class 1: Positive text

Class 2: Negative text
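The metric formulas are not preserved in this transcript. The sketch below uses the standard definitions of accuracy, precision, recall and F-measure, on the assumption that these are what the slide showed; the function name and return format are illustrative.

```python
def evaluate(true_labels, predicted_labels, target):
    """Overall accuracy plus precision, recall and F-measure for one class."""
    pairs = list(zip(true_labels, predicted_labels))
    correct = sum(1 for t, p in pairs if t == p)
    true_positive = sum(1 for t, p in pairs if t == target and p == target)
    predicted_target = sum(1 for _, p in pairs if p == target)
    actual_target = sum(1 for t, _ in pairs if t == target)

    accuracy = correct / len(pairs)
    precision = true_positive / predicted_target if predicted_target else 0.0
    recall = true_positive / actual_target if actual_target else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}
```

Calling the function once with the positive class as `target` and once with the negative class gives the per-class figures alongside the overall accuracy.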

Experimental Study

To examine the proposed lexicon enhanced method in sentiment classification, we compared the performances of different feature sets using SVM as the classifier because of its reported performance in previous sentiment analysis studies (Abbasi, Chen and Salem, 2008).

We used five online product review data sets: digital cameras, books, DVDs, electronics, and kitchen appliances. The latter four are from Blitzer's multi-domain sentiment data set (Blitzer, Dredze and Pereira, 2007).

For each test bed, we randomly chose 90% of reviews as training data and the remaining 10% as testing data for the train/test split. We used 10-fold cross-validation to conduct the evaluation.
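The 90%/10% split repeated over ten folds can be sketched with the standard library alone. The shuffling seed and the round-robin assignment of reviews to folds are assumptions made for the example.

```python
import random


def k_fold_splits(items, k=10, seed=0):
    """Yield (train, test) pairs; each item lands in the test set exactly once."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # round-robin assignment
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test
```

With 100 reviews and `k=10`, every iteration trains on 90 reviews and tests on the remaining 10, and the reported figures would be averages over the ten iterations.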

Experimental Results and Discussions

The following table shows the number of features for each testbed. The feature sets F1+F2 and F1+F2+F3 are very large. Therefore, we conduct feature selection on these two feature sets and generate selected F1+F2 and selected F1+F2+F3.

Discussions

The values of all four measures increased as additional types of features were added. The feature set F1+F2+F3 outperformed both F1+F2 and F1+F3, each of which in turn outperformed the feature set F1. The increases from F1 to F1+F3 and from F1+F2 to F1+F2+F3 indicated the considerable contribution of the newly introduced F3 features.

Although in most cases, the increases from the feature set F1 to the feature set F1+F2 and from the feature set F1+F3 to the feature set F1+F2+F3 were relatively larger, they were caused by a much larger number of F2 features which are domain dependent. Therefore, we believe the F3 features introduced in this study can play an important role in improving sentiment classification performance.

In addition, except for the data set on digital cameras, all the other test beds showed that feature selection increased the classification performance on all four measurement dimensions.

Discussions

The best performances for all five test beds were achieved when combining all three types of features and when conducting feature selection for the four test beds from Blitzer's multi-domain sentiment data set.

The fact that feature selection did not improve the performance on the digital camera test bed is not surprising. Compared to the other four, this test bed has a smaller number of reviews, and in order to generalize the feature selection model achieved from the training data to the testing data, a large number of training data points is often needed.

Discussion

In sum, the results show that combining the newly introduced sentiment features, which are often used in the existing semantic orientation approach, with the content-free and content-specific features from the existing machine learning approach can improve sentiment classification performance significantly.

The better performance can be attributed to the rich polarity information introduced by the sentiment features (including adjectives, adverbs, and verbs). In addition, conducting feature selection was also helpful.

Examples of Selected F3 Features

We further examined the Selected feature set F1+F2+F3 which achieved the best performance.

As an example, we list some important sentiment features in the selected feature set F1+F2+F3 of the kitchen appliance test bed, which achieved the highest overall accuracy among all the test beds.

The listed features are from the best-performing fold (with 87% overall accuracy) in the 10-fold cross-validation.

* The POS notations are based on the Penn Treebank part-of-speech tags as adopted in the Stanford POS Tagger. JJ means adjective; JJR means comparative adjective; and RB means adverb.

Conclusions and Future Directions

In this study, we proposed a lexicon enhanced method by combining the two approaches into one framework. Specifically, we used the sentiment words from the existing semantic orientation approach as an additional dimension of features (referred to as “sentiment features” in this study) for the machine learning based classifier.

The sentiment features were extracted using the dictionary-based technique because of its efficiency compared with the corpus-based technique.

In total, three types of features were adopted in our proposed method, including: content-free features (F1) and content-specific features (F2) from the existing machine learning approach, and sentiment features (F3) from the existing semantic orientation approach. Among them, F1 and F3 features are domain independent, while F2 features are not.

Conclusions and Future Directions

A direction for further study would be to refine the lexicon and extend the sentiment feature extraction procedure.

Further research can also explore other sentiment feature generation methods, such as corpus-based techniques, and compare their performance.

In addition, feature selection on large feature sets has been shown to improve the classification performance on relatively large data sets. Comparing different feature selection algorithms to find the best one for our proposed sentiment classification method could be an additional future research direction.

Moreover, although we used English language review data in this study, the proposed method can also be applied to other languages, and a multilingual sentiment-based lexicon needs to be developed in the future.

Acknowledgement

This work was supported by the NSF Computer and Network Systems (CNS) Program, CNS-0709338, September 2007 - August 2010.