Experimental Study on Sentiment Classification of Chinese Review using Machine Learning Techniques

Jun Li and Maosong Sun

Department of Computer Science and Technology

Tsinghua University, Beijing, China

IEEE NLP-KE 2007


Outline

  • Introduction

  • Corpus

  • Features

  • Performance Comparison

  • Analysis and Conclusion


Introduction

  • Why do we perform the task?

    • Much of the attention has centered on feature-based sentiment extraction

    • Sentence-level analysis is useful, but it involves complex processing and is usually format dependent (Liu et al., WWW'05)

  • Sentiment classification using machine learning techniques

    • Based on the overall sentiment of a text

    • Easily transfers to new domains given a training set

    • Applications:

      • Split reviews into positive and negative sets

      • Monitor bloggers' mood trends

      • Filter subjective web pages


Corpus

  • From www.ctrip.com

  • Average length: 69.6 words (standard deviation 89.0)

    • 90% of the reviews are shorter than 155 words

    • Some reviews include English words


Review rating distribution & score threshold

  • Reviews rated 4.5 and up are considered positive; those rated 2.0 and below are considered negative.

  • 12,000 reviews as training set, 4,000 reviews as test set
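
The thresholding rule above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `label_review` is hypothetical, and leaving mid-range ratings unlabeled is an assumption, since the slide does not say how they were handled.

```python
from typing import Optional

POSITIVE_MIN = 4.5  # ratings at or above this count as positive (from the slide)
NEGATIVE_MAX = 2.0  # ratings at or below this count as negative (from the slide)


def label_review(score: float) -> Optional[str]:
    """Map a review rating to a sentiment label, or None for mid-range ratings."""
    if score >= POSITIVE_MIN:
        return "positive"
    if score <= NEGATIVE_MAX:
        return "negative"
    # Assumption: ratings strictly between the thresholds are left unlabeled.
    return None


ratings = [5.0, 4.5, 3.0, 2.0, 1.0]
labels = [label_review(r) for r in ratings]
```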


Features - text representation

  • Text representation schemes

    • Word-Based Unigram (WBU), widely used

    • Word-Based Bigram (WBB)

    • Chinese Character-Based Bigram (CBB)

    • Chinese Character-Based Trigram (CBT)

Table 1. Statistics of training set with four text representation schemes
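
The four schemes can be illustrated on a single short sentence. This is a sketch under stated assumptions: the example sentence and its word segmentation are made up, the helper names `word_ngrams` and `char_ngrams` are hypothetical, and joining word n-grams with "_" is only a display choice. Word-based schemes need a Chinese word segmenter first; character-based schemes do not.

```python
from typing import List, Sequence


def word_ngrams(tokens: Sequence[str], n: int) -> List[str]:
    """Contiguous n-grams over segmented words, joined with '_'."""
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text: str, n: int) -> List[str]:
    """Contiguous n-grams over characters, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


# Hypothetical pre-segmented review: "酒店 很 干净" ("the hotel is very clean").
words = ["酒店", "很", "干净"]
text = "".join(words)

wbu = word_ngrams(words, 1)  # word-based unigrams
wbb = word_ngrams(words, 2)  # word-based bigrams
cbb = char_ngrams(text, 2)   # character-based bigrams
cbt = char_ngrams(text, 3)   # character-based trigrams
```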


Features - representation in a graph model

[Figure: Features representation (n=2) in a graph model. Node labels: D; f1, f2, ..., fk-1; x1, x2, x3, ..., xk-1, xk.]


Features - weight


Performance Comparison - methods

  • Support Vector Machines (SVM)

  • Naive Bayes (NB)

  • Maximum Entropy (ME)

  • Artificial Neural Network (ANN)

    • Two-layer feed-forward network

  • Baseline: Naive Counting

    • Predicts by comparing the number of positive and negative sentiment words.

    • Heavily depends on the sentiment dictionary

    • Micro-averaged F1 0.7931, macro-averaged F1 0.7573.
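
The Naive Counting baseline can be sketched as below. The two word lists are tiny hypothetical stand-ins for a real sentiment dictionary, and the tie-breaking default is an assumption; the slides do not specify either.

```python
from typing import FrozenSet, Iterable

# Hypothetical toy dictionaries; a real run would load a sentiment lexicon.
POSITIVE_WORDS: FrozenSet[str] = frozenset({"干净", "舒适", "满意"})
NEGATIVE_WORDS: FrozenSet[str] = frozenset({"脏", "吵", "失望"})


def naive_counting(
    tokens: Iterable[str],
    positive: FrozenSet[str] = POSITIVE_WORDS,
    negative: FrozenSet[str] = NEGATIVE_WORDS,
    default: str = "positive",
) -> str:
    """Predict sentiment by comparing counts of positive and negative words."""
    tokens = list(tokens)
    pos = sum(1 for t in tokens if t in positive)
    neg = sum(1 for t in tokens if t in negative)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    # Assumption: ties (including no dictionary hits) fall back to a default.
    return default


clean_room = naive_counting(["房间", "很", "干净", "舒适"])
noisy_room = naive_counting(["房间", "脏", "吵", "干净"])
no_hits = naive_counting(["房间", "很", "大"])
```

Because every prediction comes from dictionary hits, a review whose sentiment words are missing from the lists always falls through to the default, which is one way to see why the baseline depends so heavily on the dictionary.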


Performance Comparison - WBU

SVM, NB, ME, ANN using WBU as features with different feature weights


Performance Comparison - WBU

Four methods using WBU as features


Performance Comparison - WBB

Four methods using WBB as features


Performance Comparison - CBB & CBT

Four methods using CBB as features

Four methods using CBT as features


Performance Comparison


Analysis

  • On average, NB outperforms all the other classifiers when using WBB and CBT

    • N-gram based features relax the conditional independence assumption of the Naive Bayes model

    • They capture more integral semantic content

  • People tend to use combinations of words to express positive and negative sentiment.
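
A small illustration of the last point, using two made-up segmented reviews. The helper name `ngram_set` is hypothetical. With unigrams, the sentiment word 满意 ("satisfied") appears in both reviews; with bigrams, the negated and intensified forms become distinct features.

```python
from typing import Sequence, Set


def ngram_set(tokens: Sequence[str], n: int) -> Set[str]:
    """Set of contiguous word n-grams, joined with '_'."""
    return {"_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


# Hypothetical reviews, already segmented into words.
neg_review = ["服务", "不", "满意"]  # "not satisfied with the service"
pos_review = ["服务", "很", "满意"]  # "very satisfied with the service"

# Unigram features shared by both reviews, including the sentiment word itself.
shared_unigrams = ngram_set(neg_review, 1) & ngram_set(pos_review, 1)

# Bigram features shared by both reviews: none, the polarity cue is kept apart.
shared_bigrams = ngram_set(neg_review, 2) & ngram_set(pos_review, 2)
```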


Conclusion

  • (1) On average, NB outperforms all the other classifiers when using WBB and CBT as the text representation scheme with Boolean weighting, under different feature dimensionalities reduced by chi-max, and it is more stable than the others.

  • (2) Compared with WBU, WBB and CBB carry stronger meaning as semantic units for classifiers.

  • (3) Most of the time, tfidf-c is much better for SVM and ME.

  • (4) Considering that SVM achieves the best performance under all conditions and is the most popular method, we recommend using WBB or CBB to represent text, with tfidf-c as the feature weighting, to obtain better performance relative to WBU.
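
The two weighting schemes named above can be sketched as follows. The transcript does not define tfidf-c, so this sketch assumes it means tf-idf with cosine (unit-length) normalization; the function names, the natural-log idf, and the toy corpus are all hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Mapping


def bool_weights(doc: List[str]) -> Dict[str, float]:
    """Boolean weighting: 1.0 for every feature present in the document."""
    return {t: 1.0 for t in set(doc)}


def tfidf_cosine_weights(
    doc: List[str], doc_freq: Mapping[str, int], n_docs: int
) -> Dict[str, float]:
    """tf * log(N / df), scaled so the document vector has unit length."""
    tf = Counter(doc)
    raw = {
        t: count * math.log(n_docs / doc_freq[t])
        for t, count in tf.items()
        if doc_freq.get(t, 0) > 0
    }
    norm = math.sqrt(sum(w * w for w in raw.values()))
    if norm == 0.0:
        return raw  # every feature occurs in every document
    return {t: w / norm for t, w in raw.items()}


# Hypothetical corpus of three reviews represented as word bigrams (WBB).
corpus = [
    ["酒店_很", "很_干净"],
    ["酒店_很", "很_吵"],
    ["很_干净", "很_干净", "酒店_不"],
]
doc_freq = Counter(t for doc in corpus for t in set(doc))

boolean = bool_weights(corpus[2])
weights = tfidf_cosine_weights(corpus[2], doc_freq, len(corpus))
```

In this toy corpus the rarer bigram 酒店_不 ends up with a larger weight than 很_干净 even though it occurs less often in the document, which is the behavior Boolean weighting cannot express.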


Thank you!

Q & A

Dataset and software are available at http://nlp.csai.tsinghua.edu.cn/~lj/

