
Experimental Study on Sentiment Classification of Chinese Review using Machine Learning Techniques

Jun Li and Maosong Sun

Department of Computer Science and Technology

Tsinghua University, Beijing, China

IEEE NLP-KE 2007

Outline
  • Introduction
  • Corpus
  • Features
  • Performance Comparison
  • Analysis and Conclusion
Introduction
  • Why do we perform the task ?
    • Much of the attention has centered on feature-based sentiment extraction
    • Sentence-level analysis is useful, but it involves complex processing and is usually format-dependent (Liu et al., WWW'05)
  • Sentiment Classification using machine learning techniques
    • based on the overall sentiment of a text
    • Easily transfers to new domains given a training set.
    • Applications:
      • Split reviews into positive and negative sets
      • Monitor bloggers' mood trends
      • Filter subjective web pages
Corpus
  • From www.ctrip.com
  • Average length 69.6 words with std 89.0
    • 90% of the reviews are less than 155 words
    • Reviews include some English words
Review rating distribution & score threshold
  • 4.5 and up are considered positive, 2.0 and below are considered negative.
  • 12,000 reviews as training set, 4,000 reviews as test set
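The threshold rule above can be sketched as a small helper. A minimal sketch, assuming mid-range reviews (between 2.0 and 4.5) are simply discarded, since the slides only define the two cut-offs; the function and label names are illustrative:

```python
def rating_to_label(rating):
    """Map a ctrip.com review rating to a sentiment label.

    Ratings of 4.5 and up count as positive, 2.0 and below as negative;
    mid-range reviews are assumed to be dropped from the corpus.
    """
    if rating >= 4.5:
        return "positive"
    if rating <= 2.0:
        return "negative"
    return None  # neither clearly positive nor negative
```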
Features – text representation
  • Text representation schemes
    • Word-Based Unigram (WBU), widely used
    • Word-Based Bigram (WBB)
    • Chinese Character-Based Bigram (CBB)
    • Chinese Character-Based Trigram (CBT)

Table 1. Statistics of the training set under the four text representation schemes
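The four schemes amount to plain n-gram extraction over words or characters. A minimal sketch, assuming the review is already word-segmented (the slides do not specify a segmenter); `extract_features` is a hypothetical helper name:

```python
def ngrams(units, n):
    """All contiguous n-grams (as tuples) over a sequence of units."""
    return [tuple(units[i:i + n]) for i in range(len(units) - n + 1)]

def extract_features(words, scheme):
    """Extract features from a segmented review under one of the four
    text representation schemes."""
    chars = list("".join(words))  # character sequence for CBB/CBT
    if scheme == "WBU":           # Word-Based Unigram
        return ngrams(words, 1)
    if scheme == "WBB":           # Word-Based Bigram
        return ngrams(words, 2)
    if scheme == "CBB":           # Chinese Character-Based Bigram
        return ngrams(chars, 2)
    if scheme == "CBT":           # Chinese Character-Based Trigram
        return ngrams(chars, 3)
    raise ValueError("unknown scheme: " + scheme)
```

For example, the segmented review `["服务", "很", "好"]` yields three WBU features, two WBB features, three CBB features, and two CBT features.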

Features – representation in a graph model

[Figure: features representation (n = 2) in a graph model — a document node D with feature nodes f1 … fk-1 over token nodes x1 … xk.]

Performance Comparison - methods
  • Support Vector Machines (SVM)
  • Naïve Bayes (NB)
  • Maximum Entropy (ME)
  • Artificial Neural Network (ANN)
    • two-layer feed-forward
  • Baseline: Naive Counting
    • Predict by comparing the numbers of positive and negative sentiment words.
    • Heavily depends on the sentiment dictionary
    • micro-averaging F1 0.7931, macro-averaging F1 0.7573.
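The Naive Counting baseline can be sketched as follows. The dictionary sets and the tie behavior are assumptions; the slides only say that prediction compares counts of sentiment words:

```python
def naive_counting(tokens, pos_words, neg_words):
    """Predict sentiment by comparing how many tokens appear in the
    positive vs. the negative sentiment dictionary."""
    pos = sum(1 for t in tokens if t in pos_words)
    neg = sum(1 for t in tokens if t in neg_words)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "tie"  # assumption: the slides do not say how ties are broken
```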
Performance Comparison - WBU

SVM, NB, ME, ANN using WBU as features with different feature weights
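The feature weights compared here can be sketched as follows. Reading "bool" as presence/absence and "tfidf-c" as tf-idf with cosine (length) normalization is an assumption based on common usage in the text classification literature; the function names are illustrative:

```python
import math

def bool_weight(tf):
    """Boolean weighting: 1 if the feature occurs in the document at all."""
    return 1.0 if tf > 0 else 0.0

def tfidf(tf, df, n_docs):
    """Raw tf-idf for one feature (df = document frequency)."""
    return tf * math.log(n_docs / df) if tf > 0 and df > 0 else 0.0

def tfidf_c(doc_vec, df, n_docs):
    """tf-idf with cosine normalization over a whole document vector.

    doc_vec maps feature -> term frequency; df maps feature -> document
    frequency over the training set.
    """
    w = {f: tfidf(tf, df[f], n_docs) for f, tf in doc_vec.items()}
    norm = math.sqrt(sum(x * x for x in w.values()))
    return {f: (x / norm if norm else 0.0) for f, x in w.items()}
```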

Performance Comparison - WBU

Four methods using WBU as features

Performance Comparison - WBB

Four methods using WBB as features

Performance Comparison – CBB & CBT

Four methods using CBB as features

Four methods using CBT as features

Analysis
  • On average, NB outperforms all the other classifiers using WBB and CBT
    • N-gram-based features relax the conditional independence assumption of the Naive Bayes model
    • They capture integral semantic content
  • People tend to use combinations of words to express positive and negative sentiment.
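The NB-with-bigrams observation can be made concrete with a minimal multinomial Naive Bayes over pre-extracted features. A sketch with add-one smoothing, which is an assumption — the slides do not state the exact smoothing or model variant used:

```python
import math
from collections import Counter

def train_nb(docs):
    """Train a multinomial Naive Bayes with add-one smoothing.

    docs is a list of (features, label) pairs, where features is a list
    of e.g. word bigrams (WBB). Returns a predict(features) function.
    """
    label_counts = Counter(label for _, label in docs)
    feat_counts = {label: Counter() for label in label_counts}
    vocab = set()
    for feats, label in docs:
        feat_counts[label].update(feats)
        vocab.update(feats)

    def predict(feats):
        best_label, best_lp = None, float("-inf")
        for label, n_label in label_counts.items():
            lp = math.log(n_label / len(docs))  # class prior
            total = sum(feat_counts[label].values())
            for f in feats:                     # smoothed feature likelihoods
                lp += math.log((feat_counts[label][f] + 1) / (total + len(vocab)))
            if lp > best_lp:
                best_label, best_lp = label, lp
        return best_label

    return predict
```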
Conclusion
  • (1) On average, NB outperforms all the classifiers when using WBB or CBT as the text representation scheme with Boolean weighting under different feature dimensionalities reduced by chi-max, and is more stable than the others.
  • (2) Compared with WBU, WBB and CBB carry stronger meaning as semantic units for classifiers.
  • (3) In most cases, tfidf-c works much better for SVM and ME.
  • (4) Since SVM achieves the best performance under all conditions and is the most popular method, we recommend using WBB or CBB to represent text with tfidf-c as the feature weighting to obtain better performance relative to WBU.
Thank you!

Q & A

Dataset and software are available at http://nlp.csai.tsinghua.edu.cn/~lj/
