Sentiment analysis

elon

Presentation Transcript


  1. Sentiment analysis. Or, how to find happiness.

  2. Why do we want sentiment info?
     • Useful input for detection
       • Brand sentiment
     • Useful input for prediction
       • Stock market, box office revenues, political outcomes
       • Potentially for social uprisings, terrorist incidents

  3. What do you really want to know?

  4. Brand satisfaction

  5. Quality of life

  6. Abstract predictor

  7. Three considerations for a sentiment analysis system
     • Data cleaning
     • One piece of the puzzle
     • Simple works best

  8. Data cleaning (Because it’s a dirty world)

  9. Data cleaning: on Twitter…
     • Spam accounts
     • Bots (weather, sport, etc.)
     Answer:
     a) http://trst.me/ (from infochimps)
     b) Make your own system

  10. Data cleaning: from sentences to words
      • Tokenize the sentence(s) into words. (This may not be as easy as it seems.)
      • Maybe remove stop words and/or apply stemming, depending on the application.
      • Pick a minimum number of times a word must appear in the training set; ignore any word seen less often.
      • Build a dictionary of words.
      Answer:
      a) Twokenize.py
      b) Write your own
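
The tokenize, threshold, and dictionary steps on this slide can be sketched roughly as follows. This is a minimal illustration of the "write your own" option, not Twokenize.py: the regular expressions, the function names `tokenize` and `build_vocabulary`, and the default `min_count=2` are all assumptions made for the example. Stop-word removal and stemming are left out.

```python
import re
from collections import Counter

# Hypothetical token patterns, tried in order. A real Twitter tokenizer
# handles far more cases (this is only a sketch).
_PATTERNS = [
    r"https?://\S+",      # URLs
    r"[@#]\w+",           # @mentions and #hashtags
    r"[:;=][-']?[()/]",   # a few emoticons such as :) :-( :/
    r"<3|\^_\^",          # heart and ^_^
    r"\w+(?:'\w+)?",      # words, optionally with an apostrophe part
]
TOKEN_RE = re.compile("|".join(_PATTERNS))


def tokenize(text):
    """Lowercase the text and split it into word, emoticon, tag and URL tokens."""
    return TOKEN_RE.findall(text.lower())


def build_vocabulary(texts, min_count=2):
    """Map each token seen at least min_count times to an integer index."""
    counts = Counter(tok for text in texts for tok in tokenize(text))
    kept = sorted(tok for tok, n in counts.items() if n >= min_count)
    return {tok: i for i, tok in enumerate(kept)}


if __name__ == "__main__":
    tweets = ["good day :)", "good night", "bad day :("]
    print(tokenize("I love this :) http://t.co/x #happy"))
    print(build_vocabulary(tweets))
```

Raising `min_count` shrinks the dictionary and drops rare, noisy tokens; the right value depends on how much training data is available.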

  11. One piece of the puzzle

  12. Always make it part of a system
      • When it’s wrong (and this is quite often) it will be very obviously wrong
      • People don’t need to see this
      • This doesn’t actually detract from the utility of the system

  13. Success:
      • Tracking political polls
      • Predicting box office revenues
      • Predicting the stock market

  14. Simple works best (for now)

  15. The quick version
      • Use a supervised or semi-supervised learning method.
      • For most cases I would recommend Naïve Bayes on the bag-of-words representation. It is very simple to implement and gives near-best performance.
      • If you don’t have any examples of happy/sad tweets (for your purpose), use known keywords, such as emoticons.
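
To make the Naïve Bayes recommendation concrete, here is a minimal sketch of a multinomial Naïve Bayes classifier over bag-of-words token lists, using only the standard library. The class name `NaiveBayes`, the labels "pos"/"neg", and the add-one (Laplace) smoothing parameter `alpha` are choices made for this example; the slides do not specify a particular variant.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over token lists (illustrative sketch)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive smoothing; 1.0 is add-one smoothing

    def fit(self, docs, labels):
        """docs is a list of token lists, labels the matching class labels."""
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for c in self.word_counts.values() for w in c}
        self.total_words = {
            lab: sum(self.word_counts[lab].values()) for lab in self.class_counts
        }
        return self

    def log_scores(self, tokens):
        """Return log P(class) + sum of log P(token | class) for each class."""
        n_docs = sum(self.class_counts.values())
        scores = {}
        for label, n in self.class_counts.items():
            score = math.log(n / n_docs)
            denom = self.total_words[label] + self.alpha * len(self.vocab)
            for tok in tokens:
                if tok in self.vocab:  # tokens never seen in training are skipped
                    count = self.word_counts[label][tok]
                    score += math.log((count + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, tokens):
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)


if __name__ == "__main__":
    docs = [["love", "this"], ["great", "day"], ["hate", "this"], ["awful", "day"]]
    labels = ["pos", "pos", "neg", "neg"]
    model = NaiveBayes().fit(docs, labels)
    print(model.predict(["love", "great"]))
    print(model.predict(["hate", "awful"]))
```

Working in log space avoids numerical underflow on longer texts, and the smoothing term keeps a word that never appeared with one class from zeroing out that class entirely.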

  16. :)

  17. ^_^

  18. :(

  19. <3

  20. :/
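
One way to turn the emoticons above into training labels is sketched below. The grouping is an assumption made for this example (":)", "^_^" and "<3" as positive, ":(" and ":/" as negative), as are the function names. Tweets with no emoticon, or with both kinds, are skipped, and the emoticons are removed from the text so a classifier trained on the result cannot simply memorize them.

```python
# Assumed polarity of the emoticons shown on the slides.
POSITIVE = {":)", "^_^", "<3"}
NEGATIVE = {":(", ":/"}


def label_by_emoticon(text):
    """Return "pos", "neg", or None when the emoticons are absent or mixed."""
    # Splitting on whitespace means the ":/" inside "http://" is not counted,
    # but an emoticon glued to a word (e.g. "great:)") is missed.
    tokens = text.split()
    has_pos = any(t in POSITIVE for t in tokens)
    has_neg = any(t in NEGATIVE for t in tokens)
    if has_pos == has_neg:
        return None
    return "pos" if has_pos else "neg"


def build_training_set(tweets):
    """Return (tokens, label) pairs for tweets with an unambiguous emoticon."""
    emoticons = POSITIVE | NEGATIVE
    examples = []
    for text in tweets:
        label = label_by_emoticon(text)
        if label is None:
            continue
        words = [t for t in text.split() if t not in emoticons]
        examples.append((words, label))
    return examples


if __name__ == "__main__":
    tweets = ["great day :)", "no emoticon here", "ugh :/", "see http://x.com/a"]
    for words, label in build_training_set(tweets):
        print(label, words)
```

Labels produced this way are noisy, so it is worth spot-checking a sample by hand before training on them.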

  21. Things that don’t really help (generally less than 2% improvement)
      • More advanced classifiers (e.g. SVMs)
      • Part-of-speech tagging
      • Parse trees
      • Semi-supervised methods, if you have very large amounts of data

  22. The formula for happiness

  23. Basic positive/negative Twitter sentiment word list
      • http://alexdavies.net/projects/twitter-sentiment-word-lists/

  24. Thanks.
