Subjectivity and Sentiment Analysis of Arabic Tweets with Limited Resources

Supervisor: Dr. Verena Rieser. Presented by Eshrag Refaee. OSACT, 27 May 2014.

Presentation Transcript


  1. Subjectivity and Sentiment Analysis of Arabic Tweets with Limited Resources. Supervisor: Dr. Verena Rieser. Presented by Eshrag Refaee. OSACT, 27 May 2014

  2. Outline 1. Introduction • The concept of subjectivity and sentiment analysis (SSA) • Motivations and challenges of SSA for Arabic • Previous work on SSA of Arabic social networks 2. Experimental setup • Twitter corpus: collection and annotation • Evaluation metrics • Machine learners 3. Results and Error Analysis 4. Summary and future work

  3. Subjectivity and Sentiment analysis (SSA) • Definition: Analysing and understanding people’s sentiments, evaluations, opinions, attitudes, and emotions from written text.

  4. Hierarchical Model of Subjectivity and Sentiment analysis (SSA)

  5. SSA and Social Networks • The growing importance of sentiment analysis coincides with the growth of social media such as micro-blogs.

  6. SSA and Twitter • Twitter (Statistic Brain, 2014) • Since March 2012, Twitter has been available in Arabic (Twitter Blog, 2012)

  7. About Arabic • Arabic is the language of over 422 million people • Arabic can be classified into three major levels (Habash, 2010): • Classical Arabic (CA) • Modern Standard Arabic (MSA) • Arabic Dialects (AD). Used side by side in social networks.

  8. Challenges with Respect to Arabic • Limited availability of NLP resources for dialectal Arabic (DA). • Noisy features. • No large-scale Arabic Twitter corpus annotated for SSA is publicly available. • Sparse labelled data. • BUT: Lots of unlabelled data!

  9. Challenges With Respect to Twitter • ‘Bad language’ (Eisenstein, 2013) • Unclear sentiment indicator • Dynamic nature / topic-shifting (Go et al., 2009) • Examples: المساواة في قمع الحريات الشخصية عدل (“Equality in suppressing personal freedom is justice”); “ew”, “ugh” instead of “disgusting”; “bro” instead of “brother”

  10. Previous Work on SSA of Arabic Tweets • Word-based features. • SVM shown to perform best (large feature sets) • Evaluation: • 10-fold cross-validation • Held-out test set from same corpus • No test for unseen topics / scalability for topic shift!

  11. Outline 1. Introduction • Motivations and challenges of subjectivity and sentiment analysis (SSA) for Arabic • Previous work on SSA of Arabic social networks 2. Experimental setup • Twitter corpus: collection and annotation • Evaluation metrics • Machine learners 3. Results and Error Analysis 4. Summary and future work

  12. Methodology and Approach [Diagram labels: un-labelled tweets; human annotators; gold-standard labelled tweets; Arabic ALP tools; features; train machine learning scheme (SVM classifier); model evaluation on a manually-annotated held-out test set]

  13. Arabic Twitter SSA Corpora • Data Collection • Twitter Search Application Programming Interface (API) • Search criteria • Keywords, locations, etc. • Pre-processing • Normalising user-names, URLs, digits, query-terms.
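
The normalisation step on this slide could be sketched roughly as follows. This is a minimal illustration, not the authors' actual script: the placeholder tokens (`URL`, `USER`, `QUERY`, `NUM`), the regular expressions, and the function name `normalise` are all assumptions made for the example.

```python
import re

# Assumed patterns; the deck does not specify how each item was detected.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
USER_RE = re.compile(r"@\w+")
DIGIT_RE = re.compile(r"\d+")  # in Python 3, \d also matches Arabic-Indic digits


def normalise(tweet, query_terms=()):
    """Replace URLs, user-names, query terms and digits with placeholder tokens."""
    text = URL_RE.sub("URL", tweet)      # URLs first, since they may contain digits
    text = USER_RE.sub("USER", text)
    for term in query_terms:             # search keywords used to collect the tweet
        text = text.replace(term, "QUERY")
    text = DIGIT_RE.sub("NUM", text)
    return " ".join(text.split())        # squeeze whitespace


print(normalise("@ahmed see http://t.co/abc123 at 10"))
```

Replacing the query terms keeps the search keywords themselves from becoming trivially predictive features, which is presumably why the slide lists them alongside user-names and URLs.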

  14. Arabic Twitter SSA Corpora: Gold Standard Data Set • Manually annotated for sentiment analysis (total=3,309) • 2 native speaker annotators (weighted Kappa=0.76)
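
For readers unfamiliar with the agreement figure quoted above, here is a small sketch of a weighted Cohen's kappa for two annotators. The slide does not say which weighting scheme was used or how the labels were ordered, so the linear weights and the `negative < neutral < positive` ordering below are assumptions for illustration only.

```python
def weighted_kappa(rater_a, rater_b, categories):
    """Linearly weighted Cohen's kappa for two raters over ordered categories."""
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Observed joint proportions of (label by A, label by B).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1 / n

    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)          # linear disagreement weight (assumed)
            weighted_observed += w * observed[i][j]
            weighted_expected += w * row[i] * col[j]
    return 1.0 - weighted_observed / weighted_expected


labels = ["negative", "neutral", "positive"]  # assumed ordering
a = ["negative", "neutral", "positive", "positive"]
print(weighted_kappa(a, a, labels))
```

A value of 1 means perfect agreement and 0 means agreement no better than chance, so the reported 0.76 sits well above chance.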

  15. Arabic Twitter SSA Corpora: Held-out Test Set • 963 tweets were manually annotated for evaluating the trained models.

  16. Arabic Twitter SSA Corpora • Examples of annotated tweets

  17. Features Extraction

  18. SSA Classification: Problem Formulations
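
The deck's experimental settings describe two formulations: a two-stage binary cascade (subjective vs. objective, then positive vs. negative) and a one-stage multi-class classifier (positive vs. negative vs. neutral). The cascade can be sketched as below. The keyword-based stand-in classifiers are purely hypothetical placeholders for trained models, and mapping "objective" to the label "neutral" is an assumption made so the two formulations share one label set.

```python
def two_stage_predict(text, is_subjective, is_positive):
    """Cascade: subjective vs. objective first, then positive vs. negative."""
    if not is_subjective(text):
        return "neutral"                 # objective tweets get no polarity label
    return "positive" if is_positive(text) else "negative"


# Hypothetical stand-ins for the two trained binary classifiers.
POSITIVE_CUES = {"good", "great"}
NEGATIVE_CUES = {"bad", "awful"}


def toy_is_subjective(text):
    words = set(text.lower().split())
    return bool(words & (POSITIVE_CUES | NEGATIVE_CUES))


def toy_is_positive(text):
    return bool(set(text.lower().split()) & POSITIVE_CUES)


for tweet in ["great match", "awful traffic", "the match starts at noon"]:
    print(tweet, "->", two_stage_predict(tweet, toy_is_subjective, toy_is_positive))
```

In the one-stage formulation a single three-way classifier would replace both stand-ins, so errors from the first stage cannot propagate into the second.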

  19. Machine Learning Classifiers • Support Vector Machines (SVM): Sequential Minimal Optimization (SMO) (Platt, 1999) • Majority baseline: ZeroR • SVM aims to identify the optimal hyperplane that linearly separates data instances with the maximum margin (Hsu et al., 2003)
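
The majority baseline named on this slide is simple enough to sketch in a few lines: it ignores the features entirely and always predicts the most frequent training label. The class below is an illustrative re-implementation with assumed method names, not the toolkit code used in the experiments.

```python
from collections import Counter


class ZeroR:
    """Majority-class baseline: always predict the most frequent training label."""

    def fit(self, labels):
        # most_common(1) returns [(label, count)] for the top label.
        self.majority_ = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, instances):
        return [self.majority_ for _ in instances]


baseline = ZeroR().fit(["neutral", "neutral", "positive", "negative"])
print(baseline.predict(["tweet one", "tweet two"]))
```

Any trained classifier, including the SVM, should beat this baseline before its score is considered meaningful, which matters most when the class distribution is skewed.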

  20. Evaluation Metrics • F-measure • Accuracy • Significant differences: t-test with p < 0.05
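
As a reminder of how the two metrics are computed, here is a minimal sketch. The slide does not state whether the F-measure was reported per class or averaged across classes, so the per-class version with `beta=1.0` below is an assumption.

```python
def accuracy(gold, predicted):
    """Fraction of instances whose predicted label matches the gold label."""
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


def f_measure(gold, predicted, label, beta=1.0):
    """F-measure for one class: weighted harmonic mean of precision and recall."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == label and p == label for g, p in pairs)
    fp = sum(g != label and p == label for g, p in pairs)
    fn = sum(g == label and p != label for g, p in pairs)
    if tp == 0:
        return 0.0                        # avoids division by zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


gold = ["pos", "pos", "neg", "neg"]
pred = ["pos", "neg", "neg", "neg"]
print(accuracy(gold, pred), f_measure(gold, pred, "neg"))
```

Accuracy alone can look high on skewed data, which is one reason to report the F-measure alongside it.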

  21. Outline 1. Introduction • Motivations and challenges of subjectivity and sentiment analysis (SSA) for Arabic • Previous work on SSA of Arabic social networks 2. Experimental setup • Twitter corpus: collection and annotation • Evaluation metrics • Machine learners 3. Results and Error Analysis 4. Summary and future work

  22. Results and evaluation

  23. Error Analysis • The most predictive word uni-grams in the two datasets as evaluated by chi-squared

  24. Error Analysis • The most predictive word uni-grams in the two datasets as evaluated by chi-squared
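
To make the chi-squared ranking concrete, the sketch below scores one word against one class using the standard 2x2 contingency table (word present or absent, tweet in class or not). The tiny data set and the function name are made up for illustration; the deck's actual tables are not reproduced in this transcript.

```python
def chi_squared(docs, labels, word, target):
    """Chi-squared statistic for the association between a word and a class."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_word = word in tokens
        in_class = label == target
        if has_word and in_class:
            a += 1                        # word present, target class
        elif has_word:
            b += 1                        # word present, other class
        elif in_class:
            c += 1                        # word absent, target class
        else:
            d += 1                        # word absent, other class
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0                        # word or class never (or always) occurs
    return n * (a * d - b * c) ** 2 / denom


docs = [["good", "day"], ["good"], ["bad"], ["bad", "day"]]
labels = ["pos", "pos", "neg", "neg"]
for w in ["good", "day"]:
    print(w, chi_squared(docs, labels, w, "pos"))
```

Sorting the vocabulary by this score gives the "most predictive uni-grams" view; comparing the top words across two datasets is one way to expose topic shift.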

  25. Current Work • A large-scale Arabic Twitter SSA corpus: distant supervision (DS) data set [Diagram labels: un-labelled tweets; noisy labels: #hashtags &; automatically-labelled tweets; Arabic ALP tools; features; train machine learning scheme (learn SVM classifier); model evaluation on a manually-annotated test set] Reference: Refaee and Rieser (2014). Can we Read Emotions from a smiley face? Emoticon-based distant supervision for subjectivity and sentiment analysis of Arabic Twitter feeds. In the 5th International Workshop on Emotion, Social Signals, Sentiment and Linked Open Data.
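
The idea of distant supervision with noisy labels can be sketched as follows: tweets containing a sentiment cue are labelled automatically, and everything else stays unlabelled. The emoticon lists, the decision to skip tweets with conflicting cues, and the removal of the cue from the text are all assumptions for this example; the deck does not list the cues or rules actually used.

```python
# Hypothetical cue lists; the deck does not enumerate the actual emoticons or hashtags.
POSITIVE_EMOTICONS = (":)", ":-)", ":D")
NEGATIVE_EMOTICONS = (":(", ":-(")


def noisy_label(tweet):
    """Return 'positive', 'negative', or None when there is no clear cue."""
    has_pos = any(e in tweet for e in POSITIVE_EMOTICONS)
    has_neg = any(e in tweet for e in NEGATIVE_EMOTICONS)
    if has_pos == has_neg:                # no cue, or conflicting cues
        return None
    return "positive" if has_pos else "negative"


def strip_cues(tweet):
    """Remove the cues so a classifier cannot simply memorise them (assumed step)."""
    for e in POSITIVE_EMOTICONS + NEGATIVE_EMOTICONS:
        tweet = tweet.replace(e, " ")
    return " ".join(tweet.split())


def build_ds_corpus(tweets):
    """Keep only tweets that received a noisy label, paired with that label."""
    corpus = []
    for tweet in tweets:
        label = noisy_label(tweet)
        if label is not None:
            corpus.append((strip_cues(tweet), label))
    return corpus


print(build_ds_corpus(["nice day :)", "stuck in traffic :(", "plain news"]))
```

The resulting labels are noisy by construction, which is why the slide still evaluates on a manually annotated test set.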

  26. Please come and see my poster on May 29, 11:45-13:25. Session: Social Media Processing, P 32, No. 317

  27. Thanks. Looking forward to hearing your feedback … Or contact me at Eaar1@hw.ac.uk or @eshragR

  28. DS for SSA of social networks in other languages

  29. Example of annotation disagreement

  30. Methodology and Approach [Diagram labels: un-labelled tweets; noisy labels: #hashtags &; automatically-labelled tweets; Arabic ALP tools; features; train machine learning scheme (learn SVM classifier); model evaluation on a manually-annotated test set]

  31. Approach and methodology

  32. Experimental settings • Pre-processing: remove re-tweets; normalise Latin characters, digits, URLs, user-names, and hashtags; reduce any run of more than 2 consecutive repeated characters to 2; apply a light Arabic stemmer; remove stop words • Problem formulations: two-stage binary classification (subjective vs. objective, then positive vs. negative); one-stage multi-class classification (positive vs. negative vs. neutral)
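
Two of the pre-processing steps listed above, removing re-tweets and shortening elongated words, can be sketched as below. Detecting re-tweets by an `RT ` prefix is an assumption (the deck does not say how they were identified), and the stemming and stop-word steps are left out because they depend on Arabic-specific resources not named here.

```python
import re

# A character followed by two or more copies of itself, i.e. a run of 3+.
REPEAT_RE = re.compile(r"(.)\1{2,}")


def collapse_repeats(text):
    """Reduce any run of more than 2 identical consecutive characters to 2."""
    return REPEAT_RE.sub(r"\1\1", text)


def is_retweet(tweet):
    """Assumed heuristic: treat tweets starting with 'RT ' as re-tweets."""
    return tweet.startswith("RT ")


def preprocess(tweets):
    """Drop re-tweets, then collapse elongated character runs."""
    return [collapse_repeats(t) for t in tweets if not is_retweet(t)]


print(preprocess(["RT @user hello", "sooooo nice", "جميييل"]))
```

Keeping two characters rather than one preserves the signal that a word was elongated, which can itself carry emphasis, while still merging "sooo" and "sooooo" into one feature.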

  33. DS for SSA of social networks in other languages

  34. DS for SSA of social networks in other languages 73.81%
