Weather and Tweets UCML 2013

Weather and Tweets UCML 2013. Members: Vinh Dang, Wai I Iong, Matthew Dudley, Jiyuan Li. Background: analyzing weather-related tweets to determine whether each tweet has a positive, negative, or neutral sentiment, and whether the weather occurred in the past, present, or future.

Presentation Transcript


  1. Weather and Tweets UCML 2013 Members: Vinh Dang, Wai I Iong, Matthew Dudley, Jiyuan Li

  2. Background • Analyzing tweets related to the weather to determine: • whether the tweet has a positive, negative, or neutral sentiment • whether the weather occurred in the past, present, or future • what kind of weather the tweet references

  3. The data • Training set (http://www.kaggle.com/c/crowdflower-weather-twitter) • contains tweets, locations, and a confidence score for each of 24 possible labels • about 78,000 tweets

  4. The data • Labels: sentiment (s1 to s5), when (w1 to w4), kind (k1 to k15) • s1 + s2 + s3 + s4 + s5 = 1 • w1 + w2 + w3 + w4 = 1 • k1 + k2 + … + k15 may be greater than 1
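The label constraints on this slide can be checked mechanically. The following is a minimal sketch; the function name `check_label_row` and the example scores are made up for illustration and are not taken from the data set.

```python
import math

def check_label_row(sentiment, when, kind, tol=1e-6):
    """Return True if one tweet's scores satisfy the constraints on the slide.

    sentiment: 5 scores (s1..s5) that must sum to 1
    when:      4 scores (w1..w4) that must sum to 1
    kind:      15 scores (k1..k15) with no constraint on their sum
    """
    return (
        len(sentiment) == 5
        and len(when) == 4
        and len(kind) == 15
        and math.isclose(sum(sentiment), 1.0, abs_tol=tol)
        and math.isclose(sum(when), 1.0, abs_tol=tol)
    )

# Hypothetical scores for a single tweet (invented values).
row_s = [0.0, 0.2, 0.0, 0.8, 0.0]
row_w = [0.0, 1.0, 0.0, 0.0]
row_k = [0.0] * 15
row_k[9] = 1.0   # a tweet may reference more than one kind of weather,
row_k[12] = 0.6  # so the kind scores can sum to more than 1

print(check_label_row(row_s, row_w, row_k))
```

Because the kind scores are not forced to sum to 1, a tweet can carry several weather kinds at once, which is why only the first two groups are checked.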

  5. The data • Testing set: • contains the id, tweet, state, and location • no “sentiment”, “when”, or “kind” labels; these are what we need to predict • about 42,000 tweets

  6. Data Preprocessing • Data “normalizing”: • convert HTML entities into characters (e.g., &gt; → >) • convert all hyperlinks in the testing set into “{link}” • Tokenizing, for example: “What a bright sunny!” → [what, a, bright, sunny, !] • SQLite (for storing data)
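The normalizing and tokenizing steps on this slide could look roughly like the sketch below. It uses only the Python standard library; the regular expressions and the names `normalize` and `tokenize` are assumptions for illustration, not the team's actual code.

```python
import html
import re

# Assumed patterns: a simple URL matcher and a tokenizer that keeps
# the "{link}" placeholder, words, and single punctuation marks.
LINK_RE = re.compile(r"https?://\S+")
TOKEN_RE = re.compile(r"\{link\}|\w+(?:'\w+)?|[^\w\s]")

def normalize(tweet):
    """Decode HTML entities, then replace hyperlinks with "{link}"."""
    text = html.unescape(tweet)
    return LINK_RE.sub("{link}", text)

def tokenize(tweet):
    """Normalize, lowercase, and split a tweet into tokens."""
    return TOKEN_RE.findall(normalize(tweet).lower())

print(tokenize("What a bright sunny!"))
# ['what', 'a', 'bright', 'sunny', '!']
```

Entities are decoded before links are replaced, so an escaped character inside a URL does not split the link into several tokens.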

  7. Methodology • Bag of words • tf-idf • Approaches: 1) Support vector regression (SVR) 2) Ridge regression
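As a rough illustration of the tf-idf plus ridge regression approach, here is a minimal sketch using scikit-learn, which the references cite. The toy tweets, the two-column targets, and `alpha=1.0` are invented for the example; the slides do not state the team's actual features or hyperparameters.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

# Toy data, invented for illustration. Each target row stands in for
# 2 of the 24 label scores (the real task predicts all 24 at once).
tweets = [
    "what a bright sunny day",
    "sunny and warm this morning",
    "cold rain all day",
    "rain and wind tonight",
]
targets = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]

# Bag-of-words counts reweighted by tf-idf, followed by ridge regression.
# Ridge accepts a 2-D target, so one model predicts every label column.
model = make_pipeline(TfidfVectorizer(), Ridge(alpha=1.0))
model.fit(tweets, targets)

pred = model.predict(["sunny morning", "rain tonight"])
print(pred.shape)  # one row per tweet, one column per label
```

An SVR variant would need one regressor per label column, for example by wrapping `sklearn.svm.SVR` in `sklearn.multioutput.MultiOutputRegressor`; whether the team did this through scikit-learn or through LIBSVM directly is not stated in the slides.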

  8. Error Measurement
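The formula on this slide did not survive in the transcript. Since the results are reported as RMSE, the measure was presumably the standard root mean squared error. The notation below is an assumption: n counts every label of every tweet, p_i is a predicted score, and a_i is the corresponding actual score.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( p_i - a_i \right)^2}
```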

  9. Results • Our results: • SVR: RMSE = 0.26149 • Ridge: RMSE = 0.16997 • Others: • The winner: 0.14314 • Baseline (all zeros): 0.31957
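To make the all-zeros comparison concrete, the sketch below computes RMSE for an all-zeros prediction. The `rmse` helper and the tiny `actual` matrix are invented for illustration; the numbers it prints are unrelated to the scores reported above.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error over every label of every tweet.

    Both arguments are lists of rows with the same shape.
    """
    squared = [
        (p - a) ** 2
        for p_row, a_row in zip(predicted, actual)
        for p, a in zip(p_row, a_row)
    ]
    return math.sqrt(sum(squared) / len(squared))

# Invented label scores for two tweets with three labels each.
actual = [[1.0, 0.0, 0.0], [0.0, 0.5, 0.5]]

# The all-zeros baseline predicts 0 for every label.
zeros = [[0.0] * 3 for _ in actual]

print(rmse(zeros, actual))  # 0.5
```

Because most of the 24 label scores for a given tweet are zero, predicting all zeros is a meaningful floor that any trained model should beat.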

  10. Results • A better approach (testing data vs. actual results) • Review of labels

  11. References • CrowdFlower (2013). “Partly Sunny with a Chance of Hashtags.” Kaggle. Retrieved from http://www.kaggle.com/c/crowdflower-weather-twitter • Chang, C.-C. and Lin, C.-J. (2011). LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1-27:27. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm • Pedregosa et al. (2011). Scikit-learn: Machine Learning in Python. JMLR 12, pp. 2825-2830.

  12. Questions? The End