
Tweetool (0. 1 100 version) Final Report

A Twitter Recommend System based on Topic Modeling. Yilei Qian, Computer Science, University of Southern California, qianyilei.usc@gmail.com.


Presentation Transcript


  1. A Twitter Recommend System based on Topic Modeling
     Tweetool (0. 1 100 version) Final Report
     Yilei Qian, Computer Science, University of Southern California
     qianyilei.usc@gmail.com

  2. Ideas
     • Users follow too many accounts on Twitter
     • Too much news every day
     • Cannot find the interesting and valuable news
     • Users don't know the names of the accounts they want to follow
     • Need someone to recommend who to follow
     • Need someone to recommend the hottest news
     • Use topic modeling to re-rank all the users

  3. Traditional Method

  4. Traditional Method

  5. Traditional Method

  6. Topic Modeling

  7. Topic Modeling

  8. Topic Modeling
     • A topic model is a type of statistical model for discovering the abstract "topics" that occur in a collection of documents.
     • Widely used in natural language processing.
     • Reference papers:
       • Steyvers, M. and Griffiths, T., “Probabilistic Topic Models,” Handbook of Latent Semantic Analysis
       • Blei, D. M., Ng, A. Y., and Jordan, M. I., “Latent Dirichlet Allocation,” Journal of Machine Learning Research, 2003

  9. Label-based LDA
     • Steps:
       • Build the LDA model
       • Train a model instance on the training documents
       • Run LDA over all the data using the trained model instance
     • Problems:
       • Punctuation marks, e.g. “ ” , . = { } ( ) …
       • Frequent words, e.g. "I", "you", …
       • Other noise
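The punctuation and frequent-word problems on slide 9 are typically handled by a cleaning pass before the text reaches the topic model. The sketch below is one minimal way to do that in Java; it is not taken from the Tweetool code base. The class name `TweetCleaner`, the method `tokenize`, and the very short stop-word list are all hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: strips punctuation and frequent words before topic modeling.
final class TweetCleaner {
    // Assumed, deliberately tiny stop-word list; a real list would be much longer.
    private static final Set<String> STOP_WORDS = new HashSet<String>(
            Arrays.asList("i", "you", "the", "a", "an", "to", "of", "and", "is"));

    static List<String> tokenize(String tweet) {
        // Lower-case, then replace anything that is not a letter, digit,
        // or whitespace with a space (this removes “ ” , . = { } ( ) and similar marks).
        String stripped = tweet.toLowerCase(Locale.ROOT)
                .replaceAll("[^\\p{L}\\p{Nd}\\s]", " ");
        List<String> tokens = new ArrayList<String>();
        for (String token : stripped.trim().split("\\s+")) {
            // Skip empty fragments and frequent words.
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }
}
```

Note that this pass also splits hashtags and @-mentions into plain words and does nothing about abbreviations, so it addresses only part of the noise the slides mention.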

  10. Result Generation
     • By angle: Value = [formula not preserved in the transcript]
     • By distance: Value = [formula not preserved in the transcript]
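The transcript does not give the two formulas on slide 10. One common reading, assumed here and not confirmed by the source, is that "angle" means cosine similarity and "distance" means Euclidean distance between two topic vectors. The sketch below shows both under that assumption; the class and method names are hypothetical.

```java
// Hypothetical helper: compares two topic-weight vectors of equal length.
final class TopicSimilarity {
    // Assumed "angle" measure: cosine of the angle between a and b (1.0 = same direction).
    static double cosine(double[] a, double[] b) {
        checkSameLength(a, b);
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // Convention chosen here: a zero vector is similar to nothing.
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Assumed "distance" measure: Euclidean distance (0.0 = identical vectors).
    static double euclidean(double[] a, double[] b) {
        checkSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    private static void checkSameLength(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
    }
}
```

With the angle measure a larger value means a closer match, while with the distance measure a smaller value does, so a ranking built on one has to be sorted in the opposite order from the other.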

  11. 13-Dimension Topics
     • Art & Design
     • Book
     • Business
     • Charity
     • Entertainment
     • Family
     • Fashion
     • Food & Drink
     • Health
     • Music
     • News
     • Science & Technology
     • Sports
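The deck proposes re-ranking users by topic and lists 13 topic dimensions. A minimal sketch of that idea follows, assuming each user is represented by one weight per topic and that a recommendation for a topic simply orders users by that weight. The data layout, class names, and method names are illustrative assumptions, not the Tweetool implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical ranker over the 13 topics listed on slide 11.
final class TopicRanker {
    static final List<String> TOPICS = Arrays.asList(
            "Art & Design", "Book", "Business", "Charity", "Entertainment",
            "Family", "Fashion", "Food & Drink", "Health", "Music", "News",
            "Science & Technology", "Sports");

    // Assumed representation: one weight per topic for each user.
    static final class UserTopics {
        final String screenName;
        final double[] weights;

        UserTopics(String screenName, double[] weights) {
            if (weights.length != TOPICS.size()) {
                throw new IllegalArgumentException("Expected " + TOPICS.size() + " weights");
            }
            this.screenName = screenName;
            this.weights = weights.clone();
        }
    }

    static int indexOf(String topic) {
        int idx = TOPICS.indexOf(topic);
        if (idx < 0) {
            throw new IllegalArgumentException("Unknown topic: " + topic);
        }
        return idx;
    }

    // Returns screen names ordered from highest to lowest weight on the given topic.
    static List<String> rankByTopic(List<UserTopics> users, String topic) {
        final int idx = indexOf(topic);
        List<UserTopics> sorted = new ArrayList<UserTopics>(users);
        Collections.sort(sorted, new Comparator<UserTopics>() {
            @Override
            public int compare(UserTopics x, UserTopics y) {
                return Double.compare(y.weights[idx], x.weights[idx]);
            }
        });
        List<String> names = new ArrayList<String>();
        for (UserTopics u : sorted) {
            names.add(u.screenName);
        }
        return names;
    }
}
```

Because `Collections.sort` is stable, users with equal weight on the chosen topic keep their original relative order.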

  12. Languages & Tools
     • Web UI: HTML + AJAX (unfinished) + CSS (unfinished) + Twitter REST API
     • Android UI: Java, Android 2.1 (unfinished)
     • Server side: Java 1.6, Servlet 2.0, Spring 3.0, Hibernate 3.3
     • Twitter API: Twitter4j 2.2.1 (300 requests per hour)
     • Server: Tomcat 7.08
     • Database: MySQL 5.5
     • Data package: JSON
     • Development platform: Eclipse 3.4
     • Total code lines: 2000(+) + 2421 + 462 = 5000(+)
     • Subversion: http://tweetool-yilei.googlecode.com/svn/trunk/tweetool-yilei-read-only
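The 300-requests-per-hour limit noted on slide 12 constrains how fast a crawler can fetch data. One simple way to stay under such a limit is a client-side counter that resets every hour, sketched below. The class `HourlyRequestBudget` is hypothetical and does not use any Twitter4j API, and the fixed one-hour window is an assumption about how the limit is applied.

```java
// Hypothetical client-side guard for an hourly request quota.
final class HourlyRequestBudget {
    private static final long WINDOW_MILLIS = 60L * 60L * 1000L;

    private final int limit;
    private long windowStart;
    private int used;

    HourlyRequestBudget(int limit, long nowMillis) {
        this.limit = limit;
        this.windowStart = nowMillis;
        this.used = 0;
    }

    // Returns true and consumes one request if budget remains in the current window.
    // The caller passes the current time so the class is easy to test.
    synchronized boolean tryAcquire(long nowMillis) {
        if (nowMillis - windowStart >= WINDOW_MILLIS) {
            // Assumed fixed window: a full hour has passed, so start a fresh one.
            windowStart = nowMillis;
            used = 0;
        }
        if (used >= limit) {
            return false;
        }
        used++;
        return true;
    }
}
```

A fetch loop would call `tryAcquire(System.currentTimeMillis())` before each API request and sleep or move on to other work when it returns `false`.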

  13. Architecture
     [Architecture diagram; labels: Mobile Device, HTML, Servlets, Application Context, Work Flow (×3), Hibernate DAO, Twitter fetch, DB, LLDA, Tweetool]

  14. Distributed Crawler & Computing

  15. Problems (endless T_T)
     • High noise in the topic model
       • Few words, odd marks, abbreviations
     • Unfamiliarity with the Twitter API; a lot of bugs
     • Transaction problems
     • The ugly UI
     • Poor performance
     • Not enough time; many functions are unfinished
     • The Tweetool system should be rebuilt!!!
     • Environment: 7,000+ users, 220,000+ tweets

  16. Future Work
     • Try to finish it
     • Debug
     • Build a better training file
     • Add a feedback function
     • Better topic classification

  17. Web UI (Design Version)

  18. Android UI
     [UI mockup; labels: Title, News (×3), Function Button (×4), Main Menu, News Menu]
