Twitter Based Research

Benny Bornfeld. Mentors: Professor Sheizaf Rafaeli, Dr. Daphne Raban.

Presentation Transcript


  1. Twitter Based Research. Benny Bornfeld. Mentors: Professor Sheizaf Rafaeli, Dr. Daphne Raban

  2. Where research meets Bigbird: Research, Twitter, Big Data, My Research & Tools

  3. Twitter Research Big Data

  4. About Twitter • Facts • Established in 2006 • ~140 million active users • ~340 million messages per day • Superlatives • “the stream of the world’s collective consciousness” • “the first rough draft of history”

  5. How does it work? Followers

  6. Retweet (diagram: tweets and the retweets that spread them)

  7. Reply

  8. Twitter is used for many different purposes

  9. Power Law distribution

  10. Research Twitter Research Big Data

  11. What is Twitter? Social network! Social Network? Mass Media?

  12. Replace surveys?

  13. Twitter based predictions: “I Wanted to Predict Elections with Twitter and all I got was this Lousy Paper”

  14. Twitter as a social learning platform

  15. Influence: What’s the influence of Twitter on society? “Why the revolution will not be tweeted” (Malcolm Gladwell) vs. technological determinism (Clay Shirky)

  16. Influence in Twitter • How do we measure influence? • Number of followers? • Centrality? • Creating action/reaction? • Viral spreading?

  17. The Message vs the Carrier approaches

  18. Twitter Research Big Data

  19. Online social networks research fields

  20. Big Data in SN Research • Pros: • Exploratory research (vs confirmatory research) • Avoids the sampling reliability issue (power law) • Collects what people are actually saying • Non-intrusive • Allows analysis of many dimensions • Catches irregular events

  21. Big Data in SN Research • Cons: • Lots of noise • It is sometimes hard to map the data to your research question • Cost of collecting the data • Lack of tools/knowledge on how to store and analyze the data • May come at the expense of theory

  22. Where research meets Bigbird: Research, Twitter, Big Data, My Research & Tools

  23. Influence: the capacity or power of persons or things to be a compelling force on or produce effects on the actions, behavior, opinions, etc., of others

  24. Influence in online social networks: Tweet, ReTweet, Sentiment Valence

  25. The research question • Which is more viral, that is, more likely to spread in a social network (Twitter): messages of negative or of positive sentiment valence?

  26. The Data • Collected ~2 million tweets about new movies • Why movies: • People have opinions about movies • People share their opinions about movies • Can compare to other research (benchmarks)

  27. Collecting the Tweets • Twitter provides an API for collecting tweets • Until mid-2010, full data streams were available for free; currently, the rate is very limited (~150/hour) • Full data streams (the “fire hose”) are available via a company called GNIP

  28. Tweet-collecting architecture (diagram labels: PowerTrack, rules filter, HTTP streaming of JSON, Collect App with JSON parser, files, DB, My App)
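The filter-and-store step in this architecture can be sketched roughly as follows. This is a minimal illustration, not the collector used in the research: the rule keywords, the `text` field name, and the function names are all assumptions, and the input is assumed to arrive as one JSON message per line.

```python
import json

# Hypothetical keyword rules; the slides do not list the actual rules.
RULES = ["footloose", "paranormal activity"]


def matches_rules(text, rules=RULES):
    """Return True if the text contains any rule keyword (case-insensitive)."""
    lowered = text.lower()
    return any(rule in lowered for rule in rules)


def collect(lines, rules=RULES):
    """Parse JSON lines and keep only the messages that match a rule."""
    kept = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between messages
        message = json.loads(line)
        # "text" is an assumed field name for the tweet body.
        if matches_rules(message.get("text", ""), rules):
            kept.append(message)
    return kept


def store(messages, path):
    """Append the kept messages to a file, one JSON object per line."""
    with open(path, "a", encoding="utf-8") as handle:
        for message in messages:
            handle.write(json.dumps(message) + "\n")
```

Storing one JSON object per line keeps the files easy to re-read and to load into a database later.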

  29. Data Fields • User data: #followers, #following, number of tweets, klout, tweet rate, creation date, language, name, description, location • Message data: sender, content, type (original/RT), post time, device • Computed fields: # of RT, total exposure, sentiment
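The computed fields could be derived along these lines. The slides do not define "total exposure", so this sketch assumes it means the follower count of the original sender plus the follower counts of everyone who retweeted the message; the dictionary keys (`id`, `type`, `original_id`, `followers`) are likewise hypothetical.

```python
from collections import defaultdict


def computed_fields(tweets):
    """Return {original_id: {"retweets": n, "exposure": total_followers}}.

    Assumption: exposure = followers of the original sender
    + followers of every user who retweeted the message.
    """
    stats = defaultdict(lambda: {"retweets": 0, "exposure": 0})
    for tweet in tweets:
        if tweet["type"] == "RT":
            entry = stats[tweet["original_id"]]
            entry["retweets"] += 1
        else:
            entry = stats[tweet["id"]]
        entry["exposure"] += tweet["followers"]
    return dict(stats)
```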

  30. Reading Tasks • Handle partial messages • Handle broken messages • Handle duplicate messages • Handle special characters
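A reader that copes with these four cases might look like the sketch below. It assumes newline-delimited JSON and a numeric `id` field per message; the class name and attributes are illustrative. Special characters need no extra work here because `json.loads` decodes `\uXXXX` escapes itself.

```python
import json


class TweetReader:
    """Reassembles newline-delimited JSON messages from arbitrary chunks."""

    def __init__(self):
        self._buffer = ""       # partial message carried over between chunks
        self._seen_ids = set()  # ids already emitted, for de-duplication
        self.broken = 0         # number of messages that failed to parse

    def feed(self, chunk):
        """Add a chunk of text and return the complete, new messages in it."""
        self._buffer += chunk
        # Everything before the last newline is complete; the rest is partial.
        *complete, self._buffer = self._buffer.split("\n")
        messages = []
        for line in complete:
            line = line.strip()
            if not line:
                continue
            try:
                message = json.loads(line)
            except json.JSONDecodeError:
                self.broken += 1  # broken message: count it and move on
                continue
            if not isinstance(message, dict):
                self.broken += 1
                continue
            msg_id = message.get("id")
            if msg_id is not None:
                if msg_id in self._seen_ids:
                    continue  # duplicate message
                self._seen_ids.add(msg_id)
            messages.append(message)
        return messages
```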

  31. Clean the data • Non related messages [build your dream house] • Spammers • Gibberish messages • Normalize the data (e.g. Tweets/Time)
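The cleaning steps could be approximated with simple heuristics such as the ones below. Only the off-topic phrase comes from the slide; the thresholds, the field names (`sender`, `text`, `time`), and the heuristics themselves are assumptions for illustration and would need tuning on real data.

```python
from collections import Counter
from datetime import datetime

# Phrase taken from the slide's example of an unrelated message.
OFF_TOPIC_PHRASES = ["build your dream house"]


def is_off_topic(text):
    lowered = text.lower()
    return any(phrase in lowered for phrase in OFF_TOPIC_PHRASES)


def is_gibberish(text, min_letter_ratio=0.5):
    """Hypothetical heuristic: too few letters among non-space characters."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return True
    letters = sum(c.isalpha() for c in chars)
    return letters / len(chars) < min_letter_ratio


def find_spammers(tweets, max_repeats=3):
    """Hypothetical heuristic: senders who post the same text repeatedly."""
    repeats = Counter((t["sender"], t["text"]) for t in tweets)
    return {sender for (sender, _), n in repeats.items() if n > max_repeats}


def clean(tweets):
    spammers = find_spammers(tweets)
    return [
        t for t in tweets
        if t["sender"] not in spammers
        and not is_off_topic(t["text"])
        and not is_gibberish(t["text"])
    ]


def tweets_per_hour(tweets):
    """Normalize to tweets per time unit: count tweets in each clock hour."""
    return Counter(
        t["time"].replace(minute=0, second=0, microsecond=0) for t in tweets
    )
```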

  32. Tools for data analysis • Sorting • Filtering • Counting • Histograms • Sentiment analysis
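The first four operations need little more than the standard library. The sketch below assumes each user is a dictionary with hypothetical `name` and `followers` keys, and it buckets follower counts by order of magnitude, which suits heavily skewed (power-law-like) data better than equal-width bins.

```python
from collections import Counter


def top_users(users, n):
    """Sorting: the n users with the most followers."""
    return sorted(users, key=lambda u: u["followers"], reverse=True)[:n]


def with_min_followers(users, minimum):
    """Filtering: users with at least `minimum` followers."""
    return [u for u in users if u["followers"] >= minimum]


def follower_histogram(users):
    """Counting: users per power-of-ten bucket (0 -> 1-9, 1 -> 10-99, ...)."""
    return Counter(
        len(str(u["followers"])) - 1 for u in users if u["followers"] > 0
    )


def print_histogram(histogram):
    """Histogram: one text bar per bucket."""
    for bucket in sorted(histogram):
        print(f"10^{bucket}: {'#' * histogram[bucket]}")
```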

  33. Tweets view

  34. Users view

  35. Classifying users

  36. Classifying users with cluster analysis

  37. Sentiment Analysis • Classify each message as positive/neutral/negative • Classification methods • Manual (~10 sec per tweet) • Automatic

  38. Sentiment Analysis: some challenging tweet examples • Just saw #Footloose with my sisters. The movie fab, and I even spotted my karaoke machine! Did you dolls catch it? • Paranormal Activity 3 seems almost as scary as a level 9 magikarp • My kids want to see Jack and Jill. Its making it hard to love them.

  39. Automatic classifications

  40. Naïve Bayes classifier: machine learning, supervised learning (diagram: messages labeled POS / NEG / NEU)

  41. Naïve Bayes classifier: machine learning, supervised learning (diagram: labeled messages split into training and testing sets)

  42. Naïve Bayes classifier, NGRAM = 2 (diagram: messages labeled POS / NEG / NEU)
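A supervised Naïve Bayes classifier over word bigrams (NGRAM = 2) can be sketched in a few lines. This is a generic multinomial Naïve Bayes with add-one smoothing, not necessarily the implementation or tool used in the research; the tokenization (lowercase, split on whitespace) and all names are assumptions.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n=2):
    """Split text into lowercase word n-grams."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


class NaiveBayes:
    def __init__(self, n=2):
        self.n = n
        self.class_counts = Counter()               # messages per label
        self.feature_counts = defaultdict(Counter)  # n-gram counts per label
        self.vocabulary = set()

    def train(self, labeled):
        """labeled: iterable of (text, label) pairs, e.g. label 'POS'."""
        for text, label in labeled:
            self.class_counts[label] += 1
            for gram in ngrams(text, self.n):
                self.feature_counts[label][gram] += 1
                self.vocabulary.add(gram)

    def classify(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / total_docs)  # log prior
            total = sum(self.feature_counts[label].values())
            for gram in ngrams(text, self.n):
                if gram not in self.vocabulary:
                    continue  # ignore n-grams never seen in training
                # Add-one (Laplace) smoothed log likelihood.
                seen = self.feature_counts[label][gram]
                score += math.log((seen + 1) / (total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, labeled):
    """Fraction of held-out (text, label) pairs classified correctly."""
    labeled = list(labeled)
    hits = sum(model.classify(text) == label for text, label in labeled)
    return hits / len(labeled)
```

In use, the manually labeled tweets would be split into a training set passed to `train` and a testing set passed to `accuracy`.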

  43. Manual classification The Dictionaries
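A dictionary-based classifier can be illustrated as below. The word lists are tiny, hypothetical stand-ins, not the dictionaries built in the research, and the scoring rule (positive hits minus negative hits) is an assumed simplification.

```python
import re

# Hypothetical miniature dictionaries, for illustration only.
POSITIVE = {"fab", "love", "loved", "great", "awesome"}
NEGATIVE = {"hate", "hated", "boring", "awful", "lousy"}


def classify(text):
    """Label text by counting dictionary hits: positive minus negative."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Word counting alone misreads sarcasm: with these lists, the “Jack and Jill” tweet from the challenging examples is labeled positive because it contains “love”.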

  44. Test Results

  45. References • Why the revolution will not be tweeted? • Clay Shirky: How social media can make history [TED] • Looking At The World Through Twitter Data • Twitter mood predicts the stock market • Six Provocations for Big Data • Susan Blackmore on memes and “temes” [TED]
