Analysis of twitter feeds and blogs

Analysis of twitter feeds and blogs. Language and Computation Group, 18th November 2011. Communications of the ACM, October 2011. Conclusion 1. See also: danah boyd, Kate Crawford, “Six Provocations for Big Data”, September

Presentation Transcript


  1. Analysis of twitter feeds and blogs Language and Computation Group 18th November 2011

  2. Communications of the ACM, October 2011

  3. Conclusion 1 See also: danah boyd, Kate Crawford, “Six Provocations for Big Data”, September 2011: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1926431

  4. Conclusion 2

  5. Conclusion 3

  6. Conclusion 4

  7. Conclusion 5

  8. For quotes see next slide…

  9. “relatively high amount of hype” • “even when the predictions were better than chance, they were not competent compared to the trivial method of predicting through incumbency.” • “We simply tried to repeat the (reportedly successful) methods that others have used in the past, and we found that the results were not repeatable.” • “Hoping that the errors in sentiment analysis ‘somehow’ cancel themselves out is not defensible.” • “Spammers and propagandists write programs that create lots of fake accounts and use them to tweet intensively, amplifying their message.” • “Predicting elections with accuracy should not be supported without some clear understanding of why it works”. • “Learn from the professional pollsters … identify likely voters and get an unbiased representative sample of them”
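The comparison with an incumbency baseline quoted in the slide above can be made concrete with a minimal Python sketch. The race records, candidate labels and the `tweet_based_picks` list are invented placeholders for illustration only, not results from any study. The point is only that a tweet-based predictor should be scored against the trivial "incumbent wins" rule on the same races.

```python
def accuracy(predictions, winners):
    """Fraction of races where the predicted winner matches the actual winner."""
    if not winners or len(predictions) != len(winners):
        raise ValueError("need equally sized, non-empty prediction and winner lists")
    correct = sum(p == w for p, w in zip(predictions, winners))
    return correct / len(winners)


def incumbency_baseline(races):
    """Trivial predictor: assume the incumbent wins every race."""
    return [race["incumbent"] for race in races]


# Hypothetical toy races; the letters are placeholders, not real candidates.
races = [
    {"incumbent": "A", "challenger": "B", "winner": "A"},
    {"incumbent": "C", "challenger": "D", "winner": "C"},
    {"incumbent": "E", "challenger": "F", "winner": "F"},
    {"incumbent": "G", "challenger": "H", "winner": "G"},
]
winners = [race["winner"] for race in races]

# Hypothetical picks from some tweet-based method (assumed, for illustration).
tweet_based_picks = ["A", "D", "F", "H"]

baseline_acc = accuracy(incumbency_baseline(races), winners)
tweet_acc = accuracy(tweet_based_picks, winners)
print(f"incumbency baseline: {baseline_acc:.2f}, tweet-based: {tweet_acc:.2f}")
```

Reporting both numbers side by side is what distinguishes "better than chance" from "better than the trivial method".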

  10. Table on next slide:

  11. Tweets with searching-related terminology:

  12. Examines relationship between emotional reactions and public opinion • Seeks to offer insight into how public opinion is formed • Based on analysis of posts from Usenet online forum • Evaluation of emotional content is based on counting of words in ANEW (Affective Norms for English Words) • Nevertheless, this still raises the question of sample bias • How typical are Usenet users of the general population?
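A lexicon-based evaluation of the kind described in the slide above might look roughly like the following Python sketch. The `TOY_LEXICON` entries and their valence values are invented for illustration and are not actual ANEW ratings; the averaging rule is also an assumption, since the slide only says that words found in ANEW are counted.

```python
import re

# Hypothetical mini-lexicon: word -> valence on a 1 (unpleasant) to 9 (pleasant) scale.
# These numbers are made up for the example; they are NOT real ANEW values.
TOY_LEXICON = {
    "happy": 8.0,
    "love": 8.5,
    "sad": 2.0,
    "angry": 2.5,
    "war": 2.0,
}


def score_post(text, lexicon=TOY_LEXICON):
    """Return (mean valence of lexicon words, number of lexicon words found).

    The mean is None when the post contains no lexicon words.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = [lexicon[token] for token in tokens if token in lexicon]
    if not hits:
        return None, 0
    return sum(hits) / len(hits), len(hits)


mean_valence, n_hits = score_post("I love this, so happy")
print(mean_valence, n_hits)
```

Returning the hit count alongside the score makes it visible when a post is scored on very few matched words.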

  13. See next slide…

  14. SPARQL query language use cases: • “Give me a stream of locations where my product is being mentioned right now.” • “Give me all people that have said negative things about my product.” • “Give me all URLs that people recommend with relation to my product.” • “What competitors are being mentioned with my product.” • 511,147 tweets about iPad (June 3rd – June 8th 2010): • http://wiki.knoesis.org/index.php/Twarql

  15. Use of agent-based prediction market • Each agent extracts users' sentiments from a different social medium • Reflects its beliefs by trading in the market • Belief-Desire-Intention paradigm • Agent will intend to do what it believes will achieve its goals given its beliefs about the world • Avoids problems with human agents • Poor estimation at either end of probability spectrum • Agents do not manipulate the market • Do not require recruitment and incentives • Bothos et al., IEEE Intelligent Systems, November/December 2010
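The idea of agents expressing beliefs by trading can be sketched as below. The slide does not say which market mechanism is used, so this example swaps in a logarithmic market scoring rule (LMSR) market maker as an assumption. The class, the one-share trading rule and the agent beliefs (standing in for sentiment extracted from three different media) are all hypothetical.

```python
import math


class LMSRMarket:
    """Two-outcome market maker using the logarithmic market scoring rule (assumed)."""

    def __init__(self, liquidity=10.0):
        self.b = liquidity   # larger b means prices move more slowly per share
        self.q_yes = 0.0     # outstanding YES shares
        self.q_no = 0.0      # outstanding NO shares

    def price_yes(self):
        """Current YES price, readable as the market's probability estimate."""
        e_yes = math.exp(self.q_yes / self.b)
        e_no = math.exp(self.q_no / self.b)
        return e_yes / (e_yes + e_no)

    def buy(self, outcome, shares=1.0):
        if outcome == "yes":
            self.q_yes += shares
        else:
            self.q_no += shares


def run_market(beliefs, rounds=50):
    """Each round, every agent buys one share on the side it thinks is underpriced."""
    market = LMSRMarket()
    for _ in range(rounds):
        for belief in beliefs:
            price = market.price_yes()
            if belief > price:
                market.buy("yes")
            elif belief < price:
                market.buy("no")
    return market.price_yes()


# Hypothetical beliefs of three agents, each derived from a different medium.
final_price = run_market([0.7, 0.6, 0.8])
print(round(final_price, 3))
```

With these toy beliefs the price settles between the lowest and highest belief, which is the sense in which the market aggregates the agents' views.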

  16. See next slide…..

  17. Presents methodology for predicting individual retweets in Twitter • Input to the model is the tweeter, a retweeter and the content of the tweet • Output of the model is the probability that the retweeter retweets the tweet • Uses a probabilistic collaborative filtering prediction model called Matchbox • Crawled Twitter from June 10th 2010 to July 29th 2010, finding 20,000,000 retweets
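The input/output shape of such a model can be sketched in Python. Matchbox is usually described as a Bayesian bilinear latent-trait model; the sketch below keeps only a point-estimate bilinear score and passes it through a logistic function, which is a simplifying assumption and not the published inference procedure. The feature vectors, the matrices `U` and `V`, and the grouping of tweeter and content features into one "tweet" vector are hypothetical.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def retweet_probability(retweeter_features, tweet_features, U, V, bias=0.0):
    """Bilinear score between latent traits, squashed to a probability.

    U maps retweeter features into a latent trait space; V maps tweet
    features (tweeter plus content) into the same space. The logistic
    link is an assumption made for this sketch.
    """
    retweeter_traits = matvec(U, retweeter_features)
    tweet_traits = matvec(V, tweet_features)
    score = sum(s * t for s, t in zip(retweeter_traits, tweet_traits)) + bias
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical two-dimensional example with identity projections.
U = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0]]
p = retweet_probability([1.0, 0.0], [1.0, 0.0], U, V)
print(round(p, 3))
```

In a learned model the entries of `U` and `V` would be fitted from observed retweets rather than set by hand.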

  18. Top correlations on next slide.....

  19. Understanding of specific groups useful for commercial and political organisations • Four key tasks: • Discover or extract the group itself • Develop a profile from group descriptors and defining group characteristics • Understand group’s sentiment and ability to influence other individuals or groups • Study group composition • Privacy and security are important concerns

  20. One reason for low concordance is the use of abbreviations such as U for you or rly for really. Other reasons are frequent typos, Internet acronyms, sentence fragments, and pronoun drops such as busy now instead of I’m busy now. END OF SLIDES
