
Discovering Geographical Topics In The Twitter Stream



  1. Discovering Geographical Topics In The Twitter Stream • PRESENTED BY TEAM-9 • Liu, Zhi • Karthik Kumar Rangineni

  2. Content • Introduction • Related Work • Model • Experiment • Conclusion (University of North Texas, Computer Science & Engineering)

  3. Introduction • Widespread usage of micro-blogging services • Twitter has been used extensively during a number of events and emergencies, ranging from elections, earthquakes and tsunamis to other specific events. • Recently, Twitter and other online social networking services such as Foursquare, Facebook and Yelp have started supporting location services in their messages.

  4. Introduction • Location information is attached to messages in two ways: 1. explicit method, 2. implicit method. • These functionalities allow researchers to address an exciting set of questions: 1) how information is created and shared across geographical locations; 2) how the spatial and linguistic characteristics of people vary across regions; 3) how to model human mobility.

  5. Introduction • Drawbacks of previous methods: they are either too complicated or oversimplified. • Discovering topics and identifying users' interests from these geo-tagged messages is challenging because of the sheer amount of data and the diversity of language variations used on these location-sharing services. • The paper uses Twitter data. • It presents an algorithm that models the diversity in tweets by considering: • topical diversity, • geographical diversity, and • the interest distribution of the user.

  6. Introduction • The authors designed the model to be sufficiently sparse to scale to a large number of users and locations. • Analyzing such data still poses a considerable challenge because of its size and the integration of a range of different attributes. To the authors' knowledge, this is the first paper to address scale, location and language modeling in an integrated fashion. • Furthermore, they designed an accurate and scalable inference algorithm. • The algorithm allows them to discover language patterns and to extract users' interests from geo-tagged messages.

  7. Introduction • The algorithm discovers language patterns and extracts users' interests from geo-tagged messages. • In addition, it captures the factors that influence the language used in a tweet posted from a particular location. Example: a tweet from a particular region.

  8. Introduction • The choice of words is clearly influenced by the topic of the tweet. • Location-specific language causes the same event to be reported quite differently in different locations. • Different geographical regions have different language variations, and topics have different chances of being discussed in these regions.

  9. Prior Work • Prior work falls into two groups: • At one end, some work models only certain aspects of the problem and ignores the remainder: no regional language models are learned, and user preferences are not taken into account. • Thus, the models developed for such data are usually limited and cannot easily be applied to content-rich social media.

  10. • At the other end, the authors found rather complex models that cannot scale to industrial size. • For instance, one model was proposed to predict the locations of Twitter users. It has a global topic matrix, and each region has a different variation of this matrix. However, its inference algorithm is complex, and over-parametrization makes it nontrivial to perform inference accurately. • Furthermore, previous models ignore user preferences.

  11. Author Contributions • The authors propose a model that: • models user preferences; • is flexible enough to embed all reasonable components of content and geographical locations; • handles real-world datasets consisting of millions of documents and users. • It combines statistical topic models and sparse coding techniques to uncover different language patterns and common interests shared across the world.

  12. Related Work • Related research falls into two lines: 1. a range of papers that use geographical language modeling in general; 2. a set of works specifically tuned for Twitter data. • The authors are interested in models that combine geographical modeling and language modeling to discover topics from geographical regions.

  13. Related works by other researchers include: • Mei proposed a model based on Probabilistic Latent Semantic Analysis (PLSA). It assumes that each word is drawn either from a universal background topic or from a location- and time-dependent language model. • Later, Wang introduced a fully Bayesian generative model to incorporate locations. Its limitations: it does not use real latitudes and longitudes, it has a fixed number of regional labels, and it assumes that each term is associated with a location label. • Sizov proposed a model similar to Wang's. Rather than using a multinomial distribution to generate locations, it uses two Gaussian distributions to generate latitude and longitude respectively. • Drawback: it uses Flickr data restricted to the greater London area.
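
A minimal sketch of the idea behind Sizov's location component, assuming two independent univariate Gaussians: a point is scored by one Gaussian over latitude and one over longitude, so nearby points score higher, whereas a multinomial over a fixed set of region labels ignores the actual coordinates. This is not Sizov's code; the function names and the London parameters are hypothetical.

```python
import math


def gaussian_logpdf(x: float, mean: float, std: float) -> float:
    """Log-density of a univariate normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std * math.sqrt(2.0 * math.pi))


def location_loglik(lat: float, lon: float,
                    lat_mean: float, lat_std: float,
                    lon_mean: float, lon_std: float) -> float:
    """Log-likelihood of a point under two independent Gaussians,
    one over latitude and one over longitude (hypothetical parameterization)."""
    return gaussian_logpdf(lat, lat_mean, lat_std) + gaussian_logpdf(lon, lon_mean, lon_std)


# Hypothetical region centered on central London.
LONDON = dict(lat_mean=51.5, lat_std=0.1, lon_mean=-0.12, lon_std=0.2)
in_london = location_loglik(51.51, -0.13, **LONDON)
in_paris = location_loglik(48.86, 2.35, **LONDON)
```

Under this London-centered region, a point inside London receives a much higher log-likelihood than a point in Paris, which a label-based multinomial cannot express.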

  14. • Hao proposed a model built upon Wang's, but introduced the notion of global and local topics: more general terms are grouped into global topics, while terms related to local events go into local topics. Inference is performed by Gibbs sampling. Hao evaluated the model with anecdotal results and some heuristic measurements. • Although such attempts to model language patterns and geographical locations exist, most prior work does not consider users at all.

  15. Model • Notation

  16. Model • Mixture of components

  17. Model • Three parts of each tweet

  18. Model • Intuitions: • Words used in a tweet depend on both the location and the topic of the tweet. • Different geographical regions have different language variations. • Topics have different chances of being discussed in different regions (e.g., bullfights are unlikely to occur in India; likewise, Spaniards are unlikely to discuss Divali). • Users tend to appear in only a handful of geographical locations.

  19. Model • Draw a latent region index

  20. Model • Draw a topic index

  21. Model • Draw a location

  22. Model • For each token w in w_d, draw
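
The equations for these generative steps were not preserved in this transcript. The following is a hedged toy sketch of a generative process with the same four steps (region, topic, location, words). It assumes several simplifications that do not come from the slides: the topic is drawn conditioned only on the region, the location is a Gaussian with independent latitude and longitude axes, and each word depends on the (topic, region) pair. `ToyModel` and `generate_tweet` are hypothetical names.

```python
import random
from dataclasses import dataclass


@dataclass
class ToyModel:
    # All fields are hypothetical toy parameters, not values learned from data.
    user_region: dict[str, list[float]]                 # p(r | u) for each user
    region_topic: list[list[float]]                     # p(z | r) for each region
    region_center: list[tuple[float, float]]            # Gaussian mean (lat, lon) per region
    region_std: list[tuple[float, float]]               # Gaussian std (lat, lon) per region
    word_dist: dict[tuple[int, int], dict[str, float]]  # p(w | z, r)


def generate_tweet(model: ToyModel, user: str, length: int, rng: random.Random):
    # 1. Draw a latent region index from the user's region distribution.
    regions = range(len(model.region_center))
    r = rng.choices(regions, weights=model.user_region[user])[0]
    # 2. Draw a topic index (here conditioned on the region only, an assumption).
    topics = range(len(model.region_topic[r]))
    z = rng.choices(topics, weights=model.region_topic[r])[0]
    # 3. Draw a location from the region's Gaussian, one axis at a time.
    (mu_lat, mu_lon), (sd_lat, sd_lon) = model.region_center[r], model.region_std[r]
    location = (rng.gauss(mu_lat, sd_lat), rng.gauss(mu_lon, sd_lon))
    # 4. For each token, draw a word from the (topic, region) word distribution.
    dist = model.word_dist[(z, r)]
    words = rng.choices(list(dist), weights=list(dist.values()), k=length)
    return r, z, location, words
```

For example, a user whose region distribution puts all its mass on region 0 always produces tweets located near region 0's center, with words drawn from that region's topic-specific vocabulary.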

  23. Model • A graphical representation of the model

  24. Model • Sparse Modeling • Location-independent distribution + prevalent in a given location
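
One hedged reading of "location-independent distribution + prevalent in a given location" is a sparse additive model in the spirit of Eisenstein, Ahmed and Xing's sparse additive generative models (SAGE). Under this reading, each word's log-weight is a shared background term plus a sparse, location-specific deviation, and the result is normalized with a softmax. The sketch below assumes that form. `sparse_additive_dist` is a hypothetical helper, not the paper's implementation.

```python
import math


def sparse_additive_dist(background_logits: dict[str, float],
                         deviations: dict[str, float]) -> dict[str, float]:
    """Word distribution proportional to exp(background + sparse deviation).

    background_logits: location-independent log-weights for every word.
    deviations: sparse map word -> offset for words prevalent in one location;
                words missing from it keep an offset of 0.
    """
    scores = {w: b + deviations.get(w, 0.0) for w, b in background_logits.items()}
    top = max(scores.values())  # subtract the max before exp for numerical stability
    exps = {w: math.exp(s - top) for w, s in scores.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}


background = {w: math.log(p) for w, p in {"the": 0.5, "surf": 0.25, "snow": 0.25}.items()}
coastal = sparse_additive_dist(background, {"surf": math.log(2.0)})
# coastal ≈ {"the": 0.4, "surf": 0.4, "snow": 0.2}
```

With no deviations the result equals the background distribution. A single positive deviation shifts probability mass toward that word in the given location, and only the nonzero offsets need to be stored per location.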

  25. Model • Inference Algorithm • EM steps: • E-step: iteratively draw latent region assignments and topic assignments for all tweets. • M-step: maximize the log likelihood of the model with respect to the model parameters, keeping fixed all region and topic assignments obtained in the E-step.
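
An EM scheme whose E-step samples assignments instead of computing expectations is often called stochastic (or Monte Carlo) EM. The toy below illustrates only that structure on a 1-D Gaussian mixture, where the sampled component stands in for a region. It is not the paper's model, and all names are hypothetical.

```python
import math
import random


def normal_logpdf(x: float, mean: float, std: float) -> float:
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std * math.sqrt(2.0 * math.pi))


def sample_from_logs(log_scores: list[float], rng: random.Random) -> int:
    """Draw index i with probability proportional to exp(log_scores[i])."""
    top = max(log_scores)
    return rng.choices(range(len(log_scores)),
                       weights=[math.exp(s - top) for s in log_scores])[0]


def stochastic_em(xs, init_means, std=1.0, iters=20, seed=0):
    """Toy stochastic EM for a 1-D Gaussian mixture with a fixed, shared std.

    E-step: sample a latent component for every point from its posterior.
    M-step: with those samples fixed, set each mean and mixing weight to its
    maximum-likelihood value.
    """
    rng = random.Random(seed)
    k = len(init_means)
    means = list(init_means)
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: draw an assignment for every point.
        assign = []
        for x in xs:
            logs = [math.log(w) + normal_logpdf(x, m, std) if w > 0 else -math.inf
                    for w, m in zip(weights, means)]
            assign.append(sample_from_logs(logs, rng))
        # M-step: maximize the complete-data log likelihood given the assignments.
        for j in range(k):
            members = [x for x, a in zip(xs, assign) if a == j]
            weights[j] = len(members) / len(xs)
            if members:  # an empty component keeps its previous mean
                means[j] = sum(members) / len(members)
    return means, weights


means, weights = stochastic_em([-5.1, -4.9, -5.0, 5.0, 4.8, 5.2], [-1.0, 1.0])
```

On two well-separated clusters the means move to roughly -5 and 5 with equal weights. Sampling in the E-step keeps each iteration cheap, because the M-step then only needs counts and sums over hard assignments.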

  26. Model • Inference Algorithm • For each tweet, a latent region r is first drawn from the following distribution, conditioned on the old topic assignments:

  27. Model • Inference Algorithm • The topic assignment z for the same tweet is then sampled, conditioned on the newly sampled r:
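
Because the slide formulas are missing, here is a hedged sketch of the two conditional draws for a single tweet. It assumes a simple factorization that may differ from the paper's:

- the region comes from the user's region distribution;
- the topic comes from a region-specific topic distribution;
- the location comes from a per-region Gaussian with independent latitude/longitude axes;
- the words come from a (topic, region) word distribution.

The region is sampled with the old topic held fixed, and the topic is then sampled given the new region. All parameter and function names are hypothetical.

```python
import math
import random


def sample_index(log_scores: list[float], rng: random.Random) -> int:
    """Draw an index with probability proportional to exp(log_scores)."""
    top = max(log_scores)  # shift by the max to avoid overflow/underflow
    return rng.choices(range(len(log_scores)),
                       weights=[math.exp(s - top) for s in log_scores])[0]


def gaussian_logpdf(x: float, mean: float, std: float) -> float:
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std * math.sqrt(2.0 * math.pi))


def resample_tweet(words, loc, old_z, p_region, p_topic, centers, stds, p_word, rng):
    """One sampling sweep for a single tweet (hypothetical parameterization).

    p_region[r]       : p(r | user)
    p_topic[r][z]     : p(z | r)
    centers/stds[r]   : per-region Gaussian over (lat, lon), axes independent
    p_word[(z, r)][w] : p(w | z, r)
    """
    num_regions, num_topics = len(p_region), len(p_topic[0])

    def words_loglik(z, r):
        return sum(math.log(p_word[(z, r)][w]) for w in words)

    # Region r, conditioned on the OLD topic assignment.
    r_scores = []
    for r in range(num_regions):
        loc_ll = sum(gaussian_logpdf(x, m, s) for x, m, s in zip(loc, centers[r], stds[r]))
        r_scores.append(math.log(p_region[r]) + loc_ll
                        + math.log(p_topic[r][old_z]) + words_loglik(old_z, r))
    new_r = sample_index(r_scores, rng)
    # Topic z, conditioned on the NEWLY sampled region.
    z_scores = [math.log(p_topic[new_r][z]) + words_loglik(z, new_r)
                for z in range(num_topics)]
    new_z = sample_index(z_scores, rng)
    return new_r, new_z
```

The code works in log space and subtracts the maximum before exponentiating, which keeps the draw numerically stable even when a tweet has many words or a location lies far from some region centers.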

  28. Model • Inference Algorithm • How to obtain the parameter values

  29. Model • Inference Algorithm

  30. Model • Inference Algorithm

  31. Model • Inference Algorithm

  32. Model • Geographical Location Modeling • One region • Multiple regions

  33. Model • Geographical Location Modeling • Bayesian treatment

  34. Model • Implementation Notes

  35. Model • Implementation Notes

  36. Experiments • Location Prediction • Baseline • Topics • Topics + Region • Full Model

  37. Experiments • Evaluation Metric:
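
The evaluation formula itself is not preserved in this transcript. Purely as an illustration, location prediction is often scored by the great-circle distance between predicted and true coordinates, averaged over test tweets. This is an assumption and not necessarily the metric used in the paper. Below is a haversine-based sketch; `haversine_km` and `mean_error_km` are hypothetical helpers.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; the haversine formula assumes a sphere


def haversine_km(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def mean_error_km(predicted, actual) -> float:
    """Average distance error over paired predicted and true locations."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need equally long, non-empty lists of locations")
    return sum(haversine_km(p, t) for p, t in zip(predicted, actual)) / len(actual)
```

One degree of longitude on the equator comes out at about 111.2 km. Treating the Earth as a sphere introduces well under 1% error, which is usually negligible next to typical location-prediction errors.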

  38. Experiments • Location Prediction

  39. Experiments • Location Prediction

  40. Experiments • Location Prediction

  41. Experiments • Location Prediction

  42. Experiments • Location Prediction

  43. Conclusions • The paper addresses the problem of modeling geographical topical patterns on Twitter by introducing a novel sparse generative model that uses both statistical topic models and sparse coding techniques, providing a principled method for uncovering different language patterns and common interests shared across the world. • This approach is vital for applications such as behavior targeting, user profiling, content recommendation and topic tracking, and the method can easily be extended in a number of ways.

  44. Contributions • An additive generative model of content and locations that incorporates multiple facets of micro-blogging environments in an integrated fashion. • Sparse coding techniques and Bayesian treatments are smoothly embedded in the model, resulting in an efficient and effective implementation. • The model outperforms several state-of-the-art algorithms on the task of location prediction and reveals interesting patterns in real-world datasets.

  45. References: • A. Ahmed, E. P. Xing, W. W. Cohen, and R. F. Murphy. Structured correspondence topic models for mining captioned figures in biological literature. In Proceedings of KDD 2009, pages 39–48, New York, NY, USA. ACM. • A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2:183–202, March 2009. • C. Chemudugunta, P. Smyth, and M. Steyvers. Modeling general and specific aspects of documents with a probabilistic topic model. In NIPS 2006, pages 241–248. • Z. Cheng, J. Caverlee, K. Lee, and D. Sui. Exploring millions of footprints in location sharing services. In ICWSM 2011. • E. Cho, S. A. Myers, and J. Leskovec. Friendship and mobility: user movement in location-based social networks. In Proceedings of KDD 2011, pages 1082–1090, New York, NY, USA. ACM. • J. Eisenstein, A. Ahmed, and E. Xing. Sparse additive generative models of text. In Proceedings of ICML 2011, pages 1041–1048, New York, NY, USA. ACM. • J. Eisenstein, B. O’Connor, N. A. Smith, and E. P. Xing. A latent variable model for geographic lexical variation. In Proceedings of EMNLP 2010, pages 1277–1287, Stroudsburg, PA, USA. Association for Computational Linguistics.

  46. Thanks!
