
TweetCount: Urban Insights by Counting Tweets

Explore how Twitter data can classify urban land use profiles and predict tweet counts, geolocation, and location indicative words for enhanced insights.

sethcanales

Presentation Transcript


  1. TweetCount: Urban Insights by Counting Tweets. Krumm, J., Kun, A.L. and Varsanyi, P., 2017, September. In Proceedings of the 2017 ACM International Joint Conference on Pervasive and Ubiquitous Computing (pp. 403-411). Masoud Fatemi, fatemi@cs.uef.fi, February 2019

  2. Twitter • An online news and social networking service (https://twitter.com). • There are 974 million existing Twitter accounts (www.cbsnews.com). • 500 million tweets/day (internlivestat.com). • Data: approximately 4 million tweets spanning 4 months in mid-2015.

  3. Question (1) • What can tweet counts tell us about land use profiles, and how can land regions be classified into different profiles using tweet counts? Study area: New York City, 17 × 23 = 391 cells of 1 km × 1 km each.
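
The gridding step could be sketched roughly as follows. This is only an illustration: it assumes tweet positions are already expressed in kilometres east and north of the grid's south-west corner, and it assumes 17 columns by 23 rows (the slide gives only 17 × 23 = 391). The names `cell_of` and `count_tweets` are hypothetical.

```python
from collections import Counter

# Assumed orientation: the slide only states 17 x 23 = 391 cells.
N_COLS, N_ROWS = 17, 23
CELL_KM = 1.0  # each cell is 1 km x 1 km


def cell_of(x_km, y_km):
    """Map a point (km east, km north of the south-west corner) to a
    (col, row) cell, or None if the point lies outside the grid."""
    col = int(x_km // CELL_KM)
    row = int(y_km // CELL_KM)
    if 0 <= col < N_COLS and 0 <= row < N_ROWS:
        return (col, row)
    return None


def count_tweets(points):
    """Count how many points fall into each grid cell."""
    counts = Counter()
    for x_km, y_km in points:
        cell = cell_of(x_km, y_km)
        if cell is not None:
            counts[cell] += 1
    return counts


# Toy positions (invented for illustration); the last two fall outside the grid.
tweets = [(0.2, 0.7), (0.9, 0.1), (16.5, 22.9), (17.0, 3.0), (-0.1, 5.0)]
counts = count_tweets(tweets)
```

Real geotagged tweets carry latitude and longitude, so a projection to local kilometre coordinates would be needed before this step.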

  4. Question (2) • How do tweet counts change over time in different land use profiles, and how are these changes related across profiles? 72 different features.

  5. Land Use Classification with Tweet Counts. Using a dataset of residence and business locations from Bing Maps. Black: R0-B0, very few #. Gray: R2-B2, medium #. Red: R3-B3, most #.

  6. Land Use Classification with Tweet Counts • Tweet counts as classification features for the seven profiles. • Seven one-vs-all binary classifiers (FastRank Decision Tree).
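
A one-vs-all setup trains one binary classifier per profile, each answering "is this cell in profile X or not?". The sketch below illustrates that reduction only. It swaps in a depth-1 decision tree (a decision stump) for FastRank, and the feature values and the two-profile toy data are invented for illustration.

```python
def train_stump(X, y):
    """Fit a depth-1 decision tree: choose the (feature, threshold, polarity)
    with the fewest training errors. polarity=True means 'predict positive
    when the feature value is above the threshold'."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for polarity in (True, False):
                errors = sum(
                    ((row[f] > t) == polarity) != label
                    for row, label in zip(X, y)
                )
                if best is None or errors < best[0]:
                    best = (errors, f, t, polarity)
    _, f, t, polarity = best
    return lambda row: (row[f] > t) == polarity


def train_one_vs_all(X, labels):
    """Train one binary classifier per class: that class vs. all others."""
    classes = sorted(set(labels))
    return {c: train_stump(X, [lab == c for lab in labels]) for c in classes}


# Toy tweet-count features per cell (hypothetical: daytime count, night count).
X = [[1, 0], [2, 1], [50, 40], [60, 45]]
labels = ["R0-B0", "R0-B0", "R3-B3", "R3-B3"]
models = train_one_vs_all(X, labels)
```

With seven profiles, `train_one_vs_all` would return seven such binary models, one per profile.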

  7. Interactions Between Land Use Profiles (1) Which profiles are best at predicting tweet counts among the seven profiles? RMS: Root Mean Square 
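
For reference, an RMS error between predicted and observed tweet counts can be computed as in the sketch below. The function name `rms_error` and the sample values are illustrative assumptions, not taken from the paper.

```python
import math


def rms_error(predicted, actual):
    """Root mean square of the differences between two equal-length sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Invented values: both predictions are off by 3 tweets.
error = rms_error([13, 7], [10, 10])
```

A lower RMS error means the profile's tweet counts are easier to predict from the chosen inputs.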

  8. Conclusion • Anonymous geotagged tweets can reveal interesting insights about the land use profiles. • Tweet count features can be used to classify an urban area into different regions. • Land use regions are best predicted by past values of their own tweet counts and certain profiles are more predictive than others.

  9. Geolocation Prediction in Social Media Data by Finding Location Indicative Words. Han, B., Cook, P. and Baldwin, T., 2012. Proceedings of COLING 2012, pp. 1045-1062. Masoud Fatemi, fatemi@cs.uef.fi, February 2019

  10. Overview • Focus on user-level geolocation: estimating a user's location at city level. • Automatically identify "location indicative words" (LIWs). • What empirical properties do we observe in LIWs, and what feature selection methods best capture those properties? • Can we boost the accuracy of geolocation prediction through targeted identification of LIWs?

  11. Geolocation Task Scope and Formulation (1) • Why is geolocation prediction important? • Geolocation prediction using IP-based methods. • Prediction using text content. • We approach geolocation as a text classification task.

  12. Geolocation Task Scope and Formulation (2) Five key components to a geolocation prediction system: • Representation: Earth Grid vs. City. • Model: Generative vs. Discriminative. • Data: Two Geo-Tagged Datasets (NA and WORLD). • Features: All Unigrams vs. Location Indicative Words. • Evaluation Metrics: Acc, Acc@161, Mean & Median Error.

  13. Finding Location Indicative Words (1) Three Different Classes: 1. local words (1-local), used in a single city: yinz, dippy, hoagie 2. semi-local words (n-local), used in a subset of cities: ferry, Chinatown, tram 3. common words (common): twitter, iphone, today
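
The three classes can be illustrated by counting the cities in which a word occurs. The sketch below is a simplification: the cut-off between "n-local" and "common" (`common_fraction`) is a hypothetical parameter, and the city lists are invented toy data.

```python
def locality_class(cities_with_word, n_cities, common_fraction=0.5):
    """Label a word by how many of the n_cities it occurs in.
    common_fraction is an assumed threshold, not a value from the paper."""
    cf = len(cities_with_word)
    if cf == 0:
        raise ValueError("word does not occur in any city")
    if cf == 1:
        return "1-local"
    if cf / n_cities >= common_fraction:
        return "common"
    return "n-local"


# Toy example with 10 hypothetical cities named c0..c9.
all_cities = {f"c{i}" for i in range(10)}
yinz = locality_class({"c0"}, len(all_cities))
tram = locality_class({"c1", "c2", "c3"}, len(all_cities))
today = locality_class(all_cities, len(all_cities))
```

In real data a local word also shows up occasionally elsewhere, so raw city counts alone are too strict; that gap is what motivates the scoring methods on the following slides.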

  14. Finding Location Indicative Words (2) 1. Decoupling City Frequency and Word Frequency: TF×ICF
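
The slide does not spell out the formula, so the sketch below assumes TF×ICF is defined by analogy with TF-IDF: term frequency multiplied by inverse city frequency, taken here as the number of cities divided by the number of cities in which the word occurs. The function name and the counts are illustrative.

```python
def tf_icf(city_counts, n_cities):
    """city_counts maps city -> number of occurrences of one word.
    Assumed definition: TF = total occurrences, ICF = n_cities / city frequency."""
    tf = sum(city_counts.values())
    cf = sum(1 for count in city_counts.values() if count > 0)
    if cf == 0:
        raise ValueError("word does not occur in any city")
    return tf * (n_cities / cf)


# Two words with the same total frequency (4) across 3 hypothetical cities.
local_score = tf_icf({"Pittsburgh": 4}, n_cities=3)
common_score = tf_icf({"Pittsburgh": 2, "Melbourne": 1, "Sydney": 1}, n_cities=3)
```

At equal term frequency, the word concentrated in one city receives the higher score, which is the behaviour expected of a location indicative word.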

  15. Finding Location Indicative Words (3) 2. Information Gain Ratio (IGR). 3. Maximum Entropy-based Feature Weights (ME).
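
As a rough illustration of IGR, the sketch below uses the standard definition: information gain about the city from knowing whether a word is present, divided by the intrinsic entropy of the word's presence/absence split. The input layout (per-city document counts with and without the word) is an assumption made for this example.

```python
import math


def entropy(counts):
    """Shannon entropy (bits) of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def info_gain_ratio(with_word, without_word):
    """with_word[i] / without_word[i]: number of documents from city i
    that do / do not contain the word."""
    n_with, n_without = sum(with_word), sum(without_word)
    p_with = n_with / (n_with + n_without)
    h_city = entropy([a + b for a, b in zip(with_word, without_word)])
    h_conditional = p_with * entropy(with_word) + (1 - p_with) * entropy(without_word)
    intrinsic = entropy([n_with, n_without])
    if intrinsic == 0:
        return 0.0  # word occurs in every document or in none
    return (h_city - h_conditional) / intrinsic


# Toy counts for two cities: a perfectly local word and an evenly spread word.
igr_local = info_gain_ratio([10, 0], [0, 10])
igr_common = info_gain_ratio([5, 5], [5, 5])
```

Words would then be ranked by their ratio, with the highest-ranked kept as features.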

  16. Experiments & Analysis (1) Comparison of Feature Selection Methods NA: 214k features WORLD: 96k features

  17. Experiments & Analysis (2) Improved Accuracy with Location Indicative Words

  18. Experiments & Analysis (3) Comparison with Benchmarks

  19. Experiments & Analysis (4) The Confidence of Geolocation Prediction

  20. Conclusion • We have investigated various methods for applying feature selection to identify LIWs for the task of text-based geolocation. • Using LIWs leads to an improvement over using a full feature set for a variety of evaluation metrics. • Using LIWs outperforms the previous state-of-the-art on a standardised dataset, and is much faster.

  21. Thank you. Masoud Fatemi, fatemi@cs.uef.fi, February 2019

  22. Interactions Between Land Use Profiles (2)

  23. Geolocation Task Scope and Formulation • Representation: Earth Grid vs. City - points, clustered based on grids and population centers. - publicly-available Geoname dataset as the basis. - 3,709 cities throughout the world. • Model: Generative vs. Discriminative. - generative multinomial naive Bayes
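
A generative multinomial naive Bayes classifier over cities could look roughly like the sketch below. It assumes add-one (Laplace) smoothing and unigram tokens, neither of which is stated on the slide, and the training documents are invented toy data.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """docs: list of (tokens, city). Returns (log_prior, log_likelihood, vocab)."""
    city_docs = Counter(city for _, city in docs)
    word_counts = defaultdict(Counter)
    for tokens, city in docs:
        word_counts[city].update(tokens)
    vocab = {w for tokens, _ in docs for w in tokens}
    log_prior = {c: math.log(n / len(docs)) for c, n in city_docs.items()}
    log_like = {}
    for c in city_docs:
        # Add-one smoothing (an assumption) so unseen words keep non-zero mass.
        total = sum(word_counts[c].values()) + len(vocab)
        log_like[c] = {w: math.log((word_counts[c][w] + 1) / total) for w in vocab}
    return log_prior, log_like, vocab


def predict_city(tokens, model):
    """Return the city with the highest posterior log-probability."""
    log_prior, log_like, vocab = model

    def score(city):
        # Words never seen in training are ignored.
        return log_prior[city] + sum(log_like[city][w] for w in tokens if w in vocab)

    return max(log_prior, key=score)


docs = [
    (["yinz", "hoagie"], "Pittsburgh"),
    (["tram", "today"], "Melbourne"),
    (["yinz", "today"], "Pittsburgh"),
    (["tram", "ferry"], "Melbourne"),
]
model = train_nb(docs)
guess = predict_city(["yinz", "twitter"], model)
```

Restricting `vocab` to location indicative words instead of all unigrams is the feature-selection step that the presentation evaluates.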

  24. Geolocation Task Scope and Formulation • Data: two geo-tagged datasets: the regional North America geolocation dataset (NA), and a novel dataset that covers the entire globe (WORLD).

  25. Geolocation Task Scope and Formulation • Data: two geo-tagged datasets

  26. Geolocation Task Scope and Formulation • Features: All unigrams vs. Location Indicative Words • Evaluation Metrics: - Accuracy - Accuracy@161 (100 miles) - Mean and Median Error
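
The metrics listed above can be sketched as follows, assuming each prediction is a city with known coordinates and that error is the great-circle distance between the predicted and true locations. The haversine formula and the mean Earth radius of 6371 km are standard; the record layout, the function names and the approximate city coordinates are illustrative assumptions.

```python
import math
from statistics import mean, median

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def evaluate(records, radius_km=161.0):
    """records: list of (pred_city, pred_latlon, true_city, true_latlon)."""
    errors = [haversine_km(*pred, *true) for _, pred, _, true in records]
    exact = [pred_city == true_city for pred_city, _, true_city, _ in records]
    return {
        "acc": sum(exact) / len(records),
        "acc@161": sum(e <= radius_km for e in errors) / len(records),
        "mean_km": mean(errors),
        "median_km": median(errors),
    }


# Approximate coordinates, for illustration only.
PIT, NYC, PHL, LAX = (40.44, -79.99), (40.71, -74.01), (39.95, -75.17), (34.05, -118.24)
results = evaluate([
    ("Pittsburgh", PIT, "Pittsburgh", PIT),      # exact hit
    ("Philadelphia", PHL, "New York", NYC),      # wrong city, but within 161 km
    ("Los Angeles", LAX, "New York", NYC),       # far off
])
```

The second record shows why Acc@161 is more lenient than Acc: a near miss counts as correct, while the median error stays robust to the single distant outlier that inflates the mean.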
