TweetCount: Urban Insights by Counting Tweets
Krumm, J., Kun, A.L. and Varsanyi, P., 2017, September. In Proceedings of the 2017 ACM International Joint Conference on Pervasive and Ubiquitous Computing (pp. 403-411).
Masoud Fatemi, fatemi@cs.uef.fi, February 2019
Twitter
• An online news and social networking service (https://twitter.com).
• There are 974 million existing Twitter accounts (www.cbsnews.com).
• 500 million tweets per day (internetlivestats.com).
• Dataset: approximately 4 million tweets spanning 4 months in mid-2015.
Question (1)
• What can tweet counts tell us about the land use profile of an area, and how can land regions be classified into different profiles using tweet counts?
• Study area: New York City, divided into 17 × 23 = 391 cells of 1 km × 1 km.
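A minimal sketch of how geotagged tweets could be binned into such a grid. It assumes a hypothetical south-west corner (`origin_lat`, `origin_lon`), 17 rows by 23 columns, and a flat-earth approximation of km per degree; the slide does not give the grid orientation or projection, and all names here are illustrative.

```python
import math
from collections import Counter

KM_PER_DEG_LAT = 111.32  # approximate km per degree of latitude (flat-earth assumption)


def cell_index(lat, lon, origin_lat, origin_lon, rows=17, cols=23, cell_km=1.0):
    """Map a point to its (row, col) grid cell, or None if it falls outside the grid.

    Assumption: rows run north from origin_lat and columns run east from origin_lon;
    the slide does not state the grid orientation.
    """
    km_north = (lat - origin_lat) * KM_PER_DEG_LAT
    km_east = (lon - origin_lon) * KM_PER_DEG_LAT * math.cos(math.radians(origin_lat))
    row, col = math.floor(km_north / cell_km), math.floor(km_east / cell_km)
    if 0 <= row < rows and 0 <= col < cols:
        return row, col
    return None


def count_tweets(points, origin_lat, origin_lon, rows=17, cols=23):
    """Count geotagged tweets (lat, lon) per cell; points outside the grid are dropped."""
    counts = Counter()
    for lat, lon in points:
        cell = cell_index(lat, lon, origin_lat, origin_lon, rows, cols)
        if cell is not None:
            counts[cell] += 1
    return counts
```

For example, with a hypothetical origin at (40.0, -74.0), a tweet at (40.0045, -73.994) lands in cell (0, 0), while a tweet at (41.0, -74.0) is more than 17 km north and is dropped.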
Question (2)
• How do tweet counts change over time in different land use profiles, and how are these changes related across profiles?
• 72 different features.
Land Use Classification with Tweet Counts
• Using a dataset of residence and business locations from Bing Maps.
• Map legend (R = residences, B = businesses):
  - Black: R0-B0, very few
  - Gray: R2-B2, medium number
  - Red: R3-B3, most
Land Use Classification with Tweet Counts
• Tweet counts as classification features for the seven profiles.
• Seven one-vs-all binary classifiers (FastRank decision trees).
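The one-vs-all scheme can be sketched as follows. FastRank is not reproduced here: as a plainly swapped-in stand-in, each binary classifier is a single-split decision stump, and a cell is assigned the profile whose classifier scores it highest. The feature vectors (for instance tweet counts per time bin) and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Stump:
    """Single-split decision tree: a stand-in for FastRank, NOT the paper's learner."""
    feature: int
    threshold: float
    score_left: float   # fraction of positives with x[feature] <= threshold
    score_right: float  # fraction of positives with x[feature] > threshold

    def score(self, x):
        return self.score_left if x[self.feature] <= self.threshold else self.score_right


def fit_stump(X, y):
    """Fit a stump to binary labels y (1/0) by minimizing misclassification error."""
    best, best_err = None, float("inf")
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            p_l = sum(left) / len(left) if left else 0.0
            p_r = sum(right) / len(right) if right else 0.0
            # errors made when each side predicts its own majority label
            err = (min(sum(left), len(left) - sum(left))
                   + min(sum(right), len(right) - sum(right)))
            if err < best_err:
                best, best_err = Stump(f, t, p_l, p_r), err
    return best


def fit_one_vs_all(X, labels):
    """Train one binary stump per profile (that profile vs. all others)."""
    return {c: fit_stump(X, [1 if l == c else 0 for l in labels]) for c in sorted(set(labels))}


def predict(models, x):
    """Assign the profile whose binary classifier gives x the highest score."""
    return max(models, key=lambda c: models[c].score(x))
```

The one-vs-all wrapper does not depend on the base learner, so a stronger tree ensemble could replace `fit_stump` without changing `fit_one_vs_all` or `predict`.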
Interactions Between Land Use Profiles (1)
• Which of the seven profiles are best at predicting tweet counts?
• RMS: Root Mean Square
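One simple way to ask which profile's tweet counts best predict another's is to regress the target series on the lagged series of each candidate profile and compare RMS errors. The sketch assumes a lag-1 ordinary least-squares model and hypothetical series; it illustrates the RMS comparison, not the model used in the paper.

```python
import math


def lagged_rms(predictor, target, lag=1):
    """Fit target[t] = a + b * predictor[t - lag] by least squares (lag >= 1); return RMS error."""
    xs = predictor[:-lag]
    ys = target[lag:]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    a = my - b * mx
    return math.sqrt(sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / n)


def rank_predictors(series, target_name, lag=1):
    """Rank every profile's series by how well its past values predict target_name (lowest RMS first)."""
    target = series[target_name]
    scores = {name: lagged_rms(s, target, lag) for name, s in series.items()}
    return sorted(scores.items(), key=lambda kv: kv[1])
```

The target's own series is included among the candidates, which matches the question of whether a region is best predicted by its own past tweet counts.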
Conclusion
• Anonymous geotagged tweets can reveal interesting insights about land use profiles.
• Tweet count features can be used to classify an urban area into different regions.
• Land use regions are best predicted by past values of their own tweet counts, and certain profiles are more predictive than others.
Geolocation Prediction in Social Media Data by Finding Location Indicative Words
Han, B., Cook, P. and Baldwin, T., 2012. Proceedings of COLING 2012, pp. 1045-1062.
Overview
• Focus on user-level geolocation: estimating a user’s location at the city level.
• Automatically identify “location indicative words” (LIWs).
• What empirical properties do we observe in LIWs, and which feature selection methods best capture those properties?
• Can we boost the accuracy of geolocation prediction through targeted identification of LIWs?
Geolocation Task Scope and Formulation (1)
• Why is geolocation prediction important?
• Geolocation prediction using IP-based methods.
• Prediction using text content.
• We approach geolocation as a text classification task.
Geolocation Task Scope and Formulation (2)
Five key components of a geolocation prediction system:
• Representation: Earth grid vs. city
• Model: generative vs. discriminative
• Data: two geotagged datasets (NA and WORLD)
• Features: all unigrams vs. location indicative words
• Evaluation metrics: Acc, Acc@161, mean and median error
Finding Location Indicative Words (1)
Three different classes:
1. Local words (1-local), used in a single city: yinz, dippy, hoagie
2. Semi-local words (n-local), used in a subset of cities: ferry, Chinatown, tram
3. Common words (common): twitter, iphone, today
Finding Location Indicative Words (2)
1. Term Frequency × Inverse City Frequency (TF×ICF): decoupling city frequency and word frequency
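A minimal sketch of TF×ICF, assuming it is modelled on TF-IDF with cities in the role of documents: a word's weight in a city is its count there times log(|C| / cf(w)), where cf(w) is the number of cities in which the word occurs. The exact normalization, and combining per-city scores into one ranking by taking the maximum over cities, are assumptions; the toy corpus is invented.

```python
import math
from collections import Counter


def tf_icf(city_tokens):
    """city_tokens: dict city -> list of tokens. Returns {(word, city): TF x ICF}.

    Assumption: TF is the raw count and ICF = log(|C| / number of cities containing w).
    """
    n_cities = len(city_tokens)
    tf = {city: Counter(tokens) for city, tokens in city_tokens.items()}
    city_freq = Counter(w for counts in tf.values() for w in counts)  # cities containing w
    return {
        (w, city): count * math.log(n_cities / city_freq[w])
        for city, counts in tf.items()
        for w, count in counts.items()
    }


def rank_words(city_tokens, top_k=10):
    """Score each word by its best TF x ICF over cities (assumed aggregation); highest first."""
    best = {}
    for (w, _city), score in tf_icf(city_tokens).items():
        best[w] = max(best.get(w, 0.0), score)
    return sorted(best, key=lambda w: (-best[w], w))[:top_k]
```

On a toy corpus where "yinz" and "hoagie" occur only in Pittsburgh, "ferry" and "tram" in two cities, and "today" everywhere, this ranks the local words first and gives "today" a weight of zero, mirroring the local / semi-local / common split.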
Finding Location Indicative Words (3)
2. Information Gain Ratio (IGR)
3. Maximum Entropy-based Feature Weights (ME)
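Information gain ratio can be sketched as the standard gain ratio over the binary feature "word occurs": the reduction in city-label entropy from knowing whether the word is present, divided by the entropy of the present/absent split itself. Treating each labelled item (a tweet or a user's messages) as one instance is an assumption of this sketch, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a list of non-negative counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def info_gain_ratio(docs, word):
    """docs: list of (city, set_of_tokens). IGR of the binary feature 'word occurs'."""
    with_w = Counter(city for city, toks in docs if word in toks)
    without_w = Counter(city for city, toks in docs if word not in toks)
    n, n_w = len(docs), sum(with_w.values())
    p_w = n_w / n
    h_city = entropy(list((with_w + without_w).values()))  # H(C)
    h_cond = 0.0                                            # H(C | word present/absent)
    if n_w:
        h_cond += p_w * entropy(list(with_w.values()))
    if n - n_w:
        h_cond += (1 - p_w) * entropy(list(without_w.values()))
    intrinsic = entropy([n_w, n - n_w])  # split information of the presence/absence split
    return (h_city - h_cond) / intrinsic if intrinsic else 0.0
```

A word that perfectly separates two equally frequent cities scores 1.0, while a word present in every instance scores 0.0 because it carries no city information.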
Experiments & Analysis (1)
Comparison of Feature Selection Methods
• NA: 214k features; WORLD: 96k features
Experiments & Analysis (2)
Improved Accuracy with Location Indicative Words
Experiments & Analysis (3)
Comparison with Benchmarks
Experiments & Analysis (4)
The Confidence of Geolocation Prediction
Conclusion
• We have investigated various methods for applying feature selection to identify LIWs for the task of text-based geolocation.
• Using LIWs leads to an improvement over using the full feature set for a variety of evaluation metrics.
• Using LIWs outperforms the previous state of the art on a standardised dataset, and is much faster.
Thank you
Geolocation Task Scope and Formulation
• Representation: Earth grid vs. city
  - points clustered either into grid cells or around population centers
  - the publicly available GeoNames dataset as the basis
  - 3,709 cities throughout the world
• Model: generative vs. discriminative
  - generative multinomial naive Bayes
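The generative model can be sketched as a multinomial naive Bayes classifier that scores each city c by log P(c) + Σ_w n_w log P(w | c) and predicts the highest-scoring city. Add-one (Laplace) smoothing and ignoring out-of-vocabulary words are assumptions of this sketch, not details taken from the slide; the class name is hypothetical.

```python
import math
from collections import Counter


class MultinomialNB:
    """Multinomial naive Bayes over city labels with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # assumed smoothing constant

    def fit(self, docs, cities):
        """docs: list of token lists; cities: matching list of city labels."""
        self.city_counts = Counter(cities)
        self.word_counts = {c: Counter() for c in self.city_counts}
        for tokens, city in zip(docs, cities):
            self.word_counts[city].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def log_posterior(self, tokens, city):
        """log P(city) + sum over tokens of log P(token | city), up to a constant."""
        n_docs = sum(self.city_counts.values())
        score = math.log(self.city_counts[city] / n_docs)
        denom = self.totals[city] + self.alpha * len(self.vocab)
        for w in tokens:
            if w in self.vocab:  # assumption: unseen words are ignored
                score += math.log((self.word_counts[city][w] + self.alpha) / denom)
        return score

    def predict(self, tokens):
        """Return the city with the highest (unnormalized) log posterior."""
        return max(self.city_counts, key=lambda c: self.log_posterior(tokens, c))
```

Restricting `docs` to location indicative words before calling `fit` is one way to compare the "all unigrams vs. LIWs" feature settings with the same model.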
Geolocation Task Scope and Formulation
• Data: two geotagged datasets
  - NA: the regional North America geolocation dataset
  - WORLD: a novel dataset that covers the entire globe
Geolocation Task Scope and Formulation
• Features: all unigrams vs. location indicative words
• Evaluation metrics:
  - Accuracy
  - Accuracy@161 (within 161 km, i.e. 100 miles)
  - Mean and median error
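These metrics can be sketched as follows, assuming a spherical Earth (mean radius 6,371 km) and the haversine formula for the error distance between predicted and true coordinates; 161 km corresponds to 100 miles. The argument names are hypothetical.

```python
import math
from statistics import mean, median

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical-Earth assumption)


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def evaluate(pred_cities, true_cities, pred_coords, true_coords, radius_km=161.0):
    """Return Acc, Acc@161, and mean and median error distance (km)."""
    n = len(true_cities)
    errors = [haversine_km(p, t) for p, t in zip(pred_coords, true_coords)]
    return {
        "acc": sum(p == t for p, t in zip(pred_cities, true_cities)) / n,  # exact city match
        "acc@161": sum(e <= radius_km for e in errors) / n,               # within 161 km
        "mean_km": mean(errors),
        "median_km": median(errors),
    }
```

Acc@161 tolerates near misses (for example, predicting a neighbouring city), while the median error is less sensitive than the mean to a few predictions that land on the wrong continent.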