TweetCount: Urban Insights by Counting Tweets

TweetCount: Urban Insights by Counting Tweets. Krumm, J., Kun, A.L. and Varsanyi, P., 2017, September. In Proceedings of the 2017 ACM International Joint Conference on Pervasive and Ubiquitous Computing (pp. 403-411). Masoud Fatemi, fatemi@cs.uef.fi, February 2019

Twitter (https://twitter.com) • An online news and social networking service. • There are 974 million existing Twitter accounts (www.cbsnews.com). • 500 million tweets per day (internetlivestats.com). • Dataset: approximately 4 million tweets spanning 4 months in mid-2015.

Question (1) • What can tweet counts tell us about land use profiles, and how can land regions be classified into different profiles using tweet counts? • New York City: 17 × 23 = 391 cells, each 1 km × 1 km.
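The gridding step can be sketched as follows. This is a minimal illustration, not the authors' code: the grid origin, the choice of 23 rows by 17 columns, and the flat-Earth approximation for converting degrees to kilometres are all assumptions made here.

```python
import math
from collections import Counter

CELL_KM = 1.0            # cell size from the slide: 1 km x 1 km
KM_PER_DEG_LAT = 111.32  # approximate length of one degree of latitude

def cell_index(lat, lon, origin_lat, origin_lon, rows, cols):
    """Return (row, col) of the cell containing the point, or None if outside the grid."""
    # One degree of longitude shrinks with latitude (local flat-Earth approximation).
    km_per_deg_lon = KM_PER_DEG_LAT * math.cos(math.radians(origin_lat))
    north_km = (lat - origin_lat) * KM_PER_DEG_LAT
    east_km = (lon - origin_lon) * km_per_deg_lon
    row = math.floor(north_km / CELL_KM)
    col = math.floor(east_km / CELL_KM)
    if 0 <= row < rows and 0 <= col < cols:
        return row, col
    return None

def count_tweets(points, origin_lat, origin_lon, rows=23, cols=17):
    """Count geotagged points per grid cell; rows x cols = 391 cells (orientation assumed)."""
    counts = Counter()
    for lat, lon in points:
        cell = cell_index(lat, lon, origin_lat, origin_lon, rows, cols)
        if cell is not None:
            counts[cell] += 1
    return counts
```

Calling `count_tweets(points, origin_lat, origin_lon)` with a hypothetical south-west corner returns a mapping from `(row, col)` to the number of tweets in that cell; points outside the grid are ignored.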

Question (2) • How do tweet counts change over time in different land use profiles, and how are these changes related across profiles? • 72 different features.

Land Use Classification with Tweet Counts • Uses a dataset of residence and business locations from Bing Maps. • Black: R0-B0 (very few). • Gray: R2-B2 (medium number). • Red: R3-B3 (most).

Land Use Classification with Tweet Counts • Tweet counts serve as classification features for the seven profiles. • Seven one-vs-all binary classifiers (FastRank decision trees).

Interactions Between Land Use Profiles (1) Which profiles are best at predicting tweet counts among the seven profiles? RMS: Root Mean Square
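For reference, a root mean square error between predicted and observed tweet counts can be computed as below. This is the standard RMS definition; that the slide's RMS is measured on prediction errors in exactly this way is an assumption.

```python
import math

def rms_error(predicted, actual):
    """Root mean square of the differences between two equal-length sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))
```

A lower value means one profile's past tweet counts predict the target profile's counts more closely.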

Conclusion • Anonymous geotagged tweets can reveal interesting insights about the land use profiles. • Tweet count features can be used to classify an urban area into different regions. • Land use regions are best predicted by past values of their own tweet counts and certain profiles are more predictive than others.

Geolocation Prediction in Social Media Data by Finding Location Indicative Words. Han, B., Cook, P. and Baldwin, T., 2012. Proceedings of COLING 2012, pp. 1045-1062. Masoud Fatemi, fatemi@cs.uef.fi, February 2019

Overview • Focus on user-level geolocation: estimating a user's location at the city level. • Automatically identify "location indicative words" (LIWs). • What empirical properties do we observe in LIWs, and what feature selection methods best capture those properties? • Can we boost the accuracy of geolocation prediction through targeted identification of LIWs?

Geolocation Task Scope and Formulation (1) • Why is geolocation prediction important? • Geolocation prediction using IP-based methods. • Prediction using text content. • We approach geolocation as a text classification task.

Geolocation Task Scope and Formulation (2) Five key components of a geolocation prediction system: • Representation: Earth Grid vs. City. • Model: Generative vs. Discriminative. • Data: Two Geo-Tagged Datasets (NA and WORLD). • Features: All Unigrams vs. Location Indicative Words. • Evaluation Metrics: Acc, Acc@161, Mean & Median Error.

Finding Location Indicative Words (1) Three different classes: 1. Local words (1-local), used in a single city: yinz, dippy, hoagie. 2. Semi-local words (n-local), used in a subset of cities: ferry, Chinatown, tram. 3. Common words (common): twitter, iphone, today.
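The three classes can be illustrated with a toy rule based on how many cities use a word. The cut-off used here (a word is "common" only if it appears in every city) is a simplifying assumption for illustration, not a definition from the paper, and `locality_class` is a hypothetical helper name.

```python
def locality_class(word, city_word_counts):
    """Classify a word by the number of cities in which it occurs.

    city_word_counts maps city name -> {word: count}.
    """
    n_cities = len(city_word_counts)
    cities_using = sum(
        1 for counts in city_word_counts.values() if counts.get(word, 0) > 0
    )
    if cities_using == 0:
        return "unseen"
    if cities_using == 1:
        return "1-local"
    if cities_using < n_cities:
        return "n-local"
    return "common"  # assumption: used in every city
```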

Finding Location Indicative Words (2) 1. Decoupling City Frequency and Word Frequency: TF×ICF
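The slide names TF×ICF without giving a formula. A minimal sketch, assuming the usual TF-IDF-style reading (term frequency times the log of the inverse city frequency), might look like this; the exact weighting used in the paper may differ.

```python
import math
from collections import Counter

def tf_icf(city_word_counts):
    """Score each word by TF x ICF.

    TF  = total occurrences of the word across all cities.
    ICF = log(number of cities / number of cities using the word).
    Both definitions are assumptions made for this illustration.
    """
    n_cities = len(city_word_counts)
    tf = Counter()  # word -> total count
    cf = Counter()  # word -> number of cities in which it occurs
    for counts in city_word_counts.values():
        for word, count in counts.items():
            if count > 0:
                tf[word] += count
                cf[word] += 1
    return {word: tf[word] * math.log(n_cities / cf[word]) for word in tf}
```

Under this reading, a word used in every city gets a score of zero however frequent it is, while a frequent word confined to one city scores highest.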

Finding Location Indicative Words (3) 2. Information Gain Ratio (IGR). 3. Maximum Entropy-based Feature Weights (ME).
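Information gain ratio is conventionally the information gain of a word about the city label, divided by the intrinsic entropy of the word's presence or absence. The sketch below follows that textbook definition on a toy list of (city, words) pairs; the data layout and function names are assumptions for illustration.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy (bits) of a distribution given as raw counts."""
    counts = list(counts)
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain_ratio(word, docs):
    """IGR(word) = IG(word) / IV(word) over docs = [(city, set_of_words), ...]."""
    with_w = Counter(city for city, words in docs if word in words)
    without_w = Counter(city for city, words in docs if word not in words)
    n = len(docs)
    n_with = sum(with_w.values())
    n_without = n - n_with
    if n_with == 0 or n_without == 0:
        return 0.0  # word is in all or no documents: it splits nothing
    h_city = entropy((with_w + without_w).values())
    h_city_given_word = (n_with / n) * entropy(with_w.values()) \
        + (n_without / n) * entropy(without_w.values())
    intrinsic_value = entropy([n_with, n_without])
    return (h_city - h_city_given_word) / intrinsic_value
```

Words are then ranked by this score, and the top-ranked ones are kept as candidate LIWs.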

Experiments & Analysis (1) Comparison of Feature Selection Methods NA: 214k features WORLD: 96k features

Experiments & Analysis (2) Improved Accuracy with Location Indicative Words

Experiments & Analysis (3) Comparison with Benchmarks

Experiments & Analysis (4) The Confidence of Geolocation Prediction

Conclusion • We have investigated various methods for applying feature selection to identify LIWs for the task of text-based geolocation. • Using LIWs leads to an improvement over using a full feature set for a variety of evaluation metrics. • Using LIWs outperforms the previous state-of-the-art on a standardised dataset, and is much faster.

Thank you. Masoud Fatemi, fatemi@cs.uef.fi, February 2019

Interactions Between Land Use Profiles (2)

Geolocation Task Scope and Formulation • Representation: Earth Grid vs. City - Points are clustered based on grids and population centers. - The publicly available GeoNames dataset serves as the basis. - 3,709 cities throughout the world. • Model: Generative vs. Discriminative - Generative multinomial naive Bayes.
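A generative multinomial naive Bayes classifier over word tokens can be sketched in a few lines. This is a generic textbook version with add-one smoothing, shown only to make the model concrete; the smoothing choice, class name, and data layout are assumptions, not details taken from the paper.

```python
import math
from collections import Counter, defaultdict

class MultinomialNB:
    """Toy multinomial naive Bayes: predicts a city from a list of tokens."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive smoothing constant (assumed)

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # city -> number of documents
        self.word_counts = defaultdict(Counter)  # city -> word -> count
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_label in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            score = math.log(n_label / n_docs)  # log prior
            for token in tokens:
                if token in self.vocab:  # ignore words never seen in training
                    count = self.word_counts[label][token]
                    score += math.log(
                        (count + self.alpha) / (total + self.alpha * vocab_size)
                    )
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Restricting `tokens` to a selected set of LIWs before calling `fit` and `predict` corresponds to the feature selection the talk describes.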

Geolocation Task Scope and Formulation • Data: two geo-tagged datasets: the regional North America geolocation dataset (NA) and a novel dataset that covers the entire globe (WORLD).

Geolocation Task Scope and Formulation • Features: All unigrams vs. Location Indicative Words. • Evaluation Metrics: - Accuracy - Accuracy@161 (prediction within 161 km, i.e. 100 miles, of the true location) - Mean and Median Error.
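The metrics listed above can be computed from predicted and true cities as sketched below. The great-circle (haversine) distance and the dictionary layout mapping each city to a (latitude, longitude) pair are assumptions for illustration.

```python
import math
from statistics import mean, median

EARTH_RADIUS_KM = 6371.0

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = math.sin((lat2 - lat1) / 2) ** 2 \
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def evaluate(predicted, actual, coords, radius_km=161.0):
    """Return Acc, Acc@161, and mean/median error distance for city predictions."""
    errors = [haversine_km(coords[p], coords[t]) for p, t in zip(predicted, actual)]
    n = len(errors)
    return {
        "acc": sum(p == t for p, t in zip(predicted, actual)) / n,
        "acc@161": sum(e <= radius_km for e in errors) / n,
        "mean_km": mean(errors),
        "median_km": median(errors),
    }
```

Acc@161 credits a prediction that names the wrong city but lands within 161 km of the right one, which is why it is always at least as high as plain accuracy.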
