
Towards Passive Political Opinion Polling using Twitter

This paper explores the potential of using Twitter data for passive opinion polling, as well as the dynamics of political discussions on the platform. The study aims to infer voting intentions, public concerns, and the standpoints of politicians on key issues. It also discusses the challenges of collecting and analyzing Twitter data and validates the methodology used.


Presentation Transcript


  1. Towards Passive Political Opinion Polling using Twitter Nicholas Thapen Imperial College London Moustafa Ghanem Middlesex University BCS SGAI Social Media Analytics Workshop Cambridge, UK 10th Nov 2013

  2. Introduction • Hundreds of millions of tweets sent daily • Unlike other social media, tweets are mainly on the public record • 66% of U.S. social media users engage in some political behaviour (Pew Research poll 2013) • Twitter used extensively for political discussion • Most politicians have Twitter accounts and use them to communicate their viewpoint

  3. Motivation I: Towards Passive Opinion Polling • Opinion polling is expensive, takes a while to set up, and is prone to framing effects • Can we use Twitter data to infer • voting intention and/or • public concern about particular issues? • Can we use it as a useful adjunct to polls that potentially avoids framing effects? What are the challenges?

  4. Motivation II: (Side Effect) What are politicians discussing? • What are the dynamics of political discussion on Twitter? • What are politicians discussing, and is it in line with what the public sees as the important issues? • What are the standpoints of different politicians on key issues? Are they in line with their party line? • What are politicians from one party saying about the other parties? • These are all interesting questions that we did not initially have in mind, but that came up as a side effect of collecting and analysing our data

  5. Related Work • Quite a few studies on predicting elections using Twitter • Mostly measuring the volume of tweets mentioning parties or candidates and extrapolating to vote shares • Some use sentiment and sentiment-weighted tweet volumes • Varying accuracies reported in the literature (1.65% to 17%)! • Different assumptions about collecting data, with most analysis being retrospective (potentially biased?) • Lack of methodologies for how we should collect and analyse Twitter data and evaluate results • Not many studies looking at • Validating the methodology on a regular basis • Conducting other forms of opinion polling with Twitter, especially inferring the top issues that would influence voting behaviour

  6. Personal Experience prior to this work • Near real-time analysis of the 1st ever televised Egyptian presidential debate • Collecting and analysing tweets on the day of the 1st stage of the 2012 Egyptian presidential election

  7. Egyptian Televised Debate • Volume of tweets is not necessarily a good measure of popularity; Moussa (green) made a mistake (calling Iran an Arab country) which caused a flurry of sarcastic comments

  8. Topic Analysis

  9. Debate Sentiment • Could only infer that people were becoming more opinionated about both candidates • (with caveats on how sentiment is calculated)

  10. Real-time analysis of 1st ever Televised Egyptian Presidential Debate • Purple Candidate (H. Sabahi) gaining increasing mentions as the debate progresses.

  11. 1st Stage Egyptian Election • Looks right for most candidates, but massive errors for two of them!!

  12. 1st Stage Analysis from Jadaliyya: http://www.jadaliyya.com/pages/index/5716/egypts-presidential-elections-and-twitter-talk

  13. Seems like Voodoo to me! • How do we collect data? • How much data? • What are we measuring? • How are we measuring? • How do we validate one set of results? • How do we validate methodology? • May disappoint you by not providing concrete answers to the above, but let’s try to understand some of the challenges.

  14. Approach: UK Politics • Establish ground truth from traditional polls for a specific (historical?) period • Voting intentions • Key political issues the public is concerned about • Collect and analyse sample Twitter data for the same period • Infer voting intention using different approaches • Infer the key issues discussed using different approaches • Compare inferred results with the ground truth • Understand sources of errors • Understand the challenges better

  15. Data – Polls (Ground Truth) • Collected the list of voting intention polls compiled by the U.K. Political Report website (all polls June 2012 to June 2013) • Also accessed the Ipsos MORI Issues Index archive. This is a monthly survey, conducted since 1974, that asks an unprompted question about issues of concern to the public

  16. Ipsos MORI Categories • To keep our analysis tractable, we consolidated the most frequent issues appearing in the poll data in the past year into 14 main categories, as well as an ‘Other’ category intended to catch all other issues. The categories chosen are: • Crime, Economy, Education, Environment, EU, Foreign Affairs, Government Services, Health, Housing, Immigration, Pensions, Politics, Poverty, Unemployment

  17. Challenges in Collecting Twitter Data (Real-time vs. Historical Data) • Ideally want historical data • The Twitter streaming API allows collecting data based on queries (forward collection from when you start); limits can be put on the number of tweets collected, but this is usually not a problem • Difficult to collect historical Twitter data because of Twitter API limitations • Not possible to query data older than 3 days • Can retrieve older data by tweet ID only • Can retrieve only up to 3,200 tweets from a known user's timeline • Otherwise have to use a commercial historical data archive, which is not cheap!

  18. Challenges in Collecting Twitter Data (Representative Sample?) • How do we collect a representative Twitter sample? • Ideally, should query by UK geo-location • Would allow considering the geographic distribution of voting intention and key issues • However • Not many people tag their location, and those who do may be a biased sample • Can generate too much data • Not sure if they are voters (or even interested in politics) or if they are tourists, …

  19. Data - Twitter • Two data sets collected using Twitter's REST API v1.1 • Data Set 1: MP data set • All available tweets sent by 423 U.K. M.P.s as of June 2013 (689,637 in total); the list of M.P. accounts is available at www.tweetminster.co.uk • This data set was mainly used to analyse political language on Twitter, build and evaluate topic classifiers, and evaluate sentiment analysis • Data Set 2: • All available tweets from a sample of 600 users who had mentioned the names of party leaders (filtering out M.P.s, news accounts, …) • This sample was adopted as one representing political discussion on Twitter (1,431,348 tweets in total), collected in August 2013
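
A minimal sketch (not the authors' code) of how such a timeline collection could look against the REST API v1.1, which caps retrieval at roughly the most recent 3,200 tweets per account. It assumes the third-party tweepy library and placeholder credentials.

```python
import tweepy

# Placeholder credentials; substitute real API keys.
auth = tweepy.OAuth1UserHandler(
    "CONSUMER_KEY", "CONSUMER_SECRET", "ACCESS_TOKEN", "ACCESS_SECRET"
)
api = tweepy.API(auth, wait_on_rate_limit=True)

def fetch_timeline(screen_name, limit=3200):
    """Page backwards through a user's timeline, up to the API's ~3,200-tweet cap."""
    tweets = []
    for status in tweepy.Cursor(api.user_timeline, screen_name=screen_name,
                                count=200, tweet_mode="extended").items(limit):
        tweets.append({"id": status.id,
                       "created_at": status.created_at,
                       "text": status.full_text})
    return tweets

# e.g. collect every account from a handle list such as the Tweetminster one:
# mp_tweets = {name: fetch_timeline(name) for name in mp_handles}
```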

  20. Other Issues/Challenges • How do we infer topic mappings for the Ipsos MORI categories? • Semi-supervised iterative keyword classifier • Bayesian topic classifier • How do we infer voting intention? • Tweet volume mentioning each party • Sentiment-weighted tweet volume (using an open-access sentiment tool, SentiStrength)
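
As a rough illustration of the two voting-intention measures named above, the sketch below (not the authors' code) derives party shares from raw tweet volume or from sentiment-weighted volume; party_of and sentiment_score are hypothetical stand-ins for the party-mention matcher and a SentiStrength-style scorer.

```python
from collections import Counter

def vote_shares(tweets, party_of, sentiment_score=None):
    """Estimate party shares from tweet volume, optionally weighted by sentiment.

    party_of(tweet)       -> name of the single party a tweet mentions, or None
    sentiment_score(text) -> hypothetical SentiStrength-style positivity score
    """
    weights = Counter()
    for tweet in tweets:
        party = party_of(tweet)
        if party is None:
            continue
        weight = 1.0 if sentiment_score is None else max(sentiment_score(tweet), 0.0)
        weights[party] += weight
    total = sum(weights.values()) or 1.0
    return {party: weight / total for party, weight in weights.items()}
```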

  21. Evaluating Sentiment Analysis (SentiStrength) • Using the MP data set, we extracted tweets where political parties are mentioned (names, nicknames, shortenings, etc.) • Excluded tweets mentioning more than one party, producing a data set of 48,140 tweets (Labour: 23,070; Conservative: 18,034; Liberal Democrats: 7,036) • Applied SentiStrength (mainly based on keyword lookups)
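
A sketch of the extraction and single-party filtering step. The alias patterns are illustrative only; the slides do not enumerate the actual names, nicknames and shortenings used.

```python
import re

# Hypothetical alias patterns; the paper's actual lists are not given.
PARTY_ALIASES = {
    "Labour": [r"\blabour\b"],
    "Conservative": [r"\bconservatives?\b", r"\btor(y|ies)\b"],
    "Liberal Democrats": [r"\bliberal democrats?\b", r"\blib ?dems?\b"],
}

def parties_mentioned(text):
    """Return the set of parties whose aliases appear in the tweet text."""
    text = text.lower()
    return {party for party, patterns in PARTY_ALIASES.items()
            if any(re.search(p, text) for p in patterns)}

def single_party_tweets(tweets):
    """Yield (party, tweet) pairs for tweets mentioning exactly one party."""
    for tweet in tweets:
        mentioned = parties_mentioned(tweet)
        if len(mentioned) == 1:
            yield mentioned.pop(), tweet
```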

  22. Evaluation using annotated tweets • 30 tweets were selected at random from each of the nine groups and manually annotated as 'Positive' or 'Negative' based on the language used and the sense meant. • Results are far from perfect.

  23. Sentiment Analysis amongst M.P.s • Despite the inaccuracies (low precision and recall), when looking at how M.P.s from each party mention the other parties, the results still make sense and could be usable.

  24. Voting Intention: Tweet Volume • The first experiment looked at tweet volumes mentioning each political party in July 2013 • The proportion of tweets mentioning each party is compared to the polling average for that month

  25. Voting Intention (Sentiment Weighted)

  26. Voting Intention Over Time • Applying the same methodology over time reveals huge variability • Due to the data collection methodology, we have far fewer tweets from the months prior to July 2013

  27. Topic Detection in Political Tweets: Snowballing Keyword Classifier • Initial keywords for each of the 14 topics were manually compiled by consulting Wikipedia (Health: {Health, NHS, Doctors, Nurses, Hospitals, …}) • Initial keyword matching found topic labels for around 1/6 of the M.P. tweet data set • Snowballing • To extend the keyword set we looked for 1-, 2- and 3-grams that were frequent inside each topic class but rare outside it • Fisher's test was used to score and sort the candidate list, which was then manually scanned for appropriate keywords • Over 5 iterations this increased the proportion of tweets categorised from 16% to 30%
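
A sketch of the keyword-matching step under stated assumptions: only the Health seed set comes from the slide, the other entries are placeholders, and the real classifier may tokenise differently.

```python
# Seed keyword sets; only "Health" reflects the slide's example, the rest are placeholders.
SEED_KEYWORDS = {
    "Health": {"health", "nhs", "doctors", "nurses", "hospitals"},
    "Economy": {"economy", "deficit", "growth"},        # placeholder
    "Immigration": {"immigration", "border", "visas"},  # placeholder
}

def classify_by_keywords(tweet, keywords=SEED_KEYWORDS):
    """Return every topic whose keywords appear in the tweet (crude whitespace tokens)."""
    tokens = set(tweet.lower().split())
    return {topic for topic, words in keywords.items() if tokens & words}
```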

  28. Iterative extension of keyword set

  29. Iterative extension - detail • Suppose we are determining whether 'NHS' is a good keyword for the 'Health' topic • We build a contingency table of how often 'NHS' occurs vs. other words, in the tweets we have tagged as 'Health' vs. the rest • Fisher's test reveals how unlikely this contingency table is to arise by chance • The more unlikely, the more significant the difference between the keyword's frequency inside and outside the 'Health' topic
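
A sketch of that scoring step, assuming SciPy's fisher_exact for the 2x2 table; the exact counting scheme the authors used is not spelled out.

```python
from scipy.stats import fisher_exact

def keyword_score(kw_in_topic, words_in_topic, kw_outside, words_outside):
    """Score a candidate keyword (e.g. 'NHS' for 'Health') by the p-value of a
    2x2 Fisher's exact test:

                        'Health' tweets   all other tweets
        keyword              a                  b
        other words          c                  d
    """
    table = [[kw_in_topic, kw_outside],
             [words_in_topic - kw_in_topic, words_outside - kw_outside]]
    _, p_value = fisher_exact(table, alternative="greater")
    return p_value  # smaller p-value -> stronger candidate keyword

# Candidates are sorted by this score and then manually scanned, as the slides describe.
```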

  30. Effect on individual categories

  31. Evaluation against the topic distribution of a manually annotated sample of 300 tweets

  32. Bayesian Topic Detection • Bag-of-words Bayesian classifier for each topic • Training data = output classes from the keyword-matching stage • Evaluated against a manually annotated sample of 300 tweets, achieving a 78% F1 measure

  33. Bayesian Topic Detection 2 • Basic Bayesian equation: P(topic | tweet) ∝ P(topic) × ∏_i P(word_i | topic)^(n_i), where n_i is the count of word_i in the tweet • In our case the features are word counts in the tweets. Priors were calculated by sampling and annotating 300 tweets.
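
A minimal sketch of such a bag-of-words classifier using scikit-learn (an assumption; the slides do not name an implementation), trained on the labels produced by the keyword-matching stage.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

def train_topic_classifier(keyword_labelled_tweets):
    """keyword_labelled_tweets: iterable of (tweet_text, topic_label) pairs."""
    texts, labels = zip(*keyword_labelled_tweets)
    # MultinomialNB estimates P(word | topic) from word counts; the class priors
    # from the 300-tweet annotated sample could be supplied via class_prior=...
    model = make_pipeline(CountVectorizer(), MultinomialNB())
    model.fit(list(texts), list(labels))
    return model

# model = train_topic_classifier(labelled)
# model.predict(["The NHS needs more nurses and doctors"])
```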

  34. Precision, Recall, F-Measure against manually annotated 300 tweets
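
Per-topic precision, recall and F-measure against the annotated sample can be reproduced along the lines below, again assuming scikit-learn and the classifier sketched above.

```python
from sklearn.metrics import classification_report

def evaluate(model, annotated_tweets):
    """annotated_tweets: iterable of (tweet_text, true_topic) pairs."""
    texts, y_true = zip(*annotated_tweets)
    y_pred = model.predict(list(texts))
    print(classification_report(y_true, y_pred, digits=2))  # precision / recall / F1 per topic
```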

  35. Evaluation of Topic Distribution

  36. Comparison with Polls

  37. What different parties are saying

  38. Comparison over time • A 30-day smoothing was applied, and both time series were mean-centred and divided by their standard deviation (for easier comparison).
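
A sketch of that preprocessing, assuming pandas Series indexed by date (the authors' tooling is not stated).

```python
import pandas as pd

def normalise(series: pd.Series, window: int = 30) -> pd.Series:
    """Apply 30-day smoothing, then mean-centre and divide by the standard deviation."""
    smoothed = series.rolling(window=window, min_periods=1).mean()
    return (smoothed - smoothed.mean()) / smoothed.std()

# twitter_z = normalise(daily_topic_share)
# polls_z   = normalise(monthly_poll_share.resample("D").ffill())
```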

  39. Comparison over time for different topics

  40. Not always a good match

  41. Further work • A naïve Bayesian sentiment analyser was developed, using partisan tweets as training data for positive and negative political terms • When the initial implementation was tested against SentiStrength on 300 annotated tweets, the F-measure improved from 61% to 68%

  42. Summary and Conclusions • Overall, promising results from comparisons of Twitter data with opinion polling on voting intention and topics of concern • A lot more to do! Issues of sampling, demographic weighting and finding appropriate analysis techniques

  43. Summary and Conclusions • How do we collect data? • How much data? • What are we measuring? • How are we measuring? • How do we validate one set of results? • How do we validate methodology?
