Textual and Quantitative Analysis: Towards a new, e-mediated Social Science

Khurshid Ahmad, Lee Gillam, and David Cheng

Department of Computing, University of Surrey

Outline
  • Think Tank
  • Rationality, Bounded Rationality and Sentiment
  • News Analysis and Sentiment Analysis
  • A method for identifying and extracting sentiment
  • Experiments and Evaluation
  • Conclusions and Future Work

THINK TANK

What is the connection between these pairs of terms:

HAPPY & SAD

MORE & LESS

NORTH & SOUTH

AHEAD & BEHIND

HIGHER & LOWER

LOUDER & QUIETER

IN PROFIT & IN LOSS

OPERATIONAL & BROKEN

MORE EXPENSIVE & LESS EXPENSIVE

AT UNIVERSITY & AWAY FROM UNIVERSITY

METRO, Thursday, June 28, 2005, p. 5.


THINK TANK

We rely on reviews and opinion polls of various kinds:

  • Film & TV reviews; Book reviews; Resort reviews
  • Bank reviews; Automobile reviews; White goods reviews;
  • Consumer surveys; ‘write your own’ reviews;
  • Newspaper editorials; Editors’ choice.


THINK TANK

  • We rely on the sentiment of the reviewers, editors, investment experts, and so on.
  • We do know the cost of durables, shares, holidays.
  • A reasonable price is rejected if the reviews are poor; an exorbitant price is acceptable if the reviews are good;
  • Bad reviews stick in the mind for longer than good reviews.


THINK TANK

  • We sometimes rely on the sentiment of the more vociferous members of society;
  • The vociferous may call black white, and white black;
  • The vociferous may repudiate facts and purvey fiction.


THINK TANK

An internal war may be due to bounded rationality: given certain structural conditions – emergent anarchy, economic scarcity, weakening state structures due to globalization – elites and groups make rational decisions to pursue their aims by violent means. Within the bounded context of their decision-making parameters, going to war may be entirely rational.

Jackson, Richard (2004). ‘The Social Construction of Internal War’. In Richard Jackson (Ed.), (Re)Constructing Cultures of Violence and Peace. Amsterdam/New York: Rodopi.


THINK TANK

  • We rely on the sentiment of safety expressed by our near and dear, and by the media.
  • Our near and dear may have been mugged or burgled: the falling crime rate does not alleviate the fear of crime, which leaves a reassurance gap.


THINK TANK

A new bank has just been launched: Punter Smith has passed his judgement on the bank. Which of the two columns tells us that he likes the new outfit?

Turney, Peter D. (2002). “Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews”. In Proc of the 40th Ann. Meeting of the Ass. for Comp. Linguistics (ACL). Philadelphia, July 2002, pp. 417-424. (Available at http://acl.ldc.upenn.edu/P/P02/P02-1053.pdf).


THINK TANK

How can a machine detect the positive/negative sentiment from texts?

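We look at the collocation of words like excellent & poor in a text corpus.

The pointwise mutual information (PMI) is computed between word1 & word2:

$$\mathrm{PMI}(word_1, word_2) = \log_2 \frac{p(word_1 \,\&\, word_2)}{p(word_1)\, p(word_2)}$$

The semantic orientation (SO) of a phrase is given as:

$$\mathrm{SO}(phrase) = \mathrm{PMI}(phrase, \text{excellent}) - \mathrm{PMI}(phrase, \text{poor})$$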
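A minimal sketch of how these two quantities could be computed from co-occurrence counts, following the definitions in Turney (2002): PMI is the base-2 log of the joint probability over the product of the marginals, and semantic orientation is the PMI with "excellent" minus the PMI with "poor". The function names, the count-based interface and the example counts are assumptions made for illustration; they are not taken from the slides.

```python
import math


def pmi(count_joint, count_a, count_b, total):
    """Pointwise mutual information (base 2) from raw counts.

    count_joint: number of contexts where a and b co-occur
    count_a, count_b: number of contexts containing a (resp. b)
    total: total number of contexts
    """
    if min(count_joint, count_a, count_b, total) <= 0:
        raise ValueError("all counts must be positive in this unsmoothed sketch")
    # p(a & b) / (p(a) * p(b)) simplifies to joint * total / (a * b)
    return math.log2(count_joint * total / (count_a * count_b))


def semantic_orientation(near_excellent, near_poor, count_phrase,
                         count_excellent, count_poor, total):
    """SO(phrase) = PMI(phrase, 'excellent') - PMI(phrase, 'poor')."""
    return (pmi(near_excellent, count_phrase, count_excellent, total)
            - pmi(near_poor, count_phrase, count_poor, total))


if __name__ == "__main__":
    # Hypothetical counts: the phrase co-occurs with "excellent" four
    # times as often as with "poor", so the orientation is positive.
    print(semantic_orientation(20, 5, 100, 200, 200, 10000))
```

A positive value suggests a phrase that keeps company with "excellent"; a negative value suggests the opposite. A real system would also need smoothing for phrases that never co-occur with one of the pivots.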

Turney, Peter D. (2002). “Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews”. In Proc of the 40th Ann. Meeting of the Ass. for Comp. Linguistics (ACL). Philadelphia, July 2002, pp. 417-424. (Available at http://acl.ldc.upenn.edu/P/P02/P02-1053.pdf).


THINK TANK

How can a machine detect the positive/negative sentiment from texts?

We look at the collocation of words like excellent & poor in a number of texts.

Note subjectivity: The analyst has chosen the pivotal words poor & excellent.

How well can the method be adapted to other domains?

Adaptive Information Extraction? For choosing the pivots automatically!


THINK TANK

[Figure: Japanese yen/US dollar exchange rate (decreasing solid line); US consumer price index (increasing solid line); Japanese consumer price index (increasing dashed line); 1970:1 − 2003:5, monthly observations.]

Why is it that the Japanese consumer price index is following the same trend as the US CPI?


THINK TANK

The return series: the first differences of the US $/Japanese yen exchange rate (Price_t − Price_{t−1}) between 1970 and 2003, monthly data.


High Volatility Clusters

THINK TANK

The volatility series: the four-week moving average of the square of the changes in the value of the US $/Japanese yen exchange rate (Price_t − Price_{t−1}) between 1970 and 2003.
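The two series described on these slides can be sketched in a few lines. This is an illustration under stated assumptions: the price list is invented, the function names are hypothetical, and the window of 4 simply mirrors the "four-week" wording above.

```python
def first_differences(prices):
    """Return series: Price_t - Price_{t-1} for each consecutive pair."""
    return [curr - prev for prev, curr in zip(prices, prices[1:])]


def moving_average_of_squares(changes, window=4):
    """Volatility series: moving average of the squared changes."""
    squares = [c * c for c in changes]
    return [sum(squares[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(squares))]


if __name__ == "__main__":
    prices = [100, 102, 101, 105, 104, 104]  # made-up observations
    returns = first_differences(prices)
    print(returns)
    print(moving_average_of_squares(returns))
```

Runs of large values in the second series are what the slide labels high volatility clusters.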


THINK TANK

  • Robert Engle’s contribution: volatility may vary considerably over time; large (small) changes in returns are followed by large (small) changes.

Engle, R. F. (1982). Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica, Vol. 50, pp. 987-1007.


THINK TANK

Engle and Ng have developed the concept of the news impact curve.

  • To condition at time t on the information available at t − 2 and thus consider the effect of the shock ε_{t−1} on the conditional variance h_t in isolation.
  • The conditional variance is affected by the latest information, “the news” ε_{t−1}:
    • The symmetric case: positive and negative news have the same effect.
    • The asymmetric case: a positive and an equally large negative piece of “news” do not have the same effect on the conditional variance.

Engle, R. F. and Ng, V. K. (1993). Measuring and testing the impact of news on volatility. Journal of Finance, Vol. 48, pp. 1749-1777.
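The symmetric and asymmetric cases can be illustrated with two small functions. This is a sketch under assumptions: the symmetric curve uses the quadratic form implied by a GARCH(1,1)-type model, the asymmetric curve adds a threshold term for negative shocks (a GJR-type form), and the parameter values `base`, `alpha` and `gamma` are invented for illustration rather than estimated from any data.

```python
def nic_symmetric(shock, base=1.0, alpha=0.1):
    """Symmetric news impact curve: h_t = base + alpha * shock^2."""
    return base + alpha * shock ** 2


def nic_asymmetric(shock, base=1.0, alpha=0.1, gamma=0.15):
    """Asymmetric curve: negative shocks carry the extra weight gamma."""
    extra = gamma if shock < 0 else 0.0
    return base + (alpha + extra) * shock ** 2


if __name__ == "__main__":
    for shock in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(shock, nic_symmetric(shock), nic_asymmetric(shock))
```

With these illustrative values, a shock of −2 raises the conditional variance more than a shock of +2 in the asymmetric case, while the symmetric curve treats both alike.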


THINK TANK

[Figure: news impact curves for the asymmetric case and the symmetric case.]

Engle, R. F. and Ng, V. K. (1993). Measuring and testing the impact of news on volatility. Journal of Finance, Vol. 48, pp. 1749-1777.

Rationality, Bounded Rationality and Sentiment
  • News Effects
    • I: News Announcements Matter, and Quickly;
    • II: Announcement Timing Matters
    • III: Volatility Adjusts to News Gradually
    • IV: Pure Announcement Effects are Present in Volatility
    • V: Announcement Effects are Asymmetric – Responses Vary with the Sign of the News;
    • VI: The effect on traded volume persists longer than on prices.

Andersen, T. G., Bollerslev, T., Diebold, F X., & Vega, C. (2002). Micro effects of macro announcements: Real time price discovery in foreign exchange. National Bureau of Economic Research Working Paper 8959, http://www.nber.org/papers/w8959

Rationality, Bounded Rationality and Sentiment

The following statements are based entirely on statistical analysis of quantitative data:

  • Bad news in “good times” should have an unusually large impact
  • In a purely ‘good times’ sample “bad news should have unusually large effects,”
Rationality, Bounded Rationality and Sentiment
  • On average, the effect of macroeconomic news often varies with its sign. In particular, negative surprises often have greater impact than positive surprises.

Andersen, T. G., Bollerslev, T., Diebold, F X., & Vega, C. (2002). Micro effects of macro announcements: Real time price discovery in foreign exchange. National Bureau of Economic Research Working Paper 8959, http://www.nber.org/papers/w8959

Rationality, Bounded Rationality and Sentiment
  • So, where is the news? It is not the news but the timing of the announcement: the timings are used as an information proxy.

Andersen, T. G., Bollerslev, T., Diebold, F X., & Vega, C. (2002). Micro effects of macro announcements: Real time price discovery in foreign exchange. National Bureau of Economic Research Working Paper 8959, http://www.nber.org/papers/w8959

Rationality, Bounded Rationality and Sentiment
  • Firm-level information proxies:
    • Closed-end fund discount (CEFD);
    • Turnover ratio (in NYSE, for example) (TURN);
    • Number of initial public offerings (N-IPO);
    • Average first-day returns on IPOs (R-IPO);
    • Equity share (S);
    • Dividend premium (P);
    • Age of the firm, external finance, ‘size’ (log(equity)) …
  • Each sentiment proxy is likely to include a sentiment component as well as idiosyncratic or non-sentiment-related components. Principal components analysis is typically used to isolate the common component.
  • A novel composite index built using Factor Analysis:
    • Sentiment_t = −0.358 CEFD_t + 0.402 TURN_{t−1} + 0.414 NIPO_t + 0.464 RIPO_t + 0.371 S_t − 0.431 P_{t−1}
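The composite index above is a weighted sum of the six proxies, which can be sketched as follows. The coefficients are copied from the slide; the dictionary keys, the function name and the assumption that each proxy has already been standardised (and lagged where the formula uses t−1) are illustrative choices, not part of the source.

```python
# Coefficients as printed on the slide; key names are hypothetical.
WEIGHTS = {
    "CEFD": -0.358,      # closed-end fund discount at t
    "TURN_lag": 0.402,   # turnover ratio at t-1
    "NIPO": 0.414,       # number of IPOs at t
    "RIPO": 0.464,       # average first-day IPO returns
    "S": 0.371,          # equity share at t
    "P_lag": -0.431,     # dividend premium at t-1
}


def sentiment_index(proxies):
    """Weighted sum of proxy values (assumed already standardised)."""
    missing = set(WEIGHTS) - set(proxies)
    if missing:
        raise KeyError(f"missing proxies: {sorted(missing)}")
    return sum(WEIGHTS[name] * proxies[name] for name in WEIGHTS)


if __name__ == "__main__":
    example = {name: 1.0 for name in WEIGHTS}  # made-up inputs
    print(sentiment_index(example))
```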

Baker, M., and Wurgler, J. (2004). "Investor Sentiment and the Cross-Section of Stock Returns," NBER Working Papers 10449, Cambridge, Mass National Bureau of Economic Research, Inc.

Rationality, Bounded Rationality and Sentiment
  • So, where is the news and financial data? There is plenty of it but in a noisy state.
  • Today’s news and figures may contradict yesterday’s or, worse still, reinforce false hopes and prejudices.
  • The financial news and data are truly organic data – not manufactured in a laboratory
The Surrey Society Grids Project

A 24-node data and compute cluster (64 CPUs) interfaced to a ‘real world’ data stream (Reuters news and financial time series feed) for capturing, analysing and fusing quantitative and ‘qualitative’ data.

A small but well-formed grid – for creating a data nursery

Reuters Feed: 2 dedicated data lines, PC and Sun for feed management and associated networking

Surrey Society Grids Architecture

[Figure: Surrey Society Grids architecture. Components: Text and Time Series Service (streaming textual data and streaming numeric data), Main Cluster, and GRID Cluster (24 slaves) on the Surrey Grid. Actions shown, numbered 1 to 4 in the original: send service request; distribute tasks; receive results; notify user about results.]
  • Given an allocated task, the corresponding data is retrieved from the data providers by the slave machines.
  • The main cluster monitors the slave machines until they have completed their tasks, and subsequently combines the interim results.
  • The final result is sent back to the client machine.
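The distribute, monitor and combine cycle described above can be sketched on a single machine with a thread pool standing in for the slave nodes. This is an analogy under assumptions, not the project's grid software: the word-count task, the function names and the worker count are all hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(text):
    """Worker task: tokenise one text and count its words."""
    return Counter(text.lower().split())


def distribute_and_combine(texts, workers=4):
    """Main-cluster role: hand out tasks, wait for them, merge results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() blocks until every worker has returned its interim result
        interim = list(pool.map(count_words, texts))
    total = Counter()
    for partial in interim:
        total.update(partial)
    return total


if __name__ == "__main__":
    texts = ["shares rose", "shares fell", "profit rose"]  # made-up texts
    print(distribute_and_combine(texts))
```

The combined counter is the "final result" that would be sent back to the client machine.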
Surrey Society Grids: Streaming Data

Streaming economic/political news: Reuters; Yahoo; Bloomberg; BBC; Al Jazeera

Surrey Society Grid: Performance
  • Increasing the throughput
    • We have created a 24-node grid infrastructure, which can provide access to up to 64 processors simultaneously
    • Processing the (complete) RCV1 corpus: 181 million words in 806,791 texts
Surrey Society Grid: Performance

Automatic extraction and annotation of sentiment-bearing words in a 1,000,000-word text corpus (four days’ output from the Reuters news feed), using automatically extracted key words and an automatically extracted local grammar for pattern identification.

Surrey Society Grid: Algorithms and Methods
  • We have developed a system for visualising and correlating the sentiment and instrument time series, both as text (and numbers) and graphically.
Surrey Society Grid: Algorithms and Methods
  • Interface the grid to local news media (e.g. Bradford Argus & Burnley Express) and local data repositories – crime statistics (crime surveys and police data), ethnicity compliance data, housing queues, field data
Surrey Society Grid: Social Science Data?

Language and text are constitutive (and not merely representational): but ‘society is not reducible to language and linguistic analysis (Hodgson 2000:62). Discourses are broader than language, being constituted not just in texts, but also in definite institutional and organizational practices’ (Jackson 2004). But text is all we have after the event, the interview, the survey.

Surrey Society Grid: Social Science Data?
  • There is no visible technique in social science research methodology that can improve the researcher’s productivity in collecting and analysing large volumes of speech and text.
  • Social scientists survey, and occasionally interview, interesting individuals in various social groups, then analyse the survey forms and quantify.
  • So what about the data collected in the field? It is buried in tombs, never to be taken out again.
  • Most text is hand-coded, if it is coded at all, by the social science researcher, and then a proxy, the interpretation of the codes, is presented as objective analysis.
Surrey Society Grid: A Case Study
  • We present a method for systematically identifying sentiment-bearing phrases in large volumes of streaming texts: a local grammar comprising templates to extract the phrases with a minimal number of false positives.
  • The sentiments are aligned with quantitative (time-varying) information, and the results are co-integrated and tested for Granger causality.
  • The grammar itself is constructed automatically from a corpus of domain-specific texts.
Surrey Society Grid: A Case Study
  • Of all the contested boundaries that define the discipline of sociology, none is more crucial than the divide between sociology and economics […] Talcott Parsons, for all [his] synthesizing ambitions, solidified the divide. “Basically,” […] “Parsons made a pact ... you, economists, study value; we, the sociologists, will study values.”
  • If the financial markets are the core of many high-modern economies, so at their core is arbitrage: the exploitation of discrepancies in the prices of identical or similar assets.
  • Arbitrage is pivotal to the economic theory of financial markets. It allows markets to be posited as efficient without all individual investors having to be assumed to be economically rational.

MacKenzie, Donald. 2000b. “Long-Term Capital Management: a Sociological Essay.” In Herbert Kalthoff, Richard Rottenburg and Hans-Jürgen Wagener (Eds.), Ökonomie und Gesellschaft. Marburg: Metropolis. pp. 277-287.

Rationality, Bounded Rationality and Sentiment
  • A financial economist can analyse quantitative data using a large body of methods and techniques in statistical time series analysis on “fundamental data”, related, for example, to fixed assets of an enterprise, and on “technical data”, for example, share price movement;
  • The economist can study the behaviour of a financial instrument, for example individual shares or currencies, or aggregated indices associated with stock exchanges, by looking at the changes in the value of the instrument at different time scales – ranging from minutes to decades;
  • Financial investors/traders are trying to discover the market sentiment, looking for consensus in expectations, rising prices on falling volumes, and information/assistance from back-office analysts;
  • The efficient market hypothesis suggests that quirks caused by sentiments can be rectified by the supposed inherent rationality of the majority of the players in the market
Rationality, Bounded Rationality and Sentiment
  • Recent developments in financial economics, signified by the emergence of derivatives and arbitrage, show the triumph of rational reasoning: such instruments/strategies were created on the basis of mathematical models (Black and Scholes 1972), and the trading can be monitored using the selfsame models (Miller 1990);
  • The assumption of overarching rational behaviour has been reviewed by Herbert Simon (1978/1992) and Daniel Kahneman (2003), and arguments have been presented in favour of a model of bounded rationality, where the actors in a given social situation prefer to ignore facts and trust their own version of reality, and the efficient market mechanisms fail to operate.
News Analysis and Sentiment Analysis
  • Qualitative research methods are being used in financial economics, and in sociological studies of financial markets, for systematically studying the hopes and fears of the traders, investors, and regulators in the analysis of the behaviour of the markets.
  • Since 2000, the analysis of news wire has become selective and targeted.
  • Some researchers choose news related to economic and financial topics
    • news about employment
    • distinguish between scheduled and non-scheduled news announcements;
News Analysis and Sentiment Analysis
  • Some pre-select keywords that indicate change in the value of a financial instrument – including metaphorical terms like above, below, up and down – and use them to ‘represent’ positive/negative news stories.
  • Some use the frequency of collocational patterns for assigning a ‘feel-good/bad’ score to the story
    • ‘Good’ news stories appear to comprise collocates like revenues rose, share rose;
    • ‘Bad’ news stories contain profit warning, poor expectation;
    • ‘Neutral’ stories contain collocates such as announces product, alliance made;
  • The ‘sentiment’ of the story is then correlated with that of a financial instrument cited in the stories and inferences made.

DeGennaro, R., and R. Shrieves (1997). ‘Public information releases, private information arrival and volatility in the foreign exchange market’. Journal of Empirical Finance, Vol. 4, pp. 295–315;

Koppel, M and Shtrimberg, I. (2004). ‘Good News or Bad News? Let the Market Decide’. In AAAI Spring Symposium on Exploring Attitude and Affect in Text. Palo Alto: AAAI Press. pp. 86-88;
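The collocation-frequency idea described on this slide can be sketched as a simple bigram count. The collocate sets reuse the examples quoted above; the scoring rule (good count minus bad count), the tokeniser and the function name are assumptions for illustration and do not reproduce any cited author's method.

```python
import re

# Example collocates taken from the slide; a real system would learn these.
GOOD = {("revenues", "rose"), ("share", "rose")}
BAD = {("profit", "warning"), ("poor", "expectation")}


def story_score(text):
    """Feel-good/bad score: good collocates minus bad collocates."""
    tokens = re.findall(r"[a-z]+", text.lower())
    bigrams = list(zip(tokens, tokens[1:]))
    good = sum(1 for pair in bigrams if pair in GOOD)
    bad = sum(1 for pair in bigrams if pair in BAD)
    return good - bad


if __name__ == "__main__":
    print(story_score("Revenues rose and share rose."))
    print(story_score("The firm issued a profit warning."))
```

A score series built this way per story or per hour could then be set against the price series of the instrument the stories cite.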

A method for identifying and extracting sentiment
  • No proxies – but the real data
  • We adopt a text-driven and bottom-up method: starting from a collection of texts in a specialist domain, together with a representative general language corpus,
  • A five-step algorithm for identifying discourse patterns with more or less unique meanings, without any overt access to an external knowledge base
An algorithm for identifying and extracting sentiment
  • Select training corpora: a randomly sampled special language corpus and a general language corpus.
  • Extract key words;
  • Extract key collocates;
  • Extract local grammar using collocation analysis and relevance feedback;
  • Assert the grammar as a finite state automaton.
Experiments and Evaluation of sentiment analysis method

I. Select training corpora

Training-Corpus

  • The British National Corpus, comprising 100 million tokens distributed over 4124 texts (Aston and Burnard 1998);
  • Reuters Corpus Volume 1 (RCV1), comprising news texts produced in 1996-1997 and containing 181 million words distributed over 806,791 texts.
Experiments and Evaluation of sentiment analysis method
  • II. Extract key words
    • The frequencies of individual words in the RCV1 were computed using System Quirk;
    • For describing how our method works we will use a randomly selected component of the corpus – the output of February 1997, henceforth referred to as the RCV1-Feb97 corpus;
    • The RCV1-Feb97 corpus contains 14 million words distributed over 63,364 texts.
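The slides do not state which statistic turns word frequencies into key words. One plausible approach, shown here purely as an assumption, is to compare a word's relative frequency in the special-language corpus with its relative frequency in the general-language corpus and keep the words that are markedly over-represented. The function name, the add-one smoothing and the threshold are hypothetical.

```python
from collections import Counter


def keyword_candidates(special_tokens, general_tokens, min_ratio=2.0):
    """Rank words that are over-represented in the special corpus."""
    special = Counter(special_tokens)
    general = Counter(general_tokens)
    n_special = sum(special.values())
    n_general = sum(general.values())
    scores = {}
    for word, freq in special.items():
        rel_special = freq / n_special
        # Add-one smoothing so words unseen in the general corpus
        # do not cause a division by zero.
        rel_general = (general[word] + 1) / (n_general + 1)
        ratio = rel_special / rel_general
        if ratio >= min_ratio:
            scores[word] = ratio
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    special = "shares rose shares fell the market".split()
    general = "the cat sat on the mat the end".split()
    print(keyword_candidates(special, general))
```

In the setting of these slides, the special corpus would be RCV1-Feb97 and the general corpus the British National Corpus.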
Experiments and Evaluation of sentiment analysis method

IV. Extract local grammar using collocation and relevance feedback

Experiments and Evaluation of sentiment analysis method

V. Assert the grammar as a finite state automaton

  • The (re-)collocation patterns can then be asserted as a finite state automaton for each of the movement verbs and spatial preposition metaphors
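As an illustration of what asserting a pattern as a finite state automaton might look like, the sketch below hand-codes one small automaton for phrases such as "rose by 5 percent". The vocabulary sets, the states and the pattern itself are hypothetical; the automatically extracted grammar from the slides is not reproduced here.

```python
import re

# Hypothetical vocabulary for one local-grammar pattern.
MOVEMENT_VERBS = {"rose", "fell", "climbed", "dropped"}
PREPOSITIONS = {"by", "to"}
UNITS = {"percent", "points"}
NUMBER = re.compile(r"^\d+(\.\d+)?$")


def matches_pattern(tokens):
    """Accept: movement verb, optional preposition, number, unit.

    States: 0 = start, 1 = verb seen, 2 = preposition seen,
    3 = number seen; reading a unit in state 3 accepts.
    """
    state = 0
    for tok in tokens:
        if state == 0 and tok in MOVEMENT_VERBS:
            state = 1
        elif state == 1 and tok in PREPOSITIONS:
            state = 2
        elif state in (1, 2) and NUMBER.match(tok):
            state = 3
        elif state == 3 and tok in UNITS:
            return True
        else:
            # Restart; a movement verb can begin a new match attempt.
            state = 1 if tok in MOVEMENT_VERBS else 0
    return False


if __name__ == "__main__":
    print(matches_pattern("shares rose by 5 percent".split()))
    print(matches_pattern("shares rose sharply".split()))
```

Sentences that reach the accepting state are the ones a pattern of this kind would mark as sentiment-bearing.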
Experiments and Evaluation of sentiment analysis method
  • The local grammar is used to extract sentences that contain sentiment-bearing phrases and can automatically annotate the phrases.
  • The graph shows the filtering power of the local grammar patterns: they identify between 1,000 and 10,000 sentiment words hourly in a corpus of between 10,000 and 100,000 tokens per hour, to find between 10 and 100 ‘true’ sentiment-bearing sentences.
Experiments and Evaluation of sentiment analysis method

Changes in the total number of positive/negative words together with those that are used in the local grammars (filtered positive / negative words) and total number of words.

Experiments and Evaluation of sentiment analysis method
  • Increasing the throughput
    • We have created a 24-node grid infrastructure, which can provide access to up to 64 processors simultaneously
    • Processing the (complete) RCV1 corpus (181 million words in 806,791 texts) on a single machine (a Dell PowerEdge 2650) takes 53300 seconds
    • Using 16 processors we gain a throughput increase by a factor of 15 (3572 seconds);
    • Using 64 processors, the time is halved again (1683 seconds).
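The throughput figures above can be checked with a short speedup and efficiency calculation. The timings are the ones quoted on the slide; the function names are hypothetical, and efficiency here simply means speedup divided by the number of processors.

```python
def speedup(serial_seconds, parallel_seconds):
    """How many times faster the parallel run is than the serial run."""
    return serial_seconds / parallel_seconds


def efficiency(serial_seconds, parallel_seconds, processors):
    """Speedup per processor; 1.0 would be perfect linear scaling."""
    return speedup(serial_seconds, parallel_seconds) / processors


if __name__ == "__main__":
    serial = 53300  # seconds on a single machine, from the slide
    for processors, seconds in ((16, 3572), (64, 1683)):
        print(processors,
              round(speedup(serial, seconds), 1),
              round(efficiency(serial, seconds, processors), 2))
```

The 16-processor run is close to linear scaling, while the move to 64 processors roughly halves the time again rather than quartering it, so per-processor efficiency drops.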
Conclusions and Future Work
  • Though we have devised programs that can learn unambiguous patterns of use of positive or negative sentiment, a sentence is always used in the context of other sentences, and an inference made on the basis of one sentence only may change once that context is taken into account;
  • One can argue that a new text is a response to some or all of the existing texts, and in that sense each text is contextualised within a network of other texts - even if all the existing texts unambiguously expressed a positive sentiment, a new text with strong negative sentiment may invalidate all of the positive sentiment.
Conclusions and Future Work
  • The range of quantitative analysis techniques includes wavelet analysis (Ahmad et al 2004), fuzzy-logic knowledge bases (Popoola et al 2004), and case-based reasoning;
  • These techniques may be used to create a confidence index, or sentiment index;
  • These techniques can be extended to new areas such as
    • the reassurance gap in policing
    • totalising war discourse that leads to ethnic/racial conflicts
Conclusions and Future Work
  • Quantitative analysis methods developed in the Surrey Society Grids project can be used in the analysis of on-line or accessible data such as crime statistics, for the sociology of crime, and labour force surveys based on race/ethnicity, for anthropology;
  • The fusion of the results of the textual and quantitative analysis can, in turn, be used to automatically produce a crime confidence index, for measuring the fear of crime, and a conflict index, for measuring ethnic/racial tension in a community.