Predicting Market Movements: From Breaking News to Emerging Social Media. Dr. Hsinchun Chen Director, Artificial Intelligence Lab University of Arizona [email protected] Acknowledgements: NSF CRI; NSF EXP-LA; DOD DTRA, CTFP, NPS; (ARFL WMD, CIA, FBI).

Predicting Market Movements: From Breaking News to Emerging Social Media

Dr. Hsinchun Chen

Director, Artificial Intelligence Lab

University of Arizona

[email protected]


Predicting Markets

  • Markets: international markets, emerging markets, import/export markets, financial market, stock market, commodity market, retail market

  • Economics (macro), international relations (trade, geopolitics), finance (international/banking/stock), accounting (market return), marketing (sales/retailing)

  • US (NSF SBE, social behavioral economics; governments, think tanks), Europe/Asia → Business school research is not science (cannot be funded by NSF in the US)!

  • Economics, finance, accounting, political science, social science, marketing, computer science (small, no funding in US!), MIS (business intelligence)

  • Geopolitical/econ/finance/accounting models/theories, market metrics/parameters, analytical techniques, results interpretations, predicting markets

  • EMH (efficient market hypothesis), RWT (random walk theory), CAPM (capital asset pricing model), quant/algorithmic trading
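
For reference, the CAPM named in the last bullet is usually written in the standard textbook form below. The notation is an assumption, not taken from the slides: R_i is the asset return, R_m the market return, R_f the risk-free rate.

```latex
E[R_i] = R_f + \beta_i \,\bigl(E[R_m] - R_f\bigr),
\qquad
\beta_i = \frac{\operatorname{Cov}(R_i, R_m)}{\operatorname{Var}(R_m)}
```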

Research Opportunities

  • Sophisticated econ/finance/accounting/marketing models/theories, established analytical techniques and metrics (numeric), abundant structured databases (financial metrics, economic indicators, stock quotes)

  • New, diverse unstructured (text) web-enabled business data sources, e.g., 10K/10Q SEC reports, mass media news, local news, Internet news, financial blogs, investor forums, tweets…

  • Topic extraction, named entity recognition, sentiment/affect analysis, multilingual language models, social network analysis, statistical machine learning, temporal data/text mining, time-series analysis…

Nerds on Wall Street

“Future technological stars…(1) Advanced electronic market tools; (2) Understanding both quantitative and qualitative information…”

“The Text Frontier, Collective Intelligence, Social Media, and Market Monitors”

“Stocks are stories, bonds are mathematics.”

David Leinweber, 2009




Business Intelligence & Analytics

  • $3B BI revenue in 2009 (Gartner, 2006)

  • The Data Deluge (The Economist, March 2010); internet traffic of 667 exabytes by 2013 (Cisco); total amount of information in 2010: 1.2 zettabytes (KB-MB-GB-TB-PB-EB-ZB-YB)

  • $9.4B BI software M&A spending in 2010 and $14.1B by 2014 (Forrester)

  • IBM spent $14B on BI in five years; $9B BI revenue in 2010 (USA Today, November 2010); 24 acquisitions, 10,000 BI software developers, 8,000 BI consultants, 200 BI mathematicians → acquired i2/COPLINK in 2011

Business Intelligence & Analytics

  • BI: “skills, technologies, applications, and practices used to help an enterprise better understand its business and market.”

  • Technologies: data warehousing; Extraction, Transformation, and Load (ETL); Business Performance Management (BPM); visual dashboards; and advanced knowledge discovery using data and text mining

  • BI 2.0: web intelligence, web analytics, web 2.0, social media analytics, opinion mining; cloud computing and web services; real-time monitoring and mining; enterprise performances (marketing/accounting/finance/healthcare)

AZ BIZ INTEL

  • Mass media, social media contents

  • Text & social media analytics techniques

  • Finance/accounting/marketing models (Tetlock/Columbia, Antweiler/UBC, Das/Santa Clara) → NYU (Dhar), Arizona (Dhaliwal, Kelly, Jiang, Lusch, Yong), National Taiwan U (Li, Hong, Lu)

  • Bag of words, named entities, proper nouns, topics (1-, 2-, 3-grams)

  • Sentiment/valence, lexicons, machine learning, stakeholder analysis, EFLS analysis

  • Time series models, spike detection, decaying function, trading windows, targeted sentiment

  • Econometrics/regression models (R-squared, p-value), 10-fold cross-validation (F-measure, accuracy), simulated trading (cost, frequency, exit)
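
The n-gram topic features in the list above can be sketched as follows. This is a minimal illustration under assumptions, not the lab's implementation: tokenization is a plain lowercase whitespace split, the function names `ngrams` and `ngram_counts` are hypothetical, and the sample sentence is made up.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(text, max_n=3):
    """Count all 1- to max_n-grams in lowercased, whitespace-split text."""
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    return counts


# Made-up headline used only to exercise the functions.
counts = ngram_counts("profit warning hits retail stocks as profit warning spreads")
print(counts[("profit",)])
print(counts[("profit", "warning")])
print(counts[("profit", "warning", "hits")])
```

A real pipeline would add proper tokenization, stop-word handling, and named-entity detection before counting.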


  • Evolution of online WOM through new-product lifecycle

    • WOM communication starts early in preproduction, becomes highly active before movie release, then diminishes gradually

    • Valence has a clear decreasing trend over time, indicating that WOM becomes more negative after movie release

    • Subjectivity, number of sentences and number of valence words stay stable over time

Literature Review: Stock Performance Prediction

  • Theoretical perspectives on stock behavior

    • Efficient market hypothesis (Fama 1964)

      • Price of a stock reflects all available information

      • Market reacts instantaneously; impossible to outperform

    • Random walk theory (Malkiel 1973)

      • Price of a stock varies randomly over time

      • Future prediction, outperforming the market is impossible

    • Pessimistic assessments of the predictability of stock behavior refuted through empirical studies

      • Lo and MacKinlay 1988; Jaffe et al 1989; Pesaran and Timmermann 1995
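
One way to see how the random-walk claim can be examined empirically is a variance ratio: under a random walk, the variance of q-period returns is about q times the one-period variance, so the ratio is close to 1. The sketch below is a simplified statistic in that spirit, not the bias-corrected test from the cited studies; the simulated series and all parameter values are assumptions.

```python
import random
import statistics


def variance_ratio(returns, q):
    """Variance of overlapping q-period returns over q times the 1-period variance."""
    one_period_var = statistics.pvariance(returns)
    q_sums = [sum(returns[i:i + q]) for i in range(len(returns) - q + 1)]
    return statistics.pvariance(q_sums) / (q * one_period_var)


rng = random.Random(0)

# Independent returns: consistent with a random walk in prices.
iid_returns = [rng.gauss(0.0, 0.01) for _ in range(5000)]

# Positively autocorrelated returns (AR(1) with coefficient 0.5): predictable.
momentum_returns = []
previous = 0.0
for _ in range(5000):
    previous = 0.5 * previous + rng.gauss(0.0, 0.01)
    momentum_returns.append(previous)

vr_iid = variance_ratio(iid_returns, q=5)
vr_momentum = variance_ratio(momentum_returns, q=5)
print(f"iid: {vr_iid:.2f}, autocorrelated: {vr_momentum:.2f}")
```

A ratio well above 1 points to positive serial correlation, and one well below 1 to mean reversion.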

Literature Review: Stock Performance Prediction

  • Predominant approaches to stock prediction

    • Fundamentalists utilize fundamental and financial measures of economy, industry, and firm

      • Economy and sector indicators, financial ratios of the firm

        • Fama-French three factors model (Fama and French 1993)

          • Market return, market capitalization, book to market ratio

        • Currency exchange rates, interest rates, dividends

    • Technicians utilize historical time-series information of the stock and market behavior

      • Historical price, volatility, trading volume

    • Various machine learning models applied

      • Regression, ANN, ARIMA, support vector machines
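
The three-factor regression mentioned above can be sketched as an ordinary least-squares fit of a stock's excess return on market, size (SMB), and book-to-market (HML) factors. Everything below is illustrative: the factor series are simulated, the coefficient values are made up, and the helpers `solve_linear` and `ols` are hypothetical names written with the standard library only.

```python
import random


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear(xtx, xty)


rng = random.Random(1)
true_coefs = [0.001, 1.1, 0.4, -0.3]  # alpha, market, SMB, HML (made-up values)
rows, excess_returns = [], []
for _ in range(500):
    mkt, smb, hml = (rng.gauss(0, 0.01) for _ in range(3))
    x = [1.0, mkt, smb, hml]  # leading 1.0 is the intercept column
    rows.append(x)
    noise = rng.gauss(0, 0.002)
    excess_returns.append(sum(c * v for c, v in zip(true_coefs, x)) + noise)

alpha, beta_mkt, beta_smb, beta_hml = ols(rows, excess_returns)
print(f"alpha={alpha:.4f} mkt={beta_mkt:.2f} smb={beta_smb:.2f} hml={beta_hml:.2f}")
```

With real data the factor columns would come from a published factor file rather than a random generator.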

Literature Review: Stock Performance Prediction

  • In addition to financial and stock variables, researchers have incorporated firm-related news article measures

    • Developed trend-based language models for news articles

      • Lavrenko et al. 2000

    • Categorized press releases (good, bad, neutral)

      • Mittermayer 2004

    • Examined various textual representations of news articles

      • Schumaker and Chen, 2009a; 2009b

  • But few have incorporated firm-related web forums

    • Thomas and Sycara (2000) utilize text classifications of discussions on Raging Bull to inform stock trading strategies

Literature Review: Firm-Related Web Forums and Stock

  • Studies relating web forums and stock behavior

    • Examined firm-related web forums on major web portals

  • Early studies focused on activity, without content analysis

    • Supported market efficiency; only concurrent relationships identified

      • Wysocki 1998; Tumarkin and Whitelaw 2001

    • Subsequently challenged; forum activity predicted stock behavior

      • Antweiler and Frank 2002; 2004; Das and Chen 2007

  • Analysis advanced to measure opinions in discussions

    • ‘Bullishness’ classifiers to distinguish investment positions

      • Antweiler and Frank 2004; Das and Chen 2007

    • Classified buy, hold, or sell positions with 60–70% accuracy

  • Identified predictive relationships between forum discussion sentiment and subsequent stock returns, volatility, trading volume

  • Shortcomings

    • Retrospective analyses, shareholder perspective of major forums
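
Once messages are classified as buy or sell, a forum-level "bullishness" score can be aggregated per day. The sketch below uses one commonly cited formulation (a smoothed log ratio of buy to sell counts, plus an agreement index); it is given from memory as an assumption and should be checked against the cited papers before use. The function names are hypothetical.

```python
import math


def bullishness(n_buy, n_sell):
    """Log ratio of buy to sell message counts, with add-one smoothing."""
    return math.log((1 + n_buy) / (1 + n_sell))


def agreement(n_buy, n_sell):
    """1.0 when all classified messages agree, 0.0 when evenly split."""
    total = n_buy + n_sell
    if total == 0:
        raise ValueError("agreement is undefined without classified messages")
    share = (n_buy - n_sell) / total
    return 1 - math.sqrt(1 - share ** 2)


# Made-up daily counts of classified messages.
print(bullishness(12, 3))
print(agreement(12, 3))
```

Hold messages are ignored here; how to treat them is a modeling choice the slides do not specify.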

    AZ FinText: numbers + text

    • Techniques: bag of words, named entities, proper nouns, past stock prices + SVR

    • Testbed: S&P 500, 5 weeks (Oct-Nov 2005), 2,809 news articles, 10M stock quotes

    • GICS industry classification

    • Evaluation: return vs. quant funds; 20-minute prediction

    AZ FinText in the news

    Thursday, June 10, 2010

    AI That Picks Stocks Better Than the Pros

    A computer science professor uses textual analysis of articles to beat the market.

    WSJ Technology News and Insights

    June 21, 2010, 1:45 PM ET

    Using Artificial Intelligence to Digest News, Trade Stocks

    AZ STOCK TRACKER I: mass, social media, topic, volume, sentiment

    [Figure: system diagram. Components: data collection (web forums); topic extraction (mutual information phrase extractor); sentiment identification (sentiment grader); traffic dynamics; topic correlation and evolution; sentiment correlation and evolution; active topics and sentiments; market prediction. Units of analysis: topic, sentiment, author, message.]

    User-Generated Content (UGC): Conversations of 30,000 Wal-Mart Constituents and 500,000 Responses

    Post Dynamics

    Sentiment Trend

    Market Modeling

    • Correlation

      • Sentiment expressed in the forum contemporaneously correlates significantly with stock return

      • Disagreement, volume, and length expressed in the forum also hold significant correlations with volatility and trading volume

    Market Predictive Results (cont’d)

    • Predictive regression (t-1)

    • The significant measures of forum discussions identified in contemporaneous regressions maintain their significance in the predictive regression models

    • Additionally, sentiment expressed in the web forum holds a significant relationship with the trading volume on the following day

      • Positive sentiment reduces trading volume; negative sentiment induces trading activity
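
The predictive (t-1) setup above pairs a forum measure from day t-1 with a market measure on day t. The sketch below shows only that pairing and a plain Pearson correlation on made-up numbers, constructed so that higher sentiment is followed by lower volume; it is not the regression specification from the study, and the names `pearson` and `lagged_pairs` are hypothetical.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def lagged_pairs(predictor, outcome, lag=1):
    """Align predictor[t - lag] with outcome[t]."""
    if lag < 1:
        raise ValueError("lag must be at least 1")
    return predictor[:-lag], outcome[lag:]


# Toy daily series (assumed values, for illustration only).
sentiment = [0.2, -0.5, 0.1, 0.6, -0.3, 0.4]
volume = [100, 96, 110, 98, 88, 106]

prev_sentiment, next_volume = lagged_pairs(sentiment, volume, lag=1)
print(pearson(prev_sentiment, next_volume))
```

A full predictive model would also control for the baseline fundamental and technical variables.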

    Experimental Design: Description of Prediction Models

    • Baseline Model – Baseline-FF

      • Fundamental variables: Fama-French model

    • Baseline Model – Baseline-Tech

      • Technical variables: Lagged stock returns, volatility, trading volume, day-of-week dummies

    • Baseline Model – Baseline-Comp

      • Comprehensive: all fundamental and technical variables

    where t = days (t = 1, 2, …, n); d = day of the week (d = 1, …, 4)

    Experimental Design: Description of Prediction Models

    • Forum models

      • Comprehensive baseline variables plus forum-level measures

    where t = days (t = 1, 2, …, n); d = day of the week (d = 1, …, 4); s = stakeholder clusters (s = 1, 2, …, c)

    Experimental Design: Description of Prediction Models

    • Stakeholder models

      • Comprehensive baseline variables plus stakeholder group-level forum measures

    where t = days (t = 1, 2, …, n); d = day of the week (d = 1, …, 4); s = stakeholder clusters (s = 1, 2, …, c); index k = (((c - 1) * 6) + 15)

    Experimental Design: Social Media Data

    • A 17-month period was utilized for analysis and experimentation

      • November 1, 2005 to March 31, 2007

      • First five months were utilized to calibrate the initial stock return prediction models

        • November 1, 2005 – March 31, 2006

        • Calibrated models applied for prediction during each trading day in the next month

      • Each subsequent month, new models were calibrated using five previous months of time-series variables, for stock return prediction during the next month of trading

      • In total, stock return prediction was performed daily for one year (250 trading days)

        • April 1, 2006 – March 31, 2007
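
The rolling calibration scheme described above (five months to calibrate, the following month to predict, repeated monthly) can be laid out as a schedule. This is a sketch of the window bookkeeping only; the function names are hypothetical and months are represented as (year, month) pairs.

```python
def add_months(year, month, offset):
    """Shift a (year, month) pair by `offset` months."""
    index = year * 12 + (month - 1) + offset
    return index // 12, index % 12 + 1


def rolling_schedule(first_test, n_test_months, calibration_months=5):
    """List (calibration_start, calibration_end, test_month) tuples."""
    schedule = []
    for i in range(n_test_months):
        test = add_months(*first_test, i)
        start = add_months(*test, -calibration_months)
        end = add_months(*test, -1)
        schedule.append((start, end, test))
    return schedule


# Twelve prediction months, April 2006 through March 2007.
schedule = rolling_schedule(first_test=(2006, 4), n_test_months=12)
for start, end, test in schedule:
    print(
        f"calibrate {start[0]}-{start[1]:02d}..{end[0]}-{end[1]:02d}"
        f" -> predict {test[0]}-{test[1]:02d}"
    )
```

The first window calibrates on November 2005 through March 2006 and predicts April 2006; the last calibrates on October 2006 through February 2007 and predicts March 2007.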

    Results and Discussion

    • Hypothesis testing results

    Results and Discussion

    • Wal-Mart stock return prediction model results

      • Baseline models using fundamental and technical variables

        • Results across 250 trading days forecasted

      • Baselines for simulated trading (initial investment of $10,000):

        • Holding Wal-Mart stock for the year results in $10,096

        • Holding S&P 500 for the year results in $11,012
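
The two buy-and-hold baselines above translate into simple holding-period returns. The sketch below computes them and adds a toy simulated-trading rule for comparison; the rule (hold the stock only on days with a positive predicted return, otherwise stay in cash, no transaction costs) is an assumption for illustration and is not stated in the slides.

```python
def holding_return(initial, final):
    """Simple return over the holding period."""
    return (final - initial) / initial


def simulate(predicted, realized, initial=10_000.0):
    """Hold the stock on days with a positive predicted return, else stay in cash."""
    value = initial
    for predicted_return, realized_return in zip(predicted, realized):
        if predicted_return > 0:
            value *= 1 + realized_return
    return value


walmart = holding_return(10_000, 10_096)
sp500 = holding_return(10_000, 11_012)
print(f"Wal-Mart buy-and-hold: {walmart:.2%}")
print(f"S&P 500 buy-and-hold: {sp500:.2%}")

# Two made-up days: invested on the first, in cash on the second.
print(simulate([0.01, -0.01], [0.02, -0.05]))
```

Any model-driven strategy would need to beat both baselines after trading costs to be of practical interest.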

    Results and Discussion

    • Wal-Mart stock return prediction model results

      • Incorporating the Wakeup Wal-Mart web forum

        • Results across 250 trading days forecasted

    Pair-wise t-test; improvement over best baseline model at * p < 0.10 ** p < 0.05
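
A paired t-test of the kind named in the note compares two models on the same evaluation periods by testing whether the mean of their per-period differences is zero. The sketch below computes the t statistic only; the per-period scores are made up, and the p-value lookup against a t distribution is left out.

```python
import math
import statistics


def paired_t(a, b):
    """t statistic and degrees of freedom for the mean paired difference a - b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1


# Hypothetical per-period accuracy of a forum model and the best baseline.
forum_model = [0.52, 0.55, 0.49, 0.58, 0.54]
best_baseline = [0.50, 0.51, 0.50, 0.53, 0.52]

t_stat, dof = paired_t(forum_model, best_baseline)
print(f"t = {t_stat:.2f} with {dof} degrees of freedom")
```

The statistic is then compared with the critical value for the chosen significance level (0.10 or 0.05 in the note).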

    Introduction

    • Forward-looking statements (FLS) refer to

      • Projections, forecasts, or other predictive statements

      • Made by firm management

      • Section 21E of the Securities Exchange Act (1934)

    • Extended forward-looking statements (EFLS)

      • Statements that may have implications for a firm's future development

      • Similar to FLS, but broader

      • Including information from information intermediaries (e.g., newspapers, newswires) and individuals (e.g., blogs)

    Recognizing EFLS

    • EFLS: Extends FLS to include statements about firm’s future performance from other sources such as financial press, analysts’ reports, and individuals

    AZ STOCK TRACKER III: sentiment


    Summary of Annotation Results

    • High kappa values (>0.7) on risks support the empirical validity of the coding scheme

    • Agreement upper bound

      • 89% to 91% (for ALL, POS, and NEG)

    • Reference Standard Dataset:

      • 2,539 sentences in total

    Note: 95% CI from 1,000 bootstrap resamples
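
Inter-annotator agreement with a bootstrap confidence interval can be sketched as below. The slides do not say which kappa variant was used, so Cohen's kappa for two annotators is an assumption here, as are the label names, the toy annotations, and the percentile method for the interval.

```python
import random
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)


def bootstrap_ci(labels_a, labels_b, n_resamples=1000, seed=0):
    """Percentile 95% interval for kappa from resampled sentence indices."""
    rng = random.Random(seed)
    n = len(labels_a)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(
            cohen_kappa([labels_a[i] for i in idx], [labels_b[i] for i in idx])
        )
    stats.sort()
    return stats[int(0.025 * n_resamples)], stats[int(0.975 * n_resamples) - 1]


# Toy annotations for 100 sentences (made up; 91 of 100 agree).
annotator_a = ["POS"] * 40 + ["NEG"] * 40 + ["NEU"] * 20
annotator_b = (
    ["POS"] * 36 + ["NEG"] * 4
    + ["NEG"] * 37 + ["NEU"] * 3
    + ["NEU"] * 18 + ["POS"] * 2
)

kappa = cohen_kappa(annotator_a, annotator_b)
low, high = bootstrap_ci(annotator_a, annotator_b)
print(f"kappa = {kappa:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

Resampling whole sentences keeps each pair of labels together, which is what makes the interval meaningful for an agreement statistic.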

    EFLS Impacts: Hypotheses Development

    • Theoretical framework (Easley and O’Hara, 2004)

      • There are signals for stock k (notation lost in extraction)

      • A parameter (symbol lost in extraction) gives the relative amount of private-versus-public information

    Public Signals

    Private Signals

    Hypotheses Development (Cont’d.)

    • Hypothesis 1: Firms with lower EFLS intensity are associated with higher expected return.

    Hypotheses Development (Cont’d.)

    • Hypothesis 2: Firms with lower EFLS intensity are associated with higher stock volatility.

      • Formal condition: not recoverable from the transcript (the equation was lost in extraction)

      • Intuition: if there are enough signals and the fraction of informed investors is larger than 41%, then firms with lower amounts of EFLS → higher volatility

    Firm-Level Performance Evaluation (Cont’d.)

    • Empirical Model 1:

    • Empirical Model 2:

    Hypothesis 1 Predicts Negative b1

    Hypothesis 2 Predicts b1 ≠ 0

    Experiment Two: Firm-Level Evaluation

    • Research Testbed: January 1986 to May 2008, 1,134,321 Wall Street Journal news articles

      • Merged with CRSP, Compustat, and IBES

      • Stock prices lower than $5 at the end of a month were removed (Cohen and Frazzini 2008; Fang and Peress 2009)

      • 1,274,711 firm-months, spanning 269 months

    Expected Return and EFLS Intensity

    ***, **, * indicate statistical significance at the 0.01, 0.05, and 0.1 levels, respectively.

    Volatility and EFLS Intensity

    ***, **, * indicate statistical significance at the 0.01, 0.05, and 0.1 levels, respectively.

    Take-Away and WIP (20%)

    • Mass and social media texts provide additional signals for market prediction (in addition to numbers)

    • Message volume is important; aggregate sentiment may not be (EMH)

    • Business sentiment processing difficult; may require additional content pre-processing (stakeholder; EFLS)

    • Predicting return is hard; predicting volatility is easier (cf. VIX, Chicago Board Options Exchange)

    • Large-scale stock news tracking and text analytics can be automated

    • Trading windows; decay function; targeted sentiment; extensive trading periods (up/down); industry and news category (oil/banking); firm & index size (Russell/NYSE); emerging markets (China)

       All the firms (10K), all the news (1M each), all the time ???

       Trading strategy ???
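
The "decay function" item in the work-in-progress list can be illustrated with an exponentially decayed sentiment average, in which older messages count for less. This is one possible form, offered as an assumption: the half-life parameter, the hour-based timestamps, and the function name `decayed_sentiment` are all hypothetical.

```python
def decayed_sentiment(events, now, half_life_hours=24.0):
    """Weighted mean sentiment; each weight halves every `half_life_hours` of age."""
    weighted_sum = 0.0
    weight_total = 0.0
    for timestamp_hours, score in events:
        age = now - timestamp_hours
        if age < 0:
            continue  # skip items dated after the evaluation time
        weight = 0.5 ** (age / half_life_hours)
        weighted_sum += weight * score
        weight_total += weight
    return weighted_sum / weight_total if weight_total else 0.0


# Two made-up messages: a positive one 24 hours old and a negative one just posted.
events = [(0.0, 1.0), (24.0, -1.0)]
print(decayed_sentiment(events, now=24.0))
```

The half-life plays the role of a trading-window length: a short value reacts to breaking news, a long value smooths over several days.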

    AZ BIZ INTEL System Design

    [Figure: system architecture diagram. Layer labels: Basic Information, Data Collection, Data Processing, Analytic Approaches, Interactive Applications. Other labels: data sources for US public companies; company information database (company name, company keywords); predefined data sources; dynamic data sources; Yahoo Finance forums; search engines; company websites; stock exchange; 10K report; performance indicators; topics & sentiments; time series / burst; risk model; single media analysis; cross media analysis; simulated trading; static figures/dashboards.]

    Hsinchun Chen, Ph.D.

    Artificial Intelligence Lab, University of Arizona

    [email protected]