Predicting the Present with Google Trends

Hyunyoung Choi

Hal Varian

June 2009

Problem statement

  • Government agencies and other organizations produce monthly reports on economic activity
      • Retail Sales
      • House Sales
      • Automotive Sales
      • Unemployment

Problems with reports

      • Compilation delay of several weeks
      • Subsequent revisions
      • Sample size may be small
      • Not available at all geographic levels

Google Trends releases daily and weekly indexes of search queries by industry vertical

      • Real time data
      • No revisions (but some sampling variation)
      • Large samples
      • Available by country, state and city

Can Google Trends data help predict current economic activity?

      • Before release of preliminary statistics
      • Before release of final revision

Categories in Google Trends by Query Shares

Note: Queries from 2009-01-01 to 2009-04-30 & Growth Comparison w/ the same time window

[Figure callouts: Geography, Time window, Category]

Subcategories under Real Estate by Query Shares

[Figure labels: Property Management, Home Insurance, Home Inspections & Appraisal, Real Estate Agencies, Home Financing, Rental Listings & Referrals]

Depicting trends

  • Google Trends measures the normalized query share of a particular category of queries, which controls for overall growth.
  • It is often useful to look at year-on-year changes to eliminate seasonality.
  • Illustrate correlations and covariates.

Improving predictions

  • Forecast a time series using its own lagged values, then add Trends data as a predictor.
      • Statistical significance?
      • Improved fit?
      • Improved forecasts?
      • Identify turning points?
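
The year-on-year comparison mentioned under "Depicting trends" can be sketched in a few lines. This is a minimal illustration, assuming a monthly series held in a plain list; the function name `year_on_year_change` and the sample values are hypothetical and do not come from the slides.

```python
def year_on_year_change(series, period=12):
    """Return the fractional change of each value versus the value `period` steps earlier.

    The first `period` observations have no year-ago counterpart and are skipped.
    """
    return [
        (current - year_ago) / year_ago
        for year_ago, current in zip(series, series[period:])
    ]


# Hypothetical monthly index: a fixed seasonal pattern that grows 10% per year.
seasonal_pattern = [100, 90, 110, 130, 150, 170, 180, 175, 140, 120, 95, 105]
two_years = seasonal_pattern + [value * 1.10 for value in seasonal_pattern]

# Every month is compared with the same month one year earlier,
# so the seasonal pattern cancels and only the growth remains.
yoy = year_on_year_change(two_years)
```

Because each month is compared with the same calendar month, the strong within-year swings in the raw series do not appear in `yoy`.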

Forecasting primer

  • Basic forecasting models
      • Autoregressive: value at time t depends on
        • Value at time t-1
      • Seasonal adjustment: value at time t depends on
        • Value at time t-12 (for monthly data)
      • Transfer function: value at time t depends on
        • Other contemporaneous or lagging variables
      • Seasonal autoregressive transfer model: value at time t depends on
        • Value at time t-12 (seasonality)
        • Value at time t-1 (recent behavior)
        • Other lagging or contemporaneous variables (such as Google Trends data)
  • Typical question of interest
      • How much more accurate do forecasts become when additional variables are added, over and above the accuracy obtained from the history of the time series itself?
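
The seasonal autoregressive transfer model described above can be sketched as an ordinary least squares regression of the value at time t on the values at t-1 and t-12, with or without one extra predictor. This is a minimal sketch, not the authors' code: the helper names, the synthetic data, and the use of plain least squares are all assumptions made for illustration.

```python
import random


def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution


def fit_least_squares(rows, targets):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    width = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(width)] for i in range(width)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(width)]
    return solve_linear_system(xtx, xty)


def design_rows(y, x=None, season=12):
    """Build rows [1, y[t-1], y[t-season]] (plus x[t] if given) for t >= season."""
    rows = []
    for t in range(season, len(y)):
        row = [1.0, y[t - 1], y[t - season]]
        if x is not None:
            row.append(x[t])
        rows.append(row)
    return rows, y[season:]


def mean_abs_residual(rows, targets, coef):
    """In-sample mean absolute error of the fitted values."""
    fitted = [sum(c * v for c, v in zip(coef, row)) for row in rows]
    return sum(abs(f - t) for f, t in zip(fitted, targets)) / len(targets)


# Synthetic monthly data (hypothetical): y depends on its own lags and on x.
rng = random.Random(42)
n_months = 120
x = [rng.uniform(0.0, 10.0) for _ in range(n_months)]
y = [75.0 + rng.uniform(-5.0, 5.0) for _ in range(12)]
for t in range(12, n_months):
    y.append(5.0 + 0.5 * y[t - 1] + 0.3 * y[t - 12] + 2.0 * x[t])

base_rows, targets = design_rows(y)        # history of the series only
full_rows, _ = design_rows(y, x)           # history plus the extra predictor
base_coef = fit_least_squares(base_rows, targets)
full_coef = fit_least_squares(full_rows, targets)
base_mae = mean_abs_residual(base_rows, targets, base_coef)
full_mae = mean_abs_residual(full_rows, targets, full_coef)
```

Comparing `base_mae` with `full_mae` is an in-sample version of the question posed above; an out-of-sample comparison would refit on a rolling window and score only forecasts of unseen months.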

Model

New Home Sales is modeled from three groups of inputs:

  • Time Series
      • Recent trend: New Home Sales at t-1
      • Seasonality: New Home Sales at t-12
  • Google Trends
      • Recent search activity on
        • Real Estate Agencies
        • Rental Listings & Referrals
        • Home Inspections & Appraisal
        • Property Management
        • Home Insurance
        • Home Financing
  • Exogenous Variables
      • Housing affordability: Average/Median Home Price

Predicting the present

New Residential Sales from US Census

  • Monthly release 24-28 days after the month
  • Seasonally adjusted
  • National and regional aggregates

Google Trends Real Estate by Category

  • Home Inspections & Appraisal
  • Home Insurance
  • Home Financing
  • Property Management
  • Rental Listings & Referrals
  • Real Estate Agencies

Analysis and Forecasting

Model:

$$Y_t = 446.1 + 0.864\,Y_{t-1} - 4.340\,\mathrm{us378.1}_t + 4.198\,\mathrm{us96.2}_t - 0.001\,\mathrm{AvgP}_{t-1}$$

$Y_t$: new houses sold in month t

$\mathrm{AvgP}_{t-1}$: average sales price of new one-family houses sold in month t-1

$\mathrm{us378.1}_t$: Google Trends index for vertical id 378 (Rental Listings & Referrals) in the 1st week of month t

$\mathrm{us96.2}_t$: Google Trends index for vertical id 96 (Real Estate Agencies) in the 2nd week of month t

July 2008

Actual = 515K

Predicted = 442.98K

Z-score = 2.53

August 2008 Prediction = 417.52K
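
The fitted equation can be evaluated directly once the four inputs for a month are known. The sketch below is illustrative only: the function and argument names are hypothetical, and the slide does not give the input values behind the July and August figures, so none are reproduced here. The second helper assumes the quoted Z-score is (actual - predicted) divided by a standard error, which the slide does not state.

```python
def predict_new_home_sales(prev_sales, rental_index_week1, agency_index_week2, prev_avg_price):
    """Evaluate the fitted equation from the slide for one month.

    Argument names are illustrative; units follow whatever the original
    regression used.
    """
    return (
        446.1
        + 0.864 * prev_sales
        - 4.340 * rental_index_week1
        + 4.198 * agency_index_week2
        - 0.001 * prev_avg_price
    )


def implied_standard_error(actual, predicted, z_score):
    """Back out the standard error, assuming z = (actual - predicted) / se."""
    return (actual - predicted) / z_score


# July 2008 figures quoted on the slide, in thousands of houses.
july_se = implied_standard_error(actual=515.0, predicted=442.98, z_score=2.53)
```

Under that assumption, the July 2008 gap of about 72 thousand houses corresponds to a standard error of roughly 28.5 thousand.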

Analysis and Forecasting

  • Observations
      • Since 2005 new house sales have been decreasing, with little seasonality
      • Google Trends captures seasonality & recent trends
      • Positive association with Real Estate Agencies (96)
      • Negative association with Rental Listings & Referrals (378) and Average Price

Subcategories under Travel by Query Shares

[Figure labels: Adventure Travel, Bus & Rail, Cruises & Charters, Attractions & Activities, Car Rental & Taxi Services, Hotels & Accommodations, Air Travel, Vacation Destinations]

Travel to Hong Kong

Visitor Arrival Statistics from Hong Kong Tourism Board

  • Monthly summaries released with a 1-month lag
  • Reports country/territory of residence of visitors
  • Data available 2004-2008

Google Trends Travel by Category

  • Hotels & Accommodations
  • Air Travel
  • Car Rental & Taxi Services
  • Cruises & Charters
  • Attractions & Activities
  • Vacation Destinations
    • Australia
    • Caribbean Islands
    • Hawaii
    • Hong Kong
    • Las Vegas
    • Mexico
    • New York City
    • Orlando
  • Adventure Travel
  • Bus & Rail

Analysis and Forecasting

Model:

$$\log(Y_{i,t}) = 0.664 + 0.113\,\log(Y_{i,t-1}) + 0.828\,\log(Y_{i,t-12}) + 0.001\,X_{i,t,2} + 0.001\,X_{i,t,3} + 0.005\,\mathrm{FXrate}_{i,t} + \eta_i + e_{i,t}$$

$$e_{i,t} \sim N(0,\ 0.0938^2), \qquad \eta_i \sim N(0,\ 0.0228^2)$$

$Y_{i,t}$ = arrivals to Hong Kong from the i-th country in month t

$X_{i,t,1}$ = Google Trends search index from the i-th country in the 1st week of month t

$X_{i,t,2}$ = Google Trends search index from the i-th country in the 2nd week of month t

$X_{i,t,3}$ = Google Trends search index from the i-th country in the 3rd week of month t

$\mathrm{FXrate}_{i,t}$ = Hong Kong Dollars per one unit of the i-th country's local currency in month t. The average FX rate over the first week of each month is used as a proxy for that month's FX rate.
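
Because the left-hand side is in logs, a prediction on the original scale requires exponentiating the linear predictor. The sketch below evaluates the quoted coefficients under stated assumptions: the function and argument names are hypothetical, the residual is set to zero, the country effect defaults to its mean of zero, and the input values are invented purely to show the call.

```python
import math


def predict_arrivals(prev_month, year_ago, search_week2, search_week3, fx_rate, country_effect=0.0):
    """Evaluate the fitted log-linear equation and return arrivals on the original scale.

    `country_effect` stands in for the country-level random effect; it
    defaults to 0, its mean. The residual term is set to zero.
    """
    log_arrivals = (
        0.664
        + 0.113 * math.log(prev_month)
        + 0.828 * math.log(year_ago)
        + 0.001 * search_week2
        + 0.001 * search_week3
        + 0.005 * fx_rate
        + country_effect
    )
    return math.exp(log_arrivals)


# Hypothetical inputs, chosen only to show the call.
baseline = predict_arrivals(50_000, 48_000, 40.0, 45.0, 7.8)
stronger_fx = predict_arrivals(50_000, 48_000, 40.0, 45.0, 8.8)

# Ratio of the two predictions when only the FX rate rises by one unit.
fx_ratio = stronger_fx / baseline
```

With everything else held fixed, a one-unit rise in the FX rate multiplies predicted arrivals by exp(0.005), which is what `fx_ratio` measures.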

Analysis and Forecasting

  • Conclusion
      • Arrivals at time t are positively associated with arrivals at time t-1 and at time t-12.
        • Arrivals show strong seasonality and autocorrelation.
      • Arrivals at time t are positively associated with searches on [Hong Kong].
      • Arrivals at time t are positively associated with FX rates.
        • When the local currency appreciates relative to the Hong Kong Dollar, visitors to Hong Kong increase.

US Auto Sales by Make

  • Monthly summaries released 1 week after end of month
  • Data available by Car Sales, Truck Sales and Total Sales for each make
  • Data available from 2003-2008
  • Source: Automotive News Data Center

Google Trends under Vehicle Brands Category

  • Google Trends subcategory Vehicle Brands
  • Weekly search query index
  • Total of 31 verticals in this subcategory
    • 27 verticals matched to available monthly sales data

Google Categories under Vehicle Brands

NOTE: Area represents query volume in the first half of 2008, and color represents the yearly growth rate of queries.

Auto Sales by Make (Top 9 Makes by Sales): Monthly Sales vs. Google Trends in the Second Week of Each Month

Analysis and Forecasting

Fixed effects model:

$$\log(Y_{i,t}) = 2.4276 + 0.2552\,\log(Y_{i,t-1}) + 0.4930\,\log(Y_{i,t-12}) + 0.0005\,X_{i,t,1} + 0.0014\,X_{i,t,2} + a_i\,\mathrm{Make}_i + e_{i,t}$$

$$e_{i,t} \sim N(0,\ 0.1347^2), \qquad \text{Adjusted } R^2 = 0.9829$$

$Y_{i,t}$ = auto sales of the i-th make in month t

$X_{i,t,1}$ = Google Trends search index for the i-th make in the 1st week of month t

$X_{i,t,2}$ = Google Trends search index for the i-th make in the 2nd week of month t

$\mathrm{Make}_i$ = dummy variable for auto make

$a_i$ = coefficient capturing the mean level of auto sales by make

ANOVA Table

                    Df   Sum Sq  Mean Sq     F value   Pr(>F)
trends1              1    12.89    12.89    710.3542  < 2e-16 ***
trends2              1     0.05     0.05      2.7987  0.09455 .
log(s1)              1  1532.95  1532.95  84452.7530  < 2e-16 ***
log(s12)             1    24.07    24.07   1325.9741  < 2e-16 ***
as.factor(brand)    26     3.34     0.13      7.0696  < 2e-16 ***
Residuals         1480    26.86     0.02

Analysis and Forecasting

  • Conclusion
      • Sales at time t are positively associated with sales at time t-1 and at time t-12.
        • Sales show strong seasonality and autocorrelation.
      • Monthly sales are positively correlated with search volume in the first and second weeks of each month.
        • If search volume increases by 1%, sales volume increases by an average of 0.19%.
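
One plausible reading of the 0.19% figure, offered here as an assumption rather than something the slides state, is that it combines the two search coefficients from the fixed effects model (0.0005 and 0.0014). Since the dependent variable is log sales, a one-unit rise in both weekly indexes changes log sales by their sum. The helper name below is hypothetical.

```python
import math

# Coefficients on the two weekly search indexes, as quoted in the fixed effects model.
WEEK1_COEF = 0.0005
WEEK2_COEF = 0.0014


def percent_change_in_sales(delta_week1, delta_week2):
    """Percent change in sales implied by a log-linear model for given index changes."""
    log_change = WEEK1_COEF * delta_week1 + WEEK2_COEF * delta_week2
    return math.expm1(log_change) * 100.0


# Effect of a one-unit increase in both weekly search indexes.
effect = percent_change_in_sales(1.0, 1.0)
```

For changes this small, the percent effect is close to 100 times the sum of the coefficients, which is where a value near 0.19% would come from under this reading.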

YoY Growth in Initial Claims & Google Search

According to the NBER, the current recession started in December 2007.

The national unemployment rate passed 5% in mid-2008, and search queries on [Welfare and Unemployment] increased at the same time.

Strong Autocorrelation in Initial Claims

[Figure panels: Time Series; Autocorrelation Function]

Time Window for Analysis

[Figure annotations: Recession Starts; Window for Long Term Model; Window for Short Term Model]

Model

Signif. codes: 0.001 ‘***’ 0.05 ‘**’ 0.01 ‘*’

Reference ARIMA(0,1,1) × (1,0,0)_12 Model

ARIMA(0,1,1) × (1,0,0)_12 Model With Google Trends

Model fit improved significantly: smaller standard deviation, higher log likelihood, and smaller AIC.

Initial Claims are positively correlated with searches on Jobs and Welfare.

Long Term Model: Prediction Comparison with MAE

With Google Trends, the out-of-sample prediction MAE decreases by 16.84%.

Prediction with rolling window from 1/11/2009 to 4/12/2009

Prediction Error at t:

Mean Absolute Error:
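
A rolling-window, out-of-sample MAE comparison of this kind can be sketched as follows. This is a minimal illustration, assuming the prediction error is the forecast minus the actual value and the MAE is the mean of the absolute errors; the two forecasters, the series, and the indicator values are hypothetical stand-ins, not the ARIMA models or data from the slides.

```python
def mean_absolute_error(actuals, forecasts):
    """Mean of |forecast - actual| over all forecast dates."""
    errors = [forecast - actual for actual, forecast in zip(actuals, forecasts)]
    return sum(abs(error) for error in errors) / len(errors)


def rolling_one_step_forecasts(series, forecaster, first_target):
    """Forecast series[t] for each t >= first_target using only series[:t]."""
    return [forecaster(series[:t], t) for t in range(first_target, len(series))]


def percent_mae_reduction(baseline_mae, candidate_mae):
    """Percent by which the candidate's MAE is lower than the baseline's."""
    return (baseline_mae - candidate_mae) / baseline_mae * 100.0


# Hypothetical weekly series and a hypothetical indicator known at forecast time.
claims = [100, 104, 103, 110, 118, 115, 121, 130]
signal = [0, 3, 0, 6, 7, -2, 5, 8]


def naive(history, t):
    """Baseline: repeat the last observed value."""
    return history[-1]


def with_signal(history, t):
    """Candidate: last observed value adjusted by the indicator for week t."""
    return history[-1] + signal[t]


first_target = 4
actuals = claims[first_target:]
baseline_mae = mean_absolute_error(actuals, rolling_one_step_forecasts(claims, naive, first_target))
candidate_mae = mean_absolute_error(actuals, rolling_one_step_forecasts(claims, with_signal, first_target))
reduction = percent_mae_reduction(baseline_mae, candidate_mae)
```

Each forecast sees only data available before its target date, which is what makes the comparison out-of-sample. The size of the reduction in this toy example reflects the invented data and says nothing about the reported results.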

Short Term Model: Prediction Comparison with MAE

With Google Trends, the out-of-sample prediction MAE decreases by 19.23%.

Prediction errors are within the same range as LT Model.

Fit improvement is better with ST Model.

Summary

Google Trends significantly improves out-of-sample prediction of state unemployment, up to 18 days in advance of data release.

Mean absolute error for out-of-sample predictions declines by 16.84% for LT Model and 19.23% for ST Model.

Further work

  • Can examine metro-level data
  • Other local data (real estate)
  • Combine with other predictors
  • Detect turning points?