
Predicting the Present - PowerPoint PPT Presentation



With Google Trends. Predicting the Present. Hyunyoung Choi Hal Varian June 2009. Problem statement. Government agencies and other organizations produce monthly reports on economic activity Retail Sales House Sales Automotive Sales Unemployment Problems with reports

Presentation Transcript

With Google Trends

Predicting the Present

Hyunyoung Choi

Hal Varian

June 2009



Problem statement

  • Government agencies and other organizations produce monthly reports on economic activity

    • Retail Sales

    • House Sales

    • Automotive Sales

    • Unemployment

  • Problems with reports

    • Compilation delay of several weeks

    • Subsequent revisions

    • Sample size may be small

    • Not available at all geographic levels

  • Google Trends releases daily and weekly indexes of search queries by industry vertical

    • Real time data

    • No revisions (but some sampling variation)

    • Large samples

    • Available by country, state and city

  • Can Google Trends data help predict current economic activity?

    • Before release of preliminary statistics

    • Before release of final revision


Categories in Google Trends by Query Shares

Note: Queries from 2009-01-01 to 2009-04-30 & Growth Comparison w/ the same time window


Geography

Time window

Category


Subcategories under Real Estate by Query Shares

  • Property Management

  • Home Insurance

  • Home Inspections & Appraisal

  • Real Estate Agencies

  • Home Financing

  • Rental Listings & Referrals

Depicting trends

  • Google Trends measures the normalized query share of a particular category of queries, which controls for overall growth

  • Often useful to look at year-on-year changes to eliminate seasonality.

  • Illustrate correlations and covariates.

Improving predictions

  • Forecast time series using its own lagged values and add Trends data as a predictor.

  • Statistical significance?

  • Improved fit?

  • Improved forecasts?

  • Identify turning points?
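The year-on-year comparison mentioned on this slide can be sketched in a few lines. This is a minimal illustration on a made-up monthly index, not the authors' code: the helper `yoy_change` and the synthetic series are assumptions introduced here. For a weekly index, a period of 52 would play the same role.

```python
def yoy_change(series, period=12):
    """Fractional change of each value relative to the value one period earlier.

    With monthly data and period=12, this compares each month with the same
    month of the previous year, which removes a stable seasonal pattern.
    """
    return [series[t] / series[t - period] - 1.0 for t in range(period, len(series))]


# Hypothetical index: a fixed seasonal shape that grows 10% per year.
seasonal_shape = [100.0 + 10.0 * month for month in range(12)]
index = [seasonal_shape[t % 12] * 1.1 ** (t // 12) for t in range(36)]

growth = yoy_change(index)          # year-on-year growth, one value per month
month_on_month = [index[t] / index[t - 1] - 1.0 for t in range(1, len(index))]

print(round(growth[0], 4), round(growth[-1], 4))
print(round(min(month_on_month), 4), round(max(month_on_month), 4))
```

In this toy series the year-on-year change is the same in every month, while the month-on-month change swings with the seasonal pattern.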


Forecasting primer

  • Basic forecasting models

    • Autoregressive: value at time t depends on

      • Value at time t-1

    • Seasonal adjustment: value at time t depends on

      • Value at time t-12 (for monthly data)

    • Transfer function: value at time t depends on

      • Other contemporaneous or lagging variables

    • Seasonal autoregressive transfer model: Value at time t depends on

      • Value at time t-12 (seasonality)

      • Value at time t-1 (recent behavior)

      • Other lagging or contemporaneous variables (such as Google Trends data)

    • Typical question of interest

      • How much more accurate are forecasts that use additional variables, over and above the accuracy obtained from the history of the time series itself?
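The seasonal autoregressive transfer model described above can be sketched as an ordinary least-squares regression of the value at time t on the values at t-1 and t-12, with and without an extra contemporaneous predictor. Everything below is an assumption made for illustration: the data are synthetic, the helper names (`solve`, `ols`, `design`, `mae`) are hypothetical, and the 24-month holdout is an arbitrary choice. It only shows the kind of comparison the "typical question" asks for.

```python
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def design(y, x, use_trends):
    """Build rows [1, y[t-1], y[t-12], (x[t])] and targets y[t] for t >= 12."""
    rows, target = [], []
    for t in range(12, len(y)):
        row = [1.0, y[t - 1], y[t - 12]]
        if use_trends:
            row.append(x[t])
        rows.append(row)
        target.append(y[t])
    return rows, target


def mae(rows, target, coef):
    """Mean absolute error of the linear predictions on the given rows."""
    preds = [sum(c * v for c, v in zip(coef, row)) for row in rows]
    return sum(abs(p - t) for p, t in zip(preds, target)) / len(target)


# Synthetic monthly series driven by its own lags and a "search index" x.
rng = random.Random(0)
n = 120
x = [rng.uniform(0.0, 1.0) for _ in range(n)]
y = [20.0 + rng.gauss(0.0, 0.5) for _ in range(12)]
for t in range(12, n):
    y.append(5.0 + 0.3 * y[t - 1] + 0.4 * y[t - 12] + 2.0 * x[t] + rng.gauss(0.0, 0.1))

rows_base, target = design(y, x, use_trends=False)
rows_full, _ = design(y, x, use_trends=True)
split = len(target) - 24  # hold out the last 24 months for evaluation

coef_base = ols(rows_base[:split], target[:split])
coef_full = ols(rows_full[:split], target[:split])
mae_base = mae(rows_base[split:], target[split:], coef_base)
mae_full = mae(rows_full[split:], target[split:], coef_full)
print("holdout MAE without x:", round(mae_base, 3), "with x:", round(mae_full, 3))
```

Because the synthetic series really does depend on `x`, the model with the extra predictor has the lower holdout error here. With real data that is exactly what has to be checked.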

Model

New Home Sales

  • Time Series

    • Recent trend: New Home Sales at t-1

    • Seasonality: New Home Sales at t-12

  • Google Trends: recent search activity on

    • Real Estate Agencies

    • Rental Listings & Referrals

    • Home Inspections & Appraisal

    • Property Management

    • Home Insurance

    • Home Financing

  • Exogenous Variables

    • Housing affordability: Average/Median Home Price



Predicting the present

New Residential Sales from US Census

  • Monthly release 24 to 28 days after the month

  • Seasonally adjusted

  • National and regional aggregates

Google Trends Real Estate by Category

  • Home Inspections & Appraisal

  • Home Insurance

  • Home Financing

  • Property Management

  • Rental Listings & Referrals

  • Real Estate Agencies


Analysis and Forecasting

Model:

Y_t = 446.1 + 0.864 * Y_{t-1} - 4.340 * us378.1 + 4.198 * us96.2 - 0.001 * AvgP_{t-1}

Y_t: new houses sold in month t

AvgP_{t-1}: average sales price of new one-family houses sold in month t-1

us378.1: Google Trends index for vertical id 378 (Rental Listings & Referrals) in the 1st week of month t

us96.2: Google Trends index for vertical id 96 (Real Estate Agencies) in the 2nd week of month t

July 2008

Actual = 515K

Predicted = 442.98K

Z-score = 2.53

August 2008 Prediction = 417.52K
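To make the fitted equation concrete, here is a small sketch that evaluates it. The coefficients come from the slide, but the function name and the input values are invented for illustration, and the units of the Trends indexes and the price are not stated in the transcript. The last lines back out the standard error implied by the July 2008 figures, assuming the z-score is (actual - predicted) / standard error. That definition is an assumption, not something the slide states.

```python
def predict_new_home_sales(y_prev, rental_wk1, agencies_wk2, avg_price_prev):
    """Evaluate the slide's fitted equation for new home sales (in thousands).

    y_prev          -- sales in the previous month, Y_{t-1}
    rental_wk1      -- us378.1, Rental Listings & Referrals index, week 1
    agencies_wk2    -- us96.2, Real Estate Agencies index, week 2
    avg_price_prev  -- AvgP_{t-1}, average sales price in the previous month
    """
    return (446.1 + 0.864 * y_prev - 4.340 * rental_wk1
            + 4.198 * agencies_wk2 - 0.001 * avg_price_prev)


# Hypothetical inputs, chosen only to show the arithmetic.
example = predict_new_home_sales(y_prev=500.0, rental_wk1=1.0,
                                 agencies_wk2=-2.0, avg_price_prev=300000.0)
print(round(example, 3))

# July 2008 figures from the slide: actual 515K, predicted 442.98K, z = 2.53.
implied_se = (515.0 - 442.98) / 2.53
print(round(implied_se, 2))
```

Under that reading of the z-score, the slide's numbers imply a prediction standard error of about 28.5 thousand houses.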


Analysis and Forecasting

  • Observations

    • Since 2005 new house sales have been decreasing, with little seasonality

    • Google Trends captures seasonality & recent trends

    • Positive association with Real Estate Agencies (96)

    • Negative association with Rental Listings & Referrals (378) and Average Price


Subcategories under Travel by Query Shares

  • Adventure Travel

  • Bus & Rail

  • Cruises & Charters

  • Attractions & Activities

  • Car Rental & Taxi Services

  • Hotels & Accommodations

  • Air Travel

  • Vacation Destinations


Travel to Hong Kong

Visitor Arrival Statistics from Hong Kong Tourism Board

  • Monthly summaries released with a 1-month lag

  • Reports country/territory of residence of visitors

  • Data available 2004-2008

Google Trends Travel by Category

  • Hotels & Accommodations

  • Air Travel

  • Car Rental & Taxi Services

  • Cruises & Charters

  • Attractions & Activities

  • Vacation Destinations

    • Australia

    • Caribbean Islands

    • Hawaii

    • Hong Kong

    • Las Vegas

    • Mexico

    • New York City

    • Orlando

  • Adventure Travel

  • Bus & Rail


Analysis and Forecasting

Model:

log(Y_{i,t}) = 0.664 + 0.113 * log(Y_{i,t-1}) + 0.828 * log(Y_{i,t-12}) + 0.001 * X_{i,t,2} + 0.001 * X_{i,t,3} + 0.005 * FXrate_{i,t} + η_i + e_{i,t}

e_{i,t} ~ N(0, 0.0938^2), η_i ~ N(0, 0.0228^2)

Y_{i,t} = arrivals in Hong Kong from the i-th country in month t

X_{i,t,1} = Google Trends search index from the i-th country in the 1st week of month t

X_{i,t,2} = Google Trends search index from the i-th country in the 2nd week of month t

X_{i,t,3} = Google Trends search index from the i-th country in the 3rd week of month t

FXrate_{i,t} = Hong Kong dollars per unit of the i-th country's local currency in month t. The average FX rate over the first week is used as a proxy for each month's rate.


Analysis and Forecasting

  • Conclusion

    • Arrival at time t is positively associated with arrival at time t-1 and arrival at time t-12.

      • It shows strong seasonality and autocorrelation

    • Arrival at time t is positively associated with searches on [Hong Kong].

    • Arrival at time t is positively associated with FX rates.

      • When the local currency appreciates relative to the Hong Kong dollar, visitors to Hong Kong increase.


US Auto Sales by Make

  • Monthly summaries released 1 week after end of month

  • Data available by Car Sales, Truck Sales and Total Sales for each make

  • Data available from 2003-2008

  • Source: Automotive News Data Center

Google Trends under Vehicle Brands Category

  • Google Trends subcategory Vehicle Brands

  • Weekly search query index

  • Total of 31 verticals in this subcategory

    • 27 verticals matched to available monthly sales


Google Categories under Vehicle Brands

Note: Area represents query volume in the first half of 2008; color represents the yearly growth rate of queries.


Auto Sales by Make (Top 9 Makes by Sales): Monthly Sales vs. Google Trends in the Second Week of Each Month


Analysis and Forecasting

Fixed effects model:

log(Y_{i,t}) = 2.4276 + 0.2552 * log(Y_{i,t-1}) + 0.4930 * log(Y_{i,t-12}) + 0.0005 * X_{i,t,1} + 0.0014 * X_{i,t,2} + a_i * Make_i + e_{i,t}

e_{i,t} ~ N(0, 0.1347^2), Adjusted R^2 = 0.9829

Y_{i,t} = auto sales of the i-th make in month t

X_{i,t,1} = Google Trends search index for the i-th make in the 1st week of month t

X_{i,t,2} = Google Trends search index for the i-th make in the 2nd week of month t

Make_i = dummy variable for auto make

a_i = coefficient capturing the mean level of auto sales by make

ANOVA Table

                    Df   Sum Sq  Mean Sq     F value  Pr(>F)
trends1              1    12.89    12.89    710.3542  < 2e-16 ***
trends2              1     0.05     0.05      2.7987  0.09455 .
log(s1)              1  1532.95  1532.95  84452.7530  < 2e-16 ***
log(s12)             1    24.07    24.07   1325.9741  < 2e-16 ***
as.factor(brand)    26     3.34     0.13      7.0696  < 2e-16 ***
Residuals         1480    26.86     0.02
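As a consistency check on the ANOVA table, each F value is the term's mean square divided by the residual mean square, and the residual standard deviation is the square root of the residual mean square. The sketch below recomputes both from the rounded figures printed on the slide, so small differences from the reported F values are expected. The variable names are introduced here for illustration.

```python
import math

# (Df, Sum Sq) for each term, copied from the slide's ANOVA table.
anova = {
    "trends1": (1, 12.89),
    "trends2": (1, 0.05),
    "log(s1)": (1, 1532.95),
    "log(s12)": (1, 24.07),
    "as.factor(brand)": (26, 3.34),
}
resid_df, resid_ss = 1480, 26.86

resid_ms = resid_ss / resid_df                      # residual mean square
f_values = {name: (ss / df) / resid_ms for name, (df, ss) in anova.items()}
resid_sd = math.sqrt(resid_ms)                      # residual standard deviation

for name, f in f_values.items():
    print(f"{name:18s} F = {f:10.2f}")
print("residual sd =", round(resid_sd, 4))
```

The recomputed residual standard deviation agrees with the 0.1347 given for the error term of the fixed effects model.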


Analysis and Forecasting

  • Conclusion

    • Sales at time t are positively associated with Sales at time t-1 and Sales at time t-12.

      • Sales show strong seasonality and autocorrelation

    • Monthly Sales are positively correlated with search volume in the first and second weeks of each month.

      • If the search volume increases by 1%, the sales volume increases by an average of 0.19%.
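One way to arrive at the 0.19% figure is sketched below. It assumes that "1%" means a one-point rise in both the first-week and second-week search indexes, and that the two weekly coefficients from the fixed effects model (0.0005 and 0.0014) add up. This reading is an interpretation, not something the slides state.

```python
import math

coef_week1 = 0.0005   # coefficient on the first-week search index
coef_week2 = 0.0014   # coefficient on the second-week search index

# In a model for log(sales), a one-point rise in both indexes changes
# log(sales) by the sum of the coefficients; expm1 converts that to a
# fractional change in sales.
pct_change_in_sales = 100.0 * math.expm1(coef_week1 + coef_week2)
print(round(pct_change_in_sales, 2))
```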


YoY Growth in Initial Claims & Google Search

According to the NBER, the current recession started December 2007.

The national unemployment rate passed 5% in mid-2008, and search queries on [Welfare and Unemployment] increased at the same time.





Strong Autocorrelation in Initial Claims

Time Series

Autocorrelation Function



Time Window for Analysis

Recession Starts

Window For Long Term Model

Window For Short Term Model


Model

Signif. codes: 0.001 ‘***’ 0.05 ‘**’ 0.01 ‘*’

Reference ARIMA(0,1,1) x (1,0,0)_12 Model

ARIMA(0,1,1) x (1,0,0)_12 Model With Google Trends

Model fit improved significantly: smaller standard deviation, higher log likelihood and smaller AIC

Initial Claims are positively correlated with searches on Jobs and Welfare.
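For readers unfamiliar with the notation, the two models named on this slide can be written out with the backshift operator B (B y_t = y_{t-1}). The symbols θ, Φ, β, x_t and n_t are introduced here for illustration, and the sign convention for the moving-average term is the one used by R's `arima`. The estimated coefficients were in a table that did not survive in the transcript, so none are filled in. Here y_t stands for the (possibly transformed) initial claims series and x_t for the Google Trends predictors.

```latex
% Reference model: ARIMA(0,1,1) x (1,0,0)_{12}
(1 - \Phi B^{12})(1 - B)\, y_t = (1 + \theta B)\, \varepsilon_t,
\qquad \varepsilon_t \sim N(0, \sigma^2)

% With Google Trends: regression on x_t with ARIMA(0,1,1) x (1,0,0)_{12} errors
y_t = \beta^{\top} x_t + n_t,
\qquad (1 - \Phi B^{12})(1 - B)\, n_t = (1 + \theta B)\, \varepsilon_t
```

The second form treats the Trends variables as regressors and applies the same seasonal ARIMA structure to the regression errors, which is one common way to add external predictors to an ARIMA model.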


Long Term Model: Prediction Comparison with MAE

With Google Trends, the out-of-sample prediction MAE decreases by 16.84%.

Prediction with rolling window from 1/11/2009 to 4/12/2009

Prediction Error at t:

Mean Absolute Error:
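The formulas for the prediction error and the mean absolute error were images on the original slide and are missing from this transcript. The sketch below shows one plausible version of a rolling-window comparison: one-step-ahead forecasts from a fixed-length window, absolute errors averaged into an MAE, and the percentage reduction between two forecasters. The data, the window length and the two toy forecasters are made up. In the presentation the comparison is between the reference model and the model with Google Trends.

```python
def rolling_one_step(series, window, forecast_fn):
    """Forecast series[t] from the preceding `window` values, for each t >= window."""
    return [forecast_fn(series[t - window:t]) for t in range(window, len(series))]


def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all forecast dates."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def pct_reduction(mae_reference, mae_candidate):
    """Percentage by which the candidate's MAE is below the reference MAE."""
    return 100.0 * (mae_reference - mae_candidate) / mae_reference


# Toy series and two stand-in forecasters (hypothetical, for illustration only).
series = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0, 13.0, 15.0]
window = 4
actual = series[window:]

reference = rolling_one_step(series, window, lambda h: h[-1])             # last value
candidate = rolling_one_step(series, window, lambda h: sum(h) / len(h))   # window mean

mae_reference = mean_absolute_error(actual, reference)
mae_candidate = mean_absolute_error(actual, candidate)
print(mae_reference, mae_candidate,
      round(pct_reduction(mae_reference, mae_candidate), 2))
```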


Short Term Model: Prediction Comparison with MAE

With Google Trends, the out-of-sample prediction MAE decreases by 19.23%.

Prediction errors are within the same range as LT Model.

Fit improvement is better with ST Model.


Summary

Google Trends significantly improves out-of-sample prediction of state unemployment, up to 18 days in advance of data release.

Mean absolute error for out-of-sample predictions declines by 16.84% for LT Model and 19.23% for ST Model.

Further work

Can examine metro level data

Other local data (real estate)

Combine with other predictors

Detect turning points?

