Predicting the Present with Google Trends
Hyunyoung Choi, Hal Varian
June 2009
Problem statement
• Government agencies and other organizations produce monthly reports on economic activity:
  • Retail Sales
  • House Sales
  • Automotive Sales
  • Unemployment
Problems with reports
• Compilation delay of several weeks
• Subsequent revisions
• Sample size may be small
• Not available at all geographic levels
Google Trends releases daily and weekly indices of search queries by industry vertical
• Real-time data
• No revisions (but some sampling variation)
• Large samples
• Available by country, state and city
Can Google Trends data help predict current economic activity?
• Before release of preliminary statistics
• Before release of final revision
Figure: Categories in Google Trends by Query Shares (queries from 2009-01-01 to 2009-04-30; growth compared with the same time window)
Figure: Google Trends query selectors (Geography, Time window, Category)
Figure: Subcategories under Real Estate by Query Shares (Property Management, Home Insurance, Home Inspections & Appraisal, Real Estate Agencies, Home Financing, Rental Listings & Referrals)
Depicting trends
• Google Trends measures the normalized query share of a particular category of queries, which controls for overall growth in search volume.
• It is often useful to look at year-on-year changes to eliminate seasonality.
• Illustrate correlations and covariates.
Improving predictions
• Forecast a time series using its own lagged values, then add Trends data as a predictor.
  • Statistical significance?
  • Improved fit?
  • Improved forecasts?
  • Identify turning points?
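A year-on-year change compares each observation with the one a full seasonal cycle earlier (12 periods for monthly data, 52 for weekly data), so a stable seasonal pattern cancels out. A minimal sketch, assuming the series is a gap-free list of positive values; the function name `yoy_log_change` is hypothetical, and log differences are one common choice (simple percentage changes are another).

```python
import math

def yoy_log_change(series, period=12):
    """Year-on-year change as a log difference: log(x_t) - log(x_{t-period}).

    The first `period` values have no observation one year earlier,
    so the result is `period` entries shorter than the input.
    """
    if any(v <= 0 for v in series):
        raise ValueError("log differences need strictly positive values")
    return [math.log(series[t]) - math.log(series[t - period])
            for t in range(period, len(series))]

# Hypothetical monthly index: the same seasonal shape each year, scaled up 10%.
year1 = [90, 95, 100, 110, 120, 130, 125, 115, 105, 100, 95, 92]
year2 = [v * 1.10 for v in year1]

# Every entry is log(1.10), about 0.0953: the seasonal shape cancels.
print([round(c, 4) for c in yoy_log_change(year1 + year2)])
```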
Forecasting primer
• Basic forecasting models
  • Autoregressive: value at time t depends on the value at time t-1
  • Seasonal adjustment: value at time t depends on the value at time t-12 (for monthly data)
  • Transfer function: value at time t depends on other contemporaneous or lagged variables
  • Seasonal autoregressive transfer model: value at time t depends on
    • the value at time t-12 (seasonality)
    • the value at time t-1 (recent behavior)
    • other lagged or contemporaneous variables (such as Google Trends data)
• Typical question of interest
  • How much more accurate are the forecasts you get from additional variables, over and above the accuracy you get from the history of the time series itself?
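The seasonal autoregressive transfer model can be fit by ordinary least squares: regress $y_t$ on a constant, $y_{t-1}$, $y_{t-12}$ and an extra predictor $x_t$, then compare with the fit that leaves $x_t$ out. A minimal standard-library sketch on simulated data; all function names and the data-generating coefficients are hypothetical. Note that the in-sample SSE comparison only speaks to "improved fit"; improved forecasts need an out-of-sample test.

```python
import random

def solve(a, b):
    """Solve the square system a @ z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix (copy)
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        z[i] = (m[i][n] - sum(m[i][j] * z[j] for j in range(i + 1, n))) / m[i][i]
    return z

def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

def seasonal_ar_design(y, x=None, season=12):
    """Rows [1, y_{t-1}, y_{t-season}] (plus x_t if given) and targets y_t."""
    rows, target = [], []
    for t in range(season, len(y)):
        row = [1.0, y[t - 1], y[t - season]]
        if x is not None:
            row.append(x[t])  # contemporaneous predictor, e.g. a Trends index
        rows.append(row)
        target.append(y[t])
    return rows, target

def sse(rows, y, b):
    """Sum of squared residuals for coefficients b."""
    return sum((yi - sum(bi * ri for bi, ri in zip(b, r))) ** 2
               for r, yi in zip(rows, y))

# Simulated data: y depends on its own lags and on a hypothetical "search index" x.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(240)]
y = [10.0] * 12
for t in range(12, 240):
    y.append(1.0 + 0.5 * y[t - 1] + 0.3 * y[t - 12] + 2.0 * x[t] + rng.gauss(0, 0.5))

base_rows, target = seasonal_ar_design(y)
full_rows, _ = seasonal_ar_design(y, x)
b_base = ols(base_rows, target)
b_full = ols(full_rows, target)
print("baseline SSE:", round(sse(base_rows, target, b_base), 1))
print("with x SSE:  ", round(sse(full_rows, target, b_full), 1))
print("coefficients with x:", [round(b, 2) for b in b_full])
```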
Model: New Home Sales
• Time series
  • Recent trend: New Home Sales at t-1
  • Seasonality: New Home Sales at t-12
• Exogenous variables
  • Housing affordability: Average/Median Home Price
• Google Trends: recent search activity on
  • Real Estate Agencies
  • Rental Listings & Referrals
  • Home Inspections & Appraisal
  • Property Management
  • Home Insurance
  • Home Financing
Predicting the present
New Residential Sales from US Census
• Monthly release 24–28 days after the end of the month
• Seasonally adjusted
• National and regional aggregates
Google Trends Real Estate by Category
• Home Inspections & Appraisal
• Home Insurance
• Home Financing
• Property Management
• Rental Listings & Referrals
• Real Estate Agencies
Analysis and Forecasting
Model:
$$Y_t = 446.1 + 0.864\,Y_{t-1} - 4.340\,\mathrm{us378.1} + 4.198\,\mathrm{us96.2} - 0.001\,\mathrm{AvgP}_{t-1}$$
• $Y_t$: new houses sold in month t
• $\mathrm{AvgP}_{t-1}$: average sales price of new one-family houses sold in month t-1
• us378.1: Google Trends index for vertical id 378 (Rental Listings & Referrals), 1st week of month t
• us96.2: Google Trends index for vertical id 96 (Real Estate Agencies), 2nd week of month t
July 2008: actual = 515K, predicted = 442.98K, z-score = 2.53
August 2008: prediction = 417.52K
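The fitted equation is a one-step prediction rule, and the July 2008 z-score can be turned back into the standard error it implies. A small sketch, assuming the z-score is (actual - predicted) / standard error and that average price is measured in dollars (the slide states neither); the function names are hypothetical.

```python
def predict_new_home_sales(y_prev, us378_1, us96_2, avg_price_prev):
    """Fitted equation from the slide, coefficients copied as shown.

    y_prev          new houses sold in month t-1 (thousands)
    us378_1         Trends index, Rental Listings & Referrals, week 1 of month t
    us96_2          Trends index, Real Estate Agencies, week 2 of month t
    avg_price_prev  average price of new one-family houses in month t-1
                    (assumed to be in dollars)
    """
    return (446.1 + 0.864 * y_prev - 4.340 * us378_1
            + 4.198 * us96_2 - 0.001 * avg_price_prev)

def implied_std_error(actual, predicted, z):
    """Standard error implied by a z-score, assuming z = (actual - predicted) / se."""
    return (actual - predicted) / z

# July 2008 figures from the slide: actual 515K, predicted 442.98K, z = 2.53.
print(round(implied_std_error(515.0, 442.98, 2.53), 2))  # 28.47 (thousand houses)
```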
Analysis and Forecasting
• Observations
  • Since 2005 new house sales have been decreasing, with little seasonality.
  • Google Trends captures seasonality and recent trends.
  • Positive association with Real Estate Agencies (96).
  • Negative association with Rental Listings & Referrals (378) and Average Price.
Figure: Subcategories under Travel by Query Shares (Adventure Travel, Bus & Rail, Cruises & Charters, Attractions & Activities, Car Rental & Taxi Services, Hotels & Accommodations, Air Travel, Vacation Destinations)
Travel to Hong Kong
Visitor Arrival Statistics from Hong Kong Tourism Board
• Monthly summaries released with a 1-month lag
• Reports visitors' country/territory of residence
• Data available 2004-2008
Google Trends Travel by Category
• Hotels & Accommodations
• Air Travel
• Car Rental & Taxi Services
• Cruises & Charters
• Attractions & Activities
• Vacation Destinations
  • Australia
  • Caribbean Islands
  • Hawaii
  • Hong Kong
  • Las Vegas
  • Mexico
  • New York City
  • Orlando
• Adventure Travel
• Bus & Rail
Analysis and Forecasting
Model:
$$\log(Y_{i,t}) = 0.664 + 0.113\log(Y_{i,t-1}) + 0.828\log(Y_{i,t-12}) + 0.001\,X_{i,t,2} + 0.001\,X_{i,t,3} + 0.005\,\mathrm{FXrate}_{i,t} + \eta_i + e_{i,t}$$
$$e_{i,t} \sim N(0, 0.0938^2), \qquad \eta_i \sim N(0, 0.0228^2)$$
• $Y_{i,t}$: arrivals to Hong Kong in month t from the i-th country
• $X_{i,t,1}$: Google Trends search in the 1st week of month t from the i-th country
• $X_{i,t,2}$: Google Trends search in the 2nd week of month t from the i-th country
• $X_{i,t,3}$: Google Trends search in the 3rd week of month t from the i-th country
• $\mathrm{FXrate}_{i,t}$: Hong Kong dollars per unit of the i-th country's local currency in month t. The average FX rate over the first week is used as a proxy for the monthly FX rate.
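Two quick readings of this model: setting the country effect $\eta_i$ to zero gives the prediction for a typical country, and the two variance components show how much residual variation sits between countries rather than within them. A sketch, assuming 0.0938 and 0.0228 are the standard deviations of $e_{i,t}$ and $\eta_i$; the helper name and variable names are hypothetical.

```python
import math

def predict_log_arrivals(log_y_prev, log_y_prev12, x2, x3, fx, eta=0.0):
    """Linear predictor from the slide's equation (eta = 0 for a typical country)."""
    return (0.664 + 0.113 * log_y_prev + 0.828 * log_y_prev12
            + 0.001 * x2 + 0.001 * x3 + 0.005 * fx + eta)

# Share of residual variance due to the country effect (intraclass correlation).
sigma_e, sigma_eta = 0.0938, 0.0228
share_country = sigma_eta ** 2 / (sigma_eta ** 2 + sigma_e ** 2)
print(round(share_country, 3))  # 0.056: most residual variation is within-country

# Arrivals are in logs, so a one-unit rise in FXrate multiplies predicted
# arrivals by exp(0.005), roughly a 0.5% increase.
print(round(100 * (math.exp(0.005) - 1), 2))  # 0.5
```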
Analysis and Forecasting
• Conclusion
  • Arrivals at time t are positively associated with arrivals at time t-1 and at time t-12.
  • Arrivals show strong seasonality and autocorrelation.
  • Arrivals at time t are positively associated with searches on [Hong Kong].
  • Arrivals at time t are positively associated with FX rates: when the local currency appreciates relative to the Hong Kong dollar, visitors to Hong Kong increase.
US Auto Sales by Make
• Monthly summaries released 1 week after the end of the month
• Data available as Car Sales, Truck Sales and Total Sales for each make
• Data available for 2003-2008
• Source: Automotive News Data Center
Google Trends under Vehicle Brands Category
• Google Trends subcategory Vehicle Brands
• Weekly search query index
• Total of 31 verticals in this subcategory
• 27 verticals match makes with available monthly sales
Figure: Google Categories under Vehicle Brands
Note: area represents query volume in the first half of 2008, and color represents the yearly growth rate of queries.
Figure: Auto Sales by Make (top 9 makes by sales): monthly sales vs. Google Trends in the second week of each month
Analysis and Forecasting
Fixed effects model:
$$\log(Y_{i,t}) = 2.4276 + 0.2552\log(Y_{i,t-1}) + 0.4930\log(Y_{i,t-12}) + 0.0005\,X_{i,t,1} + 0.0014\,X_{i,t,2} + a_i\,\mathrm{Make}_i + e_{i,t}$$
$$e_{i,t} \sim N(0, 0.1347^2), \qquad \text{Adjusted } R^2 = 0.9829$$
• $Y_{i,t}$: auto sales of the i-th make in month t
• $X_{i,t,1}$: Google Trends search in the 1st week of month t for the i-th make
• $X_{i,t,2}$: Google Trends search in the 2nd week of month t for the i-th make
• $\mathrm{Make}_i$: dummy variable for auto make
• $a_i$: coefficient capturing the mean level of auto sales by make
ANOVA Table
                    Df    Sum Sq   Mean Sq     F value    Pr(>F)
trends1              1     12.89     12.89    710.3542   < 2e-16 ***
trends2              1      0.05      0.05      2.7987   0.09455 .
log(s1)              1   1532.95   1532.95  84452.7530   < 2e-16 ***
log(s12)             1     24.07     24.07   1325.9741   < 2e-16 ***
as.factor(brand)    26      3.34      0.13      7.0696   < 2e-16 ***
Residuals         1480     26.86      0.02
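The summary statistics above can be cross-checked against the ANOVA table: the residual mean square gives the error standard deviation, and the sums of squares give the adjusted R². A short arithmetic check, assuming the table lists sequential sums of squares that add up to the corrected total sum of squares (as R's `anova()` does for a linear model); the variable names are hypothetical.

```python
import math

# Sums of squares and degrees of freedom from the ANOVA table.
model_ss = {"trends1": 12.89, "trends2": 0.05, "log(s1)": 1532.95,
            "log(s12)": 24.07, "as.factor(brand)": 3.34}
model_df = {"trends1": 1, "trends2": 1, "log(s1)": 1,
            "log(s12)": 1, "as.factor(brand)": 26}
rss, df_resid = 26.86, 1480

sigma = math.sqrt(rss / df_resid)               # residual standard error
tss = sum(model_ss.values()) + rss              # corrected total sum of squares
df_total = sum(model_df.values()) + df_resid    # n - 1
adj_r2 = 1 - (rss / df_resid) / (tss / df_total)

print(round(sigma, 4))   # 0.1347, matching e ~ N(0, 0.1347^2)
print(round(adj_r2, 4))  # 0.9829, matching the reported adjusted R^2
```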
Analysis and Forecasting
• Conclusion
  • Sales at time t are positively associated with sales at time t-1 and sales at time t-12.
  • Sales show strong seasonality and autocorrelation.
  • Monthly sales are positively correlated with search volume in the first and second weeks of each month.
  • If the search index increases by 1 unit in both the first and second weeks, sales volume increases by an average of about 0.19% (0.0005 + 0.0014 = 0.0019 on the log scale).
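Because sales enter the model in logs while the Trends indices enter in levels, the two search coefficients act as a semi-elasticity: raising both weekly indices by one unit changes log sales by 0.0005 + 0.0014 = 0.0019, about 0.19%. A small sketch of that arithmetic, holding all other terms fixed; the helper name is hypothetical.

```python
import math

B_WEEK1, B_WEEK2 = 0.0005, 0.0014  # coefficients on X_{i,t,1} and X_{i,t,2}

def pct_change_in_sales(delta_week1, delta_week2):
    """Percent change in predicted sales when the week-1 and week-2 Trends
    indices change by the given amounts (log-level model, other terms fixed)."""
    return 100 * (math.exp(B_WEEK1 * delta_week1 + B_WEEK2 * delta_week2) - 1)

print(round(pct_change_in_sales(1, 1), 3))    # 0.19
print(round(pct_change_in_sales(10, 10), 2))  # 1.92
```

For small changes the exponential barely matters, so the percent effect is close to 100 times the summed coefficient change.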
YoY Growth in Initial Claims & Google Search
According to the NBER, the current recession started in December 2007. The national unemployment rate passed 5% in mid-2008, and search queries on [Welfare and Unemployment] increased at the same time.
Strong Autocorrelation in Initial Claims
Figure: initial claims time series and its autocorrelation function
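The autocorrelation function measures how strongly a series correlates with its own past at each lag; values that stay high and decay slowly are what motivate differencing and autoregressive terms. A minimal standard-library version of the usual sample ACF estimator; the function name is hypothetical, and the toy series is not claims data.

```python
def acf(series, max_lag):
    """Sample autocorrelations r_0..r_max_lag.

    r_k = sum_t (x_t - m)(x_{t+k} - m) / sum_t (x_t - m)^2, with m the sample mean.
    """
    n = len(series)
    m = sum(series) / n
    dev = [v - m for v in series]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(max_lag + 1)]

# A trending series is strongly autocorrelated at lag 1.
trend = list(range(1, 21))
print(round(acf(trend, 1)[1], 2))  # 0.85
```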
Figure: Initial Claims Before/After Recession Started (panels: California, New York)
Time Window for Analysis
Figure: timeline marking when the recession starts, the window for the long-term model and the window for the short-term model
Model
• Reference model: $\mathrm{ARIMA}(0,1,1)\times(1,0,0)_{12}$
• Model with Google Trends: $\mathrm{ARIMA}(0,1,1)\times(1,0,0)_{12}$ with Google Trends
• Model fit improves significantly: smaller residual standard deviation, higher log likelihood and smaller AIC.
• Initial claims are positively correlated with searches on Jobs and Welfare.
Signif. codes: p < 0.001 '***', p < 0.01 '**', p < 0.05 '*'
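One common way to add Trends data to an ARIMA model is regression with ARIMA errors (as R's `arima(..., xreg = ...)` does): $y_t = \beta x_t + n_t$, where $n_t$ follows $\mathrm{ARIMA}(0,1,1)\times(1,0,0)_{12}$. The sketch below computes one-step-ahead forecasts conditional on given parameter values and zero pre-sample shocks; in practice the parameters are estimated by maximum likelihood. Whether the slides use exactly this specification is an assumption, and all names and parameter values here are hypothetical.

```python
def one_step_forecasts(y, x, theta, Phi, beta, s=12):
    """One-step-ahead forecasts for y_t = beta * x_t + n_t, where
        w_t = n_t - n_{t-1}
        w_t = Phi * w_{t-s} + e_t + theta * e_{t-1}
    Returns {t: forecast of y_t} for t = s+1, ..., len(y)-1.
    """
    n = [yt - beta * xt for yt, xt in zip(y, x)]              # regression errors
    w = [0.0] + [n[t] - n[t - 1] for t in range(1, len(n))]   # differenced errors (w[0] unused)
    e_prev = 0.0                                              # pre-sample shock set to zero
    forecasts = {}
    for t in range(s + 1, len(y)):
        w_hat = Phi * w[t - s] + theta * e_prev               # forecast of w_t from the past
        forecasts[t] = beta * x[t] + n[t - 1] + w_hat         # x_t is known: nowcasting
        e_prev = w[t] - w_hat                                 # realized shock e_t
    return forecasts

def aic(log_likelihood, n_params):
    """Akaike information criterion (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical data and parameters, not estimates from the slides.
y = [100 + 0.5 * t + (3 if t % 12 == 0 else 0) for t in range(40)]
x = [t % 7 for t in range(40)]
f = one_step_forecasts(y, x, theta=-0.3, Phi=0.4, beta=0.2)
mae = sum(abs(y[t] - f[t]) for t in f) / len(f)
print(len(f), round(mae, 3))
```

With theta = Phi = beta = 0 the recursion collapses to a random walk (the forecast of y_t is y_{t-1}), which is a handy sanity check.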
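Long Term Model: Prediction Comparison with MAE
• With Google Trends, the out-of-sample prediction MAE decreases by 16.84%.
• Predictions use a rolling window from 1/11/2009 to 4/12/2009.
• Prediction error at t: $e_t = y_t - \hat{y}_t$, where $\hat{y}_t$ is the out-of-sample prediction
• Mean absolute error: $\mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n} |e_t|$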
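The MAE comparison comes down to two averages and a ratio. A sketch with made-up forecast values (not the paper's data); the helper names are hypothetical.

```python
def mae(actual, predicted):
    """Mean absolute error: (1/n) * sum(|y_t - yhat_t|)."""
    errors = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)

def pct_reduction(mae_reference, mae_with_trends):
    """Percentage decrease in MAE relative to the reference model."""
    return 100 * (mae_reference - mae_with_trends) / mae_reference

# Hypothetical out-of-sample values, one per rolling-window step.
actual     = [100.0, 104.0, 110.0, 108.0]
reference  = [ 96.0, 100.0, 104.0, 112.0]  # errors 4, 4, 6, 4 -> MAE 4.5
with_trend = [ 98.0, 101.0, 107.0, 111.0]  # errors 2, 3, 3, 3 -> MAE 2.75
print(round(pct_reduction(mae(actual, reference), mae(actual, with_trend)), 2))  # 38.89
```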
Short Term Model: Prediction Comparison with MAE
• With Google Trends, the out-of-sample prediction MAE decreases by 19.23%.
• Prediction errors are in the same range as for the LT model.
• The fit improvement is larger with the ST model.
Summary
• Google Trends significantly improves out-of-sample prediction of state unemployment, up to 18 days in advance of data release.
• Mean absolute error for out-of-sample predictions declines by 16.84% for the LT model and 19.23% for the ST model.
Further work
• Examine metro-level data
• Other local data (real estate)
• Combine with other predictors
• Detect turning points?