MICRO LEVEL FORECASTS FOR INDIA’S EXPORT SECTOR: SPECIFIC COUNTRIES AND SPECIFIC COMMODITIES

Analytics & Modelling Division
NATIONAL INFORMATICS CENTRE
Department of Information Technology
Ministry of Communication & Information Technology
New Delhi-110003

Major input to “India’s export model” for a financial year
• Input to an econometric model that derives macro-level forecasts for strategic planning of India’s exports (RIS Study)
• NIC has developed micro-level forecasts for a financial year for specific countries and specific commodities (total variables: 319)

Tools and technologies used:
• Monthly time series behavior is captured through neural network methodology.
• The final model selected has been simulated within and outside the sample; once it has stabilized with regard to error statistics, forecasts are generated.
• 4Thought/Freefore, the state-of-the-art software tool from COGNOS, has been used to simulate and generate the micro-level forecasts of India’s exports for a financial year.
• The reliability of the forecasts and the degree of confidence are part of the final model.

Table A: SUMMARY OF COUNTRY-WISE DATA SETS (time series forecasting carried out for the listed number of data sets)

Table A: Contd. * Only single variable total export of “All Commodities” from India is considered

Table A: Contd. Includes both the series- monthly as well as annual - with 26 items in each series.

Univariate ARIMA MODEL
• In regression analysis, if the error terms are not independent (i.e. they are autocorrelated), the efficiency of the ordinary least-squares (OLS) parameter estimates is adversely affected and the standard error estimates are biased.
• The Auto Regressive Integrated Moving Average (ARIMA) model is suited to data with autocorrelated errors, which occur frequently in time series data.
• The ARIMA procedure analyzes and forecasts equally spaced univariate time series data, transfer function data, and intervention data using the autoregressive moving-average (ARMA) model or the more general autoregressive integrated moving-average (ARIMA) model.
• An ARIMA model predicts a value in a response time series as a linear combination of its own past values, past errors, and (in the transfer function case) current and past values of other time series.

Univariate ARIMA MODEL – Contd.
• An ARIMA model contains three different kinds of parameters:
  • the p AR parameters;
  • the q MA parameters;
  • the variance of the error term.
• This amounts to a total of p + q + 1 parameters to be estimated. These parameters are always estimated using the stationary time series (a time series that is stationary with respect to its variance and mean).
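As a rough illustration of the parameter count above, the sketch below differences a series (the "I" step used to reach stationarity) and then fits the simplest case, p = 1 and q = 0, by least squares. The function names and the sample series are hypothetical and chosen only for illustration; this is not the procedure used by the tools named in this presentation.

```python
def difference(series, d=1):
    """Apply d rounds of first differencing (the 'I' in ARIMA)."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def fit_ar1(series):
    """Least-squares fit of x_t = phi * x_{t-1} + e_t (no intercept).

    Returns (phi, error_variance), i.e. p + q + 1 = 1 + 0 + 1 = 2 parameters.
    """
    prev, curr = series[:-1], series[1:]
    phi = sum(p * c for p, c in zip(prev, curr)) / sum(p * p for p in prev)
    residuals = [c - phi * p for p, c in zip(prev, curr)]
    variance = sum(r * r for r in residuals) / len(residuals)
    return phi, variance


# Hypothetical trending series (assumed data, not from the study).
levels = [100.0, 108.0, 112.0, 114.0, 115.0, 115.5]
differenced = difference(levels, d=1)   # 8, 4, 2, 1, 0.5
phi, variance = fit_ar1(differenced)    # each value is half the previous one
```

With q > 0 the past errors are not observed directly, so real ARIMA software estimates the parameters iteratively rather than with a single closed-form step like this one.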

NEURAL NETWORK
• Neural networks cannot do anything that cannot be done using traditional computing techniques, but they can do some things that would otherwise be very difficult or time consuming.
• Neural networks form a model from training data (or possibly input data) alone.
• This is particularly useful when time series behavior is complex and the forecast for one period is an input to the forecast for the next period.
• When a time series behaves in a complex way, follows an unknown pattern, or involves a large number of variables, a neural network learns from past behavior to develop a correspondingly complex algorithm and then predicts. (ARIMA: Univariate, Multivariate)
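The idea that one period's forecast feeds the next period's forecast can be sketched as a recursive loop around any one-step model. The names `recursive_forecast` and `mean_of_window` below are hypothetical, and the moving-average "model" is only a stand-in for a trained network.

```python
def recursive_forecast(history, one_step_model, window, horizon):
    """Forecast `horizon` periods ahead, feeding each forecast back as input."""
    values = list(history)
    forecasts = []
    for _ in range(horizon):
        next_value = one_step_model(values[-window:])
        forecasts.append(next_value)
        values.append(next_value)  # the forecast becomes an input for the next period
    return forecasts


def mean_of_window(window_values):
    """Stand-in one-step model (assumption): average of the last few values."""
    return sum(window_values) / len(window_values)


forecasts = recursive_forecast([2.0, 4.0], mean_of_window, window=2, horizon=3)
```

Because each step consumes earlier forecasts, errors can compound over the horizon, which is one reason to watch the test-set statistics closely.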

NEURAL NETWORK – Contd.
• Neural networks are a form of multiprocessor computer system, with
  • simple processing elements
  • a high degree of interconnection
  • simple scalar messages
  • adaptive interaction between elements
• A biological neuron may have as many as 10,000 different inputs, and may send its output (the presence or absence of a short-duration spike) to many other neurons.
• Neurons are wired up in a 3-dimensional pattern.

Example
A simple single-unit adaptive network: the network has 2 inputs, I0 and I1, and one output. All are binary.
If W0 * I0 + W1 * I1 + Wb > 0, then Output is 1
If W0 * I0 + W1 * I1 + Wb <= 0, then Output is 0
We want it to learn a simple OR: output is 1 if either I0 or I1 is 1.
The network adapts as follows: change each weight by an amount proportional to the difference between the desired output and the actual output. As an equation:
Δ Wi = η * (D - Y) * Ii
where η is the learning rate, D is the desired output, and Y is the actual output.
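The single-unit example can be run directly. The sketch below applies the update rule Δ Wi = η * (D - Y) * Ii to the four OR cases, treating the bias weight Wb as a weight on a constant input of 1. The starting weights (all zero), the learning rate of 0.5 and the epoch count are assumptions made for illustration.

```python
def output(weights, bias, inputs):
    """Threshold unit: 1 if W0*I0 + W1*I1 + Wb > 0, else 0."""
    total = sum(w * i for w, i in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def train(samples, learning_rate=0.5, epochs=20):
    """Apply delta_Wi = eta * (D - Y) * Ii after every sample."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for inputs, desired in samples:
            error = desired - output(weights, bias, inputs)
            weights = [w + learning_rate * error * i
                       for w, i in zip(weights, inputs)]
            bias += learning_rate * error  # bias sees a constant input of 1
    return weights, bias


# Truth table for OR: ((I0, I1), desired output)
OR_SAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights, bias = train(OR_SAMPLES)
predictions = [output(weights, bias, inputs) for inputs, _ in OR_SAMPLES]
```

OR is linearly separable, so a single unit is enough; once every case is classified correctly the error term is zero and the weights stop changing.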

Feed Forward Neural Network

1. EU
A. 30613 (Import of Shrimps and prawns, frozen)
Model Statistics
Model fit: 75.5004
Test fit: 78.4198
Overall fit: 76.4137
Adjusted fit: 65.3762
Iterations: 69
RMS error: 16.0265
Standard deviation: 16.1163
95% confidence interval: 32.2326
Mean absolute error: 12.5406
Mean absolute error (%): 8.7764
F-Statistic: 20.7884
Durbin-Watson Statistic: 1.0007

STATISTICAL MEASURES
Model fit
A measure of how well the model fits the original data used in modeling. 100% represents a perfect fit. The model fit would approach 0% if you guessed the average value for the target. If the value is negative, the fit is worse than if you had guessed the average value for the target (that is, you had a naive model). The model fit is based on an adaptation of the standard R^2 statistic (that is, the proportion of the relationship explained between two variables).
Adjusted fit
The overall fit adjusted for the number of factors and the number of rows of data contained in the model. This assumes that a more complex model or less data will produce a less predictive model.

Test fit
The percentage of variation in the test set explained by the model. Test fit (or percent test fit) is a measure of how well the model predicts the test data, and is the best measure of the genuine predictive performance of the model. The test fit is an adaptation of the standard R^2 statistic. The test fit can be negative; this happens if the current model yields a less accurate prediction of the test set than the naive model.
Overall fit
An indicator of the model quality that combines the model fit and the test fit. The overall fit is the percentage of the variation explained in the dependent variable.
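The fit measures are described only as adaptations of R^2, so the exact formula used by the software is not known here. As a minimal sketch, assuming the plain R^2 expressed as a percentage, the behaviour described above (100% for a perfect fit, 0% for the naive average model, negative when worse than naive) looks like this; `fit_percent` is a hypothetical name.

```python
def fit_percent(actual, predicted):
    """Plain R^2 as a percentage (assumed stand-in for the tool's fit measure)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 100.0 * (1.0 - ss_res / ss_tot)


actual = [10.0, 20.0, 30.0]
perfect = fit_percent(actual, actual)               # model reproduces the data
naive = fit_percent(actual, [20.0, 20.0, 20.0])     # always guesses the average
worse = fit_percent(actual, [30.0, 20.0, 10.0])     # worse than the naive model
```

Computing the same quantity on data held back from training gives a test-fit style figure, which is why it is the more honest indicator of predictive performance.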

B. 90111 (Export of Coffee neither roasted nor decaffeinated)
Model Statistics
Model fit: 75.6046
Test fit: 73.7038
Overall fit: 75.2571
Adjusted fit: 64.0117
Iterations: 54
RMS error: 4.4336
Standard deviation: 4.4593
95% confidence interval: 8.9186
Mean absolute error: 3.1465
Mean absolute error (%): 34.767
F-Statistic: 18.7563
Durbin-Watson Statistic: 0.5446

C. 251611 (Import of Granite, crude/rough)
Model Statistics
Model fit: 67.3539
Test fit: 61.8533
Overall fit: 66.0773
Adjusted fit: 56.5328
Iterations: 66
RMS error: 3.4094
Standard deviation: 3.4285
95% confidence interval: 6.857
Mean absolute error: 2.7858
Mean absolute error (%): 6.6183
F-Statistic: 12.4989
Durbin-Watson Statistic: 2.122

2. CHINA
A. 670300 (Import of Human Hair, dressed, thinned, bleached or otherwise worked; wool or other animal hair or other textile materials, prepared for use in making wigs or the like)
Model Statistics
Model fit: 85.0775
Test fit: 84.3229
Overall fit: 84.9804
Adjusted fit: 74.6557
Iterations: 30
RMS error: 1.0522
Standard deviation: 1.0571
95% confidence interval: 2.1143
Mean absolute error: 0.7224
Mean absolute error (%): 24.07
F-Statistic: 44.3208
Durbin-Watson Statistic: 1.2491

B. CHINA (Import of rest of the codes)
Model Statistics
Model fit: 87.8544
Test fit: 82.4129
Overall fit: 87.1099
Adjusted fit: 76.5264
Iterations: 126
RMS error: 2828.6593
Standard deviation: 2841.9707
95% confidence interval: 5683.9414
Mean absolute error: 2114.0386
Mean absolute error (%): 12.5192
F-Statistic: 52.9366
Durbin-Watson Statistic: 0.8763

C. CHINA (Unit value index for rest of the codes)
Model Statistics
Model fit: 61.607
Test fit: 76.4597
Overall fit: 66.02
Adjusted fit: 57.6874
Iterations: 46
RMS error: 6.1855
Standard deviation: 6.2157
95% confidence interval: 12.4314
Mean absolute error: 4.2899
Mean absolute error (%): 4.5121
F-Statistic: 14.5718
Durbin-Watson Statistic: 0.9655
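The error statistics reported for the EU and China models can be illustrated with their textbook definitions. The sketch below assumes the usual formulas for RMS error, mean absolute error and mean absolute percentage error; whether the software applies exactly these definitions is not stated in the source, and the sample numbers are invented. (In every example listed here, the "95% confidence interval" figure is close to twice the reported standard deviation.)

```python
import math


def error_statistics(actual, predicted):
    """Textbook RMS error, MAE and MAE (%) for paired actual/predicted values."""
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    rms_error = math.sqrt(sum(e * e for e in errors) / n)
    mean_abs_error = sum(abs(e) for e in errors) / n
    # Percentage error is taken relative to each actual value (assumes no zeros).
    mean_abs_pct = 100.0 * sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / n
    return {"rms": rms_error, "mae": mean_abs_error, "mae_pct": mean_abs_pct}


# Invented sample values, for illustration only.
stats = error_statistics(actual=[100.0, 200.0], predicted=[110.0, 180.0])
```

A small absolute error can still be a large percentage error when the series values are small, so the two error measures need to be read together.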

3. USA
A. 420310 (Import of Articles of apparel)
MODEL STATISTICS IN TERMS OF THE ORIGINAL DATA
Number of Residuals (R)       = n                        70
Number of Degrees of Freedom  = n-m                      62
Residual Mean                 = Sum R / n                .683103E-02
Sum of Squares                = Sum R**2                 121.321
Variance                      var = SOS/(n)              1.73316
Adjusted Variance             = SOS/(n-m)                1.95679
Standard Deviation            = SQRT(Adj Var)            1.39885
Standard Error of the Mean    = Standard Dev/            .177655
Mean / its Standard Error     = Mean/SEM                 .384512E-01
Mean Absolute Deviation       = Sum(ABS(R))/n            .992518
AIC Value (Uses var)          = n*ln(var) + 2m           54.4962
SBC Value (Uses var)          = n*ln(var) + m*ln(n)      72.4841
BIC Value (Uses var)          = see Wei p153             -95.0882
R Square                      =                          .887551
Durbin-Watson Statistic       = [A-A(T-1)]**2/A**2       1.95492
D-W STATISTIC SUGGESTS NO SIGNIFICANT AUTOCORRELATION for lag1.

B. 570110 (Import of Carpets and other textile coverings of wool or fine animal hair)
MODEL STATISTICS IN TERMS OF THE ORIGINAL DATA
Number of Residuals (R)       = n                        103
Number of Degrees of Freedom  = n-m                      97
Residual Mean                 = Sum R / n                -.783408E-14
Sum of Squares                = Sum R**2                 1578.37
Variance                      var = SOS/(n)              15.3239
Adjusted Variance             = SOS/(n-m)                16.2718
Standard Deviation            = SQRT(Adj Var)            4.03383
Standard Error of the Mean    = Standard Dev/            .409574
Mean / its Standard Error     = Mean/SEM                 -.191274E-13
Mean Absolute Deviation       = Sum(ABS(R))/n            3.10562
AIC Value (Uses var)          = n*ln(var) + 2m           293.130
SBC Value (Uses var)          = n*ln(var) + m*ln(n)      308.938
BIC Value (Uses var)          = see Wei p153             -26.2750
R Square                      =                          .858561
Durbin-Watson Statistic       = [A-A(T-1)]**2/A**2       1.88808
D-W STATISTIC SUGGESTS NO SIGNIFICANT AUTOCORRELATION for lag1.

C. 610510 (Import of Men's or boys' shirts of cotton, knitted or crocheted)
MODEL STATISTICS IN TERMS OF THE ORIGINAL DATA
Number of Residuals (R)       = n                        105
Number of Degrees of Freedom  = n-m                      99
Residual Mean                 = Sum R / n                -.708456E-01
Sum of Squares                = Sum R**2                 10575.5
Variance                      var = SOS/(n)              100.719
Adjusted Variance             = SOS/(n-m)                106.824
Standard Deviation            = SQRT(Adj Var)            10.3355
Standard Error of the Mean    = Standard Dev/            1.03876
Mean / its Standard Error     = Mean/SEM                 -.682020E-01
Mean Absolute Deviation       = Sum(ABS(R))/n            7.73821
AIC Value (Uses var)          = n*ln(var) + 2m           496.295
SBC Value (Uses var)          = n*ln(var) + m*ln(n)      512.219
BIC Value (Uses var)          = see Wei p153             165.540
R Square                      =                          .848765
Durbin-Watson Statistic       = [A-A(T-1)]**2/A**2       2.04567
D-W STATISTIC SUGGESTS NO SIGNIFICANT AUTOCORRELATION for lag1.
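Several entries in the USA outputs can be recomputed from the residuals. The sketch below reads the garbled AIC and SBC formulas as n*ln(var) + 2m and n*ln(var) + m*ln(n); that reading is an inference, supported by the fact that it reproduces the values listed for commodity 420310 with n = 70 and m = 8 (m taken from n minus the 62 degrees of freedom). The function names are hypothetical, and the Durbin-Watson formula is the standard one.

```python
import math


def information_criteria(n, m, sum_of_squares):
    """AIC and SBC from the unadjusted variance var = SOS / n."""
    var = sum_of_squares / n
    aic = n * math.log(var) + 2 * m
    sbc = n * math.log(var) + m * math.log(n)
    return aic, sbc


def residual_summary(residuals, m):
    """Variance, adjusted variance, standard deviation and Durbin-Watson."""
    n = len(residuals)
    sos = sum(r * r for r in residuals)
    adjusted_variance = sos / (n - m)
    # Durbin-Watson: squared successive differences over the sum of squares.
    dw = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:])) / sos
    return {
        "variance": sos / n,
        "adjusted_variance": adjusted_variance,
        "std_dev": math.sqrt(adjusted_variance),
        "durbin_watson": dw,
    }


# Values for commodity 420310 as listed in the output (n = 70, m = 70 - 62 = 8).
aic, sbc = information_criteria(n=70, m=8, sum_of_squares=121.321)
# Toy residuals (invented) that alternate in sign, giving a D-W above 2.
summary = residual_summary([1.0, -1.0, 1.0, -1.0], m=2)
```

A Durbin-Watson value near 2, as in all three USA models, indicates little lag-1 autocorrelation in the residuals, whereas several of the neural network models above report values well below 2.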

Conclusion:
• Time constraint:
  • The number of independent variables for which forecasts are to be generated is approximately 319.
  • As new time series data keep arriving, forecasts for all 319 independent variables must be generated from the latest monthly data within a period of approximately 2 weeks.
  • Each variable forecast is an independent exercise.
• Existing software tools are not fully automated, and intervention by subject and tool specialists is a must.
• Traditional statistical/econometric modelling techniques and software tools are a major constraint in terms of automation.

What is Required:
• NIC can develop a fully automated forecasting system by developing algorithms and testing them with available state-of-the-art tools with limited interface.
• The state-of-the-art software tools and techniques will require funding: manpower and resource mobilization to the tune of Rs. 10 lakhs over a period of 8 months.
