A dynamic parameterization modeling for the age-period-cohort mortality
The case of the E&W males mortality experience, 1841-2006
Two important, general approaches have been developed for modeling and forecasting mortality, in discrete-time:
The curve fitting approach
The principal component (PC) approach
The curve fitting approach involves fitting parametric curves to the cross-sectional mortality rates in age effects, and extrapolating the parameter estimates in time effects. As many authors have noted, the deficiency of this approach comes from the strong interdependencies among the parameter time series, which cause serious difficulties for forecasting and for estimating the forecast prediction intervals. Building a multivariate time series model for the parameters is theoretically possible, but it seriously complicates the approach and can lead to computational intractability.
Recent developments in mortality modeling have tended to be extrapolative in nature, with the principal components (PC) approach being prominent. The PC approach involves computing the PCs of the table of age-time-specific (log-)mortality rates, to obtain a linear transformation of the data with lower dimension and a simplified structure, and extrapolating the PCs.
The main statistical tool used was least-squares estimation via singular value decomposition (SVD) of the matrix of the log age specific observed forces of mortality.
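As a small illustration of this tool, the following sketch applies least-squares SVD to a matrix of log death rates. The data are simulated stand-ins, not the E&W experience; all names and values are illustrative.

```python
import numpy as np

# Simulated stand-in for the matrix of log age-specific central death
# rates: rows are ages, columns are calendar years (illustrative values).
rng = np.random.default_rng(0)
n_ages, n_years = 90, 166
log_m = (-8.0 + 0.08 * np.arange(n_ages))[:, None] \
        - 0.01 * np.arange(n_years)[None, :] \
        + 0.02 * rng.standard_normal((n_ages, n_years))

# Centre each age by its mean and take the SVD, as in the PC approach.
a_x = log_m.mean(axis=1)
Z = log_m - a_x[:, None]
U, s, Vt = np.linalg.svd(Z, full_matrices=False)

# Share of variance captured by the leading singular value.
explained = s[0] ** 2 / np.sum(s ** 2)
```

For a strongly trending surface like this, the first component typically captures almost all of the variance.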
The most well-known PC approach is the Lee-Carter (LC) model (1992):

log m_{x,t} = a_x + b_x k_t + ε_{x,t},

where b_x and k_t are obtained from the leading term ρ_1 u_1 v_1ᵀ of the SVD, with ρ_1, u_1 and v_1 denoting the leading singular value and the respective left and right singular vectors of the centred matrix [log m_{x,t} − a_x].
Under the proposed approach, we build a GLM, modeling the actual numbers of deaths as independent realizations of Poisson random variables, for each calendar year independently, treating age as an explanatory variate, and using a log link function and an (orthonormal) polynomial predictor structure in age effects:

D_{x,t} ~ Poisson(E_{x,t} m_{x,t}),   log m_{x,t} = Σ_{j=0..k} β_{j,t} φ_j(x),

where φ_j denotes the j-th orthonormal polynomial in age.
The optimum degree k for the design matrix (of orthonormal polynomials) can be determined by maximizing the Bayesian Information Criterion (BIC):

BIC = l − (1/2)(n·k) log(n·a),

where l is the (maximum) value of the total log-likelihood over all n calendar years (with n·a observations and n·k parameters).
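The year-by-year Poisson GLM and the degree selection can be sketched as follows, for a single calendar year. The deaths and exposures are simulated, the IRLS fitter is a minimal hand-rolled implementation, and the BIC is written in the likelihood-penalty form used here (larger is better); everything is illustrative.

```python
import numpy as np

def orthonormal_poly(x, k):
    # Orthonormal polynomial design matrix via QR of a Vandermonde basis.
    V = np.vander(x, k + 1, increasing=True)
    Q, _ = np.linalg.qr(V)
    return Q

def fit_poisson_glm(X, y, exposure, iters=50):
    # IRLS for a log-link Poisson GLM with log(exposure) as an offset.
    offset = np.log(exposure)
    # Start from a least-squares fit to the crude log rates.
    beta, *_ = np.linalg.lstsq(X, np.log((y + 0.5) / exposure), rcond=None)
    for _ in range(iters):
        eta = X @ beta + offset
        mu = np.exp(eta)
        z = eta - offset + (y - mu) / mu          # working response
        W = mu                                    # working weights
        beta = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))
    return beta

# Simulated deaths for a single calendar year (illustrative numbers only).
rng = np.random.default_rng(1)
ages = np.arange(90)
exposure = np.full(90, 1e5)
deaths = rng.poisson(exposure * np.exp(-8.0 + 0.08 * ages))

# Pick the polynomial degree k by the likelihood-penalty form of the BIC.
best_bic, best_k = -np.inf, None
for k in range(1, 8):
    X = orthonormal_poly(ages / 89.0, k)
    beta = fit_poisson_glm(X, deaths, exposure)
    mu = np.exp(X @ beta + np.log(exposure))
    loglik = np.sum(deaths * np.log(mu) - mu)     # up to the log(y!) term
    bic = loglik - 0.5 * (k + 1) * np.log(len(ages))
    if bic > best_bic:
        best_bic, best_k = bic, k
```

In practice a GLM routine from a statistics package would replace the hand-rolled IRLS; the point is the per-year fit and the penalized comparison across degrees.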
Next we can apply the SVD (equivalently, the eigen-decomposition of the covariance matrix) to the random matrix B of the fitted coefficients, to extract the mortality dynamics in age-time effects. The vector of graduated log-central death rates, for all ages in year t, then inherits the structure of a dynamic mortality factor model:

log m_t = α + Λ z_t,   Λ = Φ P,

where the random vector α denotes the main age profile, Λ is the matrix of the factor loadings, Z = (z_1, …, z_n) is the matrix of PC scores, and P is the matrix of eigenvectors (Φ being the orthonormal polynomial design matrix).
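To make the construction concrete, here is a sketch of the eigen-decomposition step that yields the eigenvectors and PC scores. The coefficient matrix B is simulated; its dimensions and scales are illustrative assumptions.

```python
import numpy as np

# Simulated stand-in for B: rows are calendar years, columns are the
# fitted polynomial coefficients for that year (values are illustrative).
rng = np.random.default_rng(2)
n_years, n_coef = 166, 6
B = rng.standard_normal((n_years, n_coef)) \
    * np.array([3.0, 1.0, 0.5, 0.2, 0.1, 0.05])

# Eigen-decomposition of the covariance matrix of B, eigenvalues sorted
# in decreasing order; the columns of P are the eigenvectors.
B_bar = B.mean(axis=0)
C = np.cov(B, rowvar=False)
eigvals, P = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, P = eigvals[order], P[:, order]

# PC scores: projections of the centred coefficients on the eigenvectors.
Z = (B - B_bar) @ P
```

Premultiplying B_bar + P @ z_t by the polynomial design matrix then gives the main age profile and the factor loadings of the mortality factor model.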
Sparse Principal Component Analysis
Although PC analysis is a classic tool for analyzing multivariate data by reducing their dimensionality, one of its key shortcomings is that the factors (PCs) are linear combinations of all variables; that is, all factor coefficients (or loadings) are non-zero.
Additionally, if the input data consist of non-stationary time series, a single linear combination of all the series may appear to explain most of the variability, because simultaneous drifting of the series registers as correlation between the columns.
Further, in empirical studies of various mortality experiences, PCA results in factor-loading structures with significant values of both signs. This undesirable feature induces a highly interdependent structure across different age ranges, with the same factor explaining different mortality dynamics, and leads to spurious interpretations and forecasts.
SPCA improves this problematic dependence structure by producing a better clustering of the significant high factor loadings, seeking a trade-off between two goals: expressive power (explaining most of the variance) and interpretability (ensuring that each factor involves only a few variables).
Luss and d'Aspremont (2006) solve a penalized formulation of the problem (given a covariance matrix Σ, find sparse factors which explain the maximum amount of variance):

maximize   Tr(ΣX) − λ·1ᵀ|X|1
subject to Tr(X) = 1,  X ⪰ 0,

where λ ≥ 0 is a scalar which controls the sparsity (higher values of λ give more sparsity).
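A full SDP solver is beyond a short example, but the effect of the λ penalty can be illustrated with a soft-thresholded power iteration, a common simple heuristic for penalized sparse PCA. This is not the Luss-d'Aspremont algorithm, and the covariance below is artificial, constructed so the leading factor is supported on only five variables.

```python
import numpy as np

def sparse_pc(S, lam, iters=200):
    # Soft-thresholded power iteration: at each step multiply by the
    # covariance, shrink every entry toward zero by lam, and renormalize.
    n = S.shape[0]
    v = np.ones(n) / np.sqrt(n)
    for _ in range(iters):
        w = S @ v
        w = np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)  # soft threshold
        norm = np.linalg.norm(w)
        if norm == 0.0:                 # lam too large: all-zero factor
            return np.zeros(n)
        v = w / norm
    return v

# Illustrative covariance whose leading factor involves 5 of 20 variables.
u = np.zeros(20)
u[:5] = 1.0 / np.sqrt(5.0)
S = 10.0 * np.outer(u, u) + np.eye(20)

v = sparse_pc(S, lam=0.5)   # higher lam -> more loadings exactly zero
```

The recovered factor has exactly five non-zero loadings, matching the planted sparse structure; an ordinary power iteration (lam = 0) would return a dense vector instead.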
Thus, applying SPCA to the random matrix B and keeping an optimum subset of p (< k) SPCs, which explains the "majority" of the variance, leads to

β_t ≈ β̄ + P_p z_t + ε_t,

where P_p and z_t denote the first p columns of the matrix of sparse factor loadings and the SPC scores respectively, and ε_t is the error component representing the deviation of the model due to the excluded SPCs.
After some simple algebraic manipulations we obtain the graduated rates in log-linear form. The model can now be viewed as a member of the class of log-linear models, with age and time being the two main random effects (similar to the so-called association model of Goodman, 1991).
Optimum number of SPCs
The values of the mean associations are very informative for choosing the number of components to retain, since:
1. The decomposition implies that each such value can be viewed as a mean index of the factor loadings (or associations) of all ages with the i-th factor.
2. Each value is a measure of the importance (or a weight) of the i-th factor in the construction of the time-additive term.
Confidence intervals for these values can be constructed by bootstrapping techniques. Alternatively, if we simply count the number of times a value changes sign in the bootstrap procedure, then the rate of sign changes (say a%) defines a confidence coefficient, or confidence level (CL = 1 − a%), for each SPC. If a value never changes sign in the bootstrap procedure (at any given significance level), this is an indication of the robustness and significance of the particular factor.
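The sign-change count can be sketched as follows. The per-year contributions behind one mean association value are simulated here; in practice they would come from the fitted model, and only the resampling logic is the point.

```python
import numpy as np

rng = np.random.default_rng(4)
n_years = 166
# Simulated per-year contributions to one mean association value.
contrib = 0.3 + 0.5 * rng.standard_normal(n_years)

full_sign = np.sign(contrib.mean())
n_boot = 1000
changes = 0
for _ in range(n_boot):
    # Resample years with replacement and recompute the statistic.
    resample = rng.choice(contrib, size=n_years, replace=True)
    if np.sign(resample.mean()) != full_sign:
        changes += 1

a = changes / n_boot   # rate of sign changes across bootstrap replicates
CL = 1.0 - a           # confidence level attached to this SPC
```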
Next, we can choose the 'optimum' value of λ (the scalar which defines the sparsity in the SPCA) as the value that maximizes the number of significant SPCs, as defined by the CL criterion (at a given significance level).
Then, among these significant SPCs, we can identify the key age groups by taking, for each age, the maximum association value (MAV), defined as the largest absolute factor loading across the significant SPCs.
It is desirable that the above method produce distinct key age groups of neighbouring ages that belong to the same component. This feature leads to a clear picture of the mortality dynamics.
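The MAV step itself is a per-age maximum over the significant SPC loadings. In this sketch the loadings matrix is artificial, constructed so that each block of ages loads on one SPC, which is the desirable clustered outcome described above.

```python
import numpy as np

rng = np.random.default_rng(5)
n_ages, p = 90, 3
# Artificial sparse loadings: ages 0-29 load on SPC 1, 30-59 on SPC 2,
# and 60-89 on SPC 3 (purely for illustration).
loadings = np.zeros((n_ages, p))
loadings[:30, 0] = rng.uniform(0.5, 1.0, 30)
loadings[30:60, 1] = rng.uniform(0.5, 1.0, 30)
loadings[60:, 2] = rng.uniform(0.5, 1.0, 30)

# For each age, the SPC carrying the maximum association value (MAV);
# contiguous runs of the same index form the key age groups.
mav_index = np.argmax(np.abs(loadings), axis=1)
```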
In the presence of cohort effects, the GLM estimates are marginal estimates of the age and period effects. If we settle for the residual cohort effect, conditional on the already estimated age and period effects, we obtain a model in which the fitted age-period term is treated as an offset in the rectangular region DEBF.
In the same way as for the age-period effects, we apply the eigenvalue decomposition to the covariance matrix of these conditional (cohort) GLM estimates, leading to graduated central mortality rates in age-period-cohort effects:
defined in the rectangular DEBF.
Forecast estimates become (area DCGF)
Using the same arguments, the log-graduated central mortality rates can alternatively be decomposed as a member of the class of log-linear models with three main random effects:
where denotes the independent main cohort effect:
For forecasting purposes, for each time-related SPC, we advocate a specific class of dynamic linear regression (dlr) models, cast in state-space form and estimated with Kalman filter techniques:

z_{t,i} = α_i + β_{t,i}·t + ε_{t,i},   β_{t,i} = β_{t−1,i} + η_{t,i},

for each calendar year t (the regressor) and each time-related SPC (i = 1, …, p), with the slope β_{t,i} as a stochastic time-varying parameter (TVP) that follows a random walk (RW) process.
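A minimal Kalman filter for such a dlr model, with state vector (intercept, random-walk slope), could look like the sketch below. The score series and the variance settings are simulated assumptions, not values fitted to mortality data.

```python
import numpy as np

def kalman_dlr(y, q=1e-4, r=1.0):
    # Filter for y_t = a + b_t * t + e_t, with b_t a random walk:
    # state x_t = (a, b_t), identity transition, Q = diag(0, q).
    n = len(y)
    x = np.zeros(2)
    P = np.eye(2) * 1e6                  # diffuse initial covariance
    Q = np.diag([0.0, q])
    states = np.zeros((n, 2))
    for t in range(n):
        P = P + Q                        # time update (transition = I)
        H = np.array([1.0, float(t)])    # regression on calendar time
        S = H @ P @ H + r                # innovation variance
        K = P @ H / S                    # Kalman gain
        x = x + K * (y[t] - H @ x)       # measurement update
        P = P - np.outer(K, H @ P)
        states[t] = x
    return states

# Simulated score series with a slowly drifting slope (illustrative).
rng = np.random.default_rng(6)
n = 100
slope = 0.5 + np.cumsum(0.01 * rng.standard_normal(n))
y = 2.0 + slope * np.arange(n) + 0.5 * rng.standard_normal(n)
states = kalman_dlr(y, q=1e-4, r=0.25)
```

A state-space library would normally supply the filter (and estimate q and r by maximum likelihood); the loop above only exposes the TVP structure of the model.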
The cohort-related PCs are modelled as independent RW-plus-noise time series models, since, conditional on the age-time effects, we may assume that the cohort-related PCs are mean-reverting stochastic processes.
There are several sources of randomness in the proposed model. The true value of the log central mortality rates, assuming the model specification is correct, is given by the fitted value plus three error terms, where e(·) denotes the error in estimating the relevant quantity, the residual error is the error in fitting the model for age x and cohort t − x using a finite set of PC functions, and the observation error comes from the random variation of deaths under the Poisson distribution.
Assuming independence among all these sources of error, the total variance is the sum of their individual variances. In order to generate an interval forecast, we can then obtain the following 95% confidence interval estimate for the log central mortality rate:

log m̂_{x,t} ± 1.96·σ_total,

where σ_total is the square root of the total variance.
From various mortality investigations, it has been observed that the number of people surviving to older ages has increased considerably. Crude mortality rates at these ages usually lack the quality required for the construction of complete life tables.
Empirical observation shows that the curve of mortality rates, on a logarithmic scale, has a concave shape at the oldest ages. One of the best-known approaches to capture this empirical evidence is the Coale-Kisker method (1990), which assumes that the exponential rate of mortality increase at the oldest ages is not constant but declines linearly.
The exponential rate of mortality increase at age x, up to the ultimate age ω, is defined as

k_x = ln(m_x / m_{x−1}).

We note that under the Gompertz law, i.e. m_x = B·c^x, the rate k_x = ln c is constant for all ages.
A possible extension of the Gompertz law is obtained if we assume that the slope term b follows an integrated random walk (IRW) time series model, in order to capture the empirical patterns of mortality at the oldest ages, such that the projected exponential rate of mortality increase is a linear function of age at the oldest ages. A more robust approach is to apply this modelling structure to the age profile.
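The Coale-Kisker-style closeout with a linearly declining rate of increase can be sketched as follows. The input rates, the pivot age, and the target rate at the ultimate age are all illustrative assumptions.

```python
import numpy as np

# Hypothetical central death rates for ages 0..84 (Gompertz-like shape).
ages = np.arange(85)
m = np.exp(-8.0 + 0.09 * ages)

omega, m_omega = 110, 1.0        # ultimate age and target rate (assumed)
x0 = 84                          # last observed age
k0 = np.log(m[x0] / m[x0 - 1])   # last observed exponential increase rate
n = omega - x0

# A linearly declining k_x = k0 + j*s must satisfy
# sum_{j=1..n} (k0 + j*s) = ln(m_omega / m_x0); solve for the slope s.
total = np.log(m_omega / m[x0])
s = (total - n * k0) / (n * (n + 1) / 2.0)

k = k0 + s * np.arange(1, n + 1)
m_ext = m[x0] * np.exp(np.cumsum(k))   # extrapolated rates, ages 85..110
```

By construction the extrapolated curve passes exactly through the target rate at the ultimate age, while k_x declines linearly over the closed-out ages.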
E&W male mortality experience, calendar years 1841-2006, and individual ages 0,1,…,89.
The data are freely provided by the “Human Mortality Database” (www.mortality.org).
Sparse eigenvalues, percentage variance explained, mean associations, CL criterion, with sparse parameter λ=36, in age-time effects
Observed and projected cohort life expectancies at birth (left curves) and observed and projected cohort expected age at death given survival to age 65 (right curves), with associated bootstrap 95% percentile CIs
Log central mortality rates, smoothed and predicted log central mortality rates, and associated CIs, for various ages