Methodology
Statistical details of dynamic factor analysis are given in Zuur et al. (2003a); non-statistical introductions are presented in Zuur et al. (2003b) and Zuur & Pierce (2004). A non-technical introduction is given below. Dynamic factor analysis is based on multivariate structural time series models. For simplicity, we start with univariate structural time series models.
Structural Time Series Models for Univariate Time Series
The underlying principle in structural time series models is that a time series is modelled in terms of a trend, a seasonal component, a cycle, explanatory variables and noise. Each of these components is allowed to be stochastic. This means that a trend is not restricted to be a straight line, and that the cyclic and seasonal components are not necessarily neat cosine functions but can change over time. If the time series are measured on an annual basis, the seasonal component can be dropped, and for a short series (for example, only 25 years) a cyclic component is also less suitable. The resulting model for a univariate time series becomes:
time series = trend + explanatory variables + noise   (1)
The mathematical formulation of the 'explanatory variables' component is the same as in linear regression models, namely Xt*b, where Xt is a vector containing the explanatory variables at time t and b is a vector of unknown parameters. Brodgar gives estimated values and 95% confidence intervals for b, and these can be used to assess whether the explanatory variables are related to the time series. The 'trend' component in (1) is based on a so-called random walk and takes the form:
trend_t = trend_(t-1) + noise_t   (2)
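To make the univariate model concrete, the following is a minimal simulation sketch in Python, not Brodgar's implementation. It assumes Gaussian noise, a trend that starts at zero, and hypothetical names and default values (`simulate_univariate`, `sd_trend`, `sd_noise`); none of these come from the text above.

```python
import random


def simulate_univariate(n_years, x, b, sd_trend=0.5, sd_noise=0.2, seed=1):
    """Simulate: time series = trend + explanatory variables + noise.

    x is a list with one vector of explanatory variables per time point,
    b is the vector of regression parameters (both hypothetical inputs).
    """
    rng = random.Random(seed)

    # Random-walk trend: each value is the previous value plus noise.
    trend = []
    level = 0.0  # assumed starting value
    for _ in range(n_years):
        level = level + rng.gauss(0.0, sd_trend)
        trend.append(level)

    # Add the regression term Xt*b and observation noise at each time point.
    series = []
    for t in range(n_years):
        regression = sum(x_k * b_k for x_k, b_k in zip(x[t], b))
        series.append(trend[t] + regression + rng.gauss(0.0, sd_noise))
    return trend, series


# 25 annual observations with a single explanatory variable.
x = [[0.1 * t] for t in range(25)]
trend, series = simulate_univariate(25, x, b=[2.0])
```

Because the trend is a random walk rather than a straight line, repeated runs with different seeds give trends of quite different shapes, which is the flexibility the structural model relies on.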
Structural Time Series Models for Multivariate Time Series
Now suppose that there are N time series. The univariate structural time series model can be applied to each individual time series. However, such an approach has two major disadvantages, namely:
An approach which overcomes these two disadvantages is dynamic factor analysis. This technique applies dimension reduction to the N time series. Instead of the N time series, it estimates M common trends, where M is (hopefully) much smaller than N. This is the same principle as in dimension reduction techniques like principal component analysis, factor analysis and correspondence analysis. The main difference is that dynamic factor analysis is designed for time series. The dynamic factor model, in words, is given by:
N time series = linear combination of M common trends + level parameter + explanatory variables + noise
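The mathematical formulation of the dynamic factor model can be found in Zuur et al. (2003a), Harvey (1989) and Lütkepohl (1991), among others (see also the Brodgar manual for further references). To illustrate the underlying principle, the model formulation is presented for 5 time series (N=5) and 2 common trends (M=2):
y1t = a11*z1t + a12*z2t + level1 + Xt*b1 + noise1t
y2t = a21*z1t + a22*z2t + level2 + Xt*b2 + noise2t
y3t = a31*z1t + a32*z2t + level3 + Xt*b3 + noise3t
y4t = a41*z1t + a42*z2t + level4 + Xt*b4 + noise4t
y5t = a51*z1t + a52*z2t + level5 + Xt*b5 + noise5t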
where y1t is the value of the first time series at time t, z1t and z2t are the two common trends at time t, the term Xt*b1 represents the effect of the explanatory variables on y1t, and level1 is a constant level parameter for the first time series. The terms ai1 and ai2 are the factor loadings of the i-th time series on the first and second common trend. Hence, each time series is modelled as the sum of a linear combination of the common trends, a level parameter, the effect of the explanatory variables, and noise.
The factor loadings determine how important a particular common trend is for a time series. For example, if a11 is large and a12 is close to zero, then the first time series is mainly determined by the first common trend (except for possible effects of the explanatory variables).
It is interesting to compare the factor loadings of different time series. For example, if a21 is large and a22 is close to zero (as was the case for the first time series), then the second time series is related to the first common trend as well, and hence the first two time series are driven by the same mechanism. On the other hand, if a21 is close to zero and a22 is large, then the two time series are driven by two different underlying mechanisms. Another scenario is that a11 is large and positive, a12 is close to zero, a21 is large and negative, and a22 is close to zero. In this case, both time series are driven by the same process, but behave in opposite ways. Hence, by comparing the factor loadings of different time series, interactions between time series can be inferred.
Besides factor loadings, dynamic factor analysis estimates common trends. The common trends are required to be smooth functions (they are modelled as in equation (2)). They represent the patterns in the data which cannot be explained by the measured explanatory variables.
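The role of the factor loadings can be illustrated with a small simulation. The sketch below is illustrative only: it generates two random-walk common trends and three series with hand-picked loadings, leaves out explanatory variables, and assumes Gaussian noise. All names and numerical values are hypothetical, and it simulates from the model rather than estimating it.

```python
import random


def random_walk(n, rng, sd=1.0):
    """Random-walk trend: each value is the previous value plus noise."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, sd)
        path.append(level)
    return path


def correlation(u, v):
    """Pearson correlation between two equally long sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / (var_u * var_v) ** 0.5


rng = random.Random(42)
T = 200
z1, z2 = random_walk(T, rng), random_walk(T, rng)

# loadings[i] = (loading on trend 1, loading on trend 2) for series i+1
loadings = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0)]
levels = [5.0, 2.0, -1.0]

# Each series = linear combination of the common trends + level + noise.
series = [
    [a1 * z1[t] + a2 * z2[t] + level + rng.gauss(0.0, 0.1) for t in range(T)]
    for (a1, a2), level in zip(loadings, levels)
]

# Series 1 and 2 load on the same trend with opposite signs, so they
# mirror each other; series 3 follows the second trend instead.
print(correlation(series[0], series[1]))
```

The first two series share the first common trend but with loadings of opposite sign, so their correlation is strongly negative. This corresponds to the scenario of the same process with opposite behaviour described above.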
Number of common trends
Just as in principal component analysis, one has to choose the number of common trends. A convenient choice is to use 2 common trends, because the factor loadings can then be plotted against each other in one graph. However, a more formal tool is needed to choose the number of common trends, for example Akaike's Information Criterion (AIC). The AIC is a function of the measure of fit (the value of the maximised likelihood function) and the number of parameters. In the example below, it is shown how to use the AIC.
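As a rough sketch of how such a comparison works, the snippet below uses the standard definition AIC = -2 log(L) + 2k, where L is the maximised likelihood and k the number of parameters, and selects the model with the lowest value. The log-likelihoods and parameter counts are made-up numbers for illustration only, and the exact form reported by Brodgar may differ.

```python
def aic(log_likelihood, n_params):
    """Standard AIC: -2 * maximised log-likelihood + 2 * number of parameters."""
    return -2.0 * log_likelihood + 2.0 * n_params


# Hypothetical fits: number of common trends M -> (log-likelihood, parameters)
candidates = {
    1: (-310.2, 12),
    2: (-285.7, 18),
    3: (-283.9, 25),
}

scores = {m: aic(loglik, k) for m, (loglik, k) in candidates.items()}

# The model with the lowest AIC is preferred.
best_m = min(scores, key=scores.get)
print(scores, best_m)
```

In this made-up case the third common trend improves the fit only slightly while adding seven parameters, so the penalty term outweighs the gain and the model with 2 common trends is selected.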
References
Harvey, A.C. (1989). Forecasting, structural time series models and the Kalman filter. Cambridge University Press.
Lütkepohl, H. (1991). Introduction to multiple time series analysis. Berlin, Springer.
Zuur, A.F., Fryer, R.J., Jolliffe, I.T., Dekker, R. and Beukema, J.J. (2003a). Estimating common trends in multivariate time series using dynamic factor analysis. Environmetrics, 14(7):665-685.
Zuur, A.F., Tuck, I.D. and Bailey, N. (2003b). Dynamic factor analysis to estimate common trends in fisheries time series. Canadian Journal of Fisheries and Aquatic Sciences, 60:542-552.
Zuur, A.F. and Pierce, G.J. (2004, in press). Common trends in Northeast Atlantic squid time series. Journal of Sea Research. Will appear in the Q1 issue of the Journal of Sea Research.