Chronological clustering

MAFA and dynamic factor analysis are techniques that can be used to estimate trends in multivariate time series. Applying these techniques to biological data assumes that the underlying ecosystem changes gradually over time; they are less appropriate if the ecosystem changes rapidly from one state to another. Ordinary clustering techniques might be applied to identify sudden changes, but these methods are likely to produce groups of years that are difficult to interpret. For example, how does one explain a group containing 1970, 1976, 1992 and 2003? Chronological clustering, as the name suggests, is designed specifically for clustering time series. The method is fully described in Legendre et al. (1985), Bell and Legendre (1987), and Legendre and Legendre (1998). The first two papers can be downloaded from Legendre's website (search for "chronological clustering" in Google) and are easy to read for non-statisticians.

Chronological clustering is best explained with an example. Hare and Mantua (2000) used 100 biological and physical time series from the North Pacific Ocean. These included atmospheric indices (teleconnection index, North Pacific index, Southern Oscillation index, Arctic Oscillation, etc.), terrestrial indices, oceanic indices and biological indices (zooplankton biomass, Coastal Washington oyster condition index, etc.). Most of the time series were available annually from 1965 onwards. To give an impression of the variability and shape of these time series, all (standardised) series are plotted in Figure A.1.
Figure A.1. One hundred standardised time series from Hare and Mantua (2000).
Hare and Mantua (2000) concluded that there were two major shifts in these time series, namely in 1977 and 1989. Rudnick and Davis (2003) criticised technical aspects of regime shift analysis. Here, we show that chronological clustering identifies the same regimes. The graphical output of Brodgar is presented in Figure A.2.

Chronological clustering requires two parameters to be set: the connectedness and the fusion level alpha, which is a clustering sensitivity parameter. Legendre et al. (1985) suggested using different values of alpha while keeping the connectedness constant. The effect of alpha is as follows. Small values (0.01, 0.05, 0.1) provide a bird's-eye overview; only the most important breaks in the time series are visualised. Higher values of alpha (0.2, 0.3, 0.4) give more detailed information and therefore show more breaks in the time series. Brodgar can present results for several values of alpha or for a single alpha value. Breaks are identified by vertical lines, and the numbers represent the groups. Figure A.2 shows that at the bird's-eye view there are two major breaks, namely in 1977 (a bar represents the first year of a new group) and 1989. For larger values of alpha, we can see that the 1980s were reasonably stable and that more (short-term) variation occurred during the 1970s.
Figure A.2. Results of chronological clustering applied to the 100 time series analysed in Hare and Mantua (2000). A vertical line corresponds to the start of a new group. Numbers refer to groups. A small alpha value (0.01) corresponds to a bird's-eye view and shows the most important breakpoints in the data. Larger alpha values visualise smaller-scale variation.
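The idea of grouping only adjacent years can be illustrated with a minimal Python sketch. It is not the procedure of Legendre et al. (1985): the permutation test governed by alpha and the connectedness parameter are replaced here by an assumed, fixed number of groups. The function name `contiguous_groups` and the centroid-distance merging rule are illustrative choices only.

```python
import numpy as np

def contiguous_groups(X, n_groups):
    """Merge time-adjacent groups of rows until n_groups remain.

    X is a time-by-variable array (rows in chronological order).
    Returns the row index at which each group starts.
    Simplification: a fixed n_groups stands in for the alpha-based test.
    """
    X = np.asarray(X, dtype=float)
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    # Each group is a half-open range (start, stop) of row indices.
    groups = [(i, i + 1) for i in range(X.shape[0])]
    while len(groups) > n_groups:
        # Distance between the centroids of each pair of neighbouring groups.
        d = [
            np.linalg.norm(X[a0:a1].mean(axis=0) - X[b0:b1].mean(axis=0))
            for (a0, a1), (b0, b1) in zip(groups[:-1], groups[1:])
        ]
        k = int(np.argmin(d))
        # Only neighbours in time can be merged, so groups stay contiguous.
        groups[k:k + 2] = [(groups[k][0], groups[k + 1][1])]
    return [start for start, _ in groups]
```

For annual data starting in 1965, adding 1965 to each returned index would give the first year of each group, in the same way that a vertical line in Figure A.2 marks the first year of a new group.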
Legendre et al. (1985) developed an a posteriori test, which is also included in Brodgar. It can be used to test whether, for example, groups 1 and 3 (for alpha = 0.01) belong to the same group. Formulated differently, one might ask whether the ecosystem changes back to its original state. If this is indeed the case, the a posteriori test would indicate that groups 1 and 3 are similar. The stars in Figure A.2 are so-called singletons. A singleton is a point that belongs neither to the group immediately before it nor to the group immediately after it. See Legendre et al. (1985) for a detailed interpretation of singletons.

There is one other important point we need to discuss, namely the measure of similarity between time points (years in this case). Suppose that the time-by-variable matrix is in the following format:

        Y1    Y2    Y3    Y4
  T1
  T2
  T3
  ...

Chronological clustering calculates the association between rows T1 and T2, between rows T2 and T3, etc. Legendre and Legendre (1998) argued that the correlation coefficient is not a good option. Legendre et al. (1985), Bell and Legendre (1987) and Legendre and Legendre (1998) used Whittaker's index of association (see the Brodgar manual) in combination with chronological clustering. This index transforms each row of the table into a row of fractions (of the row total) and then compares two rows by taking the sum of the absolute differences of the fractions. It should only be used if the values are non-negative. If the variables Y1, ..., Y4 are on different scales (e.g. Y1 ranges from 0 to 1 and Y2 from 1 to 1000), or if they are different types of variables (e.g. Y1 is species abundance, Y2 is the NAO index, Y3 is temperature and Y4 is wind speed), a sensible approach is to standardise the variables (the Yj's) first and then use the Euclidean or absolute difference metric. Figure A.2 was obtained by standardising the 100 time series and using the Euclidean distance function.
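The two association measures described above can be sketched in Python as follows. This is an illustration under stated assumptions, not Brodgar's implementation: the function names are hypothetical, the index follows the verbal description given here (some formulations scale the sum by one half so that it lies between 0 and 1), and the sample standard deviation (ddof=1) is an assumed choice for standardisation.

```python
import numpy as np

def whittaker_index(row_a, row_b):
    """Sum of absolute differences between two rows, each expressed
    as fractions of its own row total (as described in the text)."""
    a = np.asarray(row_a, dtype=float)
    b = np.asarray(row_b, dtype=float)
    if (a < 0).any() or (b < 0).any():
        raise ValueError("Whittaker's index requires non-negative values")
    if a.sum() == 0 or b.sum() == 0:
        raise ValueError("row totals must be positive")
    return float(np.abs(a / a.sum() - b / b.sum()).sum())

def standardised_euclidean(X):
    """Euclidean distance between consecutive rows (T1-T2, T2-T3, ...)
    after standardising each column to mean 0 and unit variance.
    Assumes no column is constant (that would give zero variance)."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    return np.linalg.norm(np.diff(Z, axis=0), axis=1)
```

Two rows with the same relative composition, such as (1, 1, 2) and (2, 2, 4), have a Whittaker index of zero, which shows that the index compares proportions and not absolute magnitudes.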
References

Bell, M.A. and Legendre, P. (1987). Multicharacter chronological clustering in a sequence of fossil sticklebacks. Systematic Zoology, 36: 52-61.
Hare, S.R. and Mantua, N.J. (2000). Empirical evidence for North Pacific regime shifts in 1977 and 1989. Progress in Oceanography, 47: 103-145.
Legendre, P., Dallot, S. and Legendre, L. (1985). Succession of species within a community: chronological clustering, with applications to marine and freshwater zooplankton. American Naturalist, 125: 257-288.
Legendre, P. and Legendre, L. (1998). Numerical Ecology. Second edition. Elsevier, Amsterdam, The Netherlands.
Rudnick, D.L. and Davis, R.E. (2003). Red noise and regime shifts. Deep-Sea Research I, 50: 691-699.

The first and third papers can be downloaded from: http://www.fas.umontreal.ca/biol/legendre/reprints/