Test pixelwise correlation between two time series of gridded satellite data in R
Satellite time series data are useful for studying how biophysical variables change over time and understanding what causes those changes. Recently, I was looking into correlating two time series datasets over Africa to look at the relationship between net primary production (NPP) and rainfall.
After a futile attempt to find an “out-of-the-box” software package that does this, I created an R function to speed things up. Rather unimaginatively, I called it “gridcorts” (Gridded Correlation for Time Series Raster Data). I’ll apply it to two time series datasets of 11 layers (grids) of annual rainfall and 11 layers annual NPP over Africa, each layer represents a year between 2000 and 2010. Stacking these two datasets results in a raster stack of 22 layers. Both datasets are in 0.05 degree grid cell size, meaning that each layer has 1989897 cells (1443 rows, 1379 columns). For each location in a dataset (e.g. rainfall), the function skewers the entire 11-year time series to get a vector of values then correlates pixels from that location with the corresponding one in the other dataset (e.g. NPP). Depending on which method you use, this can tell you how two variables are related across time and space.
The code took 410 seconds, or 7 minutes, on my four-year-old, quad-core, i7-5600U laptop running 64-bit Windows 10 and R 3.6.1. I think that’s a reasonable time to process 2 million pairwise correlations and produce two grids of Spearman’s rank correlation coefficient and significance values. The engine of the function is the stats package’s “cor.test” function that tests for association between two paired samples based on the either parametric Pearson’s r, or the nonparametric Spearman’s rho and Kendall’s tau.
The figure below is an example output using Spearman’s rank correlation coefficient. Please note that Spearman’s rho might give you a warning message:
In cor.test.default(mtrx[i, 1:(layers/2)], mtrx[i, ((layers/2) + ... : Cannot compute exact p-value with ties
So, if you choose a nonparametric method, I would recommend Kendall’s tau instead. More on this topic.
The function was developed for the paper below, so please cite it if you use any part of the code in a publication or if you build upon it.
Citation: Abdi, A.M., Vrieling, A., Yengoh, G.T., Anyamba, A., Seaquist, J.W., Ummenhofer, C.C. and Ardö, J., 2016. The El Niño–La Niña cycle and recent trends in supply and demand of net primary productivity in African drylands. Climatic Change, 138(1-2), pp.111-125. [PDF]
The function is available as a Gist on Github. The NPP and rainfall data I used to test it can be downloaded here.
NOTE: If you are getting NAs as outputs, this could happen in one of two cases, either (1) the layers mostly have NA values or (2) they have less than 4 data values in which case a correlation cannot be established. You have to make sure that you have enough data points in order to establish the correlation.
Happy coding!