Data and code for this post here.
After looking at the hindcasts of the models in my previous post, I wondered whether the differences between the MMM and the individual model runs of the ensemble might be considered “weather noise”, capable of being captured as an AR(1) process. If the MMM represents the actual forced component, this would explain why it generally seems to perform better against actual observations than any individual run: the observations themselves could then be considered simply another instance of the forced component (MMM) plus AR(1) noise.
Well, after examining the different model runs relative to the MMM, and running the auto.arima (hat tip) and/or ar functions on them in R, it certainly does not seem like these differences are merely the result of AR(1) noise. Some models generally trend higher and others lower than the MMM, which is not surprising given their underlying differences. But it does raise the question again of why the MMM seems to perform better.
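To illustrate why a trending run-minus-MMM difference fails the AR(1) test, here is a minimal sketch in Python (rather than R's auto.arima/ar) using a simple lag-1 autocorrelation estimate. The trend coefficient and noise level below are made-up illustrative values, not taken from the actual ensemble.

```python
import numpy as np

def fit_ar1(series):
    """Estimate an AR(1) coefficient from the lag-1 autocorrelation
    (a least-squares stand-in for what R's ar/auto.arima would fit)."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    return np.dot(x[:-1], x[1:]) / np.dot(x[:-1], x[:-1])

# Hypothetical example: a run-minus-MMM difference that trends upward.
rng = np.random.default_rng(0)
years = np.arange(100)
diff = 0.003 * years + rng.normal(0.0, 0.1, size=100)
phi_hat = fit_ar1(diff)
# A persistent trend inflates the estimated autocorrelation, so a plain
# AR(1) fit is a poor description of a run that drifts away from the MMM.
```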
Nonetheless, the errors between the MMM and the HadCRUT annual anomalies DO seem to act like an AR(1) process. Using the approximate lag-1 autocorrelation of 0.5 and innovation standard deviation (the rnorm sd) of 0.1 gleaned from these errors, the following image compares the MMM plus simulated AR(1) weather noise against the MMM shown with all the individual model runs.
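Generating those synthetic “runs” is straightforward. A sketch in Python, using the 0.5 / 0.1 parameters from above; the linear ramp standing in for the MMM is purely illustrative (the real MMM comes from averaging the ensemble):

```python
import numpy as np

rng = np.random.default_rng(42)

def ar1_noise(n, phi=0.5, sd=0.1):
    """Generate an AR(1) series: e[t] = phi * e[t-1] + w[t], w ~ N(0, sd^2)."""
    e = np.zeros(n)
    for t in range(1, n):
        e[t] = phi * e[t - 1] + rng.normal(0.0, sd)
    return e

# Hypothetical stand-in for the MMM forced component, plus 54 synthetic
# "runs", each the MMM with its own independent AR(1) weather noise.
n_years, n_runs = 150, 54
mmm = np.linspace(-0.3, 0.5, n_years)
runs = np.array([mmm + ar1_noise(n_years) for _ in range(n_runs)])
```

Each synthetic run scatters around the MMM with the same persistence structure as the observed MMM-minus-HadCRUT errors, which is what the top panel of the image shows.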
In both graphs there are 54 different yellow “runs”. The bottom image should basically be a re-creation of the graph in the IPCC report (seen in my last post), although there are some differences: the IPCC figure says it was created from 58 simulations and 14 different models, whereas what I had access to were 54 different ensemble members from 22 different models.
As you can see, the top image shows tighter bounds, but the observations still stay within them for the 20th century. Going forward, this seems to me a better way to estimate the noise in future predictions than simply showing all models: it allows us to treat the MMM as the “forced” component without rogue, badly performing models producing ridiculously large error bounds, and it lets us generate as many model “runs” as we want without needing a model simulation for each. Of course, the downside of basing the noise model on the errors between observations and the MMM is that the AR(1) parameters themselves are uncertain: simulating a series and then trying to recover the parameters can give wildly different results.
Lucia has a relevant discussion on this here, which is what originally got me thinking along this path.