In part one, I noted a number of potential issues with using the Forster and Gregory (2006) method over a period dominated by ENSO activity to diagnose climate sensitivity. And yet, as shown in that post, using the method on GFDL CM2.1 seemed to yield fairly accurate results, even slightly underestimating climate sensitivity (it appears to do the same with ECHAM-MPI). So, if there are indeed issues that can lead to overestimates of sensitivity (as the method produces in 12 of the other models), why doesn't it overestimate sensitivity in these two models?
One obvious answer would be that some of these potential sensitivity-inflating issues are not present in the models, such as errors in the independent variable (i.e. temperature measurements) and unknown radiative forcings (the Spencer and Braswell argument). But that still leaves other issues, such as the timing offset between atmospheric and sea surface temperature changes (and hence the measured temperature radiative response being off with respect to surface temperature changes), along with the large difference between short-term and long-term cloud feedbacks in the models, both of which ARE included in the GFDL CM2.1 and ECHAM-MPI models.
To investigate, I used the Soden GFDL 2.1 kernels, along with the last 100 years of the GFDL CM2.1 pre-industrial control run from the PCMDI archive, to separate out the radiative responses by climate component (water vapor, surface albedo, and temperature), then ran regressions against combined tos (sea surface temperature) + land tas (2 meter air temperature) for 11-year periods (roughly matching the 2000–2010 CERES data we have) to see what the method would yield for these instantaneous feedbacks. As the actual sensitivity and feedbacks per doubling of CO2 for this model are known, we can compare them to the "instantaneous" feedbacks diagnosed by the FG06 method. The long-term feedbacks for GFDL CM2.1 are from Soden and Held (2006). The remaining feedbacks come from the median of the estimators from my different regression periods in the control experiment. Units are W/m^2/K.
| | 12 mo avg | 3 mo avg | 1 mo avg | Long Term |
|---|---|---|---|---|
| Temperature (Planck + Lapse Rate) | -3.72 | -3.75 | -3.81 | -4.36 |
Please note that this does not imply the GFDL CM2.1 has a negative net climate feedback by the typical definition, since I am including the Planck response in the overall feedback presented.
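For anyone who wants to replicate the basic approach, the regression step itself is straightforward. Here is a minimal sketch (with hypothetical array names, not my actual analysis code) of regressing flux anomalies against temperature anomalies over consecutive 11-year windows and taking the median slope:

```python
import numpy as np

def feedback_slope(temp, flux):
    """OLS slope of TOA flux anomalies against surface temperature
    anomalies, in W/m^2/K -- the basic FG06-style regression."""
    t = temp - temp.mean()
    f = flux - flux.mean()
    return (t * f).sum() / (t * t).sum()

def median_window_feedback(temp, flux, window=132):
    """Median of the slopes from consecutive 11-year (132-month)
    windows of a control run, mirroring the median-of-estimators
    step described above."""
    slopes = [feedback_slope(temp[i:i + window], flux[i:i + window])
              for i in range(0, len(temp) - window + 1, window)]
    return np.median(slopes)
```

The same two functions apply unchanged whether `flux` is the overall TOA anomaly or one of the kernel-decomposed components.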
Anyhow, the temperature, water vapor, surface albedo, and cloud flux contributions determined using this technique seem to explain about 97.5% of the variance in TOA radiative fluxes in the GFDL model over this period of the control run. Unfortunately, the residuals are fairly well correlated with temperature (r^2 = 20%, higher than for the cloud and surface albedo portions), so there seems to be a substantial leftover instantaneous response apart from these other feedbacks. The "Residual" row in the table above is simply the difference between the "Overall" diagnosed feedback (from overall flux anomalies) and the sum of the individual feedbacks (from the kernel technique). To ensure that this is not merely a statistical artifact, I also regressed the residual fluxes (after removing the various climate components) against temperature, which yielded values near those in the "Residual" row.
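That cross-check can be sketched like so (again with hypothetical arrays; the actual analysis works from the kernel-decomposed model output):

```python
import numpy as np

def ols_slope(x, y):
    """Simple OLS regression slope of y against x."""
    xa, ya = x - x.mean(), y - y.mean()
    return (xa * ya).sum() / (xa * xa).sum()

def residual_feedback(temp, total_flux, component_fluxes):
    """Regress the leftover flux (total minus the sum of the
    kernel-decomposed components) against temperature. A slope
    well away from zero suggests a real residual response rather
    than regression noise."""
    residual_flux = total_flux - sum(component_fluxes)
    return ols_slope(temp, residual_flux)
```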
Nonetheless, this may go some way towards solving a couple of those mysteries. The fairly large underestimate of the temperature response (~0.5 to 0.6 W/m^2/K) is very likely the result of the timing offset / atmospheric temperature lag discussed previously, and this can have serious consequences for estimated sensitivity (the difference, for example, between a 3 K and a 2.15 K sensitivity). However, we don't see an overestimate of sensitivity here because a) there appears to be a unique short-term response going on, and b) the cloud feedback is underestimated, being significantly more positive in the long term for this model than in the short term. From Dessler (2010), we see that only one other model significantly underestimates the positive cloud feedback in the short term: ECHAM-MPI. Point (b) leads me to suspect that the reason the FG06 method does NOT overestimate the sensitivity in these two particular models is this relationship between short-term and long-term cloud feedbacks. It’s worth noting that Dessler (2010) calculated an even smaller short-term cloud feedback from GFDL CM2.1 than I find here…I use a different part of the control run, but otherwise I’m not sure how to explain the difference.
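To see how much leverage a ~0.5 W/m^2/K offset in the diagnosed net feedback has, here is the arithmetic behind that 3 K vs 2.15 K example (assuming, for illustration only, a 3.8 W/m^2 forcing per CO2 doubling):

```python
f2x = 3.8                     # assumed forcing per CO2 doubling, W/m^2
lam_3k = f2x / 3.0            # net feedback consistent with 3 K ECS
lam_shifted = lam_3k + 0.5    # same feedback offset by ~0.5 W/m^2/K
ecs_shifted = f2x / lam_shifted
print(round(ecs_shifted, 2))  # 2.15
```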
The water vapor calculation is pretty close, although perhaps a bit underestimated using the instantaneous method. Surprisingly, the short-term and long-term albedo estimates are about the same; albedo feedbacks are typically considered slower feedbacks that won’t fully manifest in the short term. Finally, one may notice that the GFDL CM2.1 overall feedback of –1.37 W/m^2/K from Soden and Held (2006), if converted to ECS in the typical manner (3.8 W/m^2 / 1.37 W/m^2/K = 2.77 K), does not correspond to the published ECS of that model (3.4 K). You get closer if you use the 4.3 W/m^2 TOA forcing given in Soden and Held (2006) for a doubling of CO2 instead of 3.8 W/m^2, but that still does not seem to explain how CM2.1 can have a significantly stronger radiative response to surface temperature changes while also having a higher sensitivity to a CO2 doubling than CM2.0. Unless the estimated CO2 forcings are that different? This is why I have left a “?” in the Residual row for the long-term column.
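A quick check of that conversion, which is just the doubling forcing divided by the net feedback magnitude:

```python
def ecs(f2x, lam):
    """Equilibrium sensitivity (K) from a CO2-doubling forcing
    (W/m^2) and a net feedback magnitude (W/m^2/K)."""
    return f2x / lam

print(round(ecs(3.8, 1.37), 2))  # 2.77 -- short of the published 3.4 K
print(round(ecs(4.3, 1.37), 2))  # 3.14 -- closer, but still short
```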
Regardless, I will be investigating this method in some other control runs, along with more periods in the GFDL control run. From these results alone, however, my tentative conclusion is that using the radiative fluxes over a period similar to 2000–2010 to measure climate sensitivity, even absent errors in the regressors and noise from unknown radiative forcings, would lead to:
1) Likely underestimates of the temperature response (due to the timing offset)
2) Inaccuracy in the cloud feedback, although the direction is unknown (the models are split on this, at least according to Dessler 2010).
3) Some “residual” response component, whose magnitude and sign are unknown across the different models
Of course, testing the method on more models may change things. There’s a lot left to do.
Code and data for this post available here.