I originally posted on the Forster and Gregory 2006 method for determining climate sensitivity back in June. Since then, I’ve come across a number of issues (mostly discovered by others) on why this method may not constrain the sensitivity with the accuracy previously assumed. I will likely need to create a page of resources on this alone, because the number of links to blog posts and papers could easily grow unmanageable if I were to repeat them in every post. However, for now I’ll mention the recent Science of Doom and Isaac Held posts, and then begin listing some of the difficulties I see (particularly without using a limited time period):

- The unknown radiative forcing. This is Dr. Spencer’s main objection (SB08, SB10, and part of SB11), which suggests the forcing will lead to an underestimate of the response (and hence and overestimate of sensitivity) if it is correlated with T. Murphy and Forster 2010 argue that the effect is small, but it doesn’t look like their use of such a deep mixed layer was appropriate. Science of Doom has a good introduction to this issue in the post I link to above.
- There is not a strong reason to believe that the radiative response to a temperature change is both constant across all seasons and linear (also briefly mentioned in the SoD post). However, it theoretically could be a reasonable approximation.
- Dessler 2010 concludes that there is unlikely to be a correlation between the short-term/instantaneous cloud feedback and the long-term cloud feedback (at least, it’s not present in models). The set-up of FG06 implicitly only includes the short-term feedback, and since the cloud feedback is one of the biggest uncertainties surrounding equilibrium climate sensitivity, it suggests that the FG06 method could not necessarily diagnose the ECS in this case.
- There is a timing offset between the sea surface temperature changes and the bulk of the tropospheric temperature changes (1 – 3 months). When using a smaller period with interannual variations greater than the long-term trend (e.g., the CERES era), and with the bulk of the Planck response coming from the atmosphere, the timing offset would lead to an underestimate of the response.
- Using the method in the control runs of models generally leads to a large overestimate of the climate sensitivity (which suggests major issues with the method, the models, or both).
- Errors in the surface temperature measurements will yield underestimates of the response (and overestimates in the sensitivity). I specifically point to the surface temperature measurements rather than the satellite flux measurements because they act as the independent variable in the regressions, meaning they’ll lead to regression attenuation using traditional OLS methods (assuming Gaussian white noise for the flux measurements, we would get a lower correlation but not necessarily an underestimate).
- The sampling error using only (for example) 10 year periods could make it difficult to diagnose accurately.

For this post, I will briefly mention #5, and then use the GFDL 2.1 500 year control run from PCMDI to explore #6 and #7.

### General Inaccuracies using the GCM Control Runs

For point #5, I will point to a perhaps unexpected place…figure 2 of Dessler 2011:

The black lines are from the control runs of the models. Note the regression slopes at 0 lag (which corresponds to the FG06 method), which I’ve circled in green. Now, the average ECS of these models we know to be about 3 K, which corresponds to a radiative response of about (3.8 W/m^2 / 3 K) = 1.27 W/m^2/K for the radiative response, a number that we’d expect to see as the average regression slope. But instead, the average regression slope is closer to 0.5 W/m^2/K, which corresponds to a whopping 7.6 K ECS, more than double of what is known for these models. Using the FG06 method in the control runs of the models thus overestimates the sensitivity in what appears to be 12 out of the 14 models. Dr. Spencer goes into some more issues of testing this against models.

Anyhow, there appear to be two models that actually show reasonable results when using the FG06 method. From the Trenberth, Fasullo, and Abraham response, it appears that one of them is ECHAM_MPI. The other one looks to be GFDL CM2.1 from my tests:

### Closer Look at the GFDL CM2.1 Control Run

The GFDL CM2.1 has a ECS of 3.4 according to the IPCC AR4 table. This corresponds to a response of 1.12 W/m^2/K, which is almost the exact value I get when using the 500 years of the control run and monthly temperature anomalies (r^2=.09). Of course, it is curious why the correlation would be so low if the FG06 method uses an appropriate model, considering there is no measurement noise in the flux or temperature outputs from the model.

Anyhow, before continuing further, I’d like to show a chart of the control run global surface air temperature, which has got me scratching my head a bit:

Now, my understanding is that the pre-industrial control experiment does not include any change in forcings, and that the model is run to “stabilize” prior to the start of the control experiment. The gray lines are the monthly anomalies, while the black line is the 30 year moving average. Note that we see climate-scale trends (using the 30 year averages) that are completely unforced (if I’m understanding it correctly); this is not “natural variability” in terms of solar or volcanic variation, but rather in the “no TOA forcing changes” sense. Whether this is simply model drift, or is actually supposed to be simulating long-term, unforced variability, I’m not sure…it’s on the scale of 0.2C for what appears to be some < 75 year periods, which seems like it could be significant compared to the 20th century rise.

Anyhow, I will proceed with some different trials in order to diagnose the climate feedback in the model’s control run and compare it to the known value. First, I’ll note that using *annual* averages over the 500 year period gives me a response of **1.37** W/m^2/K (r^2 = 0.26), which would be an underestimate of sensitivity. Dr. Held, in his response to my comment on his blog, mentioned that using 1000 years (which I don’t think was available at PCMDI), he got response of 1.7 W/m^2/K. I was a bit surprised not to match his results, since 500 years seems like it would be enough to constrain it, until I broke it down into two 250 years periods and found responses of 1.56 and 1.7 W/m^2/K, which seems to suggest that there are periods of temperature change that are not met with corresponding radiative responses (perhaps this is what allows for the continuing drift?).

**Using Absolute Values Rather Than Anomalies**

Anyhow, in Murphy et. al 2009, they extend the FG06 method to use CERES monthly observations. One curious point is that in figure 2, they use interannual AND seasonal variations (that is, they don’t take monthly anomalies) to show the radiative response. Steve McIntyre has explored this as well. The result is that you get higher r^2 values, but I think this may inflate the confidence we should have in the Murphy (2009) method. After all, it doesn’t seem that seasonal changes in temperature and the radiative response would necessarily be equal to longer term changes, and in fact using absolute values for monthly temperature anomalies and fluxes with the 500 year control run results in a response of 4.55 W/m^2/K (way higher than the known long-term value) with a r^2 = 94%! However, that 94% clearly is not indicative that it is accurately reflecting the ECS. Of course, such a high value may only suggest that the GFDL CM2.1 model overestimates the flux response to seasons, or underestimates the seasonal temperature response.

### Regression Attenuation from Uncertainty in Surface Temperatures

I added white noise with a standard deviation of .075 C into the surface air temperatures to simulate measurement/sampling noise, based on estimates of the uncertainty presented in the charts of the HadCRUTv3 paper towards the early 21st century (during the CERES era). This resulted in the expected regression attenuation in the 500 year monthly test, bringing the estimated response down to about **1.02** W/m^2/K, and thus yielding a slight overestimate of the sensitivity. It is curious that the use of TLS to avoid this attenuation is not examined in more detail. For instance, FG06 mention:

For less than perfectly correlated data, OLS regression of Q-N against T_s will tend to underestimate Y values and therefore overestimate the equilibrium climate sensitivity (see Isobe et al. 1990).

The reason main reasons that they give for sticking with OLS is two-fold: 1) the issue of cause and effect (T_s inducing the radiative flux changes at short time scales, not the opposite), also discussed in MF09, which is in dispute by Spencer in point #1 of the post, and 2) that using it on the model HadCM3 does not yield accurate results using different regressions.

I will point out that even IF no unknown radiative forcing is confounding the feedback signal in this way, this still does NOT mean that there is no “error” in the independent variable…as shown above, HadCRUTv3 includes estimates of uncertainties. Furthermore, that another method correcting for these errors (both in terms of cause and effect and measurement error) would yield overestimates of the response in a climate model that includes neither these specified variations in cloud forcing nor measurement error is unsurprising, but it says nothing about whether a different regression method is appropriate for real world data that DOES include such errors. Finally, I found the following statement from FG06 appendix quite interesting:

Another important reason for adopting our regression model was to reinforce the main conclusion of the paper: the suggestion of a relatively small equilibrium climate sensitivity. To show the robustness of this conclusion, we deliberately adopted the regression model that gave the highest climate sensitivity (smallest Y value). It has been suggested that a technique based on total least squares regression or bisector least squares regression gives a better fit, when errors in the data are uncharacterized (Isobe et al. 1990). For example, for 1985–96 both of these methods suggest YNET of around 3.5 +.- 2.0 W/m^2/K (a

0.7–2.4 Kequilibrium surface temperature increase for 2 x CO2), and this should be compared to our 1.0–3.6-K range quoted in the conclusions of the paper.

Murphy et. al (2009) explore orthogonal regression a bit as well, but I couldn’t find anything that explicitly takes into account the known uncertainties in the surface temperatures.

### Sampling Error in 10 year intervals

Finally, I will look at the different estimates we get for the climate radiative response when breaking the 500 year control run into 50 10-year periods (and still including noise). I set the standard deviation for Net TOA flux measurements to 0.33 W/m^2 per month when adding in the white noise (based on estimated CERES RMSE (SW + LW) / sqrt(2)) The red lines represent the “true” radiative response. Using monthly values, we get the following results over 10 year periods:

Based on those responses, this includes climate sensitivities from 1.7 C to 14.2 C (!) based on a 10-year period, when the known sensitivity is 3.4 C.

For annual data (which should reduce some of the measurement noise, but yield a smaller sample size), we get:

Which includes everything from 0.8 C to 13.1 C.

Interestingly, these sampling errors tend to lead towards overestimates of the response, or underestimates in climate sensitivity. I’m not quite sure at the moment why this should be the case.

All code and data for this post is available here.

I am glad you are examinign the FG06 method. I looked at it in some detail when researching into the IPCC AR4 observational evidence as to climate sensitivity – see my early July post in Climate Etc. about the IPCC’s invalid alteration of the FG06 climate sensitivity estimated PDF. I find it much easier to see why FG06 would overestimate sensitivity than underestimate it.

‘Now, my understanding is that the pre-industrial control experiment does not include any change in forcings, and that the model is run to “stabilize” prior to the start of the control experiment.’

I don’t know how long the GFDL CM2.1 model was run before the start of the (900 year?) pre-industrial control run, but I doubt if more than 1000 years or so, given the computational costs involved at that time. I have examined 6,000 years of sea temperature data from the more recent HadCM3 model control run, and it doesn’t stabilise for over 2,000 years.

Comment by Nic L — November 5, 2011 @ 3:43 pm

Thanks Nic, I definitely remember your Climate Etc post. I agree that most issues with the FG06 methodology seem to lead to overestimates of sensitivity (although that sampling error above seems to be in the opposite direction?). I’ll be posting some results soon from Deming/TLS regressions that seem to show that even adjusting for the tiniest amount of known uncertainties in the surface temperature measurements will yield significantly lower sensitivities. Given the extensive discussion in FG06 regarding the OLS as an upper bound on sensitivity, I’m quite surprised Murphy et al. (2009) didn’t explore this issue in more depth, but instead proceeded with diagnosing 20th century aerosol forcing from the sensitivity earlier in the paper. (I’m also unable to reproduce their orthogonal regression results at this time, but I’m a bit new to it and need to try and figure out precisely what they’ve done).

That’s quite interesting about the HadCM3 control run, I had no idea it took that long to stabilize. However, I know the start of the control run is when they begin the 20th century experiment runs as well…I’m not sure if it would affect sensitivity/predictions to begin it prior to stabilization. Out of curiosity, what work are you doing with the HadCM3 model? Have you had a chance to look at the Murphy et al. (2009) paper that Forster references in your post?

Comment by troyca — November 6, 2011 @ 7:33 am

Troyca

I’m unsure about the effect of sampling errors. FG06 used a ‘robust’ regression techinique, which if I recall correctly involved regressing on multiple subsets of the data. I’m not sure how much difference allowing for errors in the surface temperature would make, since most of the ‘errors’ in the radiative response come from residuals in fitting the regression line(s), not from errors in measuring radiative fluxes or forcings. Actually, the SW regressions have low residuals, but the LW regressions have high ones, as I recall. Of course, with TLS or Deming regression part of the fitting errors will be allocated to the surface temperatures. But I tend to largely accept the FG06 argument that changes in radiative fluxes (net of changes in forcings) should respond to surface temperatures more reliably than vice versa, so OLS is right in principle irrespective of errors in measuring surface temperature.

I would be most interested to learn how you get on with replicating Murphy 2009. I was unimpressed by it giving no explanation for obtaining different results from those in FG06 using the same data, despite Forster being an author of both papers.

I was looking at HadCM3 control run ocean data (kindly supplied by the UK Met Office) to get estimates of natural variability. Come to think of it, it was the variability that took 2000+ years to stabilise, not the mean temperature. The temperature (top 3000m) declined pretty continuously throughout most of the control run, until about year 5,200, after which it stabililised!

Comment by Nic L — November 6, 2011 @ 12:57 pm

[…] discussed in a previous post, Forster and Gregory (2006) and Murphy et. al. (2009) use OLS regressions of temperatures versus […]

Pingback by Estimated Sensitivity and Climate Response with TLS « Troy's Scratchpad — November 7, 2011 @ 7:58 am

Nic L,

Regarding Murphy 2009 getting different results than FG06, with the table it appears that they’ve included some extra years in the ERBE runs (97, 98, and 99 in one part), while removing other ones 91,92. I suppose it shows how sensitive the method is to which years are included, which is not all that surprising. I agree that it is not particularly satisfying, as the choices seem to be arbitrary (e.g. excluding actual measured data because “the rapidly changing radiative forcing after the eruption of Mt. Pinatubo in 1991 makes the data difficult to interpret through about 1993″, but they are fine use partial years and “a composite ‘‘1998′ was created using

data from 1997 and 1999 to fill gaps in the 1998 record”). It seems any choice of inclusion/exclusion of years can be theoretically justified, which is a dangerous situation.

I’m not sure I totally understand the “FG06 argument that changes in radiative fluxes (net of changes in forcings) should respond to surface temperatures more reliably than vice versa, so OLS is right in principle irrespective of errors in measuring surface temperature.” I get that orthogonal regression is likely inappropriate, but I don’t understand why TLS to “correct” for some of these errors in measurement surface temperature would be inappropriate (and assuming no errors is appropriate, other than to get an upper bound of ECS).

Thanks,

-Troy

Comment by troyca — November 7, 2011 @ 9:21 am

If the y variable really does depend on an independent x variable, as my comment implied, and most of the ‘errors’ are due to scatter about a straight line rather than measurement errors (as FG06 stated) then I think OLS is likely to give the best slope estimate (although I was maybe overstating the case to say it was right in principle). See, e.g., Isobe et al. ‘Linear regression in astronomy I’, Astrophysical Journal, 1990 (which I think FG06 may have cited).

When I’ve come across the term orthogonal regression, it has usually been used synonymously with Total Least Squares (TLS). What are you using it to mean?

Comment by Nic L — November 7, 2011 @ 10:03 am

Admittedly, I’ve been relying primarily on Google to look up and implement orthogonal regression ( this paper was particularly helpful). However, my understanding is that orthogonal regression is a special case of TLS, where the variance in “errors” of each variable is assumed to be the same. This is likely inappropriate because of the reasons you mentioned. However, if we use TLS but assume that the measurement “errors” in T are much smaller than the others errors (i.e., a large value for δ), we can avoid the bias of a straight orthogonal regression while still assuming that there are *some* errors in the x measurements. That’s what I tried to show in my recent post (pingback is above).

If Murphy et. al. (2009) is using orthogonal regression to mean simply the Deming regression, then I would have expected more information regarding the assumed relative errors of the variables, which I didn’t see on first read.

Comment by troyca — November 7, 2011 @ 10:25 am

[…] part one, I noted a number of potential issues with using the Forster and Gregory (2006) method over a […]

Pingback by Measuring Sensitivity with the GFDL-CM 2.1 Control Run, Part 2 « Troy's Scratchpad — December 21, 2011 @ 7:11 pm