Troy's Scratchpad

January 19, 2011

Testing the PHA with synthetic data – Part 2

Filed under: Uncategorized — troyca @ 6:37 pm

This is a continuation of part 1, where I began testing the USHCNv2 PHA using synthetic data.  In that post I simply examined whether the synthetic data looked reasonable, and whether the PHA automatically adjusts to increase the temperature trend.  In this part, I examine a more interesting pair of questions: 1) whether the PHA removes the effect of UHI in these datasets, and 2) whether the resulting output of the PHA (generally the F52 dataset) will hide a UHI signal when we try to find it with our method of pairwise comparisons of nearby station trends.

This post re-uses a lot of slightly modified goodies from the past, and the updated package can be found here.

Adding in UHI effects

The first steps of the data generation are the same as before.  However, in my tests here I also added in UHI contamination equal to the “true” underlying trend, thereby creating a dataset whose resulting trend appeared to be double the actual temperature trend.  Here are the steps:

1) I went through each station and selected a trend* for the log of some economic indicator (or we could call it population) lasting between 3 and 10 years.  I repeated this until the fluctuating trends covered the whole 100-year period for that station.  Each trend is randomly selected from a Gaussian distribution with a specified mean and standard deviation.  Here is a look at the actual 1970-1980 log trend histogram for stations with NHGIS data:

NOTE: I actually found what appeared to be some issues with my 2000 NHGIS variable data by creating a chart like this, which I will discuss in a later post.

2) This economic indicator trend is then multiplied by another random number that specifies how strongly it translates into a UHI bias in the temperature trend.  The resulting temperature trend is then added to the station’s monthly temperatures.
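The two steps above can be sketched roughly as follows. This is a minimal illustration and not the code from the package: the function names, the default mean and standard deviation, and the choice of a single scale factor per station are all assumptions made for the example.

```python
import numpy as np

def yearly_indicator_slopes(n_years=100, mean=0.01, sd=0.02, rng=None):
    """Per-year log-indicator slope, built from random 3-10 year segments.

    mean and sd are hypothetical parameters of the Gaussian that each
    segment's trend is drawn from.
    """
    rng = np.random.default_rng() if rng is None else rng
    slopes = np.empty(n_years)
    year = 0
    while year < n_years:
        length = int(rng.integers(3, 11))      # 3 to 10 years, inclusive
        # The final segment is simply truncated at the end of the record.
        slopes[year:year + length] = rng.normal(mean, sd)
        year += length
    return slopes

def uhi_bias_monthly(slopes, scale):
    """Cumulative monthly UHI bias implied by the yearly slopes.

    scale is the (assumed per-station) random multiplier that converts
    the indicator trend into a temperature trend.
    """
    monthly_slopes = np.repeat(slopes, 12) / 12.0
    return scale * np.cumsum(monthly_slopes)
```

The returned bias series would then be added to the station’s monthly temperatures, e.g. `temps_with_uhi = temps + uhi_bias_monthly(slopes, scale)`.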

Now, if we perform the same nearby station comparison that I’ve done in my many previous posts, we get a graph that looks like this:

You’ll notice that this looks a lot cleaner than the graphs with actual data.  This is likely a combination of the fact that a) actual UHI contamination is not as bad as simulated here, b) the variables we use as proxies for UHI are not as accurate as I’ve simulated here, and c) there is no missing data or other inhomogeneities added in at this point. 
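For readers who have not seen the earlier posts, here is a rough sketch of what a pairwise comparison of this kind might look like. It assumes the graph plots, for each pair of nearby stations, the difference in temperature trend against the difference in log-indicator trend, and that the quoted slope comes from an ordinary least-squares fit through those points. The function name and the pre-selected `pairs` list are hypothetical; how pairs are chosen (distance cutoff and so on) is left out.

```python
import numpy as np

def pairwise_uhi_slope(temp_trends, indicator_trends, pairs):
    """Least-squares slope of (temperature trend difference) against
    (log-indicator trend difference) over the given station pairs.

    pairs is a sequence of (i, j) station index tuples, assumed to have
    been selected already as "nearby" stations.
    """
    temp_trends = np.asarray(temp_trends, dtype=float)
    indicator_trends = np.asarray(indicator_trends, dtype=float)
    i, j = np.asarray(pairs).T
    d_indicator = indicator_trends[i] - indicator_trends[j]
    d_temp = temp_trends[i] - temp_trends[j]
    slope, _intercept = np.polyfit(d_indicator, d_temp, 1)
    return slope
```

Differencing within pairs cancels whatever trend the two stations share, so only the part of the temperature trend that varies with the indicator contributes to the slope.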

After running the PHA

Here, I’ve taken the UHI-infected “TOB” dataset and run it through the PHA.  I’ll note that I only ran this on the AVG dataset (since that’s all I’ve created) rather than on separate MAX and MIN datasets, as the official F52 results do.  The resulting dataset and my calculated yearly temperature anomalies can be seen in the code package included with the post.  However, here is a chart plotting the three datasets:

The resulting trends for the three datasets are:

With UHI: 0.389 Degrees/Decade
Without UHI: 0.191 Degrees/Decade
PHA Adjusted From UHI: 0.345 Degrees/Decade

On the plus side, the PHA reduced the effect of UHI contamination by 22% (it removed 0.044 of the 0.198 degrees/decade of added bias).  I will also note that this is only one particular interpretation of how UHI might contaminate a dataset.  My reading of MW09 suggests that the PHA might operate better if UHI effects occur in short steps and bursts rather than these 3-10 year trends (on the other hand, UHI contamination might operate even more gradually).
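As a check on the arithmetic, here is a small sketch of how the decadal trends and the 22% figure can be computed. It assumes the trends are ordinary least-squares slopes fitted to the yearly anomalies, which may not match exactly what the code package does; the function names are made up for the example.

```python
import numpy as np

def decadal_trend(years, anomalies):
    """OLS trend of yearly anomalies, expressed in degrees per decade."""
    slope_per_year = np.polyfit(years, anomalies, 1)[0]
    return 10.0 * slope_per_year

def fraction_of_bias_removed(with_uhi, without_uhi, adjusted):
    """Share of the added UHI trend that the adjustment took out."""
    return (with_uhi - adjusted) / (with_uhi - without_uhi)

# Trends reported above, in degrees/decade.
removed = fraction_of_bias_removed(with_uhi=0.389, without_uhi=0.191,
                                   adjusted=0.345)
print(f"{removed:.1%}")  # 22.2%
```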

Regardless, we see in this scenario that the PHA is NOT a miracle worker: it left a strong majority of the UHI bias in the resulting dataset.  This should not be a surprise.  If UHI contamination raises the underlying trend of the bulk of the stations (as we’ve implemented here), that raised trend will appear to be the “normal”/unbiased signal.

So what about trying to recover our UHI signal?  If we run everything through the same pairwise test as before, we get the following graph:

Note the change in slope from 11.0 to 3.31.  Essentially, the PHA has reduced the apparent UHI impact by 70%, while only removing about 22% of the actual effect.  Therefore, simply using the resulting F52 dataset for our pairwise comparisons in this case would result in an underestimate of the UHI effect.  It is possible that adding in various other inhomogeneities would increase the number of changepoints, and therefore muddy the UHI waters even further, and this is what I’ll be investigating in Part 3.
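To put the two percentages side by side, here is a back-of-the-envelope calculation. It rests on a simplifying assumption that is not tested in this post: that the pairwise slope scales linearly with the amount of UHI bias in the data. The variable names are only for illustration.

```python
# Pairwise slopes before and after the PHA, as quoted above.
slope_before = 11.0
slope_after = 3.31

# Trends in degrees/decade, as quoted earlier in the post.
with_uhi, without_uhi, adjusted = 0.389, 0.191, 0.345

apparent_remaining = slope_after / slope_before                         # about 0.30
actual_remaining = (adjusted - without_uhi) / (with_uhi - without_uhi)  # about 0.78

# Under the linearity assumption, the share of the remaining
# contamination that the pairwise test would still pick up.
visible_share = apparent_remaining / actual_remaining

print(f"apparent signal remaining: {apparent_remaining:.0%}")
print(f"actual bias remaining:     {actual_remaining:.0%}")
print(f"visible share of the rest: {visible_share:.0%}")
```

In other words, under that assumption the pairwise test on the adjusted data would see well under half of the UHI bias that is still in there.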

It is quite probable that my resulting datasets and the effect of UHI do not reflect reality.  However, I believe these tests demonstrate that it is at least possible that the PHA adjustments hinder our ability to find a large portion of the UHI signal in the F52 dataset, WITHOUT actually removing the UHI contamination itself.  This should be kept in mind when trying to locate a UHI signal using these post-PHA-adjusted datasets.

