Troy's Scratchpad

November 24, 2010

A New Model for UHI in US 1970-2000 Temperature Trends

Filed under: Uncategorized — troyca @ 6:42 pm


Since starting this blog, I’ve been trying to determine the UHI effect based on how changing population trends correlate with changing temperature trends at various stations.  I believe the most successful demonstration of that was here.

Along that path, I stumbled upon the NHGIS, which gives access to aggregate census data. Over the past few weeks I’ve been working with this data to identify variables that might serve as better proxies for this “UHI” effect.

The Data

All intermediate data, code, and my downloaded NHGIS datasets can be retrieved from here (the biggest package yet).

The temperature data, once again, is USHCNv2.

The data for my variables (e.g. Num Vehicles Available, Aggregate Family Income, etc.) is from NHGIS.  The availability of different topics can be seen here, and based on the limitations of the data and what I thought could possibly be proxies for land use/UHI, I was left with only a few choices.  Notes on how I got from the raw NHGIS datasets to the variables can be seen in the Origins.txt included in the package above.

Ultimately, I found that many of these variables were decent proxies for the UHI effect, but few were independent of each other and most of them did not perform better than population.

However, the Aggregate Family Income of a “place” DOES seem to be a better proxy for UHI than population, which is unsurprising given that economic development typically spurs surface and land-use changes.  Furthermore, the number of workers in Agriculture, Fishing, Forestry, and Hunting seems to be pretty orthogonal to Aggregate Family Income, and is also a decent proxy for UHI in its own right.  It is negatively correlated, which also makes sense, because a decrease in these sorts of jobs suggests “urbanization”.  Regressions using these two explanatory variables have led to the best results.

One note on Aggregate Family Income: For determining the magnitude of the effect, I use an inflation-adjusted value for each year, calculated using this site.  The inflation-adjusted value is reported in 1970 dollars.  While inflation does not affect the correlation, it can greatly affect the estimated magnitude of UHI based on our model.
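As a rough illustration of that adjustment, here is a minimal sketch of deflating a nominal income figure to base-year dollars.  The function name, the price index values, and the income figure are all hypothetical placeholders (they are not real CPI numbers and not the values from the site I used); only the arithmetic is the point.

```python
def to_base_year_dollars(nominal, index_year, index_base):
    """Convert a nominal amount to base-year dollars using a price index.

    index_year: price index in the year the amount was reported
    index_base: price index in the base year (1970 in this post)
    """
    return nominal * index_base / index_year


# Hypothetical price index values (NOT real CPI figures).
price_index = {1970: 100.0, 1980: 200.0}

# Hypothetical nominal Aggregate Family Income for one place in 1980.
nominal_income_1980 = 5_000_000.0

real_income_1980 = to_base_year_dollars(
    nominal_income_1980, price_index[1980], price_index[1970]
)
print(real_income_1980)  # 2500000.0 in base-year (1970) dollars
```

If the same national index is applied at every station, deflating shifts every station's log-income trend by the same constant, which is one way to see why the correlation is unaffected while the size of any adjustment built from the trend is not.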

The Method (#1)

The first method I use is very similar to the one I used with population previously, but I will reproduce some of the basic points here.

1) For each station, the linear temperature trend (dT) is calculated based on the data from 1970-2000.  A station is only included if it reports an annual temperature for at least 7 years in every decade from 1970-2000.

2) Similarly, a trend is calculated for both Income and AgrWork (and other variables) at each station, using the 1970, 1980, 1990, and 2000 census values.  The trend is in terms of the log of these variables, so dI and dA (which I will use for short-hand) refer to the log-difference each year.

3) Based on latitude and longitude, we match all close station pairs.  A match is determined based on approximate distance in km between the station pairs.  If the distance between two stations is less than the “threshold” distance, the pair is included.

4) We then regress the difference between two temperature trends of nearby stations against the difference in log trends of Income and AgrWorkers in those same stations.  So we try to find an equation of the form (TempTrend2 – TempTrend1) = a * (logIncomeTrend2-logIncomeTrend1) + b * (logAgrWorkersTrend2 – logAgrWorkersTrend1) + c
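The four steps above can be sketched roughly as follows.  This is not the code from my package, just a minimal NumPy illustration: the function names, the decade boundaries (1970-79, 1980-89, 1990-99), the haversine distance formula, and the synthetic demo data are all assumptions made for the example.

```python
import numpy as np


def has_enough_data(years_reported, min_per_decade=7):
    """Step 1 filter. Assumed decades: 1970-79, 1980-89, 1990-99."""
    years = set(years_reported)
    return all(
        sum(1 for y in range(start, start + 10) if y in years) >= min_per_decade
        for start in (1970, 1980, 1990)
    )


def linear_trend(x, y):
    """Least-squares slope of y against x (x is centered for stability)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.polyfit(x - x.mean(), y, 1)[0]


def log_trend(census_years, values):
    """Step 2: slope of log(value) per year, i.e. dI or dA."""
    return linear_trend(census_years, np.log(values))


def haversine_km(lat1, lon1, lat2, lon2):
    """Approximate great-circle distance in km (assumed distance measure)."""
    p1, p2 = np.radians(lat1), np.radians(lat2)
    dlon = np.radians(lon2 - lon1)
    a = np.sin((p2 - p1) / 2) ** 2 + np.cos(p1) * np.cos(p2) * np.sin(dlon / 2) ** 2
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))


def pairwise_regression(lat, lon, dT, dI, dA, threshold_km):
    """Steps 3 and 4: regress pairwise dT differences on dI and dA differences.

    Returns ([a, b, c], number_of_pairs).
    """
    rows, y = [], []
    n = len(lat)
    for i in range(n):
        for j in range(i + 1, n):
            if haversine_km(lat[i], lon[i], lat[j], lon[j]) < threshold_km:
                rows.append([dI[j] - dI[i], dA[j] - dA[i], 1.0])
                y.append(dT[j] - dT[i])
    if len(y) < 3:
        raise ValueError("too few station pairs within the threshold")
    coef, *_ = np.linalg.lstsq(np.array(rows), np.array(y), rcond=None)
    return coef, len(y)


# Synthetic demo: 30 made-up stations inside a 1 x 1 degree box,
# with temperature trends built from known coefficients a=2, b=-1.
rng = np.random.default_rng(0)
n = 30
lat = 40 + rng.uniform(0, 1, n)
lon = -100 + rng.uniform(0, 1, n)
dI = rng.normal(0.02, 0.01, n)
dA = rng.normal(-0.01, 0.01, n)
dT = 2.0 * dI - 1.0 * dA + 0.5
coef, n_pairs = pairwise_regression(lat, lon, dT, dI, dA, threshold_km=300)
print(coef, n_pairs)
```

Note that any term shared by both stations in a pair (the 0.5 in the synthetic data) cancels in the difference, which is the reason for working with nearby pairs in the first place.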

The Method (#2)

The other method I used was to simply compare all stations at once.  Steps 1) and 2) are exactly the same as above.  Then,

3) The U.S. is broken up into 2.5 x 2.5 degree grid cells, and the average temperature trend is calculated for each grid cell.  A value for dtAdj at each station is determined, which is the temperature trend for a particular station MINUS the average of all other station temperature trends in its own grid cell.

4)  The dtAdj for each station is then compared against the resulting dI and dA for that station.

Step #3 is used to remove the spatial auto-correlation that I’ve encountered before, since we adjust each station’s temperature trend by others in the region.
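Here is a minimal sketch of the dtAdj calculation in step 3.  The function name, the alignment of cells to multiples of 2.5 degrees, and the choice to leave a station with no neighbours as NaN are assumptions for illustration; the station values are made up.

```python
from collections import defaultdict

import numpy as np


def grid_adjusted_trends(lat, lon, dT, cell_deg=2.5):
    """Return dtAdj: each station's trend minus the mean trend of the
    OTHER stations in the same grid cell.

    Cells are assumed to be aligned to multiples of cell_deg.
    Stations that are alone in their cell get NaN (an assumption).
    """
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    dT = np.asarray(dT, dtype=float)

    # Group station indices by grid cell.
    cells = defaultdict(list)
    for k in range(len(dT)):
        key = (int(np.floor(lat[k] / cell_deg)), int(np.floor(lon[k] / cell_deg)))
        cells[key].append(k)

    dt_adj = np.full(len(dT), np.nan)
    for members in cells.values():
        if len(members) < 2:
            continue  # no other stations to compare against
        total = dT[members].sum()
        for k in members:
            others_mean = (total - dT[k]) / (len(members) - 1)
            dt_adj[k] = dT[k] - others_mean
    return dt_adj


# Made-up example: three stations share a cell, the fourth is alone.
lat = [40.1, 40.2, 40.3, 45.1]
lon = [-100.1, -100.2, -100.3, -90.1]
dT = [1.0, 2.0, 3.0, 5.0]
dt_adj = grid_adjusted_trends(lat, lon, dT)
print(dt_adj)  # [-1.5  0.   1.5  nan]
```

The trends passed in can come from every station with valid temperature data, while the later regression against dI and dA only uses the subset that also has census data.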

Why have this second method at all?  First of all, it allows for comparing all stations at once, rather than over-representing those stations that have more stations clustered nearby.  Second, it allows us to use stations that are more solitary, taking advantage of nearby ones that may have valid temperature data but do not have data for the other variables.  In other words, while we are limited to the 400 or so stations that have both our Income data AND temperature data, we can use all 800 or so that have at least temperature data to perform our regional adjustments.

Of course, in the event that we do encounter spatial auto-correlation and the Income increases more in certain grid-cells than others, we are almost certain to get an underestimate of the slope, since we’re subtracting the average trend of the cell even if the UHI effect there is a net positive.


This first table shows the results of running Method #2 on the Raw, F52, and TOB USHCN datasets, regressing dtAdj against dI and dA. 

This second table shows the results of Method #1, where we perform the pairwise comparisons.  I’ve shown the results at various threshold distances for what constitutes a match.

In both methods, there seems to be a fairly robust signal (outside of the F52 dataset).  I will continue forward using the TOB set, with more explanation of why in the discussion at the end.

Magnitude of the Effect

At this point, it’s worthwhile to take a look at the magnitude of the effect.  Though there are more sophisticated methods for calculating the yearly US anomaly out there, I only needed an approximation here, and so the method I used was fairly simple: use a 5×5 degree grid for the US, calculate the average anomaly for each grid cell, then average the grid cells.
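For reference, a minimal sketch of that simple gridding approach for a single year.  The function name and the station values are hypothetical, cells are assumed to be aligned to multiples of 5 degrees, and the cells are averaged without any area weighting, which is an assumption on top of the description above.

```python
import numpy as np


def gridded_mean_anomaly(lat, lon, anomaly, cell_deg=5.0):
    """Average station anomalies within each grid cell, then average the
    cell means (unweighted). Stations with NaN anomalies are skipped."""
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    anomaly = np.asarray(anomaly, dtype=float)
    valid = ~np.isnan(anomaly)

    # Accumulate (sum, count) per grid cell.
    cell_sums = {}
    for la, lo, an in zip(lat[valid], lon[valid], anomaly[valid]):
        key = (int(np.floor(la / cell_deg)), int(np.floor(lo / cell_deg)))
        s, c = cell_sums.get(key, (0.0, 0))
        cell_sums[key] = (s + an, c + 1)

    cell_means = [s / c for s, c in cell_sums.values()]
    return float(np.mean(cell_means))


# Made-up example: two stations share one cell, a third sits in another.
lat = [41.0, 42.0, 47.0]
lon = [-101.0, -102.0, -91.0]
anomaly = [0.2, 0.4, 1.0]
us_anomaly = gridded_mean_anomaly(lat, lon, anomaly)
print(us_anomaly)  # approximately 0.65: mean of cell means 0.3 and 1.0
```

Averaging within cells first keeps a dense cluster of stations from dominating the national figure, which a plain mean over all stations would allow.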

Here’s a graph that shows my resulting anomalies versus that of GISS. (Note that they have different baselines).

They match up pretty well, with the trends fairly close.

Now, in order to calculate the magnitude of the effect, I adjust each station’s anomaly by subtracting, for each variable, the product of its regression coefficient, the number of years that have elapsed, and the annual trend of that variable at that station.
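In code, that adjustment amounts to something like the sketch below.  The function name and all of the numbers are hypothetical, and the sketch assumes the coefficients are expressed so that coefficient times annual log trend gives a temperature change per year.

```python
def uhi_adjusted_anomaly(anomaly, years_elapsed, coefs, annual_trends):
    """Subtract the modeled UHI contribution from a station anomaly.

    coefs:         regression coefficients, one per variable (e.g. a, b)
    annual_trends: annual log trend of each variable at this station
                   (e.g. dI, dA), in the same order as coefs
    Assumes coefficient * annual trend is a temperature change per year.
    """
    correction = sum(c * years_elapsed * t for c, t in zip(coefs, annual_trends))
    return anomaly - correction


# Hypothetical station: anomaly of 0.5 ten years after the start year,
# with made-up coefficients and made-up annual log trends.
adjusted = uhi_adjusted_anomaly(
    anomaly=0.5,
    years_elapsed=10,
    coefs=(2.0, -1.0),
    annual_trends=(0.01, -0.02),
)
print(adjusted)  # approximately 0.1 (0.5 minus a correction of 0.4)
```

Because the coefficient on AgrWork is negative, a station that is losing agricultural workers (negative dA) gets a positive correction from that term, the same direction as a station with rising income.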

Clearly, this is going to be extremely sensitive to both the coefficient AND the annual trend of the variable at a particular station.  For something like population, the annual trend is pretty straightforward if we have the data.  However, with Income it was necessary to adjust for inflation, since an increase in Aggregate Income that is only on par with inflation does not suggest economic development.

Regarding the coefficients, I will show the adjustments using the higher-end (9.33, -2.046) and lower-end (3.22, -1.4) of our results from above:

On the higher end, we see about 25.4% of our trend due to UHI during this time period, and on the lower end we have about 9.4%.
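One way to arrive at a percentage like this is to compare the linear trend of the unadjusted US anomaly series with that of the UHI-adjusted series.  The sketch below assumes that definition; the function name and both series are made up for the example and are not the results above.

```python
import numpy as np


def uhi_share_of_trend(years, raw_anomalies, adjusted_anomalies):
    """Fraction of the raw linear trend removed by the UHI adjustment.

    Assumed definition: (raw trend - adjusted trend) / raw trend.
    """
    x = np.asarray(years, dtype=float)
    x = x - x.mean()  # center for numerical stability
    raw_trend = np.polyfit(x, np.asarray(raw_anomalies, dtype=float), 1)[0]
    adj_trend = np.polyfit(x, np.asarray(adjusted_anomalies, dtype=float), 1)[0]
    return (raw_trend - adj_trend) / raw_trend


# Made-up series: the adjusted series warms at 75% of the raw rate.
years = np.arange(1970, 2001)
raw = 0.020 * (years - 1970)
adjusted_series = 0.015 * (years - 1970)
share = uhi_share_of_trend(years, raw, adjusted_series)
print(share)  # approximately 0.25, i.e. 25% of the trend
```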


We’re left with a range likely in between 10% and 25% for the U.S. during this time period using the TOB dataset.  As I’ve discussed before, it is not surprising that we find a weak to non-existent signal in the F52 adjusted data-set.  While it is possible that this is due to perfectly removing the UHI effect, it is also quite likely that the infilling of data has only added noise to dilute the signal.  Furthermore, when the temperature readings have been adjusted to match nearby stations, we can hardly expect our pairwise tests to then yield anything meaningful.  

To me, the real wildcards here are the inhomogeneity adjustments. On the one hand, we may be getting a correlation here between the instrument-related adjustments and the economic development near a particular station, in which case we are overestimating the effect of UHI.  On the other hand, it may be that adjusting for station equipment type and location moves (and avoiding the pairwise temperature trend adjustments) would in fact increase the signal, suggesting that what we have is an underestimate.  This is likely to be my next avenue of research, when I get a chance.


  1. Troy, very interesting. Since the mean is just the average of TMax and TMin, you should redo this with just those two alone. I’m seeing TMax decreasing with TMin increasing. Seems to me the UHIA would tend to keep warmth in (less loss) which would increase TMin. That could be proved if you do this analysis with those two numbers.

    Comment by jrwakefield — November 27, 2010 @ 7:40 am

    • jrwakefield,

      With those two numbers, I actually find less effect individually than when they are combined.

      For TOB MAX, threshold=50km, I get a correlation of 0.282 from the 128 station pairs, with coefficients 9.37 and -1.733

      For TOB MAX, threshold=300km, I get a correlation of .124 from the 4748 station pairs, with coefficients 3.54 and -0.575

      For TOB MIN, threshold=50km, I get a correlation of 0.209 from the 127 station pairs, with coefficients 7.74 and -2.335

      For TOB MIN, threshold=300km, I get a correlation of 0.110 from the 4851 station pairs, with coefficients 3.81 and -0.992

      Comment by troyca — November 30, 2010 @ 11:46 am

  2. The problem with using TOB in this analysis is that the magnitude of the bias introduced by sensor shifts (CRS to MMTS) in the middle of your sample period (~1985) is going to swamp the UHI signal in many cases. It also doesn’t help that MMTS transitions also generally coincided with instrument location moves, and are somewhat correlated with urbanity both numerically (e.g. slightly more urban stations went to MMTS than rural) and temporally (I believe urban stations changed to MMTS earlier than rural, but that should be double checked).

    I’d suggest looking at CRS stations only as a potential good next step, or figuring out a more robust way to filter out the sensor transition effects (e.g. I’m using a pairing model that eliminates pairs with different sensor types and sensor transitions more than 5 years apart).

    Interesting work overall, however! I also wouldn’t write off inhomogeneity corrections too much; in theory they should be reasonably good at dealing with UHI characterized by discontinuities in the record, but a bit less so in dealing with slow processes. If an abrupt change in microsite due to urbanization is the primary driver, that would be detected. If mesoscale warming due to changes in the broader surrounding environment are the driver it might be more difficult.

    Comment by Zeke Hausfather — November 30, 2010 @ 9:19 am

    • Hi Zeke,

      Steven Mosher briefly mentioned that there were problems with using the TOB dataset on the Judith Curry thread:

      I mentioned there that “I’d like to start with the TOB and then perform both 1) station move and instrument change adjustments and 2) re-apply the same test in my post, and make my own corrections for UHI based on the results. I should then be able to compare the resulting trends of TOB + 1 + 2 to that of F52. The trick is figuring out how to do #1 (perhaps reproducing the old Karl and Williams (1987) way?).”

      I’ve come to wonder if doing #1 might be a bit difficult to start, so I was thinking of just using those stations that have experienced minimal change (based on reported temperatures) from the TOB to F52 datasets. But I’m not sure how many stations I’ll get that way.

      Your suggestion to look at the CRS-only stations is an excellent one, however, and assuming this station history information is easy to obtain it will likely be my next avenue of inquiry.

      Another thing I’m thinking is that the subset of stations that actually has data from NHGIS appeared to have more urban profiles, so perhaps if the vast majority of them are urban there won’t be the urban/rural correlation with station change that you mention to skew the data. However, once again I won’t know this until I get a look at the stations WRT their GRUMP profiles.

      Comment by troyca — November 30, 2010 @ 11:19 am

  3. […] A New Model for UHI in US 1970-2000 Temperature Trends […]

    Pingback by F52 vs TOB in the New UHI Model | Troy's Scratchpad — December 1, 2010 @ 9:53 pm

  4. Just some confirmatory stuff from me. I’m working with the early record ( 1900-1940)
    worldwide data ( historical reconstruction for population density)…

    If your gunna find UHI anywhere you’re looking in the right time period.

    There are some other sources for classifying rural/urban, but none of it
    very open. Landscan for population, Modis 500 meter urban extent. Hyde for
    historical population and land use ( used in AR5)

    Comment by steven mosher — December 2, 2010 @ 2:46 pm

  5. […] is a continuation of first part of the UHI analysis with the USHCNv2 temperature record from 1970-2000 using NHGIS economic and […]

    Pingback by Continuing “A New Model for UHI in the US 1970-2000″ | Troy's Scratchpad — December 21, 2010 @ 9:56 pm
