Recently, Zeke informed me of some nicely-formatted population data from NOAA for the U.S. going back to 1930. Performing similar tests to Part 1 and Part 2 gives us an opportunity to use these extra years to search for any UHI bias.
Quick Note on Methods
Intermediate Data, Code, and Graphs for this post can be downloaded here.
“PopulationFinder” is the mini Java app that scans the specified folder for any DAT files, assumes they are in the NOAA format, and matches each USHCNv2 station to its population as given in the NOAA population files. Basically, you can just download and unzip all the state population files into one directory and PopulationFinder will do the heavy lifting. The only thing you can tweak here (I will discuss it later) is the “MaxDistInKm” value: all grid points within that distance of a station are averaged to determine the station’s population. I’ve set it to 1.0 km, which typically means only 1-3 grid points are used.
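To make the “MaxDistInKm” idea concrete, here is a minimal sketch of averaging the grid points around a station. It is not the actual PopulationFinder source: the class name, the `GridPoint` type, and the use of the haversine formula for distance are all my assumptions for illustration.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch only; names and distance formula are assumptions, not PopulationFinder code. */
final class StationPopulationSketch {

    /** One population grid cell: centre coordinates in degrees plus a population value. */
    static final class GridPoint {
        final double latDeg;
        final double lonDeg;
        final double population;

        GridPoint(double latDeg, double lonDeg, double population) {
            this.latDeg = latDeg;
            this.lonDeg = lonDeg;
            this.population = population;
        }
    }

    static final double EARTH_RADIUS_KM = 6371.0;

    /** Great-circle distance in km between two lat/lon points (haversine formula). */
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    /** Mean population of all grid points within maxDistInKm of the station; NaN if there are none. */
    static double averagePopulation(double stationLat, double stationLon,
                                    List<GridPoint> grid, double maxDistInKm) {
        double sum = 0.0;
        int count = 0;
        for (GridPoint p : grid) {
            if (distanceKm(stationLat, stationLon, p.latDeg, p.lonDeg) <= maxDistInKm) {
                sum += p.population;
                count++;
            }
        }
        return count == 0 ? Double.NaN : sum / count;
    }

    public static void main(String[] args) {
        // Made-up grid points around a made-up station, purely for demonstration.
        List<GridPoint> grid = Arrays.asList(
                new GridPoint(40.000, -100.0, 100.0),
                new GridPoint(40.005, -100.0, 300.0),
                new GridPoint(40.100, -100.0, 10000.0));
        System.out.println(averagePopulation(40.0, -100.0, grid, 1.0));
    }
}
```

With a small radius, a single dense grid cell a few kilometres away has no influence on the station's value, which is the point of keeping the threshold at 1.0 km.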
One change I’ve subsequently made to the TempDataProcessor is how it decides whether a station should be kept when performing the test. Previously I just checked that it reported avg temps in at least 25 of the 31 years. Now, with a longer time period, I want to make sure that every decade is represented, so a station needs at least 7 years of reported avg temps in every decade.
I ran tests using the whole time period (1930-2000) and then just 1970-2000, so the results can be compared against the tests from Part 1 and Part 2.
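The decade rule can be sketched roughly as follows. This is not the TempDataProcessor code itself: the class and method names are hypothetical, and I am assuming that decades are counted from the start year and that a trailing partial decade (for example the single year 2000 in a 1930-2000 run) is simply not checked.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of the station filter; not the actual TempDataProcessor code. */
final class DecadeCoverageSketch {

    static final int MIN_YEARS_PER_DECADE = 7;

    /**
     * Returns true if every full decade in [startYear, endYear] has at least
     * MIN_YEARS_PER_DECADE years with a reported average temperature.
     * Assumption: a trailing partial decade is ignored.
     */
    static boolean keepStation(Set<Integer> yearsWithAvgTemp, int startYear, int endYear) {
        for (int decadeStart = startYear; decadeStart + 9 <= endYear; decadeStart += 10) {
            int reported = 0;
            for (int year = decadeStart; year < decadeStart + 10; year++) {
                if (yearsWithAvgTemp.contains(year)) {
                    reported++;
                }
            }
            if (reported < MIN_YEARS_PER_DECADE) {
                return false; // this decade is under-represented
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Made-up station that reports every year except 1950-1953.
        Set<Integer> years = new HashSet<>();
        for (int y = 1930; y <= 2000; y++) {
            if (y < 1950 || y > 1953) {
                years.add(y);
            }
        }
        System.out.println(keepStation(years, 1930, 2000));
    }
}
```

The difference from the old rule is that gaps can no longer be concentrated in one stretch: a station missing 1970-1975 reports 25 of the 31 years from 1970 to 2000, so it passed the old check, but it fails here because the 1970s have only 4 reported years.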
Here are a couple of graphs for nearby stations using Raw and TOB datasets, but the remainder can all be found in the data/code/graph pack linked above.
Here is a quick summary of all the results from both 1930 on and 1970 on.
Compare to our results from Part 2, and you’ll see that our correlations tend to be much worse and the slopes lower, especially in the same 1970-2000 period:
So, if I’m assuming there is a strong enough signal to detect, here are some of my thoughts:
- Lower threshold distance is better. This is kind of a no-brainer, since we’d expect nearby stations to have fewer actual climate-related differences. We see that in almost all cases the < 0.5 degree distance shows better correlation and a higher slope. Of course, some of this may be due to fewer observations, which is also a problem: there are not necessarily enough very close station pair candidates here. I have a few ideas on how we might get more observations using a sliding time period window. Once we get to higher thresholds, we start running into the problem of spurious correlation due to regional trends, which I discussed previously.
- The TOB datasets show the strongest signal. Generally, they tend to do better than raw, which would be explained by the fact that they remove the noise from time of observation bias. F52, on the other hand, does the worst by a good margin. There are a couple of explanations we might have for this. According to NOAA on the F52 adjustments, “The U.S. HCN version 2 ‘pairwise’ homogenization algorithm addresses these and other issues…” So, either these corrections
  (A) have “effectively account[ed] for any ‘local’ trend at any individual station”, thereby reducing the urban effect to something very small, as described here, or
  (B) have messed up any of our attempts to compare nearby stations by already adjusting them to match other local stations. This is not necessarily contrary to (A), if the UHI effect is indeed handled correctly. However, if the corrections are based on faulty assumptions, then the waters have simply been muddied. A closer examination of this is probably necessary.
- There is more correlation than we would expect without any UHI signal. However, there is clearly a lot of noise, and the magnitude of the effect so far seems to be on the smaller end. As I’ve mentioned previously, I want to do a post dedicated to examining the magnitude of the effect.
- Longer time frames bring out the signal better. This comes simply from comparing the 1930-2000 results to the 1970-2000 results for the NOAA population data. It makes sense if the longer record is averaging out weather noise.
- Using NHGIS “place” population data shows a stronger signal than the NOAA population data. There are once again a few different explanations here. First, it may be that my data from NHGIS had fewer stations with population data, and that the ones it did have happened to show higher correlation. It may also be that the NOAA data, since it is derived from county levels, is not as accurate in more rural areas as the NHGIS data for place (though I sort of doubt this). Finally, it may be that the NOAA pop data IS more accurate for the grid points immediately surrounding a station, BUT that the “place” population is actually a better proxy for what goes into the UHI effect. Maybe the development of the closest “place” actually biases a station more than simply the population density of its 1 square km area.
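For readers who want to reproduce the “correlation” and “slope” figures discussed in the list above, here is a minimal sketch of an ordinary least-squares slope and Pearson correlation. The exact regression is defined in the earlier parts, not here, so treat the setup as an assumption: I am taking each observation to be one station pair, with x as the difference in population change and y as the difference in temperature trend. The class and method names are hypothetical.

```java
/** Hypothetical sketch; the variables fed in are an assumption about the pairwise test. */
final class PairRegressionSketch {

    /** Returns {slope, pearsonR} for an ordinary least-squares fit of y on x. */
    static double[] slopeAndCorrelation(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two equal-length arrays with at least 2 points");
        }
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        // Sums of squared deviations and of cross-products about the means.
        double sxx = 0.0;
        double syy = 0.0;
        double sxy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxx += dx * dx;
            syy += dy * dy;
            sxy += dx * dy;
        }
        double slope = sxy / sxx;                  // change in y per unit change in x
        double r = sxy / Math.sqrt(sxx * syy);     // Pearson correlation coefficient
        return new double[] {slope, r};
    }

    public static void main(String[] args) {
        // Made-up numbers purely for demonstration; not results from this post.
        double[] popChangeDiff = {0.0, 1.0, 2.0, 3.0};
        double[] trendDiff = {0.1, 0.3, 0.2, 0.5};
        double[] result = slopeAndCorrelation(popChangeDiff, trendDiff);
        System.out.println("slope=" + result[0] + " r=" + result[1]);
    }
}
```

Keeping slope and correlation separate matters for the points above: with few close pairs the slope can look large while the correlation stays weak, and both numbers are needed to judge whether a signal is really there.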
This leaves me with quite a bit to investigate. Any feedback is of course welcome, especially from people who may have already been down this path.