Troy's Scratchpad

December 21, 2010

Continuing “A New Model for UHI in the US 1970-2000”

Filed under: Uncategorized — troyca @ 9:56 pm

This is a continuation of first part of the UHI analysis with the USHCNv2 temperature record from 1970-2000 using NHGIS economic and industry data.  Code and intermediate results can be downloaded for this post here.

In that post, Zeke left a helpful comment mentioning that the apparent UHI signal might instead simply be the bias of comparing stations that switched from CRS to MMTS instrumentation at different times.  Using the CRS-only stations here did not leave us with a lot of data to make a definitive statement.

In this analysis, I use the station history file found here to determine the month and year in which a particular station switched from CRS to MMTS, thereby allowing us to calculate the number of months between when one station and its neighbor station changed.

There are two notes I’d like to make before starting:

  1. The station history file ends in 1994.  If there is a history file that extends beyond that, I can’t find it.
  2. Stations switch-overs may simply not be recorded.  Therefore, I cannot simply assume that if a station does not note a switch to MMTS, that it has simply stayed as CRS.  Because of this and #1, the program only classifies stations as “NO_SWITCH” IF there is no switch recorded AND the station is listed in the “ushcn-crs-stations.txt” meta file.  This means that the NO_DATA category consists of stations that have either made the switch to MMTS after 1994, do not have a year or record for the switch, or have switched to ASOS instead.

Given the above notes, here is a chart showing the number of station switch-overs per year within the network for stations with valid temp data (7 or more reported years for each decade from 1970 to 2000):

And here is the subset of stations that include NHGIS data as well (which we’ll actually be using in our analysis):

As we can see, the bulk of the changes occur within the mid eighties.

Now for the results of our pairwise comparison tests.  These take much of the same form as the previous tests, except we now have another threshold as well for determining inclusion: # of months between the switch-overs.  A couple more notes regarding assumptions within this exercise:

  1. I only count the monthly difference up to the point that it will have an effect in our analysis.  So since we only go up to 2000, a station which never makes the shift away from CRS will list the switch date as January 2001.
  2. Some switch-overs have the year listed but not the month.  In these cases, I’ve set the month to “6”.

As you can see, the signal appear pretty robust and its existence is not greatly affected by the difference between the months switching to MMTS.  I was hoping to do more in this post, and have begun looking into the PHA (as you may have noticed in my last post) to see if I can gauge the magnitude of the UHI effect in the F52 dataset, but have run into difficulties with this project.  If anybody has experience re-creating the official F52 dataset by running the PHA, please drop me a line.


December 10, 2010

Running the USHCNv2 Software (Pairwise Homogeniety Algorithm)

Filed under: Uncategorized — troyca @ 6:02 pm

As my analysis has brought me to this point, I thought I might as well share my experience with running the USHCNv2 Software (specifically the PHA) that produces the F52 output.

Overall, I stuck primarily to the quickstart guide, and the process is documented very well.  There were a couple issues that were excerbated by my lack of familiarity with Linux, so hopefully this description will help others going along this path.

Getting Started

Since I use a Windows machine (I know, I know) I decided to simply try compiling and running the software in a Virtual Machine.  Both VMWare Player and an Ubuntu 10.10 ISO image are free (although you do need to sign up with an account for WMWare) so I went with those options.

There were 3 apps that I needed to [sudo apt-get install] on this Ubuntu machine in order to run the PHA: csh (since only BASH is included), gawk, and g77 (for Fortran compilation, since I wanted to use the same compiler they recommended).
Getting g77 installed was a little tricky, since apparently it is not included/compatible with the latest version of gcc.  It could not be found within the normal packages in sources.list either.  This forum post contains a tip on how to get it, but I’ll reproduce the steps here: (use caution since I understand this reverts your version of gcc as well)

  1. Ensure you have aptitude (sudo apt-get install aptitude)
  2. Edit sources.list to include the following references at the bottom: (sudo gedit ~/etc/apt/sources.list)
    deb hardy universe
    deb-src hardy universe
    deb hardy-updates universe
    deb-src hardy-updates universe
  3. Update aptitude references and install (sudo aptitude update, then sudo aptitude install g77)

Preparing Inputs

Now, one of the irritating things with this process is that the INPUT files (temperature and history) are by station, rather than combined into one large file with all the stations, such as those that we get from the USHCN v2 temperature site.  Furthermore, the .HIS files seem to be in a very different format from the one station.history file I found here, and I could not find where the .HIS files used to generate the F52 dataset might be located.

Neither could I find any quick way to generate these HIS and individual station temperature files.  Thus, I wrote a Java app that will take arguments for the USHCNv2 temperature file (such as TOB), station history file, and station meta-file (e.g. ushcn-v2-stations.txt), and will output the collection of HIS and temperature files per station, and a station meta-file that has all stations without temperature data removed from it.  Right now I only export the temperature files with the suffix “_avg.raw”, although this can be easily changed to min/max and tob if desired.

I’m still not sure if these HIS files are in the correct format.  There seems to be some discrepency between the only sample HIS provided (“011084.his”) and the description of format of this HIS file (which according to “USHCN.v2.0.20090806.doc” is “SHF_tob_inst.txt”).  Furthermore, the station.history file does not seem to include all the information that can possibly be entered in a HIS file, so my Java app will leave some fields blank.

Compiling and Running

From here, most things ran pretty well following the quickstart guide.  However, there were three script files that seemed to have minor bugs:,, and

All three of these have the same logic for processing the input type argument:
if [ $elem == “max” ]
export iel=1
export gel=”tmax”
echo “MAX”
elif [ $elem == “min” ]
export iel=2
export gel=”tmin”
echo “MIN”
elif [ $elem == “pcp” ]
export iel=4
export gel=”prcp”
echo “PCP”
echo “Unknown element: $elem”

The BASH interpreter does not seem to like the C/Java style “==”, and thus complains about the argument.  Removing one of the ‘=’ does the trick.  Furthermore, it is missing an entry for the “avg” choice, so you’ll need to add something like:

elif [ $elem = “avg” ]
export iel=3
export gel=”tavg”
echo “AVG”

Finally, I made the mistake of originally only allocating 512 MB of RAM to the VM.  This meant any attempt to run the PHA on any “realworld” scenario was swiftly met by the OOM killer.  I tried to fiddle around with the .incl files to lower the array sizes at first, which led to a whole host of errors.  In the end, simply increasing the RAM for the VM got it to run ok, although it took a couple of hours.


My original goal was to simply see if I could go from the TOB AVG dataset to the F52 AVG dataset.  If you want to see the result of this test run, it is included in the package here.  A quick diff, however, revealed a number of differences.  This was further confirmed when I compared the result of the first station, which shows the USHCNv2 TOB.avg file, the USHCNv2 F52.avg file, and my test run using the USHCNv2 TOB.avg file as input:

As you can see, there is quite a difference.  I’ll need to read up and run a few more tests to figure out why this is happening, but there are quite a few possibilities:

  1. I tried to go from TOB–>AVG, but I passed “raw” as the parameter to the scripts.  My initial thought was that this simply told what file extension to look for, but perhaps extra processing is done as well.
  2. I don’t have the actual station HIS files for input.
  3. I don’t know exactly what values were set for various INCL parameters, such as “neighborhood” stations and the like.
  4. Perhaps the F52 is formed by averaging F52_max and F52_min, which would mean running the algorithm on MAX and MIN separately?

As I said, this takes several hours to run on my machine, so it’s not easy to quickly test where I might have gone wrong.

Not all is lost, however.  For the period from 1970-2000, my average anomaly calculations (using 2.5×3.5 gridded) for the U.S. look almost exactly the same when comparing the F52 dataset downloaded from USHCNv2 vs. the F52 created from my run of the PHA (r-squared=.9998).  Furthermore, both match up rather well with the GISS anomalies during that period:

The GISS slope is .288 C/Decade, the slope from my average calculated from USHCNv2 F52 is 0.298 C/decade, and the slope of my average calculated from my generated F52 is 0.292 C/decade.  Compare that to the average I get from the inputted TOB dataset at 0.257 C/Decade, and it is clear that the PHA run did indeed seem to bring it more in line with the F52 we get from the USHCN website.

For my purposes this may be enough, and I may proceed with my analysis a bit more before coming back to some of these unanswered questions.

Update 1/3: Many of these lingering questions have been answered in a subsequent post hereTo summarize, I won’t be able to reproduce the F52 dataset because the HIS files are not currently available, and because the official PHA run also uses a COOP temperature database that also has yet to be released.

December 3, 2010

Aqua ch5 Daily vs UAH Monthly

Filed under: Uncategorized — troyca @ 7:50 am

I probably should’ve looked at this before placing my bet, or giving my update.  However, here’s a graph showing the average daily anomalies for Aqua CH5 from the AMSU website for a month vs. the reported UAH anomaly for that month.

This starts in August 2002, since that is the earliest data available on the website for the daily anomalies.  For anyone curious, the r-squared value is 0.80, with the linear trend equation of UAH Anomaly = 0.9283 * AQUA + 0.1854

So we’re looking at about 0.18 C warmer anomaly report from UAH than the average daily anomalies, if you’re using those anomalies to guess the upcoming temperature report.

Update (12/20)

The daily anomaly averages were calculated relative to the “Average” line available at the AMSU website.  Given the difference we generally see between these daily anomalies and the reported UAH anomaly, it’s easy to think that they are just being reported relative to a different baseline.

However, according to Roy Spencer here , the average line is indeed the 1979-1998 average.  The UAH dataset here also reports relative to 1979-1998, as can be confirmed by summing those anomalies (and getting 0).

To dispell the notion that the “Average” line on the AMSU website is simply the 2002-2009 average, here is the difference:

December 1, 2010

F52 vs TOB in the New UHI Model

Filed under: Uncategorized — troyca @ 9:53 pm

This is a follow-up on my last post, which found what appears to be a fairly robust UHI signal in the TOB dataset from USHCNv2.  The same thing cannot be found in the F52 dataset.  Here I look into a few things to see whether the signal in the TOB dataset may be the result of bias from changes to station location and instrumentation.

The code is the same as that last post, and can be found here.  The results of the new tests that form the data for this post can be found here.

The first test here, as Zeke suggested, is to look at CRS stations only, since those likely did not undergo any instrument change during our time period.  I did this by simply subbing in the ushcn-crs-stations.txt as my stations file from USHCNv2.

Ultimately, this  gave me 62 stations with both valid temperature data (>=7 years per decade) and NHGIS data for my two variables.  The results of method #2 (see last post for more info) were [Income Coefficient=7.752, Agriculture Coefficient= -2.235, Correl= 0.284], although by only using CRS-temperature data the dtAdj is probably not very effective because of limited data per grid cell.  Here are the results of the pairwise comparison:

On the one hand, it doesn’t look like we get all that much different of an effect.  But there is much less data available, and the coefficient for income drops off significantly at that 400 km threshold.

For MMTS stations, we have 296 stations available.  Results of method #2 are: [Income Coefficient=3.548, Agriculture Coefficient= -0.767, Correl= 0.14] .  And our pairwise comparison test:

It may be possible that the results are not simply the result of bias due to station changes, if we consider the breakdown of the stations that actually contain NHGIS data.  I use the station inventory with current population data appended produced by Ron Broberg here to help create the graph below.

Note that in the subset of stations with NHGIS data available, there is a much smaller percentage of stations that might be categorized as “rural”.  Since this class of stations tends to correlate with station changes, we have fewer involved that might require these type of corrections from TOB to F52.

Another thing to look at is the differing trends of just the CRS stations between those that only have temperature data vs. those that we use in our tests because they have NHGIS data as well:

The last test I attempted was to see which stations had their trends change the least between TOB and F52.  I then included only stations whose trend changed by less than 0.1 degree C per decade in my analysis.

The result left me with only 114 stations that had both NHGIS and valid temperature data.  Here are the results of that comparison:

Here, we note a significantly smaller effect based on the income coefficient.  However, one thing worth noting is the difference in trends according to my temp anomaly calculator when only using these 114 stations.

The annual trend for TOB calculated in the last post was 0.0276.  With only those 114 stations, it is 0.0275 here.  So minimal change.  On the other hand, F52 using all our stations with valid data yields an 0.031 trend, whereas only those stations with minimal change from TOB yields only an 0.028 trend.

There still quite a bit to investigate…

7.752 -2.235 0.284 Num Stations = 62

Blog at