Since my analysis has brought me to this point, I thought I might as well share my experience running the USHCNv2 software (specifically the PHA) that produces the F52 output.
Overall, I stuck primarily to the quickstart guide, and the process is documented very well. There were a couple of issues that were exacerbated by my lack of familiarity with Linux, so hopefully this description will help others going along this path.
Getting Started
Since I use a Windows machine (I know, I know) I decided to simply try compiling and running the software in a virtual machine. Both VMware Player and an Ubuntu 10.10 ISO image are free (although you do need to sign up for an account with VMware), so I went with those options.
There were three packages that I needed to install (via sudo apt-get install) on this Ubuntu machine in order to run the PHA: csh (since only bash is included by default), gawk, and g77 (for Fortran compilation, since I wanted to use the same compiler they recommended).
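The first two are available straight from the default repositories:
# install the C shell and GNU awk from the standard Ubuntu repositories
sudo apt-get install csh gawk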
Getting g77 installed was a little tricky, since apparently it is not included with (or compatible with) the latest version of gcc, nor could it be found within the normal packages in sources.list. This forum post contains a tip on how to get it, but I’ll reproduce the steps here (use caution, since I understand this reverts your version of gcc as well):
- Ensure you have aptitude (sudo apt-get install aptitude)
- Edit sources.list to include the following references at the bottom (sudo gedit /etc/apt/sources.list):
deb http://us.archive.ubuntu.com/ubuntu/ hardy universe
deb-src http://us.archive.ubuntu.com/ubuntu/ hardy universe
deb http://us.archive.ubuntu.com/ubuntu/ hardy-updates universe
deb-src http://us.archive.ubuntu.com/ubuntu/ hardy-updates universe
- Update the aptitude package references and install g77 (sudo aptitude update, then sudo aptitude install g77)
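Putting those steps together, the whole sequence looks like this (the four hardy lines above go at the end of /etc/apt/sources.list):
# install aptitude if it is not already present
sudo apt-get install aptitude
# append the four hardy universe lines above to the end of sources.list
sudo gedit /etc/apt/sources.list
# refresh the package lists and install the old Fortran compiler
sudo aptitude update
sudo aptitude install g77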
Preparing Inputs
Now, one of the irritating things about this process is that the INPUT files (temperature and history) are expected per station, rather than combined into one large file with all the stations, like those we get from the USHCN v2 temperature site. Furthermore, the .HIS files seem to be in a very different format from the one station.history file I found here, and I could not find where the .HIS files used to generate the F52 dataset might be located.
Nor could I find any quick way to generate these HIS and individual station temperature files. Thus, I wrote a Java app that takes arguments for the USHCNv2 temperature file (such as TOB), the station history file, and the station meta-file (e.g. ushcn-v2-stations.txt), and outputs the collection of HIS and temperature files per station, along with a station meta-file that has all stations without temperature data removed. Right now I only export the temperature files with the suffix "_avg.raw", although this could easily be changed to min/max and tob if desired.
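For what it's worth, the temperature-file half of that splitting step can be approximated without the Java app. The gawk one-liner below is only a rough sketch of the idea: it assumes each record of the combined file begins with the 6-character station ID (check the USHCNv2 readme for the exact layout), and ushcn_tob.avg is a placeholder name for whatever combined file you downloaded.
# write each record to a per-station "<id>_avg.raw" file, keyed on the
# 6-character station ID assumed to start each line
gawk '{ id = substr($0, 1, 6); print > (id "_avg.raw") }' ushcn_tob.avg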
I'm still not sure if these HIS files are in the correct format. There seems to be some discrepancy between the only sample HIS file provided ("011084.his") and the description of the format of this HIS file (which, according to "USHCN.v2.0.20090806.doc", is "SHF_tob_inst.txt"). Furthermore, the station.history file does not seem to include all the information that can possibly be entered in a HIS file, so my Java app leaves some fields blank.
Compiling and Running
From here, most things ran pretty well following the quickstart guide. However, there were three script files that seemed to have minor bugs:
gen_corr.sh, gen_homog.sh, and gen_fillin.sh
All three of these have the same logic for processing the input type argument:
if [ $elem == "max" ]
then
  export iel=1
  export gel="tmax"
  echo "MAX"
elif [ $elem == "min" ]
then
  export iel=2
  export gel="tmin"
  echo "MIN"
elif [ $elem == "pcp" ]
then
  export iel=4
  export gel="prcp"
  echo "PCP"
else
  echo "Unknown element: $elem"
  exit
fi
The shell interpreter does not seem to like the C/Java-style "==" and complains about the argument; removing one of the '=' characters (the POSIX test syntax) does the trick. Furthermore, the logic is missing an entry for the "avg" choice, so you'll need to add something like:
elif [ $elem = "avg" ]
then
  export iel=3
  export gel="tavg"
  echo "AVG"
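One other small habit worth mentioning (my own preference, not something the quickstart requires): quoting the variable keeps the test from throwing a syntax error if the script is invoked without an element argument, for example:
# with quotes, an empty $elem still yields a valid (false) test
# instead of a shell syntax error
if [ "$elem" = "avg" ]; then echo "AVG"; fi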
Finally, I made the mistake of originally allocating only 512 MB of RAM to the VM. This meant any attempt to run the PHA on a real-world scenario was swiftly met by the OOM killer. I tried fiddling with the .incl files to lower the array sizes at first, which led to a whole host of errors. In the end, simply increasing the RAM for the VM got it to run OK, although it took a couple of hours.
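If you want to double-check how much memory the guest actually sees before kicking off a long run, free is enough:
# show the VM's total, used, and free memory in megabytes
free -m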
Results
My original goal was simply to see if I could go from the TOB AVG dataset to the F52 AVG dataset. If you want to see the result of this test run, it is included in the package here. A quick diff, however, revealed a number of differences. This was further confirmed when I compared results for the first station, shown here for the USHCNv2 TOB.avg file, the USHCNv2 F52.avg file, and my test run using the USHCNv2 TOB.avg file as input:
As you can see, there is quite a difference. I’ll need to read up and run a few more tests to figure out why this is happening, but there are quite a few possibilities:
- I tried to go from TOB -> AVG, but I passed "raw" as the parameter to the scripts. My initial thought was that this simply indicated which file extension to look for, but perhaps extra processing is done as well.
- I don’t have the actual station HIS files for input.
- I don’t know exactly what values were set for various INCL parameters, such as “neighborhood” stations and the like.
- Perhaps the F52 is formed by averaging F52_max and F52_min, which would mean running the algorithm on MAX and MIN separately?
As I said, this takes several hours to run on my machine, so it’s not easy to quickly test where I might have gone wrong.
Not all is lost, however. For the period from 1970 to 2000, my average anomaly calculations for the U.S. (using a 2.5×3.5 gridded average) look almost exactly the same whether I use the F52 dataset downloaded from USHCNv2 or the F52 created from my run of the PHA (r-squared = 0.9998). Furthermore, both match up rather well with the GISS anomalies over that period:
The GISS slope is 0.288 C/decade, the slope of my average calculated from the USHCNv2 F52 is 0.298 C/decade, and the slope of my average calculated from my generated F52 is 0.292 C/decade. Compare those to the 0.257 C/decade I get from the input TOB dataset, and it is clear that the PHA run did indeed bring the trend more in line with the F52 we get from the USHCN website.
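(As an aside, combining 2.5×3.5 grid-cell anomalies into a single average is typically done by weighting each cell by the cosine of its latitude. A rough gawk sketch of that weighting is below; gridded_anomalies.txt is a hypothetical two-column file of cell latitude and cell anomaly, not a file produced by the PHA package, and the snippet only illustrates the general idea rather than my exact calculation.)
# weight each grid cell's anomaly by cos(latitude) and print the weighted mean
gawk '{ w = cos($1 * 3.14159265 / 180); s += w * $2; t += w } END { print s / t }' gridded_anomalies.txt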
For my purposes this may be enough, and I may proceed with my analysis a bit more before coming back to some of these unanswered questions.
Update 1/3: Many of these lingering questions have been answered in a subsequent post here. To summarize, I won't be able to reproduce the F52 dataset because the HIS files are not currently available, and because the official PHA run also uses a COOP temperature database that has yet to be released.
Hello;
I’m curious about your impression about the feasibility of testing the PHA portion with synthetic databases.
I’m certainly not a statistician, but I think it would be important to run test cases to examine and elucidate the behaviours of the PHA code. I wonder things like, ‘Are all records equally persuasive in their influence on the outcome?’ or ‘are certain data-shapes preferred?’, etc.
I expect that statisticians would apply more sophisticated tests to validate the PHA approach to transformation of data.
This seems to be a pivotal question that has not been resolved, because the arcane implementation details are an effective impediment to external ‘validation’ exercises.
I guess that, for credibility, a ‘forensic tester’ would need to qualify themselves by reproducing the NASA results perfectly.
Also, how would you assess the feasibility of testing the PHA ‘module’ in a standalone situation?
Is it inextricably woven into the GISS tapestry?
You seem to be among the very few folks that have even begun to attempt this challenging work.
Best,
RR
Comment by Ruhroh — December 28, 2010 @ 10:40 am
Hi Ruhroh,
The PHA can definitely be tested in a stand-alone situation. It produces the F52 dataset available in USHCNv2 (although you must merge the output files to get 1 file), and the only way this is linked to GISS is that GISS uses the F52 file as input. The adjusted files can be looked at independently of GISS. However, if you wanted to get an idea of how certain changes affect the trend, you can use a spatial averaging algorithm like the one I’ve included with this post.
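(The merge itself is just a concatenation of the per-station output files; something like the line below works, though the directory and suffix here are placeholders that depend on how your run is configured.)
# combine the per-station adjusted output files into a single file
# (path and suffix are placeholders; use whatever your run produced)
cat output_directory/*_avg.* > my_F52_avg.txt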
It can also be tested with synthetic datasets. A few of these tests are performed in ftp://ftp.ncdc.noaa.gov/pub/data/ushcn/v2/monthly/menne-williams2009.pdf
It would certainly be interesting to create a few different UHI “infected” datasets and see to what degree the PHA is able to eliminate the effect.
Comment by troyca — December 28, 2010 @ 11:18 am
[…] I suggested in the original post documenting my experience, one of my goals was to reproduce the USHCNv2 F52 dataset using the […]
Pingback by More on running the USHCNv2 PHA (Response from Dr. Menne) « Troy's Scratchpad — January 3, 2011 @ 12:09 am
[…] now that I’m up a running with the USHCNv2 Pairwise Homogeneity Algorithm, I’ve begun my very amateur process of “testing” it using synthetic datasets. […]
Pingback by Testing the PHA with synthetic data – Part 1 « Troy's Scratchpad — January 14, 2011 @ 8:56 pm