Saturday, November 21, 2009

One CRU Email...

This morning, after my tea took effect and jogged my brain cells into something approaching operational capability, I decided to do a little original research into the 160 megabytes of data illegally hacked from the CRU.  My first attempt was to search for any files containing “FOIA” or “withhold”.  FOIA stands for Freedom of Information Act, the legislation that gives Americans the right to request copies of any information created with public funds (as nearly all climate research in the U.S. is).  I chose this first search because I wondered if there was interesting material in this collection related to the thing that has bothered me most about the climate research: the secrecy and opacity of the data used to drive the climate models.
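For anyone who wants to repeat a search like this, here's a minimal sketch in Python.  It assumes the archive has been unpacked into a directory of plain-text files; the function name and the directory name in the usage note are illustrative assumptions, not the tool actually used for the search described above.

```python
import os


def find_files_containing(root, terms):
    """Return sorted paths of files under `root` whose text contains
    any of `terms` (case-insensitive substring match)."""
    lowered = [term.lower() for term in terms]
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # errors="replace" keeps odd bytes from aborting the search
                with open(path, "r", encoding="utf-8", errors="replace") as fh:
                    text = fh.read().lower()
            except OSError:
                # Skip files that cannot be opened or read
                continue
            if any(term in text for term in lowered):
                matches.append(path)
    return sorted(matches)
```

Calling `find_files_containing("mail", ["FOIA", "withhold"])` (where "mail" is a hypothetical directory name) would list every file mentioning either term.  A plain substring match is crude, since "withhold" also matches "withholding", but for a first pass that is probably what you want.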

One of the bedrock principles of the scientific method is the notion of reproducibility.  Any scientist proposing something new must provide the data required to let other scientists reproduce his or her results.  For example, if Alvin the physicist claimed to have found a way to create desktop fusion, he would be obligated to provide all the information necessary for other physicists to reproduce – and thus verify or debunk – his work and his results.

Something that's bothered me from the beginning about the global warming proponents – the scientists, I mean – is that they have not published the data underlying their results.  They have persistently refused to do so, even when many credible arguments questioning their data sources exist.  To pick on just one example of this, the climate models all take as one of their inputs a series of surface temperature data.  This is an example of an important piece of data that must be disclosed to make the results reproducible.

These data are not simple records of a thermometer somewhere, as those thermometers did not exist over the entire time period of the data.  Instead, they are a complex amalgam of many sources of data, including both direct data (e.g., thermometer readings) and proxy data (e.g., the width of tree rings).  These data are a real challenge to try to distill into a single, simple series of temperatures over time.  There are many reasons: each set of contributing data covers different time periods, they have different granularity (there may be hourly thermometer readings, but tree rings can at best give you a guess at an annual average temperature), there may be missing readings, some readings may be anomalous for reasons that can't be determined now, different sets of data may have conflicting results, etc., etc.

The IPCC report was criticized by many, in my view completely rightfully, for not disclosing the means by which the temperature series data set was created from the underlying data.  The derived temperature series data was made available to all, and more recently (because of a FOIA request) the underlying data was made available – but the methods used in distilling the underlying data into the temperature series data have never been disclosed.  The scientists involved did their level best to treat those methods as proprietary information that need not be disclosed, and they largely succeeded.  The critics and skeptics have looked particularly hard at this temperature series data because, frankly, it looks suspicious – there are many, many places where it simply looks implausible.

So, with that context, here's one of the emails I found in my search this morning.  I have bolded some of the passages I found particularly interesting and put a footnote number in, but otherwise this is an unedited copy of the email found in file 1231257056.txt:
From: Stephen H Schneider
To: santer1@llnl.gov
Subject: Re: [Fwd: data request]
Date: Tue, 6 Jan 2009 10:50:56 -0800 (PST)
Cc: "David C. Bader" , Bill Goldstein , Pat Berge , Cherry Murray , George Miller , Anjuli Bamzai , Tomas Diaz De La Rubia , Doug Rotman , Peter Thorne , Leopold Haimberger , Karl Taylor , Tom Wigley , John Lanzante , Susan Solomon , Melissa Free , peter gleckler , "Philip D. Jones" , Thomas R Karl , Steve Klein , carl mears , Doug Nychka , Gavin Schmidt , Steven Sherwood , Frank Wentz

"Thanks" Ben for this, hi all and happy new year. I had a similar experience--but not FOIA since we at Climatic Change are a private institution--with Stephen McIntyre demanding that I have the Mann et al cohort publish all their computer codes for papers published in Climatic Change.  I put the question to the editorial board who debated it for weeks. The vast majority opinion was that scientists should give enough information on their data sources and methods so others who are scientifically capable can do their own brand of replication work, but that this does not extend to personal computer codes with all their undocumented sub routines etc. It would be odious requirement to have scientists document every line of code so outsiders could then just apply them instantly1. Not only is this an intellectual property issue, but it would dramatically reduce our productivity since we are not in the business of producing software products for general consumption and have no resources to do so. The NSF, which funded the studies I published, concurred--so that ended that issue with Climatic Change at the time a few years ago.

This continuing pattern of harassment, as Ben rightly puts it in my opinion, in the name of due diligence is in my view an attempt to create a fishing expedition to find minor glitches or unexplained bits of code--which exist in nearly all our kinds of complex work--and then assert that the entire result is thus suspect2. Our best way to deal with this issue of replication is to have multiple independent author teams, with their own codes and data sets, publishing independent work on the same topics--like has been done on the "hockey stick". That is how credible scientific replication should proceed.

Let the lawyers figure this out, but be sure that, like Ben is doing now, you disclose the maximum reasonable amount of information so competent scientists can do replication work, but short of publishing undocumented personalized codes etc. The end of the email Ben attached shows their intent--to discredit papers so they have no "evidentiary value in public policy"--what you resort to when you can't win the intellectual battle scientifically at IPCC or NAS.
 Good luck with this, and expect more of it as we get closer to international climate policy actions, We are witnessing the "contrarian battle of the bulge" now, and expect that all weapons will be used.
   Cheers, Steve
PS Please do not copy or forward this email3.

Stephen H. Schneider
Melvin and Joan Lane Professor for Interdisciplinary Environmental Studies,
Professor, Department of Biology and
Senior Fellow, Woods Institute for the Environment
Mailing address:
Yang & Yamazaki Environment & Energy Building - MC 4205
473 Via Ortega
Ph: 650 725 9978
F:  650 725 4387
Websites:  climatechange.net
           patientfromhell.org


----- Original Message -----
From: "Ben Santer"
To: "Peter Thorne" , "Leopold Haimberger" , "Karl Taylor" , "Tom Wigley" , "John Lanzante" , "Susan Solomon" , "Melissa Free" , "peter gleckler" , "Philip D. Jones" , "Thomas R Karl" , "Steve Klein" , "carl mears" , "Doug Nychka" , "Gavin Schmidt" , "Steven Sherwood" , "Frank Wentz"
Cc: "David C. Bader" , "Bill Goldstein" , "Pat Berge" , "Cherry Murray" , "George Miller" , "Anjuli Bamzai" , "Tomas Diaz De La Rubia" , "Doug Rotman"
Sent: Tuesday, January 6, 2009 9:23:41 AM GMT -08:00 US/Canada Pacific
Subject: [Fwd: data request]

Dear coauthors of the Santer et al. International Journal of Climatology paper (and other interested parties),

I am forwarding an email I received this morning from a Mr. Geoff Smith.  The email concerns the climate model data used in our recently-published International Journal of Climatology (IJoC) paper. Mr. Smith has requested that I provide him with these climate model datasets. This request has been made to Dr. Anna Palmisano at DOE Headquarters and to Dr. George Miller, the Director of Lawrence Livermore National Laboratory.

I have spent the last two months of my scientific career dealing with multiple requests for these model datasets under the U.S. Freedom of Information Act (FOIA). I have been able to do little or no productive research during this time. This is of deep concern to me.

From the beginning, my position on this matter has been clear and consistent. The primary climate model data used in our IJoC paper are part of the so-called "CMIP-3" (Coupled Model Intercomparison Project) archive at LLNL, and are freely available to any scientific researcher. The primary observational (satellite and radiosonde) datasets used in
our IJoC paper are also freely available. The algorithms used for calculating "synthetic" Microwave Sounding Unit (MSU) temperatures from climate model data (to facilitate comparison with actual satellite temperatures) have been documented in several peer-reviewed publications. The bottom line is that any interested scientist has all the scientific information necessary to replicate the calculations performed in our IJoC paper, and to check whether the conclusions reached in that paper were sound.

Neither Mr. Smith nor Mr. Stephen McIntyre (Mr. McIntyre is the initiator of the FOIA requests to the U.S. DOE and NOAA, and the operator of the "ClimateAudit.com" blog) is interested in full replication of our calculations, starting from the primary climate model and observational data. Instead, they are demanding the value-added quantities we have derived from the primary datasets (i.e., the synthetic MSU temperatures).

I would like a clear ruling from DOE lawyers - ideally from both the NNSA and DOE Office of Science branches - on the legality of such data requests. They are troubling, for a number of reasons.

1. In my considered opinion, a very dangerous precedent is set if any derived quantity that we have calculated from primary data is subject to FOIA requests4. At LLNL's Program for Climate Model Diagnosis and Intercomparison (PCMDI), we have devoted years of effort to the calculation of derived quantities from climate model output. These derived quantities include synthetic MSU temperatures, ocean heat
content changes, and so-called "cloud simulator" products suitable for comparison with actual satellite-based estimates of cloud type, altitude, and frequency. The intellectual investment in such calculations is substantial.

2. Mr. Smith asserts that "there is no valid intellectual property justification for withholding this data". I believe this argument is incorrect. The synthetic MSU temperatures used in our IJoC paper - and the other examples of derived datasets mentioned above - are integral components of both PCMDI's ongoing research, and of proposals we have submitted to funding agencies (DOE, NOAA, and NASA). Can any competitor simply request such datasets via the U.S. FOIA, before we have completed full scientific analysis of these datasets?

3. There is a real danger that such FOIA requests could (and are already) being used as a tool for harassing scientists rather than for valid scientific discovery. Mr. McIntyre's FOIA requests to DOE and NOAA are but the latest in a series of such requests. In the past, Mr. McIntyre has targeted scientists at Penn State University, the U.K. Climatic Research Unit, and the National Climatic Data Center in
Asheville. Now he is focusing his attention on me. The common denominator is that Mr. McIntyre's attention is directed towards studies claiming to show evidence of large-scale surface warming, and/or a prominent human "fingerprint" in that warming. These serial FOIA requests interfere with our ability to do our job.

Mr. Smith's email mentions the Royal Meteorological Society's data archiving policies (the Royal Meteorological Society are the publishers of the International Journal of Climatology). Recently, Prof. Glenn McGregor (the Chief Editor of the IJoC) provided Mr. McIntyre with the following clarification:

"In response to your question about data policy my position as Chief Editor is that the above paper has been subject to strict peer review, supporting information has been provided by the authors in good faith which is accessible online (attached FYI) and the original data from which temperature trends were calculated are freely available. It is not
the policy of the International Journal of Climatology to require that data sets used in analyses be made available as a condition of publication."

As many of you may know, I have decided to publicly release the synthetic MSU temperatures that were the subject of Mr. McIntyre's FOIA request (together with additional synthetic MSU temperatures which were not requested by Mr. McIntyre). These datasets have been through internal review and release procedures, and will be published shortly on PCMDI's website, together with a technical document which describes how synthetic MSU temperatures were calculated. I agreed to this publication process primarily because I want to spend the next few years of my career doing research. I have no desire to be "taken out" as scientist, and to be involved in years of litigation.

The public release of the MSU data used in our IJoC paper may or may not resolve these problems. If Mr. McIntyre's past performance is a guide to the future, further FOIA requests will follow. I would like to know that I have the full support of LLNL management and the U.S. Dept. of Energy
in dealing with these unwarranted and intrusive requests.

I do not intend to reply to Mr. Smith's email.

Sincerely,

Ben Santer
----------------------------------------------------------------------------
Benjamin D. Santer
Program for Climate Model Diagnosis and Intercomparison
Lawrence Livermore National Laboratory
P.O. Box 808, Mail Stop L-103
Livermore, CA 94550, U.S.A.
Tel:   (925) 422-3840
FAX:   (925) 422-7675
email: santer1@llnl.gov
----------------------------------------------------------------------------

Stephen Schneider is a professor at Stanford University, and one of the more prominent and visible scientists who are proponents of anthropogenic global warming (though he wasn't always so).  Most of his work is publicly funded.  Ben Santer is an equally prominent climate researcher and key contributor to the IPCC report, working at Lawrence Livermore National Laboratory (100% public funding).

Footnotes:

1.  Schneider is arguing here that the scientists should be able to keep their code secret.  But the code is exactly where the methods used to distill the underlying data into the temperature series are detailed – so he is rather directly arguing that this key part of their research should remain secret.  In my opinion that is a priori anti-science.  What makes this even more egregious is that these methods were developed with public money, and they are therefore the public's property!

2.  Yes, Dr. Schneider – we harassers have the unmitigated gall to suggest that if the foundation of your work is suspect, then so is the rest of your work.  Sheesh!  This is not the kind of thing that someone seeking the truth would say, but it is most definitely the kind of thing someone with a vested interest in the status quo would say.

3.  Why not copy or forward this email, Dr. Schneider?  Are you ashamed of your attitude?  You dang well ought to be!

4.  This is a fascinating and revelatory statement for me.  Here Dr. Santer is making the case that the data in question is his (or his team's), and should not be publicly accessible.  What an interesting attitude for someone who works for me (and all the other taxpayers) to take! It's as if I went to my boss at work and said “You know all that code I wrote last year, that you paid me to write?  Well, it's mine, I tell you!  All mine!”  I'm pretty sure I know where that attitude would get me...

2 comments:

  1. If the original data and the derived data is made available, as well as the algorithm to produce derived data from original data, then there is no need to make the code available. There's sufficient information available for reproducibility.

    What Schneider is saying is that "competent scientists" can reproduce the work, and that's correct. Sufficient information is available for them to do that. But lay people whose only intention is to discredit work rather than do any research themselves cannot pick on the STYLE of the code, rather than whether or not the code is correct.

    It's a very sensible approach - it satisfies freedom of information while protecting against time wasters.

    And why does he not want to distribute the email? Because people will pick apart the email, further wasting his time. Santer said he has spent a full two months dealing with FOIA requests, rather than doing any actual research. I'm sure Schneider doesn't want that happening to him too.

    Regarding your point #4, he actually did release the data didn't he? He also explains why he considers it an intellectual property issue - because the data is used in funding requests. He's not saying the data is his, but that he needs protection from the data being used by other scientists to obtain funding.

    PS I am both a researcher and a programmer, so I also understand how much work it is to document and verify code, and how much work it is to present data for public consumption. So I feel empathy with Santer and Schneider here :)

  2. Code should always be made available for peer review. It can contain errors like any other part of the researcher's work.

    Brian's comment about code secrecy and the work involved in it does not hold. Research work, writing etc. also need much work. So should we make the whole research work secret just because it involves much work? (You know the answer yourselves...)
