
From the Editors: Reproduced and Reproducible Results


Pages: pp. 3-4

The punchline of this message is that computation changes everything, high-speed computation changes everything quickly, and lots of people doing lots of high-speed computation changes everything a lot.

In the past few years, many excellent books describing virtually all aspects of the Manhattan Project have appeared: its genesis and development, follow-on projects, and the ongoing consequences for the people involved, the organization of big science, and the global political situation. Several reasons explain the recent interest: this year is the 60th anniversary of the Los Alamos Lab; we're living in very unsettled times, with topics like nuclear weapons and the Lab's security very much in the news; and there is general concern about security and espionage. And there are probably other, deeper reasons of which I'm unaware.

It's slightly arbitrary, but we can take as a starting point of this recent wave of writings two remarkable works by Richard Rhodes: The Making of the Atomic Bomb (Touchstone, 1995) and Dark Sun (Touchstone, 1996). Once Rhodes started the trend, many other authors followed: S.S. Schweber's In the Shadow of the Bomb: Oppenheimer, Bethe, and the Moral Responsibility of the Scientist (Princeton Univ. Press, 2000), Gregg Herken's Brotherhood of the Bomb (Henry Holt & Company, 2002), Robert S. Norris's biography of General Groves, Racing for the Bomb (Steerforth, 2003), and, most recently, Brian VanDeMark's Pandora's Keepers (Little, Brown, & Company, 2003).

Like the other books on this list, Pandora's Keepers is a scholarly work that is simultaneously well researched and amazingly readable. Because it is scholarly, it contains many pointers to other writings, including nearly all the books mentioned above. Unfortunately, it also contains some 30 passages that appear almost verbatim in the earlier works. I, for one, very much doubt that the copying was intended. But from a legal viewpoint, this could be plagiarism even if it was unintentional, so the passages will be removed, and the book will be reissued as a paperback only.

This episode got a surprising amount of attention in the national press. In fact, the New York Times Sunday Book Review even had an editorial about it. Interestingly, it brings to mind several things relevant to computational science: the difference between experts and laymen, the nature and definition of originality, the distinction between verifying and borrowing, and how these notions apply to computation and the results of computation. On the subject of experts and laymen, I'll mention that I'd read all the other books on the list before reading Pandora's Keepers. Although it was obvious that it covered much of the same ground, I certainly didn't notice any direct borrowing. But reading the other books is nothing like writing them, and the other books' authors noticed the duplications right away. They were expert historians; I am a layman.

Part of the difficulty with definitions of "copying" is that some kinds of borrowing are considered okay in literature and certain kinds are okay in science. In literature, borrowing plot ideas is common. The plot of King Lear shines through Jane Smiley's A Thousand Acres and Joan Didion's Run River; West Side Story takes its plot from Romeo and Juliet. One of the most interesting plot histories is that of Hamlet. According to Giorgio de Santillana and Hertha von Dechend (in their book Hamlet's Mill), the story goes back from Shakespeare's Hamlet; to Amleth of Denmark; to Livy's account of Lucius Junius Brutus in Rome; to the Kalevala, the national epic of Finland; to Kai Khusrau in Firdausi's Shahnama (the Book of Kings), the national epic of Iran; all the way to Yudhishthira in India's ancient epic, The Mahabharata. Maybe the old saying that there are only four basic plots is true!

With one important difference, the situation is the same in science as it is in literature. Experts notice things that laymen would miss. Straight copying is definitely frowned upon, but certain themes occur again and again. In computing, borrowing and reusing code, for example by building on open-source software, is even considered good and desirable. The one important difference between science and almost anything else is that in science we want and, in fact, must have repeatability. Experimental results that can't be repeated in other labs are suspect, and computations that can't be reproduced on other machines are just about useless.

As it happens, generating nonrepeatable results via high-speed computation is alarmingly easy. Before IEEE arithmetic became essentially standard, the iteration $x_{k+1} = 2x_k \bmod 1$ gave different results on different machines. These days, arithmetic is better understood, but many scientific computations increasingly have an iteration like the one just described as their innermost loop, simply because Monte Carlo is used and a random number generator must be run many, many times. Such calculations may spread over many machines, some running Java numerics, some C, some C++, and some Fortran. Different locations on Earth have different rates of occurrence of soft errors, and different processors have different clocks. If the Monte Carlo computations are aimed at sampling from a well-defined limit distribution, most of this is irrelevant. However, when we use dynamic Monte Carlo to determine time-dependent behavior, the situation is much more delicate. It looks like we're going to need a new, more flexible definition of reproducibility.
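
To make the fragility of the doubling iteration concrete, here is a small Python sketch. It is an illustration only, not a reconstruction of any historical machine's arithmetic. It runs the same map from a starting value of 1/10 twice: once in IEEE double precision and once in exact rational arithmetic. The function name `doubling_map`, the starting value, and the step count are all assumptions chosen for the example.

```python
from fractions import Fraction


def doubling_map(x, steps):
    """Iterate x -> 2*x mod 1 and return the whole orbit, x_0 included."""
    orbit = [x]
    for _ in range(steps):
        x = (2 * x) % 1
        orbit.append(x)
    return orbit


# Same iteration, same nominal starting point, two kinds of arithmetic.
float_orbit = doubling_map(0.1, 60)              # IEEE double precision
exact_orbit = doubling_map(Fraction(1, 10), 60)  # exact rationals

if __name__ == "__main__":
    # The double-precision orbit hits exactly 0 and stays there.
    print("double precision reaches 0:", 0.0 in float_orbit)
    # The exact orbit never reaches 0; it keeps cycling.
    print("exact arithmetic reaches 0:", 0 in exact_orbit)
    print("exact orbit, first steps:", [str(q) for q in exact_orbit[:6]])
```

In binary floating point each step shifts one bit out of the significand, so the double-precision orbit collapses to exactly 0 once the stored bits are exhausted, while the exact orbit cycles through 1/5, 2/5, 4/5, 3/5 forever. A machine carrying a different number of bits would represent 1/10 differently and so would depart from the exact orbit at a different step.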

As I was saying, computation changes everything...
