Issue No. 04 - July/August (2007 vol. 9)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/MCSE.2007.79
The First and Last Words
Francis Sullivan's thoughts on programming errors ("Wrong Again!" vol. 9, no. 3, p. 96) speculate that formal verification of computer programs is unlikely. It's worth noting that Bertrand Meyer's theory of "design by contract" (DBC) was inspired by the work of theorists in formal proofs of programs. But because he needed something that actually helped him get programs right, rather than something that produced papers in journals, he came up with a very workable approach. DBC consists of a language for assertions that can be optionally checked at runtime, backed with a theory about DBC's relationship to inheritance and user-supplier relationships. In his own language, Eiffel, Meyer was able to integrate this approach fully. In other languages, only a poor imitation can be achieved, and yet that poor imitation is often very powerful. I have found it by far the most effective approach to correctness for scientific computing.
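To give a flavor of what such an imitation can look like outside Eiffel, here is a minimal Python sketch of assertions that can be optionally checked at runtime. The names `contract` and `checked_sqrt` are hypothetical and come from no library, and the sketch covers only preconditions and postconditions, not the inheritance rules that Eiffel enforces.

```python
import functools
import math


def contract(pre=None, post=None, enabled=True):
    """Attach an optional precondition and postcondition to a function.

    Hypothetical helper for illustration only. `pre` receives the call
    arguments; `post` receives the result followed by the call arguments.
    """
    def decorate(func):
        if not enabled:
            return func  # checking switched off: the routine runs unchecked

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # The caller's obligation: arguments must satisfy the precondition.
            if pre is not None and not pre(*args, **kwargs):
                raise AssertionError(f"precondition of {func.__name__} violated")
            result = func(*args, **kwargs)
            # The supplier's obligation: the result must satisfy the postcondition.
            if post is not None and not post(result, *args, **kwargs):
                raise AssertionError(f"postcondition of {func.__name__} violated")
            return result

        return wrapper

    return decorate


@contract(
    pre=lambda x: x >= 0.0,
    post=lambda r, x: r >= 0.0 and math.isclose(r * r, x, rel_tol=1e-9, abs_tol=1e-12),
)
def checked_sqrt(x):
    """Square root whose contract is checked on every call."""
    return math.sqrt(x)
```

Calling `checked_sqrt(-1.0)` fails at the precondition, which places the blame on the caller rather than on the routine. Passing `enabled=False` returns the undecorated function, so the checks cost nothing once they are switched off.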
When I helped write the Eiffel mathematics library, EiffelMath, which encapsulated the NAG C library for use in Eiffel, I was amazed at the effectiveness of the approach. I subsequently used the ideas in C++ projects to great effect. My understanding is that this was planned for Java but omitted due to a rush to vend. If so, that's a real pity. I urge computational scientists to familiarize themselves with this theory.
I also read your First Word ("You're Recommending What?!" vol. 9, no. 3, p. 2), and got a big kick out of it. Today at work, we were discussing the younger members of the team who've decided that they don't want or need a developer's manual but instead want a Wiki. Apparently they can't read off paper any more.
An Internet bridge partner of mine is married to a famous Canadian politician. She's named Julia and has two children, but her Wikipedia entry said she was Joan and had three. I wrote her an email and said maybe there was something going on that she didn't know about; when we all finished laughing, of course, her son knew how to fix it.
I was jokingly going to tell you that the next thing that would happen is videos instead of user manuals, but it already happened. For example, http://showmedo.com/videos/python has more than 90 videos about Python, including one on Django. Some are in German, but that's OK—the kids just look at the pictures.
Time to retire, because I don't get it.
Paul F. Dubois
A Plea for Python
I'm sure I'm not the only one who finds it amusing that in an entire issue devoted to the benefits of Python for scientific computing, the magazine offers a book review of a text that uses four different computer languages, none of which is Python. I'd feel a lot better about adopting a new course in my computing if I could find a good book on scientific computing with Python and its tools and libraries. I've found a few tutorials but nothing substantial. Are any authors rushing such a book into production? How many college campuses are using this for their course work?
When you next visit the Python topic, please list numerical books and college courses in which Python is the basis for performing the computations.
Gauss-Vaníček or Fourier Transform?
From J. John Sepkoski's record of the marine animal genera that appear in the overall fossil record [1], Robert Rohde and Richard Muller [2] extracted a time series of marine animal diversity and analyzed it using the discrete Fourier transform. They found a highly significant 62-million-year periodicity in the series, detrended by a cubic. This time series has 167 data points, and the times aren't uniformly distributed, so to use the Fourier transform, Rohde and Muller assumed marine diversity to be constant between data points and evaluated it at times equally spaced at 0.25 million years to obtain a series of some 2,170 terms. They then fit a cubic polynomial to that series and examined the residuals, that is, the series minus the cubic. Finally, they extended the residual series with zeroes so that the extended series' Fourier power spectrum would have densely distributed frequencies and to enable the use of the fast Fourier transform. Thus, the original series of 167 terms appears as a ghost in the series for which the Fourier power spectrum (FPS) was actually computed.
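The preprocessing steps just described can be sketched in plain Python as follows. This is an illustration under stated assumptions, not Rohde and Muller's code: the function names are hypothetical, the cubic is fit by ordinary least squares on rescaled time, and a direct discrete Fourier transform stands in for the fast Fourier transform to keep the sketch dependency-free.

```python
from bisect import bisect_right
import cmath


def resample_step(times, values, dt=0.25):
    """Hold each value constant until the next sample and evaluate every dt.

    `times` must be sorted in increasing order.
    """
    n = int(round((times[-1] - times[0]) / dt)) + 1
    grid = [times[0] + k * dt for k in range(n)]
    held = [values[max(bisect_right(times, t) - 1, 0)] for t in grid]
    return grid, held


def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def detrend_cubic(grid, y):
    """Subtract the least-squares cubic; time is rescaled to [-1, 1] for stability."""
    mid = 0.5 * (grid[0] + grid[-1])
    half = 0.5 * (grid[-1] - grid[0])
    u = [(t - mid) / half for t in grid]
    normal = [[sum(ui ** (i + j) for ui in u) for j in range(4)] for i in range(4)]
    rhs = [sum(yi * ui ** i for ui, yi in zip(u, y)) for i in range(4)]
    coeff = solve(normal, rhs)
    return [yi - sum(coeff[i] * ui ** i for i in range(4)) for ui, yi in zip(u, y)]


def padded_power(residual, n_padded):
    """Power spectrum of the residual extended with zeroes to n_padded terms.

    Bin k corresponds to a frequency of k / (n_padded * dt) cycles per unit time.
    """
    x = list(residual) + [0.0] * (n_padded - len(residual))
    power = []
    for k in range(n_padded // 2 + 1):
        s = sum(xj * cmath.exp(-2j * cmath.pi * k * j / n_padded)
                for j, xj in enumerate(x))
        power.append(abs(s) ** 2)
    return power
```

Chaining `resample_step`, `detrend_cubic`, and `padded_power` reproduces the order of operations in the text. The direct transform costs time proportional to the square of the padded length, so a real analysis of a series of a few thousand terms would substitute an FFT routine.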
The Gauss-Vaníček power spectrum (GVPS) of a time series measures how well harmonic functions of various frequencies fit the series. It doesn't require data to be equally spaced in time, it can be computed for every frequency without extending the original series, and it is possibly more suitable than the Fourier transform for analyzing time series such as the marine diversity series. In an article that appeared in this magazine last year, Mensur Omerbashich [3] computed the GVPS of the nondetrended marine genera diversity time series and found that no significant 62-million-year periodicity appeared in the unmodified series, reporting instead significant periodicities of 194 and 140 million years.
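The idea of measuring how well a harmonic function fits unevenly spaced data can be sketched in a few lines of Python. This is a bare-bones least-squares spectrum value, offered as an assumption about the simplest form of the method rather than as the GVSA implementation used in either article: the function name is hypothetical, only one sinusoid is fit per trial period, and no trend or other known constituents are fit alongside it.

```python
import math


def ls_spectrum_value(times, values, period):
    """Fraction of the series' sum of squares explained by a least-squares
    fit of a*cos(w*t) + b*sin(w*t) at the given period.

    Returns a number between 0 (no fit) and 1 (exact fit). The sample times
    may be unevenly spaced. The series is used as given, so any mean or
    trend should be dealt with beforehand if that is wanted.
    """
    w = 2.0 * math.pi / period
    c = [math.cos(w * t) for t in times]
    s = [math.sin(w * t) for t in times]

    # Entries of the 2x2 normal equations for the coefficients a and b.
    cc = sum(ci * ci for ci in c)
    ss = sum(si * si for si in s)
    cs = sum(ci * si for ci, si in zip(c, s))
    yc = sum(yi * ci for yi, ci in zip(values, c))
    ys = sum(yi * si for yi, si in zip(values, s))

    det = cc * ss - cs * cs
    a = (yc * ss - ys * cs) / det
    b = (ys * cc - yc * cs) / det

    # Inner product of the data with their projection onto the fitted
    # sinusoid, normalized by the data's own sum of squares.
    explained = a * yc + b * ys
    return explained / sum(yi * yi for yi in values)
```

Evaluating `ls_spectrum_value` over a list of trial periods gives a spectrum directly from the original sample times, with no resampling and no zero padding.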
But in fact, as I show on p. 61 of this issue, the Gauss-Vaníček spectral analysis of the diversity time series detrended by a cubic exactly matches Rohde and Muller's Fourier transform analysis. Moreover, neither the FPS nor the GVPS identifies any significant periodicity in the unmodified diversity time series. Thus, in analyzing the diversity time series, paleontologists need decide only whether to detrend the data: Gauss-Vaníček and Fourier spectral analyses yield identical results.
James L. Cornette
Mensur Omerbashich replies:
Based on his own power spectra (PS) of Sepkoski data detrended by a specific function, James Cornette implies that detrending in general is the modification vital for assessing my paper and the Fourier spectral analysis (FSA), and speaks as if I used Gauss-Vaníček power spectra (GVPS) to draw my periods. Neither applies. First, the detrending is excluded from the Gauss-Vaníček spectral analysis (GVSA) when variance spectra (VS) are used for verification purposes because detrending doesn't comply with the raw data-only physical criterion accompanying the GVSA by definition, and because variance feeds on noise. Noise is the most natural gauge available for this sort of verification, so more noise means clearer VS-PS separation; conventional noise thus becomes part of the signal for verification purposes. Second, the variance gives the most natural description of noise, whereas only VS can measure the signal imprint strength in noise uniquely, so it is Gauss-Vaníček variance spectra (GVVS), not GVPS, that are useful for verifying a spectral analysis (SA) method. Then, to draw periods, I used VS only, with their depicted 99 percent confidence levels. Cornette didn't ask for clarification, although my final printed article had the unresolvable color coding, with blue, red, and brown depicting the GVVS of Rohde-Muller-adapted data, the GVVS of unmodified (depadded and derepeated) and nondetrended data, and the GVVS's 99 percent confidence levels, respectively. I plotted all three entities in black in my original submission, with GVPS in gray. [Eds.: See the original illustration reprinted at the end of this letter.] Demonstrably confused, Cornette then arbitrarily chose to compare GVPS with FPS on the unmodified and modified cubic-detrended data, respectively, equating such an unnatural (indiscriminative in manipulation type) mix of data treatment approaches with my comparison of the Rohde-Muller FPS of manipulated (modified and detrended) data versus the GVVS of non-manipulated (raw) data.
Comparing FPS versus GVPS on the altered data only doesn't constitute a physically independent verification because this merely computes two PS that by definition (regardless of SA method) must react in the same way to the same data alteration (the detrending by a cubic), which they do, as Cornette trivially shows. Note that I was denied preprint access to the result he states is "shown on p. 61," so my response could be incomplete in addressing his claim's technical aspects, namely the meaning of his "exactly" and of any VS.
The objection Cornette attempted to make opens topics that are more fundamental than the issue of whether he understood the GVSA and my paper entirely. Besides making a negative verification of a previous report claiming a new period in a fossil record, my paper showed that the paleontological paradigm, which assumes diversities are completely known (real and static once they occur), is unsound. This frail assumption is what allows researchers to fill sparse records by repeating (here, around 90 percent of) the data until the record is made fully "populated" and thus Fourier-ready. I challenged this paradigm by showing that PS of a manipulated data set can differ significantly from VS of the respective raw data. Cornette, on the other hand, computes FPS and GVPS to the best of the two methods' mathematical abilities (by twice using the cubic-detrended data) but not their natural abilities (by using the manipulated versus raw data). Based on those two PS giving statistically indistinguishable results while differing significantly from my result, he concludes that all one has to decide prior to selecting an SA method is whether one wants to indiscriminately detrend one's data or not. However, there is no reason why PS of modified data should be the same as PS of the respective unmodified non-raw data that were arbitrarily (say, other-than-cubic-) detrended; this would require that the detrending be universal, that is, insensitive to the type of detrender function, which is nonsense. Only if data repetition in inherently sparse paleontological records were tolerably natural (and not just another mathematical trick) would Cornette's "any detrending" concept make sense from the physical viewpoint, that is, beyond any doubt and in all real-life situations regardless of SA method.
If that were the case here, that is, if an unspecified combination of data repetitions and detrending weren't significantly affecting the spectrum, then the claimed period would have remained significant in VS of the non-manipulated Sepkoski record as well, just as the record's longest 194- and 140-million-year-long periods have. It's chiefly because this wasn't the case that the claimed period doesn't seem real, and not just because its VS estimate differs a few percent from the claimed value. Despite this obvious inconsistency and based on his faulty logic (it's unclear how he computes "FPS… in unmodified… time series" when the FSA can't even process gapped records), Cornette proposes we forge the entire approach to SA verification: "compare" the two methods by comparing FPS of somehow-completed and somehow-detrended versus GVPS of unmodified and somehow-detrended data. But two wrongs never add up to one right, so "somehow" and "detrending" become the issue, and all his procedure does is verify not the accuracy of the two methods themselves, but whether GVPS and FPS give precisely the same result when compared using the same vector of real numbers twice. Cornette thus objects to a fact: after applying an alternative method to the fullest of its abilities, I primarily challenged a disputable (method-driven) paradigm of a posteriori static diversities. I didn't examine the effects of the detrending as stand-alone but as applied on an extremely sparse record, a situation likely to contribute significant noise. It's a combination of the detrending and data sparseness that could have influenced the claimed result significantly; Cornette had no alternative ways of checking for such combined effects because VS represent the only means to do that, and he diverged to using PS only.
Because paleontology relies on the paradigm that I challenged, at long last it might be obvious why climatology is so dreadfully controversial: perhaps we've hit the wall when it comes to the reliability of the tools used for analyzing sparse (erratic?) records. Any manipulations we perform on raw data actually manipulate the public as a whole. Still, most researchers pretend they know what's going on when treating sparse or otherwise hard-to-understand records (for instance, what's the cost of "detrending," "ghosts," and other poorly understood procedures and phraseology in the FSA?). Instead of trying to understand (the limits on usefulness of) such data, we began at some point with "modeling" (arbitrarily adjusting, in fact) our own understanding of such data, which we then sell to the public as the analyses of the data themselves; no wonder controversies abound. Mathematicians like Fourier (Cornette) often create imaginary worlds in which, quite Nietzscheanly, the cause justifies the manipulation. Such concepts have ambushed all physical sciences ever since A. Einstein half-jokingly proposed overthrowing H.R.M. king Data. But just as its indefinability prevents the fake king Time from being re(oy)ally spasmatic, so are only the raw data thy king.