January/February 2012 (Vol. 14, No. 1) pp. 5-8
1521-9615/12/$31.00 © 2012 IEEE
Published by the IEEE Computer Society
This issue's reviews cover a book on the object-oriented design of scientific software, which explores design patterns useful in scientific computation, and an algorithm primer for nanoscience.
Object-Oriented Scientific Programming—Without Reinventing the Wheel
Damian Rouson, Jim Xia, and Xiaofeng Xu, Scientific Software Design, The Object-Oriented Way, Cambridge Univ. Press, 2011, ISBN: 978-0-521-88813-4, 382 pp.
This book on object-oriented design of scientific software presents design patterns that the authors have found applicable and useful in scientific computation. Largely driven by examples, the book includes many close-to-real scientific applications and uses the Fortran and C++ programming languages, with some emphasis on the former. The level is appropriate for moderately experienced programmers. Because the book focuses on program design, an instructor using it as the textbook for a graduate course in scientific object-oriented programming would need additional material. More experienced scientific object-oriented programmers (who perhaps were never quite convinced of the utility of design patterns) might learn a trick or two as well, or might realize that they've occasionally reinvented the wheel.
The book is divided into three parts. The three chapters in Part I ("The Tao of Scientific OOP") deal with general aspects of development costs, how object-oriented programming can help reduce these, and how this applies specifically in a scientific context.
Part II, "SOOP to Nuts and Bolts" (where "SOOP" stands for scientific object-oriented programming), has six chapters that describe useful design patterns for scientific applications, such as the abstract calculus pattern and the puppeteer pattern. The final part, "Gumbo SOOP," contains chapters that discuss formal constraints and mixed-language programming, offer a brief introduction to parallel programming, and showcase multiphysics architectures.
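The puppeteer pattern is only named here, so as a rough illustration of the general idea (a coordinating object that owns several model components, the "puppets," and mediates their coupling so that the components need not know about each other), here is a minimal C++ sketch. The predator-prey model, the class names Prey, Predator, and Ecosystem, and the coefficients are all hypothetical choices; this is one reading of the idea, not the book's implementation.

```cpp
#include <cassert>

// Hypothetical puppet: prey population (Lotka-Volterra prey equation).
// It never refers to Predator; the coupling value is passed in.
struct Prey {
    double n;
    double rate(double predators) const { return 1.0 * n - 0.1 * n * predators; }
};

// Hypothetical puppet: predator population. It never refers to Prey.
struct Predator {
    double n;
    double rate(double prey) const { return 0.075 * n * prey - 1.5 * n; }
};

// Hypothetical puppeteer: owns both puppets and mediates their coupling,
// so adding or swapping a puppet touches only this class.
class Ecosystem {
public:
    Ecosystem(double prey0, double predators0)
        : prey_{prey0}, predators_{predators0} {
        assert(prey0 >= 0.0 && predators0 >= 0.0);
    }

    // One explicit Euler step: both rates are evaluated from the old state
    // before either population is updated.
    void step(double dt) {
        const double dprey = prey_.rate(predators_.n);
        const double dpred = predators_.rate(prey_.n);
        prey_.n += dt * dprey;
        predators_.n += dt * dpred;
    }

    double prey() const { return prey_.n; }
    double predators() const { return predators_.n; }

private:
    Prey prey_;
    Predator predators_;
};
```

For example, constructing Ecosystem(10.0, 5.0) and calling step(0.1) once gives populations of 10.5 and 4.625. Keeping all coupling logic inside the puppeteer means a new component can be added without editing the existing ones; the first-order Euler step is used only to keep the sketch short.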
The book also includes two appendices: the first covers the mathematical background used in the examples (Lagrange interpolation, linear and nonlinear solvers, partial differential equations, and finite difference approximations), and the second covers the Unified Modeling Language (UML). It might have been more useful to have the latter in the main text, because UML diagrams are used throughout the book and aren't intuitively understandable to the uninitiated.
Patterns Are Useful, but What about Performance?
One of the book's strengths is that it demonstrates the utility of design patterns in scientific computing by showing how they solve real software design problems. The book also shows that the choice of programming language is, to a large degree, irrelevant to the software design. The choice of C++ and Fortran 2003 is beneficial in that they both support abstract data-type calculus through operator overloading, a technique of which the authors are strong proponents. When well designed, such a calculus helps with code maintenance by making the programmer's intention much clearer and reducing the code's size.
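Abstract data-type calculus through operator overloading can be sketched in C++ as follows. The Field class, its toy decay law du/dt = -u, and the euler_step template are hypothetical illustrations rather than code from the book; the point is that the time integrator is written entirely in terms of overloaded operators, so it reads like the formula it implements.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical grid-function type whose overloaded operators form a small
// "calculus": fields can be added, scaled, and differentiated in time.
class Field {
public:
    explicit Field(std::vector<double> values) : v_(std::move(values)) {}

    std::size_t size() const { return v_.size(); }
    double operator[](std::size_t i) const { return v_[i]; }

    Field operator+(const Field& other) const {
        assert(size() == other.size());
        std::vector<double> out(size());
        for (std::size_t i = 0; i < size(); ++i) out[i] = v_[i] + other.v_[i];
        return Field(std::move(out));
    }

    Field operator*(double s) const {
        std::vector<double> out(size());
        for (std::size_t i = 0; i < size(); ++i) out[i] = s * v_[i];
        return Field(std::move(out));
    }

    // Right-hand side of a toy decay law du/dt = -u (hypothetical physics).
    Field d_dt() const { return *this * -1.0; }

private:
    std::vector<double> v_;
};

// Written only against the operators above, the integrator reads like the
// formula u_new = u + dt * du/dt and works for any type that provides them.
template <typename T>
T euler_step(const T& u, double dt) {
    return u + u.d_dt() * dt;
}
```

For instance, euler_step applied to the field {1, 2, 4} with dt = 0.5 returns {0.5, 1, 2}. Because euler_step is a template, the same one-line integrator would work unchanged for any other type that supplies +, scalar *, and d_dt().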
Another welcome aspect of the book is its discussion of how easily code can be debugged. All too often, textbooks pretend that this aspect of programming doesn't exist. Here, the debugging models are a bit crude and the discussion ignores some of the tools that facilitate and speed up debugging (such as gdb and valgrind), but this doesn't invalidate the main point: a modular, object-oriented approach greatly reduces the amount of code to inspect, and thus reduces debugging time.
The detailed code examples are the book's best feature. In fact, the book's codes are not so much examples as an integral part of the text (despite being referred to as "figures"). The examples here stay closer to real scientific applications than those in most object-oriented textbooks. This helps the authors avoid both the easier problems (for which almost any design would work well) and artificially constrained problems (for which only one design works, but which you're unlikely to ever encounter).
The book does have a few weaknesses as well. It has a slight philosophical slant, which can distract from its valuable practical aspects. For example, the explanation of design patterns' role in architecture is much longer than readers need in order to appreciate the discussion of object-oriented programming in Fortran 2003 (and the associated challenges) or of how to set up a multiphysics application. There's also a tendency to invoke the authority of the authors of the 1994 book Design Patterns: Elements of Reusable Object-Oriented Software rather than to argue why a particular pattern arises.
A more substantial weakness is that the text doesn't consider runtime performance, focusing instead on scalable design. In fact, the authors suggest that scalable design will lead to scalable performance. This works at the coarse-grained level of the program's design, but it is potentially counterproductive at the fine-grained level. For example, the C++ implementation of the vortex class (which models a ring of 3D points) uses the push_back method of the Standard Template Library class std::vector to fill an initially empty vector, even though that vector always holds exactly three floating-point numbers. This is inefficient because, for each component of each point of the ring, the vector might have to allocate memory dynamically. It's also confusing, because a dynamically sized vector is conceptually different from a fixed-size array. In a scientific context, problem sizes tend to grow, so addressing this performance issue only once it matters at scale would require rewriting the code's fine-grained details, a risky undertaking that can easily outweigh the initial gain in maintainability. For low-level parts of your code, you should know and care about the inefficiencies of different object-oriented techniques. In the current case, a simple fixed-size array would have sufficed and would likely be two to three times faster.
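The structural difference behind this criticism can be sketched as follows. The ring of points is hypothetical (it is not the book's vortex class), and no timings are claimed, so the two-to-three-times estimate above is not checked by this sketch. In the first version, every point owns a separate heap buffer grown with push_back; in the second, the fixed size is part of the type via std::array<double, 3> (standard since C++11), so all points sit in one contiguous allocation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical ring of n 3D points built two ways (not the book's vortex class).

// Style criticized above: every point is a dynamically sized vector
// grown with push_back, although it always ends up holding 3 values.
std::vector<std::vector<double>> ring_with_push_back(std::size_t n) {
    std::vector<std::vector<double>> ring;
    for (std::size_t i = 0; i < n; ++i) {
        std::vector<double> p;                // starts empty, no storage yet
        p.push_back(static_cast<double>(i));  // each push_back may allocate
        p.push_back(0.0);
        p.push_back(0.0);
        ring.push_back(p);                    // copying p allocates again
    }
    return ring;
}

// With std::vector, "a point has 3 components" can only be checked at run time.
double x_of(const std::vector<double>& p) {
    assert(p.size() == 3);
    return p[0];
}

// Alternative: the size is part of the type, so the points are stored inline
// in one contiguous buffer and the ring needs a single allocation.
using Point = std::array<double, 3>;
static_assert(std::tuple_size<Point>::value == 3, "a Point has three components");

std::vector<Point> ring_with_fixed_arrays(std::size_t n) {
    std::vector<Point> ring(n);  // one allocation; all coordinates start at 0.0
    for (std::size_t i = 0; i < n; ++i) {
        ring[i] = Point{static_cast<double>(i), 0.0, 0.0};
    }
    return ring;
}
```

Both functions produce the same coordinates. The difference lies in how memory is laid out, how many allocations occur, and the fact that the compiler enforces the three-component invariant in the second version.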
A minor, temporary weakness of the book is its use of Fortran 2003, which isn't fully supported by all mainstream compilers. In contrast, the authors avoid features of the C++11 standard and the Boost C++ libraries, such as smart pointers and a proper fixed-size array, that would have been useful here to avoid the inefficiencies of std::vector.
Finally, scientific or numerical libraries fall outside the book's scope, although using such libraries benefits the scalability of the code design. After all, each module that you don't maintain is one less module to debug. Libraries are also more likely to have been optimized for runtime performance.
This book makes a good case for the usefulness of design patterns and object-oriented programming for maintainable code, but disregards runtime performance and scientific libraries. Still, it's one of those books that I wish I'd read earlier in my programming career. I found many design patterns familiar simply because I'd seen them before in my own code. I'll likely turn to this book in the future whenever I suspect a program design problem might be solved already.
Ramses van Zon is an application analyst and HPC computational science specialist at the University of Toronto's SciNet High Performance Computing Consortium. His research interests include non-equilibrium statistical physics and molecular dynamics simulations. Van Zon has a PhD in theoretical physics from Utrecht University in the Netherlands. Contact him at firstname.lastname@example.org.
Algorithm Primer for Nanoscience
Kálmán Varga and Joseph A. Driscoll, Computational Nanoscience, Applications for Molecules, Clusters, and Solids, Cambridge Univ. Press, 2011, ISBN: 978-1107001701, 444 pp.
Chemistry, solid-state physics, biology, and materials science are converging at the nanoscale. High-powered instruments to visualize and probe at atomic-length scales have become ubiquitous in these disciplines, as have the theoretical tools necessary to understand and explain the resulting physical structures and phenomena. Indeed, accurate computations and theoretical analysis are often necessary partners with experiment to assess the importance of the quantum manifestations at Ångström- and nanometer-length scales.
This book by Kálmán Varga and Joseph A. Driscoll is a welcome educational venture into this important and fast-growing discipline. Because such a moving target is too complex and is expanding too rapidly to cover completely, the authors have (somewhat) narrowed their focus by restricting their aim "to provide a comprehensive program library and a description of advanced algorithms to help students and researchers learn novel methods and develop their own approaches."
Part I of the book deals with 1D problems. The pedagogy is appropriate and the algorithm descriptions are well presented. The material can be covered quickly and might well suit a one-semester advanced undergraduate class, where the instructor could fill in some of the motivation and provide context before moving on to selected examples from the more advanced 3D topics in Part II of the book.
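For readers wondering what a typical 1D exercise looks like, here is a hedged sketch; the problem and method are our own choice, not code from the book. It computes the ground-state energy of the 1D harmonic oscillator, in units where ħ = m = ω = 1, by discretizing −½ψ″ + ½x²ψ = Eψ with the three-point finite-difference formula and locating the lowest eigenvalue of the resulting symmetric tridiagonal matrix by Sturm-sequence bisection. The exact answer is 1/2; the grid size n, box half-width L, and tolerance are arbitrary assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of eigenvalues below x of the symmetric tridiagonal matrix with
// diagonal d and constant off-diagonal e, from the signs of the Sturm
// sequence q_i = d_i - x - e^2 / q_{i-1}.
std::size_t count_below(const std::vector<double>& d, double e, double x) {
    std::size_t count = 0;
    double q = 1.0;
    for (std::size_t i = 0; i < d.size(); ++i) {
        q = d[i] - x - (i == 0 ? 0.0 : e * e / q);
        if (q == 0.0) q = 1e-300;  // nudge an exact zero pivot to avoid 0/0
        if (q < 0.0) ++count;
    }
    return count;
}

// Ground-state energy of H = -1/2 d^2/dx^2 + 1/2 x^2 on [-L, L] with
// psi(-L) = psi(L) = 0, using n interior grid points.
double ground_state_energy(std::size_t n, double L) {
    assert(n >= 2 && L > 0.0);
    const double h = 2.0 * L / static_cast<double>(n + 1);
    std::vector<double> d(n);
    for (std::size_t i = 0; i < n; ++i) {
        const double x = -L + h * static_cast<double>(i + 1);
        d[i] = 1.0 / (h * h) + 0.5 * x * x;  // kinetic diagonal + potential
    }
    const double e = -0.5 / (h * h);         // kinetic off-diagonal

    // Gershgorin bounds: every eigenvalue lies in [min d - 2|e|, max d + 2|e|].
    double lo = d[0], hi = d[0];
    for (double di : d) {
        if (di < lo) lo = di;
        if (di > hi) hi = di;
    }
    lo -= 2.0 * (-e);
    hi += 2.0 * (-e);

    // Bisection: no eigenvalue lies below lo, at least one lies below hi.
    while (hi - lo > 1e-10) {
        const double mid = 0.5 * (lo + hi);
        if (count_below(d, e, mid) >= 1) hi = mid;  // lowest eigenvalue < mid
        else lo = mid;                              // lowest eigenvalue >= mid
    }
    return 0.5 * (lo + hi);
}
```

With n = 1000 and L = 10, the result should agree with 1/2 well within 10⁻³; the three-point formula's error shrinks as h² when the grid is refined. Bisection is used here only because it keeps the sketch free of external libraries; in practice a LAPACK-style tridiagonal eigensolver would do the same job.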
In Part II, the choice is rich, with topics including Monte Carlo, molecular dynamics, (time-dependent) density functional strategies, electron transport across single molecules, and the manipulation of atomic-orbital and plane-wave basis states for electronic structure calculations. The wide range and large number of advanced topics in Part II lets instructors select topics to supplement the teaching of a first-year graduate course in, say, solid-state physics, or even a multidisciplinary course in computational nanoscience.
The Niche, the Missing, the Promise
As a student many years ago, I would have welcomed such a book when taking the canonical "Numerical Methods Course for Scientists" or the mathematical physics class. Nowadays, there are well-documented subroutine libraries for most numerical methods, but driver or main programs for specific applications are still needed. There is also a growing number of complete open-source codes maintained by dedicated groups of users; a recent publication, for example, listed 24 such codes.¹
These codes are frequently complex and might be impenetrable to the novice, so it's noteworthy that the codes presented online for this book have been purposely simplified to increase their pedagogical value. If you're teaching or taking a graduate-level class or seminar series with relatively disparate topics, you'll greatly appreciate access to a collection of relevant codes and examples such as those given here. A well-written algorithm can be like a classic novel in revealing how components interact, the consequences of various actions, and the denouement.
Although I'm enthusiastic about the concepts that motivated this book and the selected topics, I'm a bit disappointed that one or two of my favorite topics didn't receive adequate exposition. For example, the concept of the density of states (DOS) is quite basic, but the 1D example given is nearly useless, as the points in the energy bands with zero slope give an infinite density of states. The resulting plot of DOS(E) with vertical lines at those values of E and essentially zero elsewhere is uninformative—though, admittedly, it gives the instructor an opportunity to drive home the concept. In 3D, the tetrahedron method to evaluate the electronic DOS is elegant and provides a lesson in numerical simplicity. Too bad it wasn't included.
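The point about zero-slope band points can be made concrete with the standard 1D result (a worked illustration, not the book's example). For a single band E(k) on a chain with lattice spacing a, the density of states per unit length, ignoring spin, is

$$
g(E) = \frac{1}{2\pi}\int_{-\pi/a}^{\pi/a} \delta\bigl(E - E(k)\bigr)\,dk
     = \frac{1}{2\pi}\sum_{k_i:\,E(k_i)=E} \frac{1}{\lvert E'(k_i)\rvert},
$$

so g(E) diverges wherever E'(k) = 0. For the nearest-neighbor tight-binding band, each energy inside the band is reached at two wavevectors ±k, and

$$
E(k) = -2t\cos(ka), \qquad \lvert E'(k)\rvert = 2ta\,\lvert\sin(ka)\rvert = a\sqrt{4t^2 - E^2},
$$

$$
g(E) = \frac{1}{\pi a\sqrt{4t^2 - E^2}}, \qquad \lvert E\rvert < 2t.
$$

The divergences sit exactly at the band edges E = ±2t, where the band is flat, while inside the band g(E) stays finite; integrating g(E) over the band gives 1/a, one state per site.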
I eagerly give this book a passing grade for entering a brave new world, but I also look forward to a revised and updated second edition polished by classroom experience and user feedback. Indeed, this book and its associated software could (and should) join the vanguard of new instructional methodology, as the Internet allows faster feedback and lets the instructional content and value be improved expeditiously.
Bruce Harmon is a distinguished professor in the Department of Physics and Astronomy at Iowa State University and a senior scientist at the US Department of Energy's Ames Laboratory. His research interests include nanoscale magnetism, x-ray magnetic dichroism, phonons, and computational materials discovery. Harmon has a PhD in physics from Northwestern University. Contact him at email@example.com.