Testing Scientific Programs
Paul F. Dubois, Lawrence Livermore National Laboratory

July/August 2012 (Vol. 14, No. 4) pp. 69-73
1521-9615/12/$31.00 © 2012 IEEE

Published by the IEEE Computer Society
The Automated Testing System (ATS) is an open source, Python-based tool for automating the testing of applications, especially scientific simulations. It's especially designed to support the work of a team of subject-matter experts.

Scientific simulations are usually built by teams of application developers. These teams are made up of two to 20 people, with a range of subject specialties. The larger applications are invariably an integration of contributions by a set of subject-matter experts (SMEs) with some computer science support. The integration of these originally separate areas of expertise into a full-scale model is a challenge both technically and socially.
After I joined the Lawrence Livermore National Laboratory (LLNL) in 1976, I was contributing mathematical improvements to a variety of these teams. For each code, as we called these simulations, I had to learn how to run a test to make sure the code was still getting the "right" answer. These teams often had just one or two short-running test problems for testing the simulation's integrity as a whole. The SMEs had their own tests that they would run when modifying their own work, but other team members didn't know how to run these and would only seek help to run them if they had a reason to suspect that their work would interact with someone else's package.
Because of the SME-oriented structure, teams didn't have many integrated tests of their entire simulation. I worked with many teams that had just one or two test problems that I was to run before contributing changes. Inevitably, this, along with poor source-code management practices, led either to freewheeling chaos or to a highly structured approach that lowered productivity. In both cases, morale would be lowered and evasive behaviors induced.
At the same time, there wasn't enough small-scale testing within the packages, because of the effort it would take to run and examine the results of a large number of tests. This problem also slowed development of the smaller one- or two-person simulations out of which the large ones would later grow.
My experiences taught me that scientific applications are biomechanical—and the "bio" part feels pain and tries to avoid it. Some developers would protect themselves by isolating themselves, not contributing their changes for six months or more at a time, which then became a difficult process. Others would waste a great deal of time believing physics models needed new terms or better algorithms only to find small bugs or incorrectly computed terms that only came into significance in certain kinds of problems or interactions with other packages.
As a result of these experiences, I decided to design a testing system that would deal with the sociological issues as well as the technical ones. This article describes that system—the Automated Testing System (ATS)—and its biomechanical orientation.
Automated Testing System
ATS is an open source, Python-based tool for automating the testing of applications. It's especially designed to test scientific simulations, although it can be used for any program that can be run from a command line, doesn't require interaction, and can signal its own success or failure via its exit status. ATS is able to run tests in parallel and utilize available resources efficiently.
The tool runs anywhere Python can run—that is, nearly anywhere. In particular, it runs out of the box on Linux, Mac, and Windows.
ATS is being used in multiple large applications at LLNL and is achieving its goals. Using the extensive provisions for user customization, the users are exercising their creativity to make the program do things unimagined by its author, and we feel confident enough to suggest it for more widespread use.
Deciding to use a product such as ATS requires that you're confident you won't end up "stuck" if you get a new machine that we don't have. At LLNL, we're pretty confident that won't happen; you'll be able to take care of your own needs. I'll explain how that portability works in a bit.
ATS and its documentation are available at http://code.google.com/p/ats. The documentation includes a tutorial about a mythical code named Andyroid. Andyroid is biomechanical, of course! I recommend the tutorial for a more thorough introduction than I can offer here.
Ideally we need many, many tests—from unit tests of small functions or designed-by-contract classes and clusters, up to integrated simulations. We need a variety of tests that are going to cover the different packages sufficiently while not inhibiting the experts from running them in more complicated ways during development.
We need automation that can run these tests and collect their results, using whatever hardware resources we have available, so that the entire collection runs in a reasonable time—which in my experience means no more than a long lunchtime for programming that's going to be integrated to the main source line.
Larger, longer-running tests might be wanted for major releases or nightly checks, and fewer for other quick checks. We don't want a maintenance hassle with different test suites for each of these purposes. Rather, we want a single test suite that we filter for multiple purposes.
However, we don't want centralization. We want to let the team work at full speed without having to wait for each other any more than is necessary to ensure integrity.
Testing must be supported by good practices in source control, commit and release policies, and team management, of course; but the testing system must not reduce productivity. Our "bio" components will use the testing system properly only if it results in a better day at the office in the long run.
Here's how ATS meets these requirements:

• There's no required central database of tests to run—tests can be spread over many directories, and usually adding a test in a subdirectory is entirely a local operation (this avoids source-code-commit collisions).

• A test input file can contain, in its comments, directions for running itself in one or more ways. A subject-matter expert can still run such a test normally with ordinary command-line input; but when ATS runs it, ATS launches the application according to the special comments the expert has added to the input file. We call this introspection. The user can control the commenting convention.

• The execution of the tests can occur over many processors and hosts, in parallel (provisions are available to avoid resource conflicts).

• Filtering allows one test suite to be used for differing levels of required quality assurance, such as quick, integration-to-main, nightly, weekly, and release suites.

• Support for batch systems is provided.

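The introspection idea can be sketched roughly as follows. This is a hedged illustration, not ATS's actual code; the #ATS: marker is an assumed convention (ATS lets the user choose the commenting convention):

```python
def extract_directives(deck_text, marker="#ATS:"):
    """Collect embedded test directives from an input deck's comments."""
    directives = []
    for line in deck_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(marker):
            # Keep only the directive text after the magic marker.
            directives.append(stripped[len(marker):].strip())
    return directives

# A small input deck: the physics code ignores the comments, but the
# test system reads the run directions embedded in them.
deck = """\
# a physics input deck
#ATS: test(clas="-input mydeck delta=3", np=32, label="mydeck: 3")
delta = 3
"""
found = extract_directives(deck)
print(found)
```

The expert's ordinary input file thus doubles as its own test specification, which is what keeps adding a test a purely local operation.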
These features support the execution of large test suites that thoroughly test all parts of an application that's built by a team of SMEs. These experts are able to run tests of their own design, which at the same time can be run by other members of the team, and which can be added or modified without collisions in the source-code control system.
By making full use of resources and appropriate filtering, the time required for testing is kept small enough to keep developers productive.
A typical code team will usually have a set of tests that are required to be run before committing a change to the main branch of the source-control system. More thorough test suites can be run nightly and weekly with still more for a major release. This system maintains trust levels between developers by reducing the chances that one developer's mistakes get pulled down into another developer's area, causing failures that weren't there the day before.
On the other end of the spectrum, a single developer or two can automate testing that they might otherwise be reluctant to do regularly. In any case, the key is to have the developers eager to do proper testing because they find, at the end of the day, that it makes their lives better. If every bug fix must be accompanied by a new test that checks for that bug, it does wonders for quality control. In fact, many developers beginning a new effort will write the tests first and enjoy the progress shown by ever-decreasing numbers of failures as the implementation is completed.
Detailed Capabilities
When testing software, we've found that these features must be supported:

• A test might depend on another test, and won't execute unless the parent test succeeds.

• A test might be constructed that's expected to fail.

• Tests can be grouped and treated as a unit, such as a run of a generator, the main code, and a postprocessor.

• Tests can be filtered out (that is, not executed) in many ways. These can include the number of processors, time limit, platforms, or other user-defined criteria. In ATS, a built-in filter called a level can be used to easily stratify a test suite into subsets of increasing thoroughness.

• The results of the testing must be easy to postprocess by registering routines to be executed after the tests have completed.

• The test specification language must be easy to learn. While ATS input can be written using the full power of Python, the basic operations require only a few statements written in a special vocabulary that isn't hard to learn.

• It's extremely useful to be able to rerun only those tests that failed or that might have contributed to the failure (such as a parent that might have written bad data for the child to use).

• Users must be able to avoid resource conflicts and assist in the scheduling of jobs to improve throughput. For example, two tests that write a file with the same name in the same place would interfere with one another, while another test could run in the same directory without worry. The test system must be able to be instructed to avoid the former but allow the latter.

When choosing which test to start next, ATS gives increased priority to tests that use a lot of processors or which must be completed before other tests can start, or which have been given enhanced priority by the user. Complete logging with useful summaries, along with preservation of appropriate output, is a must.
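The scheduling preference above can be sketched as a simple priority key. This is a hedged illustration, not ATS's actual scheduler, and the field names (np, dependents, user_priority) are assumptions for the sketch:

```python
def priority(t):
    # Hypothetical scoring: favor wide jobs (many processors), jobs that
    # other tests depend on, and jobs the user has explicitly boosted.
    return t["np"] + 100 * t["dependents"] + t["user_priority"]

tests = [
    {"label": "small",  "np": 1,  "dependents": 0, "user_priority": 0},
    {"label": "parent", "np": 2,  "dependents": 3, "user_priority": 0},
    {"label": "wide",   "np": 64, "dependents": 0, "user_priority": 0},
]
order = [t["label"] for t in sorted(tests, key=priority, reverse=True)]
print(order)  # starts blocking parents and wide jobs before small fillers
```

Starting the wide and blocking jobs first keeps processors busy, with small jobs filling in around them.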
A Basic Example
While referring you to the tutorial for a thorough example, here's a basic example of ATS input. For clarity we assume specification of the path to an executable using a command-line argument. Various ways to specify executables are discussed in the tutorial (at http://code.google.com/p/ats).
test(clas="-input mydeck delta=3", np=32, label="mydeck: 3")

defines a test that executes our principal executable with the given command-line arguments (clas), launching the job in parallel on 32 processors.
To expand this example to include a postprocessing program that will run if the test succeeds, we could make this simple modification:
maintest = test(clas="-input mydeck delta=3", np=32, label="mydeck: 3")
testif(maintest, executable=post, label="mydeck: 3 post")

Here, post is a variable we have previously set with the desired path to the postprocessor.
The only other command that's used frequently is the source function, which is basically an include with the introspection feature supported. It processes the indicated file as if it were included in the main input file, the latter typically being given as an argument on the ATS command line. Using the source command lets us distribute the tests over many directories.
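An include mechanism of this kind can be sketched as follows. This is a hedged sketch, not ATS's actual implementation; the test-recording namespace is invented for the demo:

```python
import os
import tempfile

def source(path, namespace):
    # Execute another file's ATS statements as if they appeared
    # inline in the main input file.
    with open(path) as f:
        exec(compile(f.read(), path, "exec"), namespace)

# Demo: a small included file that defines one test. The included file
# sees the same test() vocabulary as the main input file.
collected = []
ns = {"test": lambda **kw: collected.append(kw)}
with tempfile.NamedTemporaryFile("w", suffix=".ats", delete=False) as f:
    f.write('test(clas="-input mydeck", np=4, label="sub")\n')
source(f.name, ns)
os.unlink(f.name)
print(collected)
```

Because each included file is self-contained, a subdirectory's tests can be added or changed without touching any central list.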
The user can give each test a set of name/value option pairs and use these in filtering:
test(clas="-input mydeck delta=3", np=32, label="mydeck: 3", deltatest=True)

Using a filter such as deltatest is True would select these tests for execution and skip the tests lacking such an option.
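One way such a filter could work, as a minimal sketch (not ATS's actual filter machinery), is to evaluate the filter expression with each test's options as variables; tests missing the option simply fail the filter:

```python
def passes(options, filter_expr):
    """Evaluate a filter expression against one test's option pairs."""
    try:
        # The test's name/value options serve as the local namespace.
        return bool(eval(filter_expr, {}, options))
    except NameError:
        # Tests lacking the named option are skipped.
        return False

tests = [
    {"label": "mydeck: 3", "deltatest": True},
    {"label": "other"},
]
selected = [t["label"] for t in tests if passes(t, "deltatest is True")]
print(selected)
```

The same mechanism supports the built-in level filter: a numeric option compared against a threshold chosen on the command line.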
Parameter sweeps or exploring different input combinations for a given test are easy to set up using the introspective feature.
The user can port ATS to a new machine to take advantage of that machine's features, such as multiple processors or front and backend processors, by creating a machine definition for it. This usually isn't difficult because the developer can define most of the new machine via inheritance, and use all of the LLNL machine definitions as examples.
A user specifies a particular machine module by setting an environment variable. At startup, ATS searches the installed add-ons for that particular value, and alters the behavior of ATS when running tests according to this new "machine" module. The machine module can add command-line options to ATS and interpret the actual values supplied. Wrapper scripts can also be used to set things up properly for a given scenario.
Usually the porting process is just a matter of doing accounting on the resources in use and directing proposed tests to these resources when they're available, using the correct command to make the run using the desired resources.
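As a rough illustration of porting by inheritance, the sketch below overrides just the launch step; the class and method names are invented for this example and are not the actual ATS machine API:

```python
class Machine:
    # Base behavior: run the executable serially with its arguments.
    def launch_command(self, test):
        return [test["executable"]] + test["clas"].split()

class ParallelMachine(Machine):
    # A port overrides only the launch step, prepending the site's
    # parallel launcher and the requested processor count.
    def launch_command(self, test):
        return ["srun", "-n", str(test["np"])] + super().launch_command(test)

cmd = ParallelMachine().launch_command(
    {"executable": "./mycode", "clas": "-input mydeck", "np": 32})
print(cmd)
```

Everything else (accounting for resources, logging, filtering) is inherited unchanged, which is why a real machine module can stay small.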
The machine module that LLNL wrote to use multiple nodes of a particular parallel machine totaled 219 lines of code, including blank and comment lines. Most of the methods replaced by inheritance do one simple job, such as those shown in Figure 1.

Figure 1. Some of the coding in a machine module for a parallel processor. Another method does the actual launch using the particular OS commands.

The parser object is an instance of a standard Python class used for parsing command-line arguments. It's completely documented in the Python standard library. This is typical of the way we leverage the Python world. For example, you might use standard library modules to make ATS display webpages showing current progress. The job scheduler is also replaceable in a similar manner.
We were able to use this same machine file for seven different machines with differing numbers of processors and features by using a form of introspection. This and other LLNL machines are defined in the source files that you can download at http://code.google.com/p/ats, in the subdirectory LC (which stands for Livermore Computing).
ATS is released under a Berkeley Software Distribution (BSD) license. T.J. Alumbaugh, Nu Ai Tang, and Ines Heinz were the Lawrence Livermore National Laboratory (LLNL) support team for this work. Nu Ai Tang (tang10@llnl.gov) is the expert on porting to new machines. I'd also like to especially thank users Burl Hall, Brian McCandless, and Mike Owen for their ideas and support.
This work was performed under the auspices of the US Department of Energy by LLNL under Contract DE-AC52-07NA27344.
Paul F. Dubois retired from full-time work at LLNL in 2005, and currently works there part-time. See pfdubois.com for a description of his career. He promises once again that this is his last article. From 1993–2006 he edited the Scientific Programming department for CiSE and its predecessor magazine, Computers in Physics. Contact him at pfdubois@gmail.com.

