Pages: p. 4
Justin L. Tripp, Maya B. Gokhale, and Kristopher D. Peterson
In its traditional form, reconfigurable supercomputing uses field-programmable gate arrays to augment high-performance microprocessors in clusters, often involving FPGAs with millions of system gates, as well as dedicated arithmetic units and megabits of on-chip memory. More recently, approaches based on reconfigurable logic have extended to floating-point tasks, and researchers have realized several floating-point libraries, computational kernels, and applications in FPGAs.
Trident, the recipient of a 2006 R&D 100 award for innovative technology, synthesizes circuits from a high-level language. It provides an open framework for exploring algorithmic C computation on FPGAs by mapping the C program's floating-point operations to hardware floating-point modules and automatically allocating floating-point arrays to off-chip memory banks using four schedulers and a loop pipelining scheme.
Nicholas Moore, Albert Conti, Miriam Leeser, and Laurie Smith King
Supercomputing architectures vary in the level of programming support they offer, but most require code specific to the targeted architecture and field-programmable gate array hardware, both for processing data and for passing data between the application and the FPGA. Such code is intertwined with the application code.
Reconfigurable supercomputing is a volatile field, with vendors rapidly introducing new architectures and retiring previous ones. Consequently, applications with hardware-specific FPGA optimizations embedded in the code are not portable across different reconfigurable computing architectures. The authors' Vforce framework is not specific to FPGAs; it can also support many other types of special-purpose processors, including graphics processing units, digital signal processors, and IBM's Cell processor.
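The portability problem described above is commonly addressed by having the application call a hardware-neutral interface and binding the hardware-specific code at run time. The following is a minimal sketch of that general idea only; it is not Vforce's API, and every name in it (`Kernel`, `SoftwareDotProduct`, `KernelRegistry`, the backend labels) is hypothetical.

```python
from abc import ABC, abstractmethod


class Kernel(ABC):
    """Hardware-neutral interface that the application codes against."""

    @abstractmethod
    def run(self, data):
        """Process `data` and return the result."""


class SoftwareDotProduct(Kernel):
    """Plain software fallback; an FPGA or GPU backend would subclass Kernel too."""

    def run(self, data):
        a, b = data
        return sum(x * y for x, y in zip(a, b))


class KernelRegistry:
    """Maps backend labels to kernel factories (hypothetical helper)."""

    def __init__(self):
        self._factories = {}

    def register(self, name, factory):
        self._factories[name] = factory

    def select(self, preferred):
        """Return a kernel from the first registered backend in `preferred`."""
        for name in preferred:
            if name in self._factories:
                return self._factories[name]()
        raise LookupError("no registered backend among: " + ", ".join(preferred))


# Only the software backend is registered here, so the request for an
# "fpga" backend falls back to software without changing application code.
registry = KernelRegistry()
registry.register("software", SoftwareDotProduct)
kernel = registry.select(["fpga", "software"])
result = kernel.run(([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

In this arrangement the application never names the hardware directly, so retiring one accelerator and adding another would only change which backends are registered.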
Martin C. Herbordt, Tom VanCourt, Yongfeng Gu, Bharat Sukhwani, Al Conti, Josh Model, and Doug DiSabello
Accelerating high-performance computing applications with field-programmable gate arrays can potentially deliver enormous performance through both parallelism and the payload delivered per operation. At the same time, using FPGAs presents significant challenges, including low operating frequency: an FPGA typically clocks at about one-tenth the frequency of a high-end microprocessor.
Achieving significant speedups on a new architecture without expending exorbitant development effort, while retaining flexibility, portability, and maintainability, is a classic problem. Researchers have addressed this problem periodically over the past 30 years and generally agree that compromises are required: either restrict the variety of architectures or the scope of application, or bound expectations of performance or ease of implementation.
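The clock-frequency penalty noted in this abstract implies a break-even level of parallelism. The sketch below works through that arithmetic under a deliberately simple model, assumed here for illustration and not taken from the article: throughput equals clock frequency times useful operations completed per cycle, with memory and I/O effects ignored. The function names and sample figures are hypothetical.

```python
def fpga_speedup(clock_ratio, fpga_ops_per_cycle, cpu_ops_per_cycle=1.0):
    """Estimated throughput speedup of an FPGA design over a microprocessor.

    clock_ratio        -- FPGA clock frequency divided by CPU clock frequency
    fpga_ops_per_cycle -- useful operations the FPGA design completes per cycle
    cpu_ops_per_cycle  -- useful operations the CPU completes per cycle
    """
    if min(clock_ratio, fpga_ops_per_cycle, cpu_ops_per_cycle) <= 0:
        raise ValueError("all arguments must be positive")
    return clock_ratio * fpga_ops_per_cycle / cpu_ops_per_cycle


def break_even_ops_per_cycle(clock_ratio, cpu_ops_per_cycle=1.0):
    """Operations per cycle the FPGA needs just to match the CPU."""
    if clock_ratio <= 0 or cpu_ops_per_cycle <= 0:
        raise ValueError("all arguments must be positive")
    return cpu_ops_per_cycle / clock_ratio


# With a one-tenth clock ratio, the FPGA must complete ten times as many
# operations per cycle as the CPU before it shows any speedup at all.
needed = break_even_ops_per_cycle(0.1)
# Illustrative figures only: 200 operations per cycle against a CPU doing 2.
estimate = fpga_speedup(0.1, fpga_ops_per_cycle=200, cpu_ops_per_cycle=2)
```

Under this model, any speedup has to come from parallelism and per-operation payload large enough to outweigh the clock ratio.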
Gerald R. Morris and Viktor K. Prasanna
Researchers at the Engineer Research and Development Center and the University of Southern California are focusing on algorithms and architectures to facilitate high-performance, reconfigurable computer-based scientific computing. Examples of this research include IEEE-Std-754 floating-point units, molecular dynamics kernels, linear-algebra routines, and sparse-matrix solvers.
Reconfigurable computers (RCs) that combine general-purpose processors (GPPs) with FPGAs are now available. The FPGAs can be configured to become, in effect, application-specific coprocessors. Additionally, developers can use compilers that translate high-level languages (HLLs) into hardware description languages (HDLs) to program RCs using traditional HLLs. The authors' FPGA-augmented designs achieved more than a twofold wall-clock runtime speedup over software. Given that the software-only and FPGA-augmented versions use the same off-the-shelf code and algorithm, are compiled with the same compiler, run on the same platform, and use the same input sets, the comparisons accurately indicate the improvements attributable to FPGA-based acceleration.
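The comparison described above rests on measuring wall-clock time for two versions of the same computation on identical inputs. A minimal sketch of such a measurement follows; it is not the authors' harness, and the helper names `wall_clock_seconds` and `speedup` are hypothetical.

```python
import time


def wall_clock_seconds(fn, *args, repeats=3):
    """Best-of-`repeats` wall-clock time, in seconds, for calling fn(*args)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


def speedup(baseline_seconds, accelerated_seconds):
    """Ratio of baseline time to accelerated time; above 1 means faster."""
    if baseline_seconds < 0 or accelerated_seconds <= 0:
        raise ValueError("times must be non-negative, accelerated time positive")
    return baseline_seconds / accelerated_seconds


# Both versions must receive the same input set for the ratio to be meaningful.
inputs = list(range(10_000))
elapsed = wall_clock_seconds(sum, inputs)
```

For example, a baseline of 10 seconds against an accelerated run of 4 seconds gives `speedup(10.0, 4.0)`, a ratio of 2.5.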
Sadaf R. Alam, Pratul K. Agarwal, Melissa C. Smith, Jeffrey S. Vetter, and David Caliga
Hardware description languages' idiosyncrasies and limited support for floating-point operations hamper scientific application developers' ability to port and optimize their codes for these devices. Furthermore, HDL programming methodologies aimed at chip design aren't suitable for programming large-scale scientific applications. With high-level languages, reconfigurable systems can achieve application speedup—allowing scientific code developers to harness the power of FPGA devices without becoming HDL experts.
The authors used high-level languages (HLLs) to conduct an analysis and FPGA implementation of the particle-mesh Ewald method, a biomolecular algorithm that is part of Amber, a widely used molecular dynamics framework. Amber provides a collection of system preparation, simulation, and analysis packages that biomolecular scientists can use in simulations to conduct computational experiments studying the dynamics of large macromolecules, including biological systems such as proteins, nucleic acids, and membranes.