Overcoming Communication Latency Barriers in Massively Parallel Scientific Computation
May/June 2011 (vol. 31 no. 3)
pp. 8-19
Ron O. Dror, D.E. Shaw Research
J.P. Grossman, D.E. Shaw Research
Kenneth M. Mackenzie, D.E. Shaw Research
Brian Towles, D.E. Shaw Research
Edmond Chow, D.E. Shaw Research
John K. Salmon, D.E. Shaw Research
Cliff Young, D.E. Shaw Research
Joseph A. Bank, D.E. Shaw Research
Brannon Batson, D.E. Shaw Research
Martin M. Deneroff, D.E. Shaw Research
Jeffrey S. Kuskin, D.E. Shaw Research
Richard H. Larson, D.E. Shaw Research
Mark A. Moraes, D.E. Shaw Research
David E. Shaw, D.E. Shaw Research

Anton, a massively parallel special-purpose machine that accelerates molecular dynamics simulations by orders of magnitude, uses a combination of specialized hardware mechanisms and restructured software algorithms to reduce and hide communication latency. Anton delivers end-to-end internode latency significantly lower than that of any other large-scale parallel machine, and its critical-path communication time for molecular dynamics simulations is less than 3 percent of that of the next-fastest platform.

Index Terms:
Data communications, interprocessor communications, multiprocessor systems, network communication, parallel systems, special-purpose hardware, Anton
Citation:
Ron O. Dror, J.P. Grossman, Kenneth M. Mackenzie, Brian Towles, Edmond Chow, John K. Salmon, Cliff Young, Joseph A. Bank, Brannon Batson, Martin M. Deneroff, Jeffrey S. Kuskin, Richard H. Larson, Mark A. Moraes, David E. Shaw, "Overcoming Communication Latency Barriers in Massively Parallel Scientific Computation," IEEE Micro, vol. 31, no. 3, pp. 8-19, May-June 2011, doi:10.1109/MM.2011.38