Finite State Machine-Based Optimization of Data Parallel Regular Domain Problems Applied in Low-Level Image Processing
October 2004 (vol. 15 no. 10)
pp. 865-877

Abstract—A popular approach to providing nonexperts in parallel computing with an easy-to-use programming model is to design a software library consisting of a set of preparallelized routines, and hide the intricacies of parallelization behind the library's API. However, for regular domain problems (such as simple matrix manipulations or low-level image processing applications, in which all elements in a regular subset of a dense data field are accessed in turn), the speedup obtained with many such library-based parallelization tools is often suboptimal. This is because interoperation optimization (i.e., time-optimization of communication steps across library calls) is generally not incorporated in the library implementations. This paper presents a simple, efficient, finite state machine-based approach for communication minimization of library-based data parallel regular domain problems. In this approach, referred to as lazy parallelization, a sequential program is parallelized automatically at runtime by inserting communication primitives and memory management operations whenever necessary. Apart from being simple and cheap, lazy parallelization is guaranteed to generate legal, correct, and efficient parallel programs at all times. The effectiveness of the approach is demonstrated by analyzing the performance characteristics of two typical regular domain problems from the field of low-level image processing. Experimental results show significant performance improvements over nonoptimized parallel applications. Moreover, the obtained communication behavior is found to be optimal with respect to the abstraction level of message passing programs.
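The core idea of lazy parallelization, inserting communication only when the current data layout does not suit the next library call, can be illustrated with a small state machine. The following is a minimal sketch, assuming a single image updated in place, three hypothetical distribution states (`ROOT`, `DISTRIBUTED`, `DIST_BORDERS`), and three hypothetical operation classes (`pixel_op`, `neighborhood_op`, `root_io`). These names and the transition table are illustrative assumptions, not the state machine defined in the paper, and communication steps are only recorded by name rather than performed with MPI.

```python
from enum import Enum, auto


class State(Enum):
    """Hypothetical distribution states of one image-like data structure."""
    ROOT = auto()          # complete copy held by the root process only
    DISTRIBUTED = auto()   # each process holds its own block, borders not valid
    DIST_BORDERS = auto()  # blocks plus up-to-date border (halo) pixels


# States in which each (hypothetical) operation class can run directly.
ACCEPTS = {
    "pixel_op": {State.DISTRIBUTED, State.DIST_BORDERS},
    "neighborhood_op": {State.DIST_BORDERS},
    "root_io": {State.ROOT},
}

# State to move to when the current state is not accepted.
TARGET = {
    "pixel_op": State.DISTRIBUTED,
    "neighborhood_op": State.DIST_BORDERS,
    "root_io": State.ROOT,
}

# Communication steps that realize each state transition.
TRANSITIONS = {
    (State.ROOT, State.DISTRIBUTED): ["scatter"],
    (State.ROOT, State.DIST_BORDERS): ["scatter", "exchange_borders"],
    (State.DISTRIBUTED, State.DIST_BORDERS): ["exchange_borders"],
    (State.DISTRIBUTED, State.ROOT): ["gather"],
    (State.DIST_BORDERS, State.ROOT): ["gather"],
}

# State of the data after the operation. Distributed operations overwrite
# the local blocks, so any previously exchanged border data become stale.
RESULT = {
    "pixel_op": State.DISTRIBUTED,
    "neighborhood_op": State.DISTRIBUTED,
    "root_io": State.ROOT,
}

COMM = {"scatter", "gather", "exchange_borders"}


def lazy_schedule(program, state=State.ROOT):
    """Insert communication only when the current state is not accepted."""
    steps = []
    for op in program:
        if state not in ACCEPTS[op]:
            steps += TRANSITIONS[(state, TARGET[op])]
        steps.append(op)
        state = RESULT[op]
    return steps


def eager_schedule(program):
    """Non-optimized library: each distributed call scatters its input
    from the root and gathers its result back to the root."""
    steps = []
    for op in program:
        if op == "root_io":
            steps.append(op)
            continue
        steps += TRANSITIONS[(State.ROOT, TARGET[op])]
        steps.append(op)
        steps.append("gather")
    return steps


if __name__ == "__main__":
    program = ["pixel_op", "neighborhood_op", "pixel_op", "root_io"]
    print([s for s in lazy_schedule(program) if s in COMM])
    # ['scatter', 'exchange_borders', 'gather']
    print([s for s in eager_schedule(program) if s in COMM])
    # ['scatter', 'gather', 'scatter', 'exchange_borders', 'gather', 'scatter', 'gather']
```

For this four-call program the lazy schedule needs three communication steps where the call-by-call schedule needs seven, because the data stay distributed across consecutive library calls. A real implementation would track one state per data structure and also handle memory allocation and multiple operands, which this sketch leaves out.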

Index Terms:
Parallel processing, data communications aspects, optimization, image processing software.
Citation:
Frank J. Seinstra, Dennis Koelma, Andrew D. Bagdanov, "Finite State Machine-Based Optimization of Data Parallel Regular Domain Problems Applied in Low-Level Image Processing," IEEE Transactions on Parallel and Distributed Systems, vol. 15, no. 10, pp. 865-877, Oct. 2004, doi:10.1109/TPDS.2004.55