A Comparison of FPGA and GPU for Real-Time Phase-Based Optical Flow, Stereo, and Local Image Features
July 2012 (vol. 61 no. 7)
pp. 999-1012
Karl Pauwels, K.U.Leuven, Leuven
Matteo Tomasi, University of Granada, Granada
Javier Díaz, University of Granada, Granada
Eduardo Ros, University of Granada, Granada
Marc M. Van Hulle, K.U.Leuven, Leuven
Low-level computer vision algorithms have extreme computational requirements. In this work, we compare two real-time architectures developed using FPGA and GPU devices for the computation of phase-based optical flow, stereo, and local image features (energy, orientation, and phase). The presented approach requires a massive degree of parallelism to achieve real-time performance and allows us to compare FPGA and GPU design strategies and trade-offs in a much more complex scenario than previous contributions. Based on this analysis, we provide suggestions to real-time system designers for selecting the most suitable technology, and for optimizing system development on this platform, for a number of diverse applications.

Index Terms:
Reconfigurable hardware, graphics processors, real-time systems, computer vision, motion, stereo.
Karl Pauwels, Matteo Tomasi, Javier Díaz, Eduardo Ros, Marc M. Van Hulle, "A Comparison of FPGA and GPU for Real-Time Phase-Based Optical Flow, Stereo, and Local Image Features," IEEE Transactions on Computers, vol. 61, no. 7, pp. 999-1012, July 2012, doi:10.1109/TC.2011.120