Parallel Architecture for Fast Transforms with Trigonometric Kernel
October 1994 (vol. 5 no. 10)
pp. 1091-1099

We present a unified parallel architecture for four of the most important fast orthogonal transforms with trigonometric kernel: Complex Valued Fourier (CFFT), Real Valued Fourier (RFFT), Hartley (FHT), and Cosine (FCT). Of these, only the CFFT has a data flow that coincides with the one generated by the successive doubling method, which can be converted to a constant geometry flow using perfect unshuffle or shuffle permutations. The other three require some type of hardware modification to guarantee the constant geometry of the successive doubling method. We have defined a generalized processing section (PS), based on a circular CORDIC rotator, for the four transforms. This PS permits the evaluation of the CFFT and FCT in n data recirculations and of the RFFT and FHT in n-1 data recirculations, where n is the number of stages of a transform of length N = r^n. The efficiency of the partitioned parallel architecture is also optimum, because no cycles are lost in the systolic computation of the butterflies for any of the four transforms.
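As an illustration of the constant geometry idea for the CFFT case only, the following is a minimal Python sketch of a radix-2 (r = 2) decimation-in-frequency FFT in which every stage reads the pair (i, i + N/2) and writes the results to positions (2i, 2i + 1), that is, a perfect shuffle of the in-place result. It is not the processing section described in the paper: the function names are hypothetical, an ordinary complex multiplication stands in for the CORDIC rotator, and the twiddle indexing is one common formulation rather than the authors' own.

```python
import cmath


def bit_reverse(k, n):
    """Reverse the n low-order bits of k."""
    r = 0
    for _ in range(n):
        r = (r << 1) | (k & 1)
        k >>= 1
    return r


def constant_geometry_fft(x):
    """Radix-2 DIF FFT whose n stages all use the same interconnection.

    Assumption: illustrative software model only; the twiddle product
    below replaces the CORDIC rotation used in a hardware PS.
    """
    N = len(x)
    n = N.bit_length() - 1
    if N < 1 or (1 << n) != N:
        raise ValueError("length must be a power of two")
    half = N // 2
    data = [complex(v) for v in x]
    for stage in range(n):  # one data recirculation per stage
        out = [0j] * N
        for i in range(half):
            a, b = data[i], data[i + half]
            # Twiddle exponent: clear the `stage` low-order bits of i.
            k = (i >> stage) << stage
            w = cmath.exp(-2j * cmath.pi * k / N)
            # Same wiring in every stage: (i, i + N/2) -> (2i, 2i + 1).
            out[2 * i] = a + b
            out[2 * i + 1] = (a - b) * w
        data = out
    # The final recirculation leaves the spectrum in bit-reversed order.
    return [data[bit_reverse(k, n)] for k in range(N)]
```

Because the read and write pattern is identical in all n stages, the same butterfly hardware and the same routing could be reused while the data recirculate, which is the property the abstract attributes to the CFFT data flow.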

Index Terms:
parallel architectures; transforms; Fourier transforms; multiprocessor interconnection networks; parallel algorithms; mathematics computing; parallel architecture; fast transforms; trigonometric kernel; fast orthogonal transforms; Complex Valued Fourier Transform; Real Valued Fourier Transform; Hartley Transform; Cosine Transform; successive doubling method; constant geometry flow; perfect unshuffle; shuffle; hardware modification; circular CORDIC rotator; data recirculations; partitioned parallel architecture; cycle loss; systolic computation; butterflies; systolic array
F. Argüello, J.D. Bruguera, R. Doallo, E.L. Zapata, "Parallel Architecture for Fast Transforms with Trigonometric Kernel," IEEE Transactions on Parallel and Distributed Systems, vol. 5, no. 10, pp. 1091-1099, Oct. 1994, doi:10.1109/71.313124