Issue No.06 - June (2013 vol.62)
pp: 1170-1178
Jiafeng Xie , Central South University, Changsha
Pramod Kumar Meher , Institute for Infocomm Research, Singapore
Jianjun He , Central South University, Changsha
ABSTRACT
This paper presents an efficient decomposition scheme for hardware-efficient realization of the discrete cosine transform (DCT) based on distributed arithmetic. We propose an efficient design for implementing cyclic convolution based on a group distributed arithmetic (GDA) technique, in which the read-only memory size is reduced relative to the existing GDA-based design. The proposed structure for DCT implementation, based on the new decomposition scheme and the proposed design of GDA-based cyclic convolution, involves significantly less area complexity than the existing one. For example, to implement the DCT of transform length $(N = 17)$, the proposed design needs a lookup table of 128 words, while the existing design for $(N = 16)$ requires a lookup table of 256 words. The synthesis results show that the proposed design involves significantly less area, gives higher throughput, and consumes less power than the existing designs of nearly the same or lower lengths.
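For readers unfamiliar with the lookup-table trade-off mentioned in the abstract, the following is a minimal sketch of plain (ungrouped) distributed arithmetic applied to a cyclic convolution. It is not the GDA design or the decomposition scheme of the paper; it only illustrates why a conventional DA lookup table grows as 2^K words for K coefficients, which is the cost that grouping schemes aim to reduce. The function names, the use of integer coefficients, and the two's-complement input width `bits` are all assumptions made for illustration.

```python
def build_lut(coeffs):
    """Precompute every partial sum of the coefficients (2**K words for K coefficients)."""
    k = len(coeffs)
    return [sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
            for addr in range(1 << k)]


def da_inner_product(lut, xs, bits):
    """Inner product of the fixed coefficients with `bits`-wide two's-complement inputs."""
    acc = 0
    for b in range(bits):
        # Bit b of every input sample forms the LUT address for this bit plane.
        addr = 0
        for i, x in enumerate(xs):
            addr |= ((x >> b) & 1) << i
        term = lut[addr] << b
        # The sign-bit plane carries negative weight in two's complement.
        acc += -term if b == bits - 1 else term
    return acc


def da_cyclic_convolution(h, x, bits):
    """y[m] = sum_k h[(m - k) % n] * x[k], using one shared LUT and rotated inputs."""
    n = len(h)
    # Coefficients as seen by output 0; other outputs reuse the same LUT
    # because rotating the input sequence shifts the output index.
    lut = build_lut([h[(-j) % n] for j in range(n)])
    return [da_inner_product(lut, [x[(j + m) % n] for j in range(n)], bits)
            for m in range(n)]
```

In this sketch a length-3 convolution needs an 8-word table, and each output costs one table read and one shift-accumulate per input bit, with no multipliers. The single shared table across all outputs is the property that makes cyclic convolution a convenient target for DA.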
INDEX TERMS
Discrete cosine transforms, Convolution, Sparse matrices, Matrix decomposition, Hardware, Read only memory, hardware efficient, Distributed arithmetic (DA), cyclic convolution, discrete cosine transform (DCT)
CITATION
Jiafeng Xie, Pramod Kumar Meher, Jianjun He, "Hardware-Efficient Realization of Prime-Length DCT Based on Distributed Arithmetic", IEEE Transactions on Computers, vol.62, no. 6, pp. 1170-1178, June 2013, doi:10.1109/TC.2012.64