Issue No.02 - February (2011 vol.22)
pp: 309-322
Gabriel Falcao, University of Coimbra, Coimbra
Leonel Sousa, Technical University of Lisbon, Lisbon
Vitor Silva, University of Coimbra, Coimbra
Unlike the usual VLSI approaches required by computationally intensive Low-Density Parity-Check (LDPC) code decoders, this paper presents flexible software-based LDPC decoders. It proposes algorithms and data structures suitable for parallel computing to perform LDPC decoding on multicore architectures. To evaluate the efficiency of the proposed parallel algorithms, LDPC decoders were developed on recent multicores, such as off-the-shelf general-purpose x86 processors, Graphics Processing Units (GPUs), and the CELL Broadband Engine (CELL/B.E.). Challenging restrictions, such as memory access conflicts, latency, coalescence, and the unknown behavior of thread and block schedulers, were identified and worked around. Experimental results for different code lengths show throughputs on the order of 1 to 2 Mbps on the general-purpose multicores, and ranging from 40 Mbps on the GPU to nearly 70 Mbps on the CELL/B.E. The results show that the CELL/B.E. performs better for short to medium length codes, while the GPU achieves superior throughputs with larger codes. In some cases, these throughputs approach those obtained with VLSI decoders. The results also suggest that throughput will increase as the number of cores rises.
LDPC, data-parallel computing, multicore, graphics processing units, GPU, CUDA, CELL, OpenMP.
Gabriel Falcao, Leonel Sousa, Vitor Silva, "Massively LDPC Decoding on Multicore Architectures", IEEE Transactions on Parallel & Distributed Systems, vol.22, no. 2, pp. 309-322, February 2011, doi:10.1109/TPDS.2010.66