Issue No.08 - August (2011 vol.23)
pp: 1169-1181
Jens Teubner, ETH Zurich, Zurich
Gustavo Alonso, ETH Zurich, Zurich
Computing frequent items is an important problem both in its own right and as a subroutine in several data mining algorithms. In this paper, we explore how to accelerate the computation of frequent items using field-programmable gate arrays (FPGAs) with a threefold goal: to increase performance over existing solutions, to reduce energy consumption relative to CPU-based systems, and to explore the design space in detail, since the constraints on FPGAs are very different from those of traditional software-based systems. We discuss three design alternatives, each exploiting different FPGA features and each providing different performance/scalability trade-offs. An important result of the paper is to demonstrate that the inherent massive parallelism of FPGAs can improve the performance of existing algorithms, but only after a fundamental redesign of those algorithms. Our experimental results show, for example, that the pipelined solution we introduce can sustain a throughput of more than 100 million tuples per second (four times the best available results to date) by using techniques that are not available to CPU-based solutions. Moreover, unlike in software approaches, this high throughput is independent of the skew of the input's Zipf distribution and comes at a far lower energy cost.
Data mining, reconfigurable hardware, parallelism and concurrency.
Jens Teubner, Gustavo Alonso, "Frequent Item Computation on a Chip", IEEE Transactions on Knowledge & Data Engineering, vol. 23, no. 8, pp. 1169-1181, August 2011, doi:10.1109/TKDE.2010.216