Reducing Server Data Traffic Using a Hierarchical Computation Model
October 2005 (vol. 16 no. 10)
pp. 933-943

Abstract: Commercial workloads impose heavy demands on the memory and storage subsystems of a server and often generate a large amount of traffic on the I/O and memory buses. To reduce data movement between the storage subsystem and the processing units, we propose a hierarchical computing (HC) system that distributes processing elements across the storage hierarchy. We present a programming model that allows us to decompose database queries into simple operations. These operations are then distributed to and executed by the different layers of the hierarchy, depending on the affinity of each task to a particular layer. Commands percolate down into the lower layers of the hierarchy, and partially processed information flows up into the higher layers, where subsequent operations can be performed. We evaluate the effectiveness of the proposed hierarchical computing model by performing full-system simulations of a business decision support system (DSS) workload. On a group of TPC-H-like queries, hierarchical computing systems reduce the amount of data transferred over the processor-to-memory interconnect by 37-58 percent. We also observe that HC configurations show speedups between 1.14x and 1.45x when compared with a 32-processor CC-NUMA system.
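
The decomposition described in the abstract can be illustrated with a small sketch. This is not the authors' programming model: the layer names, the `Operation` type, the `execute` function, and the sample rows are all hypothetical, chosen only to show how pushing simple operations down to the layer they have affinity to can shrink the data that flows up to the processor.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Tuple

# Hypothetical layers, ordered from lowest (closest to the data) to highest.
LAYERS = ["disk", "memory", "cpu"]


@dataclass(frozen=True)
class Operation:
    name: str
    layer: str                    # layer this operation is assigned to (assumed affinity)
    run: Callable[[list], list]   # transforms one batch of rows into another


def execute(plan: List[Operation], rows: list) -> Tuple[list, Dict[str, int]]:
    """Run the plan bottom-up and record how many items leave each layer."""
    data = list(rows)
    items_out: Dict[str, int] = {}
    for layer in LAYERS:
        # Operations assigned to this layer run in plan order.
        for op in plan:
            if op.layer == layer:
                data = op.run(data)
        # Whatever remains is the partially processed data passed upward.
        items_out[layer] = len(data)
    return data, items_out


# Illustrative rows of (quantity, price); not taken from any benchmark.
ROWS = [(5, 10.0), (20, 3.0), (15, 2.0), (1, 100.0)]

# A toy query: sum of price over rows with quantity > 10.
hc_plan = [
    Operation("filter", "disk", lambda rs: [r for r in rs if r[0] > 10]),
    Operation("project", "memory", lambda rs: [r[1] for r in rs]),
    Operation("sum", "cpu", lambda rs: [sum(rs)]),
]

# Baseline for comparison: the same operations, all executed at the top layer.
baseline_plan = [replace(op, layer="cpu") for op in hc_plan]

if __name__ == "__main__":
    print(execute(hc_plan, ROWS))
    print(execute(baseline_plan, ROWS))
```

In this toy setup, the hierarchical plan passes two rows up from the lowest layer, while the baseline passes all four. Both plans return the same result. The sketch counts items only and says nothing about the traffic reductions or speedups reported for the simulated system.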

Index Terms:
Distributed architectures, measurement, evaluation, modeling, simulation of multiple-processor systems, I/O interconnections topology, databases.
Juan Rubio, Lizy Kurian John, "Reducing Server Data Traffic Using a Hierarchical Computation Model," IEEE Transactions on Parallel and Distributed Systems, vol. 16, no. 10, pp. 933-943, Oct. 2005, doi:10.1109/TPDS.2005.127