Issue No.09 - Sept. (2013 vol.19)

pp: 1513-1525

Duksu Kim, Dept. of Comput. Sci., Korea Adv. Inst. of Sci. & Technol., Daejeon, South Korea

Jinkyu Lee, Dept. of Electr. Eng. & Comput. Sci., Univ. of Michigan, Ann Arbor, MI, USA

Junghwan Lee, Dept. of Comput. Sci., Korea Adv. Inst. of Sci. & Technol., Daejeon, South Korea

Insik Shin, Dept. of Comput. Sci., Korea Adv. Inst. of Sci. & Technol., Daejeon, South Korea

J. Kim, Dept. of Comput. Sci., Korea Adv. Inst. of Sci. & Technol., Daejeon, South Korea

Sung-Eui Yoon, Dept. of Comput. Sci., Korea Adv. Inst. of Sci. & Technol., Daejeon, South Korea

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TVCG.2013.71

ABSTRACT

We present a novel linear programming (LP)-based scheduling algorithm that exploits heterogeneous multicore architectures, such as CPUs and GPUs, to accelerate a wide variety of proximity queries. To represent the complicated performance relationships between heterogeneous architectures and the different computations of proximity queries, we propose a simple yet accurate model that measures the expected running time of these computations. Based on this model, we formulate an optimization problem that minimizes the largest time spent on any computing resource, and propose a novel, iterative LP-based scheduling algorithm. Because our method is general, we are able to apply it to various proximity queries used in five applications with different characteristics. Our method achieves an order-of-magnitude performance improvement by using four different GPUs and two hexa-core CPUs over using a hexa-core CPU only. Unlike prior scheduling methods, our method continually improves performance as we add more computing resources. Our method also achieves a much higher performance improvement than prior methods as the heterogeneity of computing resources increases. Moreover, for one of the tested applications, our method achieves even higher performance than a prior parallel method manually optimized for that application. We also show that our method provides results that are close (e.g., 75 percent) to the performance provided by a conservative upper bound of the ideal throughput. These results demonstrate an efficiency and robustness that prior methods have not achieved. In addition, we integrate one of our contributions with a work stealing method. Our version of the work stealing method achieves an 18 percent performance improvement on average over the original work stealing method. This result shows the wide applicability of our approach.
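The min-max objective described in the abstract can be illustrated with a small sketch. The abstract does not give the actual formulation, so everything here is an assumption made for illustration: a single job type, an expected running time per resource of the form `setup + per_job * n`, and fractional job counts. Under those assumptions the problem has a closed-form solution, so the sketch uses no LP solver and is not the paper's iterative LP algorithm. The function name `balance_jobs` and its parameters are hypothetical.

```python
from typing import List, Tuple


def balance_jobs(n_jobs: float,
                 setup: List[float],
                 per_job: List[float]) -> Tuple[List[float], float]:
    """Split n_jobs across resources to minimize the largest finish time.

    Assumed (hypothetical) cost model: resource j needs
    setup[j] + per_job[j] * n time units to process n jobs.
    Returns (fractional allocation per resource, resulting makespan).
    """
    if n_jobs <= 0:
        raise ValueError("n_jobs must be positive")
    active = list(range(len(setup)))
    while True:
        # If every active resource finishes at the same time T, then
        # sum_j (T - setup[j]) / per_job[j] = n_jobs, which gives T directly.
        inv_sum = sum(1.0 / per_job[j] for j in active)
        offset = sum(setup[j] / per_job[j] for j in active)
        makespan = (n_jobs + offset) / inv_sum
        worst = max(active, key=lambda j: setup[j])
        # A resource whose setup cost exceeds T would need a negative
        # share, so it is dropped and T is recomputed without it.
        if setup[worst] <= makespan or len(active) == 1:
            break
        active.remove(worst)
    alloc = [0.0] * len(setup)
    for j in active:
        alloc[j] = (makespan - setup[j]) / per_job[j]
    return alloc, makespan


if __name__ == "__main__":
    # Hypothetical numbers: a CPU with no setup cost but slow per-job time,
    # and a GPU with a large setup cost but fast per-job time.
    alloc, t = balance_jobs(100, setup=[0.0, 10.0], per_job=[2.0, 0.5])
    print(alloc, t)
```

With a large batch both resources are used and finish together; with a small batch the high-setup resource receives no jobs. Rounding the fractional shares to whole jobs, and handling several job types at once, is where a real LP solver and an iterative refinement step would come in.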

INDEX TERMS

Computational modeling, Multicore processing, Scheduling algorithms, Optimization, Acceleration, motion planning, Heterogeneous system, proximity query, scheduling, collision detection, ray tracing

CITATION

Duksu Kim, Jinkyu Lee, Junghwan Lee, Insik Shin, J. Kim, Sung-Eui Yoon, "Scheduling in Heterogeneous Computing Environments for Proximity Queries",

*IEEE Transactions on Visualization & Computer Graphics*, vol. 19, no. 9, pp. 1513-1525, Sept. 2013, doi:10.1109/TVCG.2013.71