Issue No.02 - Feb. (2013 vol.62)
pp: 336-350
Yi Wang, Hong Kong Polytechnic University, Kowloon
Duo Liu, Hong Kong Polytechnic University, Kowloon
Zhiwei Qin, Hong Kong Polytechnic University, Kowloon
Zili Shao, Hong Kong Polytechnic University, Kowloon
ABSTRACT
This paper aims to completely remove intercore communication overhead through joint computation and communication task scheduling for streaming applications on Multiprocessor Systems-on-Chip (MPSoCs). Our basic idea is to execute some computation and communication tasks in earlier periods (the added periods are called the prologue) so that each intercore data transfer finishes before the tasks that need the data begin executing. In particular, we solve the following problem: how can tasks be rescheduled so that the schedule length is minimized with the minimum prologue length (the number of periods in the prologue) while the intercore communication overhead is completely removed? To solve this problem, we first perform schedulability analysis and obtain an upper bound on the number of times each computation task needs to be rescheduled. We then formulate the problem as an Integer Linear Program (ILP) and obtain an optimal solution. We evaluate our technique with a set of benchmarks drawn from both real-life streaming applications and synthetic task graphs. The experimental results show that our technique achieves significant reductions in schedule length and energy consumption compared with previous work.
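The prologue idea in the abstract can be illustrated with a small sketch. This is not the paper's ILP formulation; it is a simplified longest-path computation under assumptions introduced here for illustration only: the task graph is acyclic, each task is already mapped to a core, every intercore transfer fits within one period, and a producer on a different core than its consumer must therefore run at least one period earlier. The names `shift_counts`, `prologue_length`, and `core_of` are hypothetical.

```python
from collections import defaultdict


def shift_counts(tasks, edges, core_of):
    """Return {task: number of periods the task is moved earlier}.

    Assumed model (not from the paper): an edge between tasks on different
    cores needs one full period for the transfer, so the producer runs at
    least one period before the consumer. Same-core edges need no shift.
    """
    succs = defaultdict(list)
    indeg = {t: 0 for t in tasks}
    for u, v in edges:
        succs[u].append(v)
        indeg[v] += 1

    # Kahn's algorithm: build a topological order of the task graph.
    order = []
    ready = [t for t in tasks if indeg[t] == 0]
    while ready:
        t = ready.pop()
        order.append(t)
        for v in succs[t]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    if len(order) != len(tasks):
        raise ValueError("task graph must be acyclic")

    # Walk from sinks to sources so each successor's shift is final
    # before its predecessors are processed (longest-path relaxation).
    shift = {t: 0 for t in tasks}
    for u in reversed(order):
        for v in succs[u]:
            gap = 1 if core_of[u] != core_of[v] else 0
            shift[u] = max(shift[u], shift[v] + gap)
    return shift


def prologue_length(shift):
    """Number of extra periods needed before the steady-state schedule."""
    return max(shift.values(), default=0)


# Hypothetical four-task graph: C runs on core 1, the others on core 0.
tasks = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
core_of = {"A": 0, "B": 0, "C": 1, "D": 0}

shift = shift_counts(tasks, edges, core_of)
print(shift, prologue_length(shift))
```

In this example the path A, C, D crosses cores twice, so A is moved two periods earlier and C one period earlier, giving a prologue of two periods. The sketch only finds the smallest shifts for a fixed core mapping; it does not minimize the schedule length, which is what the ILP in the paper addresses.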
INDEX TERMS
Schedules, Computer architecture, Processor scheduling, Energy consumption, Upper bound, Computational modeling, Joints, MPSoC, Task scheduling, intercore communication, retiming, streaming applications
CITATION
Yi Wang, Duo Liu, Zhiwei Qin, Zili Shao, "Optimally Removing Intercore Communication Overhead for Streaming Applications on MPSoCs", IEEE Transactions on Computers, vol. 62, no. 2, pp. 336-350, Feb. 2013, doi:10.1109/TC.2011.236