Data Marshaling for Multicore Systems
January/February 2011 (vol. 31 no. 1)
pp. 56-64
M. Aater Suleman, University of Texas at Austin
Onur Mutlu, Carnegie Mellon University
Jose A. Joao, University of Texas at Austin
Khubaib Khubaib, University of Texas at Austin
Yale N. Patt, University of Texas at Austin

Dividing a program into segments and executing each segment at the core best suited to run it can improve performance and save power. When consecutive segments run on different cores, accesses to intersegment data incur cache misses. Data Marshaling eliminates such cache misses by identifying and marshaling the necessary intersegment data when a segment is shipped to a remote core.


Index Terms:
Staged execution, critical sections, pipelining, CMP, multicore, pipeline parallelism, parallel programming, communication misses, heterogeneous multicore, remote execution
M. Aater Suleman, Onur Mutlu, Jose A. Joao, Khubaib Khubaib, Yale N. Patt, "Data Marshaling for Multicore Systems," IEEE Micro, vol. 31, no. 1, pp. 56-64, Jan.-Feb. 2011, doi:10.1109/MM.2010.105