M. Aater Suleman, Onur Mutlu, Jose A. Joao, Khubaib, and Yale N. Patt, "Data Marshaling for Multicore Systems," IEEE Micro, vol. 31, no. 1, pp. 56-64, Jan./Feb. 2011.
Publisher: IEEE Computer Society, Los Alamitos, CA, USA. ISSN: 0272-1732. DOI: 10.1109/MM.2010.105.
Keywords: staged execution, critical sections, pipelining, CMP, multicore, pipeline parallelism, parallel programming, communication misses, heterogeneous multicore, remote execution.
Dividing a program into segments and executing each segment at the core best suited to run it can improve performance and save power. When consecutive segments run on different cores, accesses to intersegment data incur cache misses. Data Marshaling eliminates such cache misses by identifying and marshaling the necessary intersegment data when a segment is shipped to a remote core.

