The Significance of CMP Cache Sharing on Contemporary Multithreaded Applications
Found in: IEEE Transactions on Parallel and Distributed Systems
By Eddy Zheng Zhang, Yunlian Jiang, Xipeng Shen
Issue Date:February 2012
pp. 367-374
Cache sharing on modern Chip Multiprocessors (CMPs) reduces communication latency among corunning threads, and also causes interthread cache contention. Most previous studies on the influence of cache sharing have concentrated on the design or management o...
 
The Complexity of Optimal Job Co-Scheduling on Chip Multiprocessors and Heuristics-Based Solutions
Found in: IEEE Transactions on Parallel and Distributed Systems
By Yunlian Jiang, Kai Tian, Xipeng Shen, Jinghe Zhang, Jie Chen, Rahul Tripathi
Issue Date:July 2011
pp. 1192-1205
In the Chip Multiprocessor (CMP) architecture, it is common for multiple cores to share some on-chip cache. The sharing may cause cache thrashing and contention among co-running jobs. Job co-scheduling is an approach to tackling the problem by assigning jobs t...
 
Speculation with Little Wasting: Saving Cost in Software Speculation through Transparent Learning
Found in: Parallel and Distributed Systems, International Conference on
By Yunlian Jiang, Feng Mao, Xipeng Shen
Issue Date:December 2009
pp. 543-550
Software speculation has shown promise in parallelizing programs with coarse-grained dynamic parallelism. However, most speculation systems use offline profiling for the selection of speculative regions. The mismatch with the input-sensitivity of dynamic p...
 
Adaptive Software Speculation for Enhancing the Cost-Efficiency of Behavior-Oriented Parallelization
Found in: Parallel Processing, International Conference on
By Yunlian Jiang, Xipeng Shen
Issue Date:September 2008
pp. 270-278
Recently, software speculation has shown promising results in parallelizing complex sequential programs by exploiting dynamic high-level parallelism. The speculation, however, is cost-inefficient. Failed speculations may cause unnecessary shared resource con...
 
Adaptive speculation in behavior-oriented parallelization
Found in: Parallel and Distributed Processing Symposium, International
By Yunlian Jiang, Xipeng Shen
Issue Date:April 2008
pp. 1-5
Behavior-oriented parallelization is a technique for parallelizing complex sequential programs that have dynamic parallelism. Although the technique shows promising results, the software speculation mechanism it uses is not cost-efficient. Failed speculati...
 
An Effective Scheduling Algorithm for Homogeneous System
Found in: Grid and Cooperative Computing, International Conference on
By Yipeng Zhou, Guangzhong Sun, Yunlian Jiang, Yinlong Xu
Issue Date:October 2006
pp. 71-77
Efficient application scheduling is critical for achieving high performance in homogeneous computing environments. The application scheduling problem has been shown to be NP-complete. However, because of its key importance, this problem has been extensively...
 
Complexity analysis and algorithm design for reorganizing data to minimize non-coalesced memory accesses on GPU
Found in: Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming (PPoPP '13)
By Bo Wu, Eddy Zheng Zhang, Xipeng Shen, Yunlian Jiang, Zhijia Zhao
Issue Date:February 2013
pp. 57-68
The performance of Graphics Processing Units (GPUs) is sensitive to irregular memory references. Some recent work shows the promise of data reorganization for eliminating non-coalesced memory accesses that are caused by irregular references. However, all pre...
     
Exploiting inter-sequence correlations for program behavior prediction
Found in: Proceedings of the ACM international conference on Object oriented programming systems languages and applications (OOPSLA '12)
By Bo Wu, Raul Silvera, Xipeng Shen, Yaoqing Gao, Yunlian Jiang, Zhijia Zhao
Issue Date:October 2012
pp. 851-866
Prediction of program dynamic behaviors is fundamental to program optimizations, resource management, and architecture reconfigurations. Most existing predictors are based on locality of program behaviors, subject to some inherent limitations. In this pape...
     
An input-centric paradigm for program dynamic optimizations
Found in: Proceedings of the ACM international conference on Object oriented programming systems languages and applications (OOPSLA '10)
By Eddy Z. Zhang, Kai Tian, Xipeng Shen, Yunlian Jiang
Issue Date:October 2010
pp. 125-139
Accurately predicting program behaviors (e.g., locality, dependency, method calling frequency) is fundamental for program optimizations and runtime adaptations. Despite decades of remarkable progress, prior studies have not systematically exploited program...
     
Does cache sharing on modern CMP matter to the performance of contemporary multithreaded programs?
Found in: Proceedings of the 15th ACM SIGPLAN symposium on Principles and practice of parallel programming (PPoPP '10)
By Eddy Z. Zhang, Xipeng Shen, Yunlian Jiang
Issue Date:January 2010
pp. 203-212
Most modern Chip Multiprocessors (CMPs) feature a shared on-chip cache. For multithreaded applications, the sharing reduces communication latency among co-running threads, but also results in cache contention. A number of studies have examined the influence o...
     
A study on optimally co-scheduling jobs of different lengths on chip multiprocessors
Found in: Proceedings of the 6th ACM conference on Computing frontiers (CF '09)
By Kai Tian, Xipeng Shen, Yunlian Jiang
Issue Date:May 2009
pp. 227-227
Cache sharing in Chip Multiprocessors brings cache contention among co-running processes, which often causes considerable degradation of program performance and system fairness. Recent studies have shown the effectiveness of job co-scheduling in alleviating ...
     
Analysis and approximation of optimal co-scheduling on chip multiprocessors
Found in: Proceedings of the 17th international conference on Parallel architectures and compilation techniques (PACT '08)
By Jie Chen, Rahul Tripathi, Xipeng Shen, Yunlian Jiang
Issue Date:October 2008
pp. 133-133
Cache sharing among processors is important for Chip Multiprocessors to reduce inter-thread latency, but also brings cache contention, degrading program performance considerably. Recent studies have shown that job co-scheduling can effectively alleviate th...
     