Displaying 1-50 out of 87 total
A Theory for Software-Hardware Co-Scheduling for ASIPs and Embedded Processors
Found in: Application-Specific Systems, Architectures and Processors, IEEE International Conference on
By R. Govindarajan, Erik R. Altman, Guang R. Gao
Issue Date:July 2000
pp. 329
Exploiting instruction-level parallelism (ILP) is extremely important for achieving high performance in application specific instruction set processors (ASIPs) and embedded processors. Existing techniques deal with either scheduling hardware pipelines to o...
 
Power-Performance Trade-Offs for Energy-Efficient Architectures: A Quantitative Study
Found in: Computer Design, International Conference on
By Hongbo Yang, R. Govindarajan, Guang R. Gao, Kevin B. Theobald
Issue Date:September 2002
pp. 174
The drastic increase in power consumption by modern processors emphasizes the need for power-performance trade-offs in architecture design space exploration and compiler optimizations. This paper reports a quantitative study on the power-performance trade-...
 
An Adaptive Meta-Clustering Approach: Combining the Information from Different Clustering Results
Found in: Computational Systems Bioinformatics Conference, International IEEE Computer Society
By Yujing Zeng, Jianshan Tang, Javier Garcia-Frias, Guang R. Gao
Issue Date:August 2002
pp. 276
With the development of microarray techniques, there is an increasing need of information processing methods to analyze the high throughput data. Clustering is one of the most promising candidates because of its simplicity, flexibility and robustness. Howe...
 
Automatically Partitioning Threads Based on Remote Paths
Found in: Parallel and Distributed Systems, International Conference on
By Xinan Tang, Guang R. Gao
Issue Date:December 1998
pp. 632
In order to program multithreaded architectures effectively compiler support to automatically partition programs into threads is essential. This paper proposes a remote-path based thread partitioning framework, which can generate low-level threads from pro...
 
A Framework for Resource-Constrained Rate-Optimal Software Pipelining
Found in: IEEE Transactions on Parallel and Distributed Systems
By R. Govindarajan, Erik R. Altman, Guang R. Gao
Issue Date:November 1996
pp. 1133-1149
The rapid advances in high-performance computer architecture and compilation techniques provide both challenges and opportunities to exploit the rich solution space of software pipelined loop schedules. In this paper, w...
 
A Discussion in Favor of Dynamic Scheduling for Regular Applications in Many-core Architectures
Found in: 2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
By Elkin Garcia, Daniel Orozco, Robert Pavel, Guang R. Gao
Issue Date:May 2012
pp. 1591-1600
The recent evolution of many-core architectures has resulted in chips where the number of processor elements (PEs) are in the hundreds and continue to increase every day. In addition, many-core processors are more and more frequently characterized by the d...
 
Single-Dimension Software Pipelining for Multi-Dimensional Loops
Found in: Code Generation and Optimization, IEEE/ACM International Symposium on
By Hongbo Rong, Zhizhong Tang, R. Govindarajan, Alban Douillet, Guang R. Gao
Issue Date:March 2004
pp. 163
Traditionally, software pipelining is applied either to the innermost loop of a given loop nest or from the innermost loop to outer loops. In this paper, we propose a three-step approach, called Single-dimension Software Pipelining (SSP), to software pipel...
 
Sustained Petaflop and Beyond: Can Parallel Computing Systems Meet The Challenges?
Found in: Parallel and Distributed Processing Symposium, International
By Guang R. Gao
Issue Date:April 2005
pp. 76
No summary available.
   
The TERAFLUX Project: Exploiting the DataFlow Paradigm in Next Generation Teradevices
Found in: 2013 Euromicro Conference on Digital System Design (DSD)
By Marco Solinas, Rosa M. Badia, Francois Bodin, Albert Cohen, Paraskevas Evripidou, Paolo Faraboschi, Bernhard Fechner, Guang R. Gao, Arne Garbade, Sylvain Girbal, Daniel Goodman, Behran Khan, Souad Koliai, Feng Li, Mikel Lujan, Laurent Morin, Avi Mendelson, Nacho Navarro, Antoniu Pop, Pedro Trancoso, Theo Ungerer, Mateo Valero, Sebastian Weis, Ian Watson, Stephane Zuckermann, Roberto Giorgi
Issue Date:September 2013
pp. 272-279
Thanks to the improvements in semiconductor technologies, extreme-scale systems such as teradevices (i.e., composed by 1000 billion of transistors) will enable systems with 1000+ general purpose cores per chip, probably by 2020. Three major challenges have...
 
Code Generation for Single-Dimension Software Pipelining of Multi-Dimensional Loops
Found in: Code Generation and Optimization, IEEE/ACM International Symposium on
By Hongbo Rong, Alban Douillet, R. Govindarajan, Guang R. Gao
Issue Date:March 2004
pp. 175
Traditionally, software pipelining is applied either to the innermost loop of a given loop nest or from the innermost loop to the outer loops. In a companion paper, we proposed a scheduling method, called Single-dimension Software Pipelining (SSP), to soft...
 
An Executable Analytical Performance Evaluation Approach for Early Performance Prediction
Found in: Parallel and Distributed Processing Symposium, International
By Adeline Jacquet, Vincent Janot, Clement Leung, Guang R. Gao, R. Govindarajan, Thomas L. Sterling
Issue Date:April 2003
pp. 268a
Percolation has recently been proposed as a key component of an advanced program execution model for future generation high-end machines featuring adaptive data/code transformation and movement for effective latency tolerance. An early evaluation of the pe...
 
Measurement and Modeling of EARTH-MANNA Multithreaded Architecture
Found in: Modeling, Analysis, and Simulation of Computer Systems, International Symposium on
By Shashank S. Nemawarkar, Guang R. Gao
Issue Date:February 1996
pp. 109
In this paper, we develop and apply an analytical model to analyze the performance of the EARTH-MANNA multithreaded multiprocessor system. The performance model is based on closed queuing networks. We develop heuristics to account for the realistic subsyst...
 
Co-Scheduling Hardware and Software Pipelines
Found in: High-Performance Computer Architecture, International Symposium on
By R. Govindarajan, Erik R. Altman, Guang R. Gao
Issue Date:February 1996
pp. 52
No summary available.
 
Automatic Locality Exploitation in the Codelet Model
Found in: 2013 12th IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom)
By Chen Chen, Yao Wu, Joshua Suetterlein, Long Zheng, Minyi Guo, Guang R. Gao
Issue Date:July 2013
pp. 853-862
State-of-the-art codelet scheduling focuses on dynamic workload balance of codelets (similar to tasks). While this approach may achieve reasonable performance since computation resources are fully utilized, it may not attain optimal energy savings. In this...
 
Automatic Program Segment Similarity Detection in Targeted Program Performance Improvement
Found in: Parallel and Distributed Processing Symposium, International
By Haiping Wu, Eunjung Park, Mihailo Kaplarevic, Yingping Zhang, Murat Bolat, Xiaoming Li, Guang R. Gao
Issue Date:March 2007
pp. 452
Targeted optimization of program segments can provide an additional program speedup over the highest default optimization level, such as -O3 in GCC. The key challenge is how to automatically search for performance sensitive program segments in a given code...
 
Minimum Register Instruction Sequencing to Reduce Register Spills in Out-of-Order Issue Superscalar Architectures
Found in: IEEE Transactions on Computers
By R. Govindarajan, Hongbo Yang, José Nelson Amaral, Chihong Zhang, Guang R. Gao
Issue Date:January 2003
pp. 4-20
In this paper, we address the problem of generating an optimal instruction sequence S for a Directed Acyclic Graph (DAG), where S is optimal in terms of the number o...
 
Next Generation System Software for Future High-End Computing Systems
Found in: Parallel and Distributed Processing Symposium, International
By Guang R. Gao, Kevin B. Theobald, Ziang Hu, Haiping Wu, Jizhu Lu, Thomas L. Sterling, Keshav Pingali, Paul Stodghill, Rick Stevens, Mark Hereld
Issue Date:April 2002
pp. 0175b
Future high-end computers will offer great performance improvements over today's machines, enabling applications of far greater complexity. However, designers must solve the challenge of exploiting massive parallelism efficiency in the face of very high la...
 
Programming Models and System Software for Future High-End Computing Systems: Work-in-Progress
Found in: Parallel and Distributed Processing Symposium, International
By Guang R. Gao, Kevin B. Theobald, R. Govindarajan, Clement Leung, Ziang Hu, Haiping Wu, Jizhu Lu, Juan del Cuvillo, Adeline Jacquet, Vincent Janot, Thomas L. Sterling
Issue Date:April 2003
pp. 206b
Future high-end computers which promise very high performance require sophisticated program execution models and languages in order to deal with very high latencies across the memory hierarchy and to exploit massive parallelism. This paper presents our pro...
 
Visualizing Biosequence Data Using Texture Mapping
Found in: Information Visualization, IEEE Symposium on
By Praveen R. Thiagarajan, Guang R. Gao
Issue Date:October 2002
pp. 103
Data-mining of information by the process of pattern discovery in protein sequences has been predominantly algorithm based. In this paper we discuss a visualization approach, which uses texture mapping and blending techniques to perform visual data-mining ...
 
Software Pipelining for Stream Programs on Resource Constrained Multicore Architectures
Found in: IEEE Transactions on Parallel and Distributed Systems
By Haitao Wei, Junqing Yu, Huafei Yu, Mingkang Qin, Guang R. Gao
Issue Date:December 2012
pp. 2338-2350
Stream programming model has been productively applied to a number of important application domains. Software pipelining is an important code scheduling technique for stream programs. However, the multicore evolution has presented a new dimension of challe...
 
Towards Memory-Load Balanced Fast Fourier Transformations in Fine-Grain Execution Models
Found in: 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW)
By Chen Chen, Yao Wu, Stephane Zuckerman, Guang R. Gao
Issue Date:May 2013
pp. 1607-1617
The codelet model is a fine-grain dataflow-inspired program execution model that balances the parallelism and overhead of the runtime system. It plays an important role in terms of performance, scalability, and energy efficiency in exascale studies such a...
 
Software-Pipelining on Multi-Core Architectures
Found in: Parallel Architectures and Compilation Techniques, International Conference on
By Alban Douillet, Guang R. Gao
Issue Date:September 2007
pp. 39-48
It is becoming increasingly evident that multi-core chip architecture are emerging as a solution to efficiently amortizing the ever-growing number of transistors on a chip. However the success of such multi-core chips depends on the advances in system soft...
 
Improving Performance of Dynamic Programming via Parallelism and Locality on Multicore Architectures
Found in: IEEE Transactions on Parallel and Distributed Systems
By Guangming Tan, Ninghui Sun, Guang R. Gao
Issue Date:February 2009
pp. 261-274
Dynamic programming (DP) is a popular technique which is used to solve combinatorial search and optimization problems. This paper focuses on one type of DP, which is called nonserial polyadic dynamic programming (NPDP). Owing to the nonuniform data depende...
 
Optimizing the Fast Fourier Transform on a Multi-core Architecture
Found in: Parallel and Distributed Processing Symposium, International
By Long Chen, Ziang Hu, Junmin Lin, Guang R. Gao
Issue Date:March 2007
pp. 449
The rapid revolution in microprocessor chip architecture due to multicore technology is presenting unprecedented challenges to the application developers as well as system software designers: how to best exploit the parallelism potential due to such multi-...
 
Location Consistency-A New Memory Model and Cache Consistency Protocol
Found in: IEEE Transactions on Computers
By Guang R. Gao, Vivek Sarkar
Issue Date:August 2000
pp. 798-813
Existing memory models and cache consistency protocols assume the memory coherence property which requires that all processors observe the same ordering of write operations to the same location. In ...
 
Source Code Partitioning in Program Optimization
Found in: Parallel and Distributed Systems, International Conference on
By Murat Bolat, Kirk Kelsey, Xiaoming Li, Guang R. Gao
Issue Date:December 2011
pp. 56-63
Program analysis and program optimization seek to improve program performance. There are optimization techniques which are applied to various scopes such as a source file, function or basic block. Inter-procedural program optimization techniques have the s...
 
Exploring Fine-Grained Task-Based Execution on Multi-GPU Systems
Found in: Cluster Computing, IEEE International Conference on
By Long Chen, Oreste Villa, Guang R. Gao
Issue Date:September 2011
pp. 386-394
Using multi-GPU systems, including GPU clusters, is gaining popularity in scientific computing. However, when using multiple GPUs concurrently, the conventional data parallel GPU programming paradigms, e.g., CUDA, cannot satisfactorily address certain issu...
 
TiNy threads on BlueGene/P: Exploring many-core parallelisms beyond The traditional OS
Found in: Parallel and Distributed Processing Workshops and PhD Forum, 2011 IEEE International Symposium on
By Handong Ye, Robert Pavel, Aaron Landwehr, Guang R. Gao
Issue Date:April 2010
pp. 1-8
Operating Systems have been considered as a cornerstone of the modern computer system, and the conventional operating system model targets computers designed around the sequential execution model. However, with the rapid progress of the multi-core/manycore...
 
Open64 compiler infrastructure for emerging multicore/manycore architecture All Symposium Tutorial
Found in: Parallel and Distributed Processing Symposium, International
By Sun C. Chan, Guang R. Gao, Barbara Chapman, Tony Linthicum, Anshuman Dasgupta
Issue Date:April 2008
pp. 1
Open64 was originally developed by SGI and released as the MIPSpro compiler. It has been well recognized as an industrial-strength production compiler for high-performance computing. It includes advanced inter-procedural optimizations, loop nest optimizati...
   
Exploring a Multithreaded Methodology to Implement a Network Communication Protocol on the Cyclops-64 Multithreaded Architecture
Found in: Parallel and Distributed Processing Symposium, International
By Ge Gan, Ziang Hu, Juan del Cuvillo, Guang R. Gao
Issue Date:March 2007
pp. 487
The IBM Cyclops-64 (C64) chip employs a multithreaded architecture that integrates a large number of hardware thread units on a single chip. A cellular super-computer is being developed based on a 3D-mesh connection of the C64 chips. This paper introduces ...
 
ParalleX: A Study of A New Parallel Computation Model
Found in: Parallel and Distributed Processing Symposium, International
By Guang R. Gao, Thomas Sterling, Rick Stevens, Mark Hereld, Weirong Zhu
Issue Date:March 2007
pp. 294
This paper proposes the study of a new computation model that attempts to address the underlying sources of performance degradation (e.g. latency, overhead, and starvation) and the difficulties of programmer productivity (e.g. explicit locality management ...
 
On the Role of Deterministic Fine-Grain Data Synchronization for Scientific Applications: A Revisit in the Emerging Many-Core Era
Found in: Parallel and Distributed Processing Symposium, International
By Weirong Zhu, Ziang Hu, Guang R. Gao
Issue Date:March 2007
pp. 489
The design of microprocessor chip for high-end computing systems is moving towards many-core architectures with 10s or 100+ processing units. An important class of the target applications for such architectures are scientific numerical computations, many o...
 
On the Role of Deterministic Fine-Grain Data Synchronization for Scientific Applications: A Revisit in the Emerging Many-Core Era
Found in: Parallel and Distributed Processing Symposium, International
By Weirong Zhu, Ziang Hu, Guang R. Gao
Issue Date:March 2007
pp. 490
The design of microprocessor chip for high-end computing systems is moving towards many-core architectures with 10s or 100+ processing units. An important class of the target applications for such architectures are scientific numerical computations, many o...
 
Experience of Optimizing FFT on Intel Architectures
Found in: Parallel and Distributed Processing Symposium, International
By Daniel Orozco, Liping Xue, Murat Bolat, Xiaoming Li, Guang R. Gao
Issue Date:March 2007
pp. 448
Automatic library generators, such as ATLAS [11], Spiral [8] and FFTW [2], are promising technologies to generate efficient code for different computer architectures. The library generators usually tune programs using two layers of optimizations: the searc...
 
Discriminating Transmembrane Proteins From Signal Peptides Using SVM-Fisher Approach
Found in: Machine Learning and Applications, Fourth International Conference on
By Robel Y. Kahsay, Guang R. Gao, Li Liao
Issue Date:December 2005
pp. 151-155
Most computational methods for transmembrane protein topology prediction rely on compositional bias of amino acids to locate those hydrophobic domains in transmembrane proteins. Because signal peptides also contain hydrophobic segments, these computational...
 
TiNy Threads: A Thread Virtual Machine for the Cyclops64 Cellular Architecture
Found in: Parallel and Distributed Processing Symposium, International
By Juan del Cuvillo, Weirong Zhu, Ziang Hu, Guang R. Gao
Issue Date:April 2005
pp. 265b
This paper presents the design and implementation of a thread virtual machine, called TNT (or TiNy-Threads) for the IBM Cyclops64 architecture (the latest Cyclops architecture that employs a unique multiprocessor-on-a-chip design with a very large number o...
 
Performance Portability on EARTH: A Case Study across Several Parallel Architectures
Found in: Parallel and Distributed Processing Symposium, International
By Weirong Zhu, Yanwei Niu, Guang R. Gao
Issue Date:April 2005
pp. 268a
With the rapidly increasing diversity of parallel architectures and the increasing time and labor for developing parallel applications, the performance portability of parallel programs is becoming increasingly important and should be considered when design...
 
A Cluster-Based Solution for High Performance Hmmpfam Using EARTH Execution Model
Found in: Cluster Computing, IEEE International Conference on
By Weirong Zhu, Yanwei Niu, Jizhu Lu, Chuan Shen, Guang R. Gao
Issue Date:December 2003
pp. 30
Hmmpfam is a widely used computation-intensive bioinformatics software for sequence classification. The contribution of this paper is the first largely scalable and robust cluster-based solution of parallel hmmpfam based on EARTH (Efficient Architecture for R...
 
Implementing Parallel Hmm-pfam on the EARTH Multithreaded Architecture
Found in: Computational Systems Bioinformatics Conference, International IEEE Computer Society
By Weirong Zhu, Yanwei Niu, Jizhu Lu, Guang R. Gao
Issue Date:August 2003
pp. 549
No summary available.
   
Bridging the Gap between ISA Compilers and Silicon Compilers a Challenge for Future SoC Design
Found in: System Synthesis, International Symposium on
By Guang R. Gao
Issue Date:October 2001
pp. 93-93
The emerging technology of the System-on-Chip (SoC) is presenting new challenges at both the hardware and software stages of the design process. At present, system software engineers, e.g. high-level programming language (e.g. C/C++) compiler writers for p...
   
Multithreaded Algorithms for Pricing a Class of Complex Options
Found in: Parallel and Distributed Processing Symposium, International
By Ruppa K. Thulasiram, Lubomir Litov, Hassan Nojumi, Christopher T. Downing, Guang R. Gao
Issue Date:April 2001
pp. 10018b
In this paper, we study multithreaded algorithms for pricing American Style options. We describe the algorithms, explain their relative complexities, and study their performance. The binomial lattice problem has been formulated in two distinct ways. In the...
 
Caching Single-Assignment Structures to Build a Robust Fine-Grain Multi-Threading System
Found in: Parallel and Distributed Processing Symposium, International
By Wen-Yen Lin, Jean-Luc Gaudiot, Jose Nelson Amaral, Guang R. Gao
Issue Date:May 2000
pp. 589
We present the design, implementation, and evaluation of single assignment data structures and of a software controlled cache in an existing multi-threaded architecture platform -- the Efficient Architecture for Running Threads (EARTH). The software-contro...
 
A New Approach to Parallel Dynamic Partitioning for Adaptive Unstructured Meshes
Found in: Parallel Processing Symposium, International
By Gerd Heber, Guang R. Gao, Rupak Biswas
Issue Date:April 1999
pp. 360
Classical mesh partitioning algorithms were designed for rather static situations, and their straightforward application in a dynamical framework may lead to unsatisfactory results, e.g., excessive data migration among processors. Furthermore, special atten...
 
Load Adaptive Algorithms and Implementations for the 2D Discrete Wavelet Transform on Fine-Grain Multithreaded Architectures
Found in: Parallel Processing Symposium, International
By Ashfaq A. Khokhar, Gerd Heber, Parimala Thulasiraman, Guang R. Gao
Issue Date:April 1999
pp. 458
In this paper we present a load adaptive parallel algorithm and implementation to compute 2D Discrete Wavelet Transform (DWT) on multithreading machines. In a 2D DWT computation, the problem sizes reduces at every decomposition level and the lengths of the...
 
Elastic History Buffer: A Low-Cost Method to Improve Branch Prediction Accuracy
Found in: Computer Design, International Conference on
By Maria-Dana Tarlescu, Kevin B. Theobald, Guang R. Gao
Issue Date:October 1997
pp. 82
Two-level dynamic branch predictors try to predict the outcomes of conditional branches using both a table of state counters associated with specific branch instructions and a buffer of recent branch outcomes to correlate the counters with specific branch ...
 
Multithreading Implementation of a Distributed Shortest Path Algorithm on EARTH Multiprocessor
Found in: High-Performance Computing, International Conference on
By Parimala Thulasiraman, Xin-Min Tian, Guang R. Gao
Issue Date:December 1996
pp. 336
Network optimization refers to those optimization problems defined on weighted graphs. These problems include the shortest path problem, the max-flow problem, the transshipment problem etc. and they have been extensively used to model several applications....
 
Implementation of a Non-strict Functional Programming Language V on a Threaded Architecture EARTH
Found in: Innovative Architecture for Future Generation High-Performance Processors and Systems, International Workshop on
By Shigeru Kusakabe, Kentaro Inenaga, Makoto Amamiya, Xinan Tang, Andres Marquez, Guang R. Gao
Issue Date:October 1998
pp. 95
The combination of a language with fine-grain implicit parallelism and a dataflow evaluation scheme is suitable for high-level programming on massively parallel architectures. We are developing a compiler of V, a non-strict functional programming language, ...
 
Polling Watchdog: Combining Polling and Interrupts for Efficient Message Handling
Found in: Computer Architecture, International Symposium on
By Guang R. Gao, Herbert H. J. Hum, Kevin B. Theobald, Xin-Min Tian, Olivier Maquelin
Issue Date:May 1996
pp. 179
Parallel systems supporting multithreading, or message passing in general, have typically used either polling or interrupts to handle incoming messages. Neither approach is ideal; either may lead to excessive overheads or message-handling latencies, depend...
 
An efficient parallel algorithm for all pairs examination
Found in: SC Conference
By Kevin B. Theobald, Guang R. Gao
Issue Date:November 1991
pp. 742-753
No summary available.
 