Godson-3: A Scalable Multicore RISC Processor with x86 Emulation
Found in: IEEE Micro
By Weiwu Hu, Jian Wang, Xiang Gao, Yunji Chen, Qi Liu, Guojie Li
Issue Date:March 2009
pp. 17-29
The Godson-3 microprocessor aims at high-throughput server applications, high-performance scientific computing, and high-end embedded applications. It offers a scalable network on chip, hardware support for x86 emulation, and a reconfigurable arch...
 
LDet: Determinizing Asynchronous Transfer for Postsilicon Debugging
Found in: IEEE Transactions on Computers
By Yunji Chen,Tianshi Chen,Ling Li,Lei Li,Liang Yang,Menghao Su,Weiwu Hu
Issue Date:September 2013
pp. 1732-1744
To efficiently and effectively debug silicon bugs, a promising solution is to determinize the chip, so that the buggy silicon behaviors can be faithfully reproduced on a RTL simulator. In this paper, we propose a novel scheme, named LDet, to determinize a ...
 
Program Regularization in Memory Consistency Verification
Found in: IEEE Transactions on Parallel and Distributed Systems
By Yunji Chen,Lei Li,Tianshi Chen,Ling Li,Lei Wang,Xiaoxue Feng,Weiwu Hu
Issue Date:November 2012
pp. 2163-2174
A widely adopted methodology for verifying the memory subsystem of a Chip Multiprocessor (CMP) is to verify executions of parallel test programs on the CMP against the given memory consistency model, which has been long known to be time consuming in both t...
 
Statistical performance comparisons of computers
Found in: High-Performance Computer Architecture, International Symposium on
By Tianshi Chen,Yunji Chen,Qi Guo,Olivier Temam,Yue Wu,Weiwu Hu
Issue Date:February 2012
pp. 1-12
As a fundamental task in computer architecture research, performance comparison has been continuously hampered by the variability of computer performance. In traditional performance comparisons, the impact of performance variability is usually ignored (i.e...
 
Empirical design bugs prediction for verification
Found in: 2011 Design, Automation & Test in Europe
By Qi Guo, Tianshi Chen, Haihua Shen, Yunji Chen, Yue Wu, Weiwu Hu
Issue Date:March 2011
pp. 1-6
Coverage model is the main technique to evaluate the thoroughness of dynamic verification of a Design-under-Verification (DUV). However, rather than achieving a high coverage, the essential purpose of verification is to expose as many bugs as possible. In ...
   
Linear Time Memory Consistency Verification
Found in: IEEE Transactions on Computers
By Weiwu Hu,Yunji Chen,Tianshi Chen,Cheng Qian,Lei Li
Issue Date:April 2012
pp. 502-516
Verifying the execution of a parallel program against a given memory consistency model (memory consistency verification) is a crucial problem in the functional validation of Chip Multiprocessor (CMP). In the absence of additional information, the above pro...
 
On-the-Fly Reduction of Stimuli for Functional Verification
Found in: Asian Test Symposium
By Qi Guo, Tianshi Chen, Haihua Shen, Yunji Chen, Weiwu Hu
Issue Date:December 2010
pp. 448-454
As a primary method for functional verification of microprocessors, simulation-based verification has received extensive studies over the last decade. Most investigations have been dedicated to the generation of stimuli (test cases), while relatively few h...
 
Design of Low-Cost High-Performance Floating-Point Fused Multiply-Add with Reduced Power
Found in: VLSI Design, International Conference on
By Zichu Qi, Qi Guo, Ge Zhang, Xiangku Li, Weiwu Hu
Issue Date:January 2010
pp. 206-211
This paper presents a floating-point fused multiply-add (FMA) unit with low-cost and low power techniques. To improve the performance, two single-precision operations can be performed concurrently with one double-precision datapath, which is very useful in...
 
An efficient methodology for power modeling and simulation of modern cell-based microprocessors
Found in: Circuits and Systems, Midwest Symposium on
By Ge Zhang, Weiwu Hu
Issue Date:August 2009
pp. 1126-1129
This paper presents a methodology for high-level power modeling of cell-based processors. A flexible power model library, which can automatically generate detailed power data for actual circuits of each part of given processor, is developed and annotated d...
 
Fetching Primary and Redundant Instructions in Turn for a Fault-Tolerant Embedded Microprocessor
Found in: Pacific Rim International Symposium on Dependable Computing, IEEE
By Shijian Zhang, Weiwu Hu
Issue Date:December 2008
pp. 1-8
With the development of semiconductor technology, microprocessors become more and more susceptible to transient faults. Some proposed schemes support redundant execution of a program in a superscalar processor for fault tolerance. However, they require a h...
 
An interconnect-aware power efficient cache coherence protocol for CMPs
Found in: Parallel and Distributed Processing Symposium, International
By Hongbo Zeng, Jun Wang, Ge Zhang, Weiwu Hu
Issue Date:April 2008
pp. 1-11
The continuing shrinking of technology enables more and more processor cores to reside on a single chip. However, the power consumption and delay of global wires have presented a great challenge in designing future chip multi-processors. With these overhea...
 
A High Speed CMOS Transmitter and Rail-to-Rail Receiver
Found in: Electronic Design, Test and Applications, IEEE International Workshop on
By Feng Zhang, Zongren Yang, Wei Feng, Hao Cui, Lingyi Huang, Weiwu Hu
Issue Date:January 2008
pp. 67-70
This paper presents a high-speed low voltage differential signal (LVDS) interface circuit for CPU, LCD, FPGA and other fast links. In the proposed transmitter, a stable reference and a common mode feedback circuit are applied to the LVDS drivers, which e...
 
CREA: A Checkpoint Based Reliable Micro-architecture for Superscalar Processors
Found in: Asian Test Symposium
By Shijian Zhang, Weiwu Hu
Issue Date:October 2007
pp. 313-318
Conventional temporal redundant techniques to detect transient faults have resulted in considerable performance loss. One major reason for this problem is the reclamation of some critical resources, such as the instruction window and physical registers, is...
 
A Comparison of Two Strategies of Dynamic Data Prefetching in Software DSM
Found in: Parallel and Distributed Processing Symposium, International
By Haiming Liu, Weiwu Hu
Issue Date:April 2001
pp. 10062a
A major overhead of software DSM is the long remote access latency when the accessed page is not in the local cache. One method for tolerating the remote access latency is to prefetch the pages before they are accessed. This paper compares two methods of d...
 
Running Real Applications on Software DSMs
Found in: High-Performance Computing in the Asia-Pacific Region, International Conference on
By Weiwu Hu, Fuxin Zhang, Li Ren, Weisong Shi, Zhimin Tang
Issue Date:May 2000
pp. 148
This paper introduces our experiences with some real applications on the home-based software DSM JIAJIA and discusses techniques of parallelizing a sequential program to run on software DSM. It categorizes parallel program segments into five patterns: sing...
 
Adaptive Write Detection in Home-Based Software DSMs
Found in: High-Performance Distributed Computing, International Symposium on
By Weiwu Hu, Weisong Shi, Zhimin Tang
Issue Date:August 1999
pp. 27
Write detection is essential in multiple-writer protocols to identify writes to shared pages so that these writes can be correctly propagated. Software DSMs that implement multiple-writer protocol normally employ the virtual memory page fault to detect wri...
 
Dynamic Task Migration in Home-based Software DSM Systems
Found in: High-Performance Distributed Computing, International Symposium on
By Weisong Shi, Weiwu Hu, Zhimin Tang, M. Rasit Eskicioglu
Issue Date:August 1999
pp. 20
Dynamic task migration is an effective strategy to maximize the performance and resource utilization in meta-computing environments. Traditionally, however, a
 
Reducing System Overheads in Home-based Software DSMs
Found in: Parallel Processing Symposium, International
By Weiwu Hu, Weisong Shi, Zhimin Tang
Issue Date:April 1999
pp. 167
Software DSM systems suffer from the high communication and coherence-induced overheads that limit performance. This paper introduces our efforts in reducing system overheads of a home-based software DSM called JIAJIA. Three measures, including eliminating...
 
Evaluation of the JIAJIA Software DSM System on High Performance Computer Architectures
Found in: Hawaii International Conference on System Sciences
By M. Rasit Eskicioglu, T. Anthony Marsland, Weiwu Hu, Weisong Shi
Issue Date:January 1999
pp. 8012
Distributed Shared Memory (DSM) combines the scalability of loosely coupled multicomputer systems with the ease of usability of tightly coupled multiprocessors, and allows transparent replication and caching of data. DSM has received much attention in the ...
   
A shared virtual memory network with fast remote direct memory access and message passing
Found in: Cluster Computing, IEEE International Conference on
By Gang Shi, Mingchang Hu, Hongda Yin, Weiwu Hu, Zhimin Tang
Issue Date:September 2004
pp. 495
Communication overhead has become one of the bottlenecks of SVM (shared virtual memory). Many methods have been used to improve the performance of SVM. However, these have not delivered the expected improvement. In order to get further utility of communi...
   
Communication with Threads in Software DSMs
Found in: Cluster Computing, IEEE International Conference on
By Weiwu Hu, Gang Shi, Fuxin Zhang
Issue Date:October 2001
pp. 149
Most software DSMs use the interrupt mechanism for asynchronous message arrival notification. Interrupt, however, is expensive and may not be available in user-level communication protocols. This paper studies the performance of software DSMs using threads...
 
Deterministic Replay Using Global Clock
Found in: ACM Transactions on Architecture and Code Optimization (TACO)
By Weiwu Hu, Yunji Chen
Issue Date:April 2013
pp. 1-28
Debugging parallel programs is a well-known difficult problem. A promising method to facilitate debugging parallel programs is using hardware support to achieve deterministic replay on a Chip Multi-Processor (CMP). As a Design-For-Debug (DFD) feature, a pr...
     
Brief announcement: program regularization in verifying memory consistency
Found in: Proceedings of the 23rd ACM symposium on Parallelism in algorithms and architectures (SPAA '11)
By Cheng Qian, Lei Li, Ling Li, Tianshi Chen, Weiwu Hu, Yunji Chen
Issue Date:June 2011
pp. 265-266
Verifying memory consistency, which is to verify the executions of parallel test programs on a multiprocessor system against the given memory consistency model, is NP-hard. To accelerate verifying memory consistency in practice, we devise a technique calle...
     
LReplay: a pending period based deterministic replay scheme
Found in: Proceedings of the 37th annual international symposium on Computer architecture (ISCA '10)
By Ruiyang Wu, Tianshi Chen, Weiwu Hu, Yunji Chen
Issue Date:June 2010
pp. 72-ff
Debugging parallel programs is a well-known difficult problem. A promising method to facilitate debugging parallel programs is using hardware support to achieve deterministic replay. A hardware-assisted deterministic replay scheme should have a small log siz...
     
A multi-FPGA based platform for emulating a 100m-transistor-scale processor with high-speed peripherals (abstract only)
Found in: Proceedings of the 18th annual ACM/SIGDA international symposium on Field programmable gate arrays (FPGA '10)
By Dan Tang, Huandong Wang, Weiwu Hu, Xiang Gao, Yunji Chen
Issue Date:February 2010
pp. 283-283
This paper describes a multi-FPGA based platform for emulating the Loongson-2G microprocessor on different motherboards. This platform was developed for verification and evaluation of the Loongson-2G microprocessor, which is the next generation ...
     