Predict-More Router: A Low Latency NoC Router with More Route Predictions
Found in: 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW)
By Yuan He, Hiroshi Sasaki, Shinobu Miwa, Hiroshi Nakamura
Issue Date:May 2013
pp. 842-850
Network-on-Chip (NoC) is a critical part of the memory hierarchy of emerging multicores. Lowering its communication latency while preserving its bandwidth is key to achieving high system performance. By now, one of the most effective methods helps achievin...
 
Cooperative shared resource access control for low-power chip multiprocessors
Found in: Low Power Electronics and Design, International Symposium on
By Noriko Takagi, Hiroshi Sasaki, Masaaki Kondo, Hiroshi Nakamura
Issue Date:August 2009
pp. 177-182
In a single-chip multiprocessor (CMP), the last-level cache and its lower memory hierarchy components are typically shared by multiple processors. Conflicts in these resources lead to poor overall performance of the CMP and/or unpredictable performance of ...
 
MegaProto: 1 TFlops/10kW Rack Is Feasible Even with Only Commodity Technology
Found in: SC Conference
By Hiroshi Nakashima, Hiroshi Nakamura, Mitsuhisa Sato, Taisuke Boku, Satoshi Matsuoka, Daisuke Takahashi, Yoshihiko Hotta
Issue Date:November 2005
pp. 28
In our research project
 
MegaProto: A Low-Power and Compact Cluster for High-Performance Computing
Found in: Parallel and Distributed Processing Symposium, International
By Hiroshi Nakashima, Hiroshi Nakamura, Mitsuhisa Sato, Taisuke Boku, Satoshi Matsuoka, Daisuke Takahashi, Yoshihiko Hotta
Issue Date:April 2005
pp. 231b
No summary available.
 
Control Signal Sharing Using Data-Path Delay Information at Control Data Flow Graph Descriptions
Found in: Asynchronous Circuits and Systems, International Symposium on
By Hiroshi Saito, Euiseok Kim, Nattha Sretasereekul, Masashi Imai, Hiroshi Nakamura, Takashi Nanya
Issue Date:May 2003
pp. 184
Due to the state explosion problem, signal transition graph based asynchronous circuit synthesis cannot handle large specifications. To overcome this problem, we propose two control signal sharing methods by using the delay information of data-path circuit. Si...
 
McRouter: Multicast within a router for high performance network-on-chips
Found in: 2013 22nd International Conference on Parallel Architectures and Compilation Techniques (PACT)
By Yuan He, Hiroshi Sasaki, Shinobu Miwa, Hiroshi Nakamura
Issue Date:September 2013
pp. 319-329
The inevitable advent of the multi-core era has driven an increasing demand for low latency on-chip inter-connection networks (or NoCs). Being a critical part of the memory hierarchy for modern chip multi-processors (CMPs), these networks face stringent de...
   
Fast Abstracts
Found in: Dependable Systems and Networks, International Conference on
By Hiroshi Nakamura
Issue Date:June 2007
pp. 812
Fast Abstracts are brief two-page presentations, either on new ideas, opinion pieces, or a project update. They cover a wide variety of issues within the field of dependable systems and networks. They are also designed to offer an opportunity for late-breaki...
   
A Scalable 3D Heterogeneous Multicore with an Inductive ThruChip Interface
Found in: IEEE Micro
By Noriyuki Miura, Yusuke Koizumi, Yasuhiro Take, Hiroki Matsutani, Tadahiro Kuroda, Hideharu Amano, Ryuichi Sakamoto, Mitaro Namiki, Kimiyoshi Usami, Masaaki Kondo, Hiroshi Nakamura
Issue Date:November 2013
pp. 6-15
The authors developed a scalable heterogeneous multicore processor. 3D heterogeneous chip stacking of a general-purpose CPU and reconfigurable multicore accelerators enables various trade-offs between performance and energy consumption. The stacked chips i...
 
Integrating Multi-GPU Execution in an OpenACC Compiler
Found in: 2013 42nd International Conference on Parallel Processing (ICPP)
By Toshiya Komoda, Shinobu Miwa, Hiroshi Nakamura, Naoya Maruyama
Issue Date:October 2013
pp. 260-269
GPUs have become promising computing devices in current and future computer systems due to their high performance, high energy efficiency, and low price. However, lack of high level GPU programming models hinders the wide spread of GPU applications. To resol...
 
Communication Library to Overlap Computation and Communication for OpenCL Application
Found in: 2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
By Toshiya Komoda, Shinobu Miwa, Hiroshi Nakamura
Issue Date:May 2012
pp. 567-573
User-friendly parallel programming environments, such as CUDA and OpenCL are widely used for accelerators. They provide programmers with useful APIs, but the APIs are still low level primitives. Therefore, in order to apply communication optimization techn...
 
Cool Mega-Arrays: Ultralow-Power Reconfigurable Accelerator Chips
Found in: IEEE Micro
By Nobuaki Ozaki, Yoshihiro Yasuda, Yoshiki Saito, Daisuke Ikebuchi, Masayuki Kimura, Hideharu Amano, Hiroshi Nakamura, Kimiyoshi Usami, Mitaro Namiki, Masaaki Kondo
Issue Date:November 2011
pp. 6-18
Cool Mega-Array (CMA) is an energy-efficient reconfigurable accelerator for battery-driven mobile devices. It has a large processing-element array without memory elements for mapping an application's data-flow graph, a simple programmable microcontroller f...
 
SLD-1(Silent Large Datapath): A ultra low power reconfigurable accelerator
Found in: IEEE Cool Chips
By Nobuaki Ozaki, Kimiyoshi Usami, Hideharu Amano, Mitaro Namiki, Hiroshi Nakamura, Masaaki Kondo
Issue Date:April 2011
pp. 1-3
SLD(Silent Large Datapath)-1 is a prototype accelerator for media processing consisting of a large Processing Element (PE) array which includes 24bit 8 × 8 PEs with combinatorial circuits and a small micro-controller for data memory access. It was fabricat...
 
Power Reduction Scheme of Fans in a Blade System by Considering the Imbalance of CPU Temperatures
Found in: IEEE-ACM International Conference on Green Computing and Communications and International Conference on Cyber, Physical and Social Computing
By Yuetsu Kodama, Satoshi Itoh, Toshiyuki Shimizu, Satoshi Sekiguchi, Hiroshi Nakamura, Naohiko Mori
Issue Date:December 2010
pp. 81-87
In order to develop a data center power efficiency index, we built a test bed of a data center and measured power components and environmental variables in some detail, including the power consumption and temperature of each node, rack and air conditioning...
 
Ultra Fine-Grained Run-Time Power Gating of On-chip Routers for CMPs
Found in: Networks-on-Chip, International Symposium on
By Hiroki Matsutani, Michihiro Koibuchi, Daisuke Ikebuchi, Kimiyoshi Usami, Hiroshi Nakamura, Hideharu Amano
Issue Date:May 2010
pp. 61-68
This paper proposes an ultra fine-grained run-time power gating of on-chip router, in which power supply to each router component (e.g., VC queue, crossbar MUX, and output latch) can be individually controlled in response to the applied workload. As only th...
 
Design and Implementation of Fine-Grain Power Gating with Ground Bounce Suppression
Found in: VLSI Design, International Conference on
By Kimiyoshi Usami, Toshiaki Shirai, Tasunori Hashida, Hiroki Masuda, Seidai Takeda, Mitsutaka Nakata, Naomi Seki, Hideharu Amano, Mitaro Namiki, Masashi Imai, Masaaki Kondo, Hiroshi Nakamura
Issue Date:January 2009
pp. 381-386
This paper describes a design and implementation methodology for fine-grain power gating. Since sleep-in and wakeup are controlled in a fine granularity in run time, shortening the transition time between the sleep and active states is strongly required. I...
 
Detecting Inconsistent Values Caused by Interaction Faults Using Automatically Located Implicit Redundancies
Found in: Pacific Rim International Symposium on Dependable Computing, IEEE
By Bogdan Tomoyuki Nassu, Takashi Nanya, Hiroshi Nakamura
Issue Date:December 2008
pp. 138-145
This paper addresses the problem of detecting inconsistent values caused by interaction faults originated from an external system. This type of error occurs when a correctly formatted message that is not corrupted during transmission is generated with a fie...
 
Discovering Implicit Redundancies in Network Communications for Detecting Inconsistent Values
Found in: Data Mining Workshops, International Conference on
By Bogdan Tomoyuki Nassu, Takashi Nanya, Hiroshi Nakamura
Issue Date:December 2008
pp. 144-153
Detecting inconsistent values received in a communication is a challenging problem faced in networked systems. Inconsistent values occur when a message contains incorrect data, even though the syntax is correct and there is no corruption due to transmissio...
 
Design and Power Performance Evaluation of On-Chip Memory Processor with Arithmetic Accelerators
Found in: Innovative Architecture for Future Generation High-Performance Processors and Systems, International Workshop on
By Chikafumi Takahashi, Mitsuhisa Sato, Daisuke Takahashi, Taisuke Boku, Akira Ukawa, Hiroshi Nakamura, Hidetaka Aoki, Hideo Sawamoto, Naonobu Sukegawa
Issue Date:January 2008
pp. 51-57
In this paper, we design an on-chip memory processor with arithmetic accelerators, which are expected to improve power consumption. In addition, we evaluate the power performance of the processor. We propose implementing vector-type arithmetic accelerators...
 
A Proposal of New Dependable Database Middleware with Consistency and Concurrency Control
Found in: Pacific Rim International Symposium on Dependable Computing, IEEE
By Takeshi Mishima, Hiroshi Nakamura
Issue Date:December 2007
pp. 334-337
We propose a new dependable database middleware that can synchronize off-the-shelf database servers for consistency and execute write queries concurrently for high throughput. Our proposal also helps to realize low cost system since both existing servers a...
 
A High Performance Cluster System Design by Adaptive Power Control
Found in: Parallel and Distributed Processing Symposium, International
By Masaaki Kondo, Yoshimichi Ikeda, Hiroshi Nakamura
Issue Date:March 2007
pp. 345
The first order design constraint in dense packaged clusters is power consumption. The currently developed cluster systems are conservatively designed so that the expected peak power does not exceed the power limit. However, practical power consumption sel...
 
A Small, Fast and Low-Power Register File by Bit-Partitioning
Found in: High-Performance Computer Architecture, International Symposium on
By Masaaki Kondo, Hiroshi Nakamura
Issue Date:February 2005
pp. 40-49
A large multi-ported register file is indispensable for exploiting instruction level parallelism (ILP) in today's dynamically scheduled superscalar processors. The number of ports and the size of the register file must be enlarged as the issue width and in...
 
Skewed Checkpointing for Tolerating Multi-Node Failures
Found in: Reliable Distributed Systems, IEEE Symposium on
By Hiroshi Nakamura, Takuro Hayashida, Masaaki Kondo, Yuya Tajima, Masashi Imai, Takashi Nanya
Issue Date:October 2004
pp. 116-125
Large cluster systems have become widely utilized because they achieve a good performance/cost ratio especially in high performance computing. Although these cluster systems are distributed memory systems, coordinated check-pointing is a promising way to m...
 
A Method to Verify Originality of Sequences Secretly on Distributed Computing Environment
Found in: High Performance Computing and Grid in Asia Pacific Region, International Conference on
By Ken-ichi Kurata, Hiroshi Nakamura, Vincent Breton
Issue Date:July 2004
pp. 310-319
In the field of molecular biology, it is important to find gene sequences related to some phenomena, such as disease and chemical reaction. Once a target gene has been sequenced, it must be confirmed whether the sequence is already known or not in the worl...
 
Data Movement Optimization for Software-Controlled On-Chip Memory
Found in: Interaction between Compilers and Computer Architecture, Annual Workshop on
By Motonobu Fujita, Masaaki Kondo, Hiroshi Nakamura
Issue Date:February 2004
pp. 120-127
In order to overcome performance degradation caused by performance disparity between processor and main memory, there have been proposed several new VLSI architectures which have software controlled on-chip memory in addition to the conventional c...
 
The Standard SpecC Language
Found in: System Synthesis, International Symposium on
By Hiroshi Nakamura, Masahiro Fujita
Issue Date:October 2001
pp. 81-86
This paper introduces SpecC language, a system level description language based on C, and its consortium, SpecC Technology Open Consortium (STOC). Currently SpecC language version 1.0 is publicly available. SpecC technology covers SpecC-based design
 
Performance Evaluation of Cascade ALU Architecture for Asynchronous Super-Scalar Processors
Found in: Asynchronous Circuits and Systems, International Symposium on
By Motokazu Ozawa, Masashi Imai, Hiroshi Nakamura, Takashi Nanya, Yoichiro Ueno
Issue Date:March 2001
pp. 162
Current out-of-order architectures have the critical path in the memory structure. Since the memory access delay mainly consists of wire delays, the feature size reduction will make little contribution on the critical path reduction. Therefore, the perform...
 
SCIMA: Software Controlled Integrated Memory Architecture for High Performance Computing
Found in: Computer Design, International Conference on
By Masaaki Kondo, Hideki Okawara, Hiroshi Nakamura, Taisuke Boku
Issue Date:September 2000
pp. 105
Processor performance has been improved due to clock acceleration and ILP extraction techniques. Performance of main memory, however, has not been improved so much. The performance gap between processor and memory will be growing further in the future. Thi...
 
SCIMA: A Novel Architecture for High Performance Computing
Found in: Innovative Architecture for Future Generation High-Performance Processors and Systems, International Workshop on
By Hiroshi Nakamura, Hideki Okawara, Shuichi Sakai, Taisuke Boku, Masaaki Kondo
Issue Date:November 1999
pp. 45
Technological trends have brought the growing disparity between processor and memory speeds. This memory wall problem is becoming very serious especially in high performance computing. In this paper, we propose a new architecture SCIMA for solving this pro...
 
Augmenting Loop Tiling with Data Alignment for Improved Cache Performance
Found in: IEEE Transactions on Computers
By Preeti Ranjan Panda, Hiroshi Nakamura, Nikil D. Dutt, Alexandru Nicolau
Issue Date:February 1999
pp. 142-149
Loop blocking (tiling) is a well-known compiler optimization that helps improve cache performance by dividing the loop iteration space into smaller blocks (tiles); reuse of array elements within each tile is maximized b...
 
The Architecture of Massively Parallel Processor CP-PACS
Found in: Parallel Algorithms / Architecture Synthesis, AIZU International Symposium on
By Taisuke Boku, Yoichi Iwasaki, Hiroshi Nakamura, Kisaburo Nakazawa
Issue Date:March 1997
pp. 31
CP-PACS (Computational Physics by Parallel Array Computer System) is a massively parallel processor with 2048 Processing Units built at Center for Computational Physics, University of Tsukuba. The node processor of CP-PACS is a RISC microprocessor enhanced...
 
A Scalable 3D Heterogeneous Multi-Core Processor with Inductive-Coupling ThruChip Interface
Found in: IEEE Micro
By Yusuke Koizumi, Noriyuki Miura, Eiichi Sasaki, Yasuhiro Take, Hiroki Matsutani, Tadahiro Kuroda, Hideharu Amano, Ryuichi Sakamoto, Mitaro Namiki, Kimiyoshi Usami, Masaaki Kondo, Hiroshi Nakamura
Issue Date:December 2013
pp. 1
A scalable heterogeneous multi-core processor is developed. 3D heterogeneous chip stacking of a general-purpose CPU and reconfigurable multicore accelerators enables various trade-offs between performance and energy consumption. The stacked chips interconne...
 
Scalability-based manycore partitioning
Found in: Proceedings of the 21st international conference on Parallel architectures and compilation techniques (PACT '12)
By Hiroshi Nakamura, Hiroshi Sasaki, Koji Inoue, Teruo Tanimoto
Issue Date:September 2012
pp. 107-116
Multicore processors have been popular for years, and the industry is gradually shifting towards the era of manycore processors. Single-thread performance of microprocessors is not growing at a historical rate, but the existence of a number of active proce...
     
Cooperative shared resource access control for low-power chip multiprocessors
Found in: Proceedings of the 14th ACM/IEEE international symposium on Low power electronics and design (ISLPED '09)
By Hiroshi Nakamura, Hiroshi Sasaki, Masaaki Kondo, Noriko Takagi
Issue Date:August 2009
pp. 1-2
In a single-chip multiprocessor (CMP), the last-level cache and its lower memory hierarchy components are typically shared by multiple processors. Conflicts in these resources lead to poor overall performance of the CMP and/or unpredictable performance of ...
     
An intra-task DVFS technique based on statistical analysis of hardware events
Found in: Proceedings of the 4th international conference on Computing frontiers (CF '07)
By Hiroshi Nakamura, Hiroshi Sasaki, Masaaki Kondo, Yoshimichi Ikeda
Issue Date:May 2007
pp. 123-130
The importance and demand for various types of optimization techniques for program execution is growing rapidly. In particular, dynamic optimization techniques are regarded as important. Although conventional techniques usually generated an execution model...
     
Energy-efficient dynamic instruction scheduling logic through instruction grouping
Found in: Proceedings of the 2006 international symposium on Low power electronics and design (ISLPED '06)
By Hiroshi Nakamura, Hiroshi Sasaki, Masaaki Kondo
Issue Date:October 2006
pp. 43-48
Dynamic instruction scheduling logic is quite complex and dissipates significant energy in microprocessors that support superscalar and out-of-order execution. We propose a novel microarchitectural technique to reduce the complexity and energy consumption ...
     
Formal Verification of a Pipelined Processor with New Memory
Found in: Pacific Rim International Symposium on Dependable Computing, IEEE
By Hiroshi Nakamura, Takanori Arai, Masahiro Fujita
Issue Date:December 2002
pp. 321
Recently, model checkers have become commercially available. To investigate their ability, Solidify is selected as the representative of them and applied to a verification of a new processor. The processor adopts new memory hierarchy and new instructions. ...
 
Interactive presentation: Task scheduling under performance constraints for reducing the energy consumption of the GALS multi-processor SoC
Found in: Proceedings of the conference on Design, automation and test in Europe (DATE '07)
By Hiroshi Nakamura, Masaaki Kondo, Masashi Imai, Ryo Watanabe, Takashi Nanya
Issue Date:April 2007
pp. 797-802
The present paper focuses on applications that are periodic and have both latency and throughput constraints. For these applications, pipeline scheduling is effective for reducing energy consumption. Thus, the present paper proposes a pipelined task schedu...
     
SCIMA-SMP: on-chip memory processor architecture for SMP
Found in: Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture (WMPI '04)
By Chikafumi Takahashi, Daisuke Takahashi, Hiroshi Nakamura, Masaaki Kondo, Mitsuhisa Sato, Taisuke Boku
Issue Date:June 2004
pp. 121-128
In this paper, we propose a processor architecture with programmable on-chip memory for a high-performance SMP (symmetric multi-processor) node named SCIMA-SMP (Software Controlled Integrated Memory Architecture for SMP) with the intent of solving the perf...
     
The standard SpecC language
Found in: Proceedings of the 14th international symposium on Systems synthesis (ISSS '01)
By Hiroshi Nakamura, Masahiro Fujita
Issue Date:September 2001
pp. 81-86
This paper introduces SpecC language, a system level description language based on C, and its consortium, SpecC Technology Open Consortium (STOC). Currently SpecC language version 1.0 is publicly available. SpecC technology covers SpecC-based design "metho...
     
CP-PACS: a massively parallel processor for large scale scientific calculations
Found in: Proceedings of the 11th international conference on Supercomputing (ICS '97)
By Hiroshi Nakamura, Ken'ichi Itakura, Kisaburo Nakazawa, Taisuke Boku
Issue Date:July 1997
pp. 108-115
To minimize the amount of computation and storage for parallel sparse factorization, sparse matrices have to be reordered prior to factorization. We show that none of the popular ordering heuristics proposed before, namely, multiple minimum degree and nest...
     
A scalar architecture for pseudo vector processing based on slide-windowed registers
Found in: Proceedings of the 7th international conference on Supercomputing (ICS '93)
By Hideo Wada, Hiromitsu Imori, Hiroshi Nakamura, Ikuo Nakata, Kisaburo Nakazawa, Taisuke Boku, Yasuhiro Inagami, Yoshiyuki Yamashita
Issue Date:July 1993
pp. 298-307
In this paper, we present a new scalar architecture for high-speed vector processing. Without using cache memory, the proposed architecture tolerates main memory access latency by introducing slide-windowed floating-point registers with data preloading fea...
     