Gemini NI: An Integration of Two Network Interfaces
Found in: Networking, Architecture, and Storage, International Conference on
By Kai Wang, Xiaomin Li, Xuejun An, Ninghui Sun
Issue Date:July 2009
pp. 439-446
According to the development of the TOP500, the performance of the high performance computers (HPCs) is increasing rapidly. The incredible performance increment of the HPCs should be largely attributed to the development of their communication systems, bec...
 
Building a Personal High Performance Computer with Heterogeneous Processors
Found in: Grid and Cloud Computing, International Conference on
By Qiang Li, Zhigang Huo, Ninghui Sun
Issue Date:November 2010
pp. 223-228
Personal high performance computer (PHPC) requires lower cost and high performance. The Teraflops PHPC systems with special accelerator units like GPGPU have been presented, but they have difficulties in programming, compatibility and applicability. In thi...
 
Adding an Expressway to Accelerate the Neighborhood Communication
Found in: High Performance Computing and Communications, 10th IEEE International Conference on
By Kai Wang, Fei Chen, Zheng Cao, Xuejun An, Ninghui Sun
Issue Date:September 2010
pp. 43-48
The blade system is very popular in high performance computing. In a blade system, the blade is a fundamental element in which are symmetric multi-processors (SMP). About ten blades constitute a blade box, several blade boxes constitute a cabinet and some ...
 
EthSpeeder: A High-performance Scalable Fault-Tolerant Ethernet Network Architecture for Data Center
Found in: Networking, Architecture, and Storage, International Conference on
By Dawei Wang, Xian-He Sun, Nongda Hu, Ninghui Sun
Issue Date:July 2011
pp. 355-363
Modern data centers accommodate tens or even hundreds of thousands of servers. The sheer volume of servers in these data centers greatly increases the requirements of the supporting network with regards to scalable bisection bandwidth, network latency, fau...
 
Decentralized NIC-Switching Architecture Using SR-IOV PCI Express Network Device
Found in: IEEE Micro
By Dawei Zang, Zheng Cao, Zhan Wang, Xiaoli Liu, Lin Wang, Ninghui Sun
Issue Date:September 2014
pp. 42-50
To increase the flexibility and bandwidth of intrarack communication, the authors propose a decentralized network-interface-controller (NIC) switching architecture that enables rack-level network bandwidth disaggregation. This is the first solution that us...
 
Vlock: Lock virtualization mechanism for exploiting fine-grained parallelism in graph traversal algorithms
Found in: 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)
By Jie Yan, Guangming Tan, Xiuxia Zhang, Erlin Yao, Ninghui Sun
Issue Date:February 2013
pp. 1-10
For graph traversal applications, fine synchronization is required to exploit massive fine parallelism. However, in the conventional solution using fine-grained locks, locks themselves suffer huge memory cost as well as poor locality for inherent irregular...
 
Micro-architectural characterization of desktop cloud workloads
Found in: 2012 IEEE International Symposium on Workload Characterization (IISWC)
By Tao Jiang, Rui Hou, Lixin Zhang, Ke Zhang, Licheng Chen, Mingyu Chen, Ninghui Sun
Issue Date:November 2012
pp. 131-140
Desktop cloud replaces traditional desktop computers with completely virtualized systems from the cloud. It is becoming one of the fastest growing segments in the cloud computing market. However, as far as we know, there is little work done to understand t...
 
ALWP: A Workload Partition Method for the Efficient Parallel Simulation of Manycores
Found in: 2012 IEEE 14th Int'l Conf. on High Performance Computing and Communication (HPCC) & 2012 IEEE 9th Int'l Conf. on Embedded Software and Systems (ICESS)
By Shuai Jiao, Da Wang, Xiaochun Ye, Weizhi Xu, Hao Zhang, Ninghui Sun
Issue Date:June 2012
pp. 135-142
This paper addresses the workload partition strategies in simulating many-core architectures. The key observation behind this paper is: compared to multicore, manycore features with more non-uniform memory access and unpredictable network traffic; these fe...
 
PartitionSim: A Parallel Simulator for Many-cores
Found in: 2012 IEEE 14th Int'l Conf. on High Performance Computing and Communication (HPCC) & 2012 IEEE 9th Int'l Conf. on Embedded Software and Systems (ICESS)
By Shuai Jiao, Da Wang, Xiaochun Ye, Weizhi Xu, Hao Zhang, Ninghui Sun
Issue Date:June 2012
pp. 119-126
This paper introduces PartitionSim, a parallel simulator for future thousand-core processors. The purpose of PartitionSim is to improve the simulation performance of many-core architectures at the expense of little accuracy sacrifice. To achieve this goal,...
 
A Case Study of Designing Efficient Algorithm-based Fault Tolerant Application for Exascale Parallelism
Found in: 2012 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)
By Erlin Yao, Rui Wang, Mingyu Chen, Guangming Tan, Ninghui Sun
Issue Date:May 2012
pp. 438-448
Fault tolerance overhead of high performance computing (HPC) applications is becoming critical to the efficient utilization of HPC systems at large scale. Today's HPC applications typically tolerate fail-stop failures by checkpointing. However, checkpoin...
 
High Volume Throughput Computing: Identifying and Characterizing Throughput Oriented Workloads in Data Centers
Found in: 2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
By Jianfeng Zhan, Lixin Zhang, Ninghui Sun, Lei Wang, Zhen Jia, Chunjie Luo
Issue Date:May 2012
pp. 1712-1721
For the first time, this paper systematically identifies three categories of throughput oriented workloads in data centers: services, data processing applications, and interactive real-time applications, whose targets are to increase the volume of throughp...
 
Investigating Memory Optimization of Hash-index for Next Generation Sequencing on Multi-core Architecture
Found in: 2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
By Wendi Wang, Wen Tang, Linchuan Li, Guangming Tan, Peiheng Zhang, Ninghui Sun
Issue Date:May 2012
pp. 665-674
Next Generation Sequencing (NGS) is gaining interests due to the increased requirements and the decreased sequencing cost. The important and prerequisite step of most NGS applications is the mapping of short sequences, called reads, to the template referen...
 
Godson-T: An Efficient Many-Core Processor Exploring Thread-Level Parallelism
Found in: IEEE Micro
By Dongrui Fan, Hao Zhang, Da Wang, Xiaochun Ye, Fenglong Song, Guojie Li, Ninghui Sun
Publication Date: April 2012
pp. N/A
Godson-T is a research many-core processor designed for parallel scientific computing. It delivers efficient performance and flexible programmability simultaneously. On the one hand, Godson-T has many features to achieve high efficiency for on-chip resourc...
 
Accelerating Millions of Short Reads Mapping on a Heterogeneous Architecture with FPGA Accelerator
Found in: Field-Programmable Custom Computing Machines, Annual IEEE Symposium on
By Wen Tang, Wendi Wang, Bo Duan, Chunming Zhang, Guangming Tan, Peiheng Zhang, Ninghui Sun
Issue Date:May 2012
pp. 184-187
The explosion of Next Generation Sequencing (NGS) data with over one billion reads per day poses a great challenge to the capability of current computing systems. In this paper, we proposed a CPU-FPGA heterogeneous architecture for accelerating a short rea...
 
Godson-T: An Efficient Many-Core Processor Exploring Thread-Level Parallelism
Found in: IEEE Micro
By Dongrui Fan, Hao Zhang, Da Wang, Xiaochun Ye, Fenglong Song, Guojie Li, Ninghui Sun
Issue Date:March 2012
pp. 38-47
Godson-T is a research many-core processor designed for parallel scientific computing that delivers efficient performance and flexible programmability simultaneously. It also has many features to achieve high efficiency for on-chip resource utilization, su...
 
Characterization of real workloads of web search engines
Found in: IEEE Workload Characterization Symposium
By Huafeng Xi, Jianfeng Zhan, Zhen Jia, Xuehai Hong, Lei Wang, Lixin Zhang, Ninghui Sun, Gang Lu
Issue Date:November 2011
pp. 15-25
Search is the most heavily used web application in the world and is still growing at an extraordinary rate. Understanding the behaviors of web search engines, therefore, is becoming increasingly important to the design and deployment of data center systems...
 
Fast implementation of DGEMM on Fermi GPU
Found in: SC Conference
By Guangming Tan, Linchuan Li, Sean Triechle, Everett Phillips, Yungang Bao, Ninghui Sun
Issue Date:November 2011
pp. 1-11
In this paper we present a thorough experience on tuning double-precision matrix-matrix multiplication (DGEMM) on the Fermi GPU architecture. We choose an optimal algorithm with blocking in both shared memory and registers to satisfy the constraints of th...
 
Fast and Compact Regular Expression Matching Using Character Substitution
Found in: Symposium On Architecture For Networking And Communications Systems
By Xingkui Liu, Xinchun Liu, Ninghui Sun
Issue Date:October 2011
pp. 85-86
Regular expression (RegEx) matching plays an important role in many modern intrusion detection systems (IDS). DFA is an effective way to perform regular expression matching. However, the prohibitive memory requirement makes DFAs impractical for many real ...
 
Optimizing MPI Alltoall Communication of Large Messages in Multicore Clusters
Found in: Parallel and Distributed Computing Applications and Technologies, International Conference on
By Qiang Li,Zhigang Huo,Ninghui Sun
Issue Date:October 2011
pp. 257-262
MPI Alltoall communication is widely used in many high performance computing (HPC) applications. In Alltoall communication, each process sends a distinct message to all other participating processes. In multicore clusters, processes within a node simul...
 
Design of HPC Node with Heterogeneous Processors
Found in: Cluster Computing, IEEE International Conference on
By Zheng Cao,Hongwei Tang,Qiang Li,Bo Li,Fei Chen,Kai Wang,Xuejun An,Ninghui Sun
Issue Date:September 2011
pp. 130-138
Heterogeneous Computing is becoming an important technology trend in HPC, where more and more heterogeneous processors are used. However, in traditional node architecture, heterogeneous processors are always used as coprocessors. Such usage increases the c...
 
Accelerating 2D FFT with Non-Power-of-Two Problem Size on FPGA
Found in: Reconfigurable Computing and FPGAs, International Conference on
By Wendi Wang, Bo Duan, Chunming Zhang, Peiheng Zhang, Ninghui Sun
Issue Date:December 2010
pp. 208-213
The emergence of embedded and multimedia applications, which have a data-centric flavor to them, have great influences on the design methodology of future systems. The 2D FFT is of particular importance to these applications. In this paper, leveraging the r...
 
Integrating DBMSs as a Read-Only Execution Layer into Hadoop
Found in: Parallel and Distributed Computing Applications and Technologies, International Conference on
By Mingyuan An, Yang Wang, Weiping Wang, Ninghui Sun
Issue Date:December 2010
pp. 17-26
To obtain the efficiency of DBMS, HadoopDB combines Hadoop and DBMS, and claims the superiority over Hadoop in terms of performance. However, the approach of HadoopDB is simply putting MapReduce onto unmodified single-machined DBMSs which has several obvi...
 
HPP Controller: A System Controller Dedicated for Message Passing
Found in: Parallel and Distributed Computing Applications and Technologies, International Conference on
By Kai Wang, Fei Chen, Zheng Cao, Xuejun An, Ninghui Sun
Issue Date:December 2010
pp. 261-266
The traditional system controller in symmetric multi-processors (SMP) controls the memory, so it is suitable for the shared memory programming model. With the emergence of the processors which integrate memory controllers, the system controller seems less ...
 
P-GAS: Parallelizing a Cycle-Accurate Event-Driven Many-Core Processor Simulator Using Parallel Discrete Event Simulation
Found in: Parallel and Distributed Simulation, Workshop on
By Huiwei Lv, Yuan Cheng, Lu Bai, Mingyu Chen, Dongrui Fan, Ninghui Sun
Issue Date:May 2010
pp. 1-8
Multi-core processors are commonly available now, but most traditional computer architectural simulators still use single-thread execution. In this paper we use parallel discrete event simulation (PDES) to speed up a cycle-accurate event-driven many-core pr...
 
Adaptive and scalable metadata management to support a trillion files
Found in: SC Conference
By Jing Xing, Jin Xiong, Ninghui Sun, Jie Ma
Issue Date:November 2009
pp. 1-11
Nowadays more and more applications require file systems to efficiently maintain million or more files. How to provide high access performance with such a huge number of files and such large directories is a big challenge for cluster file systems. Limited ...
 
A Virtualized Self-Adaptive Parallel Programming Framework for Heterogeneous High Productivity Computers
Found in: Parallel and Distributed Processing with Applications, International Symposium on
By Hua Cheng, Zuoning Chen, Ninghui Sun, Fenbin Qi, Chaoqun Dong, Laiwang Cheng
Issue Date:August 2009
pp. 543-548
This paper proposed a Virtualized Self-Adaptive Heterogeneous High Productivity Computers Parallel Programming Framework (VAPPF), which is composed of Virtualization-Based Runtime System (VRTS) and Virtualized Adaptive Parallel Programming Model (VAPPM). V...
 
Memory Based Metadata Server for Cluster File Systems
Found in: Grid and Cloud Computing, International Conference on
By Jing Xing, Jin Xiong, Jie Ma, Ninghui Sun
Issue Date:October 2008
pp. 287-291
In high performance computing environment, the metadata servers of distributed file system become critical to impact overall system performance. An approach of memory based metadata server is proposed, instead of the disk based approach. We present a metad...
 
HPP Switch: A Novel High Performance Switch for HPC
Found in: High-Performance Interconnects, Symposium on
By Dawei Wang, Zheng Cao, Xinchun Liu, Ninghui Sun
Issue Date:August 2008
pp. 145-153
The high performance switch plays a critical role in the high performance computer (HPC) system. The applications of HPC not only demand on the low latency and high bandwidth of the switch, but also need the effective support of collective communication, s...
 
Improving Performance of Dynamic Programming via Parallelism and Locality on Multicore Architectures
Found in: IEEE Transactions on Parallel and Distributed Systems
By Guangming Tan, Ninghui Sun, Guang R. Gao
Issue Date:February 2009
pp. 261-274
Dynamic programming (DP) is a popular technique which is used to solve combinatorial search and optimization problems. This paper focuses on one type of DP, which is called nonserial polyadic dynamic programming (NPDP). Owing to the nonuniform data depende...
 
A layered design methodology of cluster system stack
Found in: Cluster Computing, IEEE International Conference on
By Jianfeng Zhan, Lei Wang, Bibo Tu, Zhihong Zhang, Yu Wen, Yuansheng Chen, Wei Zhou, Dan Meng, Ninghui Sun
Issue Date:September 2007
pp. 404-409
The application range of cluster has expanded beyond scientific computing, but the present cluster system software fails to provide a flexible architecture to promote code reuse and facilitate building cluster system software for different computing contex...
 
Design of NIC Based on I/O Processor for Cluster Interconnect Network
Found in: Networking, Architecture, and Storage, International Conference on
By Xiaojun Yang, Dongdong Wu, Ninghui Sun
Issue Date:July 2007
pp. 3-8
An effective interconnect network interface card (NIC) is critical to the achievement of a high-performance cluster system. An original NIC architecture based on the Intel IOP310 I/O processor chipset is presented in this paper. The NIC is a part of DCNet,...
 
United-FS: A Logical File System Providing a Single Image of Multiple Physical File Systems on NFS Server
Found in: Parallel and Distributed Processing Symposium, International
By Huan Chen, Yi Zhao, Jin Xiong, Jie Ma, Ninghui Sun
Issue Date:March 2007
pp. 368
NFS is considered to be the bottleneck in cluster computing environment because of its limited resources and centralized data management. With the development of hardware, NFS server has more than one I/O channel, more storage space and more powerful CPU. ...
 
Locality and Parallelism Optimization for Dynamic Programming Algorithm in Bioinformatics
Found in: SC Conference
By Guangming Tan, Shengzhong Feng, Ninghui Sun
Issue Date:November 2006
pp. 41
Dynamic programming has been one of the most efficient approaches to sequence analysis and structure prediction in biology. However, their performance is limited due to the drastic increase in both the number of biological data and variety of the computer ...
 
Research on Key Technologies of Load Balancing for NFS Server with Multiple Network Paths
Found in: Grid and Cooperative Computing Workshops, International Conference on
By Huan Chen, Rongfeng Tang, Yi Zhao, Jin Xiong, Jie Ma, Ninghui Sun
Issue Date:October 2006
pp. 407-411
NFS server is designed to run on a single node. Even if NFS server is configured with multiple network interfaces, each client can only access NFS server through one network interface of the server. In this paper, we design and implement the Multi-path loa...
 
PhoenixG: A Unified Management Framework for Industrial Information Grid
Found in: Cluster Computing and the Grid, IEEE International Symposium on
By Jianfeng Zhan, Gengpu Liu, Lei Wang, Bibo Tu, Yi Jin, Yang Li, Yan Hao, Xuehai Hong, Dan Meng, Ninghui Sun
Issue Date:May 2006
pp. 489-496
The Industrial Information Grid is a special kind of system, the users of which exclusively own geographically distributed computing resources for business service, and try to maintain the lowest total cost of ownership while guaranteeing quality of servic...
 
Improving locality of nonserial polyadic dynamic programming
Found in: Parallel and Distributed Processing Symposium, International
By Guangming Tan, Ninghui Sun, Dongbo Bu
Issue Date:April 2006
pp. 461
Dynamic programming (DP) is a commonly used technique for solving a wide variety of discrete optimization problems, which have different variants of dynamic programming formulation. This paper investigated one important DP formulation, which is called nonseri...
 
An experimental study of optimizing bioinformatics applications
Found in: Parallel and Distributed Processing Symposium, International
By Guangming Tan, Lin Xu, Shengzhong Feng, Ninghui Sun
Issue Date:April 2006
pp. 284
As bioinformatics is an emerging application of high performance computing, this paper first evaluates the memory performance of several representative bioinformatics applications so that some appropriate optimization methods can be applied. Based on the c...
 
High Performance Sockets over Kernel Level Virtual Interface Architecture
Found in: High Performance Computing and Grid in Asia Pacific Region, International Conference on
By Zhigang Huo, Yansong Yu, Ninghui Sun
Issue Date:December 2005
pp. 220-226
The Sockets application programming interface is the de facto standard in network programming. Sockets emulation over high performance networks has been pursued by many researchers. Most projects in this area favor user level communication, but this appro...
 
Parallel Multiple Sequences Alignment in SMP Cluster
Found in: High Performance Computing and Grid in Asia Pacific Region, International Conference on
By Guangming Tan, Shengzhong Feng, Ninghui Sun
Issue Date:December 2005
pp. 426-431
Multiple sequences alignment is a fundamental and challenging problem in computational molecular biology. It is commonly used to analyse the DNA/protein sequences. To develop a high efficient parallel algorithm is a very important solution to speed up this ...
 
A Storage Space Management Policy for a Cluster File System
Found in: High Performance Computing and Grid in Asia Pacific Region, International Conference on
By Jin Xiong, Rongfeng Tang, Zhihua Fan, Hui Li, Jie Ma, Dan Meng, Ninghui Sun
Issue Date:December 2005
pp. 240-248
The major challenge in designing cluster file systems is to provide high aggregate I/O bandwidth and high metadata processing throughput for applications running on large-scale cluster systems. And with the rapid increase of required data storage, how to i...
 
Fire Phoenix Cluster Operating System Kernel and its Evaluation
Found in: Cluster Computing, IEEE International Conference on
By Jianfeng Zhan, Ninghui Sun
Issue Date:September 2005
pp. 1-9
Fire Phoenix cluster operating system kernel (Phoenix kernel) is a minimum set of cluster core functions with scalability and fault-tolerance support. In this paper, we define components of cluster operating system kernel, and introduce its internal mechan...
 
An Efficient Metadata Distribution Policy for Cluster File Systems
Found in: Cluster Computing, IEEE International Conference on
By Jin Xiong, Rongfeng Tang, Sining Wu, Dan Meng, Ninghui Sun
Issue Date:September 2005
pp. 1-10
How to distribute the items in the file system hierarchy across a group of metadata servers is an important issue that determines the holistic metadata processing performance (HMPP) of a cluster file system which manages its metadata by a group of metadata...
 
Load Balancing Algorithm in Cluster-based RNA secondary structure Prediction
Found in: Parallel and Distributed Computing, International Symposium on
By Guangming Tan, Shengzhong Feng, Ninghui Sun
Issue Date:July 2005
pp. 91-96
RNA secondary structure prediction remains one of the most compelling, yet elusive areas of computational biology. Many computational methods have been proposed in an attempt to predict RNA secondary structures. A popular dynamic programming (DP) algorithm...
 
An Optimized Algorithm of High Spatial-temporal Efficiency for MegaBlast
Found in: Parallel and Distributed Systems, International Conference on
By Guangming Tan, Lin Xu, Yishan Jiao, Shengzhong Feng, Dongbo Bu, Ninghui Sun
Issue Date:July 2005
pp. 704-708
BLAST (Basic local alignment search tool), as a heuristic algorithm, is one of the most widely used sequence similarity search tools. MegaBlast, as an improved version of BLAST, speeds up the searches and improves the total throughput owing to gre...
 
Impact of Page Size on Communication Performance
Found in: Parallel and Distributed Processing Symposium, International
By Xiaocheng Zhou, Zhigang Huo, Ninghui Sun, Yingchao Zhou
Issue Date:April 2005
pp. 212b
In this paper, the impact of page size on the communication performance is studied. In the interconnection communication of cluster system, the address translation table (ATT), which is located in the memory of the network interface card (NIC) and can in a...
 
Destructive Transaction: Human-Oriented Cluster System Management Mechanism
Found in: Parallel and Distributed Processing Symposium, International
By Taoying Liu, Zhiwei Xu, Ninghui Sun, Dan Meng
Issue Date:April 2005
pp. 298b
Traditional cluster system management tools seldom consider the relevance between managed objects. Such relevance is the reason of related fault and may also lead to human operation errors. Because of this defect, traditional tools do not have the capabili...
 
Design and Performance of the Dawning Cluster File System
Found in: Cluster Computing, IEEE International Conference on
By Jin Xiong, Sining Wu, Dan Meng, Ninghui Sun, Guojie Li
Issue Date:December 2003
pp. 232
Cluster file system is a key component of system software of clusters. It attracts more and more attention in recent years. In this paper, we introduce the design and implementation of DCFS 1 (the Dawning Cluster File System) — a cluster file system develo...
 
Cluster and Grid Superservers: The Dawning Experiences in China
Found in: Cluster Computing, IEEE International Conference on
By Zhiwei Xu, Ninghui Sun, Dan Meng, Wei Li
Issue Date:October 2001
pp. 351
This paper summarizes recent activities at Institute of Computing Technology, Chinese Academy of Sciences, in developing superservers for cluster and grid computing. We first identify market and technical trends observed from a Chinese perspective. Then we...
 
DPVM: PVM for Dawning Cluster Systems
Found in: High-Performance Computing in the Asia-Pacific Region, International Conference on
By Xingfu Wu, Ninghui Sun
Issue Date:May 2000
pp. 88
This paper depicts the design, implementation and performance of DPVM, which is a port of the PVM (Parallel Virtual Machine) to the Dawning cluster systems, through some design algorithms and experimental results. DPVM is derived from the original PVM 3.3....
 
Group-by Query Process in Middleware of Large Scale Data Intensive Systems
Found in: Networking, Architecture, and Storage, International Conference on
By Huaiming Song, Mingyuan An, Yang Wang, Weiping Wang, Ninghui Sun
Issue Date:July 2009
pp. 82-89
Large scale data intensive systems are available in many fields in recent years, and it’s a severe challenge for group-by query of large volume of data in a cluster based on shared-nothing architecture. This paper proposes a design of a parallel query engi...
 