Preliminary results in accelerating profile HMM search on FPGAs
Found in: Parallel and Distributed Processing Symposium, International
By Arpith C. Jacob, Joseph M. Lancaster, Jeremy D. Buhler, Roger D. Chamberlain
Issue Date:March 2007
pp. 257
Comparison between biosequences and probabilistic models is an increasingly important part of modern DNA and protein sequence analysis. The large and growing number of such models in today's databases demands computational approaches to searching these dat...
 
Fair Scheduling in an Optical Interconnection Network
Found in: Modeling, Analysis, and Simulation of Computer Systems, International Symposium on
By Ch'ng Shi Baw, Roger D. Chamberlain, Mark A. Franklin
Issue Date:March 1999
pp. 56
Existing fair scheduling schemes have focused primarily on scheduling multiple flows to a single output. The limited work that has focused on scheduling multiple flows to multiple outputs has assumed a non-blocking, slotted-time, packet-based network with ...
 
Hierarchical Discrete-Event Simulation on Hypercube Architectures
Found in: IEEE Micro
By Roger D. Chamberlain, Mark A. Franklin
Issue Date:July 1990
pp. 10-20
The simulation of systems that include components at varying levels of abstraction is addressed. A performance model of a hierarchical discrete-event simulation algorithm running on a hypercube architecture is presented. The model allows the perfo...
 
Optically Interconnected Multicomputers Using Inverted-Graph Topologies
Found in: IEEE Micro
By Roger D. Chamberlain, Robert R. Krchnavek
Issue Date:April 1995
pp. 59-69
Optical technology has made significant contributions to the state-of-the-art for long distance communications, including high reliability, low interference, security benefits, and (most important) very high bandwidth. However, the technology has yet to be...
 
Bloom Filter Performance on Graphics Engines
Found in: Parallel Processing, International Conference on
By Lin Ma, Roger D. Chamberlain, Jeremy D. Buhler, Mark A. Franklin
Issue Date:September 2011
pp. 522-531
Bloom filters are a probabilistic technique for large-scale set membership tests. They exhibit no false negative test results but are susceptible to false positive results. They are well-suited to both large sets and large numbers of membership tests. We i...
 
Crossing Boundaries in TimeTrial: Monitoring Communications across Architecturally Diverse Computing Platforms
Found in: Embedded and Ubiquitous Computing, IEEE/IFIP International Conference on
By Joseph M. Lancaster, Joseph G. Wingbermuehle, Jonathan C. Beard, Roger D. Chamberlain
Issue Date:October 2011
pp. 280-287
TimeTrial is a low-impact performance monitor that supports streaming data applications deployed on a variety of architecturally diverse computational platforms, including multicore processors and field-programmable gate arrays. Communication between reso...
 
ScalaPipe: A Streaming Application Generator
Found in: Field-Programmable Custom Computing Machines, Annual IEEE Symposium on
By Joseph G. Wingbermuehle, Roger D. Chamberlain, Ron K. Cytron
Issue Date:May 2012
pp. 244
ScalaPipe is a streaming application generator for heterogeneous platforms. By using a collection of domain-specific languages (DSLs) embedded in the Scala programming language, ScalaPipe allows creation of streaming applications that can run on a variety ...
 
Auto-Pipe: Streaming Applications on Architecturally Diverse Systems
Found in: Computer
By Roger D. Chamberlain, Mark A. Franklin, Eric J. Tyson, James H. Buckley, Jeremy Buhler, Greg Galloway, Saurabh Gayen, Michael Hall, E.F. Berkley Shands, Naveen Singla
Issue Date:March 2010
pp. 42-49
No summary available.
 
Asking for Performance: Exploiting Developer Intuition to Guide Instrumentation with TimeTrial
Found in: High Performance Computing and Communications, 10th IEEE International Conference on
By Joseph M. Lancaster, Joseph G. Wingbermuehle, Roger D. Chamberlain
Issue Date:September 2011
pp. 321-330
Architecturally-diverse systems (containing co-processors such as reconfigurable logic and graphics engines) have received significant attention recently in the high performance computing community. They are new enough, however, that application developmen...
 
Understanding the performance of streaming applications deployed on hybrid systems
Found in: Parallel and Distributed Processing Symposium, International
By Joseph Lancaster, Ron Cytron, Roger D. Chamberlain
Issue Date:April 2008
pp. 1-5
Significant performance gains have been reported by exploiting the specialized characteristics of hybrid computing architectures for a number of streaming applications. While it is straightforward to physically construct these hybrid systems, application d...
 
Design of an Interconnection Network Using VLSI Photonics and Free-Space Optical Technologies
Found in: Parallel Interconnects, International Conference on
By Ch'ng Shi Baw, Roger D. Chamberlain, Mark A. Franklin
Issue Date:October 1999
pp. 52
This paper presents the design and initial analysis of an optically interconnected multiprocessor based on the use of VCSELs (Vertical Cavity Surface Emitting Laser) and free-space optical interconnects. The design is oriented to applications where the per...
 
Optimization of Application-Specific Memories
Found in: IEEE Computer Architecture Letters
By Joseph G. Wingbermuehle, Ron K. Cytron, Roger D. Chamberlain
Issue Date:January 2014
pp. 1-1
Memory access times are the primary bottleneck for many applications today. This “memory wall” is due to the performance disparity between processor cores and main memory. To address the performance gap, we propose the use of custom memory subsystems tailo...
 
Analysis of a Simple Approach to Modeling Performance for Streaming Data Applications
Found in: 2013 IEEE 21st International Symposium on Modelling, Analysis & Simulation of Computer and Telecommunication Systems (MASCOTS)
By Jonathan C. Beard, Roger D. Chamberlain
Issue Date:August 2013
pp. 345-349
Current state of the art systems contain various types of multicore processors, General Purpose Graphics Processing Units (GPGPUs) and occasionally Digital Signal Processors (DSPs) or Field-Programmable Gate Arrays (FPGAs). With heterogeneity comes multipl...
 
Convexity in Non-convex Optimizations of Streaming Applications
Found in: 2012 IEEE 18th International Conference on Parallel and Distributed Systems (ICPADS)
By Shobana Padmanabhan, Yixin Chen, Roger D. Chamberlain
Issue Date:December 2012
pp. 668-675
Streaming data applications are frequently pipelined and deployed on application-specific systems to meet performance requirements and resource constraints. Typically, there are several design parameters in the algorithms and architectures used that impact...
 
A Memory Access Model for Highly-threaded Many-core Architectures
Found in: 2012 IEEE 18th International Conference on Parallel and Distributed Systems (ICPADS)
By Lin Ma, Kunal Agrawal, Roger D. Chamberlain
Issue Date:December 2012
pp. 339-347
Many-core architectures are excellent in hiding memory-access latency by low-overhead context switching among a large number of threads. The speedup of algorithms carried out on these machines depends on how well the latency is hidden. If the number of thr...
 
ScalaPipe: A Streaming Application Generator
Found in: 2012 Symposium on Application Accelerators in High Performance Computing (SAAHPC)
By Joseph G. Wingbermuehle, Roger D. Chamberlain, Ron K. Cytron
Issue Date:July 2012
pp. 44-53
ScalaPipe is a streaming application generator for heterogeneous platforms. By using a collection of domain-specific languages (DSLs) embedded in the Scala programming language, ScalaPipe allows creation of streaming applications that can run on a variety ...
 
Accelerating Nussinov RNA secondary structure prediction with systolic arrays on FPGAs
Found in: Application-Specific Systems, Architectures and Processors, IEEE International Conference on
By Arpith Jacob, Jeremy Buhler, Roger D. Chamberlain
Issue Date:July 2008
pp. 191-196
RNA structure prediction, or folding, is a compute-intensive task that lies at the core of several search applications in bioinformatics. We begin to address the need for high-throughput RNA folding by accelerating the Nussinov folding algorithm using a 2D...
 
Analytic performance models for bounded queueing systems
Found in: Parallel and Distributed Processing Symposium, International
By Praveen Krishnamurthy, Roger D. Chamberlain
Issue Date:April 2008
pp. 1-8
Pipelined computing applications often have their performance modeled using queueing techniques. While networks with infinite capacity queues have well understood properties, networks with finite capacity queues and blocking between servers have resisted c...
 
Application development on hybrid systems
Found in: SC Conference
By Roger D. Chamberlain, Mark A. Franklin, Eric J. Tyson, Jeremy Buhler, Saurabh Gayen, Patrick Crowley, James H. Buckley
Issue Date:November 2007
pp. 1-10
Hybrid systems consisting of a multitude of different computing device types are interesting targets for high-performance applications. Chip multiprocessors, FPGAs, DSPs, and GPUs can be readily put together into a hybrid system; however, it is not at all ...
 
Direct-Attached Disk Subsystem Performance Assessment
Found in: Storage Network Architecture and Parallel I/Os, IEEE International Workshop on
By Roger D. Chamberlain, Berkley Shands
Issue Date:September 2007
pp. 71-78
Direct-attached storage has historically had the reputation of being less capable than equivalently sized SAN installations. Here, we empirically demonstrate the performance achievable in multiple-terabyte, direct-attached disk subsystems. A number of para...
 
FPGA-accelerated seed generation in Mercury BLASTP
Found in: Field-Programmable Custom Computing Machines, Annual IEEE Symposium on
By Arpith Jacob, Joseph Lancaster, Jeremy Buhler, Roger D. Chamberlain
Issue Date:April 2007
pp. 95-106
BLASTP is the most popular tool for comparative analysis of protein sequences. In recent years, an exponential increase in the size of protein sequence databases has required either exponentially more runtime or a cluster of machines to keep pace. To addre...
 
Massively Parallel Data Mining Using Reconfigurable Hardware: Approximate String Matching
Found in: Parallel and Distributed Processing Symposium, International
By Qiong Zhang, Roger D. Chamberlain, Ronald S. Indeck, Benjamin M. West, Jason White
Issue Date:April 2004
pp. 259a
Data mining is an application that is commonly executed on massively parallel systems, often using clusters with hundreds of processors. With a disk-based data store, however, the data must first be delivered to the processors before effective mining can t...
 
Gemini: An Optical Interconnection Network for Parallel Processing
Found in: IEEE Transactions on Parallel and Distributed Systems
By Roger D. Chamberlain, Mark A. Franklin, Ch'ng Shi Baw
Issue Date:October 2002
pp. 1038-1055
The Gemini interconnect is a dual technology (optical and electrical) interconnection network designed for use in tightly-coupled multicomputer systems. It consists of a circuit-switched optical dat...
 
Breaking the Memory Bottleneck with an Optical Data Path
Found in: Simulation Symposium, Annual
By Jason Fritts, Roger D. Chamberlain
Issue Date:April 2002
pp. 0352
This paper demonstrates the capability of optical buses in enabling orders of magnitude greater bandwidth between the processor and off-chip memory in a uniprocessor computer system. Through a simulation-based performance analysis of a 1 GHz processor mode...
 
Performance Predictions for Speculative, Synchronous, VLSI Logic Simulation
Found in: Simulation Symposium, Annual
By Bradley L. Noble, J. Cris Wade, Roger D. Chamberlain
Issue Date:April 2001
pp. 0056
VLSI logic simulation is an application area in which execution time improvements can have direct economic benefits. Here, we investigate the use of parallel simulation techniques to improve the performance of VLSI logic simulation, including the...
 
Analytic Performance Model for Speculative, Synchronous, Discrete-Event Simulation
Found in: Parallel and Distributed Simulation, Workshop on
By Bradley L. Noble, Roger D. Chamberlain
Issue Date:May 2000
pp. 35
Performance models exist that reliably describe the execution time and efficiency of parallel discrete-event simulations executed in a synchronous iterative fashion. These performance models incorporate the effects of processor heterogeneity, other process...
 
The Gemini Interconnect: Data Path Measurements and Performance Analysis
Found in: Parallel Interconnects, International Conference on
By Ch'ng Shi Baw, Roger D. Chamberlain, Mark A. Franklin, Michael G. Wrighton
Issue Date:October 1999
pp. 21
The Gemini interconnect is a dual technology (optical and electrical) interconnection network designed for use in tightly-coupled multi-computer systems. It consists of a circuit-switched optical data path in parallel with a packet-switched electrical cont...
 
Parallel Logic Simulation of VLSI Systems
Found in: Design Automation Conference
By Roger D. Chamberlain
Issue Date:June 1995
pp. 139-143
Design verification via simulation is an important component in the development of digital systems. However, with continuing increases in the capabilities of VLSI systems, the simulation task has become a significant bottleneck in the design process. As a ...
 
Performance of a Globally-Clocked Parallel Simulator
Found in: Parallel Processing, International Conference on
By Gregory D. Peterson, Roger D. Chamberlain
Issue Date:August 1993
pp. 289-298
A performance model for a globally-clocked, discrete-event queueing network simulator is developed and validated against measured results. The use of architectural enhancements for improving the performance of the algorithm is investigated. Both scaled and...
 
Performance Model for Speculative Simulation Using Predictive Optimism
Found in: Hawaii International Conference on System Sciences
By Bradley L. Noble, Roger D. Chamberlain
Issue Date:January 1999
pp. 8049
Performance models exist that reliably describe the execution time and efficiency of discrete-event simulations executed in a synchronous iterative fashion. These performance models incorporate the effects of processor heterogeneity, other processor loads ...
   
Rapid RNA Folding: Analysis and Acceleration of the Zuker Recurrence
Found in: Field-Programmable Custom Computing Machines, Annual IEEE Symposium on
By Arpith C. Jacob, Jeremy D. Buhler, Roger D. Chamberlain
Issue Date:May 2010
pp. 87-94
RNA folding is a compute-intensive task that lies at the core of search applications in bioinformatics such as RNAfold and UNAFold. In this work, we analyze the Zuker RNA folding algorithm, which is challenging to accelerate because it is resource intensiv...
 
A Performance Model for Memory Bandwidth Constrained Applications on Graphics Engines
Found in: 2012 IEEE 23rd International Conference on Application-specific Systems, Architectures and Processors (ASAP)
By Lin Ma, Roger D. Chamberlain
Issue Date:July 2012
pp. 24-31
Graphics engines are excellent execution platforms for high-throughput computations that exploit a large degree of available parallelism. The achieved performance is, however, highly dependent on the access patterns that the application imposes on the memor...
 
Beyond Execution Time: Expanding the Use of Performance Models
Found in: IEEE Concurrency
By Gregory D. Peterson, Roger D. Chamberlain
Issue Date:June 1994
pp. 37-49
Improved performance is a major motivation for using parallel computation. However, performance models are frequently used only to predict an algorithm's execution time, not to accurately evaluate how the choices of architecture, operating system, interpro...
 
Towards More Effective Spectrum Use Based on Memory Allocation Models
Found in: Computer Software and Applications Conference, Annual International
By John Meier, Christopher Gill, Roger D. Chamberlain
Issue Date:July 2011
pp. 426-435
Modern embedded systems are increasingly likely to be distributed across multiple devices and platforms that must interact with high precision across wireless networks. Traditional ways of managing the wireless radio spectrum suffer from two fundamental li...
 
A Federated Simulation Environment for Hybrid Systems
Found in: Parallel and Distributed Simulation, Workshop on
By Saurabh Gayen, Eric J. Tyson, Mark A. Franklin, Roger D. Chamberlain
Issue Date:June 2007
pp. 198-210
Hybrid computing systems consisting of multiple platform types (e.g., general purpose processors, FPGAs etc.) are increasingly being used to achieve higher performance and lower costs than can be obtained with homogeneous systems (e.g., processor clusters)...
 
Wireless Data Path for a Mobile, Modular Computer System
Found in: Parallel Interconnects, International Conference on
By Jim R. Gilman, Richard A. Livingston, Kam Chan, Takayuki D. Kimura, Roger D. Chamberlain
Issue Date:October 1999
pp. 165
We present a comparison of two technologies for use in implementing a wireless data path. The target environment is a mobile, modular computer system that aims to improve the economics and productivity of users that currently use multiple PCs. An inductive...
 
MEMS-Based Optical Switch Design for Reconfigurable, Fault-Tolerant Optical Backplanes
Found in: Parallel Interconnects, International Conference on
By Nicholas R. Jankowski, Christopher Bobcowski, David Zipkin, Robert R. Krchnavek, Roger D. Chamberlain
Issue Date:October 1999
pp. 149
One of the critical components in developing reconfigurable, fault-tolerant optical backplanes is a low-cost optical switch. Microelectromechanical systems (MEMS) are a strong candidate for fabricating a low-cost optical switch. In this paper, we discuss t...
 
Use of simple analytic performance models for streaming data applications deployed on diverse architectures
Found in: 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)
By Jonathan C. Beard, Roger D. Chamberlain
Issue Date:April 2013
pp. 138-139
Modern hardware is often heterogeneous. With heterogeneity comes multiple abstraction layers that hide underlying complex systems. This complexity makes quantitative performance modeling a difficult task. Designers of high-performance streaming application...
   
Efficient deadlock avoidance for streaming computation with filtering
Found in: Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming (PPoPP '12)
By Jeremy D. Buhler, Kunal Agrawal, Peng Li, Roger D. Chamberlain
Issue Date:February 2012
pp. 235-246
Parallel streaming computations have been studied extensively, and many languages, libraries, and systems have been designed to support this model of computation. In particular, we consider acyclic streaming computations in which individual nodes can choos...
     
Design space exploration of throughput-optimized arrays from recurrence abstractions (abstract only)
Found in: Proceedings of the 18th annual ACM/SIGDA international symposium on Field programmable gate arrays (FPGA '10)
By Arpith C. Jacob, Jeremy D. Buhler, Roger D. Chamberlain
Issue Date:February 2010
pp. 286-286
Many compute-bound software applications have seen order-of-magnitude speedups using application-specific accelerators built on specialized architectures such as field-programmable gate arrays. These architectures are particularly good at implementing syst...
     
Evaluating the use of pre-simulation in VLSI circuit partitioning
Found in: Proceedings of the eighth workshop on Parallel and distributed simulation (PADS '94)
By Cheryl D. Henderson, Roger D. Chamberlain
Issue Date:July 1994
pp. 1-ff
One of the significant difficulties in partitioning logic circuits for distributed simulation is the lack of a priori knowledge concerning the evaluation frequency of individual circuit elements. A number of researchers have resorted to pre-simulation to e...
     
Exploiting lookahead in synchronous parallel simulation
Found in: Proceedings of the 25th conference on Winter simulation (WSC '93)
By Gregory D. Peterson, Roger D. Chamberlain
Issue Date:December 1993
pp. 706-712
No summary available.
     
Theoretical analysis of classic algorithms on highly-threaded many-core GPUs
Found in: Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming (PPoPP '14)
By Kunal Agrawal, Lin Ma, Roger D. Chamberlain
Issue Date:February 2014
pp. 391-392
The Threaded many-core memory (TMM) model provides a framework to analyze the performance of algorithms on GPUs. Here, we investigate the effectiveness of the TMM model by analyzing algorithms for 3 classic problems -- suffix tree/array for string matching...
     
Assessing the appropriateness of using markov decision processes for RF spectrum management
Found in: Proceedings of the 16th ACM international conference on Modeling, analysis & simulation of wireless and mobile systems (MSWiM '13)
By Benjamin Karaus, Christopher Gill, John Meier, Roger D. Chamberlain, Sreeharsha Sistla, Terry Tidwell
Issue Date:November 2013
pp. 41-48
The stochastic nature of wireless communication suggests a Markov Decision Process (MDP) as a formalism for identifying and evaluating spectrum control policies. However, in practice numerous factors influence the success or failure of a transmission, so t...
     
Decomposition techniques for optimal design-space exploration of streaming applications
Found in: Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming (PPoPP '13)
By Roger D. Chamberlain, Shobana Padmanabhan, Yixin Chen
Issue Date:February 2013
pp. 285-286
Streaming data programs are an important class of applications, for which queueing network models are frequently available. While the design space can be large, decomposition techniques can be effective at design space reduction. We introduce two decomposi...
     
Sorting on architecturally diverse computer systems
Found in: Proceedings of the Third International Workshop on High-Performance Reconfigurable Computing Technology and Applications (HPRCTA '09)
By Narayan Ganesan, Roger D. Chamberlain
Issue Date:November 2009
pp. 39-46
Sorting is an important problem that forms an essential component of many high-performance applications. Here, we explore the design space of sorting algorithms in reconfigurable hardware, looking to maximize the benefit associated with high-bandwidth, mu...
     
Mercury BLASTP: Accelerating Protein Sequence Alignment
Found in: ACM Transactions on Reconfigurable Technology and Systems (TRETS)
By Arpith Jacob, Brandon Harris, Jeremy Buhler, Joseph Lancaster, Roger D. Chamberlain
Issue Date:June 2008
pp. 1-44
Large-scale protein sequence comparison is an important but compute-intensive task in molecular biology. BLASTP is the most popular tool for comparative analysis of protein sequences. In recent years, an exponential increase in the size of protein sequence...
     
Application development on hybrid systems
Found in: Proceedings of the 2007 ACM/IEEE conference on Supercomputing (SC '07)
By Eric J. Tyson, James H. Buckley, Jeremy Buhler, Mark A. Franklin, Patrick Crowley, Roger D. Chamberlain, Saurabh Gayen
Issue Date:November 2007
pp. 24-31
Hybrid systems consisting of a multitude of different computing device types are interesting targets for high-performance applications. Chip multiprocessors, FPGAs, DSPs, and GPUs can be readily put together into a hybrid system; however, it is not at all ...
     
Empirical performance assessment using soft-core processors on reconfigurable hardware
Found in: Proceedings of the 2007 workshop on Experimental computer science (ExpCS '07)
By Jason Fritts, John Lockwood, Praveen Krishnamurthy, Richard Hough, Roger D. Chamberlain, Ron K. Cytron
Issue Date:June 2007
pp. 18-es
Simulation has been the de facto standard method for performance evaluation of newly proposed ideas in computer architecture for many years. While simulation allows for theoretically arbitrary fidelity (at least to the level of cycle accuracy) as well as t...
     
Parallel logic simulation of VLSI systems
Found in: Proceedings of the 32nd ACM/IEEE conference on Design automation conference (DAC '95)
By Roger D. Chamberlain
Issue Date:June 1995
pp. 139-143
Design verification via simulation is an important component in the development of digital systems. However, with continuing increases in the capabilities of VLSI systems, the simulation task has become a significant bottleneck in the design process. As a ...
     