Displaying 1-13 out of 13 total
Analysis and Optimization of Financial Analytics Benchmark on Modern Multi- and Many-core IA-Based Architectures
Found in: 2012 SC Companion: High Performance Computing, Networking, Storage and Analysis (SCC)
By Mikhail Smelyanskiy, Jason Sewall, Dhiraj D. Kalamkar, Nadathur Satish, Pradeep Dubey, Nikita Astafiev, Ilya Burylov, Andrey Nikolaev, Sergey Maidanov, Shuo Li, Sunil Kulkarni, Charles H. Finan, Ekaterina Gonina
Issue Date: November 2012
pp. 1154-1162
In the past 20 years, computerization has driven explosive growth in the volume of financial markets and in the variety of traded financial instruments. Increasingly sophisticated mathematical and statistical methods and rapidly expanding compu...
Large-scale energy-efficient graph traversal: A path to efficient data-intensive supercomputing
Found in: 2012 SC - International Conference for High Performance Computing, Networking, Storage and Analysis
By Nadathur Satish, Changkyu Kim, Jatin Chhugani, Pradeep Dubey
Issue Date: November 2012
pp. 1-11
Graph traversal is a widely used algorithm in a variety of fields, including social networks, business analytics, and high-performance computing among others. There has been a push for HPC machines to be rated not just in Petaflops, but also in "GigaT...
DySER: Unifying Functionality and Parallelism Specialization for Energy-Efficient Computing
Found in: IEEE Micro
By Venkatraman Govindaraju, Chen-Han Ho, Tony Nowatzki, Jatin Chhugani, Nadathur Satish, Karthikeyan Sankaralingam, Changkyu Kim
Issue Date: September 2012
pp. 38-51
The DySER (Dynamically Specializing Execution Resources) architecture supports both functionality specialization and parallelism specialization. By dynamically specializing frequently executing regions and applying parallelism mechanisms, DySER provides ef...
Fast and Efficient Graph Traversal Algorithm for CPUs: Maximizing Single-Node Efficiency
Found in: 2012 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)
By Jatin Chhugani, Nadathur Satish, Changkyu Kim, Jason Sewall, Pradeep Dubey
Issue Date: May 2012
pp. 378-389
Graph-based structures are being increasingly used to model data and relations among data in a number of fields. Graph-based databases are becoming more popular as a means to better represent such data. Graph traversal is a key component in graph algorithm...
Designing efficient sorting algorithms for manycore GPUs
Found in: Parallel and Distributed Processing Symposium, International
By Nadathur Satish, Mark Harris, Michael Garland
Issue Date: May 2009
pp. 1-10
We describe the design of high-performance parallel radix sort and merge sort routines for manycore GPUs, taking advantage of the full programmability offered by CUDA. Our radix sort is the fastest GPU sort and our merge sort is the fastest comparison-base...
An automated exploration framework for FPGA-based soft multiprocessor systems
Found in: Hardware/software codesign and system synthesis, International conference on
By Kurt Keutzer, Kaushik Ravindran, Nadathur Satish, Yujia Jin
Issue Date: September 2005
pp. 273-278
FPGA-based soft multiprocessors are viable system solutions for high performance applications. They provide a software abstraction to enable quick implementations on the FPGA. The multiprocessor can be customized for a target application to achieve high pe...
3.5-D Blocking Optimization for Stencil Computations on Modern CPUs and GPUs
Found in: SC Conference
By Anthony Nguyen, Nadathur Satish, Jatin Chhugani, Changkyu Kim, Pradeep Dubey
Issue Date: November 2010
pp. 1-13
Stencil computation sweeps over a spatial grid over multiple time steps to perform nearest-neighbor computations. The bandwidth-to-compute requirement for a large class of stencil kernels is very high, and their performance is bound by the available memory...
Can traditional programming bridge the Ninja performance gap for parallel computing applications?
Found in: Proceedings of the 39th Annual International Symposium on Computer Architecture (ISCA '12)
By Changkyu Kim, Hideki Saito, Jatin Chhugani, Mikhail Smelyanskiy, Milind Girkar, Nadathur Satish, Pradeep Dubey, Rakesh Krishnaiyer
Issue Date: June 2012
pp. 440-451
Current processor trends of integrating more cores with wider SIMD units, along with a deeper and complex memory hierarchy, have made it increasingly more challenging to extract performance from applications. It is believed by some that traditional approac...
Designing fast architecture-sensitive tree search on modern multicore/many-core processors
Found in: ACM Transactions on Database Systems (TODS)
By Anthony D. Nguyen, Changkyu Kim, Eric Sedlar, Jatin Chhugani, Nadathur Satish, Pradeep Dubey, Scott A. Brandt, Tim Kaldewey, Victor W. Lee
Issue Date: December 2011
pp. 1-34
In-memory tree structured index search is a fundamental database operation. Modern processors provide tremendous computing power by integrating multiple cores, each with wide vector units. There has been much work to exploit modern processor architectures ...
Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU
Found in: Proceedings of the 37th annual international symposium on Computer architecture (ISCA '10)
By Anthony D. Nguyen, Changkyu Kim, Daehyun Kim, Jatin Chhugani, Michael Deisher, Mikhail Smelyanskiy, Nadathur Satish, Per Hammarlund, Pradeep Dubey, Ronak Singhal, Srinivas Chennupaty, Victor W. Lee
Issue Date: June 2010
pp. 72-ff
Recent advances in computing have led to an explosion in the amount of data being generated. Processing the ever-growing data in a timely manner has made throughput computing an important aspect for emerging applications. Our analysis of a set of important...
ClearPath: highly parallel collision avoidance for multi-agent simulation
Found in: Proceedings of the 2009 ACM SIGGRAPH/Eurographics Symposium on Computer Animation (SCA '09)
By Changkyu Kim, Dinesh Manocha, Jatin Chhugani, Ming Lin, Nadathur Satish, Pradeep Dubey, Stephen J. Guy
Issue Date: August 2009
pp. 177-187
We present a new local collision avoidance algorithm between multiple agents for real-time simulations. Our approach extends the notion of velocity obstacles from robotics and formulates the conditions for collision free navigation as a quadratic optimizat...
A decomposition-based constraint optimization approach for statically scheduling task graphs with communication delays to multiprocessors
Found in: Proceedings of the conference on Design, automation and test in Europe (DATE '07)
By Kaushik Ravindran, Kurt Keutzer, Nadathur Satish
Issue Date: April 2007
pp. 57-62
We present a decomposition strategy to speed up constraint optimization for a representative multiprocessor scheduling problem. In the manner of Benders decomposition, our technique solves relaxed versions of the problem and iteratively learns constraints ...
Soft multiprocessor systems for network applications (abstract only)
Found in: Proceedings of the 2005 ACM/SIGDA 13th international symposium on Field-programmable gate arrays (FPGA '05)
By Kaushik Ravindran, Kurt Keutzer, Nadathur Satish, William Plishker, Yujia Jin
Issue Date: February 2005
pp. 271-271
Modern network applications require devices that provide high-performance at gigabit line rates with the flexibility to support diverse application standards and services. However, prohibitive product design costs and shrinking market windows restrict the ...