Displaying 1-50 out of 131 total
Scaling and Parallelizing a Scientific Feature Mining Application Using a Cluster Middleware
Found in: Parallel and Distributed Processing Symposium, International
By Leonid Glimcher, Xuan Zhang, Gagan Agrawal
Issue Date:April 2004
pp. 87b
As scientific simulations are generating large amounts of data, analyzing this data to gain insights into scientific phenomena is increasingly becoming a challenge. In this paper, we present a case study on the use of a cluster middleware for rap...
 
Light-Weight Data Management Solutions for Visualization and Dissemination of Massive Scientific Datasets - Position Paper
Found in: 2012 SC Companion: High Performance Computing, Networking, Storage and Analysis (SCC)
By Gagan Agrawal, Yu Su
Issue Date:November 2012
pp. 1296-1300
With growing computational capabilities of parallel machines, scientific simulations are being performed at finer temporal and spatial scales, leading to an explosion of the output data. At the same time, memory capacity of parallel machines, memory access...
 
MATE-CG: A Map Reduce-Like Framework for Accelerating Data-Intensive Computations on Heterogeneous Clusters
Found in: 2012 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)
By Wei Jiang, Gagan Agrawal
Issue Date:May 2012
pp. 644-655
Clusters of GPUs have rapidly emerged as the means for achieving extreme-scale, cost-effective, and power-efficient high performance computing. At the same time, high level APIs like map-reduce are being used for developing several types of high-end and/or ...
 
Ex-MATE: Data Intensive Computing with Large Reduction Objects and Its Application to Graph Mining
Found in: Cluster Computing and the Grid, IEEE International Symposium on
By Wei Jiang, Gagan Agrawal
Issue Date:May 2011
pp. 475-484
Map-reduce framework has been widely used as the infrastructure for processing large-scale datasets in various domains. Recent work has shown that an alternate API, MATE (Map-reduce with an Alternate API), where a reduction object is explicitly maintained and...
 
Translating Chapel to Use FREERIDE: A Case Study in Using an HPC Language for Data-Intensive Computing
Found in: Parallel and Distributed Processing Workshops and PhD Forum, 2011 IEEE International Symposium on
By Bin Ren, Gagan Agrawal, Brad Chamberlain, Steve Deitz
Issue Date:May 2011
pp. 1242-1249
In the last few years, the growing significance of data-intensive computing has been closely tied to the emergence and popularity of new programming paradigms for this class of applications, including Map-Reduce, and new high-level languages for data-inten...
 
Parallelizing an Information Theoretic Co-clustering Algorithm Using a Cloud Middleware
Found in: Data Mining Workshops, International Conference on
By Venkatram Ramanathan, Wenjing Ma, Vignesh T. Ravi, Tantan Liu, Gagan Agrawal
Issue Date:December 2010
pp. 186-193
The emerging cloud environments are well suited for storage and analysis of large datasets, since they can allow on-demand access to resources. However, developing high-performance implementations of data analysis tasks is a challenging problem. In our pri...
 
A Middleware for Developing and Deploying Scalable Remote Mining Services
Found in: Cluster Computing and the Grid, IEEE International Symposium on
By Leonid Glimcher, Gagan Agrawal
Issue Date:May 2008
pp. 242-249
In this paper, we consider the problem of developing service-oriented implementations of data-intensive applications that process data on remote servers. While the existing grid and web-service frameworks allow interoperability and flexible resource util...
 
Supporting high performance bioinformatics flat-file data processing using indices
Found in: Parallel and Distributed Processing Symposium, International
By Xuan Zhang, Gagan Agrawal
Issue Date:April 2008
pp. 1-8
As an essential part of in vitro analysis, biological database query has become more and more important in the research process. A few challenges that are specific to bioinformatics applications are data heterogeneity, large data volume and exponential dat...
 
A Performance Prediction Framework for Grid-Based Data Mining Applications
Found in: Parallel and Distributed Processing Symposium, International
By Leonid Glimcher, Gagan Agrawal
Issue Date:March 2007
pp. 84
For a grid middleware to perform resource allocation, prediction models are needed, which can determine how long an application will take for completion on a particular platform or configuration. In this paper, we take the approach that by focusing on the ...
 
Parallelizing XQuery In a Cluster Environment
Found in: Database Engineering and Applications Symposium, International
By Xiaogang Li, Gagan Agrawal
Issue Date:December 2006
pp. 291-294
In this paper, we report on a parallel implementation of XQuery. As XQuery is being used for processing large datasets, and/or for compute-intensive applications, efficiency of XQuery implementations is becoming an important issue. Our work has specifically...
 
FREERIDE-G: Supporting Applications that Mine Remote
Found in: Parallel Processing, International Conference on
By Leonid Glimcher, Ruoming Jin, Gagan Agrawal
Issue Date:August 2006
pp. 109-118
Analysis of large geographically distributed scientific datasets, also referred to as distributed data-intensive science, has emerged as an important area in recent years. An application that processes data from a remote repository needs to be broken into ...
 
A Compilation Framework for Distributed Memory Parallelization of Data Mining Algorithms
Found in: Parallel and Distributed Processing Symposium, International
By Xiaogang Li, Ruoming Jin, Gagan Agrawal
Issue Date:April 2003
pp. 7a
With the availability of large datasets in a variety of scientific and commercial domains, data mining has emerged as an important area within the last decade. Data mining techniques focus on finding novel and useful patterns or models from large ...
 
A Map-Reduce System with an Alternate API for Multi-core Environments
Found in: Cluster Computing and the Grid, IEEE International Symposium on
By Wei Jiang, Vignesh T. Ravi, Gagan Agrawal
Issue Date:May 2010
pp. 84-93
Map-reduce framework has received significant attention and is being used for programming both large-scale clusters and multi-core systems. While the high productivity aspect of map-reduce has been well accepted, it is not clear if the API results in eff...
 
Supporting fault-tolerance in streaming grid applications
Found in: Parallel and Distributed Processing Symposium, International
By Qian Zhu, Liang Chen, Gagan Agrawal
Issue Date:April 2008
pp. 1-12
This paper considers the problem of supporting and efficiently implementing fault-tolerance for tightly-coupled and pipelined applications, especially streaming applications, in a grid environment. We provide an alternative to basic checkpointing and use t...
 
Supporting Dynamic Migration in Tightly Coupled Grid Applications
Found in: SC Conference
By Liang Chen, Qian Zhu, Gagan Agrawal
Issue Date:November 2006
pp. 28
In recent years, there has been a growing trend towards supporting more tightly coupled applications on the grid, including scientific workflows, applications that use pipelined or data-flow like processing, and distributed streaming applications....
 
A Tool for Supporting Integration Across Multiple Flat-File Datasets
Found in: Bioinformatic and Bioengineering, IEEE International Symposium on
By Xuan Zhang, Gagan Agrawal
Issue Date:October 2006
pp. 141-148
Traditionally, biologists focused on a single research subject. New high-throughput experimental and analytical technologies, such as microarray and BLAST programs, have changed this. An important functionality required now is the ability to proce...
 
Assigning Schema Labels Using Ontology And Heuristics
Found in: Bioinformatic and Bioengineering, IEEE International Symposium on
By Xuan Zhang, Ruoming Jin, Gagan Agrawal
Issue Date:October 2006
pp. 269-280
Bioinformatics data is growing at a phenomenal rate. Besides the exponential growth of individual databases, the number of data depositories is increasing too. Because of the complexity of the biological concepts, bioinformatics data usually has c...
 
Design and Evaluation of a High-Level Interface for Data Mining
Found in: Parallel and Distributed Processing Symposium, International
By Ruoming Jin, Gagan Agrawal
Issue Date:April 2002
pp. 0106
This paper presents a case study in developing an application-class-specific high-level interface for shared memory parallel programming. The application class we focus on is data mining. With the availability of large datasets in areas like bioinformatics...
 
SciMATE: A Novel MapReduce-Like Framework for Multiple Scientific Data Formats
Found in: Cluster Computing and the Grid, IEEE International Symposium on
By Yi Wang, Wei Jiang, Gagan Agrawal
Issue Date:May 2012
pp. 443-450
Despite the popularity of MapReduce, there are several obstacles to applying it for developing scientific data analysis applications. Current MapReduce implementations require that data be loaded into specialized file systems, like the Hadoop Distributed F...
 
Porting irregular reductions on heterogeneous CPU-GPU configurations
Found in: High-Performance Computing, International Conference on
By Xin Huo, Vignesh T. Ravi, Gagan Agrawal
Issue Date:December 2011
pp. 1-10
Heterogeneous architectures are playing a significant role in High Performance Computing (HPC) today, with the popularity of accelerators like the GPUs, and the new trend towards the integration of CPUs and GPUs. Developing applications that can effectivel...
 
Compiling Dynamic Data Structures in Python to Enable the Use of Multi-core and Many-core Libraries
Found in: Parallel Architectures and Compilation Techniques, International Conference on
By Bin Ren, Gagan Agrawal
Issue Date:October 2011
pp. 68-77
Programmer productivity considerations are increasing the popularity of interpreted languages like Python. At the same time, for applications where performance is important, these languages clearly lack performance even on uniprocessors. In addition, the use of dynami...
 
Stratified Sampling for Data Mining on the Deep Web
Found in: Data Mining, IEEE International Conference on
By Tantan Liu, Fan Wang, Gagan Agrawal
Issue Date:December 2010
pp. 324-333
In recent years, one mode of data dissemination, the deep web, has become extremely popular. Like any other data source, data mining on the deep web can produce important insights or summaries of results. However, data mining on the deep web is chall...
 
Resource Allocation for Distributed Streaming Applications
Found in: Parallel Processing, International Conference on
By Qian Zhu, Gagan Agrawal
Issue Date:September 2008
pp. 414-421
We consider resource allocation for distributed streaming applications running in a grid environment, where continuously streaming data needs to be aggregated and processed to produce output streams. Because such an application comprises a pipeline of proc...
 
Filter Decomposition for Supporting Coarse-Grained Pipelined Parallelism
Found in: Parallel Processing, International Conference on
By Wei Du, Gagan Agrawal
Issue Date:June 2005
pp. 539-546
We consider the filter decomposition problem in supporting coarse-grained pipelined parallelism. This form of parallelism is suitable for data-driven applications in scenarios where the data is available on a repository or a data collection site o...
 
Using Tiling to Scale Parallel Data Cube Construction
Found in: Parallel Processing, International Conference on
By Ruoming Jin, Karthik Vaidyanathan, Ge Yang, Gagan Agrawal
Issue Date:August 2004
pp. 365-372
Data cube construction is a commonly used operation in data warehouses. Because of the volume of data that is stored and analyzed in a data warehouse and the amount of computation involved in data cube construction, it is natural to consider paral...
 
Packet Size Optimization for Supporting Coarse-Grained Pipelined Parallelism
Found in: Parallel Processing, International Conference on
By Wei Du, Gagan Agrawal
Issue Date:August 2004
pp. 259-266
The emergence of grid and a new class of data-driven applications is making a new form of parallelism desirable, which we refer to as coarse-grained pipelined parallelism. In this paper, we focus on the problem of choosing packet size, i.e., the u...
 
Compiler and Runtime Analysis for Efficient Communication in Data Intensive Applications
Found in: Parallel Architectures and Compilation Techniques, International Conference on
By Renato Ferreira, Joel Saltz, Gagan Agrawal
Issue Date:September 2001
pp. 0231
Processing and analyzing large volumes of data plays an increasingly important role in many domains of scientific research. We are developing a compiler that processes data intensive applications written in a dialect of Java and compiles them for...
 
Interprocedural Compilation of Irregular Applications for Distributed Memory Machines
Found in: SC Conference
By Gagan Agrawal, Joel Saltz
Issue Date:December 1995
pp. 48
Data parallel languages like High Performance Fortran (HPF) are emerging as the architecture independent mode of programming distributed memory parallel machines. In this paper, we present the interprocedural optimizations required for compiling applicatio...
 
An Integrated Runtime and Compile-Time Approach for Parallelizing Structured and Block Structured Applications
Found in: IEEE Transactions on Parallel and Distributed Systems
By Gagan Agrawal, Alan Sussman, Joel Saltz
Issue Date:July 1995
pp. 747-754
In compiling applications for distributed memory machines, runtime analysis is required when data to be communicated cannot be determined at compile-time. One such class of applications requiring runtime analysis is b...
 
GMRace: Detecting Data Races in GPU Programs via a Low-Overhead Scheme
Found in: IEEE Transactions on Parallel and Distributed Systems
By Mai Zheng, Vignesh T. Ravi, Feng Qin, Gagan Agrawal
Issue Date:January 2014
pp. 104-115
In recent years, GPUs have emerged as an extremely cost-effective means for achieving high performance. While languages like CUDA and OpenCL have eased GPU programming for nongraphical applications, they are still explicitly parallel languages. All paralle...
 
Cost and Accuracy Aware Scientific Workflow Composition for Service-Oriented Environments
Found in: IEEE Transactions on Services Computing
By David Chiu, Gagan Agrawal
Issue Date:October 2013
pp. 470-483
Large-scale scientific data analysis projects have catalyzed service-based workflow management systems. We present an approach for integrating user preferences on completion time and workflow accuracy in a workflow composition system. The relationship betw...
 
A Compression Framework for Multidimensional Scientific Datasets
Found in: 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum (IPDPSW)
By Tekin Bicer, Gagan Agrawal
Issue Date:May 2013
pp. 2250-2253
Scientific simulations and instruments can generate tremendous amounts of data in short time periods. Since the generated data is used for inferring new knowledge, it is important to store it efficiently and provide it to scientific endeavors. Although par...
 
Integrating Online Compression to Accelerate Large-Scale Data Analytics Applications
Found in: 2013 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)
By Tekin Bicer, Jian Yin, David Chiu, Gagan Agrawal, Karen Schuchardt
Issue Date:May 2013
pp. 1205-1216
Compute cycles in high performance systems are increasing at a much faster pace than both storage and wide area bandwidths. To continue improving the performance of large-scale data analytics applications, compression has therefore become a promising approac...
 
Supporting a Light-Weight Data Management Layer over HDF5
Found in: 2013 13th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)
By Yi Wang, Yu Su, Gagan Agrawal
Issue Date:May 2013
pp. 335-342
Scientific simulations are now being performed at finer temporal and spatial scales, leading to an explosion of the output data, and challenges in storing, managing, disseminating, analyzing, and visualizing these datasets. Tools commonly used today for di...
 
SIMD parallelization of applications that traverse irregular data structures
Found in: 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)
By Bin Ren, Gagan Agrawal, James R. Larus, Todd Mytkowicz, Tomi Poutanen, Wolfram Schulte
Issue Date:February 2013
pp. 1-10
Fine-grained data parallelism is increasingly common in mainstream processors in the form of longer vectors and on-chip GPUs. This paper develops support for exploiting such data parallelism for a class of non-numeric, non-graphic applications, which perfo...
 
Accelerating MapReduce on a coupled CPU-GPU architecture
Found in: 2012 SC - International Conference for High Performance Computing, Networking, Storage and Analysis
By Linchuan Chen, Xin Huo, Gagan Agrawal
Issue Date:November 2012
pp. 1-11
The work presented here is driven by two observations. First, heterogeneous architectures that integrate a CPU and a GPU on the same chip are emerging, and hold much promise for supporting power-efficient and scalable high performance computing. Second, Ma...
 
ValuePack: Value-based scheduling framework for CPU-GPU clusters
Found in: 2012 SC - International Conference for High Performance Computing, Networking, Storage and Analysis
By Vignesh T. Ravi, Michela Becchi, Gagan Agrawal, Srimat Chakradhar
Issue Date:November 2012
pp. 1-12
Heterogeneous computing nodes are becoming commonplace today, and recent trends strongly indicate that clusters, supercomputers, and cloud environments will increasingly host more heterogeneous resources, with some being massively parallel (e.g., GPU). Wit...
 
Resource Provisioning with Budget Constraints for Adaptive Applications in Cloud Environments
Found in: IEEE Transactions on Services Computing
By Qian Zhu, Gagan Agrawal
Issue Date:September 2012
pp. 497-511
The recent emergence of clouds is making the vision of utility computing realizable, i.e., computing resources and services can be delivered, utilized, and paid for as utilities such as water or electricity. This, however, creates new resource provisioning...
 
Indexing and Parallel Query Processing Support for Visualizing Climate Datasets
Found in: 2012 41st International Conference on Parallel Processing (ICPP)
By Yu Su, Gagan Agrawal, Jonathan Woodring
Issue Date:September 2012
pp. 249-258
With increasing emphasis on analysis of large-scale scientific data, and with growing dataset sizes, a number of new challenges are arising. Particularly, novel data management solutions are needed, which can work together with the existing tools. This pap...
 
Supporting User-Defined Subsetting and Aggregation over Parallel NetCDF Datasets
Found in: Cluster Computing and the Grid, IEEE International Symposium on
By Yu Su, Gagan Agrawal
Issue Date:May 2012
pp. 212-219
While dissemination of scientific data is becoming crucial for facilitating scientific discoveries, a key challenge being faced by these efforts is that the dataset sizes continue to grow rapidly. Coupled with the fact that wide area data transfer bandwidt...
 
Time and Cost Sensitive Data-Intensive Computing on Hybrid Clouds
Found in: Cluster Computing and the Grid, IEEE International Symposium on
By Tekin Bicer, David Chiu, Gagan Agrawal
Issue Date:May 2012
pp. 636-643
Purpose-built clusters permeate many of today's organizations, providing both large-scale data storage and computing. Within local clusters, competition for resources complicates applications with deadlines. However, given the emergence of the cloud's pay-...
 
Scheduling Concurrent Applications on a Cluster of CPU-GPU Nodes
Found in: Cluster Computing and the Grid, IEEE International Symposium on
By Vignesh T. Ravi, Michela Becchi, Wei Jiang, Gagan Agrawal, Srimat Chakradhar
Issue Date:May 2012
pp. 140-147
Heterogeneous architectures comprising a multicore CPU and many-core GPU(s) are increasingly being used within cluster and cloud environments. In this paper, we study the problem of optimizing the overall throughput of a set of applications deployed on a c...
 
A Cloud-based Dynamic Workflow for Mass Spectrometry Data Analysis
Found in: eScience, IEEE International Conference on
By Ashish Nagavaram, Gagan Agrawal, Michael A. Freitas, Kelly H. Telu, Gaurang Mehta, Rajiv G. Mayani, Ewa Deelman
Issue Date:December 2011
pp. 47-54
There is a growing interest in the use of cloud computing for scientific applications, including scientific workflows. Key attractions of the cloud include the pay-as-you-go model and elasticity. While the elasticity offered by clouds can be beneficial for...
 
A dynamic scheduling framework for emerging heterogeneous systems
Found in: High-Performance Computing, International Conference on
By Vignesh T. Ravi, Gagan Agrawal
Issue Date:December 2011
pp. 1-10
A trend that has materialized, and has given rise to much attention, is the increasing heterogeneity of computing platforms. Recently, it has become very common for a desktop or a notebook computer to be equipped with both a multi-core CPU and a GPU. App...
 
Parameterized Micro-benchmarking: An Auto-tuning Approach for Complex Applications
Found in: Parallel Architectures and Compilation Techniques, International Conference on
By Wenjing Ma, Sriram Krishnamoorthy, Gagan Agrawal
Issue Date:October 2011
pp. 181-182
Auto-tuning has emerged as an important practical method for creating highly optimized code. However, the growing complexity of architectures and applications has resulted in a prohibitively large search space that precludes empirical auto-tuning. Here, we ...
 
A Framework for Data-Intensive Computing with Cloud Bursting
Found in: Cluster Computing, IEEE International Conference on
By Tekin Bicer, David Chiu, Gagan Agrawal
Issue Date:September 2011
pp. 169-177
For many organizations, one attractive use of cloud resources can be through what is referred to as cloud bursting or the hybrid cloud. These refer to scenarios where an organization acquires and manages in-house resources to meet its base need, but can us...
 
An Autonomic Framework for Time and Cost Driven Execution of MPI Programs on Cloud Environments
Found in: Grid Computing, IEEE/ACM International Workshop on
By Aarthi Raveendran, Tekin Bicer, Gagan Agrawal
Issue Date:September 2011
pp. 218-219
This paper gives an overview of a framework for making existing MPI applications elastic, and executing them with user-specified time and cost constraints in a cloud framework. Considering the limitations of the MPI implementations currently available, we ...
 
Evaluating and Optimizing Indexing Schemes for a Cloud-Based Elastic Key-Value Store
Found in: Cluster Computing and the Grid, IEEE International Symposium on
By David Chiu, Apeksha Shetty, Gagan Agrawal
Issue Date:May 2011
pp. 362-371
Cloud computing has emerged to provide virtual, pay-as-you-go computing and storage services over the Internet, where the usage cost directly depends on consumption. One compelling feature in Clouds is elasticity, where a user can demand, and be immediatel...
 
A Framework for Elastic Execution of Existing MPI Programs
Found in: Parallel and Distributed Processing Workshops and PhD Forum, 2011 IEEE International Symposium on
By Aarthi Raveendran, Tekin Bicer, Gagan Agrawal
Issue Date:May 2011
pp. 940-947
There is a clear trend towards using cloud resources in the scientific or the HPC community, with a key attraction of cloud being the elasticity it offers. In executing HPC applications on a cloud environment, it will clearly be desirable to exploit e...
 
Active learning based frequent itemset mining over the deep web
Found in: Data Engineering, International Conference on
By Tantan Liu, Gagan Agrawal
Issue Date:April 2011
pp. 219-230
In recent years, one mode of data dissemination, the deep web, has become extremely popular. A key characteristic of deep web data sources is that data can only be accessed through the limited query interface they support. This paper develops a me...
 