Fault Tolerance and Scaling in e-Science Cloud Applications: Observations from the Continuing Development of MODISAzure
Found in: eScience, IEEE International Conference on
By Jie Li, Marty Humphrey, You-Wei Cheah, Youngryel Ryu, Deb Agarwal, Keith Jackson, Catharine van Ingen
Issue Date:December 2010
pp. 246-253
It can be natural to believe that many of the traditional issues of scale have been eliminated or at least greatly reduced via cloud computing. That is, if one can create a seemingly well functioning cloud application that operates correctly on small or mo...
Bridging the Gap between Desktop and the Cloud for eScience Applications
Found in: 2010 IEEE 3rd International Conference on Cloud Computing (CLOUD 2010)
By Yogesh Simmhan, Catharine van Ingen, Girish Subramanian, Jie Li
Issue Date:July 2010
pp. 474-481
The widely discussed scientific data deluge creates a need to computationally scale out eScience applications beyond the local desktop and cope with variable loads over time. Cloud computing offers a scalable, economic, on-demand model well matched to thes...
eScience in the cloud: A MODIS satellite data reprojection and reduction pipeline in the Windows Azure platform
Found in: 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)
By Jie Li, Marty Humphrey, Deb Agarwal, Keith Jackson, Catharine van Ingen, Youngryel Ryu
Issue Date:April 2010
pp. 1-10
The combination of low-cost sensors, low-cost commodity computing, and the Internet is enabling a new era of data-intensive science. The dramatic increase in this data availability has created a new challenge for scientists: how to process the data. Scient...
Publication and Curation of Shared Scientific Climate and Earth Sciences Data
Found in: e-Science and Grid Computing, International Conference on
By Marty Humphrey, Deb Agarwal, Catharine van Ingen
Issue Date:December 2009
pp. 118-125
Many of today’s large-scale scientific projects attempt to collect data from a diverse set of sources. The traditional campaign-style approach to “synthesis” efforts gathers data through a single concentrated effort, and the data contributors know in advan...
Building Reliable Data Pipelines for Managing Community Data Using Scientific Workflows
Found in: e-Science and Grid Computing, International Conference on
By Yogesh Simmhan, Catharine van Ingen, Alex Szalay, Roger Barga, Jim Heasley
Issue Date:December 2009
pp. 321-328
The growing amount of scientific data from sensors and field observations is posing a challenge to “data valets” responsible for managing them in data repositories. These repositories built on commodity clusters need to reliably ingest data continuously an...
Environmental Monitoring 2.0
Found in: Data Engineering, International Conference on
By Sebastian Michel, Ali Salehi, Liqian Luo, Nicholas Dawes, Karl Aberer, Guillermo Barrenetxea, Mathias Bavay, Aman Kansal, K. Ashwin Kumar, Suman Nath, Marc Parlange, Stewart Tansley, Catharine van Ingen, Feng Zhao, Yongluan Zhou
Issue Date:April 2009
pp. 1507-1510
A sensor network data gathering and visualization infrastructure is demonstrated, comprising of Global Sensor Networks (GSN) middleware and Microsoft SensorMap. Users are invited to actively participate in the process of monitoring real-world deployments a...
On Building Scientific Workflow Systems for Data Management in the Cloud
Found in: eScience, IEEE International Conference on
By Yogesh Simmhan, Roger Barga, Catharine van Ingen, Ed Lazowska, Alex Szalay
Issue Date:December 2008
pp. 434-435
Scientific workflows have become an archetype to model in silico experiments in the Cloud by scientists. There is a class of workflows that are used to by...
GrayWulf: Scalable Software Architecture for Data Intensive Computing
Found in: Hawaii International Conference on System Sciences
By Yogesh Simmhan, Roger Barga, Catharine van Ingen, Maria Nieto-Santisteban, Laszlo Dobos, Nolan Li, Michael Shipway, Alexander S. Szalay, Sue Werner, Jim Heasley
Issue Date:January 2009
pp. 1-10
Big data presents new challenges to both cluster infrastructure software and parallel application design. We present a set of software services and design principles for data intensive computing with petabyte data sets, named GrayWulf. These services are i...
GrayWulf: Scalable Clustered Architecture for Data Intensive Computing
Found in: Hawaii International Conference on System Sciences
By Alexander S. Szalay, Gordon Bell, Jan Vandenberg, Alainna Wonders, Randal Burns, Dan Fay, Jim Heasley, Tony Hey, Maria Nieto-Santisteban, Ani Thakar, Catharine van Ingen, Richard Wilton
Issue Date:January 2009
pp. 1-10
Data intensive computing presents a significant challenge for traditional supercomputing architectures that maximize FLOPS since CPU speed has surpassed IO capabilities of HPC systems and BeoWulf clusters. We present the architecture for a three tier commo...
Stargazing through a digital veil: managing a large scale sky survey using distributed databases on HPC clusters
Found in: Proceedings of the first annual workshop on High performance computing meets databases (HPCDB '11)
By Alex Szalay, Catharine van Ingen, Jim Heasley, Yogesh Simmhan
Issue Date:November 2011
pp. 33-36
The Sloan Digital Sky Survey established the use of relational databases for the scans and cone searches common to astronomy analyses. The Pan-STARRS project scales up SDSS by melding HPC clusters with hierarchical and spatially partitioned distributed dat...