Displaying 1-19 out of 19 total
The Future of Scientific Data Bases
Found in: Data Engineering, International Conference on
By Michael Stonebraker, Anastasia Ailamaki, Jeremy Kepner, Alex Szalay
Issue Date:April 2012
pp. 7-8
For many decades, users in scientific fields (domain scientists) have resorted to either home-grown tools or legacy software for the management of their data. Technological advancements nowadays necessitate many of the properties such as data independence,...
Performance modeling and analysis of flash-based storage devices
Found in: Mass Storage Systems and Technologies, IEEE / NASA Goddard Conference on
By H. Howie Huang, Shan Li, Alex Szalay, Andreas Terzis
Issue Date:May 2011
pp. 1-11
Flash-based solid-state drives (SSDs) will become key components in future storage systems. An accurate performance model will not only help understand the state-of-the-art of SSDs, but also provide the research tools for exploring the design space of such...
Building Reliable Data Pipelines for Managing Community Data Using Scientific Workflows
Found in: e-Science and Grid Computing, International Conference on
By Yogesh Simmhan, Catharine van Ingen, Alex Szalay, Roger Barga, Jim Heasley
Issue Date:December 2009
pp. 321-328
The growing amount of scientific data from sensors and field observations is posing a challenge to “data valets” responsible for managing them in data repositories. These repositories built on commodity clusters need to reliably ingest data continuously an...
On Building Scientific Workflow Systems for Data Management in the Cloud
Found in: eScience, IEEE International Conference on
By Yogesh Simmhan, Roger Barga, Catharine van Ingen, Ed Lazowska, Alex Szalay
Issue Date:December 2008
pp. 434-435
Scientific workflows have become an archetype to model in silico experiments in the Cloud by scientists. There is a class of workflows that are used to by...
Data-Intensive Computing in the 21st Century
Found in: Computer
By Ian Gorton, Paul Greenfield, Alex Szalay, Roy Williams
Issue Date:April 2008
pp. 30-32
The deluge of data that future applications must process—in domains ranging from science to business informatics—creates a compelling argument for substantially increased R&D targeted at discovering scalable hardware and software solutions for data-int...
The sqlLoader Data-Loading Pipeline
Found in: Computing in Science and Engineering
By Alex Szalay, Ani R. Thakar, Jim Gray
Issue Date:January 2008
pp. 38-48
Using a database management system (DBMS) is essential to ensure the data integrity and reliability of large, multidimensional data sets. However, loading multiterabyte data into a DBMS is a time-consuming and error-prone task that the authors have tried t...
The Catalog Archive Server Database Management System
Found in: Computing in Science and Engineering
By Ani R. Thakar, Alex Szalay, George Fekete, Jim Gray
Issue Date:January 2008
pp. 30-37
The multiterabyte Sloan Digital Sky Survey’s (SDSS’s) catalog data is stored in a commercial relational database management system with SQL query access and a built-in query optimizer. The SDSS Catalog Archive Server adds advanced data mining features to t...
Distributing the Sloan Digital Sky Survey Using UDT and Sector
Found in: e-Science and Grid Computing, International Conference on
By Yunhong Gu, Robert L. Grossman, Alex Szalay, Ani Thakar
Issue Date:December 2006
pp. 56
In this paper, we describe a peer-to-peer storage system called Sector that is designed to access and transport large data sets over wide area high performance networks. We also describe our recent experience using Sector to distribute the Sloan Digital Sk...
Estimating Query Result Sizes for Proxy Caching in Scientific Database Federations
Found in: SC Conference
By Tanu Malik, Randal Burns, Nitesh V. Chawla, Alex Szalay
Issue Date:November 2006
pp. 36
In a proxy cache for federations of scientific databases it is important to estimate the size of a query before making a caching decision. With accurate estimates, near-optimal cache performance can be obtained. On the other extreme, inaccurate es...
Petascale Computational Systems
Found in: Computer
By Gordon Bell, Jim Gray, Alex Szalay
Issue Date:January 2006
pp. 110-112
A balanced cyberinfrastructure is necessary to meet growing data-intensive scientific needs.
Batch is Back: CasJobs, Serving Multi-TB Data on the Web
Found in: Web Services, IEEE International Conference on
By William O'Mullane, Nolan Li, María Nieto-Santisteban, Alex Szalay, Ani Thakar
Issue Date:July 2005
pp. 33-40
The Sloan Digital Sky Survey (SDSS) science database describes over 230 million objects and is over 1.6 TB in size. The SDSS Catalog Archive Server (CAS) provides several levels of query interface to the SDSS data via the SkyServer website. Most queries ex...
Migrating a Multiterabyte Archive from Object to Relational Databases
Found in: Computing in Science and Engineering
By Ani Thakar, Alex Szalay, Peter Kunszt, Jim Gray
Issue Date:September 2003
pp. 16-29
A commercial, object-oriented database engine with custom tools for data-mining the multiterabyte Sloan Digital Sky Survey archive did not meet its performance objectives. We describe the problems, technical issues, and process of migrating this l...
Extreme Data-Intensive Scientific Computing
Found in: Computing in Science and Engineering
By Alex Szalay
Issue Date:November 2011
pp. 34-41
Scientific computing increasingly involves massive data; in astronomy, observations and numerical simulations are on the verge of generating petabytes. This new, data-centric computing requires a new look at computing architectures and strategies. Using Am...
Stargazing through a digital veil: managing a large scale sky survey using distributed databases on HPC clusters
Found in: Proceedings of the first annual workshop on High performance computing meets databases (HPCDB '11)
By Alex Szalay, Catharine van Ingen, Jim Heasley, Yogesh Simmhan
Issue Date:November 2011
pp. 33-36
The Sloan Digital Sky Survey established the use of relational databases for the scans and cone searches common to astronomy analyses. The Pan-STARRS project scales up SDSS by melding HPC clusters with hierarchical and spatially partitioned distributed dat...
Migrating a (large) science database to the cloud
Found in: Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing (HPDC '10)
By Alex Szalay, Ani Thakar
Issue Date:June 2010
pp. 430-434
We report on attempts to put an existing scientific (astronomical) database -- the Sloan Digital Sky Survey (SDSS) science archive [1] -- in the cloud. Based on our experience, it is either very frustrating or impossible at this time to migrate an existing,...
An overview of the Open Science Data Cloud
Found in: Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing (HPDC '10)
By Alex Szalay, Joe Mambretti, Kevin White, Michal Sabala, Robert L. Grossman, Yunhong Gu
Issue Date:June 2010
pp. 377-384
The Open Science Data Cloud is a distributed cloud based infrastructure for managing, analyzing, archiving and sharing scientific datasets. We introduce the Open Science Data Cloud, give an overview of its architecture, provide an update on its current sta...
Efficient scheduling of scientific workflows in a high performance computing cluster
Found in: Proceedings of the 6th international workshop on Challenges of large applications in distributed environments (CLADE '08)
By Alex Szalay, Dan Fay, Dean Guo, Roger S. Barga, Steven Newhouse, Yogesh Simmhan
Issue Date:June 2008
pp. 1-6
The scientific computing community, especially academia, is clearly in need of technology to handle and organize the 1-100+ Terabyte datasets coming from computer simulations and scientific instrumentation. In this paper we briefly describe GrayWulf, an exe...
Accelerating large-scale data exploration through data diffusion
Found in: Proceedings of the 2008 international workshop on Data-aware distributed computing (DADC '08)
By Alex Szalay, Ian T. Foster, Ioan Raicu, Yong Zhao
Issue Date:June 2008
pp. 9-18
Data-intensive applications often require exploratory analysis of large datasets. If analysis is performed on distributed resources, data locality can be crucial to high throughput and performance. We propose a "data diffusion" approach that acquires compu...
The world-wide telescope
Found in: Communications of the ACM
By Alex Szalay, Jim Gray
Issue Date:November 2002
pp. 50-55
Mining vast databases of astronomical data, this new online way to see the global structure of the universe promises to be not only a wonderful virtual telescope but an archetype for the evolution of computational science.