Issue No.06 - June (2010 vol.59)
Bing Bing Zhou, The University of Sydney, Sydney
Chen Wang, CSIRO ICT Center, Australia
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TC.2010.39
The paper presents EvolvingSpace, a data-centric distributed system intended to address the data and application integration problem in bioinformatics data centers. The system employs commodity PCs for data storage and computation. EvolvingSpace manages data in a decentralized manner, which is convenient for storing data annotations and can eliminate potential data-access bottlenecks. It indexes distributed data at multiple levels to facilitate the construction of complex workflows that consist of applications running on different types of data. In addition, the paper proposes a data-locality- and workflow-aware scheduling algorithm (ES-Scheduling) to balance data distribution with computing performance, and throughput with workflow response time. We ran extensive experiments using the system with real bioinformatics applications. Our results show that the system runs integrated bioinformatics applications efficiently and scales well.
Distributed systems, data sharing, workflow management, data models, scheduling, bioinformatics.
Bing Bing Zhou, Chen Wang, "EvolvingSpace: A Data Centric Framework for Integrating Bioinformatics Applications", IEEE Transactions on Computers, vol. 59, no. 6, pp. 721-734, June 2010, doi:10.1109/TC.2010.39