SQL in the Clouds
July/August 2009 (vol. 11 no. 4)
pp. 12-28
James L. Johnson, Western Washington University
In a cloud computing context, the MapReduce algorithm comprises two massively parallel operations linked by a generic sorting and data-distribution process. Although this algorithm is the workhorse of most cloud computing strategies, it is a special case of a more general dataflow. The proposed method replaces the two cloud operations with longer sequences of operations and lets the user direct outputs to any subsequent downstream operation. It retains, however, the job-supervisor infrastructure, which sorts, collates, and distributes these outputs before initiating each operation. Evaluating SQL database queries, particularly those with correlated subqueries, requires a computation that identifies and aligns data elements from widely separated storage locations. This suggests cloud algorithms that exploit the supervisory sorting process to achieve the desired alignments. Exploring such algorithms reveals that a few customizable templates, assembled recursively as necessary, can handle a wide class of SQL data-mining queries.
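
The dataflow in the abstract can be pictured with a small, single-machine Python sketch. It rests on several assumptions. One in-memory list sort stands in for the job supervisor's sorting, collating, and distribution. The names `run_stage`, `run_pipeline`, `above_dept_avg`, `EMP`, and `STAGES`, and the `emp(name, dept, salary)` table, are hypothetical illustrations, not code from the article. The example evaluates one correlated subquery (employees paid more than their own department's average). It keys every row on the correlation column so the supervisory sort aligns each outer row with the rows its subquery ranges over, then chains a second stage to produce `ORDER BY name`.

```python
from itertools import groupby
from operator import itemgetter
from typing import Any, Callable, Iterable, List, Sequence, Tuple

# Hypothetical signatures: a map function emits (key, value) pairs for one
# record; a reduce function receives one key with all of its aligned values.
MapFn = Callable[[Any], Iterable[Tuple[Any, Any]]]
ReduceFn = Callable[[Any, List[Any]], Iterable[Any]]


def run_stage(records: Iterable[Any], map_fn: MapFn, reduce_fn: ReduceFn) -> List[Any]:
    """One parallel operation followed by a supervisor-style sort and regroup."""
    pairs = [kv for rec in records for kv in map_fn(rec)]
    # This in-memory sort stands in for the job supervisor's sorting, collating,
    # and distributing: values that share a key become adjacent (aligned).
    pairs.sort(key=itemgetter(0))
    out: List[Any] = []
    for key, group in groupby(pairs, key=itemgetter(0)):
        out.extend(reduce_fn(key, [value for _, value in group]))
    return out


def run_pipeline(records: Iterable[Any], stages: Sequence[Tuple[MapFn, ReduceFn]]) -> List[Any]:
    """Chain any number of stages; each stage consumes the previous one's output."""
    current = list(records)
    for map_fn, reduce_fn in stages:
        current = run_stage(current, map_fn, reduce_fn)
    return current


# Hypothetical table emp(name, dept, salary) and the correlated query:
#   SELECT name, dept FROM emp e
#   WHERE salary > (SELECT AVG(salary) FROM emp WHERE dept = e.dept)
#   ORDER BY name
EMP = [
    ("zed", "eng", 150), ("bob", "eng", 100), ("erin", "eng", 80),
    ("carol", "ops", 90), ("dave", "ops", 70),
]


def above_dept_avg(dept, rows):
    """All rows for one dept arrive together, so the subquery is purely local."""
    avg = sum(salary for _, _, salary in rows) / len(rows)
    return [row for row in rows if row[2] > avg]


STAGES = [
    # Stage 1: key on the correlation column (dept) to align each outer row
    # with the rows its subquery ranges over.
    (lambda row: [(row[1], row)], above_dept_avg),
    # Stage 2: re-key on name so the supervisor's sort yields ORDER BY name,
    # projecting each surviving row to (name, dept).
    (lambda row: [(row[0], (row[0], row[1]))], lambda name, vals: vals),
]

print(run_pipeline(EMP, STAGES))  # [('carol', 'ops'), ('zed', 'eng')]
```

Because stage 1 keys every row on the correlation column, no row has to fetch a distant partner: the sort has already placed the subquery's inputs next to it. A real cloud deployment would partition these key groups across workers instead of sorting a single list.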

Index Terms:
cloud computing, distributed computing, dataflow architectures, database structures, database query evaluation, SQL cloud algorithms
James L. Johnson, "SQL in the Clouds," Computing in Science and Engineering, vol. 11, no. 4, pp. 12-28, July-Aug. 2009, doi:10.1109/MCSE.2009.127