Data-Intensive Scalable Computing for Scientific Applications
Nov.-Dec. 2011 (vol. 13 no. 6)
pp. 25-33
Randal E. Bryant, Carnegie Mellon University

Increasingly, scientific computing applications must accumulate and manage massive datasets, as well as perform sophisticated computations over these data. Such applications call for data-intensive scalable computer (DISC) systems, which differ in fundamental ways from existing high-performance computing systems.

Index Terms:
Data-intensive computing, e-Science, MapReduce
Citation:
Randal E. Bryant, "Data-Intensive Scalable Computing for Scientific Applications," Computing in Science and Engineering, vol. 13, no. 6, pp. 25-33, Nov.-Dec. 2011, doi:10.1109/MCSE.2011.73