Issue No. 06, Nov.-Dec. 2011 (vol. 13)
pp. 25-33
Randal E. Bryant, Carnegie Mellon University
ABSTRACT
Increasingly, scientific computing applications must accumulate and manage massive datasets, as well as perform sophisticated computations over these data. Such applications call for data-intensive scalable computer (DISC) systems, which differ in fundamental ways from existing high-performance computing systems.
INDEX TERMS
Data-intensive computing, e-Science, MapReduce
CITATION
Randal E. Bryant, "Data-Intensive Scalable Computing for Scientific Applications," Computing in Science & Engineering, vol. 13, no. 6, pp. 25-33, Nov.-Dec. 2011, doi:10.1109/MCSE.2011.73