ISP: An Optimal Out-of-Core Image-Set Processing Streaming Architecture for Parallel Heterogeneous Systems
June 2012 (vol. 18 no. 6)
pp. 838-851
C. T. Silva, Six Metrotech Center, Polytech. Inst. of NYU, Brooklyn, NY, USA
J. L. D. Comba, Inst. de Inf., Fed. Univ. of Rio Grande do Sul, Porto Alegre, Brazil
J. Kruger, Univ. of Saarland, Saarbrucken, Germany
L. K. Ha, Sci. Imaging & Comput. Inst., Univ. of Utah, Salt Lake City, UT, USA
S. Joshi, Sci. Imaging & Comput. Inst., Univ. of Utah, Salt Lake City, UT, USA
Image population analysis is the class of statistical methods that plays a central role in understanding the development, evolution, and disease of a population. However, these techniques often demand excessive computational power and memory, and these demands are compounded by a large number of volumetric inputs. Restricted access to supercomputing power limits their influence in general research and practical applications. In this paper we introduce ISP, an Image-Set Processing streaming framework that harnesses the processing power of commodity heterogeneous CPU/GPU systems and attempts to solve this computational problem. In ISP, we introduce specially designed streaming algorithms and data structures that provide an optimal solution for out-of-core multi-image processing problems in terms of both memory usage and computational efficiency. ISP uses the asynchronous execution mechanism supported by parallel heterogeneous systems to efficiently hide the inherent latency of the processing pipeline of out-of-core approaches. Consequently, for computationally intensive problems, the ISP out-of-core solution can achieve the same performance as the in-core solution. We demonstrate the efficiency of the ISP framework on synthetic and real datasets.
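
The latency-hiding idea in the abstract can be illustrated with a small sketch. This is not the ISP implementation, which targets CPU/GPU systems; it is a minimal CPU-only analogue, assuming a background loader thread that prefetches images into a bounded queue while the main thread accumulates a voxel-wise mean. The names `load_image` and `streaming_mean`, and the synthetic image contents, are hypothetical and exist only for illustration.

```python
import queue
import threading


def load_image(index, size=4):
    """Hypothetical stand-in for a slow disk read; returns a flat list of voxels."""
    return [float(index)] * size


def streaming_mean(num_images, size=4, buffers=2):
    """Voxel-wise mean of an image set, streamed one image at a time."""
    # The bounded queue limits how far the loader can run ahead,
    # so only a few images are resident in memory at once.
    pending = queue.Queue(maxsize=buffers)

    def producer():
        for i in range(num_images):
            pending.put(load_image(i, size))  # blocks while the queue is full
        pending.put(None)  # sentinel: no more images

    loader = threading.Thread(target=producer, daemon=True)
    loader.start()

    total = [0.0] * size
    count = 0
    while True:
        image = pending.get()  # the next image is loaded while this one is processed
        if image is None:
            break
        for j, value in enumerate(image):
            total[j] += value
        count += 1

    loader.join()
    return [t / count for t in total] if count else total


if __name__ == "__main__":
    print(streaming_mean(5, size=3))
```

When the per-image computation takes at least as long as the load, the loader stays ahead of the consumer and the read cost is largely overlapped with computation, which is the condition under which the abstract says the out-of-core solution can match the in-core one.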

Index Terms:
statistical analysis, graphics processing units, image processing, parallel processing, pipeline processing, out-of-core approach pipeline processing, ISP framework, optimal out-of-core image-set processing streaming architecture, parallel heterogeneous systems, image population analysis, statistical methods, supercomputing power, CPU-GPU systems, streaming algorithms, data structures, memory usage, computational efficiency, asynchronous execution mechanism, Streaming media, Graphics processing unit, MIMO, Hardware, Computational modeling, Data models, Parallel processing, multi-image processing framework, GPUs, out-of-core processing, atlas construction, diffeomorphism
C. T. Silva, J. L. D. Comba, J. Kruger, L. K. Ha, S. Joshi, "ISP: An Optimal Out-of-Core Image-Set Processing Streaming Architecture for Parallel Heterogeneous Systems," IEEE Transactions on Visualization and Computer Graphics, vol. 18, no. 6, pp. 838-851, June 2012, doi:10.1109/TVCG.2012.32