Large-Scale Parallel Data Clustering
August 1998 (vol. 20 no. 8)
pp. 871-876

Abstract: Algorithmic enhancements are described that enable a large reduction in computation for mean square-error data clustering. These improvements are incorporated into a parallel data-clustering tool, P-CLUSTER, designed to execute on a network of workstations. Experiments involving the unsupervised segmentation of standard texture images were performed. For some data sets, a 96 percent reduction in computation was achieved.
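The abstract does not say which enhancements P-CLUSTER uses, so the following is only a minimal Python sketch under stated assumptions. It shows a standard mean square-error (k-means) iteration with a Forgy-style random start, combined with one generic pruning idea, a partial-distance early exit (an assumption here, not necessarily the paper's technique), and a data-parallel split in which each workstation returns only per-cluster sums and counts. The names `pruned_sq_dist`, `local_step`, and `parallel_kmeans` are hypothetical, and the "workstations" are simulated sequentially.

```python
import random
from typing import List, Sequence, Tuple

Vec = List[float]


def pruned_sq_dist(x: Sequence[float], c: Sequence[float],
                   bound: float) -> Tuple[float, int]:
    """Squared Euclidean distance with a partial-distance early exit.

    Stops summing once the running total reaches `bound` (the best distance
    found so far), because this center can no longer be the nearest one.
    Returns the (possibly partial) total and the number of coordinates used.
    """
    total = 0.0
    for used, (xi, ci) in enumerate(zip(x, c), start=1):
        d = xi - ci
        total += d * d
        if total >= bound:
            return total, used
    return total, len(x)


def local_step(chunk: List[Vec], centers: List[Vec]):
    """Work done by one (simulated, hypothetical) workstation on its share of
    the data: nearest-center assignment plus per-cluster sums and counts."""
    k, dim = len(centers), len(centers[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    labels: List[int] = []
    sse, ops = 0.0, 0
    for x in chunk:
        best, best_d = 0, float("inf")
        for j, c in enumerate(centers):
            d, used = pruned_sq_dist(x, c, best_d)
            ops += used
            if d < best_d:  # only a full (non-pruned) distance can win
                best, best_d = j, d
        labels.append(best)
        sse += best_d
        counts[best] += 1
        for i in range(dim):
            sums[best][i] += x[i]
    return sums, counts, labels, sse, ops


def parallel_kmeans(points: List[Vec], k: int, n_workers: int = 4,
                    max_iter: int = 100, seed: int = 0):
    """Square-error (k-means) clustering with the data split into contiguous
    chunks, one per worker. Workers run one after another here; in a real
    network of workstations each local_step call would run on its own machine
    and only the small (sums, counts) summaries would be sent back."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # Forgy-style start
    size = -(-len(points) // n_workers)  # ceiling division
    chunks = [points[i:i + size] for i in range(0, len(points), size)]
    dim = len(centers[0])
    labels: List[int] = []
    sse, total_ops = 0.0, 0
    for _ in range(max_iter):
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0] * k
        new_labels: List[int] = []
        sse = 0.0
        for s, c, lab, e, ops in (local_step(ch, centers) for ch in chunks):
            total_ops += ops
            new_labels.extend(lab)
            sse += e
            for j in range(k):
                counts[j] += c[j]
                for i in range(dim):
                    sums[j][i] += s[j][i]
        # Merge step: new center = global mean; an empty cluster keeps its center.
        centers = [[v / counts[j] for v in sums[j]] if counts[j] else centers[j]
                   for j in range(k)]
        converged = new_labels == labels
        labels = new_labels
        if converged:
            break
    return centers, labels, sse / len(points), total_ops
```

Comparing the returned `total_ops` with the n·k·d coordinate operations that an unpruned assignment step needs per iteration gives a rough sense of what the early exit saves; the saving depends on the data and on how early the nearest center happens to be tried.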

Index Terms:
Data clustering, mean square error, data mining, image segmentation, parallel algorithm, network of workstations.
Citation:
Dan Judd, Philip K. McKinley, Anil K. Jain, "Large-Scale Parallel Data Clustering," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 8, pp. 871-876, Aug. 1998, doi:10.1109/34.709614