
ASCII Text  
Clara Pizzuti, Domenico Talia, "PAutoClass: Scalable Parallel Clustering for Mining Large Data Sets," IEEE Transactions on Knowledge and Data Engineering, vol. 15, no. 3, pp. 629-641, May/June 2003.  
BibTex  
@article{ 10.1109/TKDE.2003.1198395,
  author = {Clara Pizzuti and Domenico Talia},
  title = {PAutoClass: Scalable Parallel Clustering for Mining Large Data Sets},
  journal = {IEEE Transactions on Knowledge and Data Engineering},
  volume = {15},
  number = {3},
  issn = {1041-4347},
  year = {2003},
  pages = {629-641},
  doi = {http://doi.ieeecomputersociety.org/10.1109/TKDE.2003.1198395},
  publisher = {IEEE Computer Society},
  address = {Los Alamitos, CA, USA}
}
RefWorks Procite/RefMan/Endnote  
TY  - JOUR
JO  - IEEE Transactions on Knowledge and Data Engineering
TI  - PAutoClass: Scalable Parallel Clustering for Mining Large Data Sets
IS  - 3
SN  - 1041-4347
SP  - 629
EP  - 641
EPD  - 629-641
A1  - Clara Pizzuti
A1  - Domenico Talia
PY  - 2003
KW  - Data mining
KW  - parallel processing
KW  - knowledge discovery
KW  - data clustering
KW  - unsupervised classification
KW  - isoefficiency
KW  - scalability
VL  - 15
JA  - IEEE Transactions on Knowledge and Data Engineering
ER  -
Abstract—Data clustering is an important task in the area of data mining. Clustering is the unsupervised classification of data items into homogeneous groups called