An Integrated Framework for Visualized and Exploratory Pattern Discovery in Mixed Data
February 2006 (vol. 18 no. 2)
pp. 161-173
Data mining uncovers hidden, previously unknown, and potentially useful information from large amounts of data. Compared with traditional statistical and machine learning techniques for data analysis, data mining emphasizes providing a convenient and complete environment for the analysis. In this paper, we propose an integrated framework for visualized, exploratory data clustering and pattern extraction from mixed data. We further discuss its implementation techniques: a generalized self-organizing map (GSOM) and an extended attribute-oriented induction (EAOI), which not only overcome the drawbacks of the original algorithms but also provide additional analysis capabilities. Specifically, the GSOM facilitates the direct handling of mixed data, including categorical and numeric values. The EAOI enables exploration of major values hidden in the data and, in addition, offers an alternative for processing numeric attributes instead of generalizing them. A prototype was developed for experiments with synthetic and real data sets, and its results were compared with those of the traditional approaches. The results confirmed the feasibility of the framework and the superiority of the extended techniques.
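
The abstract does not say how the GSOM measures the distance between a mixed record and a map unit. As a rough illustration of what "direct handling of mixed data" involves, the following is a minimal sketch of one generic approach: squared differences on numeric attributes combined with simple-matching mismatches on categorical ones. This is an assumption chosen for illustration, not the authors' method, and the function names, attribute names, and values are all hypothetical.

```python
import math

def mixed_distance(a, b, numeric_keys, categorical_keys):
    """Distance between two records with numeric and categorical attributes.

    Assumption for illustration only: numeric attributes contribute squared
    differences, categorical attributes contribute 1 per mismatch.
    """
    numeric_part = sum((a[k] - b[k]) ** 2 for k in numeric_keys)
    categorical_part = sum(1 for k in categorical_keys if a[k] != b[k])
    return math.sqrt(numeric_part + categorical_part)

def best_matching_unit(sample, units, numeric_keys, categorical_keys):
    """Return the index of the map unit closest to the sample."""
    return min(
        range(len(units)),
        key=lambda i: mixed_distance(sample, units[i], numeric_keys, categorical_keys),
    )

# Hypothetical map units and sample; numeric values are assumed pre-scaled to [0, 1].
units = [
    {"age": 0.2, "drink": "coffee"},
    {"age": 0.8, "drink": "tea"},
]
sample = {"age": 0.3, "drink": "tea"}
bmu = best_matching_unit(sample, units, ["age"], ["drink"])
```

A mismatch count treats every pair of distinct categorical values as equally far apart, which is the kind of limitation a mixed-data extension of the self-organizing map would need to address.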

Index Terms: Attribute-oriented induction, clustering, data mining, pattern discovery, self-organizing map.
Chung-Chian Hsu, Sheng-Hsuan Wang, "An Integrated Framework for Visualized and Exploratory Pattern Discovery in Mixed Data," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 2, pp. 161-173, Feb. 2006, doi:10.1109/TKDE.2006.23