
Tom Chau and Andrew K.C. Wong, "Pattern Discovery by Residual Analysis and Recursive Partitioning," IEEE Transactions on Knowledge and Data Engineering, vol. 11, no. 6, pp. 833-852, Nov./Dec. 1999, doi: 10.1109/69.824592.
Abstract—In this paper, a novel method of pattern discovery is proposed. It is based on the theoretical formulation of a contingency table of events. Using residual analysis and recursive partitioning, statistically significant events are identified in a data set. These events constitute the important information contained in the data set and are easily interpretable as simple rules, contour plots, or parallel axes plots. In addition, an informative probabilistic description of the data is automatically furnished by the discovery process. Following a theoretical formulation, experiments with real and simulated data will demonstrate the ability to discover subtle patterns amid noise, the invariance to changes of scale, cluster detection, and discovery of multidimensional patterns. It is shown that the pattern discovery method offers the advantages of easy interpretation, rapid training, and tolerance to noncentralized noise.
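The residual-analysis step described in the abstract can be sketched as follows: for each cell of a contingency table of events, compare the observed count with the count expected under independence, and flag cells whose adjusted (standardized) residual exceeds a normal-theory threshold. This is a minimal illustration under stated assumptions, not the authors' implementation; the toy table and the 1.96 significance threshold are choices made for the example.

```python
import math

# Toy contingency table of event counts: rows are values of one
# attribute, columns are values of another (illustrative data only).
table = [
    [20, 10, 10],
    [10, 20, 10],
    [10, 10, 20],
]

n_rows, n_cols = len(table), len(table[0])
row_totals = [sum(row) for row in table]
col_totals = [sum(table[i][j] for i in range(n_rows)) for j in range(n_cols)]
total = sum(row_totals)

significant = []
for i in range(n_rows):
    for j in range(n_cols):
        # Expected count under the independence hypothesis.
        expected = row_totals[i] * col_totals[j] / total
        # Adjusted residual: the raw residual scaled by its estimated
        # standard deviation, accounting for the row/column marginals.
        variance = expected * (1 - row_totals[i] / total) * (1 - col_totals[j] / total)
        residual = (table[i][j] - expected) / math.sqrt(variance)
        # |residual| > 1.96 is significant at roughly the 5% level
        # under the asymptotic normal approximation.
        if abs(residual) > 1.96:
            significant.append((i, j, round(residual, 2)))

print(significant)  # only the diagonal cells stand out in this toy table
```

In this example only the diagonal cells (where the two attributes agree) are flagged as significant events; the off-diagonal deficits fall short of the threshold. The paper's method goes further by recursively partitioning continuous attributes before forming such tables.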