
Paolo Frasconi, Marco Gori, and Giovanni Soda, "Data Categorization Using Decision Trellises," IEEE Transactions on Knowledge and Data Engineering, vol. 11, no. 5, pp. 697–712, Sept./Oct. 1999.
Index Terms: Belief networks, classification, connectionist models, context-specific independence, data mining, decision trees, machine learning.
Abstract—We introduce a probabilistic graphical model for supervised learning on databases with categorical attributes. The proposed belief network contains hidden variables that play a role similar to nodes in decision trees, and each of their states corresponds either to a class label or to a single attribute test. A major difference from decision trees is that the selection of the attribute to be tested is probabilistic. Thus, the model can be used to assess the probability that a tuple belongs to some class, given the predictive attributes. Unfolding the network along the hidden-state dimension yields a trellis structure whose signal flow resembles that of second-order connectionist networks. The network encodes context-specific probabilistic independencies to reduce parametric complexity. We present a custom-tailored inference algorithm and derive a learning procedure based on the expectation-maximization (EM) algorithm. We propose decision trellises as an alternative to decision trees for tuple categorization in databases, an important step in building data mining systems. Preliminary experiments on standard machine learning databases are reported, comparing the classification accuracy of decision trellises with that of decision trees induced by C4.5. In particular, we show that the proposed model can offer significant advantages for sparse databases in which many predictive attributes are missing.
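The abstract's central mechanism, a hidden variable that probabilistically selects which attribute to test, with class probabilities obtained by marginalizing over that selection, can be illustrated with a deliberately simplified sketch. This is not the paper's decision-trellis model or its inference algorithm: it is a plain attribute-selection mixture, trained with a standard EM loop, and all names here (`em_attribute_mixture`, `predict_proba`) are hypothetical.

```python
# Toy illustration (NOT the paper's model): a mixture classifier in which
# a hidden variable probabilistically selects one categorical attribute to
# test, trained with expectation-maximization.
import numpy as np

def em_attribute_mixture(X, y, n_classes, n_iters=50, alpha=1e-3):
    """X: (n, d) int array of categorical attribute values; y: (n,) class labels."""
    n, d = X.shape
    n_vals = X.max(axis=0) + 1             # cardinality of each attribute
    pi = np.full(d, 1.0 / d)               # P(hidden state selects attribute j)
    # theta[j][v, c] = P(class c | attribute j takes value v)
    theta = [np.full((n_vals[j], n_classes), 1.0 / n_classes) for j in range(d)]
    for _ in range(n_iters):
        # E-step: responsibility of each attribute test for each example
        lik = np.stack([theta[j][X[:, j], y] for j in range(d)], axis=1)  # (n, d)
        r = pi * lik
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate selection probabilities and class tables
        pi = r.mean(axis=0)
        for j in range(d):
            t = np.full((n_vals[j], n_classes), alpha)  # small smoothing prior
            np.add.at(t, (X[:, j], y), r[:, j])         # weighted co-occurrence counts
            theta[j] = t / t.sum(axis=1, keepdims=True)
    return pi, theta

def predict_proba(x, pi, theta):
    """P(class | x) = sum_j pi_j * P(class | value of attribute j in x)."""
    return sum(pi[j] * theta[j][x[j]] for j in range(len(pi)))
```

Because the attribute choice is soft, the predicted class distribution averages over attribute tests rather than committing to a single split, which mirrors the abstract's point that attribute selection is probabilistic; missing attributes could simply be dropped from the marginalization.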
[1] U.M. Fayyad, G. Piatetsky-Shapiro, and P. Smyth, "From Data Mining to Knowledge Discovery: An Overview," Advances in Knowledge Discovery and Data Mining, U.M. Fayyad et al., eds., AAAI/MIT Press, pp. 1–34, 1996.
[2] M. Holsheimer and A. Siebes, "Data Mining: The Search for Knowledge in Databases," Technical Report CS-R9406, CWI, Amsterdam, 1994.
[3] R.O. Duda and P.E. Hart, Pattern Classification and Scene Analysis. New York: Wiley, 1973.
[4] P. Langley, W. Iba, and K. Thompson, “An Analysis of Bayesian Classifiers,” Proc. 10th Nat'l Conf. Artificial Intelligence, pp. 223–228, AAAI Press and MIT Press, 1992.
[5] J.R. Quinlan, C4.5: Programs for Machine Learning. San Mateo, Calif.: Morgan Kaufmann, 1992.
[6] R.L. Rivest, “Learning Decision Lists,” Machine Learning, vol. 2, pp. 229–246, 1987.
[7] C.M. Bishop, Neural Networks for Pattern Recognition. Clarendon Press, 1995.
[8] J. Pearl, Probabilistic Reasoning in Intelligent Systems. San Mateo, Calif.: Morgan Kaufmann, 1988.
[9] J. Whittaker, Graphical Models in Applied Multivariate Statistics. Chichester: Wiley, 1990.
[10] W.L. Buntine, “A Guide to the Literature on Learning Probabilistic Networks from Data,” IEEE Trans. Knowledge and Data Engineering, 1996.
[11] F. Jensen, S. Lauritzen, and K. Olesen, "Bayesian Updating in Recursive Graphical Models by Local Computations," Computational Statistics Quarterly, vol. 4, pp. 269–282, 1990.
[12] D. Heckerman, D. Geiger, and D.M. Chickering, “Learning Bayesian Networks: The Combination of Knowledge and Statistical Data,” Machine Learning, vol. 20, pp. 197–243, 1995.
[13] D. Heckerman, "A Tutorial on Learning with Bayesian Networks," Technical Report MSR-TR-95-06, Microsoft Research, Redmond, Wash., Mar. 1995.
[14] C. Glymour, “Available Technology for Discovering Causal Models, Building Bayes Nets, and Selecting Predictors: The TETRAD II Program,” Proc. First Int'l Conf. Knowledge Discovery and Data Mining, Montreal, pp. 130–135, 1995.
[15] W.L. Buntine, “Operations for Learning with Graphical Models,” J. Artificial Intelligence Research, vol. 2, pp. 159–225, 1994.
[16] P. Smyth, D. Heckerman, and M. Jordan, "Probabilistic Independence Networks for Hidden Markov Probability Models," AI Memo 1565, MIT, Cambridge, Mass., Feb. 1996.
[17] P. Cheeseman and J. Stutz, "Bayesian Classification (AutoClass): Theory and Results," Advances in Knowledge Discovery and Data Mining, AAAI Press/MIT Press, pp. 61–83, 1996.
[18] N. Friedman and M. Goldszmidt, “Building Classifiers Using Bayesian Networks,” Proc. Int'l Conf. Machine Learning, 1996.
[19] R.M. Neal, "Connectionist Learning of Belief Networks," Artificial Intelligence, vol. 56, pp. 71–113, 1992.
[20] C. Boutilier, N. Friedman, M. Goldszmidt, and D. Koller, "Context-Specific Independence in Bayesian Networks," Proc. 12th Conf. Uncertainty in Artificial Intelligence, E. Horvitz and F. Jensen, eds., pp. 115–123, Portland, Ore., 1996.
[21] M.I. Jordan and R.A. Jacobs, "Hierarchical Mixtures of Experts and the EM Algorithm," Neural Computation, vol. 6, pp. 181–214, 1994.
[22] J. Dougherty, R. Kohavi, and M. Sahami, “Supervised and Unsupervised Discretization of Continuous Attributes,” Proc. 12th Int'l Conf. Machine Learning, A. Prieditis and S. Russell, eds. San Francisco: Morgan Kaufmann, 1995.
[23] R.C. Holte, “Very Simple Classification Rules Perform Well on Most Commonly Used Datasets,” Machine Learning, vol. 11, pp. 63–91, 1993.
[24] J.D. Ullman, Principles of Database and Knowledge-Base Systems, vol. II: The New Technologies. New York: Computer Science Press, 1989.
[25] R. Kohavi and D. Sommerfield, “Feature Subset Selection Using the Wrapper Method: Overfitting and Dynamic Search Space Topology,” Proc. First Int'l Conf. Knowledge Discovery and Data Mining, pp. 192–197, Montreal, 1995.
[26] S.K. Murthy, S. Kasif, and S. Salzberg, “A System for Induction of Oblique Decision Trees,” J. Artificial Intelligence Research, vol. 2, pp. 1–32, 1994.
[27] F.M. Malvestuto, “A Unique Formal System for Binary Decompositions of Database Relations, Probability Distributions, and Graphs,” Information Sciences, vol. 59, pp. 21–52, 1992.
[28] F.M. Malvestuto, “Statistical vs. Relational Join Dependencies,” Proc. Seventh Int'l Working Conf. Scientific and Statistical Database Management, J.C. French and H. Hinterberger, eds., Charlottesville, Va., IEEE/CS Press, 1994.
[29] M. Studeny, “Structural Semigraphoids,” Int'l J. General Systems, vol. 22, no. 2, pp. 207–217, 1994.
[30] R.M. Neal, “Asymmetric Parallel Boltzmann Machines Are Belief Networks,” Neural Computation, vol. 4, no. 6, pp. 832–834, 1992.
[31] L.R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proc. IEEE, vol. 77, no. 2, pp. 257–285, 1989.
[32] Y. Bengio and P. Frasconi, “Input/Output HMMs for Sequence Processing,” IEEE Trans. Neural Networks, vol. 7, no. 5, pp. 1,231–1,249, 1996.
[33] S. Lauritzen, Graphical Models. Oxford: Clarendon Press, 1996.
[34] M.I. Jordan, "Why the Logistic Function? A Tutorial Discussion on Probabilities and Neural Networks," Technical Report 9503, Computational Cognitive Science, Massachusetts Inst. of Technology, 1995, URL: ftp://psyche.mit.edu/pub/jordanuai.ps.
[35] L. Breiman, J.H. Friedman, R.A. Olshen, and C.J. Stone, Classification and Regression Trees. Belmont, Calif.: Wadsworth, 1984.
[36] G.F. Cooper, "The Computational Complexity of Probabilistic Inference Using Bayesian Belief Networks," Artificial Intelligence, vol. 42, pp. 393–405, 1990.
[37] P. Smyth, D. Heckerman, and M. Jordan, “Probabilistic Independence Networks for Hidden Markov Probability Models,” Technical Report TR9603, Microsoft Research, 1996.
[38] A.P. Dempster, N.M. Laird, and D.B. Rubin, "Maximum Likelihood from Incomplete Data via the EM Algorithm," J. Royal Statistical Soc. B, vol. 39, pp. 1–38, 1977.
[39] G.J. McLachlan and K.E. Basford, Mixture Models: Inference and Applications to Clustering. Marcel Dekker, 1988.
[40] M.H. DeGroot, Optimal Statistical Decisions. New York: McGraw-Hill, 1970.
[41] Z. Ghahramani and M.I. Jordan, "Supervised Learning from Incomplete Data via an EM Approach," Advances in Neural Information Processing Systems, vol. 6, J.D. Cowan, G. Tesauro, and J. Alspector, eds., Morgan Kaufmann, 1994.
[42] Y.C. Lee, G. Doolen, H.H. Chen, G.Z. Sun, T. Maxwell, H.Y. Lee, and C.L. Giles, "Machine Learning Using a Higher Order Correlational Network," Physica D, vol. 22D, nos. 1–3, p. 276, 1986.
[43] C.J. Merz, P.M. Murphy, and D.W. Aha, "UCI Repository of Machine Learning Databases," Dept. of Information and Computer Science, Univ. of California, Irvine, Calif., 1996, URL: http://www.ics.uci.edu/~mlearn/MLRepository.html.
[44] U.M. Fayyad and K.B. Irani, "Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning," Proc. 13th Int'l Joint Conf. Artificial Intelligence, pp. 1,022–1,027, Morgan Kaufmann, 1993.
[45] R. Kohavi, G. John, R. Long, D. Manley, and K. Pfleger, "MLC++: A Machine Learning Library in C++," Tools with Artificial Intelligence, pp. 740–743, IEEE CS Press, 1994.
[46] J. Sjöberg and L. Ljung, "Overtraining, Regularization, and Searching for Minimum in Neural Networks," technical report, Linköping Univ., Sweden, 1992.
[47] J.W. Shavlik, "A Framework for Combining Symbolic and Neural Learning," Machine Learning, vol. 14, no. 3, pp. 321–331, 1994.