A Mixture Model and EM-Based Algorithm for Class Discovery, Robust Classification, and Outlier Rejection in Mixed Labeled/Unlabeled Data Sets
November 2003 (vol. 25 no. 11)
pp. 1468-1483

Abstract—Several authors have shown that, when labeled data are scarce, improved classifiers can be built by augmenting the training set with a large set of unlabeled examples and then performing suitable learning. These works assume each unlabeled sample originates from one of the (known) classes. Here, we assume each unlabeled sample comes either from a known class or from a heretofore undiscovered class. We propose a novel mixture model which treats as observed data not only the feature vector and the class label, but also the fact of label presence/absence for each sample. Two types of mixture components are posited. "Predefined" components generate data from known classes and assume class labels are missing at random. "Nonpredefined" components only generate unlabeled data, i.e., they capture exclusively unlabeled subsets, consistent with an outlier distribution or with new classes. The predefined/nonpredefined nature of each component is data-driven, learned along with the other parameters via an extension of the EM algorithm. Our modeling framework addresses three problems involving both the known and unknown classes: 1) robust classifier design, 2) classification with rejection, and 3) identification of the unlabeled samples (and their components) that originate from unknown classes. Case 3 is a step toward new class discovery. Experiments are reported for each application, including topic discovery for the Reuters domain. The experiments also demonstrate the value of label presence/absence data in learning accurate mixtures.
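The model class the abstract describes can be illustrated with a small numerical sketch. The code below is a hypothetical simplification, not the authors' estimator: it fits a diagonal-covariance Gaussian mixture by EM, treating label presence/absence as observed data through a per-component labeling rate beta_j, whereas the paper learns a discrete predefined/nonpredefined switch for each component. Components whose estimated beta_j collapses toward zero play the role of nonpredefined components. All function names, variable names, and thresholds here are illustrative assumptions.

```python
import numpy as np

def log_gauss(X, mu, var):
    # Row-wise log density of a diagonal-covariance Gaussian.
    return -0.5 * np.sum((X - mu) ** 2 / var + np.log(2 * np.pi * var), axis=1)

def em_mixed(X, y, K, C, n_iter=200, seed=0):
    # X: (n, d) features; y: (n,) labels in 0..C-1, or -1 where no label was given.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    labeled = y >= 0
    alpha = np.full(K, 1.0 / K)                   # mixing weights
    mu = X[rng.choice(n, K, replace=False)]       # means, seeded at data points
    var = np.tile(X.var(axis=0) + 1e-6, (K, 1))   # diagonal covariances
    beta = np.full(K, 0.5)                        # P(label observed | component j)
    pi = np.full((K, C), 1.0 / C)                 # P(class c | component j)
    for _ in range(n_iter):
        # E-step: log joint of (x_i, label presence, label value) per component.
        ll = np.log(alpha + 1e-12) + np.stack(
            [log_gauss(X, mu[j], var[j]) for j in range(K)], axis=1)
        ll[labeled] += np.log(beta + 1e-12) + np.log(pi[:, y[labeled]].T + 1e-12)
        ll[~labeled] += np.log(1.0 - beta + 1e-12)
        r = np.exp(ll - ll.max(axis=1, keepdims=True))
        r /= r.sum(axis=1, keepdims=True)         # responsibilities, shape (n, K)
        # M-step: closed-form updates from expected sufficient statistics.
        Nj = r.sum(axis=0) + 1e-12
        alpha = Nj / n
        mu = (r.T @ X) / Nj[:, None]
        var = np.stack([(r[:, j, None] * (X - mu[j]) ** 2).sum(axis=0) / Nj[j]
                        for j in range(K)]) + 1e-6
        beta = r[labeled].sum(axis=0) / Nj        # learned per-component labeling rate
        pi = np.stack([r[labeled & (y == c)].sum(axis=0) for c in range(C)], axis=1)
        pi = (pi + 1e-12) / (pi + 1e-12).sum(axis=1, keepdims=True)
    return alpha, mu, var, beta, pi, r

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Two known classes (with most labels hidden) plus one purely unlabeled cluster.
    X = np.vstack([rng.normal(0, 1, (100, 2)),
                   rng.normal(4, 1, (100, 2)),
                   rng.normal((-4, 4), 1, (60, 2))])
    y = np.concatenate([np.zeros(100, int), np.ones(100, int), np.full(60, -1)])
    y[:200][rng.random(200) < 0.7] = -1           # known-class labels missing at random
    alpha, mu, var, beta, pi, r = em_mixed(X, y, K=3, C=2)
    print("labeling rate per component:", beta.round(3))
```

In this sketch, an unlabeled sample whose responsibilities concentrate on a component with beta_j near zero would be withheld from the known-class classifier and flagged as a candidate member of a new class, mirroring applications 2) and 3) of the abstract.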

Index Terms:
Class discovery, labeled and unlabeled data, outlier detection, sample rejection, mixture models, EM algorithm, text categorization.
Citation:
David J. Miller, John Browning, "A Mixture Model and EM-Based Algorithm for Class Discovery, Robust Classification, and Outlier Rejection in Mixed Labeled/Unlabeled Data Sets," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 11, pp. 1468-1483, Nov. 2003, doi:10.1109/TPAMI.2003.1240120