Prototype-Based Domain Description for One-Class Classification
June 2012 (vol. 34 no. 6)
pp. 1131-1144
F. Angiulli, DEIS, Univ. of Calabria, Rende, Italy
This work introduces the Prototype-based Domain Description (PDD) one-class classifier. PDD is a nearest neighbor-based classifier: it accepts objects on the basis of their nearest neighbor distances to a reference set of objects, called prototypes. For a suitable choice of the prototype set, the PDD classifier is equivalent to another nearest neighbor-based one-class classifier, namely, the NNDD classifier. Moreover, it generalizes statistical tests for outlier detection. The concept of a PDD consistent subset is introduced, which exploits only a selected subset of the training set. It is shown that computing a minimum size PDD consistent subset is, in general, not approximable within any constant factor. A logarithmic approximation factor algorithm, called the CPDD algorithm, for computing a minimum size PDD consistent subset is then introduced. In order to efficiently manage very large data sets, a variant of the basic rule, called Fast CPDD, is also presented. Experimental results show that the CPDD rule substantially improves over the CNNDD classifier (the condensed variant of NNDD) in terms of subset size while guaranteeing comparable classification quality, that it is competitive with other one-class classification methods, and that it is suitable for classifying large data sets.
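To make the ideas concrete, the following is a minimal illustrative sketch (not the paper's actual PDD/CPDD algorithms) of the two ingredients the abstract describes: a nearest neighbor acceptance rule over a prototype set with an assumed distance threshold `theta`, and a set-cover-style greedy selection of a consistent subset, which is the standard route to a logarithmic approximation factor. All function names and parameters here are hypothetical.

```python
import numpy as np

def nn_accept(x, prototypes, theta):
    """Accept x as a target-class object if its nearest prototype
    lies within distance theta (nearest neighbor acceptance rule)."""
    d = np.min(np.linalg.norm(prototypes - x, axis=1))
    return d <= theta

def greedy_consistent_subset(train, theta):
    """Set-cover-style greedy condensation: at each step, pick the
    training object whose theta-ball covers the most still-uncovered
    training objects. The result is a consistent subset: every
    training object is accepted when the subset is used as prototypes."""
    n = len(train)
    # covers[i, j] = True if prototype i would accept training object j
    dists = np.linalg.norm(train[:, None, :] - train[None, :, :], axis=2)
    covers = dists <= theta
    uncovered = np.ones(n, dtype=bool)
    subset = []
    while uncovered.any():
        gains = (covers & uncovered).sum(axis=1)  # new objects each candidate covers
        best = int(np.argmax(gains))
        subset.append(best)
        uncovered &= ~covers[best]
    return train[subset]
```

Because every object covers at least itself, the greedy loop always terminates with a consistent subset; the classical set-cover analysis bounds its size within a logarithmic factor of the minimum, mirroring the approximation guarantee stated for CPDD.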

[1] F. Angiulli, "Prototype-Based Domain Description," Proc. European Conf. Artificial Intelligence, pp. 107-111, 2008.
[2] D.M.J. Tax, "One-Class Classification," PhD dissertation, Delft Univ. of Technology, June 2001.
[3] B. Schölkopf, C. Burges, and V. Vapnik, "Extracting Support Data for a Given Task," Proc. Int'l Conf. Knowledge Discovery & Data Mining, pp. 251-256, 1995.
[4] D. Tax and R. Duin, "Data Domain Description Using Support Vectors," Proc. European Symp. Artificial Neural Networks, pp. 251-256, Apr. 1999.
[5] I. Tsang, J. Kwok, and P.-M. Cheung, "Core Vector Machines: Fast SVM Training on Very Large Data Sets," J. Machine Learning Research, vol. 6, pp. 363-392, 2005.
[6] I. Tsang, A. Kocsor, and J. Kwok, "Simpler Core Vector Machines with Enclosing Balls," Proc. Int'l Conf. Machine Learning, pp. 911-918, 2007.
[7] A. Ypma and R. Duin, "Support Objects for Domain Approximation," Proc. Int'l Conf. Artificial Neural Networks, 1998.
[8] D. Tax and R. Duin, "Data Descriptions in Subspaces," Proc. Int'l Conf. Pattern Recognition, pp. 672-675, 2000.
[9] M. Breunig, H. Kriegel, R. Ng, and J. Sander, "LOF: Identifying Density-Based Local Outliers," Proc. ACM Int'l Conf. Management of Data, 2000.
[10] F. Angiulli, "Condensed Nearest Neighbor Data Domain Description," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 29, no. 10, pp. 1746-1758, Oct. 2007.
[11] E. Knorr and R. Ng, "Algorithms for Mining Distance-Based Outliers in Large Datasets," Proc. Int'l Conf. Very Large Databases, pp. 392-403, 1998.
[12] V. Barnett and T. Lewis, Outliers in Statistical Data. John Wiley & Sons, 1994.
[13] P. Hart, "The Condensed Nearest Neighbor Rule," IEEE Trans. Information Theory, vol. 14, no. 3, pp. 515-516, May 1968.
[14] F. Angiulli, "Fast Nearest Neighbor Condensation for Large Data Sets Classification," IEEE Trans. Knowledge and Data Eng., vol. 19, no. 11, pp. 1450-1464, Nov. 2007.
[15] N. Littlestone and M. Warmuth, "Relating Data Compression and Learnability," technical report, Univ. of California, Santa Cruz, 1986.
[16] S. Floyd and M. Warmuth, "Sample Compression, Learnability, and the Vapnik-Chervonenkis Dimension," Machine Learning, vol. 21, no. 3, pp. 269-304, 1995.
[17] F. Angiulli and F. Fassetti, "Dolphin: An Efficient Algorithm for Mining Distance-Based Outliers in Very Large Datasets," ACM Trans. Knowledge Discovery from Data, vol. 3, no. 1, pp. 4:1-4:57, 2009.
[18] G. Ausiello, P. Crescenzi, G. Gambosi, V. Kann, A. Marchetti-Spaccamela, and M. Protasi, Complexity and Approximation. Springer-Verlag, 1999.
[19] M. Bellare, S. Goldwasser, C. Lund, and A. Russell, "Efficient Probabilistically Checkable Proofs and Applications to Approximations," Proc. 25th Ann. ACM Symp. Theory of Computing, pp. 294-304, 1993.
[20] C. Blake, D.J. Newman, S. Hettich, and C. Merz, "UCI Repository of Machine Learning Databases," 1998.
[21] B. Schölkopf, J. Platt, J. Shawe-Taylor, A. Smola, and R. Williamson, "Estimating the Support of a High-Dimensional Distribution," Neural Computation, vol. 13, no. 7, pp. 1443-1471, 2001.
[22] C.-C. Chang and C.-J. Lin, LIBSVM: A Library for Support Vector Machines, software, 2001.

Index Terms:
pattern classification, learning (artificial intelligence), one-class classification methods, prototype-based domain description rule, nearest neighbor-based classifier, PDD classifier, NNDD classifier, statistical tests, outlier detection, logarithmic approximation factor algorithm, CPDD algorithm, CNNDD classifier, prototypes, classification algorithms, approximation algorithms, approximation methods, training, measurement, data set condensation, one-class classification, novelty detection, nearest neighbor classification
F. Angiulli, "Prototype-Based Domain Description for One-Class Classification," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 6, pp. 1131-1144, June 2012, doi:10.1109/TPAMI.2011.204