
Kenneth E. Hild, Deniz Erdogmus, Kari Torkkola, and Jose C. Principe, "Feature Extraction Using Information-Theoretic Learning," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 9, pp. 1385-1392, Sept. 2006.
[1] B.D. Ripley, Pattern Recognition and Neural Networks. Cambridge Univ. Press, 1995.
[2] T.M. Cover and J.A. Thomas, Elements of Information Theory. John Wiley & Sons, 1991.
[3] J.C. Principe, D. Xu, Q. Zhao, and J.W. Fisher III, “Learning from Examples with Information Theoretic Criteria,” J. VLSI Signal Processing Systems, vol. 26, nos. 1/2, pp. 61-77, Aug. 2000.
[4] D. Erdogmus and J.C. Principe, “Lower and Upper Bounds for Misclassification Probability Based on Renyi's Information,” J. VLSI Signal Processing, vol. 37, nos. 2-3, pp. 305-317, June 2004.
[5] M.E. Hellman and J. Raviv, “Probability of Error, Equivocation, and the Chernoff Bound,” IEEE Trans. Information Theory, vol. 16, no. 4, pp. 368-372, July 1970.
[6] R. Battiti, “Using Mutual Information for Selecting Features in Supervised Neural Net Learning,” IEEE Trans. Neural Networks, vol. 5, no. 4, pp. 537-550, July 1994.
[7] H.H. Yang and J. Moody, “Feature Selection Based on Joint Mutual Information,” Proc. Conf. Advances in Intelligent Data Analysis, Computational Intelligence Methods, and Applications, June 1999.
[8] K.D. Bollacker and J. Ghosh, “Mutual Information Feature Extractors for Neural Classifiers,” Proc. Int'l Conf. Neural Networks (ICNN '96), pp. 1528-1533, June 1996.
[9] N. Kwak and C.H. Choi, “Improved Mutual Information Feature Selector for Neural Networks in Supervised Learning,” Proc. Int'l Joint Conf. Neural Networks, vol. 2, pp. 1313-1318, July 1999.
[10] R. Rajagopal, K.A. Kumar, and P.R. Rao, “An Integrated Approach to Passive Target Classification,” Proc. Int'l Conf. Acoustics, Speech, and Signal Processing, vol. 2, pp. 313-316, Apr. 1994.
[11] K.E. Hild II, D. Erdogmus, and J.C. Principe, “An Analysis of Entropy Estimators for Blind Source Separation,” Signal Processing, vol. 86, no. 1, pp. 182-194, Jan. 2006.
[12] A. Renyi, Probability Theory. Amsterdam: North-Holland Publishing Company, 1970.
[13] K.E. Hild II, D. Erdogmus, and J.C. Principe, “On-Line Minimum Mutual Information Method for Time-Varying Blind Source Separation,” Proc. Int'l Workshop Independent Component Analysis and Signal Separation, pp. 126-131, Dec. 2001.
[14] D. Erdogmus, K.E. Hild II, and J.C. Principe, “On-Line Entropy Manipulation: Stochastic Information Gradient,” IEEE Signal Processing Letters, vol. 10, no. 8, pp. 242-245, Aug. 2003.
[15] J. Beirlant, E.J. Dudewicz, L. Györfi, and E. van der Meulen, “Nonparametric Entropy Estimation: An Overview,” Int'l J. Math. and Statistical Sciences, vol. 6, no. 1, pp. 17-39, 1997.
[16] E. Parzen, “On Estimation of a Probability Density Function and Mode,” Annals of Math. Statistics, vol. 33, no. 3, pp. 1065-1076, Sept. 1962.
[17] G.H. Golub and C.F. Van Loan, Matrix Computations, third ed. Baltimore: Johns Hopkins Univ. Press, 1996.
[18] S. Theodoridis and K. Koutroumbas, Pattern Recognition. San Diego, Calif.: Academic Press, 1999.
[19] K.E. Hild II, D. Erdogmus, and J.C. Principe, “Blind Source Separation Using Renyi's Mutual Information,” IEEE Signal Processing Letters, vol. 8, no. 6, pp. 174-176, June 2001.
[20] R.A. Morejon, “An InformationTheoretic Approach to Sonar Automatic Target Recognition,” PhD dissertation, Univ. of Florida, 2003.
[21] C. Bishop, Neural Networks for Pattern Recognition. Oxford, U.K.: Oxford Univ. Press, 1995.
[22] S.C. Fralick and R.W. Scott, “Nonparametric Bayes-Risk Estimation,” IEEE Trans. Information Theory, vol. 17, no. 4, pp. 440-444, July 1971.
[23] K. Torkkola, “On Feature Extraction by Mutual Information Maximization,” Proc. Int'l Conf. Acoustics, Speech, and Signal Processing, pp. 821-825, May 2002.
[24] K. Torkkola, “Learning Discriminative Feature Transforms to Low Dimensions in Low Dimensions,” Proc. Conf. Advances in Neural Information Processing Systems, Dec. 2001.
[25] K. Torkkola and W.M. Campbell, “Mutual Information in Learning Feature Transformations,” Proc. Int'l Conf. Machine Learning, pp. 1015-1022, June 2000.
[26] K. Torkkola, “Visualizing Class Structure in Data Using Mutual Information,” Proc. Conf. Neural Networks for Signal Processing (NNSP '00), pp. 376-385, Dec. 2000.
[27] D. Xu and J.C. Principe, “Feature Evaluation Using Quadratic Mutual Information,” Proc. Int'l Joint Conf. Neural Networks, vol. 1, pp. 459-463, July 2001.
[28] A. Biem, S. Katagiri, and B.H. Juang, “Pattern Recognition Using Discriminative Feature Extraction,” IEEE Trans. Signal Processing, vol. 45, no. 2, pp. 500-504, Feb. 1997.
[29] H. Watanabe, T. Yamaguchi, and S. Katagiri, “Discriminative Metric Design for Robust Pattern Recognition,” IEEE Trans. Signal Processing, vol. 45, no. 11, pp. 2655-2662, Nov. 1997.
[30] S. Katagiri, B.H. Juang, and C.H. Lee, “Pattern Recognition Using a Family of Design Algorithms Based upon the Generalized Probabilistic Descent Method,” Proc. IEEE, vol. 86, no. 11, pp. 2345-2373, Nov. 1998.
[31] B.H. Juang and S. Katagiri, “Discriminative Learning for Minimum Error Classification,” IEEE Trans. Signal Processing, vol. 40, no. 12, pp. 3043-3054, Dec. 1992.
[32] A. Biem, S. Katagiri, and B.H. Juang, “Discriminative Feature Extraction for Speech Recognition,” Proc. Conf. Neural Networks for Signal Processing (NNSP '93), pp. 392-401, Sept. 1993.
[33] Q. Li and B.H. Juang, “A New Algorithm for Fast Discriminative Training,” Proc. Int'l Conf. Acoustics, Speech, and Signal Processing (ICASSP '02), vol. 1, pp. 97-100, May 2002.
[34] V. Nedeljkovic, “A Novel Multilayer Neural Networks Training Algorithm that Minimizes the Probability of Classification Error,” IEEE Trans. Neural Networks, vol. 4, no. 4, pp. 650-659, July 1993.
[35] K. Fukunaga, Introduction to Statistical Pattern Recognition, second ed. Boston: Academic Press, 1990.
[36] D. Erdogmus, K.E. HildII, and J.C. Principe, “Kernel Size Selection in Parzen Density Estimation,” J. VLSI Signal Processing Systems, submitted.
[37] D. Erdogmus and J.C. Principe, “Generalized Information Potential Criterion for Adaptive System Training,” IEEE Trans. Neural Networks, Sept. 2002.
[38] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-Based Learning Applied to Document Recognition,” Proc. IEEE, vol. 86, no. 11, pp. 2278-2324, Nov. 1998.