Unsupervised Feature Selection Applied to Content-Based Retrieval of Lung Images
March 2003 (vol. 25 no. 3)
pp. 373-378

Abstract: This paper describes a new hierarchical approach to content-based image retrieval (CBIR) called the “customized-queries” approach (CQA). In contrast to the single feature vector approach, which tries to classify the query and retrieve similar images in one step, CQA uses multiple feature sets and a two-step approach to retrieval. The first step classifies the query according to the class labels of the images, using the features that best discriminate the classes. The second step then retrieves the most similar images within the predicted class, using features customized to distinguish “subclasses” within that class. The need to find a customized feature subset for each class led us to investigate feature selection for unsupervised learning. As a result, we developed a new algorithm called FSSEM (feature subset selection using expectation-maximization clustering). We applied our approach to a database of high-resolution computed tomography (HRCT) lung images and show that CQA radically improves retrieval precision over the single feature vector approach. To determine whether our CBIR system is helpful to physicians, we conducted an evaluation trial with eight radiologists. The results show that our system using CQA retrieval doubled the doctors' diagnostic accuracy.
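
The two-step retrieval described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `cqa_retrieve`, the nearest-centroid classifier, the Euclidean distance, the toy data, and the hand-picked feature subsets are all assumptions made for illustration. In particular, the per-class subsets are simply supplied by hand here; the paper selects them with FSSEM, which is not shown.

```python
import numpy as np

def cqa_retrieve(query, images, labels, class_features, subclass_features, k=3):
    """Two-step retrieval sketch: classify the query, then rank within the predicted class.

    class_features: indices of features assumed to discriminate the classes.
    subclass_features: dict mapping each class label to its own feature indices
    (hypothetical, hand-picked here rather than selected by FSSEM).
    """
    classes = np.unique(labels)

    # Step 1: classify the query using only the class-discriminating features.
    # Nearest-centroid is a stand-in classifier chosen for brevity.
    centroids = np.array(
        [images[labels == c][:, class_features].mean(axis=0) for c in classes]
    )
    dists = np.linalg.norm(centroids - query[class_features], axis=1)
    predicted = classes[np.argmin(dists)].item()

    # Step 2: rank only the images of the predicted class,
    # using the feature subset customized to that class.
    feats = subclass_features[predicted]
    candidates = np.flatnonzero(labels == predicted)
    d = np.linalg.norm(images[candidates][:, feats] - query[feats], axis=1)
    return predicted, candidates[np.argsort(d)[:k]]

# Toy data (made up): features 0 and 1 separate the two classes,
# feature 2 varies within each class.
images = np.array([
    [0.0, 0.0, 1.0],
    [0.1, 0.0, 5.0],
    [0.0, 0.1, 9.0],
    [5.0, 5.0, 1.0],
    [5.1, 5.0, 5.0],
    [5.0, 5.1, 9.0],
])
labels = np.array([0, 0, 0, 1, 1, 1])
class_features = [0, 1]
subclass_features = {0: [2], 1: [2]}

query = np.array([4.9, 5.0, 8.5])
predicted, ranked = cqa_retrieve(
    query, images, labels, class_features, subclass_features, k=2
)
```

On this toy data the query is assigned to class 1, and the two closest images within that class, measured on the third feature alone, are those at indices 5 and 4. A single feature vector approach would instead rank all six images on all three features at once, so features that separate the classes would also influence the ranking within a class.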

Index Terms:
Image retrieval, feature selection, clustering, expectation-maximization, unsupervised learning.
Citation:
Jennifer G. Dy, Carla E. Brodley, Avi Kak, Lynn S. Broderick, Alex M. Aisen, "Unsupervised Feature Selection Applied to Content-Based Retrieval of Lung Images," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 3, pp. 373-378, March 2003, doi:10.1109/TPAMI.2003.1182100