Issue No. 3, March 2014 (vol. 36)
pp. 417-435
Lei Wang , Sch. of Comput. Sci. & Software Eng., Univ. of Wollongong, Wollongong, NSW, Australia
Luping Zhou , Sch. of Comput. Sci. & Software Eng., Univ. of Wollongong, Wollongong, NSW, Australia
Chunhua Shen , Sch. of Comput. Sci., Univ. of Adelaide, Adelaide, SA, Australia
Lingqiao Liu , Sch. of Comput. Sci. & Software Eng., Univ. of Wollongong, Wollongong, NSW, Australia
Huan Liu , Sch. of Comput., Inf., & Decision Syst. Eng., Arizona State Univ., Tempe, AZ, USA
ABSTRACT
In image recognition with the bag-of-features model, a small visual codebook is usually preferred because it yields a low-dimensional histogram representation and high computational efficiency. Such a codebook must nevertheless be discriminative enough to achieve excellent recognition performance. To create a compact and discriminative codebook, in this paper we propose to merge the visual words of a large initial codebook while maximally preserving class separability. We first show that this leads to a difficult optimization problem. To address it, we devise a suboptimal but highly efficient hierarchical word-merging algorithm that optimally merges two words at each level of the hierarchy. By exploiting the characteristics of the class separability measure and designing a novel indexing structure, the proposed algorithm can hierarchically merge 10,000 visual words down to two words in only 90 seconds. To reveal the properties and advantages of the proposed algorithm, we also present a detailed theoretical analysis comparing it with another hierarchical word-merging algorithm that maximally preserves mutual information, which yields interesting findings. Experimental studies on multiple benchmark data sets verify the effectiveness of the proposed algorithm: it efficiently produces more compact and discriminative codebooks than state-of-the-art hierarchical word-merging algorithms, especially when the codebook size is substantially reduced.
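To make the idea of "optimally merging two words at each level" concrete, here is a minimal brute-force sketch in Python. It assumes hard-assignment word-count histograms, so merging two words simply adds their bins. It also assumes the class separability measure is the trace ratio tr(S_B)/tr(S_W) of between-class to within-class scatter. The function names (`scatter_traces`, `merge_words`, `separability`, `hierarchical_merge`) are hypothetical. The exhaustive pair search is a naive stand-in for the paper's indexing structure, not a reproduction of it.

```python
from itertools import combinations


def scatter_traces(hists, labels):
    """Return (tr(S_B), tr(S_W)) for word histograms and their class labels."""
    dim, n = len(hists[0]), len(hists)
    overall = [sum(h[k] for h in hists) / n for k in range(dim)]
    by_class = {}
    for h, y in zip(hists, labels):
        by_class.setdefault(y, []).append(h)
    tr_b = tr_w = 0.0
    for members in by_class.values():
        mean_c = [sum(h[k] for h in members) / len(members) for k in range(dim)]
        # Between-class term: class size times squared distance of class mean to overall mean.
        tr_b += len(members) * sum((mean_c[k] - overall[k]) ** 2 for k in range(dim))
        # Within-class term: squared distances of members to their class mean.
        tr_w += sum((h[k] - mean_c[k]) ** 2 for h in members for k in range(dim))
    return tr_b, tr_w


def merge_words(hists, i, j):
    """Merge word j into word i (requires i < j): add bin j to bin i, drop bin j."""
    merged = []
    for h in hists:
        new = list(h)
        new[i] += new[j]
        del new[j]
        merged.append(new)
    return merged


def separability(hists, labels):
    """Assumed measure: trace ratio tr(S_B) / tr(S_W)."""
    tr_b, tr_w = scatter_traces(hists, labels)
    if tr_w == 0:
        # No within-class scatter: perfectly separated if classes differ at all.
        return float("inf") if tr_b > 0 else 0.0
    return tr_b / tr_w


def hierarchical_merge(hists, labels, target_size):
    """Greedily merge the best pair of words until target_size words remain.

    Returns the merged histograms and, for each remaining word, the list of
    original word indices it contains.
    """
    groups = [[k] for k in range(len(hists[0]))]
    while len(groups) > target_size:
        best = None
        # Exhaustive search over all pairs at this level (naive, O(K^2) evaluations).
        for i, j in combinations(range(len(groups)), 2):
            score = separability(merge_words(hists, i, j), labels)
            if best is None or score > best[0]:
                best = (score, i, j)
        _, i, j = best
        hists = merge_words(hists, i, j)
        groups[i] = groups[i] + groups[j]
        del groups[j]
    return hists, groups
```

For example, with `hists = [[3, 1, 0, 0], [1, 3, 0, 0], [0, 0, 3, 1], [0, 0, 1, 3]]` and `labels = ["a", "a", "b", "b"]`, class "a" spreads its counts over words 0 and 1, and class "b" spreads its counts over words 2 and 3. `hierarchical_merge(hists, labels, 2)` then returns the groups `[[0, 1], [2, 3]]`. Each level of this naive version evaluates every pair and recomputes the scatter from scratch, so it is only practical for small codebooks. Avoiding exactly that cost is what lets the authors' algorithm scale to 10,000 words.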
INDEX TERMS
Visualization, Algorithm design and analysis, Merging, Tin, Histograms, Training, Computational modeling, object recognition, Hierarchical word merge, compact codebook, class separability, bag-of-features model
CITATION
Lei Wang, Luping Zhou, Chunhua Shen, Lingqiao Liu, Huan Liu, "A Hierarchical Word-Merging Algorithm with Class Separability Measure", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 36, no. 3, pp. 417-435, March 2014, doi:10.1109/TPAMI.2013.160