Issue No.03 - March (2014 vol.36)
pp: 507-520
Zeynep Akata, Xerox Res. Centre Eur., Meylan, France
Florent Perronnin, Xerox Res. Centre Eur., Meylan, France
Zaid Harchaoui, INRIA Grenoble Rhone-Alpes, Isere, France
Cordelia Schmid, INRIA Grenoble Rhone-Alpes, Isere, France
ABSTRACT
We benchmark several SVM objective functions for large-scale image classification. We consider one-versus-rest, multiclass, ranking, and weighted approximate ranking SVMs. A comparison of online and batch methods for optimizing the objectives shows that online methods perform as well as batch methods in terms of classification accuracy, but with a significant gain in training speed. Using stochastic gradient descent, we can scale the training to millions of images and thousands of classes. Our experimental evaluation shows that ranking-based algorithms do not outperform the one-versus-rest strategy when a large number of training examples are used. Furthermore, the gap in accuracy between the different algorithms shrinks as the dimension of the features increases. We also show that learning through cross-validation the optimal rebalancing of positive and negative examples can result in a significant improvement for the one-versus-rest strategy. Finally, early stopping can be used as an effective regularization strategy when training with online algorithms. Following these "good practices," we were able to improve the state of the art on a large subset of 10K classes and 9M images of ImageNet from 16.7 percent Top-1 accuracy to 19.1 percent.
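The practices summarized in the abstract (one-versus-rest linear SVMs trained with stochastic gradient descent, rebalancing of positive and negative examples, and early stopping as regularization) can be illustrated with a minimal sketch. This is not the authors' implementation. All names below (`train_ovr_sgd`, `pos_weight`, `patience`, `make_toy`) are hypothetical, and the rebalancing is modeled here as a simple weight on the positive-example hinge loss, which is an assumption; the paper's exact scheme may differ. The toy data is synthetic and only serves to make the example runnable.

```python
import random


def predict(W, x):
    """Return the index of the class whose linear score w_c . x is highest."""
    scores = [sum(wi * xi for wi, xi in zip(w, x)) for w in W]
    return max(range(len(W)), key=scores.__getitem__)


def accuracy(W, X, y):
    """Top-1 accuracy of the one-versus-rest classifiers W on (X, y)."""
    return sum(predict(W, x) == t for x, t in zip(X, y)) / len(y)


def train_ovr_sgd(X, y, X_val, y_val, num_classes, pos_weight=1.0,
                  lr=0.1, lam=1e-4, max_epochs=50, patience=5, seed=0):
    """One-versus-rest hinge-loss SGD with early stopping on validation data.

    pos_weight (assumed rebalancing knob): multiplies the loss of positive
    examples; in practice it would be chosen by cross-validation.
    """
    rng = random.Random(seed)
    dim = len(X[0])
    W = [[0.0] * dim for _ in range(num_classes)]
    best_W = [w[:] for w in W]
    best_acc = accuracy(W, X_val, y_val)
    stale = 0
    order = list(range(len(X)))
    for _ in range(max_epochs):
        rng.shuffle(order)
        for i in order:
            x = X[i]
            for c in range(num_classes):
                t = 1.0 if y[i] == c else -1.0   # binary label for class c
                w = W[c]
                margin = t * sum(wi * xi for wi, xi in zip(w, x))
                scale = pos_weight if t > 0 else 1.0
                for j in range(dim):
                    grad = lam * w[j]            # L2 regularization term
                    if margin < 1.0:             # hinge loss is active
                        grad -= scale * t * x[j]
                    w[j] -= lr * grad
        # Early stopping: keep the best validation model, stop when stale.
        acc = accuracy(W, X_val, y_val)
        if acc > best_acc:
            best_W, best_acc, stale = [w[:] for w in W], acc, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_W, best_acc


def make_toy(n_per_class, seed):
    """Synthetic 2-D clusters plus a constant bias feature (illustration only)."""
    rng = random.Random(seed)
    centers = [(2.0, 0.0), (-2.0, 2.0), (-2.0, -2.0)]
    X, y = [], []
    for c, (cx, cy) in enumerate(centers):
        for _ in range(n_per_class):
            X.append([cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3), 1.0])
            y.append(c)
    return X, y


if __name__ == "__main__":
    X_tr, y_tr = make_toy(30, seed=1)
    X_va, y_va = make_toy(10, seed=2)
    W, acc = train_ovr_sgd(X_tr, y_tr, X_va, y_va, num_classes=3, pos_weight=2.0)
    print("validation accuracy:", acc)
```

In this sketch each class has twice as many negatives as positives, so `pos_weight=2.0` roughly equalizes their total contribution. Early stopping returns the weights from the epoch with the best validation accuracy rather than the final ones.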
INDEX TERMS
Support vector machines, Training, Linear programming, Accuracy, Optimization, Visualization, Encoding, stochastic learning, Large scale, fine-grained visual categorization, image classification, ranking, SVM
CITATION
Zeynep Akata, Florent Perronnin, Zaid Harchaoui, Cordelia Schmid, "Good Practice in Large-Scale Learning for Image Classification", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 36, no. 3, pp. 507-520, March 2014, doi:10.1109/TPAMI.2013.146