Fast Branch & Bound Algorithms for Optimal Feature Selection
July 2004 (vol. 26 no. 7)
pp. 900-912

Abstract: A novel search principle for optimal feature subset selection using the Branch & Bound method is introduced. Thanks to a simple mechanism for predicting criterion values, a considerable amount of time can be saved by avoiding many slow criterion evaluations. We propose two implementations of this prediction mechanism, suitable for use with nonrecursive and recursive criterion forms, respectively. Both algorithms usually find the optimum several times faster than any other known Branch & Bound algorithm. Because computational efficiency is crucial, given the exponential nature of the search problem, we also investigate other factors that affect the search performance of all Branch & Bound algorithms. Using a set of synthetic criteria, we show that the speed of Branch & Bound algorithms strongly depends on the diversity among features, feature stability with respect to different subsets, and the dependence of the criterion function on feature set size. We identify the scenarios in which the search is accelerated most dramatically (finishing in linear time), as well as the worst-case conditions. We verify our conclusions experimentally on three real data sets using traditional probabilistic distance criteria.
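
For orientation, the basic Branch & Bound idea that the abstract builds on can be sketched as follows. This is a minimal illustration of the classical scheme only, not the prediction mechanism proposed in the paper, whose details the abstract does not give. It assumes a monotonic criterion (removing a feature never increases the criterion value), and the names branch_and_bound and additive_criterion, as well as the feature weights, are hypothetical and chosen purely for illustration.

```python
from itertools import combinations


def branch_and_bound(n_features, d, criterion):
    """Select d of n_features maximizing a monotonic criterion.

    Assumes criterion(subset) never increases when a feature is removed.
    Returns (best_subset, best_value, number_of_criterion_evaluations).
    """
    best = {"value": float("-inf"), "subset": None}
    evaluations = 0

    def search(subset, start):
        nonlocal evaluations
        value = criterion(subset)
        evaluations += 1
        # Monotonicity: no descendant of this node can beat the bound,
        # so the whole subtree is cut off.
        if value <= best["value"]:
            return
        if len(subset) == d:
            best["value"], best["subset"] = value, subset
            return
        # Remove features in increasing index order so that each
        # candidate subset is reached along exactly one path.
        to_remove = len(subset) - d
        for i in range(start, n_features - to_remove + 1):
            search(subset - {i}, i + 1)

    search(frozenset(range(n_features)), 0)
    return best["subset"], best["value"], evaluations


def additive_criterion(weights):
    """Hypothetical synthetic criterion: sum of nonnegative feature weights."""
    def criterion(subset):
        return sum(weights[i] for i in subset)
    return criterion


def exhaustive(n_features, d, criterion):
    """Reference solution: evaluate every subset of size d."""
    return max(
        (frozenset(c) for c in combinations(range(n_features), d)),
        key=criterion,
    )


if __name__ == "__main__":
    weights = [0.9, 0.1, 0.5, 0.7, 0.3, 0.2]  # illustrative values only
    crit = additive_criterion(weights)
    subset, value, evals = branch_and_bound(len(weights), 3, crit)
    print(sorted(subset), value, evals)
```

In this sketch every visited node costs one criterion evaluation. The abstract's stated contribution is to avoid many of those evaluations by predicting criterion values; that step is deliberately not reproduced here.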

Index Terms:
Subset search, feature selection, search tree, optimum search, subset selection, dimensionality reduction, artificial intelligence.
Petr Somol, Pavel Pudil, Josef Kittler, "Fast Branch & Bound Algorithms for Optimal Feature Selection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 7, pp. 900-912, July 2004, doi:10.1109/TPAMI.2004.28