Data Mining for Case-Based Reasoning in High-Dimensional Biological Domains
August 2005 (vol. 17 no. 8)
pp. 1127-1137
Case-based reasoning (CBR) is a suitable paradigm for class discovery in molecular biology, where the rules that define the domain knowledge are difficult to obtain and the number and complexity of the rules affecting the problem are too large for formal knowledge representation. To extend the capabilities of CBR, we propose the mixture of experts for case-based reasoning (MOE4CBR), a method that combines an ensemble of CBR classifiers with spectral clustering and logistic regression. Our approach not only achieves higher prediction accuracy but also leads to the selection of a subset of features that have meaningful relationships with their class labels. We evaluate MOE4CBR by applying it to TA3, a computational framework for CBR systems. For two ovarian mass spectrometry data sets, the prediction accuracy improves from 80 percent to 93 percent and from 90 percent to 98.4 percent, respectively. On leukemia and lung microarray data sets, the prediction accuracy improves from 65 percent to 74 percent and from 60 percent to 70 percent, respectively. Finally, for the mass spectrometry data sets, we compare our list of discovered biomarkers with the biomarkers selected in other studies.
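The abstract names spectral clustering as one ingredient of MOE4CBR but does not say what is clustered (cases or features) or how the clusters feed the CBR experts and the logistic-regression step. As an illustration of that ingredient alone, here is a minimal NumPy sketch of the normalized spectral clustering algorithm of Ng, Jordan, and Weiss. The function names, the Gaussian affinity width `sigma`, and the deterministic farthest-point initialization of the final k-means step are assumptions of this sketch, not details taken from the paper or from TA3.

```python
import numpy as np


def _kmeans(Y, k, n_iter=100):
    """Plain k-means on the rows of Y (assumed helper, not from the paper)."""
    # Deterministic farthest-point initialization: start from row 0, then
    # repeatedly add the row farthest from all centers chosen so far.
    centers = [Y[0]]
    for _ in range(1, k):
        d = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(d)])
    centers = np.array(centers)
    labels = np.zeros(len(Y), dtype=int)
    for _ in range(n_iter):
        # Assign each row to its nearest center.
        dists = np.sum((Y[:, None, :] - centers[None, :, :]) ** 2, axis=-1)
        labels = dists.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster empties out.
        new_centers = np.array([
            Y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def spectral_clustering(X, k, sigma=1.0):
    """Ng-Jordan-Weiss normalized spectral clustering (illustrative sketch).

    X: (n_samples, n_features) array; k: number of clusters;
    sigma: Gaussian affinity width (assumed default; tune per data set).
    Returns an array of n_samples integer labels in [0, k).
    """
    X = np.asarray(X, dtype=float)
    # Step 1: Gaussian affinity matrix with a zero diagonal.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    A = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)

    # Step 2: symmetric normalization L = D^{-1/2} A D^{-1/2}.
    # (Assumes every row sum is positive, i.e. sigma is not too small.)
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L = A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

    # Step 3: eigenvectors of the k largest eigenvalues (eigh sorts ascending).
    _, eigvecs = np.linalg.eigh(L)
    V = eigvecs[:, -k:]

    # Step 4: renormalize each row of the embedding to unit length.
    Y = V / np.linalg.norm(V, axis=1, keepdims=True)

    # Step 5: cluster the embedded rows; row i's label is point i's label.
    return _kmeans(Y, k)
```

For example, calling `spectral_clustering(X, k=2, sigma=1.0)` on two well-separated Gaussian blobs gives each blob its own label. The row renormalization in step 4 makes the embedded points of each well-separated cluster collapse toward a common unit vector, so a simple k-means pass suffices afterwards; `sigma` controls how quickly affinity decays with distance and usually has to be tuned to the data.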

Index Terms: Machine learning, data mining, clustering, feature selection, case-based reasoning classifiers, microarray data analysis, mass spectrometry data analysis, biomarker discovery.
Citation:
Niloofar Arshadi, Igor Jurisica, "Data Mining for Case-Based Reasoning in High-Dimensional Biological Domains," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 8, pp. 1127-1137, Aug. 2005, doi:10.1109/TKDE.2005.124