Issue No. 08 - August (2005 vol. 17)
pp: 1127-1137
Case-based reasoning (CBR) is a suitable paradigm for class discovery in molecular biology, where the rules that define the domain knowledge are difficult to obtain and the number and complexity of the rules affecting the problem are too large for formal knowledge representation. To extend the capabilities of CBR, we propose the mixture of experts for case-based reasoning (MOE4CBR), a method that combines an ensemble of CBR classifiers with spectral clustering and logistic regression. Our approach not only achieves higher prediction accuracy but also selects a subset of features that have meaningful relationships with their class labels. We evaluate MOE4CBR by applying it to TA3, a computational framework for CBR systems. For two ovarian mass spectrometry data sets, prediction accuracy improves from 80 percent to 93 percent and from 90 percent to 98.4 percent, respectively. We also apply the method to leukemia and lung microarray data sets, where prediction accuracy improves from 65 percent to 74 percent and from 60 percent to 70 percent, respectively. Finally, for the mass spectrometry data sets, we compare our list of discovered biomarkers with the biomarkers selected in other studies.
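The combination the abstract describes (feature clustering, one CBR classifier per cluster, logistic regression over their outputs) can be illustrated with a minimal sketch. This is not the authors' MOE4CBR and does not use the TA3 API; every function name below is hypothetical and the following are assumptions: features are grouped by a two-way spectral partition (sign of the Fiedler vector of the unnormalized Laplacian) of a graph whose edge weights are absolute Pearson correlations between features; each "expert" is a 1-nearest-neighbour retrieval over its feature subset, standing in for a CBR classifier; and a logistic regression trained on the experts' leave-one-out predictions combines them. Labels are assumed to be 0/1.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return 0.0 if va == 0 or vb == 0 else cov / math.sqrt(va * vb)

def spectral_bipartition(X, iters=500):
    """Split feature indices in two by the sign of the Fiedler vector of the
    unnormalized Laplacian L = D - W of a |correlation| feature graph (assumption)."""
    cols = [list(c) for c in zip(*X)]
    m = len(cols)
    W = [[abs(pearson(cols[i], cols[j])) if i != j else 0.0 for j in range(m)]
         for i in range(m)]
    deg = [sum(row) for row in W]
    c = 2 * max(deg) + 1e-9  # bounds the largest eigenvalue of L (Gershgorin)
    rng = random.Random(0)
    v = [rng.uniform(-1, 1) for _ in range(m)]
    for _ in range(iters):
        mean = sum(v) / m
        v = [x - mean for x in v]  # remove the constant (eigenvalue 0) direction
        # Power iteration on M = c*I - L: its top remaining eigenvector is the Fiedler vector.
        v = [(c - deg[i]) * v[i] + sum(W[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        v = [x / norm for x in v]
    groups = [[i for i in range(m) if v[i] >= 0], [i for i in range(m) if v[i] < 0]]
    return [g for g in groups if g]

def nn_predict(X, y, query, feats, skip=None):
    """1-NN retrieval on a feature subset; a hypothetical stand-in for a CBR classifier."""
    best, label = float("inf"), None
    for k, (row, t) in enumerate(zip(X, y)):
        if k == skip:  # used for leave-one-out predictions on training cases
            continue
        d = sum((row[f] - query[f]) ** 2 for f in feats)
        if d < best:
            best, label = d, t
    return label

def sigmoid(s):
    """Numerically stable logistic function."""
    if s >= 0:
        return 1 / (1 + math.exp(-s))
    e = math.exp(s)
    return e / (1 + e)

def fit_logistic(Z, y, lr=0.5, epochs=2000):
    """Batch gradient descent for logistic regression; the last weight is the bias."""
    w = [0.0] * (len(Z[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for z, t in zip(Z, y):
            err = sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + w[-1]) - t
            for i, zi in enumerate(z):
                grad[i] += err * zi
            grad[-1] += err
        w = [wi - lr * g / len(Z) for wi, g in zip(w, grad)]
    return w

def fit_moe(X, y):
    """Cluster features, then train the combiner on leave-one-out expert outputs."""
    experts = spectral_bipartition(X)
    Z = [[nn_predict(X, y, X[k], feats, skip=k) for feats in experts] for k in range(len(X))]
    return experts, fit_logistic(Z, y)

def predict_moe(model, X, y, query):
    """Combine the experts' 0/1 votes for a new case through the logistic model."""
    experts, w = model
    z = [nn_predict(X, y, query, feats) for feats in experts]
    return 1 if sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + w[-1]) >= 0.5 else 0
```

Training the combiner on leave-one-out predictions keeps each 1-NN expert from simply retrieving the training case itself, so the logistic weights reflect how informative each feature cluster actually is; an uninformative cluster ends up with little influence on the final label.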
Index Terms: Machine learning, data mining, clustering, feature selection, case-based reasoning classifiers, microarray data analysis, mass spectrometry data analysis, biomarker discovery.
Niloofar Arshadi, Igor Jurisica, "Data Mining for Case-Based Reasoning in High-Dimensional Biological Domains", IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 8, pp. 1127-1137, August 2005, doi:10.1109/TKDE.2005.124