A New Unsupervised Feature Ranking Method for Gene Expression Data Based on Consensus Affinity
July-Aug. 2012 (vol. 9 no. 4)
pp. 1257-1263
Hau-San Wong, Dept. of Comput. Sci., City Univ. of Hong Kong, Kowloon, China
Shaohong Zhang, Dept. of Comput. Sci., Guangzhou Univ., Guangzhou, China
Ying Shen, Dept. of Comput. Sci., City Univ. of Hong Kong, Kowloon, China
Dongqing Xie, Dept. of Comput. Sci., Guangzhou Univ., Guangzhou, China
Feature selection is widely established as one of the fundamental computational techniques in mining microarray data. Because class label information is often lacking in practice, unsupervised feature selection is of greater practical importance, but it is also more difficult. Motivated by cluster ensemble techniques, which combine multiple clustering solutions into a consensus solution of higher accuracy and stability, recent work in unsupervised feature selection has proposed using these consensus solutions as oracles. However, such methods depend on both the particular cluster ensemble algorithm used and knowledge of the true number of clusters, so they are unsuitable when the true cluster number is unavailable, which is common in practice. To address these problems, a new unsupervised feature ranking method is proposed that evaluates the importance of each feature based on consensus affinity. Unlike previous work, our method compares, for each pair of instances, the affinity given by each feature against the consensus matrix of the clustering solutions. As a result, it removes both the need to know the true number of clusters and the dependence on a particular cluster ensemble approach. Experiments on real gene expression data sets show a significant improvement in feature ranking results compared with several state-of-the-art techniques.
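The abstract does not state the exact scoring formula, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It builds a consensus (co-association) matrix whose entry (i, j) is the fraction of clustering solutions that put instances i and j in the same cluster; the solutions may use different numbers of clusters, so no true cluster number is required. It then scores each feature by the Pearson correlation between a per-feature Gaussian affinity and the consensus matrix over all instance pairs. The Gaussian kernel, the correlation score, and the names `consensus_matrix`, `feature_scores` and `rank_features` are hypothetical choices made for illustration.

```python
import math
from itertools import combinations


def consensus_matrix(partitions):
    """Co-association matrix: C[i][j] = fraction of partitions placing i and j together."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        if len(labels) != n:
            raise ValueError("all partitions must label the same instances")
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    m = len(partitions)
    return [[c / m for c in row] for row in counts]


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either sequence is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def feature_scores(X, partitions):
    """Hypothetical score: agreement between each feature's pairwise affinity and consensus."""
    C = consensus_matrix(partitions)
    n, d = len(X), len(X[0])
    pairs = list(combinations(range(n), 2))
    c_vals = [C[i][j] for i, j in pairs]
    scores = []
    for f in range(d):
        col = [row[f] for row in X]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / n
        # Assumed Gaussian affinity with the feature's own variance as bandwidth.
        two_sigma2 = 2 * var if var > 0 else 1.0
        a_vals = [math.exp(-((col[i] - col[j]) ** 2) / two_sigma2) for i, j in pairs]
        scores.append(pearson(a_vals, c_vals))
    return scores


def rank_features(X, partitions):
    """Feature indices sorted from most to least consistent with the consensus."""
    scores = feature_scores(X, partitions)
    return sorted(range(len(scores)), key=lambda f: scores[f], reverse=True)


# Toy data: feature 0 separates two groups, feature 1 is noise, feature 2 is constant.
X = [[0.0, 1, 7], [0.1, 3, 7], [0.2, 2, 7], [5.0, 1, 7], [5.1, 3, 7], [5.2, 2, 7]]
# Three clustering solutions with 2, 3 and 3 clusters respectively.
partitions = [[0, 0, 0, 1, 1, 1], [0, 0, 0, 1, 1, 2], [0, 0, 1, 2, 2, 2]]
print(rank_features(X, partitions))  # [0, 2, 1]
```

Because the consensus matrix only records how often pairs co-occur, the partitions being combined can come from any clustering algorithm and any number of clusters, which mirrors the independence from the true cluster number claimed in the abstract.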

Index Terms:
lab-on-a-chip, biology computing, data mining, feature extraction, genetic algorithms, genetics, multiple clustering solutions, unsupervised feature ranking method, gene expression data, consensus affinity, feature selection, fundamental computational techniques, microarray data mining, cluster ensemble techniques, clustering algorithms, gene expression, indexes, bioinformatics, partitioning algorithms, principal component analysis, Laplace equations, cluster ensembles, unsupervised feature ranking, gene selection
Citation:
Hau-San Wong, Shaohong Zhang, Ying Shen, Dongqing Xie, "A New Unsupervised Feature Ranking Method for Gene Expression Data Based on Consensus Affinity," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 4, pp. 1257-1263, July-Aug. 2012, doi:10.1109/TCBB.2012.34