Issue No. 04 - July-Aug. (2018 vol. 15)
ISSN: 1545-5963
pp: 1315-1324
Chun-Qiu Xia, School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei, Nanjing, China
Ke Han, School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei, Nanjing, China
Yong Qi, School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei, Nanjing, China
Yang Zhang, Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI
Dong-Jun Yu, School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei, Nanjing, China
ABSTRACT
Accurate identification of cancer types is essential to cancer diagnosis and treatment. Because cancer tissue and normal tissue differ in gene expression, gene expression data can serve as an efficient feature source for cancer classification. However, accurate cancer classification directly from the original gene expression profiles remains challenging because of the intrinsically high dimensionality of the features and the small number of data samples. We propose a new self-training subspace clustering algorithm under low-rank representation, called SSC-LRR, for cancer classification on gene expression data. Low-rank representation (LRR) is first applied to extract discriminative features from the high-dimensional gene expression data; the self-training subspace clustering (SSC) method is then used to generate the cancer classification predictions. SSC-LRR was tested on two separate benchmark datasets in comparison with four state-of-the-art classification methods. It generated cancer classification predictions with an overall accuracy of 89.7 percent and a general correlation of 0.920, which are 18.9 and 24.4 percent higher than those of the best control method, respectively. In addition, several genes (RNF114, HLA-DRB5, USP9Y, and PTPN20) were identified by SSC-LRR as new cancer identifiers that deserve further clinical investigation. Overall, the study demonstrates a new, sensitive avenue for recognizing cancer classifications from large-scale gene expression data.
INDEX TERMS
Cancer, Gene expression, Clustering algorithms, Classification algorithms, Sparse matrices, Benchmark testing, Matrix decomposition
CITATION

C. Xia, K. Han, Y. Qi, Y. Zhang and D. Yu, "A Self-Training Subspace Clustering Algorithm under Low-Rank Representation for Cancer Classification on Gene Expression Data," in IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 15, no. 4, pp. 1315-1324, 2018.
doi:10.1109/TCBB.2017.2712607