Complex diseases such as various types of cancer and diabetes are conjectured to be triggered and influenced by a combination of genetic and environmental factors. To integrate potential effects from the interplay among underlying candidate factors, we propose a new network-based framework that identifies effective biomarkers by searching for groups of synergistic risk factors with high predictive power for disease outcome. An interaction network is constructed in which node weights represent the individual predictive power of candidate factors and edge weights capture pairwise synergistic interactions among factors. We then formulate this network-based biomarker identification problem as a novel graph optimization model that searches for multiple cliques with maximum overall weight, which we denote as the Maximum Weighted Multiple Clique Problem (MWMCP). To achieve optimal or near-optimal solutions, we derive both an analytical algorithm based on the column generation method and a fast heuristic for large-scale networks. Our algorithms for MWMCP have been applied to two biomedical datasets: a Type 1 Diabetes (T1D) dataset from the Diabetes Prevention Trial-Type 1 (DPT-1) study, and a breast cancer genomics dataset for metastasis prognosis. The results demonstrate that our network-based methods can identify important biomarkers with better prediction accuracy than conventional feature selection, which considers only individual effects.
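To make the graph model concrete, the following is a minimal sketch of how a weighted interaction network and a multiple-clique search might look in code. It is not the column generation algorithm or the heuristic described above. It is a simple greedy illustration under stated assumptions: the cliques are taken to be vertex-disjoint, the number of cliques `k` and the size cap `max_size` are hypothetical parameters, and the toy weights are invented for illustration only.

```python
from itertools import combinations


def clique_weight(clique, node_w, edge_w):
    """Sum of the node weights plus the weights of all edges inside the clique."""
    total = sum(node_w[v] for v in clique)
    total += sum(edge_w[frozenset(pair)] for pair in combinations(clique, 2))
    return total


def greedy_multiple_cliques(node_w, edge_w, k, max_size):
    """Greedily build up to k vertex-disjoint cliques (an illustrative assumption).

    node_w: dict mapping factor name -> individual predictive power
    edge_w: dict mapping frozenset({u, v}) -> pairwise synergy weight
    """
    remaining = set(node_w)
    cliques = []
    for _ in range(k):
        if not remaining:
            break
        # Seed with the heaviest unused node; ties are broken by name.
        seed = max(remaining, key=lambda v: (node_w[v], v))
        clique = [seed]
        while len(clique) < max_size:
            best, best_gain = None, 0.0
            for v in sorted(remaining - set(clique)):
                # v may join only if it is adjacent to every current member.
                if all(frozenset((v, u)) in edge_w for u in clique):
                    gain = node_w[v] + sum(edge_w[frozenset((v, u))] for u in clique)
                    if gain > best_gain:
                        best, best_gain = v, gain
            if best is None:
                break
            clique.append(best)
        cliques.append(clique)
        remaining -= set(clique)
    return cliques


# Toy network with made-up weights (not data from the study).
node_w = {"a": 0.5, "b": 0.4, "c": 0.3, "d": 0.6, "e": 0.2}
edge_w = {
    frozenset(("a", "b")): 0.3,
    frozenset(("a", "c")): 0.2,
    frozenset(("b", "c")): 0.1,
    frozenset(("d", "e")): 0.4,
}

result = greedy_multiple_cliques(node_w, edge_w, k=2, max_size=3)
for clique in result:
    print(clique, round(clique_weight(clique, node_w, edge_w), 3))
```

On this toy input the greedy pass selects the cliques `["d", "e"]` and `["a", "b", "c"]`. A greedy search of this kind offers no optimality guarantee, which is one reason an exact method is also of interest.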
Amin Ahmadi Adl, "Network-based Methods to Identify Highly Discriminating Subsets of Biomarkers", IEEE/ACM Transactions on Computational Biology and Bioinformatics, no. 1, pp. 1, PrePrints, doi:10.1109/TCBB.2014.2325014