A Generalized Multivariate Approach to Pattern Discovery from Replicated and Incomplete Genome-Wide Measurements
Issue No. 05 - September/October (2011 vol. 8)
DOI: http://doi.ieeecomputersociety.org/10.1109/TCBB.2010.102
Dongxiao Zhu, University of New Orleans, New Orleans; Children's Hospital, New Orleans; and Tulane University Cancer Center, New Orleans
Lipi Acharya, University of New Orleans, New Orleans
Hui Zhang, Novartis Pharmaceutical Corporation, East Hanover
Estimation of pairwise correlation from incomplete and replicated molecular profiling data is a ubiquitous problem in pattern discovery analyses such as clustering and network analysis. However, existing methods address this problem by ad hoc data imputation followed by replicate averaging and correlation coefficient type approaches, which might annihilate important patterns present in the molecular profiling data. Moreover, these approaches neither consider nor exploit the underlying experimental design information that specifies the replication mechanism. We develop an Expectation-Maximization (EM) type algorithm to estimate the correlation structure from incomplete and replicated molecular profiling data with an a priori known replication mechanism. The approach is general enough to apply to any known replication mechanism; when the replication mechanism is unknown, it reduces to the parsimonious model introduced previously. The efficacy of our approach was first evaluated in simulation studies through a comprehensive comparison with various bivariate and multivariate imputation approaches. Results from real-world data analysis further confirmed the superior performance of the proposed approach over commonly used approaches, and we assessed the robustness of the method using data sets with up to 30 percent missing values.
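The abstract does not spell out the estimator, so the following is only a hedged illustration of the EM idea it builds on: a minimal Python/NumPy sketch of the classical EM algorithm for a multivariate normal with missing entries (assumed missing at random), followed by conversion of the estimated covariance into a correlation matrix. It is not the authors' generalized model. In particular, it ignores the replication mechanism entirely, and the function names `em_mvn` and `cov_to_corr` are hypothetical.

```python
import numpy as np


def em_mvn(X, n_iter=500, tol=1e-8):
    """EM estimates of mean and covariance for rows X[i] ~ N(mu, Sigma).

    Missing entries are np.nan and are assumed missing at random.
    Rows that share a missingness pattern are processed together.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    miss = np.isnan(X)
    # Group rows by missingness pattern, encoded as an integer (needs p < 63).
    codes = miss.astype(np.int64) @ (2 ** np.arange(p, dtype=np.int64))
    groups = [np.flatnonzero(codes == c) for c in np.unique(codes)]

    # Initialise from available-case means and variances.
    mu = np.nanmean(X, axis=0)
    sigma = np.diag(np.nanvar(X, axis=0))
    for _ in range(n_iter):
        X_hat = np.where(miss, 0.0, X)  # observed values; missing ones filled below
        C = np.zeros((p, p))            # summed conditional covariances of missing parts
        for rows in groups:
            m = miss[rows[0]]
            if not m.any():
                continue                # fully observed rows need no E-step
            o = ~m
            if o.any():
                # E-step: E[x_m | x_o] = mu_m + S_mo S_oo^{-1} (x_o - mu_o)
                s_mo = sigma[np.ix_(m, o)]
                coef = np.linalg.solve(sigma[np.ix_(o, o)], s_mo.T).T
                X_hat[np.ix_(rows, m)] = mu[m] + (X[np.ix_(rows, o)] - mu[o]) @ coef.T
                cond_cov = sigma[np.ix_(m, m)] - coef @ s_mo.T
            else:
                # Entire row missing: use the current mean and covariance.
                X_hat[np.ix_(rows, m)] = mu
                cond_cov = sigma
            C[np.ix_(m, m)] += rows.size * cond_cov
        # M-step: mean and covariance from the expected sufficient statistics.
        mu_new = X_hat.mean(axis=0)
        centred = X_hat - mu_new
        sigma_new = (centred.T @ centred + C) / n
        done = np.max(np.abs(sigma_new - sigma)) < tol
        mu, sigma = mu_new, sigma_new
        if done:
            break
    return mu, sigma


def cov_to_corr(sigma):
    """Convert a covariance matrix into a correlation matrix."""
    d = np.sqrt(np.diag(sigma))
    return sigma / np.outer(d, d)


# Usage: two correlated "genes", about 30% of values missing completely at random.
rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.8], [0.8, 1.0]])
X = rng.multivariate_normal(np.zeros(2), cov, size=2000)
X[rng.random(X.shape) < 0.3] = np.nan
mu, sigma = em_mvn(X)
print(cov_to_corr(sigma)[0, 1])  # EM estimate of the true correlation 0.8
```

Grouping rows by missingness pattern means each pattern costs one linear solve per iteration instead of one per row. The correction term `C` is what distinguishes EM from simply imputing conditional means and then computing a sample covariance, which would understate the variances.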
Replicated data, pairwise correlation, pattern recognition, unsupervised learning, missing value.
L. Acharya, H. Zhang and D. Zhu, "A Generalized Multivariate Approach to Pattern Discovery from Replicated and Incomplete Genome-Wide Measurements," in IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 5, pp. 1153-1169, 2011.