Issue No.02 - April-June (2008 vol.5)
pp: 172-182
ABSTRACT
Static expression experiments analyze samples from many individuals. These samples are often snapshots of the progression of a disease such as cancer. This raises an intriguing question: can we determine a temporal order for these samples? Such an ordering can lead to a better understanding of the dynamics of the disease and to the identification of genes associated with its progression. In this paper we formally prove, for the first time, that under a model for the dynamics of the expression levels of a single gene, it is indeed possible to recover the correct ordering of static expression datasets by solving an instance of the traveling salesman problem (TSP). In addition, we devise an algorithm that combines a TSP heuristic with probabilistic modeling to infer the underlying temporal order of the microarray experiments. This algorithm constructs probabilistic continuous curves to represent expression profiles, leading to accurate temporal reconstruction for human data. Applying our method to cancer expression data, we show that the derived ordering agrees well with survival duration. A classifier that uses this ordering improves upon other classifiers proposed for this task. The set of genes displaying consistent behavior under the determined ordering is enriched for genes associated with cancer progression.
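The central idea, that a temporal order can be read off as a short path through expression space, can be illustrated with a minimal sketch. This is not the authors' algorithm. It assumes Euclidean distance between sample profiles and uses a generic nearest-neighbor plus 2-opt heuristic for the open-path TSP; it omits the probabilistic curve modeling (EM) mentioned above. All function names and the toy data are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two expression profiles (one value per gene)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def path_length(order, samples):
    """Total length of the open path visiting samples in the given order."""
    return sum(euclidean(samples[order[i]], samples[order[i + 1]])
               for i in range(len(order) - 1))


def nearest_neighbor_path(start, samples):
    """Greedy open path: repeatedly step to the closest unvisited sample."""
    unvisited = set(range(len(samples))) - {start}
    path = [start]
    while unvisited:
        last = samples[path[-1]]
        # Ties are broken by the smaller index so the result is deterministic.
        nxt = min(unvisited, key=lambda j: (euclidean(last, samples[j]), j))
        unvisited.remove(nxt)
        path.append(nxt)
    return path


def two_opt(path, samples):
    """Reverse segments of the path while doing so shortens it (2-opt)."""
    best = list(path)
    best_len = path_length(best, samples)
    improved = True
    while improved:
        improved = False
        for i in range(len(best) - 1):
            for j in range(i + 1, len(best)):
                candidate = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                cand_len = path_length(candidate, samples)
                if cand_len < best_len - 1e-12:
                    best, best_len = candidate, cand_len
                    improved = True
    return best


def order_samples(samples):
    """Hypothetical helper: approximate shortest Hamiltonian path over samples.

    Tries every sample as a starting point and keeps the shortest
    2-opt-improved path. The direction of the result is arbitrary.
    """
    best, best_len = None, math.inf
    for start in range(len(samples)):
        path = two_opt(nearest_neighbor_path(start, samples), samples)
        length = path_length(path, samples)
        if length < best_len:
            best, best_len = path, length
    return best


# Usage (illustrative toy data): three hypothetical genes that change
# linearly with a hidden time t; samples are presented in shuffled order.
times = [0, 1, 1.5, 3, 4, 4.2, 6, 7.5, 8, 10]
shuffle = [3, 7, 0, 9, 5, 1, 8, 2, 6, 4]
samples = [(times[k], 2 * times[k], -0.5 * times[k]) for k in shuffle]
order = order_samples(samples)
print([shuffle[i] for i in order])  # hidden time indices in recovered order
```

Because a path has no intrinsic direction, a sketch like this recovers the order only up to reversal; some external signal would be needed to decide which end corresponds to earlier samples. Exhaustive 2-opt with full path recomputation is chosen here for clarity, not speed.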
INDEX TERMS
microarrays, traveling salesman, EM, glioma
CITATION
Anupam Gupta, Ziv Bar-Joseph, "Extracting Dynamics from Static Cancer Expression Data", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.5, no. 2, pp. 172-182, April-June 2008, doi:10.1109/TCBB.2007.70233