Issue No.06 - Nov.-Dec. (2012 vol.9)
pp: 1831-1836
M. Bicego , Dipt. di Inf., Univ. degli Studi di Verona, Verona, Italy
P. Lovato , Dipt. di Inf., Univ. degli Studi di Verona, Verona, Italy
A. Perina , Microsoft Res., Redmond, WA, USA
M. Fasoli , Dipt. di Biotecnologie, Univ. degli Studi di Verona, Verona, Italy
M. Delledonne , Dipt. di Biotecnologie, Univ. degli Studi di Verona, Verona, Italy
M. Pezzotti , Dipt. di Biotecnologie, Univ. degli Studi di Verona, Verona, Italy
A. Polverari , Dipt. di Biotecnologie, Univ. degli Studi di Verona, Verona, Italy
V. Murino , Anal. & Comput. Vision (PAVIS), Ist. Italiano di Tecnol. (IIT), Genoa, Italy
ABSTRACT
In recent years, a particular class of probabilistic graphical models, called topic models, has proven to be a useful and interpretable tool for understanding and mining microarray data. In this context, such models have been applied almost exclusively to clustering, whereas the classification task has been disregarded by researchers. In this paper, we thoroughly investigate the use of topic models for the classification of microarray data, starting from ideas proposed in other fields (e.g., computer vision). We propose a classification scheme based on highly interpretable features extracted from topic models, resulting in a hybrid generative-discriminative approach. An extensive experimental evaluation, involving 10 different literature benchmarks, confirms the suitability of topic models for classifying expression microarray data.
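As a rough illustration of what a hybrid generative-discriminative scheme of this kind can look like, the sketch below fits a topic model to a toy expression matrix and trains a discriminative classifier on the per-sample topic proportions. It is not the paper's implementation: the choice of PLSA as the topic model, logistic regression as the classifier, the toy data, and every function name (`fit_plsa`, `train_logreg`, `predict`) are assumptions made for this example only.

```python
import math
import random


def normalize(values):
    """Scale non-negative values so they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


def fit_plsa(counts, n_topics, n_iter=200, seed=0):
    """Fit PLSA by EM; counts[d][w] is the expression level of gene w in sample d.

    Returns (p_w_z, p_z_d): P(gene | topic) and P(topic | sample).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    p_w_z = [normalize([rng.random() + 0.1 for _ in range(n_words)])
             for _ in range(n_topics)]
    p_z_d = [normalize([rng.random() + 0.1 for _ in range(n_topics)])
             for _ in range(n_docs)]
    for _ in range(n_iter):
        # Small constants keep every probability strictly positive.
        new_w_z = [[1e-12] * n_words for _ in range(n_topics)]
        new_z_d = [[1e-12] * n_topics for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                if counts[d][w] == 0:
                    continue
                # E-step: P(topic | sample, gene).
                post = normalize([p_z_d[d][z] * p_w_z[z][w]
                                  for z in range(n_topics)])
                # M-step accumulators, weighted by the expression level.
                for z in range(n_topics):
                    new_w_z[z][w] += counts[d][w] * post[z]
                    new_z_d[d][z] += counts[d][w] * post[z]
        p_w_z = [normalize(row) for row in new_w_z]
        p_z_d = [normalize(row) for row in new_z_d]
    return p_w_z, p_z_d


def train_logreg(X, y, lr=0.5, n_epochs=2000):
    """Binary logistic regression trained by stochastic gradient descent."""
    weights, bias = [0.0] * len(X[0]), 0.0
    for _ in range(n_epochs):
        for x, target in zip(X, y):
            score = sum(w * v for w, v in zip(weights, x)) + bias
            error = 1.0 / (1.0 + math.exp(-score)) - target
            weights = [w - lr * error * v for w, v in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def predict(X, weights, bias):
    """Label a sample 1 when its linear score is positive, else 0."""
    return [1 if sum(w * v for w, v in zip(weights, x)) + bias > 0 else 0
            for x in X]


# Hypothetical toy data: rows are samples, columns are genes.
# Class 0 over-expresses genes 0-2, class 1 over-expresses genes 3-5.
expression = [
    [9, 8, 10, 1, 0, 1],
    [10, 9, 8, 0, 1, 1],
    [8, 10, 9, 1, 1, 0],
    [1, 0, 1, 9, 10, 8],
    [0, 1, 1, 8, 9, 10],
    [1, 1, 0, 10, 8, 9],
]
labels = [0, 0, 0, 1, 1, 1]

# Generative step: topic proportions per sample become the features.
_, features = fit_plsa(expression, n_topics=2)
# Discriminative step: a classifier is trained on those features.
weights, bias = train_logreg(features, labels)
predictions = predict(features, weights, bias)
print(predictions)
```

The sketch predicts on the same samples it was trained on, purely to show the data flow. A real evaluation would hold samples out and infer their topic proportions with the gene-given-topic table kept fixed.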
INDEX TERMS
Biological system modeling, Data models, Computational modeling, Probabilistic logic, Feature extraction, Analytical models, Hybrid generative-discriminative approaches, Expression microarray, Topic models
CITATION
M. Bicego, P. Lovato, A. Perina, M. Fasoli, M. Delledonne, M. Pezzotti, A. Polverari, V. Murino, "Investigating Topic Models' Capabilities in Expression Microarray Data Classification", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.9, no. 6, pp. 1831-1836, Nov.-Dec. 2012, doi:10.1109/TCBB.2012.121