Issue No.06 - Nov.-Dec. (2013 vol.10)
pp: 1505-1516
Chanchala D. Kaddi, Dept. of Biomed. Eng., Georgia Inst. of Technol., Atlanta, GA, USA
R. Mitchell Parry, Dept. of Comput. Sci., Appalachian State Univ., Boone, NC, USA
May D. Wang, Dept. of Biomed. Eng., Georgia Inst. of Technol., Atlanta, GA, USA
ABSTRACT
We propose a similarity measure based on the multivariate hypergeometric distribution for the pairwise comparison of images and data vectors. The formulation and performance of the proposed measure are compared with other similarity measures using synthetic data. A method of piecewise approximation is also implemented to facilitate application of the proposed measure to large samples. Example applications of the proposed similarity measure are presented using mass spectrometry imaging data and gene expression microarray data. Results from synthetic and biological data indicate that the proposed measure is capable of providing meaningful discrimination between samples, and that it can be a useful tool for identifying potentially related samples in large-scale biological data sets.
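As a rough illustration of the kind of quantity the abstract refers to, the sketch below computes the multivariate hypergeometric probability of a 2 x C contingency table built from two nonnegative integer count vectors, with row and column totals held fixed. The abstract does not give the authors' formulation, so the function name `log_table_probability`, the use of log-gamma for factorials, and the choice of a 2 x C table are all assumptions made for this example, not the published measure.

```python
from math import lgamma


def log_factorial(n: int) -> float:
    """Return log(n!) via the log-gamma function, which avoids overflow."""
    return lgamma(n + 1)


def log_table_probability(x, y):
    """Log-probability of the 2 x C table with rows x and y under fixed margins.

    Hypothetical helper for illustration only: it evaluates
        P = (prod_i R_i! * prod_j C_j!) / (N! * prod_ij n_ij!)
    where R_i are row sums, C_j are column sums, and N is the grand total.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if any(v < 0 for v in list(x) + list(y)):
        raise ValueError("counts must be nonnegative")

    row_sums = [sum(x), sum(y)]
    col_sums = [a + b for a, b in zip(x, y)]
    total = sum(row_sums)

    log_p = sum(log_factorial(r) for r in row_sums)
    log_p += sum(log_factorial(c) for c in col_sums)
    log_p -= log_factorial(total)
    log_p -= sum(log_factorial(v) for v in list(x) + list(y))
    return log_p
```

For example, `log_table_probability([1, 1], [1, 1])` evaluates the table whose probability under fixed margins is 2/3, while `[2, 0]` against `[0, 2]` gives 1/6. How such a probability is converted into a similarity score, and how the piecewise approximation handles large samples, is specified in the paper rather than in this abstract.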
INDEX TERMS
Biomedical measurement, Approximation methods, Gene expression, Bioinformatics, Diseases, chemistry, Similarity measures, contingency tables, multivariate statistics, biology and genetics
CITATION
Chanchala D. Kaddi, R. Mitchell Parry, May D. Wang, "Multivariate Hypergeometric Similarity Measure", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.10, no. 6, pp. 1505-1516, Nov.-Dec. 2013, doi:10.1109/TCBB.2013.28