Proximity Measures for Clustering Gene Expression Microarray Data: A Validation Methodology and a Comparative Analysis
PrePrint
ISSN: 1545-5963
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TCBB.2013.9
Cluster analysis is usually the first step adopted to unveil information from gene expression microarray data. Besides selecting a clustering algorithm, choosing an appropriate proximity measure (similarity or distance) is of great importance to achieve satisfactory clustering results. Nevertheless, to date there are no comprehensive guidelines on how to choose proximity measures for clustering microarray data. Pearson correlation is the most widely used proximity measure, whereas the characteristics of other measures remain unexplored. In this paper, we investigate the choice of proximity measures for clustering microarray data by evaluating the performance of 16 proximity measures on 52 datasets from time-course and cancer experiments. Our results indicate that measures rarely employed in the gene expression literature can provide better results than commonly employed ones, such as Pearson, Spearman, and Euclidean distance. Given that different measures stood out in the time-course and cancer data evaluations, the choice of measure should be specific to each scenario. To evaluate measures on time-course data, we preprocessed and compiled 17 datasets from the microarray literature into a benchmark and introduce a new methodology called Intrinsic Biological Separation Ability (IBSA). Both can be employed in future research to assess the effectiveness of new measures for gene time-course data.
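As a rough illustration of why the choice of proximity measure matters, the sketch below computes the three commonly employed measures named in the abstract (Pearson, Spearman, and Euclidean distance) for two made-up expression profiles. The profiles, function names, and the `1 - correlation` conversion to a distance are assumptions chosen for illustration; this is not the paper's evaluation code, and it covers neither the full set of 16 measures nor the IBSA methodology.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def euclidean(x, y):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Hypothetical profiles: same trend over five conditions, different magnitude.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b = [10.0, 20.0, 30.0, 40.0, 50.0]

print("Pearson distance: ", 1 - pearson(gene_a, gene_b))
print("Spearman distance:", 1 - spearman(gene_a, gene_b))
print("Euclidean distance:", euclidean(gene_a, gene_b))
```

For these two profiles, the correlation-based distances are essentially zero because the trends match, while the Euclidean distance is large because the magnitudes differ. A clustering algorithm would therefore group the genes together under one measure and apart under the other.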
Index Terms:
Correlation, Equations, Cancer, Gene expression, Clustering algorithms, Time complexity, Machine learning, Information Technology and Systems, Database Management, Database Applications, Bioinformatics (genome or protein) databases, Clustering, classification, and association rules, Computing Methodologies, Artificial Intelligence, Learning
Citation:
Pablo A. Jaskowiak, Ricardo J. G. B. Campello, Ivan G. Costa Filho, "Proximity Measures for Clustering Gene Expression Microarray Data: A Validation Methodology and a Comparative Analysis," IEEE/ACM Transactions on Computational Biology and Bioinformatics, 25 Feb. 2013. IEEE Computer Society Digital Library. IEEE Computer Society, <http://doi.ieeecomputersociety.org/10.1109/TCBB.2013.9>

