Fourth IEEE Symposium on Bioinformatics and Bioengineering (BIBE'04)
A Method for Evaluating the Quality of String Dissimilarity Measures and Clustering Algorithms for EST Clustering
Taichung, Taiwan, ROC
May 19-21, 2004
ISBN: 0-7695-2173-8
We present a method for evaluating the suitability of different string dissimilarity measures and clustering algorithms for EST clustering, one of the main techniques used in transcriptome projects. The method generates simulated ESTs with user-specified parameters and then evaluates the quality of the clusterings produced under different dissimilarity measures and clustering algorithms. We implemented two tools for this purpose: ESTSim (EST Simulator), which generates simulated EST sequences from mRNAs/cDNAs using user-specified parameters, and ECLEST (Evaluator for CLusterings of ESTs), which computes and evaluates a clustering of a set of input ESTs, with the dissimilarity measure, the clustering algorithm, and the clustering validity index specified independently. We demonstrate the method on a sample of 699 cDNAs, from which we generated approximately 16,000 simulated ESTs. We conducted two experiments comparing subword-based dissimilarity measures with alignment-based ones and obtained statistically significant results.
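The evaluation loop described in the abstract (simulate ESTs whose source mRNA is known, cluster them under an interchangeable dissimilarity measure and clustering algorithm, then score the clustering with a validity index) can be illustrated with a small Python sketch. It is not ESTSim or ECLEST. The function names (`simulate_ests`, `qgram_distance`, `edit_distance`, `single_linkage`, `rand_index`) and all parameters are hypothetical. The sketch assumes uniformly placed reads with substitution errors only. It uses the q-gram distance as one example of a subword-based measure, unit-cost Levenshtein distance as one example of an alignment-based measure, single-linkage clustering, and the Rand index as the validity index.

```python
import random
from collections import Counter
from itertools import combinations


def simulate_ests(mrnas, n_per_mrna, length, error_rate, seed=0):
    """Toy EST simulator (hypothetical, not ESTSim): sample fixed-length
    substrings of each mRNA and apply random substitution errors."""
    rng = random.Random(seed)
    ests, labels = [], []
    for gene_id, mrna in enumerate(mrnas):
        for _ in range(n_per_mrna):
            # Assumption: read start positions are uniform along the mRNA.
            start = rng.randrange(max(1, len(mrna) - length + 1))
            read = list(mrna[start:start + length])
            for i, base in enumerate(read):
                if rng.random() < error_rate:
                    read[i] = rng.choice([b for b in "ACGT" if b != base])
            ests.append("".join(read))
            labels.append(gene_id)  # ground truth: index of the source mRNA
    return ests, labels


def qgram_distance(s, t, q=3):
    """Subword-based: L1 distance between the q-gram count profiles."""
    ps = Counter(s[i:i + q] for i in range(len(s) - q + 1))
    pt = Counter(t[i:i + q] for i in range(len(t) - q + 1))
    return sum(abs(ps[w] - pt[w]) for w in ps.keys() | pt.keys())


def edit_distance(s, t):
    """Alignment-based: unit-cost Levenshtein distance (two-row DP)."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1,              # delete a
                           cur[j - 1] + 1,           # insert b
                           prev[j - 1] + (a != b)))  # match or substitute
        prev = cur
    return prev[-1]


def single_linkage(seqs, dist, threshold):
    """Single linkage: join every pair with dist <= threshold (union-find)."""
    parent = list(range(len(seqs)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(len(seqs)), 2):
        if dist(seqs[i], seqs[j]) <= threshold:
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(seqs))]


def rand_index(pred, truth):
    """Fraction of item pairs on which the two partitions agree."""
    pairs = list(combinations(range(len(truth)), 2))
    agree = sum((pred[i] == pred[j]) == (truth[i] == truth[j]) for i, j in pairs)
    return agree / len(pairs)


if __name__ == "__main__":
    rng = random.Random(1)
    mrnas = ["".join(rng.choice("ACGT") for _ in range(140)) for _ in range(3)]
    ests, truth = simulate_ests(mrnas, n_per_mrna=4, length=100, error_rate=0.01)
    # Thresholds are arbitrary illustrative values, not settings from the paper.
    for name, dist, thr in [("q-gram (q=8)", lambda s, t: qgram_distance(s, t, 8), 140),
                            ("edit distance", edit_distance, 60)]:
        pred = single_linkage(ests, dist, thr)
        print(f"{name}: Rand index = {rand_index(pred, truth):.3f}")
```

Each simulated EST keeps the index of the mRNA it came from, so the correct clustering is known. This is what allows any combination of measure, algorithm, and validity index to be scored. A global edit distance penalizes the unaligned ends of partially overlapping reads, so here it is only a crude stand-in for the local or overlap alignments an alignment-based EST clusterer would more likely use.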
Index Terms:
string similarity and dissimilarity measures, EST clustering, transcriptome, simulated data, benchmarks
Citation:
Judith Zimmermann, Zsuzsanna Lipták, Scott Hazelhurst, "A Method for Evaluating the Quality of String Dissimilarity Measures and Clustering Algorithms for EST Clustering," Fourth IEEE Symposium on Bioinformatics and Bioengineering (BIBE'04), p. 301, 2004.