2012 IEEE 12th International Conference on Data Mining Workshops
Clustering Tandem Repeats via Trinucleotides
Brussels, Belgium
December 10, 2012
ISBN: 978-1-4673-5164-5
Tandem repeats in DNA sequences are highly relevant to biological phenomena and to diagnostic tools. Computational programs that discover tandem repeats generate a huge volume of data, which is often difficult to interpret without further organization. In this paper, we describe a new method for post-processing tandem repeats through clustering. Our work presents multiple ways of representing tandem repeats using the n-gram model, combined with different clustering distance measures. Analysis of these clusters for chromosome 1 of the human genome shows that clustering tandem repeats according to their 3-grams (trinucleotides) yields well-defined clusters. Our new, alignment-free method facilitates the analysis of the myriad tandem repeats that occur in the human genome, and we believe that this work will lead to new discoveries about the roles, origins, and significance of tandem repeats.
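To show what an alignment-free 3-gram representation can look like, here is a minimal sketch. It is illustrative only and is not the authors' implementation. Each repeat becomes a 64-dimensional vector of relative trinucleotide frequencies. The abstract does not name a specific distance measure or clustering algorithm, so this sketch makes its own choices: it uses cosine distance and simple threshold-based single-linkage grouping. The function names `trinucleotide_profile`, `cosine_distance`, and `single_linkage_clusters`, the threshold of 0.1, and the example sequences are all hypothetical.

```python
from collections import Counter
from itertools import product
import math

ALPHABET = "ACGT"
# All 64 possible trinucleotides in a fixed order, so every profile vector
# uses the same coordinates. (Assumption: sequences contain only A/C/G/T.)
TRINUCLEOTIDES = ["".join(p) for p in product(ALPHABET, repeat=3)]


def trinucleotide_profile(seq):
    """Relative frequency of each overlapping 3-gram in seq (hypothetical helper)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 3] for i in range(len(seq) - 2))
    total = sum(counts[t] for t in TRINUCLEOTIDES)  # ignores 3-grams with other symbols
    if total == 0:
        return [0.0] * len(TRINUCLEOTIDES)
    return [counts[t] / total for t in TRINUCLEOTIDES]


def cosine_distance(u, v):
    """1 - cosine similarity; one possible distance choice, not necessarily the paper's."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 1.0
    return 1.0 - dot / (nu * nv)


def single_linkage_clusters(profiles, threshold):
    """Join items whose distance is <= threshold, transitively (union-find)."""
    parent = list(range(len(profiles)))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(profiles)):
        for j in range(i + 1, len(profiles)):
            if cosine_distance(profiles[i], profiles[j]) <= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(profiles)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


# Toy input: two rotations of a CAG repeat and two rotations of an AT repeat.
repeats = ["CAGCAGCAGCAGCAG", "AGCAGCAGCAGCAGC", "ATATATATATAT", "TATATATATATA"]
profiles = [trinucleotide_profile(r) for r in repeats]
clusters = single_linkage_clusters(profiles, threshold=0.1)
print(clusters)  # [[0, 1], [2, 3]]
```

In this toy example, the two rotations of the CAG repeat share almost the same trinucleotide counts (cosine distance of about 0.018), so they are grouped together without any alignment step. The two AT-rich repeats share no 3-grams with the CAG repeats, so they fall into a separate cluster. For a real data set, a proper hierarchical or k-means implementation would likely be preferable to this quadratic pairwise loop.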
Index Terms:
Genomics, Clustering algorithms, DNA, Biological cells, Humans, Algorithm design and analysis, classification, tandem repeats, n-grams, clustering, human genome
Yupu Liang, Dina Sokol, Sarah Zelikovitz, "Clustering Tandem Repeats via Trinucleotides," 2012 IEEE 12th International Conference on Data Mining Workshops (ICDMW), pp. 64-71, 2012.