2007 IEEE International Conference on Granular Computing (GRC 2007)
Comparison of Machine Learning and Pattern Discovery Algorithms for the Prediction of Human Single Nucleotide Polymorphisms
San Jose, California
November 02-November 04
ISBN: 0-7695-3032-X
This paper compares machine learning techniques and pattern discovery algorithms for the prediction of human single nucleotide polymorphisms (SNPs). We selected six pattern discovery algorithms (YMF, Projection, Weeder, MotifSampler, AlignACE and ANN-Spec) and two machine learning techniques (Random Forests and K-Nearest Neighbours) and applied them to the DNA sequences flanking non-coding SNPs on human chromosome 21. We compared the pattern similarity amongst the methods and validated the predictions using known SNPs on chromosome 22. Parameterization of both machine learning and pattern discovery algorithms was critical to their performance. Memory usage was broadly constant amongst the pattern discovery algorithms, but CPU running time varied significantly between deterministic and probabilistic pattern discovery methods: on average, probabilistic methods ran 19 times slower than deterministic methods. This is the first demonstration of SNP prediction, as well as the first comparison of machine learning and pattern discovery algorithms in SNP prediction studies.
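As a rough illustration of the machine learning side of such a comparison, the following minimal sketch classifies a flanking sequence with a K-Nearest Neighbours vote. The abstract does not state how sequences were encoded, so the overlapping k-mer count features, the Euclidean distance, the function names (kmer_vector, knn_predict) and the toy sequences are all assumptions made for illustration, not the authors' method or data.

```python
from collections import Counter
from itertools import product
import math


def kmer_vector(seq, k=2):
    """Count overlapping k-mers in a DNA sequence, in a fixed ACGT order.

    Assumed feature encoding; the abstract does not specify one.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts[m] for m in kmers]


def knn_predict(train, query, n_neighbours=3):
    """Majority vote among the n_neighbours training vectors closest to query."""
    ranked = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in ranked[:n_neighbours])
    return votes.most_common(1)[0][0]


# Made-up toy sequences, not real data; label 1 = flanks a SNP, 0 = does not.
toy = [
    ("ACGTACGTAC", 1),
    ("ACGTTCGTAC", 1),
    ("GGGGCCCCGG", 0),
    ("GGGCCCCCGG", 0),
]
train = [(kmer_vector(seq), label) for seq, label in toy]
prediction = knn_predict(train, kmer_vector("ACGTACGTTC"), n_neighbours=3)
```

Both k and n_neighbours are the kind of parameters whose choice the abstract describes as critical to performance; the values used here are arbitrary.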
Citation:
Rui Yan, Paul C. Boutros, Igor Jurisica, Linda Z. Penn, "Comparison of Machine Learning and Pattern Discovery Algorithms for the Prediction of Human Single Nucleotide Polymorphisms," grc, pp.452, 2007 IEEE International Conference on Granular Computing (GRC 2007), 2007