Issue No.02 - March/April (2012 vol.9)
pp: 629-636
Z. Z. Feng, Dept. of Math. & Stat., Univ. of Guelph, Guelph, ON, Canada
Xiaojian Yang, Dept. of Animal & Poultry Sci., Univ. of Guelph, Guelph, ON, Canada
S. Subedi, Dept. of Math. & Stat., Univ. of Guelph, Guelph, ON, Canada
P. D. McNicholas, Dept. of Math. & Stat., Univ. of Guelph, Guelph, ON, Canada
ABSTRACT
Recent work concerning quantitative traits of interest has focused on selecting a small subset of single nucleotide polymorphisms (SNPs) from among the SNPs responsible for the phenotypic variation of the trait. When SNPs are treated as covariates, their large number and their association with neighboring SNPs pose challenges for variable selection. The sparsity and shrinkage of regression coefficients offered by the least absolute shrinkage and selection operator (LASSO) method appear attractive for SNP selection. Sparse partial least squares (SPLS) is also appealing, as it combines sparsity in subset selection with dimension reduction to handle correlations among SNPs. In this paper, we investigate the application of the LASSO and SPLS methods for selecting SNPs that predict quantitative traits. We evaluate the performance of both methods with different criteria and under different scenarios using simulation studies. Results indicate that these methods can be effective in selecting SNPs that predict quantitative traits but are limited by certain conditions. Both methods perform similarly overall, but each exhibits advantages over the other in given situations. Both methods are applied to Canadian Holstein cattle data to compare their performance.
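To make the selection idea concrete, the following is a minimal sketch of LASSO-based SNP selection on simulated data. It is not the authors' implementation. The sample size, the number of SNPs, the three causal SNPs and their effect sizes, the noise level, and the penalty value lam are all hypothetical choices made for illustration. Genotypes are drawn independently and uniformly from the allele counts 0/1/2, so the sketch deliberately omits the correlation among nearby SNPs that the abstract identifies as a challenge. The LASSO is fitted by cyclic coordinate descent with soft-thresholding, a standard algorithm that the abstract does not say the paper uses; SPLS is not shown.

```python
import random

def soft_threshold(z, lam):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def standardize(columns):
    """Center each column and scale it to unit (population) variance."""
    out = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = (sum((v - mean) ** 2 for v in col) / n) ** 0.5
        out.append([(v - mean) / sd for v in col])
    return out

def lasso_cd(columns, y, lam, n_iter=100):
    """Minimize (1/(2n))*||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent.

    `columns` holds standardized predictors (one list per SNP) and `y` is
    the centered phenotype, so no intercept is needed.
    """
    n, p = len(y), len(columns)
    beta = [0.0] * p
    resid = list(y)  # residual y - Xb; equals y while b = 0
    for _ in range(n_iter):
        for j in range(p):
            xj = columns[j]
            # Covariance of SNP j with the partial residual (SNP j excluded);
            # adding beta[j] is valid because each column has unit variance.
            rho = sum(xj[i] * resid[i] for i in range(n)) / n + beta[j]
            new = soft_threshold(rho, lam)
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * xj[i]
                beta[j] = new
    return beta

# Hypothetical simulation settings (assumptions, not taken from the paper).
rng = random.Random(0)
n, p = 200, 30
causal = {2: 1.0, 11: -0.8, 25: 0.6}  # SNP index -> additive effect

# Genotypes coded as allele counts 0/1/2, one list per SNP.
geno = [[rng.choice([0, 1, 2]) for _ in range(n)] for _ in range(p)]
y = [sum(eff * geno[j][i] for j, eff in causal.items()) + rng.gauss(0, 0.5)
     for i in range(n)]

y_mean = sum(y) / n
y_centered = [v - y_mean for v in y]
X = standardize(geno)

beta = lasso_cd(X, y_centered, lam=0.2)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("selected SNP indices:", selected)
```

In practice the penalty would be tuned, for example by cross-validation, rather than fixed in advance; a larger lam yields a sparser set of selected SNPs at the cost of stronger shrinkage of the retained coefficients.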
INDEX TERMS
Training, Predictive models, Biological cells, Bioinformatics, Correlation, Input variables, Accuracy, statistical computing, regression analysis
CITATION
Z. Z. Feng, Xiaojian Yang, S. Subedi, P. D. McNicholas, "The LASSO and Sparse Least Squares Regression Methods for SNP Selection in Predicting Quantitative Traits", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 2, pp. 629-636, March/April 2012, doi:10.1109/TCBB.2011.139