Disease Liability Prediction from Large Scale Genotyping Data Using Classifiers with a Reject Option
January/February 2012 (vol. 9 no. 1)
pp. 88-97
J. R. Quevedo, Dept. de Inf., Univ. de Oviedo en Gijon, Gijon, Spain
A. Bahamonde, Centro de Intel. Artificial, Univ. de Oviedo en Gijon, Gijon, Spain
M. Perez-Enciso, Dept. de Cienc. Animal i dels Aliments, Univ. Autonoma de Barcelona, Bellaterra, Spain
O. Luaces, Centro de Intel. Artificial, Univ. de Oviedo en Gijon, Gijon, Spain
Genome-wide association (GWA) studies try to identify the genetic polymorphisms associated with variation in phenotypes. However, even the most significant genetic variants may have little power to predict the future development of common diseases. We study the prediction of the risk of developing a disease given genome-wide genotypic data using classifiers with a reject option, which make a prediction only when they are sufficiently certain and may decline to classify in doubtful situations. To test the reliability of our proposal, we used the Wellcome Trust Case Control Consortium (WTCCC) data set, comprising 14,000 cases of seven common human diseases and 3,000 shared controls.
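
The idea of a reject option can be illustrated with a minimal Python sketch. It is not the method used in the article, whose details are not given in this abstract. It assumes a binary classifier that already outputs an estimated probability of being a case, and it applies a simple confidence threshold. The names `predict_with_reject` and `summarize` and the default threshold of 0.8 are hypothetical choices made for illustration only.

```python
from typing import List, Optional, Sequence, Tuple


def predict_with_reject(p_case: float, threshold: float = 0.8) -> Optional[int]:
    """Return 1 (case), 0 (control), or None when the prediction is rejected.

    p_case is an estimated probability of the "case" class (assumed input).
    threshold is the minimum confidence required to commit to a class.
    """
    if not 0.0 <= p_case <= 1.0:
        raise ValueError("p_case must lie in [0, 1]")
    if not 0.5 <= threshold <= 1.0:
        raise ValueError("threshold must lie in [0.5, 1]")
    # Confidence is the probability of the more likely class.
    confidence = max(p_case, 1.0 - p_case)
    if confidence < threshold:
        return None  # doubtful situation: decline to classify
    return 1 if p_case >= 0.5 else 0


def summarize(
    probs: Sequence[float], labels: Sequence[int], threshold: float = 0.8
) -> Tuple[float, Optional[float]]:
    """Return (reject rate, accuracy on the accepted examples)."""
    if len(probs) == 0 or len(probs) != len(labels):
        raise ValueError("probs and labels must be non-empty and equally long")
    decisions: List[Optional[int]] = [
        predict_with_reject(p, threshold) for p in probs
    ]
    accepted = [(d, y) for d, y in zip(decisions, labels) if d is not None]
    reject_rate = 1.0 - len(accepted) / len(decisions)
    # Accuracy is undefined when every example was rejected.
    accuracy = (
        sum(d == y for d, y in accepted) / len(accepted) if accepted else None
    )
    return reject_rate, accuracy


if __name__ == "__main__":
    # Toy probabilities and labels, invented for this illustration.
    print(summarize([0.95, 0.05, 0.6, 0.9], [1, 0, 1, 0], threshold=0.8))
```

Raising the threshold in this sketch rejects more individuals and typically raises accuracy on those that remain, which is the trade-off a classifier with a reject option has to balance.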


Index Terms:
polymorphism, diseases, genetics, genomics, genome-wide association, disease liability prediction, large scale genotyping data, genetic polymorphism, genome-wide genotypic data, Wellcome Trust Case Control Consortium data set, WTCCC data set, Diseases, Bioinformatics, Diabetes, Biological cells, Input variables, Genomics, risk of common human diseases, Genome-wide analysis, classification with a reject option
Citation:
J. R. Quevedo, A. Bahamonde, M. Perez-Enciso, O. Luaces, "Disease Liability Prediction from Large Scale Genotyping Data Using Classifiers with a Reject Option," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 1, pp. 88-97, Jan.-Feb. 2012, doi:10.1109/TCBB.2011.44