Data Mining Techniques for the Identification of Genes with Expression Levels Related to Breast Cancer Prognosis
13th IEEE International Conference on BioInformatics and BioEngineering (2009)
June 22, 2009 to June 24, 2009
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/BIBE.2009.37
Predicting clinical outcomes for cancer patients from their genetic make-up is a difficult and important problem. With the goal of identifying the genes most correlated with breast cancer prognosis, we used data mining techniques to study the gene expression values of breast cancer patients with known clinical outcome. The focus of our work was the creation of a classification model to be used in clinical practice to support therapy prescription. We randomly subdivided a gene expression dataset of 311 samples into a training set, used to learn the model, and a test set, used to validate the model and assess its performance. We evaluated several learning algorithms in both unweighted and weighted forms; we defined the weighted forms to take into account the different clinical importance of false positive and false negative classifications. Based on our results, the weighted algorithms, especially when used in their combined form, appear to provide better results.
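As a rough illustration of the two ideas in the abstract (a random training/test split and an evaluation that penalizes false positives and false negatives differently), a minimal sketch might look like the following. The function names, the 70% training fraction, the label encoding, and the cost values are all assumptions made for this example; the abstract does not state the split proportion or the weights the authors used, and this is not their implementation.

```python
import random


def split_samples(n_samples, train_fraction, seed):
    """Randomly partition sample indices into training and test sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_train = int(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]


def weighted_cost(y_true, y_pred, fp_cost=1.0, fn_cost=1.0):
    """Average misclassification cost with separate FP and FN penalties.

    Assumed label encoding: 1 = poor prognosis (positive class),
    0 = good prognosis (negative class).
    """
    total = 0.0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 0:
            total += fp_cost  # false positive
        elif pred == 0 and truth == 1:
            total += fn_cost  # false negative
    return total / len(y_true)


# 311 samples as in the abstract; the 0.7 fraction is hypothetical.
train_idx, test_idx = split_samples(311, train_fraction=0.7, seed=0)

# Toy labels: one false negative and one false positive out of four.
example_cost = weighted_cost([1, 0, 1, 0], [0, 1, 1, 0], fp_cost=1.0, fn_cost=5.0)
```

Setting `fn_cost` higher than `fp_cost` expresses the assumption that missing a poor-prognosis patient is clinically worse than a false alarm; a model selected by this cost rather than by plain accuracy would be pushed toward fewer false negatives.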
data mining, gene expression, breast cancer prognosis
Marco Pizzera, Pier Luca Lanzi, Enzo Medico, Gabriele Giarratana, Marco Masseroli, "Data Mining Techniques for the Identification of Genes with Expression Levels Related to Breast Cancer Prognosis", 13th IEEE International Conference on BioInformatics and BioEngineering, pp. 295-300, 2009, doi:10.1109/BIBE.2009.37