19th IEEE Symposium on Computer-Based Medical Systems (CBMS'06)
On Preprocessing of SELDI-MS Data and its Evaluation
Salt Lake City, Utah
June 22-June 23
ISBN: 0-7695-2517-1
Mass spectrometry is becoming an important tool in proteomics. Mass spectral data are characterized by very high dimensionality and a high level of redundancy. Both properties pose serious challenges for knowledge discovery and push existing tools to their limits. We tackle both through a preprocessing pipeline that drastically reduces the dimensionality and redundancy of the initial representation in order to focus on biologically relevant information. In essence, preprocessing performs feature extraction in a manner that reflects domain knowledge. We propose a framework for evaluating this pipeline, and indeed any mass spectrometry preprocessing pipeline, based on how well it conserves discriminatory information. The discriminatory information content of a given representation is measured objectively by the classification performance of several classification algorithms evaluated on that representation. This approach also allows us to compare a number of preprocessing alternatives, namely whether peaks should be represented by their intensities or by their areas and how non-observed peaks should be treated, and to show which alternative is the most informative.
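The evaluation idea (score a representation by how well several classifiers can separate the classes on it) can be illustrated with a minimal sketch. It assumes spectra that are already aligned on a common m/z grid (list index = m/z bin). The peak detector (local maxima above a threshold), the window-sum peak area, the two treatments of non-observed peaks ('zero' versus reading the raw signal at that position), and the use of 1-nearest-neighbour and nearest-centroid classifiers scored by leave-one-out accuracy are all illustrative assumptions, not the pipeline or classifiers used by the authors; every function name is hypothetical.

```python
"""Hypothetical sketch: score peak representations of aligned spectra by
the mean leave-one-out accuracy of several simple classifiers."""
from math import dist
from statistics import mean
from typing import List, Sequence

Vector = List[float]


def detect_peaks(signal: Sequence[float], threshold: float) -> List[int]:
    """Indices of local maxima whose height is at least `threshold` (assumed detector)."""
    return [i for i in range(1, len(signal) - 1)
            if signal[i] >= threshold
            and signal[i] > signal[i - 1] and signal[i] >= signal[i + 1]]


def peak_value(signal: Sequence[float], i: int, half_width: int, mode: str) -> float:
    """'intensity': height at bin i; 'area': sum of the bins within half_width of i."""
    if mode == "intensity":
        return float(signal[i])
    lo, hi = max(0, i - half_width), min(len(signal), i + half_width + 1)
    return float(sum(signal[lo:hi]))


def build_features(spectra: List[Sequence[float]], threshold: float,
                   half_width: int, mode: str, missing: str) -> List[Vector]:
    """One feature per peak position detected in any spectrum.

    missing='zero': a peak not detected in a spectrum is encoded as 0.0.
    missing='raw':  its value is read from that spectrum's raw signal anyway.
    """
    peak_sets = [set(detect_peaks(s, threshold)) for s in spectra]
    positions = sorted(set().union(*peak_sets))
    rows = []
    for signal, found in zip(spectra, peak_sets):
        row = []
        for p in positions:
            observed = any(abs(p - q) <= half_width for q in found)
            if observed or missing == "raw":
                row.append(peak_value(signal, p, half_width, mode))
            else:
                row.append(0.0)
        rows.append(row)
    return rows


def one_nn(train_X, train_y, x):
    """1-nearest-neighbour with Euclidean distance (first index wins ties)."""
    best = min(range(len(train_X)), key=lambda j: dist(train_X[j], x))
    return train_y[best]


def nearest_centroid(train_X, train_y, x):
    """Assign x to the class whose mean feature vector is closest."""
    centroids = {}
    for label in sorted(set(train_y)):
        members = [v for v, y in zip(train_X, train_y) if y == label]
        centroids[label] = [mean(col) for col in zip(*members)]
    return min(centroids, key=lambda c: dist(centroids[c], x))


def loo_accuracy(X, y, classify) -> float:
    """Leave-one-out accuracy of `classify` on feature matrix X with labels y."""
    hits = 0
    for i in range(len(X)):
        pred = classify(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i])
        hits += pred == y[i]
    return hits / len(X)


def discriminatory_score(X, y, classifiers) -> float:
    """Mean leave-one-out accuracy over several classifiers."""
    return mean(loo_accuracy(X, y, c) for c in classifiers)


# Toy aligned spectra (list index = m/z bin): class A peaks at bin 3,
# class B at bin 7, and both share a small peak at bin 5.
SPECTRA = [
    [0, 0, 1, 5, 1, 2, 1, 0, 0, 0],
    [0, 0, 1, 6, 1, 2, 1, 0, 0, 0],
    [0, 0, 1, 4, 1, 2, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 2, 1, 5, 1, 0],
    [0, 0, 0, 0, 1, 2, 1, 6, 1, 0],
    [0, 0, 0, 0, 1, 2, 1, 4, 1, 0],
]
LABELS = ["A", "A", "A", "B", "B", "B"]

if __name__ == "__main__":
    for mode in ("intensity", "area"):
        for missing in ("zero", "raw"):
            X = build_features(SPECTRA, 1.5, 1, mode, missing)
            score = discriminatory_score(X, LABELS, [one_nn, nearest_centroid])
            print(f"{mode:9s} {missing:4s} {score:.2f}")
```

On this toy data every combination prints 1.00 because the classes are perfectly separable; the treatments of non-observed peaks differ only in the area features (the first spectrum becomes [7.0, 4.0, 0.0] with 'zero' versus [7.0, 4.0, 1.0] with 'raw'). Differences in the score across representations are what such a comparison would look for on real spectra.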
Citation:
Julien Prados, Alexandros Kalousis, Melanie Hilario, "On Preprocessing of SELDI-MS Data and its Evaluation," CBMS, pp. 953-958, 19th IEEE Symposium on Computer-Based Medical Systems (CBMS'06), 2006