Multiple Peak Alignment in Sequential Data Analysis: A Scale-Space-Based Approach
July-September 2006 (vol. 3 no. 3)
pp. 208-219
In this paper, we address the multiple peak alignment problem in sequential data analysis with an approach based on Gaussian scale-space theory. We assume that multiple sets of detected peaks are the observed samples of a set of common peaks. We also assume that the locations of the observed peaks follow unimodal distributions (e.g., normal distribution) with their means equal to the corresponding locations of the common peaks and variances reflecting the extent of their variation. Under these assumptions, we convert the problem of estimating the locations of an unknown number of common peaks from multiple sets of detected peaks into a much simpler problem of searching for local maxima in the scale-space representation. The optimization of the scale parameter is achieved using an energy minimization approach. We compare our approach with a hierarchical clustering method using both simulated data and real mass spectrometry data. We also demonstrate the merit of extending the binary peak detection method (i.e., a candidate is considered either as a peak or as a nonpeak) with a quantitative scoring measure-based approach (i.e., we assign to each candidate a possibility of being a peak).
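As a rough illustration of the core idea in the abstract, the sketch below pools detected peak locations from several sets, smooths them with a Gaussian kernel at a single fixed scale, and reports the local maxima as candidate common peaks. It is not the authors' implementation: the function names (`scale_space`, `local_maxima`, `common_peaks`), the grid step, the example data, and the optional per-peak `weights` (a stand-in for a quantitative peak score) are all assumptions made for illustration, and the scale `sigma` is chosen by hand here rather than by the energy minimization the paper describes.

```python
import numpy as np

def scale_space(peaks, grid, sigma, weights=None):
    """Evaluate a (weighted) sum of Gaussians centered at each peak on the grid."""
    peaks = np.asarray(peaks, dtype=float)
    # Assumption: weights, if given, hold one score per pooled peak.
    w = np.ones_like(peaks) if weights is None else np.asarray(weights, dtype=float)
    diff = grid[:, None] - peaks[None, :]           # shape (n_grid, n_peaks)
    return (w * np.exp(-0.5 * (diff / sigma) ** 2)).sum(axis=1)

def local_maxima(values):
    """Return indices of interior samples that are higher than their left
    neighbor and at least as high as their right neighbor."""
    v = np.asarray(values)
    return np.where((v[1:-1] > v[:-2]) & (v[1:-1] >= v[2:]))[0] + 1

def common_peaks(peak_sets, sigma, step=0.01, weights=None):
    """Estimate common peak locations at one fixed scale sigma (hand-picked)."""
    pooled = np.concatenate([np.asarray(p, dtype=float) for p in peak_sets])
    grid = np.arange(pooled.min() - 3 * sigma, pooled.max() + 3 * sigma + step, step)
    smoothed = scale_space(pooled, grid, sigma, weights)
    return grid[local_maxima(smoothed)]

# Hypothetical example: three peak lists with small location jitter;
# the second list is missing the peak near 30.
peak_sets = [[10.0, 20.1, 30.0], [10.2, 19.9], [9.9, 20.0, 30.3]]
found = common_peaks(peak_sets, sigma=0.5)
print(found)
```

With this toy input, the smoothed curve has one maximum per cluster of jittered peaks, so the number of common peaks does not need to be specified in advance. A larger `sigma` merges nearby clusters and a smaller one splits them, which is why the choice of scale matters.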


Index Terms:
Biomarker discovery, peak identification, multiple peak alignment, scale-space, prior information, energy minimization, parameter optimization.
Weichuan Yu, Xiaoye Li, Junfeng Liu, Baolin Wu, Kenneth R. Williams, Hongyu Zhao, "Multiple Peak Alignment in Sequential Data Analysis: A Scale-Space-Based Approach," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 3, no. 3, pp. 208-219, July-Sept. 2006, doi:10.1109/TCBB.2006.41