Biological Sequence Classification with Multivariate String Kernels
Sept.-Oct. 2013 (vol. 10 no. 5)
pp. 1201-1210
Pavel P. Kuksa, NEC Laboratories America Inc, Princeton
String kernel-based machine learning methods have been highly successful in the analysis of structured and sequential data. They often exhibit state-of-the-art performance on sequence analysis tasks such as biological sequence classification, remote homology detection, and protein superfamily and fold prediction. However, typical string kernel methods rely on the analysis of discrete 1D string data (e.g., DNA or amino acid sequences). In this paper, we address multiclass biological sequence classification problems using multivariate representations in the form of sequences of feature vectors (as in biological sequence profiles, or sequences of individual amino acid physicochemical descriptors) and a class of multivariate string kernels that exploit these representations. On three protein sequence classification tasks, the proposed multivariate representations and kernels show significant improvements of 15-20 percent over existing state-of-the-art sequence classification methods.
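To make the idea of a string kernel over sequences of feature vectors concrete, here is a minimal sketch. It is not the paper's algorithm, which the abstract does not specify. It assumes one simple approach suggested by the index term "Quantization": each feature vector is binned into a discrete symbol, and a standard k-mer (spectrum) kernel then counts the k-mers the two symbol sequences share. The function names, the value range [0, 1], the bin count, and k are all hypothetical choices made for illustration.

```python
from collections import Counter
import math


def quantize(vectors, n_bins=4, lo=0.0, hi=1.0):
    """Map each feature vector to a tuple of bin indices (one discrete symbol).

    Assumes every feature lies in [lo, hi]; values outside are clipped.
    """
    width = (hi - lo) / n_bins
    return [
        tuple(min(n_bins - 1, max(0, int((x - lo) / width))) for x in vec)
        for vec in vectors
    ]


def spectrum_features(symbols, k=2):
    """Count every contiguous run of k symbols (k-mer) in the sequence."""
    return Counter(tuple(symbols[i:i + k]) for i in range(len(symbols) - k + 1))


def spectrum_kernel(seq_a, seq_b, k=2, n_bins=4):
    """Inner product of the k-mer count vectors of two quantized sequences."""
    fa = spectrum_features(quantize(seq_a, n_bins), k)
    fb = spectrum_features(quantize(seq_b, n_bins), k)
    return sum(count * fb[kmer] for kmer, count in fa.items())


def normalized_kernel(seq_a, seq_b, k=2, n_bins=4):
    """Cosine-normalized kernel value, so that K(x, x) = 1 for non-empty spectra."""
    kab = spectrum_kernel(seq_a, seq_b, k, n_bins)
    kaa = spectrum_kernel(seq_a, seq_a, k, n_bins)
    kbb = spectrum_kernel(seq_b, seq_b, k, n_bins)
    if kaa == 0 or kbb == 0:
        return 0.0
    return kab / math.sqrt(kaa * kbb)


# Toy sequences of 2-dimensional feature vectors (made-up values).
seq_a = [[0.1, 0.9], [0.6, 0.4], [0.1, 0.9]]
seq_b = [[0.2, 0.8], [0.7, 0.3]]
similarity = normalized_kernel(seq_a, seq_b)
```

The resulting kernel values could be passed to any kernel classifier, such as a support vector machine with a precomputed kernel matrix. Exact matching on quantized symbols is the simplest design; a finer or learned quantization would trade sensitivity against sparsity of the k-mer counts.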
Index Terms:
Sequential analysis, Kernel, Amino acids, Protein sequence, Quantization, Machine learning, Kernel methods, Biological sequence classification
Citation:
Pavel P. Kuksa, "Biological Sequence Classification with Multivariate String Kernels," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 5, pp. 1201-1210, Sept.-Oct. 2013, doi:10.1109/TCBB.2013.15