2010 IEEE 26th International Conference on Data Engineering (ICDE 2010) (2010)
Long Beach, CA, USA
Mar. 1, 2010 to Mar. 6, 2010
Chang Sheng , National University of Singapore, Singapore
Wynne Hsu , National University of Singapore, Singapore
Mong Li Lee , National University of Singapore, Singapore
Joo Chuan Tong , Institute of Infocomm Research, Singapore
See-Kiong Ng , Institute of Infocomm Research, Singapore
The increasing infectious disease outbreaks has led to a need for new research to better understand the disease's origins, epidemiological features and pathogenicity caused by fast-mutating, fast-spreading viruses. Traditional sequence analysis methods do not take into account the spatio-temporal dynamics of rapidly evolving and spreading viral species. They are also focused on identifying single-point mutations. In this paper, we propose a novel approach that incorporates space-time relationships for studying changes in protein sequences from fast mutating viruses. We aim to detect both single-point mutations as well as k-mutations in the viral sequences. We define the problem of mutation chain pattern mining and design algorithms to discover valid mutation chains. Compact data structures to facilitate the mining process as well as pruning strategies to increase the scalability of the algorithms are devised. Experiments on both synthetic datasets and real world influenza A virus dataset show that our algorithms are scalable and effective in discovering mutations that occur geographically over time.
J. C. Tong, W. Hsu, C. Sheng, M. L. Lee and S. Ng, "Mining mutation chains in biological sequences," 2010 IEEE 26th International Conference on Data Engineering (ICDE 2010)(ICDE), Long Beach, CA, USA, 2010, pp. 473-484.