Long Beach, CA, USA
Mar. 1, 2010 to Mar. 6, 2010
Wynne Hsu, National University of Singapore, Singapore
Mong Li Lee, National University of Singapore, Singapore
Joo Chuan Tong, Institute of Infocomm Research, Singapore
See-Kiong Ng, Institute of Infocomm Research, Singapore
The increasing number of infectious disease outbreaks has led to a need for new research to better understand the origins, epidemiological features, and pathogenicity of diseases caused by fast-mutating, fast-spreading viruses. Traditional sequence analysis methods do not take into account the spatio-temporal dynamics of rapidly evolving and spreading viral species, and they focus on identifying single-point mutations. In this paper, we propose a novel approach that incorporates space-time relationships for studying changes in protein sequences from fast-mutating viruses. We aim to detect both single-point mutations and k-mutations in the viral sequences. We define the problem of mutation chain pattern mining and design algorithms to discover valid mutation chains. We devise compact data structures to facilitate the mining process and pruning strategies to improve the scalability of the algorithms. Experiments on both synthetic datasets and a real-world influenza A virus dataset show that our algorithms are scalable and effective in discovering mutations that spread geographically over time.
Wynne Hsu, Mong Li Lee, Joo Chuan Tong, See-Kiong Ng, "Mining mutation chains in biological sequences", IEEE International Conference on Data Engineering (ICDE), 2010, pp. 473-484, doi:10.1109/ICDE.2010.5447869