Data Mining on DNA Sequences of Hepatitis B Virus
PrePrint
ISSN: 1545-5963
KwongSak Leung, The Chinese University of Hong Kong, Hong Kong
KinHong Lee, The Chinese University of Hong Kong, Hong Kong
JinFeng Wang, The Chinese University of Hong Kong, Hong Kong
Eddie YT Ng, The Chinese University of Hong Kong, Hong Kong
Henry LY Chan, The Chinese University of Hong Kong, Hong Kong
Stephen KW Tsui, The Chinese University of Hong Kong, Hong Kong
Tony SK Mok, The Chinese University of Hong Kong, Hong Kong
Pete Chi-Hang Tse, The Chinese University of Hong Kong, Hong Kong
Joseph JY Sung, The Chinese University of Hong Kong, Hong Kong
This study introduces a data mining framework comprising molecular evolution analysis, clustering, feature selection, classifier learning, and classification. Our research group collected HBV DNA sequences, of either genotype B or C, from over 200 patients specifically for this project. In the molecular evolution analysis and clustering, three subgroups were identified in genotype C, and a clustering method was developed to separate them. In the feature selection process, potential markers are selected based on Information Gain for subsequent classifier learning. Meaningful rules are then learned by our Rule Learning algorithm, which is based on an evolutionary algorithm. A new classification method based on the Nonlinear Integral has also been developed. The good performance of this method comes from its use of a fuzzy measure and the associated nonlinear integral: the nonadditivity of the fuzzy measure reflects the importance of the feature attributes as well as their interactions. Both classifiers give explicit information on the importance of individual mutated sites, and of their interactions, for the classification (potential causes of liver cancer in our case). A thorough comparison of these two methods with existing methods is detailed.
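The feature selection step can be illustrated with a minimal sketch of Information Gain ranking. It assumes aligned sequences of equal length and one class label per patient; the function names, toy sequences, and labels are hypothetical and are not taken from the paper, whose actual procedure may differ.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(column, labels):
    """Reduction in label entropy after splitting on the base at one site."""
    total = len(labels)
    groups = {}
    for base, label in zip(column, labels):
        groups.setdefault(base, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_sites(sequences, labels):
    """Return (gain, site_index) pairs, highest gain first."""
    n_sites = len(sequences[0])
    gains = [(information_gain([s[i] for s in sequences], labels), i)
             for i in range(n_sites)]
    return sorted(gains, key=lambda g: (-g[0], g[1]))


# Hypothetical toy alignment: only site 0 separates the two classes.
sequences = ["ACGT", "ACGA", "TCGT", "TCGA"]
labels = ["case", "case", "control", "control"]
ranking = rank_sites(sequences, labels)
```

In this toy data the first site separates the two classes perfectly, so it ranks first; sites whose bases are identical across patients, or unrelated to the label, carry no gain.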
Index Terms:
Data mining, Bioinformatics (genome or protein) databases, Mining methods and algorithms
Citation:
KwongSak Leung, KinHong Lee, JinFeng Wang, Eddie YT Ng, Henry LY Chan, Stephen KW Tsui, Tony SK Mok, Pete Chi-Hang Tse, Joseph JY Sung, "Data Mining on DNA Sequences of Hepatitis B Virus," IEEE/ACM Transactions on Computational Biology and Bioinformatics, 14 Jan. 2009. IEEE Computer Society Digital Library. IEEE Computer Society, <http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.6>