Issue No. 03 - May-June (2013 vol. 10)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TCBB.2013.38
Md. Muksitul Haque, Washington State University, Pullman
Lawrence B. Holder, Washington State University, Pullman
Michael K. Skinner, Washington State University, Pullman
Diane J. Cook, Washington State University, Pullman
Active learning is a supervised learning technique that reduces the number of examples required to build a successful classifier, because the learner can choose the data it learns from. This technique holds promise for many biological domains in which classified examples are expensive and time-consuming to obtain. Most traditional active learning methods pose very specific queries to the Oracle (e.g., a human expert), asking it to label a single unlabeled example. Such an example may consist of numerous features, many of which are irrelevant. Removing those features yields a shorter query containing only relevant features, which is easier for the Oracle to answer. We propose a generalized query-based active learning (GQAL) approach that constructs generalized queries based on multiple instances. By constructing appropriately generalized queries, we can achieve higher accuracy than traditional active learning methods. We apply our active learning method to find differentially methylated regions (DMRs) in DNA. DMRs are DNA locations in the genome that are known to be involved in tissue differentiation, epigenetic regulation, and disease. We also apply our method to 13 other data sets and show that it outperforms another popular active learning technique.
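The idea of a generalized query can be illustrated with a toy sketch. This is not the GQAL algorithm from the paper; it only assumes that a set of relevant feature indices is already known, and it shows how dropping the remaining features turns a single-instance query into one that covers several pool instances at once. The feature values, the `relevant` indices, the label "DMR" as an Oracle answer, and all function names are hypothetical and chosen purely for illustration.

```python
from typing import Dict, Optional, Sequence, Tuple

WILDCARD = None  # marks a feature that the query leaves unconstrained

Query = Tuple[Optional[str], ...]


def generalize(instance: Sequence[str], relevant: Sequence[int]) -> Query:
    """Keep the features at the `relevant` indices and wildcard the rest."""
    keep = set(relevant)
    return tuple(v if i in keep else WILDCARD for i, v in enumerate(instance))


def matches(query: Query, instance: Sequence[str]) -> bool:
    """An instance matches if it agrees with every non-wildcard feature."""
    return all(q is WILDCARD or q == v for q, v in zip(query, instance))


def label_by_query(pool: Sequence[Sequence[str]], query: Query,
                   answer: str) -> Dict[int, str]:
    """Assign the Oracle's single answer to every pool instance the query covers."""
    return {i: answer for i, x in enumerate(pool) if matches(query, x)}


# Hypothetical unlabeled pool; the four features are made up for this sketch.
pool = [
    ("high", "cpg_island", "chr1", "exon"),
    ("high", "cpg_island", "chr7", "intron"),
    ("low", "cpg_island", "chr1", "exon"),
    ("high", "open_sea", "chr2", "exon"),
]

# Assume features 0 and 1 were judged relevant; features 2 and 3 are dropped.
query = generalize(pool[0], relevant=[0, 1])

# One (hypothetical) Oracle answer now labels every matching instance.
labels = label_by_query(pool, query, answer="DMR")

print(query)
print(labels)
```

In this sketch the shortened query covers the first two pool instances, so a single answer from the Oracle labels both of them, whereas a traditional instance-specific query would have labeled only one.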
Training, Learning systems, DNA, Uncertainty, Bioinformatics, Accuracy
M. M. Haque, L. B. Holder, M. K. Skinner and D. J. Cook, "Generalized Query-Based Active Learning to Identify Differentially Methylated Regions in DNA," in IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 3, pp. 632-644, 2013.