Circuits, Communications and Systems, Pacific-Asia Conference on (2009)
May 16, 2009 to May 17, 2009
This paper reports a comparative study of four machine learning methods for medical text categorization: k Nearest Neighbor (kNN), Support Vector Machines (SVM), Naïve Bayes (NB) and the Clonal Selection Algorithm Based on Antibody Density (CSABAD). CSABAD is an improved immune algorithm that we propose. Following the clonal selection principle and a density control mechanism, only those cells that have higher affinity and lower density are selected to proliferate. In addition, we propose an improved approach to computing term weights in document indexing, called Term Frequency, Inverted Document Frequency and Inverted Entropy (TFIDFIE). It considers the distribution, within the training set, of the documents in which a term occurs. Our experiments show that SVM and CSABAD significantly outperform kNN and Naïve Bayes, and that TFIDFIE is more effective than TFIDF on the OHSCAL data set.
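The abstract does not give the TFIDFIE formula, so the following is only a minimal sketch of how an entropy factor could be attached to standard TFIDF. It assumes, purely for illustration, that the "inverted entropy" of a term is one minus the normalized entropy of the term's document counts across categories. The function name `tfidf_ie_weights`, the helper `inverted_entropy`, and this particular formula are hypothetical and may differ from what the paper defines.

```python
import math
from collections import Counter


def tfidf_ie_weights(docs, labels):
    """Illustrative TF * IDF * IE weighting (assumed form, not the paper's formula).

    docs   -- list of token lists, one per training document
    labels -- category label of each document
    Returns one {term: weight} dict per document.
    """
    n_docs = len(docs)
    n_categories = len(set(labels))

    # df[t]: number of documents containing t.
    # df_cat[t][c]: number of documents of category c containing t.
    df = Counter()
    df_cat = {}
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            df[term] += 1
            df_cat.setdefault(term, Counter())[label] += 1

    def inverted_entropy(term):
        # ASSUMPTION: IE = 1 - H(term) / log(number of categories), where H is
        # the entropy of the term's document distribution over categories.
        if n_categories < 2:
            return 1.0
        probs = [count / df[term] for count in df_cat[term].values()]
        entropy = -sum(p * math.log(p) for p in probs)
        return 1.0 - entropy / math.log(n_categories)

    weights = []
    for tokens in docs:
        tf = Counter(tokens)
        weights.append({
            term: count * math.log(n_docs / df[term]) * inverted_entropy(term)
            for term, count in tf.items()
        })
    return weights
```

Under this assumed form, a term whose documents are spread evenly over all categories receives a weight near zero, while a term concentrated in a single category keeps its full TFIDF weight.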
medical text categorization, machine learning, immune algorithm, document indexing
H. Zhou, J. Tan, Q. Zhang, K. He and W. Tao, "Machine Learning Methods for Medical Text Categorization," 2009 Pacific-Asia Conference on Circuits, Communications and Systems (PACCS 2009), Chengdu, 2009, pp. 494-497.