Issue No.05 - Sept.-Oct. (2013 vol.10)
pp: 1211-1217
Said Bleik, Inf. Syst. Dept., Univ. Heights, Newark, NJ, USA
Meenakshi Mishra, Dept. of Electr. Eng. & Comput. Sci., Univ. of Kansas, Lawrence, KS, USA
Jun Huan, Dept. of Electr. Eng. & Comput. Sci., Univ. of Kansas, Lawrence, KS, USA
Min Song, Dept. of Libr. & Inf. Sci., Yonsei Univ., Seoul, South Korea
ABSTRACT
Recently, graph representations of text have shown improved performance over conventional bag-of-words representations in text categorization applications. In this paper, we present a graph-based representation for biomedical articles and use graph kernels to classify those articles into high-level categories. In our representation, common biomedical concepts and semantic relationships are identified with the help of an existing ontology and are used to build a rich graph structure. This structure provides a consistent feature set and preserves additional semantic information that could improve a classifier's performance. We classify the graphs using both a set-based graph kernel, which is capable of dealing with the disconnected nature of the graphs, and a simple linear kernel. Finally, we report results comparing the classification performance of the kernel classifiers with that of common text-based classifiers.
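The abstract does not give the kernel definitions, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each article is reduced to a set of concept nodes and a set of labeled edges (concept, relation, concept), and it uses a plain intersection count as a stand-in for a set-based kernel. Because it compares sets rather than paths or walks, it is unaffected by disconnected components. The concept names, the relation labels, and the function names `set_kernel`, `linear_kernel`, and `normalized` are all hypothetical.

```python
from collections import Counter
from math import sqrt


def set_kernel(g1, g2):
    """Illustrative set-based kernel: shared concepts plus shared labeled edges.

    This equals a dot product of binary indicator vectors, so it is a valid
    kernel. It is an assumption for illustration, not the paper's kernel.
    """
    return len(g1["nodes"] & g2["nodes"]) + len(g1["edges"] & g2["edges"])


def linear_kernel(c1, c2):
    """Linear kernel over concept-count vectors stored as Counters."""
    return sum(c1[k] * c2[k] for k in c1.keys() & c2.keys())


def normalized(kernel, a, b):
    """Cosine-style normalization so that kernel(x, x) maps to 1."""
    denom = sqrt(kernel(a, a) * kernel(b, b))
    return kernel(a, b) / denom if denom else 0.0


# Two toy concept graphs (hypothetical concepts and relations).
doc_a = {
    "nodes": frozenset({"aspirin", "headache", "fever"}),
    "edges": frozenset({("aspirin", "treats", "headache")}),
}
doc_b = {
    "nodes": frozenset({"aspirin", "headache", "nausea"}),
    "edges": frozenset({
        ("aspirin", "treats", "headache"),
        ("aspirin", "causes", "nausea"),
    }),
}

# Concept counts for the linear kernel (hypothetical frequencies).
counts_a = Counter({"aspirin": 2, "headache": 1})
counts_b = Counter({"aspirin": 1, "nausea": 3})

shared = set_kernel(doc_a, doc_b)              # 2 shared nodes + 1 shared edge
similarity = normalized(set_kernel, doc_a, doc_b)
dot = linear_kernel(counts_a, counts_b)        # only "aspirin" overlaps
```

A kernel matrix built from either function could then be passed to any classifier that accepts precomputed kernels, such as a support vector machine.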
INDEX TERMS
Kernel; Text categorization; Unified modeling language; Support vector machine classification; Graph representations; Semantics; Graph kernels; Biomedical ontologies; Mining methods and algorithms; Text mining; Classifier design and evaluation; Modeling structured, textual and multimedia data
CITATION
Said Bleik, Meenakshi Mishra, Jun Huan, Min Song, "Text Categorization of Biomedical Data Sets Using Graph Kernels and a Controlled Vocabulary", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.10, no. 5, pp. 1211-1217, Sept.-Oct. 2013, doi:10.1109/TCBB.2013.16