2016 IEEE International Conference on Healthcare Informatics (ICHI) (2016)
Chicago, Illinois, United States
Oct. 4, 2016 to Oct. 7, 2016
ISBN: 978-1-5090-6117-4
pp: 527-533
ABSTRACT
Neural language models, such as word embeddings, can effectively embed words into vector spaces while preserving linguistic regularities and semantic relationships. However, few studies have shown their effectiveness on medical terms and relations. In this paper, we study the applicability of word2vec, a well-known word embedding technique, to embedding medical terms and relations using three different medical text corpora: biomedical abstracts of scientific papers, health-related discussion forums, and a commonly available general-purpose information resource. We empirically evaluate this approach by studying how the word embedding projects certain classes of medical terms and relations into the word space, and by analyzing the differences among the three corpora. Results show that the corpus of health-related discussion forum posts, authored by lay persons and medical novices, yields a word embedding for popular medical terms that is comparable to one trained on a professionally authored corpus of published biomedical abstracts.
INDEX TERMS
Semantics, Biomedical imaging, Encyclopedias, Electronic publishing, Internet, Diseases
CITATION

J. Huang, K. Xu and V. G. Vydiswaran, "Analyzing Multiple Medical Corpora Using Word Embedding," 2016 IEEE International Conference on Healthcare Informatics (ICHI), Chicago, Illinois, United States, 2016, pp. 527-533.
doi:10.1109/ICHI.2016.94