2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (2017)
Kansas City, MO, USA
Nov. 13, 2017 to Nov. 16, 2017
Ling Zheng, Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, USA
Hua Min, Department of Health Administration and Policy, George Mason University, Fairfax, VA, USA
Yehoshua Perl, Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, USA
James Geller, Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, USA
The Gene hierarchy of the National Cancer Institute (NCI) Thesaurus (NCIt) is of high priority for NCI, so quality assurance (QA) techniques that improve its content quality are important. We present a two-step methodology that concentrates on auditing the modeling of complex concepts, which are shown to have a higher error rate than control concepts. In the first step, we test whether concepts that appear complex in a so-called “partial-area taxonomy” have a higher error rate than control concepts. In the second step, we introduce an innovative technique based on a “partial-area sub-taxonomy” (constructed with a subset of roles) to discover additional complex concepts. The results of the QA study show that these additional concepts are indeed statistically significantly more likely to have errors than control concepts. This finding makes it easier for NCI staff to improve the modeling quality of gene concepts in NCIt.
Taxonomy, Error analysis, Ontologies, Cancer, Thesauri, Terminology, Quality assurance
L. Zheng, H. Min, Y. Perl and J. Geller, "Discovering additional complex NCIt gene concepts with high error rate," 2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Kansas City, MO, USA, 2017, pp. 653-657.