2016 IEEE Pacific Visualization Symposium (PacificVis) (2016)
Taipei, Taiwan
April 19, 2016 to April 22, 2016
ISSN: 2165-8773
ISBN: 978-1-5090-1451-4
pp: 136-143
Chong Zhang, University of North Carolina at Charlotte
Jing Yang, University of North Carolina at Charlotte
F. Benjamin Zhan, Texas State University
Xi Gong, Texas State University
Jean D. Brender, Texas A&M Health Science Center
Peter H. Langlois, Texas Department of State Health Services
Scott Barlowe, Western Carolina University
Ye Zhao, Kent State University
ABSTRACT
In the domain of epidemiology, logistic regression modeling is widely used to explain the relationships between explanatory variables and dichotomous outcome variables. However, logistic regression modeling faces challenges such as overfitting, confounding, and multicollinearity when there is a large number of explanatory variables. For example, in the birth defect study presented in this paper, it is difficult to select variables for building high-quality models that identify risk factors from hundreds of pollutant variables. To address this problem, we propose a novel visual analytics approach to logistic regression modeling for high-dimensional datasets. It augments the traditional modeling pipeline by providing (1) intuitive visualizations for inspecting statistical indicators and the relationships among the variables and (2) a seamless, effective dimension reduction pipeline for selecting variables for inclusion in high-quality logistic regression models. A fully working prototype of this approach has been developed and successfully applied to the birth defect study, which illustrates its effectiveness and efficiency. Its application in an insurance policy study and feedback from domain experts further demonstrate its usefulness.
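The abstract does not spell out the paper's variable-selection procedure. To make the multicollinearity problem it mentions concrete, here is a minimal pure-Python sketch that assumes a simple greedy correlation filter (a swapped-in technique, not the authors' visual pipeline) followed by a logistic regression fitted by batch gradient ascent. The names `pearson`, `prune_correlated`, `sigmoid`, and `fit_logistic`, the 0.9 threshold, the learning rate, and the toy data are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no linear association
    return cov / (sx * sy)


def prune_correlated(columns, threshold=0.9):
    """Hypothetical greedy filter: keep a variable only if |r| with every
    already-kept variable is below `threshold`.

    `columns` maps variable name -> list of values; insertion order sets priority.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


def sigmoid(z):
    """Numerically stable logistic function (avoids overflow in math.exp)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(rows, y, lr=0.1, epochs=2000):
    """Fit P(y=1 | x) = sigmoid(w0 + w . x) by batch gradient ascent
    on the mean log-likelihood. Returns [intercept, w1, ..., wp]."""
    p = len(rows[0])
    n = len(rows)
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for x, t in zip(rows, y):
            # Residual between the observed 0/1 outcome and the predicted probability.
            err = t - sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))
            grad[0] += err
            for j in range(p):
                grad[j + 1] += err * x[j]
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w


if __name__ == "__main__":
    # Hypothetical toy data: x2 is nearly a copy of x1, x3 is unrelated to both.
    x1 = [i / 10 for i in range(-20, 21)]
    x2 = [2 * v + 0.01 * ((i % 3) - 1) for i, v in enumerate(x1)]
    x3 = [((i * 7) % 11) / 10 for i in range(len(x1))]
    y = [1 if v > 0 else 0 for v in x1]
    columns = {"x1": x1, "x2": x2, "x3": x3}

    kept = prune_correlated(columns)
    rows = [list(r) for r in zip(*(columns[k] for k in kept))]
    print("kept:", kept)
    print("coefficients:", fit_logistic(rows, y))
```

In this toy setup the filter drops `x2` because it is almost collinear with `x1`, so the fitted model avoids splitting one effect across two redundant coefficients. A real study would also weigh domain knowledge and confounding, which a purely correlation-based filter ignores.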
INDEX TERMS
Visual Analytics, High-dimensional, Logistic Regression, Dimension Reduction
CITATION
Chong Zhang, Jing Yang, F. Benjamin Zhan, Xi Gong, Jean D. Brender, Peter H. Langlois, Scott Barlowe, Ye Zhao, "A visual analytics approach to high-dimensional logistic regression modeling and its application to an environmental health study", 2016 IEEE Pacific Visualization Symposium (PacificVis), pp. 136-143, 2016, doi:10.1109/PACIFICVIS.2016.7465261