Removing Confounding Factors via Constraint Based Clustering: An Application to Finding Homogeneous Groups of Multiple Sclerosis Patients
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICHI.2013.75
Jingjing Liu, Dept. of Comput. Sci., Tufts Univ., Medford, MA, USA
Carla E. Brodley, Dept. of Comput. Sci., Tufts Univ., Medford, MA, USA
Brian C. Healy, Biostat. Center, Massachusetts Gen. Hosp., Boston, MA, USA
Tanuja Chitnis, Partners MS Center, Brigham & Women's Hosp., Brookline, MA, USA
Confounding factors in unsupervised data can lead to undesirable clustering results. For example, in medical datasets, age often confounds tests designed to judge the severity of a patient's disease through measures of mobility, eyesight, and hearing. In such cases, removing age from each instance will not remove its effect from the data, because other features remain correlated with age. We present a constraint-based clustering method that removes the impact of such confounding factors. Motivated by the need to find homogeneous groups of multiple sclerosis (MS) patients, we apply our approach to remove physician subjectivity from patient data. The result is a promising novel grouping of patients that can help uncover the factors that impact disease progression in MS.
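
The point that dropping a confounder does not remove its effect can be illustrated with a small sketch. It is not the authors' method or data: the synthetic patients, the feature names `mobility` and `hearing`, and the coefficients below are all assumptions chosen only to show that features left in the data can still track the removed age column.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def make_patients(n=500, seed=0):
    """Hypothetical patients whose test scores worsen with age (assumed model)."""
    rng = random.Random(seed)
    patients = []
    for _ in range(n):
        age = rng.uniform(20, 80)
        patients.append({
            "age": age,
            # Assumed linear dependence on age plus Gaussian noise.
            "mobility": 0.05 * age + rng.gauss(0, 0.5),
            "hearing": 0.04 * age + rng.gauss(0, 0.5),
        })
    return patients


patients = make_patients()
ages = [p["age"] for p in patients]

# "Remove age from each instance": drop the column, keep everything else.
without_age = [{k: v for k, v in p.items() if k != "age"} for p in patients]

# The remaining features still carry the age signal.
corr_mobility = pearson(ages, [p["mobility"] for p in without_age])
corr_hearing = pearson(ages, [p["hearing"] for p in without_age])
print(f"corr(age, mobility) = {corr_mobility:.2f}")
print(f"corr(age, hearing)  = {corr_hearing:.2f}")
```

Under these assumptions, a distance-based clustering of `without_age` would still tend to separate patients by age, which is the problem the abstract describes.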