Issue No.08 - August (2007 vol.19)
Charu C. Aggarwal, IEEE
High-dimensional data is a difficult case for most subspace-based classification methods because of the large number of combinations of dimensions that have discriminatory power. There is an exponential number of combinations of dimensions that could decide the correct class, and the relevant combination can vary with data locality and with the test instance. As a result, most summarized models, such as decision trees and rule-based systems, aim only to build a global summary of the data, which is then used for classification. Because such a summary is incomplete, a particular classification model may be more or less suited to individual test instances. Furthermore, it may not provide sufficient insight into the most representative characteristics of a particular test instance. This is undesirable for the many classification applications in which the diagnostic reasoning behind the classification of a test instance is as important as the classification itself. In an interactive application, a user may find more value in a diagnostic decision support method that can reveal significant classification behaviors of exemplar records. Such an approach has the additional advantage that the decision process can be optimized for the individual record, which leads to more effective classification methods. In this paper, we propose the Subspace Decision Path (SD-Path) method, which lets the user interactively explore a small number of nodes of a hierarchical decision process so that the most significant classification characteristics for a given test instance are revealed. In addition, the SD-Path method improves interpretability by constructing views of the data in which the different classes are clearly separated. Even in difficult cases where the classification behavior of the test instance is ambiguous, the SD-Path method provides a diagnostic understanding of the characteristics that result in this ambiguity. The method therefore combines the abilities of the human and the computer to create an effective diagnostic tool for instance-centered high-dimensional classification.
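To make the idea of a test-instance-centered view concrete, the following is a minimal illustrative sketch and not the SD-Path algorithm itself, whose details the abstract does not give. It assumes a brute-force search over all 2-D axis-parallel subspaces and scores each one by how strongly a single class dominates the k nearest neighbors of the test instance within that subspace. The function names, the purity score, the choice of k, and the toy data are all hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import comb


def local_class_purity(points, labels, test, dims, k):
    """Return (purity, majority_label) for the k nearest neighbors of
    `test`, where distance is measured only along the dimensions in `dims`.
    Purity is the fraction of those neighbors in the majority class.
    (Hypothetical scoring rule, chosen for illustration.)"""
    def dist2(p):
        return sum((p[d] - test[d]) ** 2 for d in dims)

    order = sorted(range(len(points)), key=lambda i: dist2(points[i]))
    nearest = [labels[i] for i in order[:k]]
    label, count = Counter(nearest).most_common(1)[0]
    return count / len(nearest), label


def best_2d_view(points, labels, test, k=3):
    """Exhaustively score every pair of dimensions and return
    (purity, dims, majority_label) for the first pair with the highest
    local purity around `test`."""
    best = None
    for dims in combinations(range(len(test)), 2):
        purity, label = local_class_purity(points, labels, test, dims, k)
        if best is None or purity > best[0]:
            best = (purity, dims, label)
    return best


# Toy data (made up): classes separate in dimensions 0 and 1,
# while dimensions 2 and 3 mix the two classes.
points = [
    (0.0, 0.0, 5.0, 1.0),
    (0.1, 0.2, 1.0, 5.0),
    (0.2, 0.1, 3.0, 3.0),
    (5.0, 5.0, 5.1, 1.1),
    (5.1, 5.2, 1.1, 5.1),
    (5.2, 5.1, 3.1, 3.1),
]
labels = ["a", "a", "a", "b", "b", "b"]
test_instance = (0.1, 0.1, 3.0, 3.0)

result = best_2d_view(points, labels, test_instance, k=3)
print(result)

# The number of candidate 2-D views grows quickly with dimensionality,
# and the number of all non-empty subspaces grows exponentially.
print(comb(100, 2), 2 ** 20 - 1)
```

Exhaustive enumeration is only workable for small subspace sizes. The second print statement shows why: 100 dimensions already give thousands of 2-D views, and the count of all subspaces doubles with each added dimension, which is the difficulty that motivates an interactive, instance-centered exploration.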
Classification, visual data mining, interactive exploration.
Charu C. Aggarwal, "Toward Exploratory Test-Instance-Centered Diagnosis in High-Dimensional Classification", IEEE Transactions on Knowledge & Data Engineering, vol.19, no. 8, pp. 1001-1015, August 2007, doi:10.1109/TKDE.2007.1034