2013 IEEE 13th International Conference on Data Mining (2007)
Omaha, Nebraska, USA
Oct. 28, 2007 to Oct. 31, 2007
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDM.2007.10
It is estimated that less than 10 percent of the world's species have been described, yet species are being lost daily due to human destruction of natural habitats. The job of describing the earth's remaining species is exacerbated by the shrinking number of practicing taxonomists and the very slow pace of traditional taxonomic research. In this article, we tackle, from a novelty detection perspective, one of the most important and challenging research objectives in taxonomy ? new species identification. We propose a unique and efficient novelty detection framework based on statistical depth functions. Statistical depth functions provide from the "deepest" point a "center-outward ordering" of multidimensional data. In this sense, they can detect observations that appear extreme relative to the rest of the observations, i.e., novelty. Of the various statistical depths, the spatial depth is especially appealing because of its computational efficiency and mathematical tractability. We propose a novel statistical depth, the kernelized spatial depth (KSD) that generalizes the spatial depth via positive definite kernels. By choosing a proper kernel, the KSD can capture the local structure of a data set while the spatial depth fails. Observations with depth values less than a threshold are declared as novel. The proposed algorithm is simple in structure: the threshold is the only one parameter for a given kernel. We give an upper bound on the false alarm probability of a depth-based detector, which can be used to determine the threshold. Experimental study demonstrates its excellent potential in new species discovery.
Hanxiang Peng, Henry L. Bart Jr., Yixin Chen, Xin Dang, "Depth-Based Novelty Detection and Its Application to Taxonomic Research", 2013 IEEE 13th International Conference on Data Mining, vol. 00, no. , pp. 113-122, 2007, doi:10.1109/ICDM.2007.10