Analyzing Outliers Cautiously
March/April 2002 (vol. 14 no. 2)
pp. 432-437

Outliers are difficult to handle because some of them can be measurement errors, while others may represent phenomena of interest, something significant from the viewpoint of the application domain. Statistical and computational methods have been proposed to detect outliers, but further analysis of outliers requires much relevant domain knowledge. In our previous work, we suggested a knowledge-based method for distinguishing between measurement errors and phenomena of interest by modeling real measurements, that is, how measurements should be distributed in an application domain. In this paper, we make this distinction by modeling measurement errors instead. This is a cautious approach to outlier analysis, which has been successfully applied to a medical problem and may find interesting applications in other domains such as science, engineering, finance, and economics.

Index Terms:
Outliers, domain knowledge, AI modeling, self-organizing maps
X. Liu, G. Cheng, J.X. Wu, "Analyzing Outliers Cautiously," IEEE Transactions on Knowledge and Data Engineering, vol. 14, no. 2, pp. 432-437, March-April 2002, doi:10.1109/69.991726