
Hui Xiong, Gaurav Pandey, Michael Steinbach, Vipin Kumar, "Enhancing Data Analysis with Noise Removal," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 3, pp. 304-319, March 2006.
DOI: http://doi.ieeecomputersociety.org/10.1109/TKDE.2006.46
ISSN: 1041-4347
Publisher: IEEE Computer Society, Los Alamitos, CA, USA
Index Terms: Data cleaning, very noisy data, hyperclique pattern discovery, local outlier factor (LOF), noise removal.

[1] R. Agrawal, T. Imielinski, and A. Swami, “Mining Association Rules between Sets of Items in Large Databases,” Proc. ACM SIGMOD, 1993.
[2] F. Angiulli and C. Pizzuti, “Fast Outlier Detection in High-Dimensional Spaces,” Proc. Sixth European Conf. Principles of Data Mining and Knowledge Discovery, 2002.
[3] S.D. Bay and M. Schwabacher, “Mining Distance-Based Outliers in Near Linear Time with Randomization and a Simple Pruning Rule,” Proc. Ninth ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, pp. 29-38, 2003.
[4] M.M. Breunig, H.-P. Kriegel, R.T. Ng, and J. Sander, “LOF: Identifying Density-Based Local Outliers,” Proc. 2000 ACM SIGMOD Int'l Conf. Management of Data, 2000.
[5] C.E. Brodley and M.A. Friedl, “Identifying Mislabeled Training Data,” J. Artificial Intelligence Research, vol. 11, pp. 131-167, 1999.
[6] www.dictionary.com, 2005.
[7] M.B. Eisen, P.T. Spellman, P.O. Brown, and D. Botstein, “Cluster Analysis and Display of Genome-Wide Expression Patterns,” Proc. Nat'l Academy of Sciences of the United States of Am. (PNAS), vol. 95, no. 25, 1998.
[8] L. Ertöz, M. Steinbach, and V. Kumar, “Finding Clusters of Different Sizes, Shapes, and Densities in Noisy, High Dimensional Data,” Proc. Third SIAM Int'l Conf. Data Mining, May 2003.
[9] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, “A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise,” Proc. Second Int'l Conf. Knowledge Discovery and Data Mining, 1996.
[10] A. Gavin et al., “Functional Organization of the Yeast Proteome by Systematic Analysis of Protein Complexes,” Nature, vol. 415, pp. 141-147, 2002.
[11] V. Gaede and O. Günther, “Multidimensional Access Methods,” ACM Computing Surveys, vol. 30, no. 2, pp. 170-231, 1998.
[12] H. Galhardas, D. Florescu, D. Shasha, and E. Simon, “Ajax: An Extensible Data Cleaning Tool,” Proc. ACM SIGMOD Int'l Conf. Management of Data, 2000.
[13] H. Galhardas, D. Florescu, D. Shasha, E. Simon, and C. Saita, “Declarative Data Cleaning: Language, Model, and Algorithms,” Proc. 2001 Very Large Data Bases (VLDB) Conf., 2001.
[14] S. Guha, R. Rastogi, and K. Shim, “Cure: An Efficient Clustering Algorithm for Large Databases,” Proc. ACM SIGMOD Int'l Conf. Management of Data, pp. 73-84, June 1998.
[15] E.-H. Han, D. Boley, M. Gini, R. Gross, K. Hastings, G. Karypis, V. Kumar, B. Mobasher, and J. Moore, “Webace: A Web Agent for Document Categorization and Exploration,” Proc. Second Int'l Conf. Autonomous Agents, 1998.
[16] M. Hernandez and S. Stolfo, “The Merge/Purge Problem for Large Databases,” Proc. ACM SIGMOD Int'l Conf. Management of Data, pp. 127-138, May 1995.
[17] M.A. Hernandez and S.J. Stolfo, “Real-World Data Is Dirty: Data Cleansing and the Merge/Purge Problem,” Data Mining and Knowledge Discovery, vol. 2, pp. 9-37, 1998.
[18] V.J. Hodge and J. Austin, “A Survey of Outlier Detection Methodologies,” Artificial Intelligence Rev., vol. 22, pp. 85-126, 2004.
[19] A.K. Jain and R.C. Dubes, Algorithms for Clustering Data. Prentice Hall Advanced Reference Series, Englewood Cliffs, N.J.: Prentice Hall, Mar. 1988, http://www.cse.msu.edu/~jain/Clustering_Jain_Dubes.pdf.
[20] G. Karypis, “Cluto: Software for Clustering High Dimensional Data Sets,” www.cs.umn.edu/~karypis, 2005.
[21] E.M. Knorr, R.T. Ng, and V. Tucakov, “Distance-Based Outliers: Algorithms and Applications,” VLDB J.: Very Large Databases, vol. 8, pp. 237-253, 2000.
[22] R. Kohavi and G.H. John, “Wrappers for Feature Subset Selection,” Artificial Intelligence, vol. 97, nos. 1-2, pp. 273-324, 1997.
[23] B. Larsen and C. Aone, “Fast and Effective Text Mining Using Linear-Time Document Clustering,” Proc. Fifth ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, pp. 16-22, 1999.
[24] M.L. Lee, T.W. Ling, and W.L. Low, “Intelliclean: A Knowledge-Based Intelligent Data Cleaner,” Proc. Sixth ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, 2000.
[25] D. Lewis, “Reuters-21578 Text Categorization Text Collection 1.0,” http://www.research.att.com/~lewis, 1997.
[26] Infoshare Limited, “Best Value Guide to Data Standardization,” InfoDB, July 1998, http://www.infoshare.ltd.uk.
[27] A.E. Monge and C.P. Elkan, “An Efficient Domain-Independent Algorithm for Detecting Approximately Duplicate Database Records,” Proc. ACM SIGMOD Workshop Research Issues on Knowledge Discovery and Data Mining, 1997.
[28] K. Orr, “Data Quality and Systems Theory,” Comm. ACM, vol. 41, pp. 66-71, 1998.
[29] M.F. Porter, “An Algorithm for Suffix Stripping,” Program, vol. 14, no. 3, 1980.
[30] L. Portnoy, E. Eskin, and S.J. Stolfo, “Intrusion Detection with Unlabeled Data Using Clustering,” Proc. ACM CSS Workshop Data Mining Applied to Security (DMSA-2001), 2001.
[31] S. Ramaswamy, R. Rastogi, and K. Shim, “Efficient Algorithms for Mining Outliers from Large Data Sets,” Proc. ACM SIGMOD Int'l Conf. Management of Data, 2000.
[32] T. Redman, “The Impact of Poor Data Quality on the Typical Enterprise,” Comm. ACM, vol. 41, pp. 79-82, 1998.
[33] C.J. Van Rijsbergen, Information Retrieval, second ed. London: Butterworths, 1979.
[34] J. Sander, M. Ester, H.-P. Kriegel, and X. Xu, “Density-Based Clustering in Spatial Databases: The Algorithm GDBSCAN and Its Applications,” Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 169-194, 1998.
[35] G. Sheikholeslami, S. Chatterjee, and A. Zhang, “Wavecluster: A Multi-Resolution Clustering Approach for Very Large Spatial Databases,” Proc. Int'l Conf. Very Large Databases, 1998.
[36] P.-N. Tan, V. Kumar, and J. Srivastava, “Selecting the Right Objective Measure for Association Analysis,” Information Systems, vol. 29, no. 4, pp. 293-313, 2004.
[37] P.-N. Tan, M. Steinbach, and V. Kumar, Introduction to Data Mining. Pearson Addison-Wesley, 2005.
[38] TREC, Text Retrieval Conference, http://trec.nist.gov, 2005.
[39] H. Xiong, P.-N. Tan, and V. Kumar, “Mining Hyperclique Patterns with Confidence Pruning,” Technical Report 03-006, Dept. of Computer Science, Univ. of Minnesota-Twin Cities, Jan. 2003.
[40] H. Xiong, P.-N. Tan, and V. Kumar, “Mining Strong Affinity Association Patterns in Data Sets with Skewed Support Distribution,” Proc. Third IEEE Int'l Conf. Data Mining, pp. 387-394, 2003.
[41] Y. Yang, “Noise Reduction in a Statistical Approach to Text Categorization,” Proc. 18th Ann. Int'l ACM SIGIR Conf. Research and Development in Information Retrieval, pp. 256-263, 1995.
[42] L. Yi, B. Liu, and X. Li, “Eliminating Noisy Information in Web Pages for Data Mining,” Proc. ACM SIGKDD Int'l Conf. Knowledge Discovery & Data Mining, pp. 296-305, 2003.
[43] T. Zhang, R. Ramakrishnan, and M. Livny, “BIRCH: An Efficient Data Clustering Method for Very Large Databases,” Proc. 1996 ACM SIGMOD Int'l Conf. Management of Data, pp. 103-114, 1996.