Hui Xiong, Gaurav Pandey, Michael Steinbach, Vipin Kumar, "Enhancing Data Analysis with Noise Removal," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 3, pp. 304-319, March 2006.
BibTeX
@article{10.1109/TKDE.2006.46,
  author = {Hui Xiong and Gaurav Pandey and Michael Steinbach and Vipin Kumar},
  title = {Enhancing Data Analysis with Noise Removal},
  journal = {IEEE Transactions on Knowledge and Data Engineering},
  volume = {18},
  number = {3},
  issn = {1041-4347},
  year = {2006},
  pages = {304-319},
  doi = {http://doi.ieeecomputersociety.org/10.1109/TKDE.2006.46},
  publisher = {IEEE Computer Society},
  address = {Los Alamitos, CA, USA},
}
RefWorks Procite/RefMan/Endnote
TY - JOUR
JO - IEEE Transactions on Knowledge and Data Engineering
TI - Enhancing Data Analysis with Noise Removal
IS - 3
SN - 1041-4347
SP - 304
EP - 319
EPD - 304-319
A1 - Hui Xiong
A1 - Gaurav Pandey
A1 - Michael Steinbach
A1 - Vipin Kumar
PY - 2006
KW - Data cleaning
KW - very noisy data
KW - hyperclique pattern discovery
KW - local outlier factor (LOF)
KW - noise removal
VL - 18
JA - IEEE Transactions on Knowledge and Data Engineering
ER -
[1] R. Agrawal, T. Imielinski, and A. Swami, “Mining Association Rules between Sets of Items in Large Databases,” Proc. ACM SIGMOD, 1993.
[2] F. Angiulli and C. Pizzuti, “Fast Outlier Detection in High-Dimensional Spaces,” Proc. Sixth European Conf. Principles of Data Mining and Knowledge Discovery, 2002.
[3] S.D. Bay and M. Schwabacher, “Mining Distance-Based Outliers in Near Linear Time with Randomization and a Simple Pruning Rule,” Proc. Ninth ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, pp. 29-38, 2003.
[4] M.M. Breunig, H.-P. Kriegel, R.T. Ng, and J. Sander, “LOF: Identifying Density Based Local Outliers,” Proc. 2000 ACM SIGMOD Int'l Conf. Management of Data, 2000.
[5] C.E. Brodley and M.A. Friedl, “Identifying Mislabeled Training Data,” J. Artificial Intelligence Research, vol. 11, pp. 131-167, 1999.
[6] www.dictionary.com, 2005.
[7] M.B. Eisen, P.T. Spellman, P.O. Brown, and D. Botstein, “Cluster Analysis and Display of Genome-Wide Expression Patterns,” Proc. Nat'l Academy of Sciences of the United States of Am. (PNAS), vol. 95, no. 25, 1998.
[8] L. Ertöz, M. Steinbach, and V. Kumar, “Finding Clusters of Different Sizes, Shapes, and Densities in Noisy, High Dimensional Data,” Proc. Third SIAM Int'l Conf. Data Mining, May 2003.
[9] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, “A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise,” Proc. Second Int'l Conf. Knowledge Discovery and Data Mining, 1996.
[10] A. Gavin et al., “Functional Organization of the Yeast Proteome by Systematic Analysis of Protein Complexes,” Nature, vol. 415, pp. 141-147, 2002.
[11] V. Gaede and O. Günther, “Multidimensional Access Methods,” ACM Computing Surveys, vol. 30, no. 2, pp. 170-231, 1998.
[12] H. Galhardas, D. Florescu, D. Shasha, and E. Simon, “Ajax: An Extensible Data Cleaning Tool,” Proc. ACM SIGMOD Int'l Conf. Management of Data, 2000.
[13] H. Galhardas, D. Florescu, D. Shasha, E. Simon, and C. Saita, “Declarative Data Cleaning: Language, Model, and Algorithms,” Proc. 2001 Very Large Data Bases (VLDB) Conf., 2001.
[14] S. Guha, R. Rastogi, and K. Shim, “Cure: An Efficient Clustering Algorithm for Large Databases,” Proc. ACM SIGMOD Int'l Conf. Management of Data, pp. 73-84, June 1998.
[15] E.-H. Han, D. Boley, M. Gini, R. Gross, K. Hastings, G. Karypis, V. Kumar, B. Mobasher, and J. Moore, “Webace: A Web Agent for Document Categorization and Exploration,” Proc. Second Int'l Conf. Autonomous Agents, 1998.
[16] M. Hernandez and S. Stolfo, “The Merge/Purge Problem for Large Databases,” Proc. ACM SIGMOD Int'l Conf. Management of Data, pp. 127-138, May 1995.
[17] M.A. Hernandez and S.J. Stolfo, “Real-World Data Is Dirty: Data Cleansing and the Merge/Purge Problem,” Data Mining and Knowledge Discovery, vol. 2, pp. 9-37, 1998.
[18] V.J. Hodge and J. Austin, “A Survey of Outlier Detection Methodologies,” Artificial Intelligence Rev., vol. 22, pp. 85-126, 2004.
[19] A.K. Jain and R.C. Dubes, Algorithms for Clustering Data. Prentice Hall Advanced Reference Series, Englewood Cliffs, N.J.: Prentice Hall, Mar. 1988, http://www.cse.msu.edu/~jain/Clustering_Jain_Dubes.pdf.
[20] G. Karypis, “Cluto: Software for Clustering High Dimensional Data Sets,” www.cs.umn.edu/~karypis, 2005.
[21] E.M. Knorr, R.T. Ng, and V. Tucakov, “Distance-Based Outliers: Algorithms and Applications,” VLDB J.: Very Large Databases, vol. 8, pp. 237-253, 2000.
[22] R. Kohavi and G.H. John, “Wrappers for Feature Subset Selection,” Artificial Intelligence, vol. 97, nos. 1-2, pp. 273-324, 1997.
[23] B. Larsen and C. Aone, “Fast and Effective Text Mining Using Linear-Time Document Clustering,” Proc. Fifth ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, pp. 16-22, 1999.
[24] M.L. Lee, T.W. Ling, and W.L. Low, “Intelliclean: A Knowledge-Based Intelligent Data Cleaner,” Proc. Sixth ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, 2000.
[25] D. Lewis, “Reuters-21578 Text Categorization Text Collection 1.0,” http://www.research.att.com/~lewis, 1997.
[26] Infoshare Limited, “Best Value Guide to Data Standardization,” InfoDB, July 1998, http://www.infoshare.ltd.uk.
[27] A.E. Monge and C.P. Elkan, “An Efficient Domain-Independent Algorithm for Detecting Approximately Duplicate Database Records,” Proc. ACM-SIGMOD Workshop Research Issues on Knowledge Discovery and Data Mining, 1997.
[28] K. Orr, “Data Quality and Systems Theory,” Comm. ACM, vol. 41, pp. 66-71, 1998.
[29] M.F. Porter, “An Algorithm for Suffix Stripping,” Program, vol. 14, no. 3, 1980.
[30] L. Portnoy, E. Eskin, and S.J. Stolfo, “Intrusion Detection with Unlabeled Data Using Clustering,” Proc. ACM CSS Workshop Data Mining Applied to Security (DMSA-2001), 2001.
[31] S. Ramaswamy, R. Rastogi, and K. Shim, “Efficient Algorithms for Mining Outliers from Large Data Sets,” Proc. ACM SIGMOD Int'l Conf. Management of Data, 2000.
[32] T. Redman, “The Impact of Poor Data Quality on the Typical Enterprise,” Comm. ACM, vol. 41, pp. 79-82, 1998.
[33] C.J. Van Rijsbergen, Information Retrieval, second ed. London: Butterworths, 1979.
[34] J. Sander, M. Ester, H.-P. Kriegel, and X. Xu, “Density-Based Clustering in Spatial Databases: The Algorithm GDBSCAN and Its Applications,” Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 169-194, 1998.
[35] G. Sheikholeslami, S. Chatterjee, and A. Zhang, “Wavecluster: A Multi-Resolution Clustering Approach for Very Large Spatial Databases,” Proc. Int'l Conf. Very Large Databases, 1998.
[36] P.-N. Tan, V. Kumar, and J. Srivastava, “Selecting the Right Objective Measure for Association Analysis,” Information Systems, vol. 29, no. 4, pp. 293-313, 2004.
[37] P.-N. Tan, M. Steinbach, and V. Kumar, Introduction to Data Mining. Pearson Addison-Wesley, 2005.
[38] TREC, Text Retrieval Conference, http://trec.nist.gov, 2005.
[39] H. Xiong, P.-N. Tan, and V. Kumar, “Mining Hyperclique Patterns with Confidence Pruning,” Technical Report 03-006, Dept. of Computer Science, Univ. of Minnesota-Twin Cities, Jan. 2003.
[40] H. Xiong, P.-N. Tan, and V. Kumar, “Mining Strong Affinity Association Patterns in Data Sets with Skewed Support Distribution,” Proc. Third IEEE Int'l Conf. Data Mining, pp. 387-394, 2003.
[41] Y. Yang, “Noise Reduction in a Statistical Approach to Text Categorization,” Proc. 18th Ann. Int'l ACM SIGIR Conf. Research and Development in Information Retrieval, pp. 256-263, 1995.
[42] L. Yi, B. Liu, and X. Li, “Eliminating Noisy Information in Web Pages for Data Mining,” Proc. ACM SIGKDD Int'l Conf. Knowledge Discovery & Data Mining, pp. 296-305, 2003.
[43] T. Zhang, R. Ramakrishnan, and M. Livny, “BIRCH: An Efficient Data Clustering Method for Very Large Databases,” Proc. 1996 ACM SIGMOD Int'l Conf. Management of Data, pp. 103-114, 1996.

