Issue No.03 - March (2010 vol.22)
pp: 305-317
Neil Mac Parthaláin, Aberystwyth University, Wales
Qiang Shen, Aberystwyth University, Wales
Richard Jensen, Aberystwyth University, Wales
ABSTRACT
Feature selection (FS), or attribute reduction, techniques are employed for dimensionality reduction and aim to select a subset of the original features of a data set that is rich in the most useful information. The benefits of employing FS techniques include improved data visualization and transparency, a reduction in training and utilization times, and potentially improved prediction performance. To date, many approaches based on rough set theory have employed the dependency function, which is based on lower approximations, as an evaluation step in the FS process. However, by examining only the information that is considered certain and ignoring the boundary region, or region of uncertainty, much useful information is lost. This paper examines a rough set FS technique that uses the information gathered from both the lower approximation dependency value and a distance metric that considers the number of objects in the boundary region and the distance of those objects from the lower approximation. The use of this measure in rough set feature selection can result in smaller subset sizes than those obtained using the dependency function alone. This demonstrates that there is much valuable information to be extracted from the boundary region. Experimental results are presented for both crisp and real-valued data and compared with two other FS techniques in terms of subset size, runtimes, and classification accuracy.
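The quantities named in the abstract can be illustrated on a toy crisp decision table. The sketch below is not the paper's implementation: the dependency degree follows the standard rough set definition (size of the positive region divided by the number of objects), while `boundary_distance` is an assumed stand-in for the paper's distance metric, which the abstract does not specify. It uses the mean Hamming distance from each boundary object to its nearest object in the lower approximation. All function names and the example data are hypothetical.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Group object indices into equivalence classes by their values on attrs."""
    classes = defaultdict(set)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].add(i)
    return list(classes.values())

def regions(rows, attrs, decision):
    """Return (positive region, boundary region) of the decision w.r.t. attrs."""
    positive, boundary = set(), set()
    for block in partition(rows, attrs):
        labels = {rows[i][decision] for i in block}
        # A block is certain only if all of its objects share one decision label.
        (positive if len(labels) == 1 else boundary).update(block)
    return positive, boundary

def dependency(rows, attrs, decision):
    """Standard rough set dependency degree: |positive region| / |universe|."""
    positive, _ = regions(rows, attrs, decision)
    return len(positive) / len(rows)

def boundary_distance(rows, attrs, decision):
    """ASSUMED metric (not from the paper): mean Hamming distance from each
    boundary object to the nearest object in the positive region."""
    positive, boundary = regions(rows, attrs, decision)
    if not positive or not boundary:
        return 0.0

    def hamming(i, j):
        return sum(rows[i][a] != rows[j][a] for a in attrs)

    nearest = [min(hamming(b, p) for p in positive) for b in boundary]
    return sum(nearest) / len(boundary)

# Hypothetical toy data: two conditional attributes (a, b) and a decision d.
rows = [
    {"a": 0, "b": 0, "d": "no"},
    {"a": 0, "b": 0, "d": "yes"},
    {"a": 0, "b": 1, "d": "yes"},
    {"a": 1, "b": 1, "d": "no"},
]

for subset in (["a"], ["a", "b"]):
    print(subset, dependency(rows, subset, "d"), boundary_distance(rows, subset, "d"))
```

On this table the subset `["a"]` leaves three objects in the boundary region (dependency 0.25), while `["a", "b"]` leaves two (dependency 0.5). A selection heuristic of the kind the abstract describes would combine the dependency value with the boundary-region measure when ranking candidate subsets; how the two are combined is left open here.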
INDEX TERMS
Rough sets, fuzzy sets, attribute reduction, boundary region, classification.
CITATION
Neil Mac Parthaláin, Qiang Shen, Richard Jensen, "A Distance Measure Approach to Exploring the Rough Set Boundary Region for Attribute Reduction", IEEE Transactions on Knowledge & Data Engineering, vol.22, no. 3, pp. 305-317, March 2010, doi:10.1109/TKDE.2009.119