Comparison between Classifier's Accuracies Based on Different Outlier Methods Generated by Frequent and Infrequent Categorical Data
2016 30th International Conference on Advanced Information Networking and Applications Workshops (WAINA) (2016)
March 23, 2016 to March 25, 2016
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/WAINA.2016.59
Outlier analysis is an essential task in data science: it helps to find inconsistencies in data, to build good classifiers, and to support decision making. Finding outliers in categorical data is difficult. This work compares the accuracies of classifiers built using different outlier analysis methods based on frequent and infrequent itemsets from categorical data. When modeling a classifier for categorical data, highly frequent records are the most useful, whereas infrequent records are obstacles. The experiments are carried out on the Bank and Nursery datasets from the UCI ML Repository to compare the available methods with the proposed method. For normally distributed OFI, the number of outliers to be eliminated need not be given as input, since the method determines it automatically. However, a threshold value must be supplied to generate the infrequent itemsets for NOFI.
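The idea that infrequent records act as outliers can be illustrated with a minimal sketch. It is not the paper's OFI or NOFI method, which the abstract does not define. It substitutes a simple attribute-value-frequency score, and the cutoff rule (mean minus k standard deviations), the function names `avf_scores` and `flag_outliers`, and the toy records are all assumptions made for illustration.

```python
from collections import Counter
from statistics import mean, pstdev


def avf_scores(records):
    """Score each record by the average relative frequency of its values.

    Records made of rare attribute values get low scores.
    """
    n = len(records)
    n_attrs = len(records[0])
    # One frequency table per attribute (column).
    counts = [Counter(r[j] for r in records) for j in range(n_attrs)]
    return [
        sum(counts[j][r[j]] for j in range(n_attrs)) / (n_attrs * n)
        for r in records
    ]


def flag_outliers(records, k=1.0):
    """Return indices of records scoring below mean - k * std (assumed rule)."""
    scores = avf_scores(records)
    cutoff = mean(scores) - k * pstdev(scores)
    return [i for i, s in enumerate(scores) if s < cutoff]


# Hypothetical toy records: (marital status, education, defaulted).
records = [
    ("married", "secondary", "no"),
    ("married", "secondary", "no"),
    ("married", "secondary", "no"),
    ("married", "tertiary", "no"),
    ("single", "secondary", "no"),
    ("divorced", "primary", "yes"),
]

outliers = flag_outliers(records)
print(outliers)
```

With a score-based cutoff of this kind, the number of flagged records follows from the distribution of scores rather than being fixed in advance; the flagged records could then be removed before training a classifier and the resulting accuracies compared.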
Reliability, Data models, Time complexity, Itemsets, Classification algorithms, Entropy
B. R. Babu and L. S. D., "Comparison between Classifier's Accuracies Based on Different Outlier Methods Generated by Frequent and Infrequent Categorical Data," 2016 30th International Conference on Advanced Information Networking and Applications Workshops (WAINA), Crans-Montana, Switzerland, 2016, pp. 18-23.