2012 Eighth International Conference on the Quality of Information and Communications Technology (2012)
Lisbon, Portugal
Sept. 3, 2012 to Sept. 6, 2012
A number of V&V datasets are publicly available. These datasets contain software measurements and defectiveness information for software modules. To facilitate V&V, numerous defect prediction studies have used these datasets and have detected defective modules effectively. Software developers and managers can benefit from the existing studies to avoid analogous defects and mistakes if they are able to find similarities between their software and the software represented by the public datasets. This paper identifies similar datasets by comparing the association patterns in them. The proposed approach finds association rules in each dataset and identifies the rules that overlap among the 100 strongest rules of each of the two datasets being compared. Afterwards, the average support and average confidence of the overlap are calculated to determine the strength of the similarity between the datasets. This study compares eight public datasets, and the results show that KC2 and PC2 have the highest similarity (83%), with 97% support and 100% confidence. Datasets with similar attributes and almost the same number of attributes have shown higher similarity than the other datasets.
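The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions that the abstract does not state: rules are mined by brute force over small itemsets, "strongest" is taken to mean highest confidence and then highest support, similarity is the share of overlapping rules among the top rules, and the averages are taken over both datasets. The function names `mine_rules`, `top_rules`, and `similarity` are hypothetical, as are the item labels such as `loc=high`, which stand in for discretized software measures.

```python
from itertools import combinations


def mine_rules(transactions, max_len=3, min_support=0.1, min_conf=0.5):
    """Brute-force miner: maps (antecedent, consequent) -> (support, confidence)."""
    sets = [frozenset(t) for t in transactions]
    n = len(sets)
    items = sorted(set().union(*sets))

    def support(itemset):
        # Fraction of modules (transactions) containing every item in itemset.
        return sum(1 for t in sets if itemset <= t) / n

    rules = {}
    for k in range(2, max_len + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            s = support(itemset)
            if s == 0 or s < min_support:
                continue
            # Split the frequent itemset into every antecedent/consequent pair.
            for r in range(1, k):
                for ante in combinations(combo, r):
                    a = frozenset(ante)
                    conf = s / support(a)
                    if conf >= min_conf:
                        rules[(a, itemset - a)] = (s, conf)
    return rules


def top_rules(rules, n=100):
    """Keep the n strongest rules (assumed ranking: confidence, then support)."""
    ranked = sorted(rules.items(), key=lambda kv: (kv[1][1], kv[1][0]), reverse=True)
    return dict(ranked[:n])


def similarity(rules_a, rules_b, n=100):
    """Return (overlap ratio, average support, average confidence) of shared rules."""
    top_a, top_b = top_rules(rules_a, n), top_rules(rules_b, n)
    common = set(top_a) & set(top_b)
    if not common:
        return 0.0, 0.0, 0.0
    # Assumption: average each shared rule's measures across both datasets.
    avg_sup = sum((top_a[r][0] + top_b[r][0]) / 2 for r in common) / len(common)
    avg_conf = sum((top_a[r][1] + top_b[r][1]) / 2 for r in common) / len(common)
    # Assumption: normalise by the larger of the two top-rule sets.
    ratio = len(common) / max(len(top_a), len(top_b))
    return ratio, avg_sup, avg_conf


if __name__ == "__main__":
    # Toy data: each set is one module described by discretized measures.
    modules = [
        {"loc=high", "cc=high", "defective=yes"},
        {"loc=high", "cc=high", "defective=yes"},
        {"loc=low", "cc=low", "defective=no"},
        {"loc=low", "cc=low", "defective=no"},
    ]
    rules = mine_rules(modules)
    print(similarity(rules, rules))
```

In practice a dedicated frequent-itemset miner would replace the brute-force loop, since enumerating all item combinations does not scale beyond a handful of attributes.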
association rules, defect prediction, dataset similarity, software measures
Saba Anwar, Zeeshan Ali Rana, Shafay Shamail, Mian M. Awais, "Using Association Rules to Identify Similarities between Software Datasets", 2012 Eighth International Conference on the Quality of Information and Communications Technology, pp. 114-119, 2012, doi:10.1109/QUATIC.2012.66