Issue No. 08 - Aug. (2013 vol. 39)
ISSN: 0098-5589
pp: 1054-1068
Hongyu Zhang , Tsinghua University, Beijing
Liang Gong , Tsinghua University, Beijing
Fayola Peters , West Virginia University, Morgantown
Tim Menzies , West Virginia University, Morgantown
ABSTRACT
Background: Cross-company defect prediction (CCDP) is a field of study in which an organization lacking sufficient local data uses data from other organizations to build defect predictors. CCDP requires that data be shared. Such shared data must be privatized, but that privatization could severely damage the utility of the data. Aim: To enable effective defect prediction from shared data while preserving privacy. Method: We explore privatization algorithms that maintain the class boundaries in a dataset. CLIFF is an instance pruner that deletes irrelevant examples. MORPH is a data mutator that moves the data a random distance, taking care not to cross class boundaries. CLIFF+MORPH is tested in a CCDP study among 10 defect datasets from the PROMISE data repository. Results: We find that 1) CLIFFed+MORPHed data provide more privacy than data privatized by state-of-the-art privacy algorithms; and 2) in terms of utility, measured by defect prediction, CLIFF+MORPH performs significantly better than those algorithms. Conclusions: For the object-oriented (OO) defect data studied here, data can be privatized and shared without a significant degradation in utility. To the best of our knowledge, this is the first published result in which privatization does not compromise defect prediction.
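The abstract describes CLIFF and MORPH in one sentence each. The Python sketch below illustrates the two ideas under stated assumptions; it is not the authors' implementation. Assumptions not taken from this abstract: CLIFF discretizes each attribute into `n_bins` equal-width ranges, scores a range for a class as like²/(like + unlike), scores an instance as the product of its range scores, and keeps the top `keep` fraction of each class; MORPH replaces each instance x with y = x ± (x − z)·r, where z is the nearest instance of a different class and r is drawn uniformly from [0.15, 0.35]. The function names (`cliff`, `morph`, `nearest_unlike`), parameters, and defaults are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Row = List[float]


def _bin(value: float, lo: float, hi: float, n_bins: int) -> int:
    """Equal-width bin index (assumed discretization)."""
    if hi == lo:
        return 0
    return min(int((value - lo) / (hi - lo) * n_bins), n_bins - 1)


def cliff(data: Sequence[Row], labels: Sequence[int],
          keep: float = 0.2, n_bins: int = 5) -> Tuple[List[Row], List[int]]:
    """Hypothetical CLIFF-style pruner: keep the instances whose attribute
    ranges best select their own class."""
    n_attr = len(data[0])
    lows = [min(row[j] for row in data) for j in range(n_attr)]
    highs = [max(row[j] for row in data) for j in range(n_attr)]
    bins = [[_bin(row[j], lows[j], highs[j], n_bins) for j in range(n_attr)]
            for row in data]
    kept: List[int] = []
    for c in sorted(set(labels)):
        in_c = [i for i, l in enumerate(labels) if l == c]
        out_c = [i for i, l in enumerate(labels) if l != c]

        def power(j: int, b: int) -> float:
            # like: share of class c in this range; unlike: share of other classes.
            like = sum(bins[i][j] == b for i in in_c) / len(in_c)
            unlike = sum(bins[i][j] == b for i in out_c) / len(out_c) if out_c else 0.0
            return like ** 2 / (like + unlike) if like + unlike else 0.0

        scores = []
        for i in in_c:
            p = 1.0
            for j in range(n_attr):
                p *= power(j, bins[i][j])
            scores.append((p, i))
        scores.sort(reverse=True)  # highest-power instances first
        n_keep = max(1, round(keep * len(in_c)))
        kept.extend(i for _, i in scores[:n_keep])
    kept.sort()
    return [list(data[i]) for i in kept], [labels[i] for i in kept]


def nearest_unlike(x: Sequence[float], label: int,
                   data: Sequence[Row], labels: Sequence[int]) -> Row:
    """Closest instance (squared Euclidean distance) with a different class."""
    best, best_d = None, float("inf")
    for z, lz in zip(data, labels):
        if lz == label:
            continue
        d = sum((a - b) ** 2 for a, b in zip(x, z))
        if d < best_d:
            best, best_d = list(z), d
    if best is None:
        raise ValueError("MORPH needs at least one instance of another class")
    return best


def morph(data: Sequence[Row], labels: Sequence[int], alpha: float = 0.15,
          beta: float = 0.35, rng: random.Random = None) -> List[Row]:
    """Hypothetical MORPH-style mutator: y = x +/- (x - z) * r."""
    rng = rng or random.Random()
    out = []
    for x, lx in zip(data, labels):
        z = nearest_unlike(x, lx, data, labels)
        r = rng.uniform(alpha, beta)      # r < 0.5, so y stays on x's side
        sign = rng.choice((-1.0, 1.0))    # move away from or toward z
        out.append([xi + sign * (xi - zi) * r for xi, zi in zip(x, z)])
    return out


# Toy usage: two well-separated classes (0 = clean, 1 = defective).
rng = random.Random(1)
data = [[0.1, 1.0], [0.2, 1.1], [0.9, 3.0], [1.0, 3.2], [0.15, 0.9], [0.95, 3.1]]
labels = [0, 0, 1, 1, 0, 1]
pruned, plabels = cliff(data, labels, keep=0.5)
private = morph(pruned, plabels, rng=rng)
```

Because r stays below 0.5, each mutated instance ends up closer to its original than to the nearest instance of the other class, which is one simple way to honor the "not crossing class boundaries" constraint described above.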
INDEX TERMS
Testing, Software, Genetic algorithms, Sociology, Statistics, Search problems, Arrays, defect prediction, Privacy, classification
CITATION
Hongyu Zhang, Liang Gong, Fayola Peters, Tim Menzies, "Balancing Privacy and Utility in Cross-Company Defect Prediction", IEEE Transactions on Software Engineering, vol. 39, no. 8, pp. 1054-1068, Aug. 2013, doi:10.1109/TSE.2013.6