Balancing Privacy and Utility in Cross-Company Defect Prediction
Aug. 2013 (vol. 39 no. 8)
pp. 1054-1068
Fayola Peters, West Virginia University, Morgantown
Tim Menzies, West Virginia University, Morgantown
Liang Gong, Tsinghua University, Beijing
Hongyu Zhang, Tsinghua University, Beijing
Background: Cross-company defect prediction (CCDP) is a field of study where an organization lacking enough local data can use data from other organizations for building defect predictors. To support CCDP, data must be shared. Such shared data must be privatized, but that privatization could severely damage the utility of the data.
Aim: To enable effective defect prediction from shared data while preserving privacy.
Method: We explore privatization algorithms that maintain class boundaries in a dataset. CLIFF is an instance pruner that deletes irrelevant examples. MORPH is a data mutator that moves the data a random distance, taking care not to cross class boundaries. CLIFF+MORPH are tested in a CCDP study among 10 defect datasets from the PROMISE data repository.
Results: We find: 1) The CLIFFed+MORPHed algorithms provide more privacy than the state-of-the-art privacy algorithms; 2) in terms of utility measured by defect prediction, we find that CLIFF+MORPH performs significantly better.
Conclusions: For the OO defect data studied here, data can be privatized and shared without a significant degradation in utility. To the best of our knowledge, this is the first published result where privatization does not compromise defect prediction.
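The mutation idea in the Method section can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of the nearest unlike neighbor as a stand-in for the class boundary, and the default fraction range `lo=0.15`, `hi=0.35` are all assumptions made for illustration, and none of them comes from the abstract. The sketch moves each row by a random fraction of its distance to the closest row of a different class. Keeping that fraction below 0.5 leaves the moved row nearer to its original position than to that unlike neighbor. CLIFF-style pruning is not shown.

```python
import math
import random


def nearest_unlike(row, label, rows, labels):
    """Return the closest row (Euclidean distance) whose label differs from `label`."""
    best, best_dist = None, math.inf
    for other, other_label in zip(rows, labels):
        if other_label == label:
            continue
        dist = math.dist(row, other)
        if dist < best_dist:
            best, best_dist = other, dist
    return best


def morph_like(rows, labels, lo=0.15, hi=0.35, seed=0):
    """Mutate numeric rows in the spirit of MORPH (hypothetical sketch).

    Each row x is shifted along the line through x and its nearest unlike
    neighbor z, by a random fraction r in [lo, hi] of the distance |x - z|.
    The direction (toward or away from z) is chosen at random.
    """
    rng = random.Random(seed)
    mutated = []
    for row, label in zip(rows, labels):
        z = nearest_unlike(row, label, rows, labels)
        if z is None:
            # Only one class present: nothing to measure against, copy the row.
            mutated.append(list(row))
            continue
        r = rng.uniform(lo, hi)
        sign = rng.choice((-1.0, 1.0))
        mutated.append([x + sign * r * (x - zi) for x, zi in zip(row, z)])
    return mutated


if __name__ == "__main__":
    rows = [[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [11.0, 0.0]]
    labels = [0, 0, 1, 1]
    for before, after in zip(rows, morph_like(rows, labels)):
        print(before, "->", after)
```

Whether a bound of this kind is enough to preserve utility on real defect data is an empirical question, which is what the study summarized above evaluates.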
Index Terms:
Testing, Software, Genetic algorithms, Sociology, Statistics, Search problems, Arrays, defect prediction, Privacy, classification
Fayola Peters, Tim Menzies, Liang Gong, Hongyu Zhang, "Balancing Privacy and Utility in Cross-Company Defect Prediction," IEEE Transactions on Software Engineering, vol. 39, no. 8, pp. 1054-1068, Aug. 2013, doi:10.1109/TSE.2013.6