Sistemas Colaborativos, Simpósio Brasileiro de (2012)
Sao Paulo, Brazil
Oct. 15, 2012 to Oct. 18, 2012
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/SBSC.2012.27
Developers of distributed open source projects use management and issue-tracking tools to communicate. These tools produce a large volume of unstructured information, which makes issue triage difficult and increases developers' overhead. This problem is common to online communities based on volunteer participation. This paper shows the importance of comment content in an open source project for building a classifier that predicts whether a developer will participate in an issue. To design this prediction model, we used two machine learning algorithms, Naive Bayes and J48. We evaluated the algorithms on data from three Apache Hadoop subprojects. Applying our approach to the most active developers of these subprojects, we achieved accuracy ranging from 79% to 96%. The results indicate that the content of comments on issues in open source projects is a relevant factor in building an issue classifier for developers.
machine learning, content analysis, prediction model, issue tracking classifier
Andre Luis Schwerz, Rafael Liberato, Igor Scaliante Wiese, Igor Steinmacher, Marco Aurelio Gerosa, Joao Eduardo Ferreira, "Prediction of Developer Participation in Issues of Open Source Projects", Sistemas Colaborativos, Simpósio Brasileiro de, pp. 109-114, 2012, doi:10.1109/SBSC.2012.27
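
The abstract describes predicting a developer's participation in an issue from the text of its comments. The sketch below illustrates that general idea with a small multinomial Naive Bayes classifier written from scratch. It is not the authors' pipeline: the class name, the tokenizer, the Laplace smoothing, and the toy comments and labels are all assumptions made for illustration, and J48 (a C4.5 decision tree implementation) is not shown.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, documents, labels):
        # Count documents per class and word occurrences per class.
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocabulary = set()
        for text, label in zip(documents, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def log_scores(self, text):
        """Return the unnormalized log posterior of each class for the text."""
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, doc_count in self.class_counts.items():
            score = math.log(doc_count / total_docs)  # log prior
            total_words = sum(self.word_counts[label].values())
            for token in tokenize(text):
                if token not in self.vocabulary:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical toy data: comments from issues, labeled True if one
# particular (imaginary) developer took part in the issue.
TRAIN_COMMENTS = [
    "namenode crashes when the block report is delayed",
    "datanode heartbeat timeout causes block replication storm",
    "fix block placement policy in namenode",
    "update documentation for the command line shell",
    "typo in the javadoc of the configuration class",
    "improve build script and documentation layout",
]
TRAIN_LABELS = [True, True, True, False, False, False]

model = NaiveBayesTextClassifier().fit(TRAIN_COMMENTS, TRAIN_LABELS)
print(model.predict("namenode loses block report after restart"))
print(model.predict("javadoc typo in the shell documentation"))
```

On this toy data the first call prints `True` and the second prints `False`, since the vocabulary of each new comment overlaps mostly with one class. In a setting like the one the abstract describes, one such binary model would presumably be trained per developer; that arrangement is also an assumption here.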