2011 26th IEEE/ACM International Conference on Automated Software Engineering (ASE 2011)
Lawrence, KS, USA
Nov. 6, 2011 to Nov. 10, 2011
Daryl Posnett, Department of Computer Science, University of California, Davis, USA
Vladimir Filkov, Department of Computer Science, University of California, Davis, USA
Premkumar Devanbu, Department of Computer Science, University of California, Davis, USA
Software systems are decomposed hierarchically, for example, into modules, packages, and files. This hierarchical decomposition has a profound influence on evolvability, maintainability, and work assignment. Hierarchical decomposition is thus clearly of central concern for empirical software engineering researchers, but it also poses a quandary. At what level do we study phenomena such as quality, distribution, collaboration, and productivity: at the level of files, packages, or modules? How does the level of study affect the truth, meaning, and relevance of the findings? In other fields it has been found that choosing the wrong level might lead to misleading or fallacious results. Choosing a proper level of study is thus vitally important for empirical software engineering research, but this issue has not thus far been explicitly investigated. We describe the related ideas of ecological inference and the ecological fallacy from sociology and epidemiology, and explore their relevance to empirical software engineering. We also present some case studies, using defect and process data from 18 open source projects, to illustrate the risks of modeling at an aggregation level in the context of defect prediction, as well as in hypothesis testing.
D. Posnett, V. Filkov and P. Devanbu, "Ecological inference in empirical software engineering," 2011 26th IEEE/ACM International Conference on Automated Software Engineering (ASE 2011), Lawrence, KS, USA, 2011, pp. 362-371.