Issue No. 1, January/February 2009 (vol. 11)
Roger D. Peng, Johns Hopkins Bloomberg School of Public Health
Sandrah P. Eckel, Johns Hopkins Bloomberg School of Public Health
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/MCSE.2009.6
The ability to make scientific findings reproducible is increasingly important in areas where substantive results are the product of complex statistical computations. Reproducibility allows others to verify published findings and to conduct alternative analyses of the same data. A natural question is how to conduct and distribute reproducible research. The authors describe a simple framework in which reproducible research can be conducted and distributed via cached computations, with tools for both authors and readers. As a prototype implementation, they also describe a software package written in the R language. The "cacher" package provides tools for caching computational results in a key-value style database, which can be published to a public repository for readers to download. As a case study, they demonstrate the use of the package on a study of ambient air pollution exposure and mortality in the US.
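The idea of caching computational results in a key-value style database can be illustrated with a minimal sketch. It is written in Python rather than R and does not reproduce the cacher package's actual API or storage format: the class `ExpressionCache`, its `run` method, and the on-disk layout are all hypothetical, chosen only to show how an author might store the result of each statement under a key derived from its source text, and how a reader holding a copy of the cache directory could load those results without recomputing them.

```python
import hashlib
import os
import pickle
import tempfile


class ExpressionCache:
    """Toy key-value cache (hypothetical; not the cacher package's design).

    Key:   SHA-1 hash of a statement's source text.
    Value: pickled dict of the objects the statement created or rebound.
    """

    def __init__(self, cache_dir):
        self.cache_dir = cache_dir
        os.makedirs(cache_dir, exist_ok=True)
        self.evaluated = 0  # number of statements actually executed

    def _path(self, source):
        key = hashlib.sha1(source.encode("utf-8")).hexdigest()
        return os.path.join(self.cache_dir, key + ".pkl")

    def run(self, source, env):
        """Load the statement's results from the cache, or evaluate and store them."""
        path = self._path(source)
        if os.path.exists(path):
            # Cache hit: restore the stored objects without re-running the code.
            with open(path, "rb") as fh:
                env.update(pickle.load(fh))
            return
        # Cache miss: evaluate, then record which names were created or rebound.
        # Assumption: the key depends only on the source text, so this sketch
        # does not detect changes in a statement's inputs.
        snapshot = dict(env)
        exec(source, env)
        self.evaluated += 1
        changed = {
            name: value
            for name, value in env.items()
            if name != "__builtins__"
            and (name not in snapshot or snapshot[name] is not value)
        }
        with open(path, "wb") as fh:
            pickle.dump(changed, fh)


# Author side: run the analysis once, populating the cache directory.
analysis = ["x = list(range(10))", "total = sum(x)"]
cache_dir = tempfile.mkdtemp()
author = ExpressionCache(cache_dir)
author_env = {}
for statement in analysis:
    author.run(statement, author_env)

# Reader side: with a copy of the cache directory, results load from disk.
reader = ExpressionCache(cache_dir)
reader_env = {}
for statement in analysis:
    reader.run(statement, reader_env)
```

In this sketch the cache directory plays the role of the database that an author would publish; a reader who obtains it can inspect intermediate objects such as `x` or re-run selected statements to check them against the stored values.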
Reproducible research, database, software
Roger D. Peng, Sandrah P. Eckel, "Distributed Reproducible Research Using Cached Computations", Computing in Science & Engineering, vol. 11, no. 1, pp. 28-34, January/February 2009, doi:10.1109/MCSE.2009.6