Provenance in High-Energy Physics Workflows
May/June 2008 (vol. 10, no. 3)
pp. 22-29
Andrew Dolgert, Cornell University
Lawrence Gibbons, Cornell University
Christopher D. Jones, Cornell University
Valentin Kuznetsov, Cornell University
Mirek Riedewald, Cornell University
Daniel Riley, Cornell University
Gregory J. Sharp, Cornell University
Peter Wittich, Cornell University
The adoption of large-scale distributed computing for high-energy physics presents new opportunities and challenges for physicists analyzing data from the Large Hadron Collider experiments. Because petabytes of data must be managed, accessed by thousands of systems, and used by thousands of collaborators, effective provenance tracking is critical to understanding how physics results were produced. In this article, the authors discuss several uses of data provenance in high-energy physics workflows and the improvements to data analysis workflows made possible by decentralized provenance collection and fine-grained object annotations.
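The abstract does not describe a concrete data model, so the following is only an illustrative sketch of the two ideas it names. It assumes that each data product carries a hypothetical `ProvenanceRecord` (the producing module, the parameters it ran with, and the IDs of its parent products), that each job writes these records locally (decentralized collection), and that logs from separate jobs are merged later. Walking the parent links then reconstructs how a result was derived. All names (`ProvenanceRecord`, `merge`, `lineage`, the product IDs) are hypothetical and are not the API of any system discussed in the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical fine-grained annotation attached to one data product."""
    product_id: str
    module: str          # processing step that produced this product (hypothetical)
    parameters: tuple    # (key, value) pairs of the configuration that was used
    parents: tuple = ()  # product_ids of the inputs this product was derived from


def merge(*job_logs):
    """Combine provenance logs written independently by separate jobs.

    Identical duplicates are accepted; two different records claiming the
    same product_id indicate inconsistent provenance and raise ValueError.
    """
    merged = {}
    for log in job_logs:
        for rec in log:
            existing = merged.get(rec.product_id)
            if existing is not None and existing != rec:
                raise ValueError(f"conflicting provenance for {rec.product_id}")
            merged[rec.product_id] = rec
    return merged


def lineage(product_id, records):
    """Return every record reachable from product_id by following parent links."""
    result, visited, stack = [], set(), [product_id]
    while stack:
        pid = stack.pop()
        if pid in visited:
            continue  # a product shared by several branches is reported once
        visited.add(pid)
        rec = records[pid]
        result.append(rec)
        stack.extend(rec.parents)
    return result


# Usage: two jobs record provenance locally; the logs are merged afterward.
raw = ProvenanceRecord("raw:run42", "daq", ())
reco = ProvenanceRecord("reco:run42", "reconstruction", (("calib", "v3"),), ("raw:run42",))
tracks = ProvenanceRecord("tracks:run42", "track_fit", (("chi2_max", 10.0),), ("reco:run42",))

records = merge([raw], [reco, tracks])
print([r.module for r in lineage("tracks:run42", records)])
```

Keeping the annotation on each object, rather than in a single central catalog, lets any job contribute provenance without coordination; the cost is that consistency must be checked when logs are merged, which `merge` does here by rejecting conflicting records.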


Index Terms:
Provenance, physics, scientific databases, metadata
Andrew Dolgert, Lawrence Gibbons, Christopher D. Jones, Valentin Kuznetsov, Mirek Riedewald, Daniel Riley, Gregory J. Sharp, Peter Wittich, "Provenance in High-Energy Physics Workflows," Computing in Science and Engineering, vol. 10, no. 3, pp. 22-29, May-June 2008, doi:10.1109/MCSE.2008.81