Issue No.03 - May/June (2008 vol.10)
pp: 22-29
Andrew Dolgert, Cornell University
Lawrence Gibbons, Cornell University
Christopher D. Jones, Cornell University
Valentin Kuznetsov, Cornell University
Mirek Riedewald, Cornell University
Daniel Riley, Cornell University
Gregory J. Sharp, Cornell University
Peter Wittich, Cornell University
ABSTRACT
The adoption of large-scale distributed computing for high-energy physics presents new opportunities and challenges for physicists analyzing the data from the Large Hadron Collider experiments. With petabytes of data to manage, accessed by thousands of systems and used by thousands of collaborators, effective provenance is critical to the understanding of how the physics results were produced. In this article, the authors discuss several uses of data provenance in high-energy physics workflows and the opportunities for improvements in data analysis workflows that result from decentralized provenance collection and fine-grained object annotations.
INDEX TERMS
Provenance, physics, scientific databases, metadata
CITATION
Andrew Dolgert, Lawrence Gibbons, Christopher D. Jones, Valentin Kuznetsov, Mirek Riedewald, Daniel Riley, Gregory J. Sharp, Peter Wittich, "Provenance in High-Energy Physics Workflows", Computing in Science & Engineering, vol.10, no. 3, pp. 22-29, May/June 2008, doi:10.1109/MCSE.2008.81