Issue No.09 - September (2009 vol.20)
pp: 1246-1259
Paul Groth, University of Southern California, Marina del Rey
Luc Moreau, University of Southampton, Southampton
ABSTRACT
Scientific and business communities are adopting large-scale distributed systems as a means to solve a wide range of resource-intensive tasks. These communities also have requirements in terms of provenance. We define the provenance of a result produced by a distributed system as the process that led to that result. This paper describes a protocol for recording documentation of a distributed system's execution. The distributed protocol guarantees that documentation with characteristics suitable for accurately determining the provenance of results is recorded. These characteristics are confirmed through a number of proofs based on an abstract state machine formalization.
INDEX TERMS
Provenance, lineage, grids, distributed systems, data protocols.
CITATION
Paul Groth, Luc Moreau, "Recording Process Documentation for Provenance", IEEE Transactions on Parallel & Distributed Systems, vol. 20, no. 9, pp. 1246-1259, September 2009, doi:10.1109/TPDS.2008.215