Semantic Provenance for eScience: Managing the Deluge of Scientific Data
July/August 2008 (vol. 12, no. 4)
pp. 46-54
Satya S. Sahoo, Kno.e.sis Center, Wright State University
Amit Sheth, Kno.e.sis Center, Wright State University
Cory Henson, Kno.e.sis Center, Wright State University
Provenance information in eScience is metadata that's critical to effectively manage the exponentially increasing volumes of scientific data from industrial-scale experiment protocols. Semantic provenance, based on domain-specific provenance ontologies, lets software applications unambiguously interpret data in the correct context. The semantic provenance framework for eScience data comprises expressive provenance information and domain-specific provenance ontologies and applies this information to data management. The authors' "two degrees of separation" approach advocates the creation of high-quality provenance information using specialized services. In contrast to workflow engines generating provenance information as a core functionality, the specialized provenance services are integrated into a scientific workflow on demand. This article describes an implementation of the semantic provenance framework for glycoproteomics.
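The abstract describes provenance as typed metadata that a specialized service attaches to a workflow step on demand, rather than something the workflow engine produces itself. The following minimal Python sketch illustrates that separation under stated assumptions. It is not the authors' implementation. The namespace `EX`, the predicate names (`processType`, `used`, `wasGeneratedBy`, `hasParameter`), the `ProvenanceService` class, and the `analyze` step are all hypothetical, and plain tuples stand in for an RDF store and a domain-specific provenance ontology.

```python
from dataclasses import dataclass, field
from functools import wraps
from itertools import count

# Hypothetical namespace: the terms used below only illustrate typed provenance
# links; they are NOT taken from any published provenance ontology.
EX = "http://example.org/prov#"


@dataclass
class ProvenanceService:
    """Stand-alone provenance store, kept outside the workflow engine."""
    triples: set = field(default_factory=set)
    _run_ids: count = field(default_factory=lambda: count(1))

    def record(self, subject: str, predicate: str, obj: str) -> None:
        # Store one subject-predicate-object statement with a namespaced predicate.
        self.triples.add((subject, EX + predicate, obj))

    def subjects(self, predicate: str, obj: str) -> set:
        # Single-pattern lookup: every subject linked to `obj` through `predicate`.
        return {s for (s, p, o) in self.triples if p == EX + predicate and o == obj}

    def capture(self, process_type: str, **parameters: str):
        """Decorator that attaches provenance capture to one workflow step on demand."""
        def decorate(step):
            @wraps(step)
            def wrapper(input_id: str) -> str:
                output_id = step(input_id)
                run = f"run{next(self._run_ids)}"
                # Describe the process execution, its input, and its output.
                self.record(run, "processType", process_type)
                self.record(run, "used", input_id)
                self.record(output_id, "wasGeneratedBy", run)
                # Record each experimental parameter so results can be filtered by context.
                for name, value in parameters.items():
                    self.record(run, "hasParameter", f"{name}={value}")
                return output_id
            return wrapper
        return decorate


# Usage: the workflow step itself contains no provenance code.
prov = ProvenanceService()


@prov.capture("SampleAnalysis", instrument="instrument-A")
def analyze(sample_id: str) -> str:
    # Placeholder computation standing in for a real analysis step.
    return sample_id + ".result"


result = analyze("sample42")

# Provenance query: which data products came from runs that used instrument-A?
runs = prov.subjects("hasParameter", "instrument=instrument-A")
products = {d for r in runs for d in prov.subjects("wasGeneratedBy", r)}
print(products)  # {'sample42.result'}
```

The decorator makes provenance capture optional and attachable per step, which loosely mirrors the on-demand integration the abstract describes. A production system would more likely record these statements in an RDF triple store against a formal ontology, so that other applications can interpret the terms unambiguously.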


Index Terms:
semantic provenance, metadata, provenance, eScience, cyberinfrastructure, Spade
Citation:
Satya S. Sahoo, Amit Sheth, Cory Henson, "Semantic Provenance for eScience: Managing the Deluge of Scientific Data," IEEE Internet Computing, vol. 12, no. 4, pp. 46-54, July-Aug. 2008, doi:10.1109/MIC.2008.86