On Embedding Machine-Processable Semantics into Documents
July 2005 (vol. 17 no. 7)
pp. 1014-1018
Most Web and legacy paper-based documents are available in human-comprehensible text form, which computer programs cannot readily access or interpret. Here, we investigate an approach that combines XML technology with programming languages for representational purposes, enhancing traceability and thereby facilitating semiautomatic extraction and update. Specifically, we propose a modular technique that uses annotations to embed machine-processable semantics into a text document containing tabular data, which sometimes results in ill-formed XML fragments, and we evaluate this technique with respect to document querying, manipulation, and integration. The ultimate aim is to author and extract the human-readable and machine-comprehensible parts of a document hand in hand and to keep them side by side.
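To make the idea of side-by-side human-readable and machine-processable content concrete, the following is a minimal sketch in Python. It is not the technique developed in the article: the tag names (`row`, `cell`), their attributes, the sample text, and the three helper functions are all hypothetical, chosen only for illustration. The sketch scans for annotations with regular expressions rather than an XML parser, on the assumption that embedded fragments may not always be well-formed.

```python
import re

# Hypothetical annotated text: tag and attribute names are illustrative only
# and are not taken from the article.
ANNOTATED = (
    "Alloy 7075 limits:\n"
    '<row alloy="7075"><cell name="tensile" unit="ksi">83</cell> ksi tensile, '
    '<cell name="yield" unit="ksi">73</cell> ksi yield</row>\n'
)

# Matches any opening or closing tag, without requiring proper nesting.
TAG = re.compile(r"</?[A-Za-z][^<>]*>")
# Matches one annotated table cell and captures its name, unit, and value.
CELL = re.compile(r'<cell name="(\w+)" unit="(\w+)">([^<]*)</cell>')


def human_view(text):
    """Strip the annotations, leaving only the human-readable text."""
    return TAG.sub("", text)


def machine_view(text):
    """Collect the annotated cells as {name: (value, unit)}."""
    return {name: (float(value), unit) for name, unit, value in CELL.findall(text)}


def update_cell(text, name, new_value):
    """Rewrite one cell's value in place, keeping text and annotation together."""
    pattern = re.compile(
        r'(<cell name="%s" unit="\w+">)[^<]*(</cell>)' % re.escape(name)
    )
    return pattern.sub(lambda m: m.group(1) + str(new_value) + m.group(2), text)
```

Because the value lives in exactly one place, an update made through `update_cell` shows up in both `human_view` and `machine_view`, which is the traceability property the abstract is after.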

Index Terms: Structured data and knowledge representation, XML-based programming language, Semantic Web.
Krishnaprasad Thirunarayan, "On Embedding Machine-Processable Semantics into Documents," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 7, pp. 1014-1018, July 2005, doi:10.1109/TKDE.2005.113