2011 26th IEEE/ACM International Conference on Automated Software Engineering (ASE 2011) (2011)
Lawrence, KS, USA
Nov. 6, 2011 to Nov. 10, 2011
ISBN: 978-1-4577-1638-6
pp: 476-479
Michele Lanza , Faculty of Informatics, Univ. of Lugano, Switzerland
Anthony Cleve , Faculty of Informatics, University of Namur, Belgium
Andrea Mocci , DEI, Politecnico di Milano, Italy
Alberto Bacchelli , Faculty of Informatics, Univ. of Lugano, Switzerland
ABSTRACT
The design and evolution of a software system leave traces in various kinds of artifacts. In software, produced by humans for humans, many artifacts are written in natural language by people involved in the project. Such entities contain structured information, which constitutes a valuable source of knowledge for analyzing and comprehending a system's design and evolution. However, the ambiguous and informal nature of narrative is a serious challenge in gathering such information, which is scattered throughout natural language text. We present an approach, based on island parsing, to recognize and enable the parsing of structured information that occurs in natural language artifacts. We evaluate our approach by applying it to mailing lists pertaining to three software systems. We show that this approach allows us to extract structured data from emails with high precision and recall.
CITATION
Michele Lanza, Anthony Cleve, Andrea Mocci, Alberto Bacchelli, "Extracting structured data from natural language documents with island parsing", 2011 26th IEEE/ACM International Conference on Automated Software Engineering (ASE 2011), pp. 476-479, 2011, doi:10.1109/ASE.2011.6100103