DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/MC.2013.403
Alberto Bartoli, University of Trieste, Trieste
Giorgio Davanzo, University of Trieste, Trieste
Andrea De Lorenzo, University of Trieste, Trieste
Eric Medvet, University of Trieste, Trieste
Enrico Sorio, University of Trieste, Trieste
We propose a system for the automatic generation of regular expressions for text-extraction tasks. The user describes the desired task solely by means of a set of labeled examples. The generated regexes can be used with common engines, such as those included in Java, PHP, and Perl. Using the system does not require any familiarity with regular-expression syntax. We performed an extensive experimental evaluation on 12 different extraction tasks applied to real-world datasets. We obtained very good results in terms of precision and recall, even in comparison with earlier state-of-the-art proposals. Our results are highly promising toward the achievement of a practical surrogate for the specific skills required for generating regular expressions, and they are significant as a demonstration of what can be achieved with genetic programming (GP) approaches on modern information technology.
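As an illustration of how a candidate regex might be scored against labeled examples, the following is a minimal sketch and not the system described above. It assumes that each example is a text plus the character spans that should be extracted, and that precision and recall are counted over exactly matching spans; the abstract does not state how the metrics are defined. The helper names (`label`, `extracted_spans`, `precision_recall`) and the sample data are hypothetical.

```python
import re


def extracted_spans(pattern, text):
    """Return the (start, end) spans of all non-empty matches of pattern in text."""
    return {m.span() for m in re.finditer(pattern, text) if m.end() > m.start()}


def precision_recall(pattern, examples):
    """Span-level precision and recall of pattern over labeled examples.

    examples: list of (text, set of (start, end) spans that should be extracted).
    Assumption: a match counts as correct only if its span equals a labeled span.
    """
    tp = fp = fn = 0
    for text, wanted in examples:
        found = extracted_spans(pattern, text)
        tp += len(found & wanted)   # extracted and labeled
        fp += len(found - wanted)   # extracted but not labeled
        fn += len(wanted - found)   # labeled but not extracted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def label(text, *snippets):
    """Hypothetical helper: build a labeled example by locating each snippet in text."""
    spans = set()
    for snippet in snippets:
        start = text.index(snippet)  # first occurrence of the snippet
        spans.add((start, start + len(snippet)))
    return text, spans


# Made-up examples for a date-extraction task (not from the evaluated datasets).
EXAMPLES = [
    label("Born 12/03/1985 in Trieste", "12/03/1985"),
    label("Invoice 4471 issued 01/10/2012, due 31/10/2012",
          "01/10/2012", "31/10/2012"),
    label("No dates here, only room 12"),
]

# Two hand-written candidates; a synthesis system would generate these itself.
CANDIDATES = [r"\d+", r"\d{2}/\d{2}/\d{4}"]

for candidate in CANDIDATES:
    p, r = precision_recall(candidate, EXAMPLES)
    print(f"{candidate!r}: precision={p:.2f} recall={r:.2f}")
```

On this toy data the overly general `\d+` scores 0 on both metrics, because none of its matches coincides with a labeled span, while `\d{2}/\d{2}/\d{4}` extracts exactly the three labeled dates. A score of this kind is what a search procedure could use to rank candidate expressions.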
Alberto Bartoli, Giorgio Davanzo, Andrea De Lorenzo, Eric Medvet, Enrico Sorio, "Automatic Synthesis of Regular Expressions from Examples," Computer, doi:10.1109/MC.2013.403