2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05)
ITPilot: A Toolkit for Industrial-Strength Web Data Extraction
Compiègne University of Technology, France
September 19-22, 2005
ISBN: 0-7695-2415-X
In recent years, many research systems have been proposed to perform data extraction and automation tasks on Web sources. Since most of today's Web sources are "human-readable" but not "machine-readable", these systems must address a number of difficult challenges, such as dealing with complex navigation sequences, extracting data from HTML pages, and reacting to source changes. Denodo Corporation has developed ITPilot, an industrial-strength solution that allows complex "wrappers" for Web sources to be graphically generated and automatically maintained. This paper presents the architecture and the basic ideas "behind the scenes" in ITPilot.
Citation:
Alberto Pan, Juan Raposo, Manuel Álvarez, Paula Montoto, José Losada, Justo Hidalgo, "ITPilot: A Toolkit for Industrial-Strength Web Data Extraction," 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05), pp. 798-801, 2005