Proceedings of Sixth International Conference on Document Analysis and Recognition (2001)
Sept. 10, 2001 to Sept. 13, 2001
Gerald Penn, Lucent Bell Labs
Jianying Hu, Lucent Bell Labs
Hengbin Luo, Lucent Bell Labs
Ryan McDonald, University of Toronto
Abstract: We propose a set of baseline heuristics for identifying genuinely tabular information and news links in HTML documents. A prototype implementation of these heuristics is described for delivering content from news providers' home pages to a narrow-bandwidth device such as a portable digital assistant or cellular phone display. Its evaluation on 75 websites is provided, along with a discussion of topics for future research.
J. Hu, R. McDonald, H. Luo and G. Penn, "Flexible Web Document Analysis for Delivery to Narrow-Bandwidth Devices," Proceedings of Sixth International Conference on Document Analysis and Recognition (ICDAR), Seattle, Washington, 2001, p. 1074.