Seventh International Conference on Document Analysis and Recognition (ICDAR'03) - Volume 1
Automatic Discovery of Semantic Structures in HTML Documents
Edinburgh, Scotland
August 03-August 06
ISBN: 0-7695-1960-1
Template-driven HTML documents possess an implicit, fixed schema denoting concepts and their relationships in a hierarchical fashion. Discovering this schema remains a relatively unexplored problem. By exploiting a key observation that semantically related items in HTML documents exhibit spatial locality, we develop an algorithm for automatically partitioning them into tree-like semantic structures which expose the implicit schema.
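The abstract does not spell out the algorithm, so the following is only an illustrative sketch of the spatial-locality idea, not the authors' method. It assumes a stand-in heuristic: adjacent sibling elements whose subtrees have the same tag structure are treated as semantically related and grouped together. The names Node, TreeBuilder, signature, partition, and parse are hypothetical and do not come from the paper.

```python
from html.parser import HTMLParser
from itertools import groupby

# Elements that never have a closing tag (partial list, enough for a sketch).
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class Node:
    """Minimal DOM node: a tag, its direct text, and child nodes."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = ""


class TreeBuilder(HTMLParser):
    """Builds a Node tree from HTML using the standard-library parser."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this tag; ignore stray closes.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.text += data.strip()


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def signature(node):
    """Structural signature: the tag plus the signatures of all children."""
    return (node.tag, tuple(signature(c) for c in node.children))


def text_of(node):
    """All text under a node, in document order."""
    parts = [node.text] + [text_of(c) for c in node.children]
    return " ".join(p for p in parts if p)


def partition(node):
    """Assumed heuristic: runs of adjacent, structurally identical siblings
    form one group; any other child is kept as its own subtree or text."""
    result = []
    for _, run in groupby(node.children, key=signature):
        run = list(run)
        if len(run) > 1:
            result.append([text_of(n) for n in run])
        elif run[0].children:
            result.append({run[0].tag: partition(run[0])})
        else:
            result.append(text_of(run[0]))
    return result


page = "<div><h2>Sports</h2><ul><li><a>Tennis</a></li><li><a>Golf</a></li></ul></div>"
tree = partition(parse(page))
```

On this small input, the two structurally identical list items (Tennis and Golf) end up in one group nested under the list, while the heading Sports stays a separate item, which gives a tree-like view of the page's implied schema. A real template-driven page would need a looser notion of similarity than exact signature equality.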
Citation:
Saikat Mukherjee, Guizhen Yang, Wenfang Tan, I.V. Ramakrishnan, "Automatic Discovery of Semantic Structures in HTML Documents," Seventh International Conference on Document Analysis and Recognition (ICDAR'03) - Volume 1, p. 245, 2003