Seventh International Conference on Document Analysis and Recognition (ICDAR'03) - Volume 1
Classification of Web Documents Using a Graph Model
Edinburgh, Scotland
August 3-6, 2003
ISBN: 0-7695-1960-1
In this paper we describe work on classifying web documents using a graph-based model for document representation instead of the traditional vector-based model. We compare the classification accuracy of the vector-model approach using the k-Nearest Neighbor (k-NN) algorithm with a novel approach that allows graphs to be used for document representation in the k-NN algorithm. The proposed method is evaluated on three different web document collections, using the leave-one-out approach to measure classification accuracy. The results show that the graph-based k-NN approach can outperform traditional vector-based k-NN methods in terms of both accuracy and execution time.
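The abstract does not say how documents are turned into graphs or how the distance between two graphs is computed. The Python sketch below is therefore a hedged illustration, not the authors' implementation. It assumes each document graph has one node per unique word and a directed edge from each word to the word that follows it. It also assumes a distance based on the size of the maximum common subgraph, d(G1, G2) = 1 - |mcs(G1, G2)| / max(|G1|, |G2|), where |G| counts nodes plus edges. Because every node label is unique under this assumption, the maximum common subgraph reduces to the intersection of the node and edge sets. The names `build_graph`, `mcs_distance`, `knn_classify`, `leave_one_out_accuracy` and the toy corpus `SAMPLE_DOCS` are hypothetical.

```python
from collections import Counter


def build_graph(text):
    """Assumed representation (hypothetical): nodes are unique lowercase words,
    directed edges link each word to the word that follows it."""
    words = text.lower().split()
    nodes = set(words)
    edges = {(a, b) for a, b in zip(words, words[1:])}
    return nodes, edges


def graph_size(g):
    """Size of a graph, counted here as number of nodes plus number of edges."""
    nodes, edges = g
    return len(nodes) + len(edges)


def mcs_distance(g1, g2):
    """Assumed distance: 1 - |mcs| / max(|G1|, |G2|).
    With unique node labels, the maximum common subgraph is just the
    intersection of the node sets plus the intersection of the edge sets."""
    (n1, e1), (n2, e2) = g1, g2
    common = len(n1 & n2) + len(e1 & e2)
    largest = max(graph_size(g1), graph_size(g2))
    if largest == 0:
        return 0.0  # two empty graphs are treated as identical
    return 1.0 - common / largest


def knn_classify(query, training, k):
    """Plain k-NN over (graph, label) pairs using the graph distance.
    sorted() is stable and Counter keeps insertion order, so a tied vote
    goes to the label whose member appears first in the ranking."""
    ranked = sorted(training, key=lambda item: mcs_distance(query, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(docs, k):
    """Classify each document using all the others as the training set."""
    graphs = [(build_graph(text), label) for text, label in docs]
    correct = 0
    for i, (g, label) in enumerate(graphs):
        rest = graphs[:i] + graphs[i + 1:]
        if knn_classify(g, rest, k) == label:
            correct += 1
    return correct / len(graphs)


# Hypothetical toy corpus, for illustration only (not the paper's data sets).
SAMPLE_DOCS = [
    ("stock market shares fell today", "finance"),
    ("market shares rose on strong earnings", "finance"),
    ("the team won the football match", "sport"),
    ("football team lost the match", "sport"),
]

if __name__ == "__main__":
    print(leave_one_out_accuracy(SAMPLE_DOCS, k=1))  # 1.0
```

A vector-model baseline fits the same leave-one-out loop. Only `build_graph` and `mcs_distance` would change, for example to term-frequency vectors and cosine distance. This is why a graph distance can be dropped into an otherwise unchanged k-NN classifier.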
Citation:
Adam Schenker, Mark Last, Horst Bunke, Abraham Kandel, "Classification of Web Documents Using a Graph Model," Seventh International Conference on Document Analysis and Recognition (ICDAR'03) - Volume 1, vol. 1, p. 240, 2003.