35th Annual Hawaii International Conference on System Sciences (HICSS'02)-Volume 4
Big Island, Hawaii
January 7-10, 2002
ISBN: 0-7695-1435-9
We analyse academic Web pages in order to classify them automatically into Web genres. For this purpose, we have developed a database-driven corpus, currently containing more than 1,300,000 documents, which forms the empirical basis of our research. We introduce the notion of a Web genre type, which constitutes the framework for a certain Web genre, and the notions of compulsory and optional Web genre modules. These modules act as building blocks that together make up the structure characterised by the Web genre type, and they operate as modifiers for the default assignment. The analysis of a 200-document sample illustrates our notion of a Web genre hierarchy, into which Web genre types and modules are embedded. The analysis of four documents of the Web genre Academic's Personal Homepage demonstrates our approach and our long-term goal of automatically extracting the contents of Web genre modules in order to build structured XML documents from unstructured HTML documents.
Index Terms:
Digital Genre, Web Genre, Automatic Genre Identification, Computational Linguistics, XML, Natural Language Processing
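The abstract's central idea, a Web genre type assembled from compulsory and optional modules, can be pictured with a small data structure. The sketch below is purely illustrative and is not the paper's method. It assumes that a page has already been reduced to a set of detected module labels. It further assumes that a genre type admits a page only when every compulsory module is present, and that detected optional modules act as a tie-breaking score. The module labels and the "Course Page" genre are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class WebGenreType:
    """A Web genre type: a named framework of compulsory and optional modules."""

    name: str
    compulsory: frozenset[str]
    optional: frozenset[str] = frozenset()

    def admits(self, modules: set[str]) -> bool:
        # Assumption: every compulsory module must be detected on the page.
        return self.compulsory <= modules

    def score(self, modules: set[str]) -> int:
        # Hypothetical score: how many optional modules were detected.
        return len(self.optional & modules)


def classify(modules: set[str], genre_types: list[WebGenreType]) -> str | None:
    """Return the name of the best-matching genre type, or None if none admits the page."""
    candidates = [g for g in genre_types if g.admits(modules)]
    if not candidates:
        return None
    # Prefer more specific types (more compulsory modules), then more optional hits.
    best = max(candidates, key=lambda g: (len(g.compulsory), g.score(modules)))
    return best.name


# Hypothetical genre types and module labels, for illustration only.
homepage = WebGenreType(
    "Academic's Personal Homepage",
    compulsory=frozenset({"name", "affiliation"}),
    optional=frozenset({"publications", "teaching", "photo"}),
)
course = WebGenreType(
    "Course Page",
    compulsory=frozenset({"course_title", "schedule"}),
    optional=frozenset({"reading_list"}),
)

page = {"name", "affiliation", "publications", "photo"}
print(classify(page, [homepage, course]))
```

A page's modules are modelled as a plain set, so admitting a page reduces to a subset test. A fuller treatment would also need the genre hierarchy that the abstract mentions, plus a way to extract each module's content into XML elements.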
Citation:
G. Rehm, "Towards Automatic Web Genre Identification," in Proc. 35th Annual Hawaii International Conference on System Sciences (HICSS'02), vol. 4, p. 101, 2002.