Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), Vol. 1
Learning of Pattern-Based Rules for Document Classification
Curitiba, Parana, Brazil, September 23-26, 2007
ISBN: 0-7695-2822-8
Automatic processing of office documents, such as orders, invoices, or offers, has significant potential for saving costs. Because such domains have a high percentage of special vocabulary, purely statistical approaches fail in automatic classification. The inherent structure and short text messages require specific approaches. We propose a rule-based method to classify mixed stacks of documents into a set of hierarchically organized classes. Rules are learned by extracting patterns of different types from a document sample. The paper focuses on the architecture and on the learning process, presents results in comparison with other techniques, and gives an outlook on how to further improve the system.
Citation:
A. Dengel, "Learning of Pattern-Based Rules for Document Classification," Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), vol. 1, pp. 123-127, 2007.