Fourth International Conference Document Analysis and Recognition (ICDAR'97)
Language identification of on-line documents using word shapes
Ulm, GERMANY
August 18-August 20
ISBN: 0-8186-7898-4
N. Nobile, Centre for Pattern Recognition & Machine Intelligence, Concordia Univ., Montreal, Que., Canada
S. Bergler, Centre for Pattern Recognition & Machine Intelligence, Concordia Univ., Montreal, Que., Canada
C.Y. Suen, Centre for Pattern Recognition & Machine Intelligence, Concordia Univ., Montreal, Que., Canada
S. Khoury, Centre for Pattern Recognition & Machine Intelligence, Concordia Univ., Montreal, Que., Canada
The authors have extended existing methods to identify the language of an on-line document after the characters have been coded using 10 character classes based on visual characteristics. In particular, they exploit word bigrams and trigrams in both a linear combination of score values and an expert systems approach. Knowledge about each language is acquired from a large number of on-line texts. Using a small set of rules, the expert system outperforms the linear combination in accuracy and shows more stability when parameter settings are varied.
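The linear-combination variant described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not list the 10 character classes, so the five classes in `shape_code`, the weights `w2` and `w3`, and all function names here are assumptions chosen only for illustration.

```python
from collections import Counter

# Hypothetical shape classes; the paper's actual 10 classes are not given in the abstract.
ASCENDERS = set("bdfhklt")
DESCENDERS = set("gpqy")

def shape_code(word):
    """Map each character to a coarse visual class (illustrative mapping only)."""
    out = []
    for ch in word:
        if ch.isupper() or ch in ASCENDERS:
            out.append("A")   # tall character
        elif ch in DESCENDERS:
            out.append("g")   # character with a descender
        elif ch in "ij":
            out.append("i")   # dotted character
        elif ch.isalpha():
            out.append("x")   # x-height character
        else:
            out.append("?")   # digits, punctuation, anything else
    return "".join(out)

def ngrams(tokens, n):
    """Return all consecutive n-tuples of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def train(text):
    """Collect word-shape bigram and trigram counts from sample text of one language."""
    codes = [shape_code(w) for w in text.split()]
    return {n: Counter(ngrams(codes, n)) for n in (2, 3)}

def score(model, text, w2=0.5, w3=0.5):
    """Weighted sum of the fractions of bigrams and trigrams seen in training."""
    codes = [shape_code(w) for w in text.split()]
    total = 0.0
    for n, weight in ((2, w2), (3, w3)):
        grams = ngrams(codes, n)
        if grams:
            hits = sum(1 for g in grams if g in model[n])
            total += weight * hits / len(grams)
    return total

def identify(models, text):
    """Pick the language whose model gives the highest combined score."""
    return max(models, key=lambda lang: score(models[lang], text))

models = {
    "en": train("the cat sat on the mat and the dog ran"),
    "fr": train("le chat est sur la table et le chien court"),
}
print(identify(models, "the cat sat on the mat"))
```

A real system would train on far larger corpora and tune the weights; the rule-based expert system that the abstract reports as more accurate and more stable is not sketched here.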
Index Terms:
identification; language identification; on-line documents; word shapes; coded characters; character classes; visual characteristics; word bigrams; word trigrams; linear score value combination; expert system; knowledge acquisition; on-line texts; rules; accuracy; stability; varied parameter settings
Citation:
N. Nobile, S. Bergler, C.Y. Suen, S. Khoury, "Language identification of on-line documents using word shapes," icdar, pp.258, Fourth International Conference Document Analysis and Recognition (ICDAR'97), 1997