Eighth ACIS International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing (SNPD 2007)
A Combining Approach for Chinese Word Segmentation
Haier International Training Center, Qingdao, China
July 30-August 01
ISBN: 0-7695-2909-7
Aiqing Wang, Qingdao Technological University, China
Sen Zhang, Beijing University of Technology, China
In Chinese and many other Asian languages whose writing systems are not based on the ASCII alphabet, words are not delimited by whitespace (spaces, tabs, etc.), so word boundaries must be reconstructed. Further syntactic analysis depends on the output of word segmentation. Ambiguity and unregistered words are the most important problems in Chinese word segmentation. In this paper we analyze the causes of ambiguity and present a one-pass scan method for detecting and correcting ambiguous cases. To deal with unregistered words and special words (such as names), we propose a combination method that can recognize new words and thereby increase accuracy. In our implementation, we use binary (bisection) search to look up words in a large dictionary (more than 40,000 entries); the average search cost for a word is fewer than 16 operations, so the speed is satisfactory when the system is embedded in Chinese understanding systems or Chinese speech processing systems.
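The dictionary lookup cost quoted in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `lookup`, the probe counter, and the synthetic 40,000-entry word list are assumptions made purely for illustration. It shows that a plain binary search over a sorted list of 40,000 entries needs at most 16 probes per word, since 2^15 < 40,000 < 2^16.

```python
from typing import List, Tuple


def lookup(sorted_words: List[str], word: str) -> Tuple[bool, int]:
    """Binary search over a sorted word list.

    Returns (found, probes), where probes is the number of
    dictionary entries inspected. Hypothetical helper, not taken
    from the paper.
    """
    lo, hi = 0, len(sorted_words) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        entry = sorted_words[mid]
        if entry == word:
            return True, probes
        if entry < word:
            lo = mid + 1   # target sorts after the middle entry
        else:
            hi = mid - 1   # target sorts before the middle entry
    return False, probes


# Synthetic stand-in for a 40,000-entry dictionary (placeholder strings,
# not real Chinese words); zero-padding keeps them in sorted order.
dictionary = sorted(f"w{i:05d}" for i in range(40000))

# Worst-case number of probes over every entry in the dictionary.
worst = max(lookup(dictionary, w)[1] for w in dictionary)
found, probes = lookup(dictionary, "w12345")
missing, missing_probes = lookup(dictionary, "zzz")
```

In practice the same lookup could be built on Python's standard `bisect` module; the explicit loop is used here only so the probes can be counted.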
Citation:
Aiqing Wang, Sen Zhang, "A Combining Approach for Chinese Word Segmentation," snpd, vol. 3, pp. 738-743, Eighth ACIS International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing (SNPD 2007), 2007