Computer Science and Information Engineering, World Congress on (2009)
Los Angeles, California USA
Mar. 31, 2009 to Apr. 2, 2009
ISBN: 978-0-7695-3507-4
pp: 458-461
This paper presents an efficient text classification algorithm for repeated text information on e-commerce sites; it automatically classifies and sorts similar strings. The algorithm greatly increases the efficiency and accuracy of information auditing. Tests show that the algorithm is very efficient when the number of information items is between 100 and 1000: comparing 1000 text items (strings) can be kept within two seconds. When the number of items exceeds 1000, the computation time increases significantly. The precision can be tuned by adjusting the relevant parameters of the algorithm, such as the number of identical substrings required in the comparison results and the length of the split strings. For very short items, such as those with fewer than 10 words, the algorithm can be combined with the Levenshtein algorithm to improve text-search flexibility.
text classification algorithm, e-commerce, text similarity
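
The abstract names two tunable parameters (the length of the split strings and the number of identical substrings required) and a Levenshtein fallback for items shorter than 10 words, but it does not give the algorithm itself. The Python sketch below is one possible reading of that description, not the authors' implementation. The function names, the default parameter values, the edit-distance threshold max_edits, and the greedy grouping strategy are all assumptions made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def chunks(text: str, size: int) -> set:
    """All overlapping substrings of length `size` (the 'split strings')."""
    return {text[i:i + size] for i in range(len(text) - size + 1)}


def similar(a: str, b: str, chunk_len: int = 4, min_shared: int = 3,
            short_words: int = 10, max_edits: int = 2) -> bool:
    """Decide whether two items look like repeats of each other.

    All default values are hypothetical; the abstract only says such
    parameters exist and can be adjusted to tune precision.
    """
    # Short items: fall back to Levenshtein distance (assumed threshold).
    if min(len(a.split()), len(b.split())) < short_words:
        return levenshtein(a, b) <= max_edits
    # Longer items: count identical substrings of the chosen length.
    return len(chunks(a, chunk_len) & chunks(b, chunk_len)) >= min_shared


def group_similar(texts, **params):
    """Greedy grouping: each item joins the first group whose
    representative (first member) it resembles, else starts a new group."""
    groups = []
    for t in texts:
        for g in groups:
            if similar(g[0], t, **params):
                g.append(t)
                break
        else:
            groups.append([t])
    return groups
```

In this sketch, raising min_shared or chunk_len makes matching stricter, which mirrors the abstract's remark that precision depends on these parameters. The grouping loop compares each item against every existing group, so the worst case is quadratic in the number of items; this is consistent with, though not necessarily the cause of, the reported slowdown above 1000 items.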
