2012 International Conference on Cloud and Service Computing (2013)
Beijing, China
Nov. 4, 2013 to Nov. 6, 2013
pp: 134-139
Analyzing and mining the massive data recorded in microblogs to discover the characteristics and rules of individual, group, and interactive behaviors is now a research hotspot in massive data mining and behavioral analysis. However, existing research generally neglects the influence of social attributes, such as a user's occupation, on the user's behavior and social relations. To address this issue, the paper proposes CMPK (Classification Method based on Professional lexicon and K-nearest neighbor algorithm), a high-accuracy microblog user classification method for professional analysis. The method combines a vector space model with a professional lexicon and the KNN (K-Nearest Neighbor) classification algorithm to determine the industry a microblog user belongs to from the information the user has posted online. Experiments show that CMPK achieves an accuracy of nearly 90%.
K-Nearest Neighbor algorithm, text mining, user classification, vector space model
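The combination named in the abstract (a vector space model weighted by a professional lexicon, followed by KNN) can be illustrated with a minimal sketch. This is not the authors' implementation: the lexicon entries, the weight of 2.0, the cosine similarity measure, the majority vote, and the toy training posts are all assumptions made for illustration only.

```python
import math
from collections import Counter

# Hypothetical professional lexicon: term -> weight boost (illustrative values).
LEXICON = {"server": 2.0, "compiler": 2.0, "patient": 2.0, "diagnosis": 2.0}

def vectorize(text, lexicon=LEXICON):
    """Build a sparse term-frequency vector; lexicon terms are up-weighted."""
    counts = Counter(text.lower().split())
    return {term: n * lexicon.get(term, 1.0) for term, n in counts.items()}

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_classify(text, labeled, k=3):
    """Return the majority label among the k most similar labeled posts."""
    query = vectorize(text)
    ranked = sorted(labeled,
                    key=lambda item: cosine(query, vectorize(item[0])),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy labeled posts (invented for this sketch, not data from the paper).
TRAIN = [
    ("debugging the compiler on our build server", "IT"),
    ("deployed a new server cluster today", "IT"),
    ("the compiler warning was a real bug", "IT"),
    ("reviewed the diagnosis with my patient", "medicine"),
    ("each patient needs a careful diagnosis", "medicine"),
    ("long shift, another patient admitted", "medicine"),
]

if __name__ == "__main__":
    print(knn_classify("the server crashed during compiler tests", TRAIN))
    print(knn_classify("my patient got a new diagnosis", TRAIN))
```

In this sketch the lexicon only rescales term frequencies, so posts that share profession-specific vocabulary end up closer under cosine similarity. How the paper actually builds and applies its lexicon is not stated in the abstract.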
