2011 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies
Classification Based on Specific Vocabulary
Lyon, France
August 22-August 27
ISBN: 978-0-7695-4513-4
Assuming a binomial distribution for word occurrence, we propose computing a standardized Z score to define the specific vocabulary of a subset compared to that of the entire corpus. This approach is applied to weight terms characterizing a document (or a sample of texts). We then show how these Z score values can be used to derive an efficient categorization scheme. To evaluate this proposal, we categorize speeches given by B. Obama as either electoral or presidential. Under 10-fold cross-validation, the results tend to show that the suggested classification scheme performs better than both a Support Vector Machine scheme and a Naive Bayes classifier.
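As an illustration of the Z score idea summarized in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes the usual standardization of a binomial count: a term's observed frequency in the subset is compared with the frequency expected from its relative frequency in the whole corpus. The function name `z_scores` and the token-list inputs are hypothetical choices made for this example.

```python
import math
from collections import Counter


def z_scores(subset_tokens, corpus_tokens):
    """Standardized Z score of each corpus term within a subset.

    Assumption (not taken from the abstract): each of the n' tokens in the
    subset is a Bernoulli trial with success probability p = tf_corpus / n,
    so the subset count has mean n' * p and variance n' * p * (1 - p).
    """
    corpus_counts = Counter(corpus_tokens)
    subset_counts = Counter(subset_tokens)
    n_corpus = len(corpus_tokens)
    n_subset = len(subset_tokens)

    scores = {}
    for term, tf_corpus in corpus_counts.items():
        p = tf_corpus / n_corpus            # estimated occurrence probability
        expected = n_subset * p             # binomial mean in the subset
        variance = n_subset * p * (1 - p)   # binomial variance in the subset
        if variance == 0:
            continue                        # degenerate case, score undefined
        scores[term] = (subset_counts[term] - expected) / math.sqrt(variance)
    return scores


# Toy data: the subset is the first half of a four-token corpus.
corpus = ["a", "a", "b", "b"]
subset = ["a", "a"]
scores = z_scores(subset, corpus)
```

Under these assumptions, a large positive score marks a term as over-used in the subset (a candidate for its specific vocabulary), and a large negative score marks it as under-used. Any cut-off used to decide membership would be a further assumption, since the abstract does not state one.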
Index Terms:
Text Categorization, Machine Learning, Lexical Analysis, Political Discourse, Natural Language Processing
Citation:
Jacques Savoy, Olena Zubaryeva, "Classification Based on Specific Vocabulary," wi-iat, vol. 1, pp. 120-123, 2011 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies, 2011