Lyon, France
Aug. 22, 2011 to Aug. 27, 2011
ISBN: 978-0-7695-4513-4
pp: 120-123
ABSTRACT
Assuming a binomial distribution for word occurrences, we propose computing a standardized Z score to define the specific vocabulary of a subset of texts relative to the entire corpus. This approach is used to weight the terms that characterize a document (or a sample of texts). We then show how these Z scores can be used to derive an efficient categorization scheme. To evaluate this proposal, we classify speeches given by B. Obama as either electoral or presidential. The results, obtained with 10-fold cross-validation, suggest that the proposed classification scheme outperforms both a Support Vector Machine and a Naive Bayes classifier.
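The Z score mentioned in the abstract can be sketched as follows. This is a minimal illustration that assumes the usual binomial standardization. Take a term whose corpus-wide relative frequency is p and a subset of n tokens in which the term occurs a times. The term's score is then Z = (a − n·p) / sqrt(n·p·(1 − p)). Large positive values mark over-used terms and large negative values mark under-used terms. The abstract does not spell out the categorization rule, so `classify` applies a hypothetical rule instead: it sums a document's Z scores under each category and picks the highest total. The function names and toy token lists are illustrative assumptions, not taken from the paper.

```python
import math
from collections import Counter


def z_scores(subset_tokens, corpus_tokens):
    """Binomial Z score of every corpus term within a subset (e.g. one category).

    Assumes the subset is drawn from the corpus, so each term's corpus-wide
    relative frequency p serves as its binomial success probability.
    """
    sub_counts = Counter(subset_tokens)
    corpus_counts = Counter(corpus_tokens)
    n_sub = len(subset_tokens)
    n_corpus = len(corpus_tokens)
    scores = {}
    for term, total in corpus_counts.items():
        p = total / n_corpus                      # estimated P(term) over the corpus
        expected = n_sub * p                      # mean of Binomial(n_sub, p)
        sd = math.sqrt(n_sub * p * (1 - p))       # std. dev. of Binomial(n_sub, p)
        if sd > 0:                                # sd == 0 only if p == 1 or n_sub == 0
            # Counter returns 0 for terms absent from the subset -> negative Z
            scores[term] = (sub_counts[term] - expected) / sd
    return scores


def classify(doc_tokens, category_scores):
    """Hypothetical rule: choose the category whose Z scores sum highest over the document."""
    def score(label):
        zs = category_scores[label]
        # Terms never seen in the corpus contribute 0
        return sum(zs.get(t, 0.0) for t in doc_tokens)
    return max(category_scores, key=score)


if __name__ == "__main__":
    # Toy token lists standing in for two hypothetical speech categories
    electoral = "change hope vote change campaign vote".split()
    presidential = "nation economy congress nation law economy".split()
    corpus = electoral + presidential
    category_scores = {
        "electoral": z_scores(electoral, corpus),
        "presidential": z_scores(presidential, corpus),
    }
    print(round(category_scores["electoral"]["change"], 3))  # 1.095
    print(classify("we need change and a vote".split(), category_scores))  # electoral
```

A threshold variant is another plausible design. Under that variant, only terms whose |Z| exceeds some cut-off count as "specific vocabulary". Summing all scores keeps the sketch short, but the sketch should not be read as the paper's actual procedure.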
INDEX TERMS
Text Categorization, Machine Learning, Lexical Analysis, Political Discourse, Natural Language Processing
CITATION
Jacques Savoy, "Classification Based on Specific Vocabulary", Web Intelligence and Intelligent Agent Technology, IEEE/WIC/ACM International Conference on (WI-IAT), 2011, pp. 120-123, doi:10.1109/WI-IAT.2011.19