2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS) (2017)
Atlanta, Georgia, USA
June 5, 2017 to June 8, 2017
ISSN: 1063-6927
ISBN: 978-1-5386-1792-2
pp: 2333-2337
ABSTRACT
In this paper, we propose and evaluate three approaches for automated classification of texts in over 60 languages without the need for a manually annotated dataset in those languages. All approaches are based on the randomized Explicit Semantic Analysis method and use multilingual Wikipedia articles as their knowledge repository. We evaluate the proposed approaches by classifying a Twitter dataset in English and Portuguese into relevant and irrelevant items with respect to landslides as a natural disaster; the highest F1-score achieved is 0.93. These approaches can be used in applications that require multilingual classification, including multilingual disaster reporting from social media to improve coverage and increase confidence. As an illustration, we present a demonstration that combines data from physical sensors and social networks to detect landslide events reported in English and Portuguese.
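To make the idea in the abstract concrete, the following is a minimal sketch of concept-based relevance classification in the spirit of plain Explicit Semantic Analysis, followed by an F1 computation. It is not the authors' randomized variant, whose details the abstract does not give. The toy `REPOSITORY`, the concept names, and the functions `build_index`, `esa_vector`, `is_relevant`, and `f1_score` are all hypothetical stand-ins; a real system would draw its concept articles from Wikipedia in each target language.

```python
import math
import re
from collections import Counter

# Hypothetical toy knowledge repository: language -> concept -> article text.
# The same concept names appear in each language, standing in for linked
# Wikipedia articles about the same topic.
REPOSITORY = {
    "en": {
        "Landslide": "landslide mudslide slope debris rock soil rain",
        "Election": "election vote landslide victory candidate party",
        "Music": "song album band landslide music",
    },
    "pt": {
        "Landslide": "deslizamento terra encosta lama chuva rocha",
        "Election": "eleição voto candidato partido vitória",
        "Music": "música álbum banda canção",
    },
}


def tokenize(text):
    """Lowercase the text and split it into word tokens (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


def build_index(concepts):
    """Map each term to {concept: TF-IDF weight} over the concept articles."""
    n = len(concepts)
    tf = {name: Counter(tokenize(body)) for name, body in concepts.items()}
    df = Counter(term for counts in tf.values() for term in counts)
    index = {}
    for name, counts in tf.items():
        for term, count in counts.items():
            idf = math.log((1 + n) / (1 + df[term])) + 1  # smoothed IDF
            index.setdefault(term, {})[name] = count * idf
    return index


def esa_vector(text, index):
    """Represent a text as summed concept weights of its known terms."""
    vec = Counter()
    for term in tokenize(text):
        for name, weight in index.get(term, {}).items():
            vec[name] += weight
    return vec


def is_relevant(text, index, target="Landslide"):
    """Label a text relevant when the target concept scores highest."""
    vec = esa_vector(text, index)
    return bool(vec) and max(vec, key=vec.get) == target


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for boolean labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up example texts with hand-assigned labels, for illustration only.
index_en = build_index(REPOSITORY["en"])
texts = [
    "Heavy rain triggered a mudslide on the slope",
    "Landslide victory in the election",
    "New album from the band",
]
labels = [True, False, False]
predictions = [is_relevant(t, index_en) for t in texts]
print(predictions, f1_score(labels, predictions))
```

No labeled training data in the target language is needed in this sketch: switching to Portuguese only requires building the index from `REPOSITORY["pt"]` and keeping the same target concept.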
INDEX TERMS
Terrain factors, Encyclopedias, Electronic publishing, Internet, Training, Social network services
CITATION

A. Musaev and C. Pu, "Towards Multilingual Automated Classification Systems," 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS), Atlanta, Georgia, USA, 2017, pp. 2333-2337.
doi:10.1109/ICDCS.2017.208