2018 IEEE 30th International Conference on Tools with Artificial Intelligence (ICTAI) (2018)
Nov 5, 2018 to Nov 7, 2018
The demand for mining massive short-text data from the Internet has driven research on topic models. Many schemes attempt to address the sparsity problem of short texts, mainly through data aggregation or model improvement. Among them, the Biterm Topic Model changes how topics are modeled by working on document-level biterms, and has proven both creative and effective. However, it may ignore word pairs that are semantically similar but rarely co-occur, which this paper calls global biterms. Inspired by the successful application of word embeddings in GPU-DMM, we exploit word embeddings to extract semantically similar word pairs from the whole corpus to help discover better topics. We call this model GloSS; it takes advantage of both the biterm-based way of modeling topics and word embeddings. Experimental results on two open-source, real-world datasets show that GloSS outperforms state-of-the-art topic models for short texts.
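To make the distinction between document-level and global biterms concrete, the following is a minimal sketch and not the paper's implementation. It assumes a biterm is an unordered pair of distinct words, uses cosine similarity with an arbitrary threshold of 0.9, and simplifies "rarely co-occur" to "never co-occur in the same document". The function names, the threshold, and the toy 2-dimensional vectors are all hypothetical stand-ins for pretrained word embeddings.

```python
import math
from itertools import combinations


def document_biterms(doc):
    """Unordered pairs of distinct words that co-occur in one short document."""
    words = sorted(set(doc))
    return set(combinations(words, 2))


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def global_biterms(embeddings, corpus, threshold=0.9):
    """Word pairs close in embedding space that never share a document.

    Assumption: 'rarely co-occurring' is simplified here to 'never co-occurring',
    and the threshold value is illustrative only.
    """
    local = set()
    for doc in corpus:
        local |= document_biterms(doc)
    vocab = sorted({w for doc in corpus for w in doc if w in embeddings})
    return {
        (a, b)
        for a, b in combinations(vocab, 2)
        if (a, b) not in local and cosine(embeddings[a], embeddings[b]) >= threshold
    }


# Toy data (hypothetical): two short documents and hand-made 2-d "embeddings".
corpus = [["cheap", "flight"], ["airfare", "deal"]]
embeddings = {
    "cheap": [1.0, 0.1],
    "deal": [0.9, 0.2],
    "flight": [0.1, 1.0],
    "airfare": [0.2, 0.9],
}
pairs = global_biterms(embeddings, corpus)
```

In this toy corpus, "airfare" and "flight" never appear in the same document, so a purely document-level biterm set misses the pair, while the embedding-based pass recovers it.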
data mining, text analysis
H. Lu, G. Ge, Y. Li, C. Wang and J. Xie, "Exploiting Global Semantic Similarity Biterms for Short-Text Topic Discovery," 2018 IEEE 30th International Conference on Tools with Artificial Intelligence (ICTAI), Volos, Greece, 2019, pp. 975-982.