2014 Brazilian Conference on Intelligent Systems (BRACIS) (2014)
Sao Paulo, Brazil
Oct. 18, 2014 to Oct. 22, 2014
Textual data holds a large amount of information, but because it is unstructured, algorithms that can process it and transform it into a format useful for solving real-world problems are desirable. Tasks such as organizing and exploring large document collections can benefit from such methods. This work proposes an incremental, online, probabilistic clustering algorithm for textual data, based on a mixture of Multinomial distributions. The main advantage of the model is that it needs only a single pass over the training data to learn from it. As more texts are processed, the model improves its structure to better represent the data stream.
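To make the idea of single-pass learning with a mixture of Multinomial distributions concrete, the following is a minimal illustrative sketch, not the authors' algorithm. The abstract does not specify the update rules, so everything below is an assumption: the class name OnlineMultinomialMixture, the partial_fit method, the novelty test on the mean per-word log-likelihood, its threshold value, and the Laplace smoothing are all hypothetical choices made for illustration. Each document is seen once, either opens a new cluster or softly updates the existing ones, and is then discarded.

```python
import math


class OnlineMultinomialMixture:
    """Illustrative single-pass mixture of Multinomials (assumed design, not the paper's)."""

    def __init__(self, vocab_size, novelty_threshold=-1.5, smoothing=1.0):
        self.vocab_size = vocab_size
        # Assumption: a document is "novel" when its best mean per-word
        # log-likelihood falls below this value. Tune it per vocabulary size.
        self.novelty_threshold = novelty_threshold
        self.smoothing = smoothing  # Laplace smoothing (assumed)
        self.word_counts = []  # per-cluster soft word counts
        self.doc_counts = []   # per-cluster soft document counts

    def _log_likelihood(self, k, doc):
        # Multinomial log-likelihood of doc under cluster k, dropping the
        # multinomial coefficient, which is identical for every cluster.
        counts = self.word_counts[k]
        total = sum(counts) + self.smoothing * self.vocab_size
        return sum(
            n * math.log((counts[w] + self.smoothing) / total)
            for w, n in doc.items()
        )

    def partial_fit(self, doc):
        """Process one document (dict: word index -> count) exactly once.

        Returns the index of the most responsible cluster.
        """
        length = sum(doc.values())
        if length == 0:
            raise ValueError("document must contain at least one word")
        lls = [self._log_likelihood(k, doc) for k in range(len(self.word_counts))]

        # No cluster explains the document well enough: open a new one.
        if not lls or max(lls) / length < self.novelty_threshold:
            counts = [0.0] * self.vocab_size
            for w, n in doc.items():
                counts[w] += n
            self.word_counts.append(counts)
            self.doc_counts.append(1.0)
            return len(self.word_counts) - 1

        # Posterior responsibilities, computed in log space for stability.
        total_docs = sum(self.doc_counts)
        log_post = [
            math.log(self.doc_counts[k] / total_docs) + lls[k]
            for k in range(len(lls))
        ]
        top = max(log_post)
        weights = [math.exp(lp - top) for lp in log_post]
        norm = sum(weights)
        resp = [w / norm for w in weights]

        # Soft update of the sufficient statistics; the document is not stored.
        for k, r in enumerate(resp):
            self.doc_counts[k] += r
            for w, n in doc.items():
                self.word_counts[k][w] += r * n
        return max(range(len(resp)), key=resp.__getitem__)
```

In this sketch the model keeps only per-cluster count statistics, so memory grows with the number of clusters rather than with the number of documents processed. The number of clusters is not fixed in advance; it depends entirely on the assumed novelty threshold.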
Mathematical model, Vocabulary, Vectors, Clustering algorithms, Data models, Training, Probabilistic logic
T. F. Rodrigues and P. M. Engel, "Probabilistic Clustering and Classification for Textual Data: An Online and Incremental Approach," 2014 Brazilian Conference on Intelligent Systems (BRACIS), Sao Paulo, Brazil, 2014, pp. 288-293.