Issue No. 03 - March (2008 vol. 20)
The goal of on-line event analysis is to detect events and their associated documents in real time from a continuous stream of documents generated by multiple information sources. Existing approaches (e.g., window-based, decay function, and adaptive threshold methods) incorporate the temporal relations of documents into traditional text categorization methods for event analysis. However, these methods suffer from the threshold dependence problem, i.e., their performance is acceptable only for a narrow range of thresholds; thus, it is difficult to designate an appropriate threshold in advance. In this paper, we propose a threshold-resilient algorithm, called Incremental Probabilistic Latent Semantic Indexing (IPLSI), which can capture the storyline development of an event without the threshold dependence problem. The IPLSI algorithm is theoretically sound and more efficient than naïve PLSI approaches. The results of the performance evaluation based on the TDT 4 corpus show that the proposed algorithm reduces the error tradeoff cost of event detection by as much as 14.51% and increases the threshold range for acceptable performance by 300% to 800%.
Web mining, Clustering, Knowledge life cycles, Probabilistic algorithms
T. Chou and M. C. Chen, "Using Incremental PLSI for Threshold-Resilient Online Event Analysis," in IEEE Transactions on Knowledge & Data Engineering, vol. 20, no. 3, pp. 289-299, 2007.