Issue No.01 - Jan. (2014 vol.26)
pp: 120-130
Toshimitsu Takahashi, The University of Tokyo, Tokyo
Ryota Tomioka, The University of Tokyo, Tokyo
Kenji Yamanishi, The University of Tokyo, Tokyo
ABSTRACT
Detection of emerging topics is receiving renewed interest, motivated by the rapid growth of social networks. Conventional term-frequency-based approaches may not be appropriate in this context, because the information exchanged in social-network posts includes not only text but also images, URLs, and videos. We focus on the emergence of topics signaled by the social aspects of these networks. Specifically, we focus on mentions of users, that is, links between users that are generated dynamically (intentionally or unintentionally) through replies, mentions, and retweets. We propose a probability model of the mentioning behavior of a social network user, and propose to detect the emergence of a new topic from the anomalies measured through the model. By aggregating anomaly scores from hundreds of users, we show that we can detect emerging topics based only on the reply/mention relationships in social-network posts. We demonstrate our technique on several real data sets we gathered from Twitter. The experiments show that the proposed mention-anomaly-based approaches can detect new topics at least as early as text-anomaly-based approaches, and in some cases much earlier when the topic is poorly identified by the textual contents of the posts.
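To make the idea of a mention-based anomaly score concrete, the following is a minimal sketch, not the authors' model. It assumes a much simpler stand-in: each user's past mentions are summarized by additive-smoothed frequencies, a post is scored by the negative log-probability of the users it mentions, and scores are summed over fixed-width time windows. The function names, the smoothing constant `alpha`, and the assumed pool size `vocab_size` are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def mention_anomaly(history, mentions, alpha=1.0, vocab_size=1000):
    """Score a post by how surprising its mentions are for this user.

    Hypothetical stand-in model: the probability of mentioning user v is an
    additive-smoothed frequency over the poster's past mentions. The score
    is the negative log-probability summed over all mentions in the post.
    """
    counts = Counter(history)
    total = len(history)
    score = 0.0
    for v in mentions:
        p = (counts[v] + alpha) / (total + alpha * vocab_size)
        score += -math.log(p)
    return score


def aggregate_scores(posts, window):
    """Sum per-post anomaly scores into fixed-width time windows.

    `posts` is an iterable of (timestamp, user, mentions) tuples sorted by
    timestamp. Each post is scored against the poster's earlier mentions
    only, and the history is updated after scoring.
    """
    history = defaultdict(list)
    totals = defaultdict(float)
    for t, user, mentions in posts:
        totals[int(t // window)] += mention_anomaly(history[user], mentions)
        history[user].extend(mentions)
    return dict(totals)
```

Under this sketch, a user who has only ever mentioned one account receives a higher score when a post mentions a previously unseen account, and a window in which many users do so at once stands out in the aggregated series. A change-point or burst detector would then be run on that series; that step is omitted here.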
INDEX TERMS
Social network services, Maximum likelihood estimation, Encoding, Hidden Markov models, Density functional theory, Training, burst detection, Topic detection, anomaly detection, social networks, sequentially discounted normalized maximum-likelihood coding
CITATION
Toshimitsu Takahashi, Ryota Tomioka, Kenji Yamanishi, "Discovering Emerging Topics in Social Streams via Link-Anomaly Detection", IEEE Transactions on Knowledge & Data Engineering, vol.26, no. 1, pp. 120-130, Jan. 2014, doi:10.1109/TKDE.2012.239