Issue No.11 - Nov. (2012 vol.24)
pp: 1963-1976
Chien Chin Chen, National Taiwan University, Taipei
Zhong-Yong Chen, National Taiwan University, Taipei
Chen-Yuan Wu, National Taiwan University, Taipei
ABSTRACT
A topic is usually associated with a specific time, place, and person(s). Topics that involve bipolar or competing viewpoints tend to attract attention and are therefore reported in a large number of documents. Identifying the associations among the important persons mentioned in these topic documents would help readers comprehend topics more easily. In this paper, we propose an unsupervised approach for identifying bipolar person names in a set of topic documents. Specifically, we employ principal component analysis (PCA) to discover bipolar word usage patterns of person names in the documents, and show that the signs of the entries in the principal eigenvector of PCA naturally partition the person names into two bipolar groups. To reduce the effect of data sparseness, we introduce two techniques: the weighted correlation coefficient and off-topic block elimination. We also present a timeline system that shows how the intensity and activeness of the identified bipolar person groups develop over time. Empirical evaluations demonstrate the efficacy of the proposed approach in identifying bipolar person names in topic documents, and the generated timelines provide comprehensive storylines of topics.
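The sign-based partition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a small hypothetical count matrix (rows are text blocks, columns are person names), uses the plain Pearson correlation from NumPy rather than the paper's weighted correlation coefficient, and omits off-topic block elimination. The function name `bipolarize` and the toy names `A1`, `A2`, `B1`, `B2` are invented for illustration.

```python
import numpy as np

def bipolarize(counts, names):
    """Split names into two groups by the sign of the principal eigenvector.

    counts: 2-D array, one row per text block, one column per person name.
    Assumption: plain Pearson correlation, not the weighted variant.
    """
    # Correlation between name columns (rowvar=False treats columns as variables).
    corr = np.corrcoef(counts, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order,
    # so the last column is the eigenvector of the largest eigenvalue.
    _, eigvecs = np.linalg.eigh(corr)
    principal = eigvecs[:, -1]
    # Names whose entries share a sign fall into the same group.
    group_a = {n for n, v in zip(names, principal) if v >= 0}
    group_b = set(names) - group_a
    return group_a, group_b

# Hypothetical toy data: A1/A2 tend to co-occur, as do B1/B2.
names = ["A1", "A2", "B1", "B2"]
counts = np.array([
    [3, 2, 0, 0],
    [4, 3, 1, 0],
    [0, 0, 3, 4],
    [1, 0, 2, 3],
    [5, 4, 0, 1],
    [0, 1, 4, 5],
])
groups = bipolarize(counts, names)
```

The overall sign of an eigenvector is arbitrary, so which group comes first may vary. Only the split itself is meaningful: in this toy data, `A1` and `A2` land in one group and `B1` and `B2` in the other.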
INDEX TERMS
Principal component analysis, Correlation, Symmetric matrices, Hidden Markov models, Web pages, Matrix decomposition, Internet, Bipolar timeline, Topic mining, Sentiment analysis
CITATION
Chien Chin Chen, Zhong-Yong Chen, Chen-Yuan Wu, "An Unsupervised Approach for Person Name Bipolarization Using Principal Component Analysis", IEEE Transactions on Knowledge & Data Engineering, vol.24, no. 11, pp. 1963-1976, Nov. 2012, doi:10.1109/TKDE.2011.177