Issue No.03 - March (2011 vol.23)
pp: 447-462
Stephen France, University of Wisconsin-Milwaukee, Milwaukee, WI
Zhu Zhang, University of Arizona, Tucson, AZ
Ahmed Abbasi, University of Wisconsin-Milwaukee, Milwaukee, WI
ABSTRACT
A major concern when incorporating large sets of diverse n-gram features for sentiment classification is the presence of noisy, irrelevant, and redundant attributes. These concerns can often make it difficult to harness the augmented discriminatory potential of extended feature sets. We propose a rule-based multivariate text feature selection method called Feature Relation Network (FRN) that considers semantic information and also leverages the syntactic relationships between n-gram features. FRN is intended to efficiently enable the inclusion of extended sets of heterogeneous n-gram features for enhanced sentiment classification. Experiments were conducted on three online review testbeds in comparison with methods used in prior sentiment classification research. FRN outperformed the comparison univariate, multivariate, and hybrid feature selection methods; it was able to select attributes resulting in significantly better classification accuracy irrespective of the feature subset sizes. Furthermore, by incorporating syntactic information about n-gram relations, FRN is able to select features in a more computationally efficient manner than many multivariate and hybrid techniques.
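The abstract does not spell out FRN's rules, so the following is not the authors' method. It is only a minimal sketch of the general idea of using containment relations between n-grams to drop redundant features. Everything specific here is an assumption made for illustration: information gain as the relevance score, the `margin` threshold, whitespace tokenization, and all function names.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Return the set of word n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(feature, docs, labels):
    """Entropy reduction from splitting documents on presence of `feature`."""
    present = [y for d, y in zip(docs, labels) if feature in d]
    absent = [y for d, y in zip(docs, labels) if feature not in d]
    n = len(labels)
    return (entropy(labels)
            - (len(present) / n) * entropy(present)
            - (len(absent) / n) * entropy(absent))

def contains(longer, shorter):
    """True if `shorter` occurs as a contiguous sub-sequence of `longer`."""
    k = len(shorter)
    return any(longer[i:i + k] == shorter
               for i in range(len(longer) - k + 1))

def select_features(texts, labels, max_n=2, margin=0.05):
    """Keep informative n-grams; drop a longer n-gram when a kept shorter
    n-gram it contains scores at least as well (within `margin`).
    Assumption: this subsumption rule only illustrates the idea; it is
    not the rule set used by FRN."""
    docs = [set().union(*(ngrams(t.lower().split(), n)
                          for n in range(1, max_n + 1)))
            for t in texts]
    vocab = set().union(*docs)
    scores = {f: information_gain(f, docs, labels) for f in vocab}
    kept = []
    # Visit shorter n-grams first so longer ones are compared against them.
    for f in sorted(vocab, key=lambda f: (len(f), -scores[f], f)):
        if scores[f] <= 1e-9:
            continue  # uninformative feature
        subsumed = any(contains(f, g) and scores[f] < scores[g] + margin
                       for g in kept if len(g) < len(f))
        if not subsumed:
            kept.append(f)
    return kept, scores

if __name__ == "__main__":
    texts = ["good film", "good plot", "bad film", "bad plot"]
    labels = [1, 1, 0, 0]
    kept, _ = select_features(texts, labels)
    print(kept)
```

In this toy corpus the bigrams add nothing beyond the unigrams they contain, so only the informative unigrams survive. Because each longer n-gram is compared only with the shorter n-grams it contains, rather than with every other feature, a filter of this kind avoids most of the pairwise comparisons a general multivariate method would make.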
INDEX TERMS
Natural language processing, machine learning, text mining, subspace selection, affective computing.
CITATION
Stephen France, Zhu Zhang, Ahmed Abbasi, "Selecting Attributes for Sentiment Classification Using Feature Relation Networks", IEEE Transactions on Knowledge & Data Engineering, vol. 23, no. 3, pp. 447-462, March 2011, doi:10.1109/TKDE.2010.110