A Statistical Language Modeling Approach to Online Deception Detection
August 2008 (vol. 20 no. 8)
pp. 1077-1081

Abstract: Online deception is disrupting our daily lives, organizational processes, and even national security. Existing approaches to online deception detection follow a traditional paradigm that uses a set of cues as antecedents for deception detection, and they may be hindered by ineffective cue identification. Motivated by the strength of statistical language models (SLMs) in capturing the dependency of words in text without explicit feature extraction, we developed SLMs to detect online deception. We also addressed the data sparsity problem, both in building SLMs in general and in deception detection in particular, using smoothing and vocabulary pruning techniques. The developed SLMs were evaluated empirically with diverse datasets. The results showed that the proposed SLM approach to deception detection outperformed a state-of-the-art text categorization method as well as traditional feature-based methods.
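The general idea of classifying text with per-class language models can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which n-gram order or smoothing method was used, so the bigram order, the add-k smoothing, the equal class priors, the names `build_vocab`, `BigramLM`, and `classify`, and the toy sentences are all assumptions made for illustration.

```python
import math
from collections import Counter

UNK, BOS = "<unk>", "<s>"  # hypothetical marker tokens


def build_vocab(texts, min_count=1):
    """Vocabulary pruning: keep words seen at least min_count times."""
    counts = Counter(w for t in texts for w in t.lower().split())
    return {w for w, c in counts.items() if c >= min_count} | {UNK}


class BigramLM:
    """Bigram language model with add-k smoothing (an assumed, simple choice)."""

    def __init__(self, texts, vocab, k=0.5):
        self.vocab, self.k = vocab, k
        self.bigrams, self.contexts = Counter(), Counter()
        for text in texts:
            seq = self._tokens(text)
            for prev, cur in zip(seq, seq[1:]):
                self.bigrams[(prev, cur)] += 1
                self.contexts[prev] += 1

    def _tokens(self, text):
        # Words outside the pruned vocabulary are mapped to UNK.
        words = text.lower().split()
        return [BOS] + [w if w in self.vocab else UNK for w in words]

    def log_prob(self, text):
        """Smoothed log-probability of the text under this model."""
        seq = self._tokens(text)
        v = len(self.vocab)
        total = 0.0
        for prev, cur in zip(seq, seq[1:]):
            num = self.bigrams[(prev, cur)] + self.k
            den = self.contexts[prev] + self.k * v
            total += math.log(num / den)
        return total


def classify(text, models):
    """Pick the class whose model assigns the text the highest likelihood
    (equal class priors are assumed)."""
    return max(models, key=lambda label: models[label].log_prob(text))


# Toy training data, invented for illustration only.
truthful = ["i went to the store and bought milk",
            "we met at the office on monday"]
deceptive = ["you have won a prize click here now",
             "verify your account now to claim the prize"]

vocab = build_vocab(truthful + deceptive, min_count=1)
models = {"truthful": BigramLM(truthful, vocab),
          "deceptive": BigramLM(deceptive, vocab)}

print(classify("click here to claim your prize", models))
```

Both class models share one vocabulary so that their likelihoods are comparable. Raising `min_count` shrinks the vocabulary, and a larger `k` moves more probability mass to unseen bigrams; both are simple ways to cope with sparse training data.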

Index Terms:
Machine learning, classification, language models, text mining, knowledge management applications, security
Lina Zhou, Yongmei Shi, Dongsong Zhang, "A Statistical Language Modeling Approach to Online Deception Detection," IEEE Transactions on Knowledge and Data Engineering, vol. 20, no. 8, pp. 1077-1081, Aug. 2008, doi:10.1109/TKDE.2007.190624