Issue No.06 - June (2013 vol.62)
pp: 1193-1206
Silvio Cesare, Deakin University, Victoria
Yang Xiang, Deakin University, Victoria
Wanlei Zhou, Deakin University, Victoria
Signature-based detection systems are a widely used response to the pervasive problem of malware. Identifying malware variants is essential to such a system, and it is made possible by identifying invariant characteristics in related samples. To classify packed and polymorphic malware, this paper proposes a novel system, named Malwise, that uses a fast application-level emulator to reverse the code packing transformation and two flowgraph matching algorithms to perform classification. An exact flowgraph matching algorithm uses string-based signatures and is able to detect malware with near real-time performance. Additionally, a more effective approximate flowgraph matching algorithm is proposed that uses the decompilation technique of structuring to generate string-based signatures amenable to the string edit distance. We use real and synthetic malware to demonstrate the effectiveness and efficiency of Malwise. Using more than 15,000 real malware samples collected from honeypots, we validate effectiveness by showing that there is an 88 percent probability that new malware is detected as a variant of existing malware. Efficiency is demonstrated on a smaller sample set, in which 86 percent of the samples can be classified in under 1.3 seconds.
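The approximate matching step described above compares string-based signatures using the string edit distance. The following is a minimal sketch of that idea only, not the Malwise implementation: the signature strings, the similarity ratio, and the 0.9 threshold are all assumptions chosen for illustration, and the abstract does not specify any of them.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic programme."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed similarity ratio: 1 minus distance over the longer length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def is_variant(signature: str, known: list[str], threshold: float = 0.9) -> bool:
    """Flag a signature as a variant if it is close to any known signature.

    The threshold is a hypothetical value, not one taken from the paper.
    """
    return any(similarity(signature, k) >= threshold for k in known)


if __name__ == "__main__":
    # Hypothetical structured-control-flow signatures, invented for this sketch.
    known_signatures = ["BW|{B}BI{B}E{B}R"]
    candidate = "BW|{B}BI{B}E{BB}R"
    print(is_variant(candidate, known_signatures))
```

A linear scan over the known signatures is the simplest design; a system aiming for near real-time classification over a large database would need an index that supports approximate string search rather than pairwise comparison against every entry.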
Malware, Flow graphs, Entropy, Databases, Emulation, Classification algorithms, Approximation algorithms, unpacking, Computer security, malware, control flow, structural classification, structured control flow
Silvio Cesare, Yang Xiang, Wanlei Zhou, "Malwise—An Effective and Efficient Classification System for Packed and Polymorphic Malware", IEEE Transactions on Computers, vol.62, no. 6, pp. 1193-1206, June 2013, doi:10.1109/TC.2012.65