Detecting Phishing Web Pages with Visual Similarity Assessment Based on Earth Mover's Distance (EMD)
October-December 2006 (vol. 3, no. 4)
pp. 301-311
This paper proposes an effective approach to phishing Web page detection that uses Earth Mover's Distance (EMD) to measure the visual similarity of Web pages. We first convert the involved Web pages into low-resolution images and then use color and coordinate features to represent the image signatures. We use EMD to calculate the distances between the signatures of the Web page images. We train an EMD threshold vector for classifying a Web page as either phishing or normal. Large-scale experiments with 10,281 suspected Web pages show high classification precision, high phishing recall, and time performance suitable for online enterprise solutions. We also compare our method with two others to demonstrate its advantages. Finally, we have built a real system that is already in use online and has caught many real phishing cases.
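The abstract names the pipeline (low-resolution image, color and coordinate signature, EMD, threshold) but not its parameters. The Python sketch below illustrates one plausible reading under explicit assumptions; it is not the authors' implementation. Assumed: the page has already been rendered to a small grid of RGB tuples; colors are quantized to 4 levels per channel; each feature is a quantized color plus the centroid of its pixels, all scaled to [0, 1], weighted by pixel fraction; the ground distance is Euclidean over the whole feature vector; and EMD is solved as a min-cost flow (successive shortest paths). The names `make_signature`, `emd`, and `is_suspected_phishing` and any threshold value are hypothetical. The paper trains a threshold vector, whereas a single scalar placeholder is used here.

```python
import math
from collections import defaultdict
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Signature = List[Tuple[Tuple[float, ...], float]]  # (feature vector, weight)


def make_signature(image: List[List[Pixel]], levels: int = 4) -> Signature:
    """Colour + coordinate signature of a small RGB image (assumed scheme).

    Each pixel colour is quantised to `levels` bins per channel. One feature
    per quantised colour: (r, g, b, centroid_x, centroid_y), scaled to [0, 1];
    its weight is the fraction of pixels having that colour.
    """
    h, w = len(image), len(image[0])
    step = 256 // levels
    count = defaultdict(int)
    sum_x = defaultdict(float)
    sum_y = defaultdict(float)
    for y, row in enumerate(image):
        for x, (r, g, b) in enumerate(row):
            key = tuple(min(c // step, levels - 1) for c in (r, g, b))
            count[key] += 1
            sum_x[key] += x / max(w - 1, 1)
            sum_y[key] += y / max(h - 1, 1)
    scale = max(levels - 1, 1)
    return [
        (tuple(k / scale for k in key) + (sum_x[key] / n, sum_y[key] / n), n / (h * w))
        for key, n in count.items()
    ]


def emd(sig1: Signature, sig2: Signature) -> float:
    """EMD: the transportation problem solved as a min-cost flow.

    Ground distance is Euclidean over the full feature vector (an assumption).
    Returns total transport cost divided by total flow.
    """
    n, m = len(sig1), len(sig2)
    size = n + m + 2
    src, snk = 0, size - 1
    graph: List[list] = [[] for _ in range(size)]  # edge = [to, cap, cost, rev]

    def add_edge(u: int, v: int, cap: float, cost: float) -> None:
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0.0, -cost, len(graph[u]) - 1])  # residual edge

    for i, (_, wt) in enumerate(sig1):
        add_edge(src, 1 + i, wt, 0.0)  # supply = weight of feature i
    for j, (_, wt) in enumerate(sig2):
        add_edge(1 + n + j, snk, wt, 0.0)  # demand = weight of feature j
    for i, (f1, _) in enumerate(sig1):
        for j, (f2, _) in enumerate(sig2):
            add_edge(1 + i, 1 + n + j, math.inf, math.dist(f1, f2))

    eps, flow, cost = 1e-12, 0.0, 0.0
    while True:
        # Bellman-Ford: shortest augmenting path (residual costs may be negative).
        dist = [math.inf] * size
        prev: list = [None] * size
        dist[src] = 0.0
        for _ in range(size - 1):
            changed = False
            for u in range(size):
                if dist[u] == math.inf:
                    continue
                for k, (v, cap, c, _) in enumerate(graph[u]):
                    if cap > eps and dist[u] + c < dist[v] - eps:
                        dist[v], prev[v], changed = dist[u] + c, (u, k), True
            if not changed:
                break
        if dist[snk] == math.inf:
            break  # no remaining supply can reach any remaining demand
        path, v = [], snk
        while v != src:
            u, k = prev[v]
            path.append((u, k))
            v = u
        push = min(graph[u][k][1] for u, k in path)  # bottleneck capacity
        for u, k in path:
            graph[u][k][1] -= push
            v, rev = graph[u][k][0], graph[u][k][3]
            graph[v][rev][1] += push
        flow += push
        cost += push * dist[snk]
    return cost / flow if flow > 0 else 0.0


def is_suspected_phishing(distance: float, threshold: float) -> bool:
    """Hypothetical rule: flag a page visually closer than `threshold`."""
    return distance < threshold
```

For example, `emd(make_signature(page_img), make_signature(protected_img))` yields 0.0 for identical images and grows as colors or their positions diverge. Bellman-Ford is used because residual edges carry negative costs; that is adequate for the few features a low-resolution image produces, while a production system would more likely use a dedicated transportation or network-simplex solver.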

Index Terms:
Antiphishing, visual assessment, Earth Mover's Distance.
Anthony Y. Fu, Liu Wenyin, Xiaotie Deng, "Detecting Phishing Web Pages with Visual Similarity Assessment Based on Earth Mover's Distance (EMD)," IEEE Transactions on Dependable and Secure Computing, vol. 3, no. 4, pp. 301-311, Oct.-Dec. 2006, doi:10.1109/TDSC.2006.50