Issue No. 06 - June 2008 (vol. 20)
pp. 752-767
ABSTRACT
The data mining and network management communities have shown significant interest in improving existing techniques for clustering multivariate network traffic flow records so that underlying traffic patterns can be inferred quickly. In this paper, we investigate the use of clustering techniques to efficiently identify interesting traffic patterns in network traffic data. We develop a framework that handles mixed-type attributes, including numerical, categorical, and hierarchical attributes, for a one-pass hierarchical clustering algorithm. We demonstrate the improved accuracy and efficiency of our approach in comparison to previous work on clustering network traffic.
INDEX TERMS
Traffic analysis, Network management, Network monitoring, Clustering, classification, and association rules
CITATION
Abdun Naser Mahmood, Christopher Leckie, Parampalli Udaya, "An Efficient Clustering Scheme to Exploit Hierarchical Data in Network Traffic Analysis", IEEE Transactions on Knowledge & Data Engineering, vol.20, no. 6, pp. 752-767, June 2008, doi:10.1109/TKDE.2007.190725