Issue No.02 - February (2008 vol.20)
pp: 202-215
ABSTRACT
In this paper, we present a complete framework and findings in mining web usage patterns from Web log files of a real website that has all the challenging aspects of real life web usage mining, including evolving user profiles and external data describing an ontology of the web content. Even though the website under study is part of a non-profit organization that does not "sell" any products, it was crucial to understand "who" the users were, "what" they looked at, and "how their interests changed with time", all of which are important questions in Customer Relationship Management (CRM). Hence, we present an approach to discover and track evolving user profiles. We also describe how to enrich the discovered user profiles with explicit information need that is inferred from search queries extracted from Web log data. Profiles are also enriched with other domain specific information facets that give a panoramic view of the discovered mass usage modes. An objective validation strategy is also used to assess the quality of the mined profiles, and in particular, their adaptability in the face of evolving user behavior.
INDEX TERMS
User profiles and alert services, Data mining, Web mining, Personalization
CITATION
Olfa Nasraoui, Maha Soliman, Esin Saka, Antonio Badia, Richard Germain, "A Web Usage Mining Framework for Mining Evolving User Profiles in Dynamic Web Sites", IEEE Transactions on Knowledge & Data Engineering, vol.20, no. 2, pp. 202-215, February 2008, doi:10.1109/TKDE.2007.190667
REFERENCES
[1] R. Cooley, B. Mobasher, and J. Srivastava, “Web Mining: Information and Pattern Discovery on the World Wide Web,” Proc. Ninth IEEE Int'l Conf. Tools with AI (ICTAI '97), pp. 558-567, 1997.
[2] O. Nasraoui, R. Krishnapuram, and A. Joshi, “Mining Web Access Logs Using a Relational Clustering Algorithm Based on a Robust Estimator,” Proc. Eighth Int'l World Wide Web Conf. (WWW '99), pp. 40-41, 1999.
[3] O. Nasraoui, R. Krishnapuram, H. Frigui, and A. Joshi, “Extracting Web User Profiles Using Relational Competitive Fuzzy Clustering,” Int'l J. Artificial Intelligence Tools, vol. 9, no. 4, pp. 509-526, 2000.
[4] J. Srivastava, R. Cooley, M. Deshpande, and P.-N. Tan, “Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data,” SIGKDD Explorations, vol. 1, no. 2, pp. 1-12, Jan. 2000.
[5] M. Spiliopoulou and L.C. Faulstich, “WUM: A Web Utilization Miner,” Proc. First Int'l Workshop Web and Databases (WebDB '98), 1998.
[6] T. Yan, M. Jacobsen, H. Garcia-Molina, and U. Dayal, “From User Access Patterns to Dynamic Hypertext Linking,” Proc. Fifth Int'l World Wide Web Conf. (WWW '96), 1996.
[7] M. Perkowitz and O. Etzioni, “Adaptive Web Sites: Automatically Learning from User Access Patterns,” Proc. Sixth Int'l WWW Conf. (WWW '97), 1997.
[8] J. Borges and M. Levene, “Data Mining of User Navigation Patterns,” Web Usage Analysis and User Profiling, LNCS, H.A. Abbass, R.A. Sarker, and C.S. Newton, eds., pp. 92-111, Springer-Verlag, 1999.
[9] O. Zaiane, M. Xin, and J. Han, “Discovering Web Access Patterns and Trends by Applying OLAP and Data Mining Technology on Web Logs,” Proc. Advances in Digital Libraries (ADL '98), pp. 19-29, 1998.
[10] O. Nasraoui and R. Krishnapuram, “A New Evolutionary Approach to Web Usage and Context Sensitive Associations Mining,” Int'l J. Computational Intelligence and Applications, special issue on Internet intelligent systems, vol. 2, no. 3, pp. 339-348, Sept. 2002.
[11] O. Nasraoui, C. Cardona, C. Rojas, and F. Gonzalez, “Mining Evolving User Profiles in Noisy Web Clickstream Data with a Scalable Immune System Clustering Algorithm,” Proc. Workshop Web Mining as a Premise to Effective and Intelligent Web Applications (WebKDD '03), pp. 71-81, Aug. 2003.
[12] P. Desikan and J. Srivastava, “Mining Temporally Evolving Graphs,” Proc. Workshop Web Mining and Web Usage Analysis (WebKDD '04), 2004.
[13] O. Nasraoui, C. Rojas, and C. Cardona, “A Framework for Mining Evolving Trends in Web Data Streams Using Dynamic Learning and Retrospective Validation,” Computer Networks, special issue on Web dynamics, vol. 50, no. 14, Oct. 2006.
[14] M.A. Maloof and R.S. Michalski, “Learning Evolving Concepts Using Partial Memory Approach,” Working Notes AAAI Fall Symp. Active Learning 1995, pp. 70-73, 1995.
[15] M.A. Maloof and R.S. Michalski, “Selecting Examples for Partial Memory Learning,” Machine Learning, vol. 41, no. 1, pp. 27-52, 2000.
[16] T. Mitchell, R. Caruana, D. Freitag, J. McDermott, and D. Zabowski, “Experience with a Learning Personal Assistant,” Comm. ACM, vol. 37, no. 7, pp. 80-91, 1994.
[17] D. Billsus and M.J. Pazzani, “A Hybrid User Model for News Classification,” Proc. Seventh Int'l Conf. User Modeling (UM '99), J. Kay, ed., pp. 99-108, 1999.
[18] I. Crabtree and S. Soltysiak, “Identifying and Tracking Changing Interests,” Int'l J. Digital Libraries, vol. 2, pp. 38-53.
[19] J. Schlimmer and R. Granger, “Incremental Learning from Noisy Data,” Machine Learning, vol. 1, no. 3, pp. 317-357, 1986.
[20] G. Widmer and M. Kubat, “Learning in the Presence of Concept Drift and Hidden Contexts,” Machine Learning, vol. 23, pp. 69-101, 1996.
[21] I. Koychev, “Gradual Forgetting for Adaptation to Concept Drift,” Proc. ECAI Workshop Current Issues in Spatio-Temporal Reasoning '00, pp. 101-106, 2000.
[22] B. Mobasher, H. Dai, T. Luo, Y. Sung, and J. Zhu, “Integrating Web Usage and Content Mining for More Effective Personalization,” Proc. Int'l Conf. e-Commerce and Web Technologies (ECWeb '00), Sept. 2000.
[23] R. Srikant and R. Agrawal, “Mining Generalized Association Rules,” Proc. 21st Int'l Conf. Very Large Data Bases (VLDB '95), Sept. 1995.
[24] S. Chakrabarti, B. Dom, R. Agrawal, and P. Raghavan, “Using Taxonomy, Discriminants, and Signatures for Navigation in Text Databases,” Proc. 23rd Int'l Conf. Very Large Data Bases (VLDB '97), 1997.
[25] H. Dai and B. Mobasher, “Using Ontologies to Discover Domain-Level Web Usage Profiles,” Proc. Second ECML/PKDD Semantic Web Mining Workshop, 2002.
[26] D. Oberle, B. Berendt, A. Hotho, and J. Gonzalez, “Conceptual User Tracking,” Proc. First Int'l Atlantic Web Intelligence Conf. (AWIC '03), 2003.
[27] B. Berendt and M. Spiliopoulou, “Analysis of Navigational Behavior in Web Sites Integrating Multiple Information Systems,” VLDB J., vol. 9, no. 1, pp. 56-75, 2000.
[28] M. Eirinaki, H. Lampos, M. Vazirgiannis, and I. Varlamis, “SEWeP: Using Site Semantics and a Taxonomy to Enhance the Web Personalization Process,” Proc. ACM SIGKDD '03, Aug. 2003.
[29] M. Levene, J. Borges, and G. Loizou, “Zipf's Law for Web Surfers,” Knowledge and Information Systems, vol. 3, no. 1, pp. 120-129, Feb. 2001.
[30] O. Nasraoui and R. Krishnapuram, “A Novel Approach to Unsupervised Robust Clustering Using Genetic Niching,” Proc. Ninth IEEE Int'l Conf. Fuzzy Systems (FUZZ '00), pp. 170-175, May 2000.
[31] J.H. Holland, Adaptation in Natural and Artificial Systems. MIT Press, 1975.
[32] P. Resnik, “Semantic Similarity in a Taxonomy: An Information-Based Measure and Its Application to Problems of Ambiguity and Natural Language,” J. Artificial Intelligence Research, vol. 11, pp. 95-130, 1999.
[33] Z. Wu and M. Palmer, “Verb Semantics and Lexical Selection,” Proc. 32nd Ann. Meeting of the Assoc. Computational Linguistics, pp. 133-138, June 1994.
[34] D. Lin, “An Information-Theoretic Definition of Similarity,” Proc. 15th Int'l Conf. Machine Learning (ICML '98), 1998.
[35] C. Ziegler, G. Lausen, and L. Schmidt-Thieme, “Taxonomy-Driven Computation of Product Recommendations,” Proc. 13th ACM Conf. Information and Knowledge Management (CIKM '04), pp. 406-415, 2004.
[36] V. Cross, “Fuzzy Semantic Distance Measures between Ontological Concepts,” Proc. Ann. Meeting North Am. Fuzzy Information Processing Soc. (NAFIPS '04), pp. 392-397, June 2004.
[37] P. Ganesan, H. Garcia-Molina, and J. Widom, “Exploiting Hierarchical Domain Structure to Compute Similarity,” ACM Trans. Information Systems, vol. 21, no. 1, pp. 64-93, 2003.
[38] O. Nasraoui and S. Goswami, “Mining and Validating Localized Frequent Itemsets with Dynamic Tolerance,” Proc. Sixth SIAM Int'l Conf. Data Mining (SDM '06), pp. 578-582, Apr. 2006.