Efficient Data Mining for Path Traversal Patterns
March/April 1998 (vol. 10 no. 2)
pp. 209-221

Abstract: In this paper, we explore a new data mining capability that involves mining path traversal patterns in a distributed information-providing environment where documents or objects are linked together to facilitate interactive access. Our solution procedure consists of two steps. First, we derive an algorithm to convert the original sequence of log data into a set of maximal forward references. By doing so, we can filter out the effect of some backward references, which are mainly made for ease of traveling, and concentrate on mining meaningful user access sequences. Second, we derive algorithms to determine the frequent traversal patterns, i.e., large reference sequences, from the maximal forward references obtained. Two algorithms are devised for determining large reference sequences: one is based on hashing and pruning techniques, and the other is further improved with the option of determining large reference sequences in batch so as to reduce the number of database scans required. The performance of these two methods is comparatively analyzed. It is shown that the option of selective scan is very advantageous and can lead to a prominent performance improvement. Sensitivity analysis on various parameters is also conducted.
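
The two-step procedure described in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions that do not come from the text above: a single user session is given as an ordered list of visited nodes, a backward reference is any revisit of a node already on the current forward path, and a reference sequence is "large" when it occurs as a consecutive run in at least `min_support` maximal forward references. The function names and the sample path are hypothetical. The second function is a plain level-wise count with one scan per sequence length; it is not the hash-based or selective-scan algorithm that the paper devises, whose details are not given here.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple

Ref = Tuple[Hashable, ...]


def maximal_forward_references(path: Sequence[Hashable]) -> List[Ref]:
    """Step 1 (sketch): split one session into maximal forward references."""
    results: List[Ref] = []
    current: List[Hashable] = []   # forward path built so far
    moving_forward = False         # True if the last move extended the path
    for node in path:
        if node in current:
            # Backward reference: the path just ended is maximal only if
            # the previous move was a forward one.
            if moving_forward:
                results.append(tuple(current))
            # Back up to the revisited node.
            del current[current.index(node) + 1:]
            moving_forward = False
        else:
            current.append(node)
            moving_forward = True
    if moving_forward and current:
        results.append(tuple(current))
    return results


def large_reference_sequences(
    references: Sequence[Ref], min_support: int
) -> Dict[Ref, int]:
    """Step 2 (naive sketch): count consecutive subsequences level by level."""
    large: Dict[Ref, int] = {}
    previous = None  # large sequences of length k - 1
    k = 1
    while True:
        counts: Counter = Counter()
        for ref in references:
            seen = set()
            for i in range(len(ref) - k + 1):
                window = tuple(ref[i:i + k])
                # Prune: both (k-1)-length sub-runs must already be large.
                if previous is not None and (
                    window[:-1] not in previous or window[1:] not in previous
                ):
                    continue
                seen.add(window)
            counts.update(seen)  # each reference counts a sequence once
        current = {s: c for s, c in counts.items() if c >= min_support}
        if not current:
            return large
        large.update(current)
        previous = current
        k += 1


if __name__ == "__main__":
    session = list("ABCDCBEGHGWAOUOV")  # hypothetical traversal path
    refs = maximal_forward_references(session)
    print(["".join(r) for r in refs])
    for seq, support in sorted(large_reference_sequences(refs, 2).items()):
        print("".join(seq), support)
```

In this sketch, a real log would first have to be grouped by user and session before step 1 is applied, and the supports would be counted across all sessions rather than within one.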

Index Terms:
Data mining, traversal patterns, distributed information system, World Wide Web, performance analysis.
Ming-Syan Chen, Jong Soo Park, Philip S. Yu, "Efficient Data Mining for Path Traversal Patterns," IEEE Transactions on Knowledge and Data Engineering, vol. 10, no. 2, pp. 209-221, March-April 1998, doi:10.1109/69.683753