Issue No.05 - May (2011 vol.23)

pp: 713-726

Jae-Gil Lee, Korea Advanced Institute of Science and Technology (KAIST), Daejeon

Jiawei Han, University of Illinois at Urbana-Champaign, Urbana

Xiaolei Li, Microsoft, Bellevue

Hong Cheng, The Chinese University of Hong Kong, Hong Kong

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2010.153

ABSTRACT

Classification has been used for modeling many kinds of data sets, including sets of items, text documents, graphs, and networks. However, there is a lack of study on a new kind of data, trajectories on road networks. Modeling such data is useful with the emerging GPS and RFID technologies and is important for effective transportation and traffic planning. In this work, we study methods for classifying trajectories on road networks. By analyzing the behavior of trajectories on road networks, we observe that, in addition to the locations where vehicles have visited, the order of these visited locations is crucial for improving classification accuracy. Based on our analysis, we contend that (frequent) sequential patterns are good feature candidates since they preserve this order information. Furthermore, when mining sequential patterns, we propose to confine the length of sequential patterns to ensure high efficiency. Compared with closed sequential patterns, these partial (i.e., length-confined) sequential patterns allow us to significantly improve efficiency almost without losing accuracy. In this paper, we present a framework for frequent pattern-based classification for trajectories on road networks. Our comparative study over a broad range of classification approaches demonstrates that our method significantly improves accuracy over other methods in some synthetic and real trajectory data.
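The role of visit order and of the length limit can be illustrated with a small sketch. This is not the paper's algorithm: it is a brute-force enumeration over a toy data set, and the names `patterns_up_to`, `frequent_patterns`, `min_support`, and `max_len`, as well as the segment labels, are hypothetical choices made for illustration. It assumes each trajectory is given as an ordered list of road-segment identifiers.

```python
from collections import Counter
from itertools import combinations


def patterns_up_to(trajectory, max_len):
    """Return all ordered subsequences (gaps allowed) of length 1..max_len."""
    found = set()
    for k in range(1, max_len + 1):
        # combinations() keeps elements in their original order,
        # so each tuple is a valid sequential pattern of the trajectory.
        found.update(combinations(trajectory, k))
    return found


def frequent_patterns(trajectories, min_support, max_len):
    """Count in how many trajectories each pattern occurs; keep the frequent ones."""
    support = Counter()
    for trajectory in trajectories:
        # A set per trajectory ensures each pattern is counted once per trajectory.
        support.update(patterns_up_to(trajectory, max_len))
    return {p: c for p, c in support.items() if c >= min_support}


# Toy data (hypothetical): the first and third trajectories visit the same
# segments in opposite order.
toy = [
    ["a", "b", "c"],
    ["a", "b", "d"],
    ["c", "b", "a"],
]

freq = frequent_patterns(toy, min_support=2, max_len=2)
print(sorted(freq.items()))
```

In this toy example the first and third trajectories are identical as sets of visited segments, yet only the first contains the frequent pattern `("a", "b")`, so an order-preserving feature separates them. Capping `max_len` keeps the number of candidate patterns small, which is the efficiency argument the abstract makes for length-confined patterns.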

INDEX TERMS

Trajectory classification, frequent pattern-based classification, road network analysis, sequential patterns.

CITATION

Jae-Gil Lee, Jiawei Han, Xiaolei Li, Hong Cheng, "Mining Discriminative Patterns for Classifying Trajectories on Road Networks",

*IEEE Transactions on Knowledge & Data Engineering*, vol. 23, no. 5, pp. 713-726, May 2011, doi:10.1109/TKDE.2010.153