Issue No.01 - Jan. (2014 vol.26)
pp: 55-68
Miao Qiao, The Chinese University of Hong Kong, Hong Kong
Hong Cheng, The Chinese University of Hong Kong, Hong Kong
Lijun Chang, The Chinese University of Hong Kong, Hong Kong
Jeffrey Xu Yu, The Chinese University of Hong Kong, Hong Kong
ABSTRACT
The shortest distance query is a fundamental operation in large-scale networks. Many existing methods in the literature take a landmark embedding approach, which selects a set of graph nodes as landmarks and computes the shortest distances from each landmark to all nodes as an embedding. To answer a shortest distance query, the precomputed distances from the landmarks to the two query nodes are used to compute an approximate shortest distance based on the triangle inequality. In this paper, we analyze the factors that affect the accuracy of distance estimation in landmark embedding. In particular, we find that a globally selected, query-independent landmark set may introduce a large relative error, especially for nearby query nodes. To address this issue, we propose a query-dependent local landmark scheme, which identifies a local landmark close to both query nodes and provides more accurate distance estimation than the traditional global landmark approach. We propose efficient local landmark indexing and retrieval techniques, which achieve low offline indexing complexity and low online query complexity. We also propose two optimization techniques, graph compression and online graph search, with the goal of further reducing index size and improving query accuracy. Furthermore, for immense graphs whose index may not fit in memory, we store the embedding in a relational database, so that a query of the local landmark scheme can be expressed with relational operators. Effective indexing and query optimization mechanisms are designed in this context. Our experimental results on large-scale social networks and road networks demonstrate that the local landmark scheme significantly reduces the shortest distance estimation error compared with global landmark embedding and the state-of-the-art sketch-based embedding.
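The global landmark baseline described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes an unweighted, undirected, connected graph stored as an adjacency dictionary, and the names `bfs_distances`, `build_embedding`, and `estimate_distance` are hypothetical. The estimate is the triangle-inequality upper bound, the minimum over landmarks l of d(l, s) + d(l, t).

```python
from collections import deque


def bfs_distances(adj, source):
    """Exact hop distances from source to every reachable node (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def build_embedding(adj, landmarks):
    """Offline step: precompute distances from each landmark to all nodes."""
    return {l: bfs_distances(adj, l) for l in landmarks}


def estimate_distance(embedding, s, t):
    """Online step: upper bound on d(s, t) from the triangle inequality.

    Assumes s and t are reachable from every landmark (connected graph).
    """
    return min(d[s] + d[t] for d in embedding.values())


# Toy graph (an assumption for illustration): the path 0 - 1 - ... - 6,
# with node 0 as the single global landmark.
n = 7
adj = {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}
embedding = build_embedding(adj, landmarks=[0])

far_estimate = estimate_distance(embedding, 0, 6)   # landmark lies on the path
near_estimate = estimate_distance(embedding, 5, 6)  # landmark is far from both
near_exact = bfs_distances(adj, 5)[6]
```

On this toy path the estimate for the distant pair (0, 6) is exact, while the adjacent pair (5, 6) is estimated at 11 against a true distance of 1. This is the kind of large relative error for nearby query nodes that the abstract attributes to a query-independent landmark set.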
INDEX TERMS
Estimation, Complexity theory, Accuracy, Indexing, Query processing, Roads, query optimization, Local landmark embedding, least common ancestor, local search, graph compression
CITATION
Miao Qiao, Hong Cheng, Lijun Chang, Jeffrey Xu Yu, "Approximate Shortest Distance Computing: A Query-Dependent Local Landmark Scheme", IEEE Transactions on Knowledge & Data Engineering, vol.26, no. 1, pp. 55-68, Jan. 2014, doi:10.1109/TKDE.2012.253