Issue No.01 - January (2010 vol.22)
pp: 59-75
Yuzhe Tang, Fudan University and Shanghai Key Lab of Intelligent Information Processing, Shanghai
Shuigeng Zhou, Fudan University and Shanghai Key Lab of Intelligent Information Processing, Shanghai
Jianliang Xu, Hong Kong Baptist University, Hong Kong
ABSTRACT
The distributed hash table (DHT) is a widely used building block for scalable P2P systems. However, because the uniform hashing employed in DHTs destroys data locality, supporting complex queries (e.g., range queries and k-nearest-neighbor queries) in DHT-based P2P systems is not a trivial task. To process such complex queries efficiently, a popular solution is to build indexes on top of the DHT. Unfortunately, existing over-DHT indexing schemes suffer from either query inefficiency or high maintenance cost. In this paper, we propose LIGhtweight Hash Tree (LIGHT), a query-efficient yet low-maintenance indexing scheme. LIGHT employs a novel naming mechanism and a tree summarization strategy for graceful distribution of its index structure. We show through analysis that it can support various complex queries with near-optimal performance. Extensive experimental results also demonstrate that, compared with state-of-the-art over-DHT indexing schemes, LIGHT saves 50-75 percent of index maintenance cost and substantially improves query performance in terms of both response time and bandwidth consumption. In addition, LIGHT is designed over generic DHTs and hence can be easily implemented and deployed in any DHT-based P2P system.
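The locality problem noted in the abstract can be illustrated with a minimal sketch. It is not part of LIGHT and does not reproduce the paper's naming mechanism. The SHA-1 hash, the 64-node identifier space, the 1024-key space, and all function names are assumptions chosen only for illustration. The sketch compares how many nodes a naive range query would touch under uniform hashing with how many it would touch under a hypothetical locality-preserving placement.

```python
import hashlib


def dht_node(key: int, num_nodes: int = 64) -> int:
    """Place a key on a node by uniform hashing (SHA-1 is an assumed choice)."""
    digest = hashlib.sha1(str(key).encode()).digest()
    return int.from_bytes(digest, "big") % num_nodes


def contiguous_node(key: int, key_space: int = 1024, num_nodes: int = 64) -> int:
    """Hypothetical locality-preserving placement: equal-width key intervals."""
    return key * num_nodes // key_space


def nodes_for_range(lo: int, hi: int, place) -> set:
    """Nodes a naive range query over [lo, hi] would have to contact."""
    return {place(k) for k in range(lo, hi + 1)}


if __name__ == "__main__":
    hashed = nodes_for_range(100, 115, dht_node)
    contiguous = nodes_for_range(100, 115, contiguous_node)
    # Adjacent keys scatter across the ring under uniform hashing,
    # while interval placement keeps them on neighbouring nodes.
    print("uniform hashing touches", len(hashed), "nodes")
    print("interval placement touches", len(contiguous), "nodes")
```

Under these assumptions, the 16 adjacent keys fall on only two nodes with interval placement, whereas uniform hashing spreads them over many unrelated nodes. This is why a range query cannot be answered by a single DHT lookup and why an index layered over the DHT is attractive.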
INDEX TERMS
Distributed hash tables, indexing, complex queries.
CITATION
Yuzhe Tang, Shuigeng Zhou, Jianliang Xu, "LIGHT: A Query-Efficient Yet Low-Maintenance Indexing Scheme over DHTs", IEEE Transactions on Knowledge & Data Engineering, vol.22, no. 1, pp. 59-75, January 2010, doi:10.1109/TKDE.2009.47