Issue No.12 - December (2009 vol.21)
pp: 1737-1752
Praveen R. Rao, University of Missouri-Kansas City, Kansas City
Bongki Moon, University of Arizona, Tucson
ABSTRACT
One of the key challenges in a peer-to-peer (P2P) network is to efficiently locate relevant data sources across a large number of participating peers. With the increasing popularity of the Extensible Markup Language (XML) as a standard for information interchange on the Internet, XML is commonly used as an underlying data model for P2P applications to deal with the heterogeneity of data and enhance the expressiveness of queries. In this paper, we address the problem of efficiently locating relevant XML documents in a P2P network, where a user poses queries in a language such as XPath. We have developed a new system called psiX that runs on top of an existing distributed hashing framework. Under the psiX system, each XML document is mapped into an algebraic signature that captures the structural summary of the document. An XML query pattern is also mapped into a signature. The query's signature is used to locate relevant document signatures. Our signature scheme supports holistic processing of query patterns, without breaking them into multiple path queries that are processed individually. The participating peers in the network collectively maintain a collection of distributed hierarchical indexes for the document signatures. Value indexes are built to handle numeric and textual values in XML documents. These indexes are used to process queries with value predicates. Our experimental study on PlanetLab demonstrates that psiX provides an efficient location service in a P2P network for a wide variety of XML documents.
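The idea of matching a query signature against document signatures can be illustrated with a toy sketch. The abstract does not give psiX's actual signature construction, so everything below is an assumption made for illustration: each distinct parent/child tag pair is assigned an integer prime, a signature is the product of those primes, and a query may match a document only if the query signature divides the document signature. The names `EdgeCatalog`, `signature`, and `may_match` are hypothetical, and the query twig is written as a small XML fragment rather than as XPath.

```python
import xml.etree.ElementTree as ET
from math import prod


def _is_prime(n):
    """Trial-division primality test; adequate for the small primes used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


class EdgeCatalog:
    """Hypothetical helper: gives each (parent tag, child tag) pair its own prime."""

    def __init__(self):
        self._primes = {}
        self._candidate = 2

    def prime_for(self, edge):
        if edge not in self._primes:
            while not _is_prime(self._candidate):
                self._candidate += 1
            self._primes[edge] = self._candidate
            self._candidate += 1
        return self._primes[edge]


def edges(root):
    """Collect the distinct (parent tag, child tag) pairs of an element tree."""
    found = set()
    stack = [root]
    while stack:
        node = stack.pop()
        for child in node:
            found.add((node.tag, child.tag))
            stack.append(child)
    return found


def signature(xml_text, catalog):
    """Toy signature: product of the primes of all distinct edges."""
    root = ET.fromstring(xml_text)
    return prod(catalog.prime_for(e) for e in edges(root))


def may_match(doc_sig, query_sig):
    """Necessary (not sufficient) condition: every query edge occurs in the document."""
    return doc_sig % query_sig == 0


catalog = EdgeCatalog()
doc_sig = signature(
    "<book><title>T</title><author><name>A</name></author></book>", catalog
)
other_sig = signature("<article><title>U</title></article>", catalog)
# Query twig: a book with an author that has a name, written as an XML fragment.
query_sig = signature("<book><author><name/></author></book>", catalog)

print(may_match(doc_sig, query_sig))
print(may_match(other_sig, query_sig))
```

In this sketch the whole twig is tested with a single divisibility check rather than one check per path, which loosely mirrors the holistic processing the abstract describes. The test acts only as a filter: it can report candidates that do not actually match, it handles only parent/child edges, and it assumes all peers share the same edge-to-prime assignment.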
INDEX TERMS
XML indexing, XPath, peer-to-peer computing, distributed hash tables.
CITATION
Praveen R. Rao, Bongki Moon, "Locating XML Documents in a Peer-to-Peer Network Using Distributed Hash Tables", IEEE Transactions on Knowledge & Data Engineering, vol.21, no. 12, pp. 1737-1752, December 2009, doi:10.1109/TKDE.2009.26