Issue No.04 - April (2008 vol.20)
pp: 475-488
ABSTRACT
Inner product computation is an important primitive in many techniques, including feature dependency detection, distance computation, clustering, and correlation computation. Peer-to-peer networks have recently attracted increasing attention in applications such as distributed file sharing, sensor networks, and mobile ad hoc networks. Efficiently identifying the top few entries of the full inner product matrix of features in a distributed peer-to-peer network is challenging, since centralizing the data from all nodes in a synchronous, communication-efficient manner may not be an option. This paper addresses the problem of identifying significant inner products among features in a peer-to-peer environment where different nodes observe different sets of data. It uses an ordinal framework to develop probabilistic algorithms that find the top-<i>l</i> elements of the inner product matrix. These <i>l</i> entries support crucial decisions about dependency or relatedness between feature pairs, which matter for a number of data mining applications. We present experimental results demonstrating accurate and scalable performance of the algorithm on large peer-to-peer networks, and describe a real-world application of it.
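The problem setting can be illustrated with a minimal sketch. This is not the paper's algorithm: it is a centralized baseline under the assumption that each node holds a different set of rows over the same features, so the global inner product matrix is the sum of the per-node matrices. All function names are hypothetical. The helper `samples_needed` shows the standard order-statistics fact that ordinal approaches of this kind typically rely on: the maximum of n independent uniform samples falls in the top p fraction of a population with probability 1 - (1 - p)^n.

```python
import math


def local_inner_products(rows):
    """Inner product matrix X^T X for the rows held by one node."""
    d = len(rows[0])
    m = [[0.0] * d for _ in range(d)]
    for r in rows:
        for i in range(d):
            for j in range(d):
                m[i][j] += r[i] * r[j]
    return m


def global_inner_products(nodes):
    """Sum the per-node matrices (assumes all nodes share the same features)."""
    d = len(nodes[0][0])
    total = [[0.0] * d for _ in range(d)]
    for rows in nodes:
        local = local_inner_products(rows)
        for i in range(d):
            for j in range(d):
                total[i][j] += local[i][j]
    return total


def top_l_exact(matrix, l):
    """Exact top-l entries over distinct feature pairs (i < j)."""
    d = len(matrix)
    pairs = [(matrix[i][j], i, j) for i in range(d) for j in range(i + 1, d)]
    pairs.sort(reverse=True)
    return pairs[:l]


def samples_needed(p, q):
    """Smallest n such that 1 - (1 - p)^n >= q, for 0 < p, q < 1."""
    return math.ceil(math.log(1 - q) / math.log(1 - p))


# Two nodes observing different rows over three shared features.
nodes = [
    [[1, 2, 3], [0, 1, 0]],
    [[2, 0, 1]],
]
G = global_inner_products(nodes)
best = top_l_exact(G, 2)
n = samples_needed(0.05, 0.95)
```

The exact baseline needs every node's data, which is what a peer-to-peer setting tries to avoid. The sample-size bound suggests why a probabilistic approach can be cheap: the number of samples depends only on the percentile p and the confidence q, not on the size of the matrix or the network.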
INDEX TERMS
Data mining, Probabilistic algorithms, Mining methods and algorithms, Algorithms for data and knowledge management, Knowledge management applications
CITATION
Kanishka Bhaduri, Kun Liu, Hillol Kargupta, "Distributed Identification of Top-l Inner Product Elements and its Application in a Peer-to-Peer Network", IEEE Transactions on Knowledge & Data Engineering, vol.20, no. 4, pp. 475-488, April 2008, doi:10.1109/TKDE.2007.190714