Issue No.04 - April (2008 vol.20)

pp: 475-488

ABSTRACT

Inner product computation is an important primitive used in many techniques for feature dependency detection, distance computation, clustering, and correlation computation, among others. Peer-to-peer networks have recently been receiving increasing attention in various applications, including distributed file sharing, sensor networks, and mobile ad hoc networks. Efficiently identifying the top few entries of the inner product matrix of features in a distributed peer-to-peer network is a challenging problem, because centralizing the data from all the nodes in a synchronous, communication-efficient manner may not be an option. This paper deals with the problem of identifying significant inner products among features in a peer-to-peer environment where different nodes observe different sets of data. It uses an ordinal framework to develop probabilistic algorithms that find the top-*l* elements of the inner product matrix. These *l* inner product entries are important for making crucial decisions about the dependency or relatedness between feature pairs, which matters for a number of data mining applications. We present experimental results demonstrating that the algorithm is accurate and scales to large peer-to-peer networks, and we also describe a real-world application of it.
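To make the setting concrete, here is a small Python sketch. It assumes that the data are split by rows, so each peer holds different observations of the same *d* features. Under that assumption, the global feature-by-feature inner product matrix is simply the sum of the peers' local matrices X_i^T X_i.

The exact baseline, `global_inner_products`, centralizes those local matrices. That is precisely the communication cost the paper tries to avoid. The functions `sample_size` and `sampled_max_pair` illustrate the kind of order-statistics (ordinal) guarantee a probabilistic method can rely on: if *n* items are drawn uniformly with replacement, the largest draw lies in the top *q* fraction with probability 1 - (1 - q)^n.

All function names and the toy data are hypothetical. This is an illustration of the idea, not the authors' algorithm or protocol.

```python
import heapq
import math
import random


def local_inner_products(rows, d):
    """Local d x d feature inner product matrix X_i^T X_i for one peer's rows."""
    g = [[0.0] * d for _ in range(d)]
    for r in rows:
        for a in range(d):
            for b in range(d):
                g[a][b] += r[a] * r[b]
    return g


def global_inner_products(peers, d):
    """Exact, centralized baseline (hypothetical helper).

    With rows split across peers, X^T X equals the sum of the local matrices,
    so every peer must ship its d x d matrix to one place.
    """
    total = [[0.0] * d for _ in range(d)]
    for rows in peers:
        g = local_inner_products(rows, d)
        for a in range(d):
            for b in range(d):
                total[a][b] += g[a][b]
    return total


def top_l_pairs(g, l):
    """Feature pairs (a, b), a < b, holding the l largest inner products."""
    d = len(g)
    entries = [(g[a][b], (a, b)) for a in range(d) for b in range(a + 1, d)]
    return [pair for _, pair in heapq.nlargest(l, entries)]


def sample_size(q, p):
    """Smallest n with 1 - (1 - q)**n >= p.

    The maximum of n uniform draws (with replacement) then falls in the
    top q fraction with probability at least p.
    """
    return math.ceil(math.log(1 - p) / math.log(1 - q))


def sampled_max_pair(g, n, rng):
    """Probabilistic stand-in for the top entry: inspect only n random pairs."""
    d = len(g)
    pairs = [(a, b) for a in range(d) for b in range(a + 1, d)]
    draws = [rng.choice(pairs) for _ in range(n)]
    return max(draws, key=lambda ab: g[ab[0]][ab[1]])


# Hypothetical toy data: two peers observing different rows of the same 3 features.
peers = [[[1, 2, 0], [0, 1, 3]], [[2, 1, 0]]]
G = global_inner_products(peers, 3)
print(top_l_pairs(G, 2))        # [(0, 1), (1, 2)]
print(sample_size(0.05, 0.95))  # 59
print(sampled_max_pair(G, 20, random.Random(0)))
```

With *q* = 0.05 and *p* = 0.95, 59 uniform draws suffice for the sampled maximum to land in the top 5% with at least 95% probability. This bound does not depend on the number of feature pairs. That independence is why an ordinal, sampling-based approach can avoid touching every entry of the matrix.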

INDEX TERMS

Data mining, Probabilistic algorithms, Mining methods and algorithms, Algorithms for data and knowledge management, Knowledge management applications

CITATION

Kamalika Das, Kanishka Bhaduri, Kun Liu, Hillol Kargupta, "Distributed Identification of Top-l Inner Product Elements and its Application in a Peer-to-Peer Network," *IEEE Transactions on Knowledge & Data Engineering*, vol. 20, no. 4, pp. 475-488, April 2008, doi:10.1109/TKDE.2007.190714