Distributed Identification of Top-l Inner Product Elements and its Application in a Peer-to-Peer Network
Issue No. 04 - April (2008 vol. 20)
Inner product computation is an important primitive in many techniques for feature dependency detection, distance computation, clustering, and correlation computation, among others. Peer-to-peer networks have recently attracted increasing attention in applications involving distributed file sharing, sensor networks, and mobile ad hoc networks. Efficiently identifying the top few entries of the full inner product matrix of features in a distributed peer-to-peer network is a challenging problem, since centralizing the data from all nodes in a synchronous, communication-efficient manner may not be an option. This paper deals with the problem of identifying significant inner products among features in a peer-to-peer environment where different nodes observe different sets of data. It uses an ordinal framework to develop probabilistic algorithms that find the top-l elements of the inner product matrix. These l inner product entries support crucial decisions about dependency or relatedness between feature pairs, which matter for a number of data mining applications. We present experimental results demonstrating accurate and scalable performance of this algorithm on large peer-to-peer networks, and we describe a real-world application of the algorithm.
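To make the problem statement concrete, the following is a minimal sketch of the quantity being sought, not the paper's algorithm. It assumes each node holds its own rows over a shared set of features (a horizontal partition), in which case the global inner product of two features is the sum of the per-node inner products. All function names are hypothetical. The `samples_needed` helper illustrates one standard order-statistics fact that an ordinal argument can rely on: if n entries are drawn uniformly at random, the largest of them falls in the top fraction p with probability 1 - (1 - p)^n. Whether the paper uses exactly this bound is an assumption here.

```python
import heapq
import math
from itertools import combinations


def local_inner_products(rows):
    """Inner product of every pair of feature columns held by one node."""
    n_features = len(rows[0])
    return {
        (i, j): sum(r[i] * r[j] for r in rows)
        for i, j in combinations(range(n_features), 2)
    }


def global_inner_products(nodes):
    """Sum per-node results (assumes all nodes share the same features)."""
    total = {}
    for rows in nodes:
        for pair, value in local_inner_products(rows).items():
            total[pair] = total.get(pair, 0) + value
    return total


def top_l(products, l):
    """Centralized reference answer: the l largest inner product entries."""
    return heapq.nlargest(l, products.items(), key=lambda kv: kv[1])


def samples_needed(p, q):
    """Smallest n such that 1 - (1 - p)**n >= q, for 0 < p, q < 1."""
    return math.ceil(math.log(1 - q) / math.log(1 - p))


# Two hypothetical nodes, three features each.
nodes = [
    [[1, 2, 0], [0, 1, 1]],
    [[2, 1, 1]],
]
products = global_inner_products(nodes)
best = top_l(products, 1)
```

The centralized `top_l` requires gathering every node's contribution, which is exactly what the abstract says may not be feasible; the sample-size bound hints at why a probabilistic approach can get by with far fewer entries than the full matrix.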
Data mining, Probabilistic algorithms, Mining methods and algorithms, Algorithms for data and knowledge management, Knowledge management applications
H. Kargupta, K. Bhaduri, K. Das and K. Liu, "Distributed Identification of Top-l Inner Product Elements and its Application in a Peer-to-Peer Network," in IEEE Transactions on Knowledge & Data Engineering, vol. 20, no. 4, pp. 475-488, 2007.