20th Annual International Conference on High Performance Computing (2010)
Dona Paula, India
Dec. 19, 2010 to Dec. 22, 2010
Ankur Narang, IBM India Research Laboratory, New Delhi, India
Raj Gupta, IBM India Research Laboratory, New Delhi, India
Anupam Joshi, IBM India Research Laboratory, New Delhi, India
Vikas K. Garg, IBM India Research Laboratory, New Delhi, India
Collaborative filtering (CF) based recommender systems have gained wide popularity at Internet companies such as Amazon, Netflix, and Google News. These systems automatically predict a user's interests by inferring them from information about like-minded users. Real-time CF on massive, highly sparse datasets, while achieving high prediction accuracy, is a computationally challenging problem. In this paper, we present the design of a soft real-time (around 1 min.) parallel CF algorithm based on the Concept Decomposition technique. Our parallel algorithm has been optimized for multicore/many-core architectures while maintaining a prediction accuracy of 0.84 RMSE. Using the Netflix dataset, we demonstrate the performance and scalability of our algorithm (in both batch mode and online mode) on a 32-core Power6 based SMP system. Our parallel algorithm delivered a training time of 64 s on the full Netflix dataset and a prediction time of 4.5 s on 1.4M ratings (3.2 μs per rating prediction). This is 12.6× better than the best known sequential training time and around 33× better than the best known sequential prediction time, along with high accuracy (0.84 RMSE). To the best of our knowledge, this is also the best known parallel performance at such high accuracy.
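The accuracy metric and the per-rating figure quoted in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the helper names `rmse` and `per_rating_microseconds` are hypothetical, the RMSE definition is the standard one, and the "implied" sequential baselines are plain arithmetic on the abstract's numbers rather than values reported in the paper.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between predicted and actual ratings (standard definition)."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared_error / len(predicted))


def per_rating_microseconds(total_seconds, num_ratings):
    """Average time per rating prediction, in microseconds."""
    return total_seconds / num_ratings * 1e6


# Figures taken from the abstract.
prediction_seconds = 4.5
num_ratings = 1.4e6
training_seconds = 64.0

# 4.5 s over 1.4M ratings is roughly 3.2 microseconds per rating.
per_rating = per_rating_microseconds(prediction_seconds, num_ratings)

# Assumption: sequential baselines implied by the quoted speedups
# (12.6x for training, about 33x for prediction); derived, not reported.
implied_sequential_training = training_seconds * 12.6
implied_sequential_prediction = prediction_seconds * 33

# Toy ratings (made up for illustration only) to exercise rmse().
toy_rmse = rmse([3.0, 4.0], [4.0, 4.0])
```

Under these assumptions, the quoted speedups would correspond to a sequential training time of roughly 13 minutes and a sequential prediction time of roughly 2.5 minutes.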
information filtering, Internet, recommender systems
A. Narang, R. Gupta, A. Joshi and V. K. Garg, "Highly scalable parallel collaborative filtering algorithm," 2010 International Conference on High Performance Computing (HiPC 2010), Dona Paula, 2011, pp. 1-10.