Issue No. 01 - Jan. (2013 vol. 25)
ISSN: 1041-4347
pp: 15-28
Jing Gao , University at Buffalo, The State University of New York, Buffalo
Feng Liang , University of Illinois at Urbana-Champaign, Urbana
Wei Fan , Huawei Noah's Ark Lab, Shatin
Yizhou Sun , University of Illinois at Urbana-Champaign, Urbana
Jiawei Han , University of Illinois at Urbana-Champaign, Urbana
Ensemble learning has emerged as a powerful method for combining multiple models. Well-known methods, such as bagging, boosting, and model averaging, have been shown to improve accuracy and robustness over single models. However, due to the high cost of manual labeling, it is hard to obtain sufficient and reliable labeled data for effective training. Meanwhile, large amounts of unlabeled data are available, and we can readily obtain multiple unsupervised models. Although unsupervised models do not directly generate a class label prediction for each object, they provide useful constraints on the joint predictions for a set of related objects. Therefore, incorporating these unsupervised models into the ensemble of supervised models can lead to better prediction performance. In this paper, we study ensemble learning with outputs from multiple supervised and unsupervised models, a topic where little work has been done. We propose to consolidate a classification solution by maximizing the consensus among both supervised predictions and unsupervised constraints. We cast this ensemble task as an optimization problem on a bipartite graph, where the objective function favors the smoothness of the predictions over the graph but penalizes deviations from the initial labeling provided by the supervised models. We solve this problem through iterative propagation of probability estimates among neighboring nodes and prove the optimality of the solution. The proposed method can be interpreted as conducting a constrained embedding in a transformed space, or a ranking on the graph. Experimental results on different applications with heterogeneous data sources demonstrate the benefits of the proposed method over existing alternatives. (More information, data, and code are available at
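The iterative propagation on a bipartite graph mentioned in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the function name `consensus_propagate`, the penalty weight `alpha`, the fixed iteration count, and the particular averaging updates are all assumptions made for illustration. Object nodes sit on one side of the graph, and the groups produced by each model (predicted classes for a classifier, clusters for a clustering) sit on the other.

```python
def consensus_propagate(groups, n_objects, n_classes, alpha=2.0, n_iter=50):
    """Illustrative sketch of label propagation on an object/group bipartite graph.

    groups: list of (members, label) pairs. `members` lists the object indices
    in the group; `label` is a class index for a group produced by a supervised
    model, or None for a cluster from an unsupervised model.
    Assumes every group is non-empty and every object is in at least one group.
    """
    uniform = 1.0 / n_classes
    u = [[uniform] * n_classes for _ in range(n_objects)]  # object estimates
    q = [[uniform] * n_classes for _ in groups]            # group estimates

    # For each object, record which group nodes it is connected to.
    memberships = [[] for _ in range(n_objects)]
    for j, (members, _) in enumerate(groups):
        for i in members:
            memberships[i].append(j)

    for _ in range(n_iter):
        # Group update: average the member estimates, and pull groups that
        # came from a supervised model toward their initial label
        # (hypothetical weight alpha).
        for j, (members, label) in enumerate(groups):
            anchored = label is not None
            denom = len(members) + (alpha if anchored else 0.0)
            for z in range(n_classes):
                total = sum(u[i][z] for i in members)
                if anchored and label == z:
                    total += alpha
                q[j][z] = total / denom
        # Object update: average the estimates of the groups it belongs to.
        for i in range(n_objects):
            for z in range(n_classes):
                u[i][z] = sum(q[j][z] for j in memberships[i]) / len(memberships[i])
    return u


# Toy example: one classifier and one clustering over four objects.
groups = [
    ([0, 1], 0), ([2, 3], 1),        # classifier: objects 0,1 -> class 0; 2,3 -> class 1
    ([0, 1], None), ([2, 3], None),  # clustering: two clusters, no class labels
]
estimates = consensus_propagate(groups, n_objects=4, n_classes=2)
predicted = [row.index(max(row)) for row in estimates]
```

In this sketch each row of `estimates` remains a probability distribution over classes, because every update is a weighted average of distributions. Groups from unsupervised models carry no label of their own, so they only smooth the estimates of the objects they connect.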
Motion pictures, Data models, Predictive models, Optimization, Bipartite graph, Labeling, Social network services, semi-supervised learning, Knowledge integration, ensemble learning, clustering ensemble

J. Gao, F. Liang, W. Fan, Y. Sun and J. Han, "A Graph-Based Consensus Maximization Approach for Combining Multiple Supervised and Unsupervised Models," in IEEE Transactions on Knowledge & Data Engineering, vol. 25, no. 1, pp. 15-28, 2013.