2011 IEEE 11th International Conference on Data Mining
Healing Sample Selection Bias by Source Classifier Selection
Vancouver, Canada
December 11-December 14
ISBN: 978-0-7695-4408-3
Chun-Wei Seah, Ivor Wai-Hung Tsang, Yew-Soon Ong, "Healing Sample Selection Bias by Source Classifier Selection," Data Mining, IEEE International Conference on, pp. 577-586, 2011 IEEE 11th International Conference on Data Mining, 2011.
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDM.2011.73
Domain Adaptation (DA) methods usually work by reducing the marginal distribution differences between the source and target domains and then applying the resulting trained classifier, the source classifier, to the target domain. However, in many cases the true predictive distributions of the source and target domains can differ vastly, especially when their class distributions are skewed, which causes sample selection bias in DA. Hence, DA methods that leverage the source labeled data may generalize poorly in the target domain, resulting in negative transfer. In addition, we observed that many DA methods use either a single source classifier or a linear combination of source classifiers with fixed weighting to predict the target unlabeled data. Essentially, the labels of the target unlabeled data are spanned by the predictions of these source classifiers. Motivated by these observations, in this paper we propose to construct many source classifiers of diverse biases and to learn the weight of each source classifier by directly minimizing the structural risk defined on the target unlabeled data, so as to heal the possible sample selection bias. Since the weights are learned by maximizing the margin of separation between opposite classes on the target unlabeled data, the proposed method is termed Maximal Margin Target Label Learning (MMTLL), which takes the form of a Multiple Kernel Learning problem with many label kernels. Extensive experimental studies of MMTLL against several state-of-the-art methods on the Sentiment and Newsgroups datasets with various imbalanced class settings showed that MMTLL achieved robust accuracy in all the settings considered and was resilient to negative transfer, in contrast to the counterpart methods, which suffered significantly in prediction accuracy.
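The idea of scoring source classifiers by how well their predicted labels separate the target unlabeled data can be illustrated with a toy sketch. This is not the MMTLL algorithm: the abstract describes learning a weight for every classifier through a Multiple Kernel Learning formulation, whereas the sketch below only selects a single classifier. It assumes 1-D features and threshold classifiers, and all names and data (`predict`, `target_margin`, `select_classifier`, `target_xs`, `biases`) are hypothetical.

```python
# Illustrative sketch only: NOT the MMTLL algorithm from the paper.
# Assumption: 1-D features and threshold classifiers sign(x - b), where each
# hypothetical bias b stands in for one "source classifier of diverse bias".

def predict(bias, xs):
    """Label each point +1 if it lies above the threshold, else -1."""
    return [1 if x > bias else -1 for x in xs]

def target_margin(labels, xs):
    """Width of the gap between the two predicted classes on the target data.

    Returns 0.0 when every point receives the same label, since such a
    labeling separates nothing.
    """
    pos = [x for x, y in zip(xs, labels) if y == 1]
    neg = [x for x, y in zip(xs, labels) if y == -1]
    if not pos or not neg:
        return 0.0
    return min(pos) - max(neg)

def select_classifier(biases, target_xs):
    """Pick the bias whose predicted labels separate the target data best."""
    return max(biases, key=lambda b: target_margin(predict(b, target_xs), target_xs))

# Hypothetical unlabeled target data: two clusters with a skewed class ratio.
target_xs = [0.1, 0.3, 0.4, 0.5, 0.6, 2.4, 2.6]
# Hypothetical candidate thresholds, standing in for classifiers trained on
# differently biased source samples.
biases = [0.2, 0.45, 1.5, 2.5]

best = select_classifier(biases, target_xs)
print(best)  # prints 1.5, the threshold lying in the widest gap
```

In this toy setting the threshold 0.45 cuts through the dense cluster and yields a small margin, while 1.5 falls in the low-density region between the clusters. No target labels are used at any point; only the geometry of the unlabeled target data decides which classifier is preferred.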
Index Terms:
Domain Adaptation, Sample Selection Bias, Negative Transfer, Maximum Margin Separation, Multiple Kernel Learning, Classifier Selection
