2014 IEEE/ACM Joint Conference on Digital Libraries (JCDL) (2014)
London, United Kingdom
Sept. 8, 2014 to Sept. 12, 2014
ISBN: 978-1-4799-5569-5
pp: 141-144
Zhaohui Wu , Computer Science and Engineering, Pennsylvania State University, University Park, 16802, USA
Wenyi Huang , Information Sciences and Technology, Pennsylvania State University, University Park, 16802, USA
Chen Liang , Information Sciences and Technology, Pennsylvania State University, University Park, 16802, USA
C. Lee Giles , Information Sciences and Technology, Pennsylvania State University, University Park, 16802, USA
ABSTRACT
We explore a new metadata extraction framework that requires no human annotators; instead, the ground truth is harvested from the Web. A new training sample is selected based not only on its uncertainty and representativeness in the unlabeled pool, but also on its availability and credibility in Web knowledge bases. We construct a dataset of 4329 books with valid metadata and evaluate our approach using 5 Web book databases as oracles. Empirical results demonstrate its effectiveness and efficiency.
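The selection criterion described in the abstract can be illustrated with a small active-learning sketch. This is not the authors' method: the abstract does not give their formulas, so every measure below is an assumption chosen for illustration. Uncertainty is taken as normalized prediction entropy, representativeness as mean cosine similarity to the rest of the unlabeled pool, availability as the fraction of Web oracles that return a record, and credibility as the share of returned answers that agree with the majority. The `Candidate` structure, the equal weights `w`, and the `select_next` helper are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """Hypothetical unlabeled book record used only for illustration."""
    book_id: str
    probs: List[float]                   # model's predicted label distribution
    features: List[float]                # feature vector used for similarity
    oracle_answers: List[Optional[str]]  # one answer per Web database; None if absent


def uncertainty(c: Candidate) -> float:
    # Shannon entropy normalized to [0, 1] by log(k), k = number of labels.
    h = -sum(p * math.log(p) for p in c.probs if p > 0)
    k = len(c.probs)
    return h / math.log(k) if k > 1 else 0.0


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def representativeness(c: Candidate, pool: List[Candidate]) -> float:
    # Mean similarity to the other candidates in the unlabeled pool.
    others = [o for o in pool if o is not c]
    if not others:
        return 0.0
    return sum(cosine(c.features, o.features) for o in others) / len(others)


def availability(c: Candidate) -> float:
    # Fraction of Web oracles that return any record for this book.
    return sum(a is not None for a in c.oracle_answers) / len(c.oracle_answers)


def credibility(c: Candidate) -> float:
    # Share of returned answers that agree with the most common answer.
    answers = [a for a in c.oracle_answers if a is not None]
    if not answers:
        return 0.0
    return max(answers.count(a) for a in set(answers)) / len(answers)


def score(c: Candidate, pool: List[Candidate], w=(1.0, 1.0, 1.0, 1.0)) -> float:
    # Hypothetical weighted sum of the four criteria.
    terms = (uncertainty(c), representativeness(c, pool),
             availability(c), credibility(c))
    return sum(wi * t for wi, t in zip(w, terms))


def select_next(pool: List[Candidate], w=(1.0, 1.0, 1.0, 1.0)) -> Candidate:
    # Query the candidate with the highest combined score.
    return max(pool, key=lambda c: score(c, pool, w))


# Usage: five hypothetical Web book databases act as oracles.
pool = [
    Candidate("b1", [0.5, 0.5], [1.0, 0.0], ["Title A", "Title A", None, "Title A", None]),
    Candidate("b2", [0.9, 0.1], [0.0, 1.0], ["Title B", None, None, None, None]),
    Candidate("b3", [0.6, 0.4], [1.0, 1.0], [None, None, None, None, None]),
]
print(select_next(pool).book_id)  # → b1
```

In this toy pool, `b3` is uncertain and central, but no oracle knows it, so no ground truth can be harvested for it. `b1` wins because it is both uncertain and consistently answered by several databases. This reflects the abstract's point that availability and credibility matter alongside the usual uncertainty and representativeness criteria.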
INDEX TERMS
Abstracts, Welding, IP networks
CITATION

Zhaohui Wu, Wenyi Huang, Chen Liang and C. L. Giles, "Crowd-sourcing Web knowledge for metadata extraction," 2014 IEEE/ACM Joint Conference on Digital Libraries (JCDL), London, United Kingdom, 2014, pp. 141-144.
doi:10.1109/JCDL.2014.6970160