Third ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL'03)
Automatic Document Metadata Extraction Using Support Vector Machines
Houston, Texas USA
May 27-May 31
ISBN: 0-7695-1939-3
Authors: Hui Han, C. Lee Giles, Eren Manavoglu, Hongyuan Zha, Zhenyue Zhang, Edward A. Fox
Page: 37
DOI: 10.1109/JCDL.2003.1204842
Publisher: IEEE Computer Society, Los Alamitos, CA, USA
Automatic metadata generation provides scalability and usability for digital libraries and their collections. Machine learning methods offer robust and adaptable automatic metadata extraction. We describe a Support Vector Machine classification-based method for metadata extraction from the header part of research papers and show that it outperforms other machine learning methods on the same task. The method first classifies each line of the header into one or more of 15 classes. An iterative convergence procedure is then used to improve the line classification by using the predicted class labels of each line's neighboring lines from the previous round. Further metadata extraction is done by seeking the best chunk boundaries within each line. We found that discovering and using the structural patterns of the data, together with domain-based word clustering, can improve metadata extraction performance. Appropriate feature normalization also greatly improves classification performance. Our metadata extraction method was originally designed to improve the metadata extraction quality of the digital libraries CiteSeer [17] and EbizSearch [24]. We believe it can be generalized to other digital libraries.
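The iterative convergence step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the SVM is replaced by a hypothetical rule-based stand-in (`toy_classify`), each line gets a single label instead of one or more of the 15 classes, and the label names, rules, and sample header lines are assumptions made purely for illustration. What the sketch does show is the control flow: classify every line independently, then re-classify each line using its neighbors' labels from the previous round until the labels stop changing.

```python
from typing import Callable, List, Optional

# Hypothetical classifier signature: (line, previous label, next label) -> label.
Classifier = Callable[[str, Optional[str], Optional[str]], str]


def toy_classify(line: str, prev_label: Optional[str], next_label: Optional[str]) -> str:
    """Rule-based stand-in for a trained line classifier (illustration only)."""
    text = line.lower()
    if "@" in text:
        return "email"
    if "university" in text:
        return "affiliation"
    # Contextual rule: a line directly above an affiliation is likely an author line.
    if next_label == "affiliation":
        return "author"
    # Default guess when no other evidence is available.
    return "title"


def iterative_line_labels(
    lines: List[str], classify: Classifier, max_rounds: int = 10
) -> List[str]:
    """Label lines independently, then refine using neighbors' previous-round labels."""
    # Round 0: no contextual information is available yet.
    labels = [classify(line, None, None) for line in lines]
    for _ in range(max_rounds):
        new_labels = []
        for i, line in enumerate(lines):
            # Neighbor labels come from the previous round, not the current one.
            prev_label = labels[i - 1] if i > 0 else None
            next_label = labels[i + 1] if i + 1 < len(lines) else None
            new_labels.append(classify(line, prev_label, next_label))
        if new_labels == labels:
            break  # converged: another round would change nothing
        labels = new_labels
    return labels


# Sample header lines (the email address is a placeholder).
header = [
    "Automatic Document Metadata Extraction Using Support Vector Machines",
    "Hui Han, C. Lee Giles",
    "The Pennsylvania State University",
    "author@example.edu",
]
labels = iterative_line_labels(header, toy_classify)
```

In this toy run the second line is first guessed to be a title and is corrected to an author line once its neighbor's affiliation label becomes available, which is the kind of improvement the contextual rounds are meant to provide.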
Citation:
Hui Han, C. Lee Giles, Eren Manavoglu, Hongyuan Zha, Zhenyue Zhang, Edward A. Fox, "Automatic Document Metadata Extraction Using Support Vector Machines," Third ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL'03), p. 37, 2003.
