22nd International Conference on Advanced Information Networking and Applications - Workshops (aina workshops 2008)
BibPro: A Citation Parser Based on Sequence Alignment Techniques
March 25-March 28
ISBN: 978-0-7695-3096-3
The dramatic increase in the number of academic publications has led to a growing demand for efficient organization of these resources to meet researchers’ specific needs. As a result, a number of network services have compiled databases from the public resources scattered over the Internet. However, publications in different conferences and journals follow different citation formats, so the problem of accurately extracting metadata from a publication string has also attracted a great deal of attention in recent years. In this paper, we extend our previous work to propose a new tool called BibPro for extracting metadata from citation strings by using a gene sequence alignment tool. BibPro improves on our previous tool in two ways. First, BibPro does not need knowledge databases (e.g., an author name database) to generate feature indices for citation strings; instead, only the order of punctuation marks in a citation string is used to represent its format. Second, BibPro employs the Basic Local Alignment Search Tool (BLAST) to find the most similar citation formats in the database and then uses the Needleman-Wunsch algorithm to choose the best-fit citation format as the extraction template. Our experimental results show that, in terms of precision and recall, BibPro outperforms other existing systems (e.g., INFOMAP and ParaCite), and that BibPro scales well.
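The two ideas in the abstract, representing a citation's format by the order of its punctuation marks and choosing a template by Needleman-Wunsch global alignment, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the scoring values (match +1, mismatch -1, gap -1), and the template strings are assumptions made for illustration, and the BLAST pre-filtering step is omitted, so every template is scored directly.

```python
import string

PUNCT = set(string.punctuation)


def format_signature(citation: str) -> str:
    """Keep only the punctuation marks of a citation, in their original order."""
    return "".join(ch for ch in citation if ch in PUNCT)


def needleman_wunsch(a: str, b: str, match: int = 1,
                     mismatch: int = -1, gap: int = -1) -> int:
    """Global alignment score of a and b with a linear gap penalty.

    The scoring values are illustrative assumptions, not taken from the paper.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def best_template(citation: str, templates: list[str]) -> str:
    """Return the template whose punctuation signature aligns best."""
    sig = format_signature(citation)
    return max(templates,
               key=lambda t: needleman_wunsch(sig, format_signature(t)))


if __name__ == "__main__":
    # Hypothetical templates; the field names are placeholders.
    templates = [
        'AUTHOR, "TITLE," VENUE, pp.PAGES, YEAR',
        "AUTHOR (YEAR). TITLE. VENUE.",
    ]
    citation = 'C. Chen, "BibPro," ainaw, pp.1175-1180, 2008'
    print(format_signature(citation))
    print(best_template(citation, templates))
```

In this sketch the chosen template would then guide field extraction, with the text between aligned punctuation marks assigned to the corresponding fields. That step is left out because the abstract does not describe it in detail.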
Index Terms:
Citation Parser, Sequence Alignment, Digital Library
Citation:
Chien-Chih Chen, Kai-Hsiang Yang, Hung-Yu Kao, Jan-Ming Ho, "BibPro: A Citation Parser Based on Sequence Alignment Techniques," ainaw, pp.1175-1180, 22nd International Conference on Advanced Information Networking and Applications - Workshops (aina workshops 2008), 2008