2002 IEEE International Conference on Artificial Intelligence Systems (ICAIS'02)
Toward Content Based Retrieval from Scientific Text Corpora
Divnomorskoe, Russia
September 05-September 10
ISBN: 0-7695-1733-1
The growth of digitally available text information has created a need for effective information retrieval and text mining tools. We have used a content-based retrieval method that is built on a prototype-matching technique for clustering scientific text corpora, which in our case are the abstracts from the Hawaii International Conference on System Sciences 2001. Our aim is to retrieve documents from a conference paper collection according to similarities in their contents and semantic structures. The method consists of "smart" document encoding on the word and sentence levels, creating common word and sentence histograms using a vector quantization algorithm, and matching those histograms for document retrieval. In the paper, we position our method among the existing document clustering methods, explain the motivation behind the clustering of scientific conference papers, and give an example of using our prototype tool for content-based retrieval on the scientific abstract collection. The method offers a promising alternative for retrieval by content.
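The encode, quantize, and match pipeline described in the abstract can be illustrated with a minimal word-level sketch. It is not the authors' implementation. The polynomial word encoding, the equal-frequency scalar quantizer standing in for a trained vector quantization codebook, the Euclidean histogram distance, and all function names are assumptions made for illustration. The sentence level is omitted.

```python
from math import sqrt


def tokenize(text):
    """Split text into lowercase alphabetic words."""
    cleaned = "".join(c if c.isalpha() else " " for c in text.lower())
    return cleaned.split()


def encode_word(word, base=31, modulus=10007):
    """Map a word to an integer code (hypothetical polynomial encoding)."""
    code = 0
    for ch in word:
        code = (code * base + ord(ch)) % modulus
    return code


def build_codebook(codes, n_bins):
    """Equal-frequency bin boundaries: a simple stand-in for a VQ codebook."""
    ordered = sorted(codes)
    return [ordered[(i * len(ordered)) // n_bins] for i in range(1, n_bins)]


def quantize(code, boundaries):
    """Return the index of the bin that the code falls into."""
    return sum(code >= b for b in boundaries)


def histogram(text, boundaries):
    """Normalized word histogram of one document over the shared bins."""
    counts = [0] * (len(boundaries) + 1)
    for word in tokenize(text):
        counts[quantize(encode_word(word), boundaries)] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]


def distance(h1, h2):
    """Euclidean distance between two histograms."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)))


def retrieve(query_index, docs, n_bins=8):
    """Rank the other documents by histogram distance to the query document."""
    all_codes = [encode_word(w) for d in docs for w in tokenize(d)]
    boundaries = build_codebook(all_codes, n_bins)
    hists = [histogram(d, boundaries) for d in docs]
    query = hists[query_index]
    others = [i for i in range(len(docs)) if i != query_index]
    return sorted(others, key=lambda i: distance(query, hists[i]))
```

Calling `retrieve(0, docs)` on a list of abstracts returns the indices of the remaining abstracts, closest first. The codebook is built from the whole collection, so every document histogram shares the same bins and the histograms can be compared directly.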
Index Terms:
information retrieval, prototype matching, text
Citation:
Antonina Kloptchenko, Barbro Back, Ari Visa, Jarmo Toivonen, Hannu Vanharanta, "Toward Content Based Retrieval from Scientific Text Corpora," icais, pp.444, 2002 IEEE International Conference on Artificial Intelligence Systems (ICAIS'02), 2002