Answering English Queries in Automatically Transcribed Arabic Speech
6th IEEE/ACIS International Conference on Computer and Information Science (ICIS 2007), Melbourne, Australia, July 11-13, 2007
ISBN: 0-7695-2841-4
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICIS.2007.61
There are several well-known approaches to parsing Arabic text in preparation for indexing and retrieval. Techniques such as stemming and stop-word removal have been shown to improve search results on written newswire dispatches, but few comparisons are available for other data sources. In this paper, we apply several alternative stemming and stopping approaches to Arabic text automatically transcribed from the audio soundtrack of news video footage, and compare these with approaches that rely on machine translation of the transcribed text. Using the TRECVID video collection and queries, we show that normalisation, stop-word removal, and light stemming increase retrieval precision, whereas heavy stemming and character trigrams reduce it. We also show that the choice of machine translation engine plays a major role in retrieval effectiveness.
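For readers unfamiliar with the preprocessing steps compared above, the following minimal Python sketch illustrates normalisation, stop-word removal, light stemming, and character trigrams on Arabic tokens. The stop list, affix lists, length thresholds, and function names (`normalise`, `light_stem`, `char_trigrams`, `index_terms`) are illustrative assumptions loosely modelled on published Arabic light stemmers; they are not the configuration evaluated in the paper, and heavy (root-based) stemming is not shown.

```python
import re

# Hypothetical, deliberately tiny stop list (already in normalised form);
# practical Arabic stop lists contain hundreds of entries.
STOP_WORDS = {"في", "من", "على", "الى", "عن", "ان"}

# Illustrative affix lists (assumed, not taken from the paper). Prefixes are
# tried longest first; teh marbuta is not listed as a suffix because
# normalise() maps a final teh marbuta to heh.
PREFIXES = ["وال", "بال", "كال", "فال", "لل", "ال"]
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "يه", "ه", "ي"]

# Short-vowel marks, shadda and sukun (U+064B..U+0652) plus tatweel (U+0640).
DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")


def normalise(word: str) -> str:
    """Remove diacritics and unify common orthographic variants."""
    word = DIACRITICS.sub("", word)
    word = re.sub("[أإآ]", "ا", word)  # hamza/madda alef forms -> bare alef
    word = re.sub("ى$", "ي", word)     # final alef maqsura -> yeh
    word = re.sub("ة$", "ه", word)     # final teh marbuta -> heh
    return word


def light_stem(word: str) -> str:
    """Strip at most one leading 'wa' and one prefix, then listed suffixes."""
    if word.startswith("و") and len(word) >= 4:  # keep at least 3 letters
        word = word[1:]
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= 2:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:  # a single pass over the list, in order
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            word = word[: -len(suffix)]
    return word


def char_trigrams(word: str) -> list[str]:
    """Overlapping character trigrams, an alternative to stemming."""
    if len(word) < 3:
        return [word]
    return [word[i:i + 3] for i in range(len(word) - 2)]


def index_terms(text: str, stem: bool = True) -> list[str]:
    """Normalise each token, drop stop words, and optionally light-stem."""
    terms = []
    for token in text.split():
        token = normalise(token)
        if not token or token in STOP_WORDS:
            continue
        terms.append(light_stem(token) if stem else token)
    return terms


print(index_terms("ذهب الولد في المدرسة"))  # ['ذهب', 'ولد', 'مدرس']
```

Light stemming removes only a small set of frequent affixes, so surface forms such as والكتاب and الكتاب both map to the term كتاب without attempting to recover the underlying root, which is the task heavy stemmers take on. Trigram indexing instead replaces each word with overlapping three-character sequences.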
Index Terms:
Arabic information retrieval, Cross-language information retrieval, Machine translation.
Citation:
Abdusalam F. A. Nwesri, S. M. M. Tahaghoghi, Falk Scholer, "Answering English Queries in Automatically Transcribed Arabic Speech," 6th IEEE/ACIS International Conference on Computer and Information Science (ICIS 2007), pp. 11-16, 2007.