2008 The Eighth IAPR International Workshop on Document Analysis Systems
Keyword Matching in Historical Machine-Printed Documents Using Synthetic Data, Word Portions and Dynamic Time Warping
September 16-September 19
ISBN: 978-0-7695-3337-7
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/DAS.2008.64
In this paper we propose a novel and efficient technique for finding keywords typed by the user in digitised machine-printed historical documents using the Dynamic Time Warping (DTW) algorithm. The method uses word portions located at the beginning and end of each segmented word in the processed documents and tries to estimate the positions of the first and last characters in order to reduce the list of candidate words. Since DTW can become computationally intensive on large datasets, the proposed method significantly prunes the list of candidate words, thus speeding up the entire process. Word length is also used to further reduce the amount of data to be processed. Results improve in terms of both time and efficiency compared with those obtained when no pruning is applied to the list of candidate words.
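The following is a minimal sketch of the general idea of pruning candidates before a full DTW comparison, not the authors' implementation. It assumes each word image (including the synthetic image rendered from the typed keyword) has already been converted into a sequence of per-column feature vectors; feature extraction is not shown. The names `dtw_distance`, `prune_candidates` and `match_keyword`, and the parameters `portion`, `length_tol` and `portion_thr`, are hypothetical. As a simplification, the "word portion" check compares a fixed number of leading and trailing columns rather than estimating the positions of the first and last characters as the paper describes.

```python
import math
from typing import List, Sequence, Tuple

Feature = Tuple[float, ...]  # one feature vector per image column (assumed representation)
Word = List[Feature]


def dtw_distance(a: Sequence[Feature], b: Sequence[Feature]) -> float:
    """Classic DTW with Euclidean local cost, normalised by len(a) + len(b)."""
    n, m = len(a), len(b)
    # D[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j]
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m] / (n + m)


def prune_candidates(query: Word, words: List[Word], portion: int = 3,
                     length_tol: float = 0.3, portion_thr: float = 1.0) -> List[int]:
    """Return indices of (non-empty) candidate words surviving both cheap filters."""
    keep = []
    q_len = len(query)
    for idx, w in enumerate(words):
        # Hypothetical word-length filter: width must lie within +/- length_tol of the query
        if abs(len(w) - q_len) > length_tol * q_len:
            continue
        # Hypothetical word-portion filter: DTW on the leading and trailing columns only
        head = dtw_distance(query[:portion], w[:portion])
        tail = dtw_distance(query[-portion:], w[-portion:])
        if head <= portion_thr and tail <= portion_thr:
            keep.append(idx)
    return keep


def match_keyword(query: Word, words: List[Word], **kw) -> List[int]:
    """Run full DTW only on the pruned candidates; return indices, best match first."""
    survivors = prune_candidates(query, words, **kw)
    scored = sorted((dtw_distance(query, words[i]), i) for i in survivors)
    return [i for _, i in scored]


# Toy usage with 1-D column features
query = [(0.0,), (1.0,), (2.0,), (1.0,), (0.0,)]
words = [
    [(0.0,), (1.0,), (2.0,), (2.0,), (1.0,), (0.0,)],  # horizontally stretched copy
    [(5.0,), (5.0,), (5.0,), (5.0,), (5.0,)],          # rejected by the portion filter
    [(0.0,), (1.0,)],                                  # rejected by the length filter
]
print(match_keyword(query, words))  # → [0]
```

The design choice illustrated here is that the length check costs O(1) per word and the portion checks cost O(portion²), so only the few words that pass them pay the full O(n·m) DTW cost.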
Index Terms:
Historical Documents, Indexing, Dynamic Time Warping
Citation:
Thomas Konidaris, B. Gatos, S.J. Perantonis, A. Kesidis, "Keyword Matching in Historical Machine-Printed Documents Using Synthetic Data, Word Portions and Dynamic Time Warping," das, pp.539-545, 2008 The Eighth IAPR International Workshop on Document Analysis Systems, 2008