Issue No. 10 - Oct. (2016 vol. 28)
Yating Zhang, Graduate School of Informatics, Kyoto University, Kyoto, Japan
Adam Jatowt, Graduate School of Informatics, Kyoto University, Kyoto, Japan
Sourav S. Bhowmick, School of Computer Science and Engineering, Nanyang Technological University, Singapore
Katsumi Tanaka, Graduate School of Informatics, Kyoto University, Kyoto, Japan
Numerous archives and collections of past documents have recently become available thanks to mass-scale digitization and preservation efforts. Libraries, national archives, and other memory institutions have started opening up their collections to interested users. Yet searching within such collections usually requires knowledge of appropriate keywords, because the context and language of the past differ from those of the present. Non-professional users may therefore have difficulty conceptualizing suitable queries, as their knowledge of the past is typically limited. In this paper, we propose a novel approach for the temporal correspondence detection task, which requires finding the terms in the past that are semantically closest to a given present-day input term. The approach is based on a vector space transformation that maps the distributed word representation of the present to that of the past. The key problem in this approach is obtaining a correct training set that can be used for a variety of diverse document collections and arbitrary time periods. To solve this problem, we propose an effective technique for automatically constructing seed pairs of terms to be used for finding the transformation. We test the performance of the proposed approaches over short as well as long time frames, such as 100 years. Our experiments demonstrate that, when the query has a different literal form from its temporal counterpart, the proposed methods outperform the best-performing baseline in MRR on average by 113 percent for the New York Times Annotated Corpus and by 28 percent for the Times Archive.
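The transformation-based idea described in the abstract can be illustrated with a minimal sketch. It assumes that word vectors for both periods are already available, that seed pairs are given (the automatic seed construction is not reproduced here), and that the mapping is fitted by ordinary least squares; the abstract does not state the objective, so that choice is an assumption. All function names and the toy data are hypothetical and are not taken from the paper.

```python
import numpy as np

def fit_transformation(present_seed, past_seed):
    """Fit M minimising ||present_seed @ M - past_seed||_F (assumed objective)."""
    M, *_ = np.linalg.lstsq(present_seed, past_seed, rcond=None)
    return M

def rank_past_terms(query_vec, M, past_vocab, past_matrix, k=5):
    """Map a present-day vector into the past space and rank past terms by cosine."""
    mapped = query_vec @ M
    mapped = mapped / np.linalg.norm(mapped)
    normed = past_matrix / np.linalg.norm(past_matrix, axis=1, keepdims=True)
    scores = normed @ mapped
    order = np.argsort(-scores)[:k]
    return [(past_vocab[i], float(scores[i])) for i in order]

def mean_reciprocal_rank(ranked_lists, gold_terms):
    """MRR: average of 1/rank of the gold term, counting 0 when it is absent."""
    total = 0.0
    for ranked, gold in zip(ranked_lists, gold_terms):
        if gold in ranked:
            total += 1.0 / (ranked.index(gold) + 1)
    return total / len(gold_terms)

# Toy data (hypothetical): the past space is an exact rotation of the present one.
rng = np.random.default_rng(0)
dim, n_terms = 4, 10
present = rng.normal(size=(n_terms, dim))
rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
past = present @ rotation
past_vocab = [f"past_{i}" for i in range(n_terms)]

# The first six terms act as seed pairs; term 8 is a held-out query.
M = fit_transformation(present[:6], past[:6])
top = rank_past_terms(present[8], M, past_vocab, past, k=3)
```

In this toy setting the two spaces are related by an exact linear map, so a handful of seed pairs suffices to recover it. Real embeddings trained on different periods are only approximately related, which is why the choice of seed pairs matters.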
Context, Semantics, Buildings, Training, Informatics, Libraries, Portable media players
Y. Zhang, A. Jatowt, S. S. Bhowmick and K. Tanaka, "