loading...
 This Article 
   
 Share 
   
 Bibliographic References 
   
 Add to: 
 
Digg
Furl
Spurl
Blink
Simpy
Google
Del.icio.us
Y!MyWeb
 
 Search 
   
Australasian Computer Science Conference (ACSC '01)
Efficiency of Data Structures for Detecting Overlaps in Digital Documents
Gold Coast, Queensland, Australia
January 29-February 02
ISBN: 0-7695-0963-0
Krisztián Monostori, Monash University
Arkady Zaslavsky, Monash University
Heinz Schmidt, Monash University
This paper analyses the efficiency of different data structures for detecting overlap in digital documents. Most existing approaches use some hash function to reduce the space requirements for their indices of chunks. Since a hash function can produce the same value for different chunks, false matches are possible. In this paper we propose an algorithm that can be used for eliminating those false matches. This algorithm uses a suffix tree structure, which is space consuming. We define a modified suffix tree that only considers chunks starting at the beginning of words and we show how the algorithm can work on this structure. We can alternatively reduce space requirements of a suffix tree by converting it to a directed acyclic graph. We show that suffix link information can be preserved in this new structure and the matching statistics algorithm still works with those modifications that we propose.
Citation:
Krisztián Monostori, Arkady Zaslavsky, Heinz Schmidt, "Efficiency of Data Structures for Detecting Overlaps in Digital Documents," acsc, pp.140, Australasian Computer Science Conference (ACSC '01), 2001
Usage of this product signifies your acceptance of the Terms of Use.