This Article 
   
 Share 
   
 Bibliographic References 
   
 Add to: 
 
Digg
Furl
Spurl
Blink
Simpy
Google
Del.icio.us
Y!MyWeb
 
 Search 
   
Efficient Signature File Methods for Text Retrieval
June 1995 (vol. 7 no. 3)
pp. 423-435

Abstract—Signature files have been studied extensively as an access method for textual databases. Many approaches have been proposed for searching signatures files efficiently. However, different methods make different assumptions and use different performance measures, making it difficult to compare their performance. In this paper, we study three basic methods proposed in the literature, namely, the indexed descriptor file, the two-level superimposed coding scheme, and the partitioned signature file approach. The contribution of this paper is two-fold. First, we present a uniform analytical performance model so that the methods can be compared fairly and consistently. The analysis shows that the two-level superimposed coding scheme, if stored in a transposed file, has the best performance. Second, we extend the two-level superimposed coding method into a multilevel superimposed coding method, we obtain the optimal number of levels for the multilevel method and show that for databases with reasonable size the optimal value is much larger than 2, which is assumed in the two-level method. The accuracy of the analytical formula is demonstrated by simulation.

[1] W.W. Chang and H.J. Schek,“A signature access method for the Starburst database system,” Proc. 15th Int’l Conf. Very Large Data Bases,Amsterdam, The Netherlands, Aug. 1989, pp. 145-153.
[2] S. Christodoulakis and C. Faloutsos,“Design considerations for a message file server,” IEEE Trans. Software Eng., vol 10, no. 2, pp. 201-210, Mar. 1984.
[3] S. Christodoulakis, M. Theodoridou, F. Ho, M. Papa, and A. Pathria, "Multimedia Document Presentation, Information Extraction, and Document Formation in MINOS: A Model and A System," ACM Trans. Office Information Systems, vol. 4, no. 4, pp. 345-383, Oct. 1986.
[4] U. Deppisch,“S-Tree: A dynamic balanced signature index for office retrieval,” Proc. ACM Conf. Research and Development Information Retrieval,Pisa, Italy, pp. 77-87, Sept. 1986.
[5] C. Faloutsos, “Access Methods for Text,” Computer Surveys, vol. 17, no. 1, pp. 49-74, 1985.
[6] C. Faloutsos, “Signature Based Text Retrieval Methods: A Survery,” IEEE Data Eng. Bull., vol. 13, no. 1, pp. 25-32, Mar. 1990.
[7] C. Faloutsos,“Signature files: Design and performance comparison of some signatureextraction methods,” Proc. 1985 SIGMOD Conf.,Austin, Tex., pp. 63-82, May, 1985.
[8] C. Faloutsos and S. Christodoulakis, "Signature Files: An Access Method for Documents and Its Analytical Performance Evaluation," ACM Trans. Office Information Systems, vol. 2, pp. 267-288, Oct. 1984.
[9] C. Faloutsos and S. Christodoulakis,“Description and performance analysis of signature file methods for officefiling,” ACM Trans. Office Information Systems, vol. 5, no. 3, pp. 237-257, July 1987.
[10] C. Faloutsos and S. Christodoulakis,“Optimal signature extraction and information loss,” ACM Trans. Database Systems, vol. 12, no. 3, pp. 395-428, Sept. 1987.
[11] D.L. Lee,“A word-parallel, bit-serial signature processor for superimposed coding,” Proc. Second Int’l Conf. Data Engineering,Los Angeles, pp. 352-359, Feb. 1986.
[12] D.L. Lee and C.-W. Leng,“Partitioned signature files: Design issues and performance evaluation,” ACM Trans Information Systems, vol. 7, no. 2, pp. 158-180, Apr. 1989.
[13] D.L. Lee and C.-W. Leng,“A partitioned signature file structure for multiattribute and text retrieval,” Proc. Sixth Int’l Conf. Data Engineering, pp. 389-397,Los Angeles, Feb. 1990.
[14] C.W. Leng and D.L. Lee,“Optimal weight assignment for signature generation,” ACM Trans Database Systems, vol. 17, no. 2, pp. 346-373, June, 1992.
[15] D.J. Lipman and W.R. Pearson,“Rapid and sensitive protein similarity searches,” Science, vol. 227, pp. 1,435-1,441, Mar., 1985.
[16] J.L. Pfaltz,W.J. Berman,, and E.M. Cagley,“Partial match retrieval using indexed descriptor files,” Comm. ACM, vol. 23, no. 9, pp. 522-528, Sept., 1980.
[17] C.S. Roberts,“Partial-match retrieval via method of superimposed codes,” Proc. IEEE, vol. 67, no. 12, pp. 1,624-1,642, Dec. 1979.
[18] R. Sacks-Davis,A. Kent,, and K. Ramamohanarao,“Multikey access methods based on superimposed coding techniques,” ACM Trans. Database Systems, vol. 12, no. 4, pp. 655-696, Dec. 1987.
[19] R. Sacks-Davis, and K. Ramamohanarao,“A two-level superimposed coding scheme for partial match retrieval,” Information Systems, vol. 8, no. 4, pp. 273-280, 1983.
[20] C. Stanfill and B. Kahle, "Parallel Free-Text Search on the Connection Machine System," Comm. ACM, Dec. 1986, pp. 1229-1239.
[21] B. Yuwono and D.L. Lee,“A backend text retrieval machine for signature-based document ranking,” Proc. Int’l Conf. Computing and Information (ICCI’91), pp. 288-297,Ottawa, Canada, May 1991.

Index Terms:
Access methods, text retrieval, performance analysis, superimposed coding.
Citation:
Dik Lun Lee, Young Man Kim, Gaurav Patel, "Efficient Signature File Methods for Text Retrieval," IEEE Transactions on Knowledge and Data Engineering, vol. 7, no. 3, pp. 423-435, June 1995, doi:10.1109/69.390248
Usage of this product signifies your acceptance of the Terms of Use.