Fourier Domain Scoring: A Novel Document Ranking Method
May 2004 (vol. 16 no. 5)
pp. 529-539

Abstract: Current document retrieval methods use a vector space similarity measure to score the relevance of each document to a given query. The central problem with these methods is that they neglect spatial information within the documents, that is, where the query terms occur. We present a new method, Fourier Domain Scoring (FDS), which uses the Fourier transform to take advantage of this spatial information and produce a more accurate relevance ordering of a document set. We show that FDS improves precision over vector space similarity measures for the common case of short, Web-like queries, and gives similar results to the vector space measures for longer queries.
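
To make the idea of scoring with spatial information concrete, the following is a minimal sketch and not the scoring function defined in the article. It assumes that each query term is turned into a "term signal" (its occurrence count in each of a fixed number of equal-width document slices), that each signal is Fourier transformed, and that a frequency component contributes more when the query terms' phases agree. The names `term_signal`, `dft`, and `fds_like_score`, the bin count, and the choice to skip the zero-frequency component are all assumptions made for illustration.

```python
import cmath
from typing import List, Sequence


def term_signal(tokens: Sequence[str], term: str, bins: int = 8) -> List[float]:
    """Count occurrences of `term` in each of `bins` equal-width document slices."""
    signal = [0.0] * bins
    n = len(tokens)
    for pos, tok in enumerate(tokens):
        if tok == term:
            signal[pos * bins // n] += 1.0
    return signal


def dft(signal: Sequence[float]) -> List[complex]:
    """Plain discrete Fourier transform (O(n^2), fine for a handful of bins)."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * b / n) for b, x in enumerate(signal))
        for k in range(n)
    ]


def fds_like_score(tokens: Sequence[str], query: Sequence[str], bins: int = 8) -> float:
    """Illustrative score: per-frequency magnitude weighted by phase agreement.

    Assumption of this sketch: component 0 (total counts only) is skipped so
    that the score reflects term positions rather than term frequency.
    """
    spectra = [dft(term_signal(tokens, t, bins)) for t in query]
    score = 0.0
    for k in range(1, bins // 2 + 1):
        comps = [s[k] for s in spectra]
        magnitude = sum(abs(c) for c in comps)
        # Phase agreement: length of the summed unit phasors divided by the
        # number of query terms; 1.0 when every term has the same phase.
        phasors = [c / abs(c) for c in comps if abs(c) > 1e-12]
        agreement = abs(sum(phasors)) / len(comps)
        score += magnitude * agreement
    return score


if __name__ == "__main__":
    query = ["fourier", "ranking"]
    near = ["fourier", "ranking"] + ["x"] * 14            # terms adjacent
    far = ["fourier"] + ["x"] * 7 + ["ranking"] + ["x"] * 7  # terms far apart
    print(fds_like_score(near, query), fds_like_score(far, query))
```

Both example documents contain each query term exactly once, so a measure based only on term counts cannot tell them apart; under the assumptions above, the document in which the terms appear together receives the higher score.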

Index Terms:
Fourier domain scoring, information retrieval, search engine, vector space similarity measure, document ranking, Fourier transform, term signal.
Citation:
Laurence A.F. Park, Kotagiri Ramamohanarao, Marimuthu Palaniswami, "Fourier Domain Scoring: A Novel Document Ranking Method," IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 5, pp. 529-539, May 2004, doi:10.1109/TKDE.2004.1277815