Efficient Multidimensional Fuzzy Search for Personal Information Management Systems
Sept. 2012 (vol. 24 no. 9)
pp. 1584-1597
Wei Wang, Rutgers University, Piscataway
Christopher Peery, Rutgers University, Piscataway
Amélie Marian, Rutgers University, Piscataway
Thu D. Nguyen, Rutgers University, Piscataway
With the explosion in the amount of semistructured data users access and store in personal information management systems, there is a critical need for powerful search tools to retrieve often very heterogeneous data in a simple and efficient way. Existing tools typically support some IR-style ranking on the textual part of the query, but only consider structure (e.g., file directory) and metadata (e.g., date, file type) as filtering conditions. We propose a novel multidimensional search approach that allows users to perform fuzzy searches for structure and metadata conditions in addition to keyword conditions. Our techniques individually score each dimension and integrate the three dimension scores into a meaningful unified score. We also design indexes and algorithms to efficiently identify the most relevant files that match multidimensional queries. We perform a thorough experimental evaluation of our approach and show that our relaxation and scoring framework for fuzzy query conditions in noncontent dimensions can significantly improve ranking accuracy. We also show that our query processing strategies perform and scale well, making our fuzzy search approach practical for everyday use.
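
The idea of scoring each dimension separately and then merging the three scores can be illustrated with a minimal sketch. The abstract does not state how the scores are computed or combined, so everything here is an assumption made for illustration: the `DimensionScores` type, the [0, 1] score range, the weighted-mean aggregation in `unified_score`, the `top_k` helper, and the sample files are all hypothetical and are not the paper's method.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class DimensionScores:
    """Hypothetical per-dimension relevance of one file, each in [0, 1]."""
    content: float    # how well the file text matches the keywords
    metadata: float   # how close e.g. date or file type is to the query
    structure: float  # how close the directory path is to the query


def unified_score(s, weights=(1.0, 1.0, 1.0)):
    """Weighted mean of the three scores (an assumed aggregation)."""
    w_c, w_m, w_s = weights
    total = w_c + w_m + w_s
    return (w_c * s.content + w_m * s.metadata + w_s * s.structure) / total


def top_k(files, k):
    """Return the k (name, score) pairs with the highest unified scores."""
    scored = ((name, unified_score(s)) for name, s in files.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Made-up example scores, not data from the paper.
files = {
    "report.doc": DimensionScores(0.9, 0.2, 0.5),
    "notes.txt": DimensionScores(0.6, 0.9, 0.9),
    "photo.jpg": DimensionScores(0.0, 1.0, 0.3),
}
ranking = top_k(files, 2)
```

In this sketch, `report.doc` matches the metadata condition poorly but still appears in the ranking, behind `notes.txt`. A system that treated metadata as a hard filter would have discarded it outright, which is the contrast the abstract draws between filtering and fuzzy conditions.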

Index Terms:
Proposals, Query processing, Indexing, XML, Optimization, Equations, personal information management system, Information retrieval, multidimensional search
Citation:
Wei Wang, Christopher Peery, Amélie Marian, Thu D. Nguyen, "Efficient Multidimensional Fuzzy Search for Personal Information Management Systems," IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 9, pp. 1584-1597, Sept. 2012, doi:10.1109/TKDE.2011.126