Optimizing Top-k Selection Queries over Multimedia Repositories
August 2004 (vol. 16 no. 8)
pp. 992-1009

Abstract—Repositories of multimedia objects having multiple types of attributes (e.g., image, text) are becoming increasingly common. A query on these attributes will typically request not just a set of objects, as in the traditional relational query model (filtering), but also a grade of match associated with each object, which indicates how well the object matches the selection condition (ranking). Furthermore, unlike in the relational model, users may just want the k top-ranked objects for their selection queries for a relatively small k. In addition to the differences in the query model, another peculiarity of multimedia repositories is that they may allow access to the attributes of each object only through indexes. In this paper, we investigate how to optimize the processing of top-k selection queries over multimedia repositories. The access characteristics of the repositories and the above query model lead to novel issues in query optimization. In particular, the choice of the indexes used to search the repository strongly influences the cost of processing the filtering condition. We define an execution space that is search-minimal, i.e., the set of indexes searched is minimal. Although the general problem of picking an optimal plan in the search-minimal execution space is NP-hard, we present an efficient algorithm that solves the problem optimally with respect to our cost model and execution space when the predicates in the query are independent. We also show that the problem of optimizing top-k selection queries can be viewed, in many cases, as that of evaluating more traditional selection conditions. Thus, both problems can be viewed together as an extended filtering problem to which techniques of query processing and optimization may be adapted.
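
The last point of the abstract, that a top-k selection query can often be treated as a more traditional filter condition, can be illustrated with a small sketch. The code below is not the paper's algorithm or cost model. The repository contents, the function names, and the use of the minimum as the combined grade for a conjunction are all assumptions made for illustration. It only shows that, when grades are distinct, asking for the k best objects returns the same answer as filtering on "grade at least the k-th best grade".

```python
import heapq

# Hypothetical repository: object id -> grade of match per attribute, in [0, 1].
GRADES = {
    "o1": {"image": 0.9, "text": 0.4},
    "o2": {"image": 0.7, "text": 0.8},
    "o3": {"image": 0.2, "text": 0.9},
    "o4": {"image": 0.6, "text": 0.5},
}


def overall_grade(attribute_grades):
    # Assumption: a conjunction of attribute conditions is graded by the
    # minimum of the individual grades (a common fuzzy-logic choice).
    return min(attribute_grades.values())


def top_k(repo, k):
    # Ranking view: return the k objects with the highest overall grade,
    # as (grade, object id) pairs in descending order.
    scored = [(overall_grade(g), oid) for oid, g in repo.items()]
    return heapq.nlargest(k, scored)


def filter_at_least(repo, threshold):
    # Filtering view: return every object whose overall grade reaches the
    # threshold, in descending order of grade.
    scored = [(overall_grade(g), oid) for oid, g in repo.items()]
    return sorted((s for s in scored if s[0] >= threshold), reverse=True)


best_two = top_k(GRADES, 2)
# The grade of the k-th result acts as the threshold of an equivalent filter.
threshold = best_two[-1][0]
same_as_filter = filter_at_least(GRADES, threshold)
```

In this sketch the threshold is read off after the top-k answer is already known. In practice such a threshold would have to be estimated in advance, and ties in grade could make the filter return more than k objects.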

Index Terms:
Top-k query processing, multimedia databases, information search, information retrieval.
Citation:
Surajit Chaudhuri, Luis Gravano, Amélie Marian, "Optimizing Top-k Selection Queries over Multimedia Repositories," IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 8, pp. 992-1009, Aug. 2004, doi:10.1109/TKDE.2004.30