A Stochastic Programming Approach for Range Query Retrieval Problems
July/August 2002 (vol. 14 no. 4)
pp. 867-880

One of the important issues in range query (RQ) retrieval problems is determining the key's resolution for multiattribute records. Conventional models need improvement because of potential degeneracy, less desirable computability, and possible inconsistency with partial match query (PMQ) models. This paper presents a new RQ model that overcomes these drawbacks and introduces a new methodology, stochastic programming (SP), to carry out the optimization. The model is established by using a monotone increasing function to characterize range sizes. Three SP approaches, the wait-and-see (WS), here-and-now (HN), and scenario tracking (ST) methods, are integrated into this RQ model. Analytical expressions of the optimal solution are derived. HN appears to have an advantage over WS because the latter usually involves complicated multiple summations or integrals. For the ST method, a nonlinear programming software package is designed. Numerical experiments are presented in which a 10-dimensional RQ model was optimized and a medium-size (100) and a large-size (1,000) scenario set were tracked.
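To make the distinction between the three SP approaches concrete, here is a minimal Python sketch on a toy bit-allocation problem. It is not the paper's model: the cost function (an approximate count of cells touched by a range query), the brute-force search over integer allocations, the squared-deviation tracking objective, and all names (`cost`, `allocations`, `here_and_now`, `wait_and_see`, `scenario_tracking`) are assumptions chosen only for illustration.

```python
from itertools import product

def cost(bits, ranges):
    """Hypothetical toy cost: attribute i is cut into 2**bits[i] slices and a
    query covers a fraction ranges[i] of its domain, so it touches roughly
    ranges[i] * 2**bits[i] + 1 slices along that attribute."""
    total = 1.0
    for b, r in zip(bits, ranges):
        total *= r * 2 ** b + 1.0
    return total

def allocations(total_bits, k):
    """All ways to split total_bits among k attributes (brute force)."""
    return [a for a in product(range(total_bits + 1), repeat=k)
            if sum(a) == total_bits]

def here_and_now(scenarios, total_bits):
    """HN: pick ONE allocation before the range sizes are known,
    minimizing the expected cost over all scenarios."""
    allocs = allocations(total_bits, len(scenarios[0][1]))
    def expected(bits):
        return sum(p * cost(bits, r) for p, r in scenarios)
    best = min(allocs, key=expected)
    return best, expected(best)

def wait_and_see(scenarios, total_bits):
    """WS: optimize separately for each scenario as if it were known,
    then average the per-scenario optimal costs."""
    allocs = allocations(total_bits, len(scenarios[0][1]))
    return sum(p * min(cost(a, r) for a in allocs) for p, r in scenarios)

def scenario_tracking(scenarios, total_bits):
    """ST (assumed form): pick ONE allocation whose cost stays close to each
    scenario's own optimum, measured by probability-weighted squared deviation."""
    allocs = allocations(total_bits, len(scenarios[0][1]))
    optima = [min(cost(a, r) for a in allocs) for _, r in scenarios]
    def deviation(bits):
        return sum(p * (cost(bits, r) - v) ** 2
                   for (p, r), v in zip(scenarios, optima))
    return min(allocs, key=deviation)

# Two equally likely scenarios: (probability, range fraction per attribute).
scenarios = [(0.5, (0.5, 0.01)), (0.5, (0.01, 0.5))]
TOTAL_BITS = 6

hn_bits, hn_cost = here_and_now(scenarios, TOTAL_BITS)
ws_cost = wait_and_see(scenarios, TOTAL_BITS)
st_bits = scenario_tracking(scenarios, TOTAL_BITS)
print("HN allocation:", hn_bits, "expected cost:", hn_cost)
print("WS expected cost:", ws_cost)
print("ST allocation:", st_bits)
```

In this sketch the WS value can never exceed the HN cost, because WS is allowed a different allocation per scenario. HN and ST each return a single allocation that could actually be implemented before the query distribution is observed.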

Index Terms:
Multiattribute hashing, partial match query, physical data organization, range query, stochastic programming.
Xian Liu, Wilsun Xu, "A Stochastic Programming Approach for Range Query Retrieval Problems," IEEE Transactions on Knowledge and Data Engineering, vol. 14, no. 4, pp. 867-880, July-Aug. 2002, doi:10.1109/TKDE.2002.1019219