Allocating Data and Operations to Nodes in Distributed Database Design
April 1995 (vol. 7 no. 2)
pp. 305-317

Abstract—The allocation of data and operations to nodes in a computer communications network is a critical issue in distributed database design. An efficient distributed database design must trade off performance and cost among retrieval and update activities at the various nodes. It must consider the concurrency control mechanism used as well as capacity constraints at nodes and on links in the network. It must determine where data will be allocated, the degree of data replication, which copy of the data will be used for each retrieval activity, and where operations such as select, project, join, and union will be performed. We develop a comprehensive mathematical modeling approach for this problem. The approach first generates units of data (file fragments) to be allocated from a logical data model representation and a characterization of retrieval and update activities. Retrieval and update activities are then decomposed into relational operations on these fragments. Both fragments and operations on them are then allocated to nodes using a mathematical modeling approach. The mathematical model considers network communication, local processing, and data storage costs. A genetic algorithm is developed to solve this mathematical formulation.
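The allocation step described in the abstract can be illustrated with a toy sketch. The code below is not the authors' formulation. It assumes a hypothetical three-node, two-fragment instance with made-up link costs, storage cost, and activity frequencies, and it uses a plain bit-string genetic algorithm (tournament selection, one-point crossover, bit-flip mutation, elitism). Each retrieval reads the cheapest available copy and each update is sent to every copy. Operation allocation (where joins and other operations run), concurrency control, local processing cost, and capacity constraints are all left out.

```python
import random

# Hypothetical toy instance; every number here is invented for illustration.
NODES = 3
FRAGMENTS = 2
LINK_COST = [[0, 4, 6],
             [4, 0, 3],
             [6, 3, 0]]      # cost of shipping one unit between two nodes
STORAGE_COST = 5             # cost of holding one copy of a fragment
# Each activity is (originating node, fragment, frequency).
RETRIEVALS = [(0, 0, 10), (1, 0, 2), (2, 1, 8), (0, 1, 1)]
UPDATES = [(1, 0, 3), (2, 1, 2)]


def copies(alloc, frag):
    """Nodes that hold a copy of `frag`; `alloc` is a flat 0/1 list."""
    return [n for n in range(NODES) if alloc[frag * NODES + n]]


def repair(alloc, rng):
    """Give every fragment at least one copy so the allocation is feasible."""
    for f in range(FRAGMENTS):
        if not copies(alloc, f):
            alloc[f * NODES + rng.randrange(NODES)] = 1
    return alloc


def total_cost(alloc):
    """Storage cost plus communication cost of retrievals and updates."""
    cost = STORAGE_COST * sum(alloc)
    for node, frag, freq in RETRIEVALS:
        # A retrieval uses the cheapest copy (cost 0 if the copy is local).
        cost += freq * min(LINK_COST[node][c] for c in copies(alloc, frag))
    for node, frag, freq in UPDATES:
        # An update must reach every copy of the fragment.
        cost += freq * sum(LINK_COST[node][c] for c in copies(alloc, frag))
    return cost


def genetic_search(generations=60, pop_size=20, mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    length = NODES * FRAGMENTS
    pop = [repair([rng.randint(0, 1) for _ in range(length)], rng)
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=total_cost)
        nxt = [pop[0][:]]                      # elitism: keep the best as-is
        while len(nxt) < pop_size:
            # Binary tournament selection for each parent.
            p1 = min(rng.sample(pop, 2), key=total_cost)
            p2 = min(rng.sample(pop, 2), key=total_cost)
            cut = rng.randrange(1, length)     # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [b ^ 1 if rng.random() < mutation_rate else b
                     for b in child]           # bit-flip mutation
            nxt.append(repair(child, rng))
        pop = nxt
    best = min(pop, key=total_cost)
    return best, total_cost(best)


if __name__ == "__main__":
    allocation, cost = genetic_search()
    for f in range(FRAGMENTS):
        print(f"fragment {f} stored at nodes {copies(allocation, f)}")
    print("total cost:", cost)
```

Entry `f * NODES + n` of the returned list is 1 when fragment `f` is stored at node `n`, so replication shows up as more than one set bit per fragment. The repair step is one simple way to keep candidates feasible; a penalty term in the cost function would be a reasonable alternative.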

Index Terms:
Distributed database design, data partitioning and replication, data allocation, operation allocation, performance modeling and analysis, genetic algorithm.
Citation:
Salvatore T. March, Sangkyu Rho, "Allocating Data and Operations to Nodes in Distributed Database Design," IEEE Transactions on Knowledge and Data Engineering, vol. 7, no. 2, pp. 305-317, April 1995, doi:10.1109/69.382299