Using Evolutionary Algorithms for Defining the Sampling Policy of Complex N-Partite Networks
June 2005 (vol. 17 no. 6)
pp. 762-773
N-partite networks are natural representations of complex multi-entity databases. However, processing these networks can be highly memory- and computation-intensive, especially when the degrees of vertices from different partitions are positively correlated. To improve the scalability of this process, this paper proposes two algorithms that use sampling to obtain less expensive approximate results. The first algorithm is optimal for obtaining homogeneous discovery rates with a low memory requirement, but can be very slow when the combined branching factor of these networks is too large. A second algorithm, which incorporates concepts from evolutionary computation, addresses this slow convergence for cases where it is more important to speed up the convergence of the approximation for elements with high feature values. This algorithm exploits the positive correlation between “local” branching factors and the feature values. Two application examples are demonstrated: searching for the most influential authors in collections of journal articles, and analyzing the most active earthquake regions from a collection of earthquake events.
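
To make the idea of trading exact enumeration for sampled approximation concrete, the following is a minimal sketch and not the algorithms proposed in the paper. It assumes a toy author → paper → reference network stored as a list of adjacency dictionaries (the names `layers`, `count_paths`, and `estimate_paths` are hypothetical), and it assumes the feature of interest is the number of complete paths leaving a start vertex. The estimator is the standard Monte Carlo trick of multiplying the branching factors met along a random walk, swapped in here purely for illustration.

```python
import random

# Hypothetical toy network: layers[i] maps a vertex in partition i
# to its neighbours in partition i + 1 (authors -> papers -> references).
layers = [
    {"a1": ["p1", "p2"], "a2": ["p3"]},
    {"p1": ["r1", "r2", "r3"], "p2": ["r1"], "p3": ["r2", "r3"]},
]

def count_paths(layers, v, depth=0):
    """Exact number of complete paths starting at v (exhaustive traversal)."""
    if depth == len(layers):
        return 1
    return sum(count_paths(layers, w, depth + 1)
               for w in layers[depth].get(v, []))

def estimate_paths(layers, v, n_samples, rng):
    """Approximate the same count from n_samples random walks.

    Each walk picks one neighbour uniformly per partition; the product of
    the branching factors along the walk is an unbiased estimate of the
    number of complete paths starting at v.
    """
    total = 0
    for _ in range(n_samples):
        node, weight = v, 1
        for layer in layers:
            neighbours = layer.get(node, [])
            if not neighbours:
                weight = 0  # dead end: this walk reaches no complete path
                break
            weight *= len(neighbours)
            node = rng.choice(neighbours)
        total += weight
    return total / n_samples

rng = random.Random(0)
exact_a1 = count_paths(layers, "a1")
approx_a1 = estimate_paths(layers, "a1", 2000, rng)
approx_a2 = estimate_paths(layers, "a2", 10, rng)
```

In this sketch the walk visits one vertex per partition, so memory use stays small, while the spread of the estimate grows with how unevenly the branching factors are distributed. This is only a rough analogue of the convergence concern the abstract raises for networks with a large combined branching factor.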

Index Terms:
N-partite network, evolutionary algorithm, graphic-structured database.
Citation:
Michel L. Goldstein, Gary G. Yen, "Using Evolutionary Algorithms for Defining the Sampling Policy of Complex N-Partite Networks," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 762-773, June 2005, doi:10.1109/TKDE.2005.100