Issue No. 03 - June (1995 vol. 10)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/64.393137
<p>The GA-P performs symbolic regression by combining the traditional genetic algorithm's function optimization strength with the genetic-programming paradigm to evolve complex mathematical expressions capable of handling numeric and symbolic data. This technique should provide new insights into poorly understood data relationships.</p>
<p>Discovering relationships between sets of data has troubled researchers since the dawn of modern science. The task is laborious and error prone, and it is highly subject to researcher bias. Because many of today's research problems are more complex than those of the past, it is increasingly important that robust data analysis methods be available to researchers. To be most useful, a data analysis method must meet at least three criteria: good predictive ability, insight into the inner workings of the system being analyzed, and unbiased results.</p>
<p>Historically, researchers deduced relationships solely by examining the data--a difficult task if the relationship is complex, if many variables are involved, or if the data are noisy (as often occurs in real-world problems). Moreover, the examination is easily influenced by the researcher's desires and expectations.</p>
<p>Statistical methods were among the first tools developed to help a researcher find relationships among observed facts. They are often based on assumptions such as these: (1) the data are normally distributed, (2) the equation relating the data is of a specific form (for example, linear, quadratic, or polynomial), and (3) the variables are independent. If the problem meets these assumptions, statistics are a valuable tool for providing static descriptors. But real-world problems seldom meet them.</p>
<p>Neural networks, an artificial intelligence technique, are not limited by these assumptions.
They serve as strong predictive models that can uncover complex relationships, but they give little insight into the underlying mechanisms that describe a relationship. However, two other nonstatistical AI techniques, genetic algorithms and genetic programming, are more robust methods of exploring complex solution spaces. Independently, they have had some success at revealing the mechanisms relating data items.</p>
<p>Recently, genetic algorithms, which use the principles of evolution through natural selection to solve problems, have established themselves as a powerful search and optimization technique. Most GAs are linear (the structure of an individual is a flat bit string). The basic GA proceeds as follows:</p>
<ol>
<li>Create a population of random individuals, in which each individual represents a possible solution to the problem at hand.</li>
<li>Evaluate each individual's fitness--its ability to solve the specified problem.</li>
<li>Select individual population members to be parents.</li>
<li>Produce children by recombining parent material via crossover and mutation, and add them to the population.</li>
<li>Evaluate the children's fitness.</li>
<li>Repeat steps 3-5 until a solution with the desired fitness is obtained.</li>
</ol>
<p>GAs have been used for everything from multiple-fault diagnosis to medical-image registration. They have shown themselves to be a superior tool for developing rule-based systems, capable of gleaning knowledge from data inaccessible to statistical methods. Goldberg thoroughly discusses genetic algorithms and their use as a problem-solving and function optimization technique. Goldberg and Forrest give additional examples.</p>
<p>Although linear GAs are adept at developing rule-based systems, they cannot develop equations. A recent addition to the evolutionary domain is genetic programming, which uses an evolutionary approach to generate symbolic expressions and perform symbolic regressions.
However, the genetic-programming method of performing symbolic regressions has some limitations. It can modify only the structure of an expression, not its contents, which are generated by the implementation program when the genetic-programming run starts. In performing symbolic regressions, genetic programming cannot deal with nonnumeric variables. It also tends to produce convoluted equations because it cannot modify the coefficients it uses (for example, a genetic program might use (2.523+2.523)/2.523 to represent the number 2).</p>
<p>We have developed a method that combines the known strengths of traditional genetic algorithms with the new field of genetic programming to produce a superior tool for performing symbolic regressions. We call this tool the genetic algorithm-program, or the GA-P.</p>
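<p>The six steps of the basic GA listed earlier can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation and not the GA-P itself. The toy objective (counting 1 bits), tournament selection, one-point crossover, bit-flip mutation, replacing the worst individual to keep the population size fixed, and every name and parameter value are assumptions made for the example.</p>

```python
import random


def one_max(bits):
    """Toy fitness (an assumption): the number of 1 bits in the string."""
    return sum(bits)


def run_ga(n_bits=20, pop_size=30, mutation_rate=0.02,
           max_generations=200, seed=0):
    """Steady-state linear GA over flat bit strings (illustrative sketch)."""
    rng = random.Random(seed)

    # Step 1: create a population of random individuals.
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    # Step 2: evaluate each individual's fitness.
    fitness = [one_max(ind) for ind in population]

    def tournament():
        # Pick two distinct individuals and keep the fitter one.
        a, b = rng.sample(range(len(population)), 2)
        return population[a] if fitness[a] >= fitness[b] else population[b]

    for _ in range(max_generations):
        # Step 6: stop once the desired fitness is reached.
        if max(fitness) == n_bits:
            break
        # Step 3: select parents.
        mom, dad = tournament(), tournament()
        # Step 4: one-point crossover, then bit-flip mutation.
        cut = rng.randint(1, n_bits - 1)
        child = mom[:cut] + dad[cut:]
        child = [1 - bit if rng.random() < mutation_rate else bit
                 for bit in child]
        # Step 5: evaluate the child's fitness.
        child_fitness = one_max(child)
        # Add the child by overwriting the worst individual (assumption:
        # this keeps the population size constant).
        worst = min(range(len(population)), key=fitness.__getitem__)
        population[worst] = child
        fitness[worst] = child_fitness

    best = max(range(len(population)), key=fitness.__getitem__)
    return population[best], fitness[best]


if __name__ == "__main__":
    best, score = run_ga()
    print("best individual:", "".join(map(str, best)), "fitness:", score)
```

<p>Calling <code>run_ga()</code> returns the best bit string found and its fitness. Swapping <code>one_max</code> for a problem-specific fitness function is the only change needed to reuse the loop; a genetic-programming system would replace the flat bit string with an expression tree.</p>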
D. J. D'Angelo and L. M. Howard, "The GA-P: A Genetic Algorithm and Genetic Programming Hybrid," in IEEE Intelligent Systems, vol. 10, no. 3, pp. 11-15, 1995.