
International Journal of Advanced Computer Science, Vol. 3, No. 9, Pp. 475-480, Sep., 2013.

Genetic Algorithms and Genetic Programming


Natalia Grafeeva, Lyudmila Grigorieva, & Natalia Kalinina-Shuvalova
Manuscript
Received: 25 Jul. 2013; Revised: 9 Aug. 2013; Accepted: 10 Aug. 2013; Published: 15 Aug. 2013

Keywords
genetic algorithms, genetic programming, Diophantine equation

Abstract
This article discusses genetic algorithms and their application to three specific examples. The basic principles upon which genetic algorithms are built are discussed. An example of the use of a genetic algorithm for finding the roots of a Diophantine equation is presented. A genetic program is next used to approximate additional values in a tabulated function. The third case we consider is the development of stock exchange trading systems.

1. Introduction
Currently, programming technology is developing incredibly quickly, and the last decade has seen a rapid growth of computer power. These circumstances make it possible to search for new ways of solving problems. We have reached a stage where mathematicians can do what engineers have done before them: derive inspiration from nature. For our purposes here, the example from nature we choose to follow is genetics. Once we have completed our system definitions, we can set about performing evolutionary computations. It is possible that such genetic algorithms and genetic programming will lift programming to the next stage.

The reported origins of genetic algorithms go back to the 1950s [1]. In the early sixties Holland published his genetic algorithms and classification systems [2]. General recognition of this new field came after his book "Adaptation in Natural and Artificial Systems" was published [3]; this book has become a classic in the field. Currently the area is undergoing rapid development due to the availability of high-capacity computing.

Let us recall some of the basic ideas of the theory of natural selection presented by Charles Darwin in 1859 in his work "On the Origin of Species by Means of Natural Selection" [4]. The population of any species is composed of different individuals. Individuals of a species compete for resources, and more new individuals are born than the resources can support (i.e., there is competition among individuals for resources).
Natalia Grafeeva, Lyudmila Grigorieva & Natalia Kalinina-Shuvalova are with Mathematics and Mechanics Faculty, St. Petersburg State University, RUSSIA. (Email: 1dc@spb.edu)

Individuals whose features help them win the fight for resources survive more successfully and have offspring that may inherit those features. Any feature of an individual can either be inherited from a parent or be due to a mutation that occurred for some external reason.

It seems quite attractive to transfer the idea of self-development in nature to the area of computer technology. We would like to introduce such an approach into our problem solving; however, there are two difficult points in terms of its implementation in software algorithms. First, nature does not have any specific target for the alteration of species (other than the requirement of individual survival), whereas in programming we need to define or establish some specific preferences. Second, natural development proceeds from many factors, all random and often chaotic, whereas the output of an algorithm has a precise direction and is completely determined.

A classical genetic algorithm must build some "individual" (number, vector, program, etc.). We can compare individuals with each other and select those more suitable for solving the original problem. The fitness function exists to achieve this goal: a fitness value can be calculated for each individual. The existence of a fitness function is a requirement for applying a genetic approach to a problem, and defining and constructing it is one of the stages in the implementation of a genetic algorithm. Our algorithm must be able to determine the quality of a constructed solution. We should also have rules (genetic operators) for producing the next generation from parent individuals. The difficulty is not simply to mate the original individuals, but to do it in such a way that the offspring are more suitable for our purposes than their parents were.
A genetic algorithm is a cycle composed of the following stages: generation of an initial population; selection of individuals to produce offspring; application of the genetic cross and mutation operators; formation of a new generation. Strictly speaking, the first step is outside the scope of the cycle; only the last three steps are repeated. The first generation is typically a set of randomly generated individuals (a sufficiently large number, equal to the size of the initial population). During the selection stage the fitness function is evaluated for each individual, and those values are used for choosing the parents of the next generation. After the fitness function has been calculated for all individuals in the current generation, a variety of methods
can be used for selecting potential parents [5]. The most popular is simply to rank them by fitness value: the individuals are sorted by fitness (either ascending or descending) and indexed in this order (given a position number), and the probability of selecting an individual as a parent is proportional to its index (when sorted in ascending order). A second approach is the tournament method: the parent chosen is the best of M randomly selected individuals; typically M = 2. A third method truncates the fitness list: the individuals are sorted by fitness, the set of the N best individuals is selected, and the required number of parents for the next generation is then drawn randomly from this truncated list. There is also a strategy that moves a few individuals with good fitness values into the next generation unchanged.

In the third step the genetic cross and mutation operators are applied to the selected individuals. The last step of the cycle is the formation of a new generation. Typically one of two options is used: either only the children are included in the new generation, or the new generation includes the best individuals of the current generation together with the resulting children. The algorithm continues until the specified number of generations is completed or a locally optimal solution is found (an individual with the required fitness value).

In genetic programming an individual is itself a program, produced by crossbreeding or mutating programs. For efficient operation of a genetic algorithm it is necessary to provide three things: a definition of the characteristics of all individuals; a suitable fitness function; and the genetic operations.
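The three selection schemes described above can be sketched as follows. This is a minimal illustration of our own, not code from the paper; it assumes a minimization setting (lower fitness is better, as in the Diophantine example later in this article), and the function names are ours.

```python
import random

def rank_selection(population, fitness, rng=random):
    """Rank selection: sort by fitness (best first) and make the chance of
    being picked proportional to the individual's rank position."""
    ranked = sorted(population, key=fitness)      # lower fitness = better
    n = len(ranked)
    weights = [n - i for i in range(n)]           # best individual gets weight n
    return rng.choices(ranked, weights=weights, k=1)[0]

def tournament_selection(population, fitness, m=2, rng=random):
    """Tournament selection: the best of M randomly chosen individuals wins."""
    return min(rng.sample(population, m), key=fitness)

def truncation_selection(population, fitness, n_best, rng=random):
    """Truncation: keep only the N best individuals, then pick one at random."""
    best = sorted(population, key=fitness)[:n_best]
    return rng.choice(best)
```

Any of the three can be called repeatedly to draw the required number of parents; the elitist strategy mentioned above simply copies the output of `truncation_selection`'s sorted prefix into the next generation unchanged.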
In genetic programming we need a detailed representation of any program serving as an individual, and this representation must allow us to apply the genetic operators; a full-fledged program must result after the genetic operators have been applied. It is reasonably obvious that a tree representation should be used to organize the programs. Programs composing the new generation are derived from fragments of the ancestor programs, and a mutation corresponds to the replacement of a fragment of a program. Ultimately we need software that effectively solves a certain class of problems, and the fitness function should be able to determine how effectively each generated program solves the problem. We are interested in crosses and mutations after which the resulting offspring reach a good fitness value as soon as possible.

In creating a genetic program one should consider: (1) which parts of the programs are considered "indivisible" (i.e., basic) and will be retained when performing genetic operations over the current generation of programs; (2) the fitness function (an additional program or software system) that will test the effectiveness of solutions; (3) the algorithm for applying the genetic operators (cross, mutation, and so on) to fragments of programs in each following generation, with the intention of increasing the fitness value; (4) some condition defining the completion of the algorithm.

It is important to consider the question of convergence of genetic algorithms. In this context convergence means that repeated application of the genetic operators improves the fitness values of the individuals. Holland [3] presents the schema theorem, which provides the theoretical basis of the classical genetic algorithm.

We next describe some specific ideas and solution approaches for this class of problems. Standard arithmetic operations, mathematical and logical functions, and specific functions known to be appropriate for the subject area can be selected as basic objects. The software components would likely include standard data types: Boolean, integer, floating point, vector, character, or multi-valued. How can we recognize the fitness of a generated program? There are many ways of determining fitness, depending on the specifics of the program. The most widespread is the linear approach: we have a known solution of the problem for some given sets of source data, and the fitness function calculates the difference between the known solution and what the current program produces for each set of source data; the average of all the differences is then treated as the criterion. Additional criteria can include the size of the resulting program and/or its speed (we might encourage more compact programs and/or shorter execution times). We now apply the above approach to solve the following three problems.
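The averaged-difference criterion just described can be sketched in a few lines. This is our own illustration: the candidate program is assumed to be a callable, and the size-penalty weight is an invented parameter, not something the paper specifies.

```python
def fitness(program, test_cases, size=0, size_penalty=0.0):
    """Average absolute error of `program` over known (input, expected) pairs,
    optionally penalizing larger programs; lower values are better."""
    errors = [abs(program(x) - expected) for x, expected in test_cases]
    return sum(errors) / len(errors) + size_penalty * size

# e.g. scoring a candidate against the table of a known function f(x) = 2x + 1
cases = [(x, 2 * x + 1) for x in range(5)]
assert fitness(lambda x: 2 * x + 1, cases) == 0.0   # exact program: zero error
```

A candidate that is wrong by 1 on every row, such as `lambda x: 2 * x`, scores an average error of exactly 1.0 on these cases.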

2. Application of a genetic algorithm to find the roots of a Diophantine equation


We want to find the roots in positive integers of the following Diophantine equation:

2x + y + 3z + 7u = 54

First we need to check that this equation has a solution. To do this, we make sure that the right side of the equation is divisible by the greatest common divisor (GCD) of the coefficients on the left-hand side. We find that the GCD of the coefficients 2, 1, 3 and 7 is GCD(2, 1, 3, 7) = 1.
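The solvability check can be verified in a couple of lines (a sketch of ours, using the standard library):

```python
from functools import reduce
from math import gcd

# A linear Diophantine equation has integer solutions iff the gcd of the
# coefficients divides the right-hand side.  For 2x + y + 3z + 7u = 54:
coeffs = [2, 1, 3, 7]
g = reduce(gcd, coeffs)   # gcd(2, 1, 3, 7) = 1
assert 54 % g == 0        # 1 divides 54, so solutions exist
```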

International Journal Publishers Group (IJPG)

Grafeeva et al.: Genetic Algorithms and Genetic Programming.


TABLE 1: THE 1ST GENERATION

ID | Individual (x,y,z,u) | Value of the fitness function          | Probability
 1 | (3,6,7,4)            | abs(2*3 + 6 + 3*7 + 7*4 - 54) = 7      | (1/7)/1.31 = .11
 2 | (7,2,6,3)            | abs(2*7 + 2 + 3*6 + 7*3 - 54) = 1      | (1/1)/1.31 = .76
 3 | (11,8,12,6)          | abs(2*11 + 8 + 3*12 + 7*6 - 54) = 54   | (1/54)/1.31 = .01
 4 | (5,30,9,5)           | abs(2*5 + 30 + 3*9 + 7*5 - 54) = 48    | (1/48)/1.31 = .02
 5 | (17,38,3,4)          | abs(2*17 + 38 + 3*3 + 7*4 - 54) = 55   | (1/55)/1.31 = .01
 6 | (21,9,11,3)          | abs(2*21 + 9 + 3*11 + 7*3 - 54) = 51   | (1/51)/1.31 = .01
 7 | (9,33,12,3)          | abs(2*9 + 33 + 3*12 + 7*3 - 54) = 54   | (1/54)/1.31 = .01
 8 | (15,24,1,6)          | abs(2*15 + 24 + 3*1 + 7*6 - 54) = 45   | (1/45)/1.31 = .02
 9 | (19,31,2,3)          | abs(2*19 + 31 + 3*2 + 7*3 - 54) = 42   | (1/42)/1.31 = .02
10 | (8,15,7,5)           | abs(2*8 + 15 + 3*7 + 7*5 - 54) = 33    | (1/33)/1.31 = .02

This is sufficient to show that a solution exists, so we can begin our search. We know that x >= 1, y >= 1, z >= 1, u >= 1 and 2x + y + 3z + 7u = 54. Therefore the roots of the equation must lie in the following ranges:

1 <= x <= 22, 1 <= y <= 42, 1 <= z <= 15, 1 <= u <= 7.

In this case we take the vector (x, y, z, u) as the individual of our genetic algorithm. The fitness function is defined by the expression

Fi = abs(2xi + yi + 3zi + 7ui - 54),

where xi, yi, zi, ui are the values of the four components of the i-th individual. Obviously, for an exact solution of the equation the value of the fitness function is 0. After the fitness function has been calculated for each individual, a selection probability is computed; it is used when selecting parents for the next generation (preference is given to individuals with better fitness values):

Pi = (1/Fi) / Σj (1/Fj).

Let us randomly select 10 individual vectors for the first generation, calculate the fitness of each individual, and compute the probability of using it as a parent of the next generation (as shown in Table 1; for these 10 vectors Σj (1/Fj) ≈ 1.31). Note that at this stage we have an initial generation of 10 random vectors with an average fitness value of 39.

Next, in accordance with the calculated probabilities, we choose 10 pairs of parents for the next generation. It is obvious that individuals 1 and 2 are far more likely to become parents than the other individuals, since their probabilities in the table are much greater. For example, we might obtain the pairs shown in Table 2.

TABLE 2: 10 PAIRS OF PARENTS

ID of the first parent:  2 4 1 2 1 2 9 8 2  2
ID of the second parent: 1 2 2 8 2 3 2 2 9 10

After applying the cross and mutation operations to these pairs, we create the next generation. Let us describe an example of a cross operation. We have two parents with values (a1,a2,a3,a4) and (b1,b2,b3,b4). We want to recombine the values by splitting each vector of four components into two groups. For the a values there are three possible splits: (a1) and (a2,a3,a4); (a1,a2) and (a3,a4); (a1,a2,a3) and (a4). The same splits exist for the b values. The split position corresponds to the position of one of the commas in the vector, and we choose that position at random.
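The fitness and probability calculations of Table 1 can be reproduced in a few lines (our own sketch; note that once some individual reaches fitness 0 the algorithm has already found a root and stops, so the 1/Fi weights are never evaluated at zero):

```python
def fitness(ind):
    """Fi = abs(2x + y + 3z + 7u - 54) for the individual (x, y, z, u)."""
    x, y, z, u = ind
    return abs(2 * x + y + 3 * z + 7 * u - 54)

# The first generation from Table 1.
generation = [(3, 6, 7, 4), (7, 2, 6, 3), (11, 8, 12, 6), (5, 30, 9, 5),
              (17, 38, 3, 4), (21, 9, 11, 3), (9, 33, 12, 3), (15, 24, 1, 6),
              (19, 31, 2, 3), (8, 15, 7, 5)]

fits = [fitness(ind) for ind in generation]    # [7, 1, 54, 48, ...]
total = sum(1 / f for f in fits)               # Σ(1/Fj) ≈ 1.31, as in Table 1
probs = [(1 / f) / total for f in fits]        # individual 2 gets ≈ .76
```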


For example, it can be 2, chosen from the values 1 to 3 in our current example. In this way our parents are split as (a1,a2 || a3,a4) and (b1,b2 || b3,b4). The idea of the cross operation is to take one part from each parent; the result would be (a1,a2,b3,b4) or (b1,b2,a3,a4). The mutation operation is a random change of one of the components of an individual to any other value from the range of allowable values. In Table 3 mutated elements are identified by bold underlined font. The cross and mutation operations create the 2nd generation of individuals. We calculate the fitness value for each individual of the 2nd generation and the probability of using it in the next generation (as shown in Table 4).
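The cross and mutation operators for this problem can be sketched as follows (our own illustration, using the component ranges derived earlier):

```python
import random

LOW, HIGH = (1, 1, 1, 1), (22, 42, 15, 7)   # allowable ranges for x, y, z, u

def cross(a, b, rng=random):
    """One-point cross: split both parents at a random comma position (1..3)
    and join the head of one parent with the tail of the other."""
    point = rng.randint(1, 3)
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate(ind, rng=random):
    """Replace one randomly chosen component with another allowable value."""
    i = rng.randrange(4)
    child = list(ind)
    child[i] = rng.randint(LOW[i], HIGH[i])
    return tuple(child)
```

With point = 2, `cross((a1,a2,a3,a4), (b1,b2,b3,b4))` yields exactly the pair (a1,a2,b3,b4) and (b1,b2,a3,a4) from the text.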


Note that at this stage we have a generation with an average fitness value of 16.6. We continue to form successive generations until we get an individual with a fitness value equal to 0 (that is, until we find a solution of the equation). Using this approach we cannot guarantee finding all solutions, but we will find at least one of them. Our conviction is based on the schema theorem [3] mentioned above. How many generations do we need to create before reaching a solution? Possibly very many. For this reason the use of genetic methods became practical only after the rapid growth of computer power.
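The whole procedure for this equation can be assembled into one short program. This is a sketch of ours, not the authors' code; the population size, tournament size, mutation rate, and generation cap are our own choices.

```python
import random

def solve(seed=1, pop_size=50, max_gens=500, mutation_rate=0.2):
    """Genetic search for positive-integer roots of 2x + y + 3z + 7u = 54."""
    rng = random.Random(seed)
    low, high = (1, 1, 1, 1), (22, 42, 15, 7)

    def rand_ind():
        return tuple(rng.randint(low[i], high[i]) for i in range(4))

    def fitness(ind):
        x, y, z, u = ind
        return abs(2 * x + y + 3 * z + 7 * u - 54)

    population = [rand_ind() for _ in range(pop_size)]
    for _ in range(max_gens):
        best = min(population, key=fitness)
        if fitness(best) == 0:
            return best                          # an exact root has been found
        children = []
        while len(children) < pop_size:
            # tournament selection of two parents, then cross and mutation
            a = min(rng.sample(population, 2), key=fitness)
            b = min(rng.sample(population, 2), key=fitness)
            point = rng.randint(1, 3)
            child = a[:point] + b[point:]
            if rng.random() < mutation_rate:
                i = rng.randrange(4)
                child = child[:i] + (rng.randint(low[i], high[i]),) + child[i + 1:]
            children.append(child)
        population = children
    return min(population, key=fitness)          # best found if no exact root
```

Because the fitness landscape here is linear in each component, the search typically converges within a handful of generations.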

3. The use of genetic programming to approximate a table function

Consider how one can build a program (find an arithmetic expression) for approximating a function given as a table of values. Given the specific form of the generated programs (each one a defined expression), it is convenient to represent the terms and operations as a tree, where the variables and constants are leaf nodes and the operations and functions are internal nodes. For example, the expression (B + C) * A corresponds to such a tree, which can be written in the form of a left-sided linear-bracket expression: (*(+(BC))A).

The first step is defining the functions and operations that can be used to build the program, for example: {+, -, *, /}. The second step is defining the terms (variables and constants), which will later be presented as the leaves of the tree, for example: {X, A, B}.

In the third step we define the fitness function and the formula for calculating the probability that decides the further use of an individual. In the case of a tabulated function, the table itself can be used as the standard against which the results of the program calculations (expression values) obtained in each generation are compared.

In the fourth step we create an initial population of a fixed size. This population is a set of trees built from the data given in steps 1 and 2. For example, the initial population might look as shown in Table 5.

TABLE 5: AN INITIAL POPULATION

ID | Individual
 1 | (+(+(XX)B)A)
 2 | (*(+(X(+(XB))X)A)
 3 | (+(*(XA)X)(+(XX))
 4 | (*(*(XX)(-(XB)))(+(XA)))

In the fifth step we evaluate the programs (all the expressions), compare their values with the tabulated function values, and calculate the probabilities of using individuals to produce the next generation.

In the sixth step we apply the genetic operations (cross and mutation), which in this case have some peculiarities. A cross can be represented as a transposition of subtrees of the parent individuals (in the linear-bracket form, an exchange of bracket structures or individual terminal symbols). A mutation replaces a subtree with a terminal symbol. The next generation might be formed as shown in Table 6. With the completion of the next generation we return to step 5. The algorithm ends when the required accuracy is achieved or when a given number of generations is completed.
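The tree individuals above can be modeled as nested structures; in this sketch of ours (an invented encoding, not the paper's), an internal node is a tuple ('op', left, right), a cross swaps such tuples between parents, and the fitness is the averaged error of step 5:

```python
import operator

OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul,
       '/': lambda a, b: a / b if b != 0 else 1.0}   # protected division, a common GP convention

def evaluate(tree, env):
    """Evaluate a tree given as ('op', left, right); a leaf is a variable
    name looked up in `env`, or a numeric constant."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, env), evaluate(right, env))
    return env.get(tree, tree)

def fitness(tree, table, consts):
    """Average absolute error of the expression against tabulated (x, f(x)) pairs."""
    total = 0.0
    for x, fx in table:
        total += abs(evaluate(tree, dict(consts, X=x)) - fx)
    return total / len(table)

# Individual 1 of Table 5, (+(+(XX)B)A), read as ((X + X) + B) + A:
ind1 = ('+', ('+', ('+', 'X', 'X'), 'B'), 'A')
```

With the constants A = 1 and B = 2, individual 1 computes 2x + 3 and therefore scores a fitness of 0 against a table of that function.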

TABLE 6: USING CROSS AND MUTATION OPERATIONS

First parent              | Second parent      | Child after cross        | Child after cross and mutation
(*(*(XX)(-(XB)))(+(XA)))  | (+(*(XA)X)(+(XX))  | (*(*(XX)(-(XB)))(+(XX))) | (*(*(XX)(-(XB)))(+(XX)))
(+(*(XA)X)(+(XX))         | (*(+(X(+(XB))X)A)  | (+(*(XA)X)A)             | (+(*(XA)X)A)
(*(*(XX)(-(XB)))(+(XA)))  | (+(+(XX)B)A)       | (+(+(XX)(-(XB)))A)       | (+(+B(-(XB)))A)
(*(+(X(+(XB))X)A)         | (+(+(XX)B)A)       | (*(+(X(+(XX))X)A)        | (*(+(X(+(XX))X)A)

4. Application of Genetic Programming to create a trading system

The ultimate dream of every developer of programmed stock exchange trading systems is technology that automatically executes trades based on the behavior of the exchange market. Of course, no approach or technology has yet gone far enough to fulfill such a dream, but the techniques of genetic programming can be of some help in the near term. Imagine that you have a few trading indicators, each with a small but real probability of predicting the behavior of the stock market in the very near future. On the basis of these indicators, predictor combinations can be written which can be used as BUY or SELL signals. It remains only to find these combinations, and genetic programming is an appropriate technology for doing so. A trading system in the simplest case might look like this:

case
  when <no open positions> and <BUY signal> then <open BUY position>;
  when <no open positions> and <SELL signal> then <open SELL position>;
  when <there is BUY position> and <SELL signal> then <close BUY position>;
  when <there is SELL position> and <BUY signal> then <close SELL position>;
end case;

In the above program there are two predicates, <BUY signal> and <SELL signal>. You can apply genetic programming to find the most suitable combination of indicators, i.e., to build good predictive logical expressions from a predetermined set of known indicators. What is the fitness function in this case? The answer is quite obvious: each generation consists of actual programs that can be run on historical trade data, and the trading result they reproduce on that history demonstrates the fitness of a particular individual.
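The run-on-history fitness evaluation can be sketched as follows. This is our own simplified illustration, restricted to BUY (long) positions only; the predicates and the example indicator pair are invented for demonstration and are not from the paper.

```python
def backtest(buy_signal, sell_signal, prices):
    """Fitness of a (BUY, SELL) predicate pair: profit of the simple
    case-based trading system above, replayed on a historical price series."""
    position = None      # None, or the price at which a BUY position was opened
    profit = 0.0
    for t in range(1, len(prices)):
        history = prices[:t + 1]
        if position is None and buy_signal(history):
            position = prices[t]                 # open BUY position
        elif position is not None and sell_signal(history):
            profit += prices[t] - position       # close BUY position
            position = None
    return profit

# A hypothetical indicator pair: buy after a down-tick, sell after an up-tick.
buy = lambda h: h[-1] < h[-2]
sell = lambda h: h[-1] > h[-2]
```

Genetic programming would then evolve the bodies of `buy_signal` and `sell_signal` as logical expressions over the available indicators, with `backtest` supplying the fitness value for each individual.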

5. Conclusion
The application of genetic algorithms to new fields has been steadily expanding [6], [7], [8], [11]. The authors hope that readers interested in the subject can easily find additional information and their own examples of the application of genetic algorithms. However, we would like to point out a number of special problems that have received undeservedly little notice; in particular, the list includes some classical problems in number theory. The authors hope to look into these problems in their future work.

References
[1] Fogel L.J., Owens A.J., Walsh M.J. Artificial Intelligence through Simulated Evolution. John Wiley, NY, 1966.
[2] Holland J.H. Concerning efficient adaptive systems. In: Self-Organizing Systems. Spartan Books, Washington, D.C., pp. 215-230, 1962.
[3] Holland J.H. Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor, 1975.
[4] Darwin C.R. On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life. John Murray, London, 1859.
[5] Blickle T., Thiele L. A Comparison of Selection Schemes Used in Genetic Algorithms. 1995, 2nd edition.
[6] Egorov K.V., Tsarev F.N., Shalito A.A. Application of genetic programming to build automatic control systems with complex behavior on the basis of training samples and specifications. Scientific-Technical Bulletin of Saint-Petersburg State University of Information Technologies, Mechanics and Optics, 5 (69), 2010.
[7] Luger G.F. Artificial Intelligence: Structures and Strategies for Complex Problem Solving. Addison Wesley, 2004.
[8] Grafeeva N., Grigorieva L., Khristoforov V. Overview of classes of problems effectively solved by genetic programming. European Applied Sciences, 4, 2013.
[9] Koza J.R. Genetic Programming: On the Programming of Computers by Means of Natural Selection. The MIT Press, MA, 1998.
[10] Mitchell M. An Introduction to Genetic Algorithms. The MIT Press, MA, 1996.
[11] Tsarev F.N. The method of constructing finite-automata-based test case governors using genetic programming. Information and Control Systems, 5, 2010.
