Home

History Key

  • New content
  • Removed content

Recent Versions

Choose two versions to compare, or click the link to view it.

  1. 9. over 3 years by andy
  2. 8. over 3 years by andy
  3. 7. over 3 years by andy
  4. 6. over 3 years by andy
  5. 5. over 3 years by andy
  6. 4. over 3 years by andy
  7. 3. over 3 years by andy
  8. 2. over 3 years by andy
  9. 1. over 3 years by andy
 

This workspace contains Andy Singleton's Genetic Programming code, written in 1992 - 1993, along with a summary of his research from that period.

Genetic programming is an application of genetic algorithms (artificial evolution) to create computer programs. It was first tried by Nichael Kramer, popularized by John Koza and is the subject of a fair amount of research.

Doodle Garden

Doodle Garden is an entertaining program that allows you to write and select doodles, which are pictures generated by a logo-like drawing language.  You can then mutate and cross doodles to create fantastic variations.  It is fun for anyone, without requiring a lot of thought.  You actually see the effects of mutation and crossover on computer code.  If you want to look deeper, you can experiment with the primitives and genetic operations.

Get Doodle Garden for Windows here.

The repository contains the Doodle Garden source code, but compiling it requires Borland C++, which is no longer available.

I originally got this idea of human-selected images from Karl Sims, and there have been a lot of implementations of it since then.

GPQuick

GPQuick is a C++ genetic programming framework that features very fast evaluation.  This is important if you are running many iterations on 1993 vintage computers.  GPQuick is the framework that is used for Doodle Garden and GP GIM.  The GPQuick framework is explained in my Byte Article Genetic Programming with C++ which I have copied here.

Meta-GA

A system like GPQuick has a LOT of parameters that you can tune, including the mix of genetic operation (there are many variations of crossover and mutation), the algorithm used to select parents, the allowed sizes and shapes of the individual programs, and the mix of programming primities and inputs used.  At the time, there were a lot of academic papers that consisted of graduate students comparing one set of these parameters and operations to some other set.  This would be the work for a summer, or a "what I did with my summer vacation" paper. I built the meta-GA to cut through this clutter and finish the optimization job.

The Meta-GA would take a set of GP parameters as something to be optimized.  It would use that set to run a complete GP run (itself with thousands of iterations), and then score the best results from that GP run as the the fitness score of its parameters.  It would iterate, applying a genetic algorithm, to find the best set of parameters.  It would automate a summer's manual work into a day or so of intensive computation.

This optimization produced a big jump up in efficiency of my runs, but it also produced some interesting research results.  One of the first results from this system was to debunk the idea that crossover (recombination of parts from two or more individuals) was a required genetic operation.  That has been a part of GP dogma Koza's first writing on the subject 20 years ago, and many still follow it.  However, I found that the system would often settle on a mix of genetic operations that only included mutation, without any crossover.  Although the intuitive explanation for the efficiency of crossover makes sense (combining the best parts of two things seems like a fast way to make something better), I didn't see any evidence for it.  Since then, we have learned that there is one genus of cellular organism - the Bdelloid Rotifer - that happily evolves without sexual combination.

Distributed GP

GP is compute intensive, typically requiring thousands of generations iterating over a population size of hundreds or thousands.   Running a GA of several thousand generations on top of that takes, well, thousands of times as long.  In 1993, the compute power wasn't available.  So, I started running a network of computers with a "distributed island" model.  It turned out that I could get good results running a separate GP/GA on each machine, as long as I transmitted (migrated) some of the individuals to other islands occassionally.  This architecture was adopted by Koza in his next generation GP cluster, and now, with the Internet, it is possible to put together a very large amount of cmputing power to run this type of algorithm under things like screen savers.

Genetic Induction Machine

I wrote the Genetic Induction Machine, or GP GIM for short, to evolve financial trading rules.  Genetic programs are a nice format for trading rules because they can be evolved to tell you exactly what you want to know.   Given a set of indicators, the evolved function will output a number indicating buy, sell, or do-nothing.  You can evolve the function by showing it a lot of historical cases, stepping through and buying and selling when indicated, subtracting estimated trading costs, and using the profit or loss as a fitness score.  This will get you a function that does very well trading historical data.  However, it will usually be a disaster when trading in the future, with data it has never seen, because it overlearns.  It fits itself to the training data, not to the infrequent times when you have generalizeable information.

To get general learning, you need to restrict your machine so that it can't overlearn.  You need to adjust it's parameters so that it only pays attention to the inputs that have generalized information, and you need to simplify its structure so that it can't remember historical data.  Genetic programming is a nice place to apply this type of restriction, because it has so many parameters, including parameters describing the allowed complexity of the expression, and parameters indicating the functions and inputs it can use.  So, the GIM is a meta-GA that evolves the GP parameters.  It generates a set of GP parameters, runs an evolutionary run of several thousand generations to learn some trading rules with those parameters, and then tests the result with some "out of sample" time periods.  We score the parameters based on how effectively they created rules that can make profits in those unseen time periods.

This evolution on top of evolution algorithm takes a lot of computing power.  When I was running the GIM, I bought a large house in rural New Hampshire, originally built in 1831 (this was when houses were cheap during the 1990-91 recession), and I filled several rooms with plexiglass racks with 80486 motherboards mounted on them, and I ran the meta-ga in parallel with trades from one parameter set running on one computer node.

My clients were currency traders.  I found that currency markets are quite efficient, with very little profit on short term trading.  However, I also learned to identify markets that would be vulnerable to this type of analysis, and found a few indicators that at the time were quite effective.  Perhaps I will discuss the principles for knowing when you have tradable information in a different post.

Symbiosis - the next frontier

I became concerned because no matter how well optimized my GP algorithm was, it never went beyond a certain threshold of behavioral complexity.  It never made the jump from virus, to cells, to multi-cellular, to thinking creature.  It never even started in that direction.

We STILL have this problem.  I never published my research on this subject, but I think it is important now.  During the last 15 years, progress in genetic programming has been very slow.  We are not seeing complex evolved organisms.  We are only seeing marginally improved versions of the same simple behaviors.  They can produce "better than human" optimizations, as Koza notes, but only in a very small set of domains.  Computing power has increased by thousands of times, so that is clearly not the obstacle.  The obstacle is the simplicity of our genetic representations, our simplistic model of evolution, and the lack of machinery for running systems that use symbiosis and all of the other apparently crazy things we see in nature.

I noticed two things that I though were important.  First,I noticed that the genetic representation itself - the genome, not the phenome - was extremely important.  The GP algorithm would go nowhere until it rearranged the genome into redundant parts that could survive genetic operations like crossover, and it would rearragne itself within a few generations to accomplish that goal.  The shape of the genome was different depending on the genetic operations we were throwing at it, but it was consistent for a given mix of operations.  The previously mentioned example of the Rotifer is relevant here.  Rotifers can apparently evolve asexually because they are are able to eliminate parasites at the genetic level without sexual combination.  Their trick is purely at the genome level.  I think there are a lot of other things going on in a DNA genome to enhance its stability and evolvability. What about all that "junk" DNA?  It's not junk.junk.  Until we figure out the right genetic representation, we will be condemned to simplistic artificial organisms.

Second, I noticed that the normal discussion of how evolution works in a genetic algorithm was extremely simplistic.  Individuals compete, and the fittest are reproduced.  This is the cartoon of evolution that we learn in fourth grade.  It's not what actually happens.  In actual evolution, intra-species competition is only one of many factors.  Actually, the typical phenotype in a species population tends to be surprisingly stable and rarely changes under its own power.  There are also powerful forces from predation and parasitism - co-operative coevolution of the system.  And, there is symbiosis, the combination, co-operation, and co-evolution of genetically distinct organisms into one functional organism.  This may be the most powerful innovation of all, because it directly gets around the complexity limitations of any one genome.  It also may be much more fundamental than we think, if you consider things like distinct chromosomes as co-evolving.

So, I started designed a genetic structure that supported combination and co-evolution of genetically distinct individuals.  There is more work to be done there.

 

Doodle Garden Screenshot