Polymorphism

May 10, 2017

I want to expand on the last post, because I think the last section touched on an interesting phenomenon. It is summarized by these two plots:

run3

Fig1. This evolution run had three phenotype evaluations.

run7

Fig2. This evolution run had seven phenotype evaluations. All else was equal.

What do I mean by “seven phenotype evaluations”? Each genotype (individual) is used to grow some number of phenotypes. Each phenotype is evaluated for fitness (in this case number-of-nodes and health). Then the fitness scores are averaged, per objective.
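
Concretely, the evaluation step works roughly like the sketch below. This is just an illustration reusing the make_phenotype, number_of_elements, and get_health calls that show up in the Jupyter sessions in the earlier posts below; the real evaluator in evolve_colony_multi_obj.py may differ in detail.

import numpy as np
import ColonyEvolver.evolve_colony_multi_obj as ev

def evaluate_genotype(genome, n_evals=7):
    # grow several phenotypes from one genome; each growth run is stochastic
    scores = []
    for _ in range(n_evals):
        p = ev.make_phenotype(genome)
        scores.append((p.number_of_elements(), p.get_health()))
    # average per objective -> (mean number of nodes, mean health)
    return tuple(np.mean(scores, axis=0))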

What is the meaning of this multiple-phenotype expression? It’s as if multiple clones are being budded off of a single genome. It is making these ‘organisms’ behave as though they reproduce both asexually and sexually. The sexual reproduction happens when child genomes are created during each round of the evolutionary algorithm. The asexual reproduction happens every time the evaluation function operates, since it performs multiple phenotype evaluations.

So what is the effect? One is that the pareto frontier can explode outwards, to higher average fitness. This is clearly seen in the last couple of generations in the second plot. Why are these individuals so much more fit? I think it is because they are exhibiting “polymorphism”. Poly means “many” and morphe means “form,” so in this context a single genotype can exhibit many phenotypic forms. Since the fitness scores from these different forms are averaged, the genotype can find a niche on the pareto-frontier more effectively than a monomorphic genotype.

This is clearly seen in the following images, which depict one phenotype for each genome in the pareto-frontier of the final generation (the 20th). The average number of nodes and average health are printed; note these are not the same as the specific fitness of the phenotype you are looking at.

grid_run3

Fig3. Phenotype evaluations = 3

grid_run7

Fig4. Phenotype evaluations = 7

Notice the bunch of boring little q-tips in the center of Fig4? These correspond to the yellow dots that are way out in Fig2. If I generate a colony (phenotype) from one of these genomes multiple times, sometimes it makes a bushy shape, sometimes a q-tip. Polymorphism!

How this works I don’t know. I would need to dissect the processor trees (the genomes) of the polymorphic individuals. I would expect to see some probabilistic switch that results in some percentage of phenotypes being q-tips, and some other percentage being bushy. That investigation is for another day.
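
For intuition only, the kind of switch I have in mind could be as simple as the toy below. This is entirely hypothetical; I have not dissected the real processor trees, and the branching probability is made up.

import random

def toy_processor(feed_event):
    # hypothetical probabilistic switch inside a processor tree:
    # most feedings do nothing (a q-tip phenotype), but occasionally
    # the node branches, producing the bushy phenotype
    if random.random() < 0.2:  # made-up branching probability
        return 'grow_branch'
    return 'do_nothing'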

I think these results suggest that organisms that reproduce both sexually and asexually are more likely to exhibit polymorphism. But there are a number of conditions imposed here that may be critical to that hypothesis: 1) selection operates similarly to pareto-frontier selection, 2) consistent cycles of asexual and sexual reproduction occur.

Anecdotal evidence: The phylum Cnidaria is noted on Wikipedia as being characterized by polymorphism; many animals in this category have a polyp form and a medusa form. Cnidarians reproduce asexually as well as sexually. Things are more complicated because it seems that in some variants the polyps do not reproduce sexually. Really this idea needs to be checked with biologists. I wouldn’t be surprised if this is old news to biologists, but it’s pretty exciting for me!

Population Needs to be Big for Pareto-Front to Move

May 04, 2017

While the last two posts were exciting because a multi-objective evolution ran and spat out some neat shapes, I had to check if the evolution was actually finding better populations.

import ColonyEvolver.evolve_colony_multi_obj as ev
reload(ev)
<module 'ColonyEvolver.evolve_colony_multi_obj' from '/Users/josh/Projects/ColonyEvolver_above/ColonyEvolver/evolve_colony_multi_obj.py'>
info, archive = ev.main()
gen	nevals	avg                          	std                          	min                        	max                          
0  	25    	[ 179.69142857  100.03209854]	[ 158.46217436  167.08081721]	[  2.         -19.99791905]	[ 439.57142857  355.        ]
1  	48    	[ 227.76571429   16.73458574]	[ 143.79577287  101.42524822]	[  2.         -16.86966543]	[ 447.          364.64285714]
2  	46    	[ 233.31428571   17.55206654]	[ 138.30070167  102.03801106]	[  2.         -16.86966543]	[ 447.          364.64285714]
3  	45    	[ 235.76571429   20.21989035]	[ 153.93481727  102.09670425]	[  2.         -16.06184076]	[ 447.          364.64285714]
4  	39    	[ 175.50285714   38.46083977]	[ 145.44557363  120.98575388]	[  2.         -16.71785968]	[ 448.57142857  364.64285714]
5  	46    	[ 173.79428571   37.89007022]	[ 137.98761897  121.15668099]	[  2.         -16.71785968]	[ 448.57142857  364.64285714]
6  	41    	[ 190.66285714   39.57204475]	[ 147.25656566  121.45121315]	[  2.         -16.71785968]	[ 448.57142857  364.64285714]
7  	46    	[ 148.56         61.71167985]	[ 152.06825492  134.6276333 ]	[  2.         -16.71785968]	[ 448.57142857  364.64285714]
8  	45    	[ 143.56571429   87.54939393]	[ 154.78536808  156.97171347]	[  2.         -16.71785968]	[ 448.57142857  364.64285714]
9  	47    	[ 173.69714286   21.5299556 ]	[ 157.46930933   77.42182592]	[  2.         -16.71785968]	[ 448.57142857  367.85714286]
10 	46    	[ 160.92571429   27.50332923]	[ 159.32660395   79.07821019]	[  2.         -16.71785968]	[ 448.57142857  367.85714286]
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import numpy as np
v = np.linspace(0, 1, len(archive))
colors = cm.viridis( v )
fig = plt.figure(figsize=(9,6))
for i,generation in enumerate(archive):
    g = np.array(generation)
    n = g[:,0]
    h = g[:,1]
    plt.scatter(n, h, c=colors[i])
plt.xlabel('number of nodes')
plt.ylabel('colony health')
plt.show(fig)

This plot was generated with Population=25 and Children=50. Code is at git tag list-save

png

In this plot darker points are individuals from earlier generations. This does not look so great. What I am hoping to see is each generation slightly offset from the previous one, in the direction of the upper right hand corner. It looks like there is a little movement in that direction, especially in the ~60 number-of-node region. But something is not right.

What is going on? Here are some ideas.

  1. There is not enough variation to select slightly better individuals.
  2. The random nature of simulation runs is confusing the results. This means that individuals that are slightly better are not being selected consistently because on some evaluations they get a low score. I am using 7 simulation runs per individual for this run. That number is rather arbitrary; it really depends on the variation between runs for a given individual. The scary thing is that this variation probably depends on the individual. Some individuals, as a result of their genome, are probably going to result in a wider range of phenotypes. Uh-oh!
  3. The variation is there but the wrong individuals are being selected.
  4. Everything is fine, just need to run a longer evolution.

1, 2, and 4 seem likely. If 2 is true it’s not so terrible; in the long run, individuals that produce inconsistent fitness may die out. If 1 is the issue, I should do a run with a larger population. Comparing to the knapsack example shows that I halved the population size (just to make it run faster). I think it’s time to try out a bigger run.
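
One cheap way to probe idea 2 would be to re-evaluate a single genome many times and look at the spread of its scores. A sketch, reusing the make_phenotype, number_of_elements, and get_health calls from the sessions in these posts; the number of repeats is arbitrary:

import numpy as np
import ColonyEvolver.evolve_colony_multi_obj as ev

def fitness_spread(genome, n_repeats=30):
    # grow the same genome many times and report how noisy its scores are
    scores = np.array([(p.number_of_elements(), p.get_health())
                       for p in (ev.make_phenotype(genome) for _ in range(n_repeats))])
    return scores.mean(axis=0), scores.std(axis=0)  # per-objective mean and spread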

Ok did that. Here is the output:

Code is at git tag mult-obj-more-gen yay

Wow, this looks a lot better.

Learning from these Results

The Effect of Averages

Check out these two phenotypes from the same genotype (health rank = 21):

Drawing Drawing

These are simply two different runs of the same genome (the processor tree deciding what to do when a node gets ‘fed’).

One is just the initial ‘seed’ that every phenotype starts with: two nodes vertically oriented. The other has an off-shoot and a little bundle of nodes. I think this is happening because of the following. For each genotype 7 phenotypes are generated and evaluated for fitness. The 7 scores are averaged. This means that a genotype that produces a wide range of phenotypes might actually do quite well. Perhaps some percentage of the time the depicted genotype makes no modification to the seed, racking up a high health score. Other times the genotype makes the bundle of nodes and gets a better ‘number-of-nodes’ score. Taking the average, this genotype lands somewhere in between, allowing it to make it to further rounds by occupying a unique niche on the pareto-frontier.

This averaging idea might explain why there are so many phenotypes that look the same (see image below). Perhaps the difference between these is that they have a different likelihood of making a taller phenotype every now and then.

grid Phenotypes sorted by health from the final pareto-frontier (yellow dots in the plot). Health descends from the upper left to the right.

Finally, here are some awesome phenotypes (note that the reported #nodes and health are for this one phenotype, not the average obtained in the fitness evaluation): r24 r26 r33

Shapes from the first multi-objective run

April 23, 2017

grid_img These shapes are colonies from the final pareto frontier of a multi-fitness-criteria evolutionary algorithm. See the previous post for an explanation.

The images are ordered from the top left across according to colony health. They correspond to the dots in this plot: plot

So the lightest dot corresponds to the lower right corner.

Health is the average node health, over all the nodes in a colony, at the end of the simulation run. Five health points are given to each node when it ‘eats’ (gets collided with by a particle). One health point is subtracted for each time-step. Admittedly, taking the average may make the health score susceptible to outliers. Next time, consider trying the sum of all health scores.
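
In code, the scoring rule described above works out to something like this sketch (not the actual simulation code; node feed counts are just passed in as integers):

def node_health(times_fed, n_timesteps):
    # +5 health each time the node eats, -1 per simulation time-step
    return 5 * times_fed - n_timesteps

def colony_health(feed_counts, n_timesteps):
    # current scheme: average node health, which a few starved nodes can drag down
    healths = [node_health(f, n_timesteps) for f in feed_counts]
    return sum(healths) / float(len(healths))
    # alternative to try next time: return sum(healths)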

Anyway, it makes sense that the highest-health genome is one that adds no growth to the seed (the two-node stick). These two nodes get bombarded by particles during the entire simulation. On the other end of the spectrum are colonies that are giant and tangly. Not surprisingly, these colonies have low average health; all but the topmost nodes are starved for nutrients.

Multi Objective First Run

April 21, 2017

Summary

Multi-objective worked and spat out a range of solutions. The solutions are on a frontier that at first sight looks non-optimal, but that may be a consequence of the problem formulation. Skip to Fun Pictures.

This code was run and shared using jupyter. Many thanks to this post explaining how to put a jupyter session into a jekyll blog.

The evolve_colony_multi_obj.py script that generated the data is in git tag multi-obj-0.

Jupyter Session

import evolve_colony_multi_obj as ev
reload(ev)
<module 'evolve_colony_multi_obj' from 'evolve_colony_multi_obj.py'>

The imported module is a script with a main() function that runs the evolution. The fitness evaluator outputs (size, health) for the evaluated phenotype (a colony of nodes).
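
For reference, the DEAP boilerplate for a two-objective maximization looks roughly like the sketch below. The actual registrations live in evolve_colony_multi_obj.py and may differ; the evaluate function here is only a placeholder, and NSGA-II is just one of the pareto-based selection operators DEAP provides.

from deap import base, creator, tools

# both objectives (number of nodes, colony health) are to be maximized
creator.create("FitnessMulti", base.Fitness, weights=(1.0, 1.0))
creator.create("Individual", list, fitness=creator.FitnessMulti)

def evaluate(individual):
    # placeholder: the real evaluator grows phenotypes and averages their scores
    return float(len(individual)), float(sum(individual))

toolbox = base.Toolbox()
toolbox.register("evaluate", evaluate)
toolbox.register("select", tools.selNSGA2)  # non-dominated (pareto) selection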

info = ev.main()
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/transforms3d/quaternions.py:400: RuntimeWarning: invalid value encountered in divide
  vector = vector / math.sqrt(np.dot(vector, vector))


gen	nevals	avg                          	std                          	min                        	max                          
0  	25    	[ 179.69142857  100.03209854]	[ 158.46217436  167.08081721]	[  2.         -19.99791905]	[ 439.57142857  355.        ]
1  	48    	[ 227.76571429   16.73458574]	[ 143.79577287  101.42524822]	[  2.         -16.86966543]	[ 447.          364.64285714]
2  	46    	[ 233.31428571   17.55206654]	[ 138.30070167  102.03801106]	[  2.         -16.86966543]	[ 447.          364.64285714]
3  	45    	[ 235.76571429   20.21989035]	[ 153.93481727  102.09670425]	[  2.         -16.06184076]	[ 447.          364.64285714]
4  	39    	[ 175.50285714   38.46083977]	[ 145.44557363  120.98575388]	[  2.         -16.71785968]	[ 448.57142857  364.64285714]
5  	46    	[ 173.79428571   37.89007022]	[ 137.98761897  121.15668099]	[  2.         -16.71785968]	[ 448.57142857  364.64285714]
6  	41    	[ 190.66285714   39.57204475]	[ 147.25656566  121.45121315]	[  2.         -16.71785968]	[ 448.57142857  364.64285714]
7  	46    	[ 148.56         61.71167985]	[ 152.06825492  134.6276333 ]	[  2.         -16.71785968]	[ 448.57142857  364.64285714]
8  	45    	[ 143.56571429   87.54939393]	[ 154.78536808  156.97171347]	[  2.         -16.71785968]	[ 448.57142857  364.64285714]
9  	47    	[ 173.69714286   21.5299556 ]	[ 157.46930933   77.42182592]	[  2.         -16.71785968]	[ 448.57142857  367.85714286]
10 	46    	[ 160.92571429   27.50332923]	[ 159.32660395   79.07821019]	[  2.         -16.71785968]	[ 448.57142857  367.85714286]
n_nodes = []
health = []
for individual in info.final_pop:
    N,H = individual.fitness.values
    n_nodes.append(N)
    health.append(H)
import matplotlib.pyplot as plt
import matplotlib.cm as cm
# order by health
import numpy as np
ordered_idx = np.flip(np.argsort(health), 0)
health = np.array(health)[ordered_idx]
n_nodes = np.array(n_nodes)[ordered_idx]
v = np.linspace(0, 1, len(n_nodes))
color = cm.viridis( v )
fig = plt.figure(figsize=(9,6))
#fig = plt.figure()
plt.scatter(n_nodes, health, c=color)
plt.xlabel("Number of Nodes")
plt.ylabel("Colony Health")
plt.show(fig)

png

NOTE: colors are for verifying that node health ordering is correct

At first glance this was completely the reverse of what I was hoping to see. I expected to see a convex bulge pointing towards the upper right-hand corner.

Hypothesis 1:

It looks as if the EA is trying to minimize the number of nodes and the colony health. This is not what the code specifies.

Hypothesis 2:

I did set up a system where high health is likely to be achieved by a colony with a low number of nodes, and vice-versa. It could be that this curve, which looks like an inverse relationship, is a result of the mechanics of the system. It might be inevitable. If that is the case, I would expect to see the curve bump out generation by generation.

# Save an image for each genome in the final population
import mayavi.mlab as mlab
for i,idx in enumerate(ordered_idx):
    genome = info.final_pop[idx]
    p = ev.make_phenotype(genome)
    p.show_lines()
    mlab.savefig(str(i).zfill(2)+'_genome_'+str(idx)+'.png')
    mlab.close(all=True)
health_rank = 15
idx = ordered_idx[health_rank]
genome = info.final_pop[idx]
p = ev.make_phenotype(genome)
p.number_of_elements()
151
p.get_health()
-10.443708609271523
#print(genome)
p.show()

Below is a screen shot of the 3d view generated by show(): img It makes sense that this has low health! The nodes below are being starved for nutrients. But it has many nodes. After inspecting a lot of the solutions, I think the evolution may be working right after all (it looks like Hypothesis 2 is pulling ahead). Next steps are to make a compilation of the images for all of the solutions in this final population. Edit: just made this image. See the next post.

Here is another one, from health_rank = 9: img (number of nodes = 24, health = 19)

Multi Objective Evolution

April 19, 2017

Today I spent some time understanding the knapsack example, provided by the deap documentation. This is a great example for me because it uses multi-objective optimization, which is critical for my current curiosity. More on that in a future post.

My goal was to make sure that my understanding of multi-objective optimization corresponds with what is actually possible with DEAP. Since I like visuals, I decided to make a plot of what the evolution is doing. Here it is: knapsack_img

Oooh pretty colors. This plot was made using matplotlib. The color map is matplotlib.cm.viridis.

Each color represents a generation in the evolution. Brighter colors are newer generations. Each dot is an individual solution to the knapsack problem. The knapsack problem is like the problem faced by a backpacker who is trying to decide what items to put in her ‘knapsack’. There are a bunch of items with some weight and value. The goal is to have a ‘knapsack’ that is below some critical weight, and has high value. So the optimization problem has multiple objectives: low-weight and high value.

The easy way to adapt this to a normal evolutionary algorithm is to perform a weighted sum of the two objectives. The problem is that by scaling the two metrics and summing, a bias is injected into the search. Only solutions that perform well for a particular level of relative importance between the objectives will be found. This is quite limiting when you are interested in seeing a wide range of good solutions. Other problems exist with the weighted-sum approach: How does one scale objectives that have unknown bounds? How does one scale objectives that have radically different meanings or units?
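
For contrast, a weighted-sum fitness collapses the two knapsack objectives into one number, and the chosen coefficients bake in a relative importance before the search even starts (toy numbers below):

def weighted_sum_fitness(weight, value, w_weight=-1.0, w_value=2.0):
    # the coefficients decide up front how much a unit of weight
    # is 'worth' relative to a unit of value
    return w_weight * weight + w_value * value

# two very different knapsacks can score identically once scalarized
print(weighted_sum_fitness(10.0, 8.0))  # heavy, valuable  -> 6.0
print(weighted_sum_fitness(2.0, 4.0))   # light, modest    -> 6.0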

Luckily there is an elegant way to completely avoid cobbling dissimilar metrics together. The essential idea is the pareto frontier. Basically this is the set of all individuals that are the best in their own special way at some combination of objectives (this property is called ‘non-dominated’). All the evolution has to do is select individuals from the pareto frontier, and after many generations the frontier gets better and better. See the Wikipedia article on multi-objective optimization.
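
A minimal sketch of the non-dominated test for the knapsack case (minimize weight, maximize value). DEAP handles this internally through its selection operators, so this is only for intuition:

def dominates(a, b):
    # a, b are (weight, value); a dominates b if it is no heavier, no less
    # valuable, and strictly better in at least one of the two objectives
    return a[0] <= b[0] and a[1] >= b[1] and (a[0] < b[0] or a[1] > b[1])

def pareto_front(points):
    # keep every point that no other point dominates
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

print(pareto_front([(5, 10), (4, 12), (6, 9), (4, 8)]))  # -> [(4, 12)]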

So if we plot the weight and value for each individual, and color them by generation, we should see the dots forming a sort of bulge that moves further from the previous colors. This is indeed what we see. I was a little confused at first that the bulge moved to the upper right. This is reconciled by the fact that the algorithm is selecting for low weight.

Also, wow matplotlib’s new default plotting looks pretty good. Go open source!!