Genome structure


shvarz:
Yes, again. I know it most likely won't go anywhere, but I'll put it out for everyone to hear anyway. Maybe at least some ideas will find their way into the program.

Right now we have a fully deterministic genome structure: conditions are used to turn genes on and off. There was also a suggestion to go for a probabilistic approach, where genes are executed at random. Both have advantages that have been discussed previously (do a search yourself).

I have a compromise suggestion that would keep the possibility of designer bots being tightly regulated, but at the same time allow for a genome structure that is more "evolvable". Everyone knows that in practice it is impossible to evolve a bot with a gene that uses a condition. Genes are either always on or always off, with no middle ground. Gene control is such a tricky thing that it breaks down very quickly and never evolves from scratch.

So, in my opinion the structure of "condition" needs some simplification and flexibility. Here is the idea:
We have a single value that defines the probability that the next encountered gene is executed. That value is never reset to zero but is modified by the DNA.
1. Allow commands that increase or decrease the probability (say, "bump" and "drop"). As DNA is scanned, each such command encountered changes the probability by 10% up or down.
2. Current strong conditions, such as <, >, =, != etc., "bump" the probability if true or "drop" it if false.
3. Mild conditions such as ~= can do the same but to a lesser extent, say by 5%.
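
As a rough illustration of the three rules above, here is a minimal Python sketch (not DarwinBots code). Several things in it are assumptions that are not part of the proposal: the probability is held as a whole-number percentage starting at 50, "10%" is read as ten percentage points, the value is clamped to the 0 to 100 range, and the tuple encoding of the DNA and the names `run_genome`, `STRONG_STEP` and `MILD_STEP` are made up for the example.

```python
import random

STRONG_STEP = 10  # bump/drop and strong conditions (assumed: percentage points)
MILD_STEP = 5     # mild conditions such as ~=


def clamp(percent):
    """Keep the probability inside 0..100 (an assumption of this sketch)."""
    return max(0, min(100, percent))


def run_genome(dna, rng, start_percent=50):
    """Scan dna left to right and return the names of the genes that fired.

    dna is a list of tuples in a hypothetical encoding:
      ("bump",) / ("drop",)   -> move the probability by STRONG_STEP
      ("strong", result)      -> bump if result is true, drop if false
      ("mild", result)        -> same, but by MILD_STEP
      ("gene", name)          -> execute with the current probability
    """
    percent = start_percent
    fired = []
    for item in dna:
        kind = item[0]
        if kind == "bump":
            percent = clamp(percent + STRONG_STEP)
        elif kind == "drop":
            percent = clamp(percent - STRONG_STEP)
        elif kind == "strong":
            step = STRONG_STEP if item[1] else -STRONG_STEP
            percent = clamp(percent + step)
        elif kind == "mild":
            step = MILD_STEP if item[1] else -MILD_STEP
            percent = clamp(percent + step)
        elif kind == "gene":
            # The probability is never reset; it carries over to later genes.
            if rng.randrange(100) < percent:
                fired.append(item[1])
    return fired
```

With five bumps in a row from the assumed 50% starting point the next gene always fires, and with five drops it never does. Anything in between fires only some of the time.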

Such conditions are much more likely to appear during evolution and are easier for bots to handle. One of the big advantages of such a genome structure is the wider diversity of mutations that it can tolerate. The fitness of the bot is not going to change drastically, but through a gradient, and gradients are very good for evolution, because evolution is good at optimizing but bad at coming up with new structures.

Jez:
Forgive me if I haven't understood this correctly; I am a bot designer, not an evolver, after all.

Are you saying that instead of having a yes/no (100%)/(0%) probability of a gene or command being activated that it could be a maybe/maybe not (100%down)/(0%up) type of thing?

I'm trying to think of this in terms of how I would design a bot to use it but shan't comment further for now as I'm not sure I've got the right idea yet.

Numsgil:
I think the primary issue with conditions is that they are a multistep process that has to follow such a rigid structure. I don't think the issue is with the deterministic nature. If it was, I would expect to see genes evolve that use the rnd command in the condition.

That we don't see any useful conditions compared with the frequency of useful genes indicates to me a structural flaw, not an operational one. DNA likes to use one-gene techniques.

Another issue I think is the way DNA flows. Imagine the growth of metabolic processes, for a moment. Supposedly, they start at the end product and grow various preprocesses. Evolution would then seem to be good at working backwards from the way regular humans think. Humans start at A and move to Z. Evolution seems to start at Z and work backwards to A.

An idea I've been toying with involves using codules as "nodes" in a large graph. These nodes would have fixed numbers of input and output. DNA inside these codules could mutate, but there'd also be mutations that change the way the codules interconnect to each other. For instance, a mutation could change codule A to shoot a -1 shot instead of a -6 shot, or it could change codule A to give control to codule C instead of codule B.

The program flow from codule to codule would be an implicit conditional forking. I would also add the ability for a "starting" codule to add a prerequisite codule that becomes the "starting" codule.
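
To make the graph idea a bit more concrete, here is a small Python sketch under my own assumptions. The names `Codule`, `mutate`, `walk` and `add_prerequisite` are hypothetical, the `shot` field is only a stand-in for whatever DNA lives inside a codule, and the 50/50 split between body and wiring mutations is arbitrary.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Codule:
    name: str
    shot: int  # stand-in for the codule's internal DNA
    next_names: list = field(default_factory=list)  # outputs to other codules


def mutate(graph, rng):
    """Apply one mutation: change a codule's body or rewire one output."""
    codule = graph[rng.choice(sorted(graph))]
    if rng.random() < 0.5 or not codule.next_names:
        codule.shot = rng.choice([-1, -6])  # mutation inside the codule
        return "body"
    slot = rng.randrange(len(codule.next_names))
    codule.next_names[slot] = rng.choice(sorted(graph))  # rewiring mutation
    return "wiring"


def walk(graph, start_name, pick_output, max_steps=10):
    """Follow outputs from start_name.

    pick_output(codule) returns the output slot to take, or None to stop.
    max_steps guards against cycles in the graph.
    """
    path, name = [], start_name
    for _ in range(max_steps):
        codule = graph[name]
        path.append(name)
        slot = pick_output(codule)
        if slot is None or slot >= len(codule.next_names):
            break
        name = codule.next_names[slot]
    return path


def add_prerequisite(graph, start_name, new_codule):
    """Put new_codule ahead of the current start; it becomes the new start."""
    new_codule.next_names = [start_name]
    graph[new_codule.name] = new_codule
    return new_codule.name
```

The point of the sketch is that a rewiring mutation only changes one entry in `next_names`, so the DNA inside each codule survives it intact.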

Anyway, what this really relates to is software architecture. The "if A then B" structure humans like, evolution doesn't. We have to figure out the architectural system evolution likes most, and try to find a middle ground with human authors.

My vote is for either a pipeline, where a gene modifies the contents of a central data object and passes it to other genes in a kind of assembly-line format, or a data-centered design, where you basically have several independent processes that update the same data object (imagine ten monkeys who are specially trained to grab every X they find from a giant trash heap, make it into a Y, and put it back into the trash heap. Each monkey's X is different, so together they basically form a process).
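
A toy Python version of the monkeys-and-trash-heap picture, just to show the shape of it. The function name `run_monkeys`, the use of a `Counter` as the shared heap, and the round limit are all assumptions made for this sketch.

```python
from collections import Counter


def run_monkeys(heap, rules, max_rounds=100):
    """Let each monkey turn every item it recognises into its product.

    heap  : Counter of item names (the shared trash heap)
    rules : dict mapping an input item to an output item, one per monkey
    Rounds repeat until no monkey finds anything to do (or max_rounds is hit).
    """
    for _ in range(max_rounds):
        changed = False
        for source, product in rules.items():
            count = heap.pop(source, 0)  # grab every X in the heap
            if count:
                heap[product] += count   # put the Ys back
                changed = True
        if not changed:
            break
    return heap
```

For example, `run_monkeys(Counter({"A": 3, "junk": 2}), {"C": "D", "A": "B", "B": "C"})` ends with three D items and the junk untouched, even though the rules are listed out of order. No monkey knows about any other, yet A still ends up as D.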

I think the data-centered design is most similar to the way in which real organisms work, but I'm not sure how it would work in code. I've been playing with these sorts of ideas for a while, but I've never put them together into any sort of presentable, coherent idea.

But I think this is the course to explore. Connecting DNA modules together is a better way of providing conditions.

EricL:
I agree we may have a structural problem and may need changes to better facilitate evolving stable conditional gene logic. In particular, I like the suggestion of working towards some sort of increased atomicity of conditional blocks so that they are both easier to evolve and less fragile in the face of mutations. Connected codules may be the way longer term - I like the connected graph thinking - but perhaps something less drastic can be done short term. Perhaps we could add some new flavors of start commands: for example, an atomic conditional 'start' which operated off the top of the boolean stack. So instead of

cond
x y <
start
blah blah

we could have

x y <
bunch of noncoding junk
condstart
blah blah

We could go further in the direction of atomicity and define other start commands which included the operator and operated off the integer stack: startifequal, startiflessthan, startifgreaterthan, etc. This would allow a single base pair mutation to make a gene conditional instead of requiring the evolution of a cond - start base pair sequence.
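
Here is a minimal Python sketch of how such start commands might be evaluated. It is not the actual DNA interpreter: the token format, the function name `gene_is_active`, and the choice to treat a gene as unconditional when the stacks are too empty are assumptions made for illustration.

```python
COMPARISONS = {
    "<": lambda a, b: a < b,
    ">": lambda a, b: a > b,
    "=": lambda a, b: a == b,
}

# Start commands that carry their own operator and use the integer stack.
ATOMIC_STARTS = {
    "startiflessthan": "<",
    "startifgreaterthan": ">",
    "startifequal": "=",
}


def gene_is_active(tokens):
    """Evaluate the tokens leading up to a start-type command.

    Integers go on the integer stack, comparison operators push their result
    on the boolean stack, and anything unrecognised is skipped as junk.
    Returns True/False at the first start-type command, or None if none.
    """
    ints, bools = [], []
    for tok in tokens:
        if isinstance(tok, int):
            ints.append(tok)
        elif tok in COMPARISONS and len(ints) >= 2:
            b, a = ints.pop(), ints.pop()
            bools.append(COMPARISONS[tok](a, b))
        elif tok == "condstart":
            # Assumption: an empty boolean stack means "always on".
            return bools.pop() if bools else True
        elif tok in ATOMIC_STARTS:
            if len(ints) < 2:
                return True  # assumption: too few operands -> unconditional
            b, a = ints.pop(), ints.pop()
            return COMPARISONS[ATOMIC_STARTS[tok]](a, b)
    return None
```

Under these assumptions, `gene_is_active([40, "junk", 10, "junk", "startifgreaterthan"])` is true however much junk sits between the operands and the start command. That tolerance is what should make the construct easier to evolve.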

The probability of evolving

*.eye5
bunch of noncoding stuff
some positive number
bunch of noncoding stuff
startifgreaterthan
blah blah

is a lot higher than that of evolving

cond
bunch of noncoding stuff
*.eye5
bunch of noncoding stuff
some positive number
bunch of noncoding stuff
>
bunch of noncoding stuff
start
blah blah

Additionally, I would like to see some analysis/investigation done on the current system w.r.t. mutation probabilities and relative frequencies of various flow control commands. An inspection of the 420bp long DNA of a randomly selected bot in my zerobot sim shows 41 cond statements, 36 start statements, 15 else statements but only a single stop statement. It is possible that much of the current problem has more to do with bugs or misbalanced mutation probabilities than with fundamental structural issues in the DNA design.

Numsgil:
I like the idea of a condstart type command. I'm not so much a fan of having 8 or 9 different commands (startiflessthan, etc.). Some of the issue, I think, is that the condition blocks don't mirror the same reverse Polish notation that the rest of the DNA uses.

Ultimately I don't like the artificial segregation between conditions and start blocks. The only real difference between the two now is that conditional and logical operators only work in a cond block, while storage operators only work in a start block.

If we could find a way to cleanly integrate logic and comparison operators into the body of the DNA, we could do away with the flow operators altogether.

For instance, if we run the condition stack and the integer stack at the same time, that might work. We could have store only work if the top value of the condition stack is true. That would let you embed comparison operators right in the DNA.

For instance,

cond
*.nrg 4000 >
start
50 .repro store
stop

becomes:
*.nrg 4000 > 50 .repro store

or even:

50 .repro *.nrg 4000 > store

I would add true and false operators that can push true and false onto the conditions stack, the same way numbers can push flat values onto the integral stack.
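
A small Python sketch of the two-stack idea, to check that both orderings above behave the same. It rests on assumptions of mine: the `ADDRESS` table uses made-up memory addresses, store only peeks at the condition stack rather than consuming it, and an empty condition stack counts as true.

```python
ADDRESS = {"nrg": 1, "repro": 2}  # hypothetical addresses, not the real ones

COMPARE = {
    ">": lambda a, b: a > b,
    "<": lambda a, b: a < b,
    "=": lambda a, b: a == b,
}


def run(dna, memory):
    """Execute a flat token list against memory (a dict of address -> value)."""
    ints, bools = [], []
    for tok in dna:
        if isinstance(tok, int):
            ints.append(tok)                      # flat value
        elif tok in ("true", "false"):
            bools.append(tok == "true")           # flat boolean
        elif tok.startswith("*."):
            ints.append(memory.get(ADDRESS[tok[2:]], 0))  # read a location
        elif tok.startswith("."):
            ints.append(ADDRESS[tok[1:]])         # push a location's address
        elif tok in COMPARE and len(ints) >= 2:
            b, a = ints.pop(), ints.pop()
            bools.append(COMPARE[tok](a, b))      # result goes on the other stack
        elif tok == "store" and len(ints) >= 2:
            location, value = ints.pop(), ints.pop()
            # Assumption: peek at the condition stack; empty means "true".
            if not bools or bools[-1]:
                memory[location] = value
    return memory
```

With energy at 5000, both `["*.nrg", 4000, ">", 50, ".repro", "store"]` and `[50, ".repro", "*.nrg", 4000, ">", "store"]` write 50 to the repro location. With energy at 3000, neither does.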

This entirely eliminates the idea of structural genes, for good or ill. It also makes DNA interpretation by humans a little easier, since the only commands that take a variable number of values from the stack are store, inc and dec.

..........
You could probably add a "make random genome" function to test various probabilities with. It's hard to make blatant assessments from the genome of evolved bots, since evolution is by definition going to screw with the probabilities as it weeds out unsuccessful strategies.

I would point out, also, that your probabilities follow the numeric order of flow commands in the code. For instance, a cond is basically a tipo of some number with a value of 0. start = 1, else = 2, stop = 3. So what you're really seeing is a skewing of results from the initial 0ness of a zerobot. But it could be a program skew too.
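
A rough Python sketch of what such a "make random genome" check could look like. The flow command numbering is the one given above. Everything else is an assumption for illustration: the uniform draw, the function names `random_flow_genome` and `flow_frequencies`, and a genome made only of flow commands.

```python
import random
from collections import Counter

FLOW_COMMANDS = {0: "cond", 1: "start", 2: "else", 3: "stop"}


def random_flow_genome(length, rng):
    """Return `length` flow commands drawn uniformly at random (assumed)."""
    return [FLOW_COMMANDS[rng.randrange(len(FLOW_COMMANDS))] for _ in range(length)]


def flow_frequencies(genome):
    """Count how often each flow command appears, ignoring everything else."""
    names = list(FLOW_COMMANDS.values())
    counts = Counter(cmd for cmd in genome if cmd in names)
    return {name: counts.get(name, 0) for name in names}
```

For comparison, the zerobot counts quoted earlier in the thread (41 cond, 36 start, 15 else, 1 stop) add up to 93 flow commands, so a uniform draw would expect about 23 of each.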
