Author Topic: Genome structure  (Read 6979 times)

Offline shvarz

  • Bot God
  • *****
  • Posts: 1341
    • View Profile
Genome structure
« on: January 02, 2007, 04:29:05 PM »
Yes, again.  I know it most likely won't go anywhere, but I'll put it out for everyone to hear anyway.  Maybe at least some ideas would find their way into the program

Right now we have fully deterministic genome structure: conditions are used to turn genes on and off.  There also was a suggestion to go for probabilistic approach, where genes are executed at random.  Both have advantages that have been discussed previously (do a search yourself).

I have a compromise suggestion, that would keep the possibility of designer-bots being tightly regulated, but at the same time allow for a genome structure that is more "evolvable".  Everyone knows that on practice it is impossible to evolve a bot with a gene that is using a condition.  Genes are either always on or always off with no middle ground.  Gene control is such a tricky thing that it gets broken down very quickly and never evolves from scratch.

So, in my opinion the structure of "condition" needs some simplification and flexibility.  Here is the idea:
We have a single value that defines the probability of the next encountered gene to be executed.  That value is never reset to zero but is modified by the DNA.  
1. Allow commands that increase or decrease the probability. (say "bump" and "drop").  As DNA is scanned each encountered command changes the probability by 10% up or down.
2. Current strong conditions, such as <, >, =, =! etc "bump" the probability if true or "drop" it if false.
3. Mild conditions such as ~= can do the same but to a lesser extent, say by 5%.

Such conditions are much more likely to appear during evolution and are easier to handle for bots.  One of the big advantages of such genome structure is a wider diversity of mutations that it can tolerate.  The fitness of the bot is not going to change drastically, but through a gradient and gradients are very good for evolution, because evolution is good at optimizing, but is bad at coming up with new structures.
"Never underestimate the power of stupid things in big numbers" - Serious Sam

Offline Jez

  • Bot Overlord
  • ****
  • Posts: 788
    • View Profile
Genome structure
« Reply #1 on: January 02, 2007, 07:27:37 PM »
Forgive me if I haven't understood this correctly, I am a bot designer not evolver after all.

Are you saying that instead of having a yes/no (100%)/(0%) probability of a gene or command being activated that it could be a maybe/maybe not (100%down)/(0%up) type of thing?

I'm trying to think of this in terms of how I would design a bot to use it but shan't comment further for now as I'm not sure I've got the right idea yet.
If you try and take a cat apart to see how it works, the first thing you have in your hands is a non-working cat.
Douglas Adams

Offline Numsgil

  • Administrator
  • Bot God
  • *****
  • Posts: 7742
    • View Profile
Genome structure
« Reply #2 on: January 02, 2007, 08:00:13 PM »
I think the primary issue with conditions is that they are a multistep process that has to follow such a rigid structure.  I don't think the issue is with the deterministic nature.  If it was, I would expect to see genes evolve that use the rnd command in the condition.

That we don't see any useful conditions compared with the frequency of useful genes indicates to me a structural flaw, not an operational one.  DNA likes to use one gene techniques.

Another issue I think is the way DNA flows.  Imagine the growth of metabolic processes, for a moment.  Supposedly, they start at the end product and grow various preprocesses.  Evolution would then seem to be good at working backwards from the way regular humans think.  Humans start at A and move to Z.  Evolution seems to start at Z and work backwards to A.

An idea I've been toying with involves using codules as "nodes" in a large graph.  These nodes would have fixed numbers of input and output.  DNA inside these codules could mutate, but there'd also be mutations that change the way the codules interconnect to each other.  For instance, a mutation could change codule A to shoot a -1 shot instead of a -6 shot, or it could change codule A to give control to codule C instead of codule B.

The program flow from codule to codule would be an implicit conditional forking.  I would also add the ability for a "starting" codule to add a prerequisite codule that becomes the "starting" codule.

Anyway, what this really relates to is software architecture.  The if A then B structure humans like, evolution doesn't.  We have to figure out the architctural system evolution likes most, and try to find a middle ground with human authors.

My vote is for either a pipeline, where a gene modifies the contents of a central data object and passes it to other genes in a kind of assembly worker format, or a data centered design, where you basically have several independant processes that update to the same data object (imagine ten monkeys who are specially trained to grab every X they find from a giant trash heap, make it into a Y, and put it back into the trash heap.  Each monkey's X is different, so they basically form a process).

I think the data centered design is most similar to the way in which real organisms work, but I'm not sure how it would work in code.  I've been playing with these sorts of ideas for a while, but I've never put them together into any sort of presentable coherant idea.

But I think this is the course to explore.  Connecting DNA modules together is a better way of providing conditions.

Offline EricL

  • Administrator
  • Bot God
  • *****
  • Posts: 2266
    • View Profile
Genome structure
« Reply #3 on: January 03, 2007, 12:21:50 AM »
I agree we may have a structural problem and may need changes to better facilitate evolving stable conditional gene logic.  In particular, I like the suggestion of working towards some sort of increased atomicity of conditional blocks so that they are both easier to evolve and less fragile in the face of mutations.  Connected codules may be the way longer term - I like the connected graph thinking - but perhaps something less drastic can be done short term.  Perhaps if we added some new flavors of start comnands for example, an atomic conditional 'start' for example which operated off the top of the boolean stack.  So instead of

cond
x y <
start
blah blah

we could have

x y <
bunch of nocoding junk
condstart
blah blah
   
We could go further in the direction of atomicity and define other start commands which included the operator and operated off the integer stack: startifequal, startiflessthan, startifgreaterthan, etc.  This would allow a single base pair mutation to make a gene conditional instead of requiring the evolution of a cond - start base pair sequence.

The probability  of evolving

*.eye5
bunch of noncoding stuff
some positive number
bunch of noncoding stuff
startifgreaterthan
blah blah

is a lot higher than that of evolving

cond
bunch of noncoding stuff
*.eye5
bunch of noncoding stuff
 some postive number
bunch of noncoding stuff
 >
bunch of noncoding stuff
start
blah blah

Additionally, I would like to see some analysis/investigation done on the current system w.r.t. mutation probabilities and relative frequencies of various flow control commands.  An inspection of the 420bp long DNA of a randomly selected bot in my zerobot sim shows 41 cond statements, 36 start statements, 15 else statements but only a single stop statement.  It is possible that much of the current problem has more to do with bugs or mis-balanced mutation probabilties then with fundemental structural issues in the DNA design.
« Last Edit: January 03, 2007, 12:36:39 AM by EricL »
Many beers....

Offline Numsgil

  • Administrator
  • Bot God
  • *****
  • Posts: 7742
    • View Profile
Genome structure
« Reply #4 on: January 03, 2007, 04:44:30 AM »
I like the idea of a condstart type command.  I'm not so much a fan of having 8 or 9 differnt startiflessthan, etc.  Some of the issue I think is that the condition blocks don't mirror the same reverse polish notation that the rest of the DNA uses.

Ultimately I don't like the artificial segregation between conditions and start blocks.  The only real difference between the two now is that conditional and logical operators only work in a cond block, while storage operators only work in a start block.

If we could find a way to cleanly integrate logic and comparison operators into the body of the DNA, we could entirely do away with the flow operators all together.

For instance, if we run the condition stack and the integer stack at the same time that might work.  We could have store only work if the top value of the condition stack is true.  That would let you imbed comparison operators right in the DNA.

For instance,

cond
*.nrg 4000 >
start
50 .repro store
stop

becomes:
*.nrg 4000 > 50 .repro store

or even:

50 .repro *.nrg 4000 > store

I would add true and false operators that can push true and false onto the conditions stack, the same way numbers can push flat values onto the integral stack.

This entirely eliminates the idea of structural genes, for good or ill.  This also makes DNA interpretation by humans a little easier, since the only commands that takes a variable amount of numbers from the stack is store, inc and dec.

..........
You could probably add a "make random genome" function to test various probabilities with.  It's hard to make blatant assessments from the genome of evolved bots, since evolution is by definition going to screw with the probabilites as it weeds out unsuccessful strategies.

I would point out, also, that your probabilities follow the numeric order of flow commands in the code.  For instance, a cond is basically a tipo of some number with a value of 0.  start = 1, else = 2, stop = 3.  So what you're really seeing is a skewing of results from the initial 0ness of a zerobot.  But it could be a program skew too.

Offline Numsgil

  • Administrator
  • Bot God
  • *****
  • Posts: 7742
    • View Profile
Genome structure
« Reply #5 on: January 03, 2007, 05:23:36 AM »
Now that I think of it, you would want store, inc and dec to remove values from the stack even if the condition stack is false.  You just wouldn't do anything with those removed values.

Basically, this makes the DNA 100% deterministic.  You'd always be able to easily decipher any evolved DNA, because you wouldn't have to worry about wether or not a store in a start block gets executed or not.

Offline shvarz

  • Bot God
  • *****
  • Posts: 1341
    • View Profile
Genome structure
« Reply #6 on: January 03, 2007, 11:38:06 AM »
Well, it looks like I'm not the only one who has problems with the current genome structure

I think removing cond command is a good idea.  The conditional functions (<, >, etc) are by definition conditional and there is no need to separate them structurally from the rest of the DNA.

I also like the idea of some higher-order condition/start commands.

I guess the real question is about the mechanism in which these conditional commands would activate the execution of DNA.  Here we have a lot of ideas, one contradicting another.

I have a bit of a problem with this particular approach:
Quote
*.nrg 4000 > 50 .repro store

Looks like this will connect conditions only to the closest store commands, while I think that there is a need to execute fairly large chunks of DNA.  But I may be wrong.  Maybe creating such short-distance connections is the key.  It certainly makes evolution easier.  But I am not ready to do away with the "start" and "stop" commands.

In answer to Jez:

Bot designers like predictability because they can fine-tune the responses of bots.

You would probably like to write code in the way:
"I see a bot, it's not the same species, it has a lot of energy -> Fire a shot".

During evolution all three conditions must have a very strong selective pressure to remain in the gene.  If they don't then they are thrown away and gene is left to fire constantly.  Also, mutations within conditions have a high probability of killing the whole gene.  Insert something like an extra "<" or even just a number in the wrong place and condition outcome turns into "false" and gene is not activated.

For evolution it is easier to manage this situation if the condition is made more flexible.  It is similar to writing code in this way:
"I see a bot -> I consider firing a shot at it (20% probability of firing)
I see a bot -> Maybe I should fire (40% probability of firing)
The bot has a lot of energy -> I am pretty sure firing is a good option (60% probability of firing)
It's not same species -> Even better, I am seriously considering firing a shot (80% probability of firing)
The bot has a lot of energy -> Yeah, looks like firing is the only option ("100% probability of firing)

In this way you can mess up one of these conditions completely and still have a gene that functions at 80% efficiency.  You can mix and match conditions so that some conditions drive the probability of gene execution up, while others drive it down, which allows for some fairly flexible logic.
"Never underestimate the power of stupid things in big numbers" - Serious Sam

Offline Numsgil

  • Administrator
  • Bot God
  • *****
  • Posts: 7742
    • View Profile
Genome structure
« Reply #7 on: January 03, 2007, 01:13:09 PM »
Quote from: shvarz
I have a bit of a problem with this particular approach:

Looks like this will connect conditions only to the closest store commands, while I think that there is a need to execute fairly large chunks of DNA.  But I may be wrong.  Maybe creating such short-distance connections is the key.  It certainly makes evolution easier.  But I am not ready to do away with the "start" and "stop" commands.

This is where codules would come in, I think.  The way codules would work, you could have a short distance condition control the entrance to a codule which itself contains a large amount of DNA.

For instance, suppose you have a codule that looks like this:

A:
*.nrg 5000 > 50 .repro store
*.nrg 500 < 100 .fdbody store

You could control entry to this codule using something like the following:

*.robage 10 > A ifcall

where ifcall calls the codule only if the current condition is true.

So this really only works at peak efficiency in tandem with codules.

Offline Jez

  • Bot Overlord
  • ****
  • Posts: 788
    • View Profile
Genome structure
« Reply #8 on: January 03, 2007, 01:22:37 PM »
Thanks Shvarz,

You are right, I like the - If This then Do That.

Using your example the bot would still have an 80% chance of firing at a friend, I guess you would have to add several 'drop' options as well.

I'm not sure about this approach, the randomness of doing an action, I thought organisms were more defined in their stimulus/reaction.
 I am in full support of changing something in some way to help with the problems that you originally outlined, the problems that the current code causes with mutations, just not yet sure if this would be the best approach.
If you try and take a cat apart to see how it works, the first thing you have in your hands is a non-working cat.
Douglas Adams

Offline shvarz

  • Bot God
  • *****
  • Posts: 1341
    • View Profile
Genome structure
« Reply #9 on: January 03, 2007, 02:17:16 PM »
The current functionality would be preserved for bot designers, although the code would look a lot more messy.  If you want "If this do this" gene, then just repeat all your conditions 10 times, so that the probability of gene execution goes up to 100%. And I'm sure you can come up with some more elegant solutions as well.
"Never underestimate the power of stupid things in big numbers" - Serious Sam

Offline MacadamiaNuts

  • Bot Destroyer
  • ***
  • Posts: 273
    • View Profile
Genome structure
« Reply #10 on: January 06, 2007, 10:21:51 PM »
Mmh, trying to think about a more DNA like coding. Sorry if it's too complex / too long. Maybe you can scrap something useful from all the junk.

Mutated condition and command aminoacids could be generated on the fly from patterns, like:

*### ### >
*### ### = or *### ### !=

### ### store
-# # store

These aminoacids use a number as a nexus point. For example:

@10 *.nrg 3000 >
@10 50 .repro store

Would attach one to another and form the gene 10.

Each aminoacid will store across generations the amount of cicles it has survived without mutations, (it's strongness). Small mutations may decrease this amount, based on the aminoacid lenght. New AAs will make weak genes.

Genes would be duplicated if they must activate several AAs at the same time:

@10 *.nrg 3000 > and *.robage 500 >
@10 50 .repro store

@11 *.nrg 3000 > and *.robage 500 >
@11 314 .aimdx store

Code viewer may hide the duplicated condition for readability. Though this isn't so neat, it lets evolution be more creative about complex behaviours. Maybe that bot will split gene @11 from the *.nrg 3000 > 'event' and use it as a veg spotting gene.

When a bot reproduces asexually, it reads its own DNA and copies each AA. A new DNA will build up from the resulting 'soup' of aminoacids. Small mutations of the same type may happen then, like changing 'and' for 'not', 3000 for 2493, '>' for '<', etc. Medium mutations may add or remove a whole condition.

If the nexus of an aminoacid (the gene number) is mutated and it doesn't match any other present AA, at the end of the reproduction it will be either:

1) cut off
2) filled with a new one from pattern

If two AAs 'compete' for the same spot, then their age is applied. A new mutation will have little chances to pass over a strong gene that has guaranteed survival during lots of generations.

Sexual reproduction will mean they will create one single soup of AAs from both DNA's. This means genes from parents will 'compete' to create the child:

Parent 1:
@10 *.nrg 3000 > (100000 cycles)
@10 50 .repro store (25000 cycles)

Parent 2:
@10 *.nrg 1000 > (80000 cycles)
@10 45 .repro store (500 cycles)

Both conditions and both commands will fight for the spot, matching their respective strongness or weakness. There's always a chance that a very weak aminoacid will pass over a strong one.

'stop' and 'start' conditions would isolate parts of the code that shouldn't be executed unless they are called to make RNA. They won't execute any command.

RNA would be made by reproducing only a certain range of genes. It may be stored in a new membrane or used as temporary code. It will block the real genes if it has got the same numbers. This one would create a fat cell that would stay idle most of the time, wasting almost no energy, maybe until a conspec comes and stores or takes some energy from it:

@12 stop
@12 ...

@13 *.nrg 3000 >
@13 10 .strbody store

@14 *.nrg 2000 <
@14 10 .fdbody store

@15 start
@15 ...

@16 ...
@16 13 14 rnacell

Or maybe the RNA can be thrown away. Then as some other bot picks it, it may be executed until it becomes waste. Perhaps it is a virus, or a toxin protein, or a happiness/fear/sexual signal.

Here's a venom/toxin that will be run only when it's loaded as RNA (active RNA will ignore stops and starts):

@14 *.refeye *.myeye !=
@14 16 17 rnashot

@15 null
@15 stop

@16 *.myeyes 5 != and *.body 100 >
@16 100 .fdbody store

@17 *.myeyes 5 !=
@17 -2 .shoot store

@18 null
@18 start

@19 ...
@19 ...

A RNA virus would be a protein that replaces 'legitimate' ones, forcing the code to make copies of it without actually damaging its execution. Or maybe just a self copying protein that forces its own execution without caring about which genes it blocks. And maybe it can get mixed during reproduction...

Also, instead of using magic -1 shots to digest targets, bots maybe should use a digest protein that forces the target to shoot energy.
« Last Edit: January 06, 2007, 10:24:17 PM by MacadamiaNuts »
Sometimes you win, and sometimes you lose...

Offline Endy

  • Bot Overlord
  • ****
  • Posts: 852
    • View Profile
Genome structure
« Reply #11 on: January 06, 2007, 11:03:43 PM »
Maybe for conditions in the dna, have the value after the next value multiplied by the condition in front. ie.

35 *.eye5 > -1 .shoot store

If *.eye5 is greater than 35 .shoot is multiplied by 1. If not it's multiplied by zero.

If a condition is followed by another condition then that value is multiplied by it's value (0,1).

Offline Numsgil

  • Administrator
  • Bot God
  • *****
  • Posts: 7742
    • View Profile
Genome structure
« Reply #12 on: January 08, 2007, 05:06:53 AM »
I think, Endy, that what you're proposing is still basically a boolean flag.  Just with a less arbitrary way of describing how it works, which I like

I agree that I think using a straight flag instead of a stack is the way to go for conditions.  However, I would still like to see the logical operations of "or" and "and" both be allowed at the very least (xor and not: maybe, maybe not.  I can live without them).

So as I see it, you would need to have store commands either just take the top value from the boolean tack, or "and" together the stack, which really puts us back to the way things currently work.

I think I would favor either a straight flag, which is ANDed with any new conditions, a flag that is ORed with any new conditions (with some "clear" command to set the flag to false), or something similar to the way things work now, with a full boolean stack.

I would prefer an ORing of conditions over an ANDing if it comes to the two (we could implement it as part of the new DNA system I'm working on, so we wouldn't have to worry about backwards compatibility).  However, I just can't get past the point that there are going to be times when you'll want to AND and other times where you'll want to OR, so I'm thinking there isn't really a way to get rid of the boolean stack.

Maybe something like this: have store commands use the top value of the stack, with no automatic ANDing or ORing.  This would favor the last condition to be evaluated, which would make it easier for mutations I think.  Longer conditional chains could be ANDed and ORed together.

@Macadamia - I think moving away from treating the DNA as a strand of undifferentiated beads is the wrong direction.  Aside from issues of DNA structure rigidness, alot of what you're trying to see when you run an evosim is what sort of novel genes can arise from mutations that do not respect or understand gene structure.

For instance, suppose a bot inserted the condition *.nrg 5000 > into a gene's condition.  Is that really all that exciting?  Compare this to something like:

*.body 23 div * 28 100 mult >

Which probably doesn't do anything (I just made it up) but if it did do something, would be the far more interesting mutation.  I think it's far more interesting to have valid genes suddenly appear in the frothy soup of stands of randomly assorted base pairs, than it is for a mutation to luck out and produce a fully functional anything in a single mutation.

That said, it might be interesting if we could figure out a way to build a "metaDNA".  Something that would be able to scan through the Bot's DNA, correcting errors or inserting new conditions in places, itself subject to mutations.  But I have no idea how something like that would work.

Offline Endy

  • Bot Overlord
  • ****
  • Posts: 852
    • View Profile
Genome structure
« Reply #13 on: January 08, 2007, 09:22:27 PM »
I've been taking a look at the Evolve4.0 dna structure, since they seem to have something like this already. I do like how they can have conditions daisy chained to each other ie.

'A
start
Some dna here
50 *60 >
stop

'B
start
More dna here
*74 30 >
stop

'C
start
Even more dna
stop

C is activated if the condition in B is true, yet B is only activated if the condition in A is true.

Still a bit rigid in terms of having to call genes to activate them(if not conditionally activated). Seems like it'd be more evolvable if they were always activate like DB's system.

I'm going to take a look at some of their evolved critters and see how the code is currently evolving.

Offline MacadamiaNuts

  • Bot Destroyer
  • ***
  • Posts: 273
    • View Profile
Genome structure
« Reply #14 on: January 09, 2007, 08:06:07 AM »
Mmh, maybe if just stores didn't work without a true in the top of the stack bots wouldn't get rid of conditions.

So:

0 0 =

Leaves a 1 in the stack. Then the next store would check that first value and if true, would store the second one on the third memory location, leaving the 1 in the stack for the next store.
Stop would clear the true value.

Bots would be able to use almost anything as a condition:

'*.robage start 1 .deltie store' would be activated when '*.robage = 1'

'*.eye5 not' would be the same than 'cond *.eye5 0 ='

'*.tiepres sgn' would work like 'cond *.tiepres 0 !='

And so on. That's 2 expression conditions instead of 4.
Sometimes you win, and sometimes you lose...