Author Topic: New Mutation Paradigm  (Read 7202 times)

Offline Numsgil

  • Administrator
  • Bot God
  • *****
  • Posts: 7742
    • View Profile
New Mutation Paradigm
« on: May 25, 2005, 11:39:10 PM »
Okay, after looking over Greven's Post and how Avida does mutations.

Here's how the current DNA works (this is important)

Each unit is comprised of a type and a value.  Type is:

0 for numbers
1 for *numbers
2 for commands (add, etc.)
3 for conditions (=, etc.)
4 for flow (cond, stop, etc.)
5 for logic (and, or, etc.)

value determines which command or number it is.

Okay, here's a new system I've come up with:


First off, we'd need to come up with a way for the user to create a custom gaussian distribution, where he can decide what values the standard deviation includes (taht is, so he can determine the range for 99% of the generated values).


Mutations aren't limited to reproduction only, nor are they limited to children during reproduction.

1.  Point mutations (changing of ~1 command) can occur at any time in a bot's life.  It can change either the type or the value of a unit.  Which one is effected is determined by a probability slider the user can set.  If you set it more towards value, than type is less likely to mutate.

Changes in type always move  up or down by 1, or perhaps some user defined value of 1 to X).  Modular arithmetic is used, so a type of 5 + 1 = 0.  and 0 -1 = 5.

Changes in value are set by 2 custom gaussian curves.  One for number and *number, and one for the rest.

The number of places effected are determined by a custom gaussian curve.


During reproduction, much more massive changes are possible.

2. Copy Error mutation.  This is just like a point mutation in effect, but it's probalities and distributions are (or atleast can be, if the user so desires) different and only occurs during reproduction.

3.  Extra Copy Mutation.  This copies a segment of DNA X units long starting at A and inserted at B.  A and B are determined as a number between 1 and the genome length.  X is a custom gaussian curve.

4. Deletion.  Starting at A a segment of DNA X units long is deleted.  A is determined randomly between 1 and the genome length.  X is a custom gaussian distribution.

5.  Reversal.  Starting at A, X units of DNA are reversed.  That is, if the DNA is 5 6 > start, then it becomes start > 6 5.  A is between 1 and genome length, X is a custom Gaussian distribution.

6.  Insertion.  Starting at A, X units of DNA are inserted.  The type is determined by a special 6 way slider between all the types that the user can set.  The value is set by a linear distribution of all existing values for that type.

These mutations can affect either parent or child.


Mutations which result in a non existant command will show up as number@number (or some other code), with a toggle to turn off display of all Junk in most DNA viewing windows and recording areas.

The beauty of this is that as we add stuff to the program, we don't have to modify the mutations code.  It's all set up to handle whatever we throw at it.

I'd probably create a nested command window, so alot of the custom gaussian curves, etc. are hidden unless you look for them.

On the surface you just set the 6 types of mutations' probabilities.  That's even easier than it currently is.

There'd also probabaly be a way to save mutation rates seperately from settings, so you can just import a settings file directly for a new species you've added.
« Last Edit: May 26, 2005, 01:49:32 AM by Numsgil »

Offline Numsgil

  • Administrator
  • Bot God
  • *****
  • Posts: 7742
    • View Profile
New Mutation Paradigm
« Reply #1 on: May 26, 2005, 11:56:37 PM »
I know it's alot to read over, but I'm kind of itchy to get started on this one (it's a fun kind of work, since it's redesigning over inventing).

If anyone has any thoughts one way or another...

The only thought I had was maybe using modular arithmetic for all the types.  So *1001 = *1.  But that might change the way things like numbers are mutated into different type commands and then back.

Offline Carlo

  • Bot Destroyer
  • ***
  • Posts: 122
    • View Profile
New Mutation Paradigm
« Reply #2 on: May 27, 2005, 08:16:32 AM »
Quote
First off, we'd need to come up with a way for the user to create a custom gaussian distribution, where he can decide what values the standard deviation includes (taht is, so he can determine the range for 99% of the generated values).

Ok, this should be the basis of every correct mutation systems.
But I think this should only apply to immediate values and maybe pointers. I can't understand, for example, why an add command, to become a sqrt command, should pass by a sub and a mult command. Hope you're getting the point. Graduality in immediate values is (often) obvious.  
5 .up store should be more likely to change in
6 .up store than in
500 .up store.  

(but often is not: there is no apparent relation between -1 .shoot store and -2 .shoot store or 2 .shoot store  ... and this is a big problem)

Same thing don't applies to instructions and even pointers. Is aimsx more similar to dx or to shoot? Who knows? Does the question even makes sense?

Another problem. Think a the eyeN values. They go from 0 to 100. Now take the refnrg value. It goes from 0 to 32000. Take the shootval value (memory shots): makes sense from 0 to 1000. Take the up, dn etc commands. They make sense (maybe) from -30 to 30.
This is a BIG problem. Mutations should be able to create new functionalities by mixing the code at random. But what can happen if I take a .refnrg value and put it in .dn? Nothing useful. Because the range of values offered by .refnrg is completely different from that accepted by .dn. What happens if the code

"5 . up store"? is mutated in "5 .repro store"? Nothing useful, again. A repro with a 5 value is nearly useless.

I think that these problem are deep in the structure of darwinbots.
A solution for the second problem could be this one: writing in the sysvars file the range of each var. You'd have

1 .up  -20  20
2 .dn  -20  20
....
5  .aimdx  -600 600
...
501 .eye1 0 100

this way you would be able to normalize each value read or written to a var to an universal range... and have much more meaningful mutations.



Quote
Mutations aren't limited to reproduction only, nor are they limited to children during reproduction.

This again it's ok for me. Correct from the biological point of view. The only doubt I have about this is that it will likely introduce something like a maximum life span: because if you have, say, one point mutation every 100 dna positions every one thousand cycles on average, this makes it unlikely for a robot to live for arbitrarily long time (death by aging was one of the possibilities I wanted to leave open to evolution).
But on the other hand, robots could evolve multiple copies of the fundamental genes to avoid this problem.

Quote
Changes in type always move up or down by 1

This makes no sense, there is no reason for that. There is no reason for instructions to be "nearer" to pointers than immediate values are.

Quote
During reproduction, much more massive changes are possible.

I agree on that and on the mutation types you proposed

Quote
Mutations which result in a non existant command will show up as number@number (or some other code), with a toggle to turn off display of all Junk in most DNA viewing windows and recording areas.

The beauty of this is that as we add stuff to the program, we don't have to modify the mutations code. It's all set up to handle whatever we throw at it.

No. I perfectly understand what you mean, but keep in mind that those will always be useless mutations until you insert new commands into the dna interpreter. A useless mutation is a mutation that:

- either leave unchanged the functionality of the dna so that is in fact a non-mutation
- makes the code non executable or strongly disfunctional, so that you'll waste cpu cycles just for bringing the new mutated robot to death

Moreover, you have to keep in mind that you cannot add new commands AFTER the mutations are in the dna, because each mutation has to be positively selected by its effects. Introducing new commands after the mutations are in the dna means changing abruptly the way dnas a re interpreted. And that's no good.

So, I think that the best thing to do is allowing the mutations routines to produce only the actual instructions. Simply code them with numbers going from 1 to max and, when you introduce the (n+1)th instruction, change the max to n+1.
« Last Edit: May 27, 2005, 08:52:15 AM by Carlo »

Offline shvarz

  • Bot God
  • *****
  • Posts: 1341
    • View Profile
New Mutation Paradigm
« Reply #3 on: May 27, 2005, 11:05:15 AM »
Agree with Carlo on most of things.

One thing that somewhat bothers me is the mutation during life of a single bot.  I know it is somewhat related to real biology, but I don't like it.
"Never underestimate the power of stupid things in big numbers" - Serious Sam

Offline Numsgil

  • Administrator
  • Bot God
  • *****
  • Posts: 7742
    • View Profile
New Mutation Paradigm
« Reply #4 on: May 27, 2005, 05:44:02 PM »
You should be able to turn off any mutation types you don't want to play with.

Types only mutating by +-1 was just to help self correcting mutations.  That is, there's a 25% chance of 2 type mutations going back to what they were originally.

Anyway, the range of +- should be user definable, so you can set it to whatever you want.  The idea is to have so many things be user definable that you can run any kind of mutation rules you want.

You misunderstood my last statement.

Quote
Mutations which result in a non existant command will show up as number@number (or some other code), with a toggle to turn off display of all Junk in most DNA viewing windows and recording areas.

The beauty of this is that as we add stuff to the program, we don't have to modify the mutations code. It's all set up to handle whatever we throw at it.

The first section (number@number) means that the mutations code has the ability to mutate any commands at all, even ones that don't exist.  Then, when we add new commands, the mutations code within DB doesn't have to be modified by increasing the range of N.

The number@number that corresponds to the new code is coincidence, and I agree that it doesn't make any sense from an evolutionary standpoint.  It hasn't been selected for.  You'd probably want to eliminate JunkDNA (in this case I mean number@number) from a bot you're porting to a newer version.

Also, number@number lets, say, a 5200 mutate into a 5200@3.  Then later mutate into *5200, then maybe back to 5200.  It means that type mutations don't modify the value, which is important.  5200 only makes sense as an actual number, so for it to mutate into anything worthwhile it would need to change from 5200 into, say, 9, which would take quite a few mutations of value.

I agree that ranges in DB are rather arbitrary.  One solution is to apply modular arithmetic to all commands.  So we do like you say and say .up can only be -20 to 20.  Then a command of 1041 .up store would be the same as 1 .up store.

That way 32000 still works for different commands.  It doesn't solve all problems, but it does solve quite a few.
« Last Edit: May 27, 2005, 05:45:57 PM by Numsgil »

Offline Carlo

  • Bot Destroyer
  • ***
  • Posts: 122
    • View Profile
New Mutation Paradigm
« Reply #5 on: May 27, 2005, 07:43:57 PM »
Are you saying that you may have something like

functional code (---mutation--->) junk dna (----mutation--->) functional code again?

Well, this is a possibility. I don't know whether it would improve things or make them worse. But it surely would make dna more unreadable then they are now. And, it would make it much harder to use evolved dnas from previous versions (you'd have to split off all the junk first). Having a mutation routine which don't have to take in account the actual language (in fact, just an integer parameter...) would be of little comfort.

But I think you're right to some degree. Changing, say, values to pointers and vice versa could be a good thing. But changing instructions to values it's not, because the information you keep just changing one part of the (type, value) couple it's simply meaningless. Like in the case of different ranges in sysvars, the value part of, say, an add instruction has simply no *particular* meaning as a pointer or an immediate value. The only way to become useful is to mutate again in the original type, which is very, very improbable.

So, I think we should take in account all these things and design mutation routines which act in a very very arbitrary way, trying to make changes which could make some sense. I think this is necessary due to the way the DB language is designed now. My fault.

OR...

Or we could completely redesign the DB code to be easier to mutate. Less memory locations. Little, defined and universal ranges for variables. Single effects (no more -1 shots, -2 shots, 0> shots, etc.). Simpler structure (sparse conds, etc.). Etc.
« Last Edit: May 27, 2005, 07:45:59 PM by Carlo »

Offline Endy

  • Bot Overlord
  • ****
  • Posts: 852
    • View Profile
New Mutation Paradigm
« Reply #6 on: May 27, 2005, 11:06:37 PM »
I personally think we should just expand DB as much as possible as long as the users have a say in what actually happens during evolution. I'm not sure that we should be too worried about "lethal" mutations being possible. These still naturally occur with a fair amount of frequency in real beings, so why not in DB's?

On a side note we don't really need to worry about the up/dn commands the cost is limited to the amount used, so a bot isn't going to be charged massive amounts of nrg anymore.

Either way sounds cool, I'm sure you guys will hash something good out.

Endy B)

Offline Numsgil

  • Administrator
  • Bot God
  • *****
  • Posts: 7742
    • View Profile
New Mutation Paradigm
« Reply #7 on: May 27, 2005, 11:55:32 PM »
Real DNA works by triplets.  These are then translated into amino acids.  So each of these triplets is basically a command.

So our 2 integer unit is very similar to this triplet of real DNA.

Now, real DNA only has like 64 possible codes.  Since some complex behavior is hard coded into the DB language, we have a whole lot more.

But the relationship between our commands and real genes are very similar.  That is, a single letter change in DNA can result in a new amino acid and protein with absolutely no similarity to the original in function.  So if our add command mutates into a *.nrg, it doesn't seem to make alot of sense to us but who knows how the bots will use them.

The only real problem I see is that there might end up being alot of number@numbers floating around in a genome with just about 0% chance of mutating into anything that does anything.  In fact, since there are alot more commands that do nothing than commands that do something, it's feasible that there will be a huge influence for all commands to revert to number@numbers.

So we may want to consider setting a specific limit on what kinds of values all sysvars return and use.  Say, nothing above 1000.  xpos and ypos could return egrid squares, so their values would be between 0 and 800.  Etc. Etc.  The hardest problem is nrg and body.  I'm not sure what to do there.

Interesting problem.

Offline Carlo

  • Bot Destroyer
  • ***
  • Posts: 122
    • View Profile
New Mutation Paradigm
« Reply #8 on: May 28, 2005, 03:32:27 AM »
Quote
So if our add command mutates into a *.nrg, it doesn't seem to make alot of sense to us but who knows how the bots will use them.

-------
I think there are two points in what you say. The first, is that a point mutation should involve only one of the two values making a dna position, that is "type" or "value".

But this way, add wouldn't mutate into *.nrg, but _always_ in one of the .up or .dn (I don't remember which code represents the add instruction, but is something really near to 0, 1 or 2). You would have mutations unbalanced in favor of the first 10 sysvars or values ranging from 0 to 10 or so.

You cited triplets. Look: you have 4 types of dna pos, and 32000 possible values. 4 means two bits, 32000 means 16 bits. So each pos is represented by:

oo'oooooooooooooooo
^               ^
type        value

which makes eighteen bits. Now, if we like to consider this a triplet, we have to split it this way:

oooooo-oooooo-oooooo

this means that mutating the first part means mutating the type (first two bits of the first part of the triplet) together with the value (six more significant bits of the value part). So, if we split it in equal parts (like a triplet is), we have to mutate types and values together. Same thing if we split it in two equal parts.

Therefore, I think that mutating only type or only value in a dna position makes, in general, no sense.

---------
Your second point is about letting mutations produce junk dna, which means unexistant instructions, sysvars, and so on. This, as I already said, means filling up dnas with tons of junk dna. But junk dna is unlikey to become useful again: as I told you, the instruction coded by the number 500 (possible if you let type change without changing the value) will never become, through subsequent (little) mutations, a valid instruction. It may at most go back to where it came from, that is, to an immediate value or to a pointer, through another mutation. This is very improbable (if a dna position has a chance over 1000 to mutate, it has a chance over 1000000 to mutate two times - not to mention it has to mutate the right way). But if the overall result is to have converted an immediate value to a pointer or vice versa, what was the first mutation for? Let's just say that immediate values can only change to sysvar OR be substituted by a completely different instruction.

I see it this way: we should have

a generator of random meaningful sysvars
a generator of random meaningful values
a generator of random meaningful instructions

and point mutating a pos would mean just calling one of these generators and put the (type, value) they return in the pos we want to mutate.

Offline Endy

  • Bot Overlord
  • ****
  • Posts: 852
    • View Profile
New Mutation Paradigm
« Reply #9 on: May 28, 2005, 04:21:15 AM »
It's going to be hard deciding what is "meaningful". Even large numbers could become useful with enough div/sub or vice versa small numbers could become large with mult/add.

Would it be possible to compromise and have an "off" button for the limits on mutations? This should be a viable solution for everyone. We could also simply test the two types and decide which works better.

Okay, enough with mutations, time to play with the Wiki :)

Endy B)

Offline Numsgil

  • Administrator
  • Bot God
  • *****
  • Posts: 7742
    • View Profile
New Mutation Paradigm
« Reply #10 on: May 28, 2005, 05:32:12 PM »
Definately pointed out the two major problems Carlo.

I'll leave type to type mutations unworked out for now.  Most problems seem to stem from that.  However, number -> *number and back again should probably still be possible.

Offline Endy

  • Bot Overlord
  • ****
  • Posts: 852
    • View Profile
New Mutation Paradigm
« Reply #11 on: May 28, 2005, 08:55:34 PM »
Would it be possible to just have a readback command? Something like:

ref 50 could readback the value of 50

Not sure how hard it'd be...

Offline Numsgil

  • Administrator
  • Bot God
  • *****
  • Posts: 7742
    • View Profile
New Mutation Paradigm
« Reply #12 on: May 28, 2005, 10:49:18 PM »
I was thinking something like that.

Have a * command, so that

*50 is the same thing as 50 *.

Offline shvarz

  • Bot God
  • *****
  • Posts: 1341
    • View Profile
New Mutation Paradigm
« Reply #13 on: May 29, 2005, 01:20:38 AM »
Good idea with the * command.

I was thinking about your discussion of junk DNA...  

First of all, I want to comment on Carlo's notion that we already have junk DNA.  This is not exactly true, becasue true junk DNA is not expressed at all, while any "junk code" that evolved bots acquire actually does affect bot's behaviour - at the very least it places some numbers on the stack, which may affect real genes later on.  So, when I am talking about junk DNA, I mean DNA that is not processed by program at all.

Originally I thought, just like Nums did, that having junk DNA would be very beneficial for the same reason it is beneficial in real organisms: duplicate gene >> turn one copy off >> mutate for a while>> turn it back on>> got a new protein.

But then I thought about it some more...  In DBs we are forced to use relatively high mutation rates, becasue we don't have time to wait for millions of generations for something good to appear and our population size is several hunderds at most.  At these mutation rates, any inactive gene will go into complete mess very quickly and is unlikely to become anything even remotely useful for a bot.

So now I don't have a very strong opinion on this.  My inclination would be to allow junk DNA and see where it goes.  If it becomes a huge burden on processor resources and not too useful, then turn it off.  But it may turn out that all said above was wrong and having junk DNA would actually be useful.
"Never underestimate the power of stupid things in big numbers" - Serious Sam

Offline Endy

  • Bot Overlord
  • ****
  • Posts: 852
    • View Profile
New Mutation Paradigm
« Reply #14 on: May 29, 2005, 03:10:43 AM »
I was kind of thinking that the * command would be used for changing the location referenced, currently the bots are pretty much stuck with whatever *locations they have at birth. Should be good from both a combat and mutation standpoint.

Would make some great conspecific recognition genes possible, a bot could litterally cycle through all the ref/myvars making worries about misrecognition a thing of the past. :)

For mutations it would help to neutralize some of the higher/unused numbers by returning values of zero for most of them.

Endy B)
« Last Edit: May 29, 2005, 03:12:38 AM by Endy »