Author Topic: Recombination mechanism  (Read 10454 times)

Offline shvarz

  • Bot God
  • *****
  • Posts: 1341
Recombination mechanism
« on: June 09, 2005, 03:49:22 PM »
The true fault of the current sexrepro system in DBs is not really the choosing of mates, or staying together for a long enough time, or the lack of diploidy - all of these are circumstantial to sexual reproduction.  The real advantage of sex is recombination - that is, the re-assortment of different regions of DNA.  Recombination allows getting rid of bad mutations, escaping mild forms of Muller's ratchet, and joining two good mutations that arose independently.

Recombination is the true blessing of sex and our poor bots are missing out on it!  

Right now during sexrepro genes are simply mixed up randomly: gene 1 from bot 1, gene 2 from bot 2, gene 3 from bot 1 and so on.  The problem with this approach is that the program does not actually compare the genes, so if one of the bots has an extra gene inserted, then the product of this "recombination" will be a truly messed up bastard.  In real life there are mechanisms that align DNA and make sure that the exchanged pieces of DNA are at least remotely similar.

Another problem is that duplicated genes can't recombine with each other.  Here is what often happens in real life:  you start with 1 gene, then duplicate it.  Now you have copies A and B.  Since the two copies are similar, during sexual reproduction you can align copy A with copy B:

DNA 1: A-----B
DNA 2: ------A-----B

After recombination:
DNA 1: A----A----B
DNA 2: ------B

This is a very important mechanism for duplicating genes, re-assorting mutations and activating non-functional copies of genes.  And our bots are missing out on this too!

So, what can we do?  We can't just align DNA in DBs by searching for similar regions.  The algorithms for that are so complex that it is impossible to do in real time (not until quantum computers appear).

I think I have a solution:
When DNA is loaded into the program, we introduce specific unique markers into its structure after every command.  Something like random 16-digit numbers (this can be done behind the scenes, so that we can still look at the DNA).  These numbers cannot be mutated, but they can be duplicated/deleted along with the DNA that surrounds them.  During sexual reproduction, the newly formed DNA starts from one of the parents and with some probability it switches to the DNA of the other parent.  During the switch, the program looks for the unique identifier at the place of recombination in the DNA of the other parent.  If there is none, then it returns to the first parent; if there is one, then it continues from the spot where that identifier is; and if there is more than one, then it picks one of them randomly and continues from there.
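
A rough sketch of how the marker idea above might look, in Python rather than the DarwinBots source.  Everything named here (`tag`, `recombine`, `switch_prob`, `max_len`) is made up for illustration, a simple counter stands in for the random 16-digit numbers, and the length cap is an added safety assumption for DNA with duplicated markers:

```python
import random
from itertools import count

_uid_source = count(1)  # stand-in for the random 16-digit numbers in the post


def tag(commands):
    """Attach a fresh marker to every command when DNA is first loaded."""
    return [(next(_uid_source), cmd) for cmd in commands]


def recombine(first, second, switch_prob=0.1, rng=random, max_len=None):
    """Copy from one parent, sometimes jumping to the same marker in the other."""
    if max_len is None:
        # Assumed cap so that duplicated markers cannot grow the child forever.
        max_len = len(first) + len(second)
    current, other = first, second
    pos, child = 0, []
    while pos < len(current) and len(child) < max_len:
        uid, cmd = current[pos]
        child.append((uid, cmd))
        pos += 1
        if rng.random() < switch_prob:
            matches = [i for i, (u, _) in enumerate(other) if u == uid]
            if matches:
                # One match: continue there. Several: pick one at random.
                pos = rng.choice(matches) + 1
                current, other = other, current
            # No match: stay on the current parent.
    return child
```

Markers travel with their commands, so two bots descended from the same loaded DNA still share markers after insertions or deletions, and the switch always lands on the corresponding spot.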

This would essentially allow recombination to happen, along with all the advantages it brings.  What do you think?
"Never underestimate the power of stupid things in big numbers" - Serious Sam

Offline Numsgil

  • Administrator
  • Bot God
  • *****
  • Posts: 7742
« Reply #1 on: June 09, 2005, 04:10:23 PM »
I was thinking of using centromeres to align two chromosome threads.

The centromeres are defined for a specific spot on the DNA of two strands, with crossing over occurring based on that centromere as a reference point.

I talked about this in the junk DNA thread just now and I had a post in suggestions not too long ago.

I think crossing over (the current version doesn't count for this stipulation, since it's more like bacterial gene transfer anyway) should only be allowed between chromosomes that are bound together with a centromere.

Offline shvarz

  • Bot God
  • *****
  • Posts: 1341
« Reply #2 on: June 09, 2005, 04:26:29 PM »
It does not solve the problem of crossing over non-similar sequences.  And it does not allow non-reciprocal recombination (between different copies of the same gene).
"Never underestimate the power of stupid things in big numbers" - Serious Sam

Offline Numsgil

  • Administrator
  • Bot God
  • *****
  • Posts: 7742
« Reply #3 on: June 09, 2005, 04:51:50 PM »
Well, centromeres are unlikely to develop between dissimilar chromosomes.  If they do, the chromosomes will eventually become similar through crossing over.  Like shuffling a deck of cards.  If I have one deck of all red, and another of all black, after enough shuffles they'll become randomly distributed into red and black.

That is, if you cross over from chromosome A 50 units from its centromere, and do the same for B, it doesn't matter if they're similar or not.  If they're not similar, the result is probably a mess.  If they are identical, you get no effect.

As for non-reciprocal recombination, no, it doesn't address it.  Give me some time to think about it (I'm tired right now).

Offline Numsgil

  • Administrator
  • Bot God
  • *****
  • Posts: 7742
« Reply #4 on: June 09, 2005, 05:32:01 PM »
From what I can tell, non-equal crossing over is the exception instead of the norm.  That is, it's a mutation.

I was thinking of adding a portion to the mutations tab for crossing over errors.  This would go in there.  You could set how far off the length can be, and the chance of such an error occurring each time crossing over occurs.

As far as I can tell, this unequal crossing over is primarily between neighbor areas.  That is, you won't get a massive slip in the chromosome pairing very often.
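
To make the two proposed settings concrete, here is a small Python sketch.  The names `error_chance` and `max_slip` are hypothetical stand-ins for whatever the mutations tab would expose, and the strands are plain lists of instructions:

```python
import random


def crossover_points(len_a, len_b, error_chance=0.05, max_slip=2, rng=random):
    """Pick cut points on two strands: normally equal, occasionally slipped."""
    cut_a = rng.randrange(0, min(len_a, len_b) + 1)
    cut_b = cut_a
    if rng.random() < error_chance:
        # Unequal crossing over: the second cut slips by a few neighbouring units.
        slip = rng.randint(-max_slip, max_slip)
        cut_b = min(max(cut_a + slip, 0), len_b)
    return cut_a, cut_b


def cross(a, b, **settings):
    """Swap the tails of two strands at the chosen cut points."""
    cut_a, cut_b = crossover_points(len(a), len(b), **settings)
    return a[:cut_a] + b[cut_b:], b[:cut_b] + a[cut_a:]
```

Keeping `max_slip` small matches the point that slips are mostly between neighbouring areas.  Whatever the slip, the two children together still contain every instruction of both parents; one gains what the other loses.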

Although occasionally the chromosomes will split on different ends of the centromere.  But that's a massive mutation that rarely produces meaningful results.

Offline shvarz

  • Bot God
  • *****
  • Posts: 1341
« Reply #5 on: June 09, 2005, 06:14:47 PM »
Yes, it is a mutation, but it is such a good, nice mutation that it was the direct cause of many-many-many cool things evolving.  Like most of our immune system, or taste and smell receptors and so on...  And that's just in eukaryotes.  

The problem with counting from the centromere is the same as counting from the beginning of the DNA - it does not solve anything.  A single insertion or deletion and the bots can't recombine anymore.  Besides, you are still thinking in terms of recombining genes, but my method allows recombination to occur anywhere, and it will almost always produce a meaningful working bot.  In addition, it will allow repairing mutations using the second strand as a template if we decide to go diploid.
"Never underestimate the power of stupid things in big numbers" - Serious Sam

Offline Numsgil

  • Administrator
  • Bot God
  • *****
  • Posts: 7742
« Reply #6 on: June 09, 2005, 06:28:26 PM »
Quote
A single insertion or deletion and the bots can't recombine anymore.  Besides, you are still thinking in terms of recombining genes, but my method allows recombination to occur anywhere, and it will almost always produce a meaningful working bot.  In addition, it will allow repairing mutations using the second strand as a template if we decide to go diploid.
I'm thinking in terms of breaking off whole areas of the chromosome and reattaching them elsewhere, regardless of genes altogether.

As you insert a new condition in one chromosome, the strands become unequal and an unequal crossover occurs ipso facto.  The centromere, no matter where it is spatially at first, will eventually end up more or less in the middle since the DNA is shuffled around it.

If nothing else, the centromere system solves other problems as well as recombination, and it's older.  From a few simple rules I can construct most of the rules of eukaryotes.

Centromeres allow chromosomes to be paired within the genome.  Centromeres can develop naturally from repetitive sequences.  Centromeres allow chromosome pairing to be epigenetic.  Centromeres give similar chromosomes an incentive to become paired, since that way they are sure to move to opposite ends during mitosis.  And they allow crossing over to occur by simply breaking both strands and swapping pieces.

ie:

chromo1:
1-1-1-1-1-1-1

chromo2:
2-2-2-2-2-2-2

after one crossing over event:

1-1-1-2-2-2-2
2-2-2-1-1-1-1

It doesn't even have to occur at a gene break point.  It can occur anywhere.  If they're unequal lengths:

1-1-1-1-1-1-1-1
2-2-2-2-2-2-2

after one crossing over event:

2-2-2-2-1-1-1-1
1-1-1-1-2-2-2
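
The break-and-swap in the diagrams above could be sketched in Python like this.  The function name, the way a centromere is stored (an index into the strand), and the `offset` parameter are all assumptions for illustration, not existing DarwinBots code:

```python
def cross_from_centromere(a, cen_a, b, cen_b, offset):
    """Break both strands `offset` units past their centromeres and swap the tails."""
    cut_a = min(max(cen_a + offset, 0), len(a))
    cut_b = min(max(cen_b + offset, 0), len(b))
    return a[:cut_a] + b[cut_b:], b[:cut_b] + a[cut_a:]


def show(strand):
    return "-".join(strand)


# Equal lengths, centromere at index 3, break right at the centromere.
first, second = cross_from_centromere(["1"] * 7, 3, ["2"] * 7, 3, 0)
print(show(first))   # 1-1-1-2-2-2-2
print(show(second))  # 2-2-2-1-1-1-1

# Unequal lengths (8 and 7), break one unit past the centromere.
first, second = cross_from_centromere(["1"] * 8, 3, ["2"] * 7, 3, 1)
print(show(first))   # 1-1-1-1-2-2-2
print(show(second))  # 2-2-2-2-1-1-1-1
```

Because the cut is measured from the centromere, extra DNA on the far side of the centromere does not shift the break point.  An insertion between the centromere and the break still would.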

Offline Numsgil

  • Administrator
  • Bot God
  • *****
  • Posts: 7742
« Reply #7 on: June 09, 2005, 06:51:33 PM »
If I can add to my last post:

I was going to model crossing over on the Holliday Model.

Offline shvarz

  • Bot God
  • *****
  • Posts: 1341
« Reply #8 on: June 09, 2005, 07:13:01 PM »
Centromeres are a nice idea.  And they can work together with my UID (unique ID) system in parallel.

I still think UID system is a lot more powerful, because for recombination it allows all that your system does and then some more.  

Imagine that two people start evolving the same bot, then share through internet.  By that time genomes changed a lot, new genes were introduced, mutations accumulated, some commands got deleted.  Your system of simply counting off a certain number of commands and switching will produce only non-viable bots.  My system will allow them to recombine and almost always create a viable bot.

Another example:
imagine a series of sequential commands:

a-b-c-d-e-f-g

Say one of the bots got an insertion (I) that really increases its fitness

A-B-I-C-D-E-F-G

Bots are reproducing sexually, so that this bot must mate with an old-style bot.  Your system creates a number of different bots with this insertion, with general types like this:

a-B-I-C-D-E-F-G (alive)
A-B-I-d-e-f-g (dead)
A-B-I-C-D-f-g (dead)

Basically, the only survivable bots are those that crossed over before the insertion.  Any time after - and the whole gene is messed up.  Unless cross-over happens in junk DNA.

My system will always make a viable bot, because it aligns the DNA correctly.  It does not introduce any additional insertions/deletions during reproduction, so the chances for the offspring to be viable are much, much higher!
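
The position-counting case in this example can be reproduced with a few lines of Python.  This is only an illustration of the example strings above; `positional_cross` and `missing_steps` are invented names:

```python
def positional_cross(first, second, cut):
    """Take the first `cut` commands from one parent, the rest from the other."""
    return first[:cut] + second[cut:]


def missing_steps(child, pathway="abcdefg"):
    """List the steps of the original command series the child no longer has."""
    present = {cmd.lower() for cmd in child}
    return [step for step in pathway if step not in present]


old = "a-b-c-d-e-f-g".split("-")
new = "A-B-I-C-D-E-F-G".split("-")

for cut in (3, 5):
    child = positional_cross(new, old, cut)
    print("-".join(child), "missing:", missing_steps(child))
# A-B-I-d-e-f-g missing: ['c']
# A-B-I-C-D-f-g missing: ['e']

child = positional_cross(old, new, 1)
print("-".join(child), "missing:", missing_steps(child))
# a-B-I-C-D-E-F-G missing: []
```

Counting positions after the insertion drops one command from the series each time.  Crossing in the other direction at the same kind of spot duplicates a command instead.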

P.S: Holliday model assumes you have aligned the DNA first.
"Never underestimate the power of stupid things in big numbers" - Serious Sam

Offline Numsgil

  • Administrator
  • Bot God
  • *****
  • Posts: 7742
« Reply #9 on: June 10, 2005, 06:06:15 AM »
Okay, correct me if I'm wrong, but:

Quote
a-B-I-C-D-E-F-G (alive)

simple insertion of new instruction, and trading of chromosome arms.

Quote
A-B-I-d-e-f-g (dead)

small deletion and trading of chromosome arms (remember that for DNA, missing a letter is disastrous because it makes all words downstream meaningless.  But for us each DNA instruction is atomic, so in our system such an event would be like DNA losing the coding for a single amino acid).

Quote
A-B-I-C-D-f-g (dead)

insertion of new instruction and deletion of E/e.

Neither deletions nor insertions are automatically lethal, unless they're in a gene that codes for important behavior.  And then you shouldn't be mutating it anyway.

That said, some mechanism like polymerase that zips up two paired chromosomes for crossing over could be useful.  How does real DNA pair up the chromosome?  How do polymerases do it?  Maybe we can do something similar.

Mind you I'm not against an ID system, I just want to understand the problem and solutions possible before picking one.

Offline Carlo

  • Bot Destroyer
  • ***
  • Posts: 122
« Reply #10 on: June 10, 2005, 09:38:04 AM »
Shvarz, your UID system gave me an idea. I think UID would be a good thing, being basically a shortcut to actually comparing dna portions to look for similar parts. On the other hand, it would require changes in the way dnas are stored. What do you think of this: without the need to modify anything anywhere else, we just change the sexual reproduction routines. We insert a new routine which calculates a number (let's call it idcode) from each gene (say, just by calculating the sum of the type-value sequence in the gene, or something like that). Then, we take each gene from one of the parents, and we couple it with the gene with the closest idcode number. When genes have been coupled, we take one of them from each parent and build a new dna. We should decide what to do with genes that remain uncoupled.

Now, to be clear, an example. Say we have a gene

cond
  *.eye5 0 >
start
  10 .up store
stop

If we assign a value to each instruction, variable and number, we can easily make a sum of these values and obtain a code which is not unique for that gene, but has very good chances to be unique in that dna. An important feature of this code is that, unlike a hash code, it is only slightly changed by little variations.
Now, say that the calculated idcode for this gene is 100.

A mutated copy of this gene, say

cond
  *.eye7 0 >
start
  10 .up store
stop

may have idcode=102
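
A minimal sketch of such a sum in Python.  The per-token values in `TOKEN_VALUES` are invented purely so that the two genes above come out as 100 and 102; a real version would presumably use the type-value pairs the program already stores for each DNA entry:

```python
# Hypothetical values, chosen only to reproduce the numbers in the example.
TOKEN_VALUES = {
    "cond": 10, "start": 11, "stop": 12, "store": 13, ">": 14,
    "*.eye5": 25, "*.eye7": 27, ".up": 5,
}


def idcode(gene_source):
    """Sum a value for every token in the gene; plain numbers count as themselves."""
    total = 0
    for token in gene_source.split():
        if token.lstrip("-").isdigit():
            total += int(token)
        else:
            total += TOKEN_VALUES[token]
    return total


gene = "cond *.eye5 0 > start 10 .up store stop"
mutated = "cond *.eye7 0 > start 10 .up store stop"
print(idcode(gene), idcode(mutated))  # 100 102
```

One thing to keep in mind with a plain sum: it ignores the order of instructions, so two different genes made of the same tokens get the same idcode.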

Now, if we have two dnas, with genes: (where 1, 2, ... are different genes, 1a, 1b,.. are slightly mutated copies of the gene 1, and there's an idcode associated to each gene)

dna1        dna2
1-100       1-100
1-100       2-324
2-324       3a-64
2a-320      3b-66
3-60        4a-410
4-400       5-150
4b-390

You understand that it is relatively simple to mix these two dnas. You should couple each gene of dna1 with one of those with more similar idcode on dna2. You may decide randomly which gene to couple if there are two or more with similar idcode. For example, you may decide to couple either gene2 or gene2a from dna1 to gene2 in dna2.
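
One possible way to do the coupling, sketched in Python on the idcodes from the table above.  The `tolerance` cutoff and the closest-pair-first rule are assumptions added here; ties go to the earlier gene instead of being picked at random, to keep the sketch short:

```python
def couple(codes1, codes2, tolerance=15):
    """Pair genes with the closest idcodes first; each gene is used at most once."""
    candidates = sorted(
        (abs(c1 - c2), i, j)
        for i, c1 in enumerate(codes1)
        for j, c2 in enumerate(codes2)
        if abs(c1 - c2) <= tolerance
    )
    used1, used2, pairs = set(), set(), []
    for _, i, j in candidates:
        if i not in used1 and j not in used2:
            pairs.append((i, j))
            used1.add(i)
            used2.add(j)
    pairs.sort()
    left1 = [i for i in range(len(codes1)) if i not in used1]
    left2 = [j for j in range(len(codes2)) if j not in used2]
    return pairs, left1, left2


dna1 = [100, 100, 324, 320, 60, 400, 390]
dna2 = [100, 324, 64, 66, 410, 150]
pairs, left1, left2 = couple(dna1, dna2)
print(pairs)         # [(0, 0), (2, 1), (4, 2), (5, 4)]
print(left1, left2)  # [1, 3, 6] [3, 5]
```

The two leftover lists are the uncoupled genes; what to do with them is still the open question.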

If you wish to have a more fine crossing, you may even decide to go further and split the coupled genes somewhere inside them and mix the two ends.

Offline Numsgil

  • Administrator
  • Bot God
  • *****
  • Posts: 7742
« Reply #11 on: June 10, 2005, 10:15:09 AM »
The only problem with that is it's still gene-centric.  We should be moving away from treating the genes as whole units.

Offline Carlo

  • Bot Destroyer
  • ***
  • Posts: 122
« Reply #12 on: June 10, 2005, 10:50:10 AM »
I'd prefer to set aside the discussion about possible new structures for the dna language and quickly implement this kind of sexual reproduction. If we (you) are ever going to change the dna structure, you'll change this too, among many other things. But let's have right away what we can have now.

Offline Numsgil

  • Administrator
  • Bot God
  • *****
  • Posts: 7742
« Reply #13 on: June 10, 2005, 10:58:07 AM »
We should build up from A to B.  The code is already fairly strung out and inconsistent.  Coding something that just needs to be coded over later is wasteful.

And remember what's wrong with the current sexual reproduction system:  It's gene centric.  If all you're after is mixing up the genes between bots, it does just fine.

A more complex, chromosome-based system is what we should be considering, and working toward in small but self-consistent and robust steps.

Offline Carlo

  • Bot Destroyer
  • ***
  • Posts: 122
« Reply #14 on: June 10, 2005, 11:13:28 AM »
Quote
We should build up from A to B. The code is already fairly strung out and inconsistent. Coding something that just needs to be coded over later is wasteful.
First, it's not said that we're really going to change the dna language. I don't really see any need for that, except maybe going in the direction of less programmability and understandability of the language itself.

Quote
And remember what's wrong with the current sexual reproduction system: It's gene centric. If all you're after is mixing up the genes between bots, it does just fine.
It seems to me that nobody said that. Shvarz's point, which is correct, is that the mixing of genes as it is done now doesn't work when genes are duplicated (because sometimes the zig-zag procedure cuts off a valid gene). But this method solves the problem perfectly, and it's also biologically sound.
« Last Edit: June 10, 2005, 11:14:07 AM by Carlo »