Greven's DNA structure!

Greven:
Post not finished YET!!

This post will be a little technical.

I have criticised the DNA structure and DB in general, and I promised to come up with something, so here is my proposal. It is not fully finished! What I want is to discuss it in detail and hear ideas and criticism from all DB'ers, because this is a very big thing to begin changing.

My main goal was to create a DNA language/structure that:
* allows junk DNA.
* does not destroy any existing bots.
* makes mutations work better and makes them easier to implement (not saying they don't work now, read on!!!)
* preserves most of DB as it is now.

You must understand that I am not saying this is better than the current system, and it is NOT perfect, far from perfect. But I would like to get opinions, and if you can argue that the current system is better, so be it. It is meant as a suggestion/discussion topic, not an IF-IT-CANT-BE-IMPLEMENTED-AS-IS-DONT-USE-IT. I am open to constructive criticism.

I will use words like:
* genome/genotype to identify an entire DNA sequence (bot DNA)
* DNA letter or just DNA for a single instruction (like the store command)
* phenotype for what the bot's genome ends up as once it has been read (I know this is not what phenotype actually means, but it is the best word to use right now.)
* DNAS is short for DNA Structure!
I will try to argue for my point of view, but there may still be things I haven't thought of.

The genome and mutations:

The genome of a bot is currently stored in an array. In my DNAS the genome is made up of a single string!
After a bot is born, the genome is read and interpreted by the main DB program, and the result ends up in an array, the phenotype.

Example:
We now have a genome looking like this:
"ABDDhFEKFLjgFHFKDADDFDFEIREUROIEr"
(When I say a single instruction (DNA letter), I mean 'A' or 'B' etc.)

Why:
With mutations it will be a lot easier. Instead of having a delete gene, insert gene etc., we only need 3 (maybe 4) different mutations:
* Insert
Can insert a single instruction or an entire substring within the genome. This one can work in two different ways: inserting only instructions already in the genome (and letting the point mutation introduce new instructions), or inserting any possible instruction.
* Delete
Can delete a single instruction or an entire substring within the genome.
* Point
This only works on a single DNA instruction, substituting it with another DNA instruction.
* (Revert)
Works on a substring within the genome and reverses it. Example:
we have "ABCDEFGH", the substring "BCD" mutates and we get "DCB", so the entire genome then looks like this: "ADCBEFGH"

Why:
This opens up for junk DNA, or DNA we have never seen before in DB, like an ADD in the condition part of the gene, or instructions outside genes!
Conditions within conditions! We also get rid of the peculiar genomes with spaces in them, I mean empty spaces; I have seen such in DB. (I do not know if it is fixed. I never bothered to report it; maybe it was an earlier version, I can't remember.)
And we also get completely new recombinations. You must keep in mind that the gene itself is not a unit anymore (as it is, in some arbitrary way, in the current system); it emerges from the combination of the DNA instructions.
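
A minimal sketch of the four mutations on a string genome, written in Python rather than in DB's own code. The function names and the 52-letter alphabet are assumptions made for illustration; nothing here is an existing DB routine. The random insert uses the first variant described above (copying material already in the genome).

```python
import random
import string

# Hypothetical alphabet: A-Z and a-z, one letter per DNA instruction.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase

def insert(genome, pos, fragment):
    """Insert a single letter or a whole substring at pos."""
    return genome[:pos] + fragment + genome[pos:]

def delete(genome, pos, length=1):
    """Delete `length` letters starting at pos."""
    return genome[:pos] + genome[pos + length:]

def point(genome, pos, letter):
    """Substitute the single letter at pos with another letter."""
    return genome[:pos] + letter + genome[pos + 1:]

def revert(genome, pos, length):
    """Reverse the substring of `length` letters starting at pos."""
    return genome[:pos] + genome[pos:pos + length][::-1] + genome[pos + length:]

def mutate(genome, rng):
    """Apply one randomly chosen mutation; rng is a random.Random instance."""
    if not genome:
        return insert(genome, 0, rng.choice(ALPHABET))
    pos = rng.randrange(len(genome))
    length = rng.randint(1, len(genome) - pos)
    kind = rng.choice(["insert", "delete", "point", "revert"])
    if kind == "insert":
        # Copy a stretch that is already in the genome.
        return insert(genome, pos, genome[pos:pos + length])
    if kind == "delete":
        return delete(genome, pos, length)
    if kind == "point":
        return point(genome, pos, rng.choice(ALPHABET))
    return revert(genome, pos, length)

print(revert("ABCDEFGH", 1, 3))  # ADCBEFGH
```

Because every operation is plain slicing, none of them needs to know where a gene starts or ends.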

The phenotype:
The phenotype is the actual behavior, or the part of the genome that is executed in the bot.
When a bot is born, the program reads through the entire genome and decides, through rules we have set, whether there can be a condition within a condition, whether there can be a '='-sign within the executed part of the gene, whether DNA outside the genes should be executed, etc. This is all yet to be decided.

If we don't want something (a condition within a condition, say), the program just ignores it, and only the parts we want get into the phenotype (the array). The genome is not touched at all; it is the genome that is passed on to the offspring, but the phenotype that is executed.
It all ends up in an array, as it does now.

Why:
This is interconnected with the genome, but it allows us to make certain rules about the execution of the bot. E.g. if we don't want a store in the condition part, then it is not expressed in the phenotype of the bot, but it is still there, able to act as junk DNA.
It also makes it possible for the genome and the phenotype to differ from each other, as they do in real life, and just a single insert or delete mutation may give rise to new and interesting species of bots, because of all the junk DNA it is now possible to have.

The DNA itself
(Everything in this section is mainly arbitrarily picked; anything can be used!)

Because DB's architecture is mainly based on direct numbers in the DNA (as opposed to other AL simulations like Avida, though it is difficult to compare DB with Avida),
I could not write an entire DNA system without numbers and still live up to the goals I set for this system. Therefore the downfalls of this system lie in the DNA itself.

A sysvar is actually only a pointer to a specific number and a way to make the code more readable; therefore every bot could be written without sysvars.

Say we now have the letters A-Z (26) and the letters a-z (26) as symbols for different DNA instructions.

The letters A-J are the numbers 0-9. The letter Z is the flow command cond, Y is start, X is > and n is a separator for numbers. Then we could have something like:

--- Code: ---
cond
10 50 >
start
--- End code ---
This will end up as something like: ZBAnFAXY

I hope you get the idea.
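
To make the mapping concrete, here is a small encoder sketch in Python. It uses the arbitrary letter assignments from this post, with two assumptions: X stands for the comparison used in the example (>), and cond is written out as its own letter Z. The function name and the table layout are made up for illustration.

```python
DIGITS = "ABCDEFGHIJ"                          # A-J stand for 0-9
WORDS = {"cond": "Z", "start": "Y", ">": "X"}  # assumed letter assignments
SEPARATOR = "n"                                # keeps two adjacent numbers apart

def encode(source):
    """Turn whitespace-separated DB code into a genome string."""
    out = []
    prev_was_number = False
    for token in source.split():
        if token.isdecimal():
            if prev_was_number:
                out.append(SEPARATOR)
            out.append("".join(DIGITS[int(d)] for d in token))
            prev_was_number = True
        else:
            # Raises KeyError for anything outside this toy table.
            out.append(WORDS[token])
            prev_was_number = False
    return "".join(out)

print(encode("cond 10 50 > start"))  # ZBAnFAXY
```

The separator is only emitted between two numbers that sit next to each other; a number followed by a command needs none, because the command letter already ends the number.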

But say we have the following genome (in DB language):

--- Code: ---
cond
10 add 50 >
start
--- End code ---

then the phenotype will be (again in DB language):

--- Code: ---
cond
10 50 >
start
--- End code ---
(If we want no add instruction in the condition part)
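
The step from genome to phenotype in this example could be sketched like this in Python. The rule set (no add or store inside a condition) is only the example rule from this post, and the names express, FLOW and BANNED_IN_COND are hypothetical; the real rules are still to be decided.

```python
FLOW = {"cond", "start", "stop"}    # tokens that open a new section
BANNED_IN_COND = {"add", "store"}   # example rule only, not a settled list

def express(genome_tokens):
    """Return the phenotype; the genome list itself is left untouched."""
    phenotype = []
    section = None
    for token in genome_tokens:
        if token in FLOW:
            section = token
        elif section == "cond" and token in BANNED_IN_COND:
            continue  # stays in the genome as junk, just not expressed
        phenotype.append(token)
    return phenotype

genome = "cond 10 add 50 > start".split()
print(" ".join(express(genome)))  # cond 10 50 > start
print(" ".join(genome))           # cond 10 add 50 > start
```

The offspring would inherit `genome`, add and all, while the bot only ever executes the filtered list.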

Then again, we could let a-j be the numbers *0-*9, and we could decide that the first digit determines what kind of number it is, so aBC is *012 = *12 etc.
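
One possible reading of this rule, sketched in Python: the case of the first letter decides whether the whole number is a *-number, and the case of the remaining letters is ignored. That reading, and the name decode_number, are assumptions based on the aBC example only.

```python
UPPER = "ABCDEFGHIJ"   # A-J stand for 0-9
LOWER = "abcdefghij"   # a-j stand for *0-*9

def decode_number(letters):
    """Decode a run of digit letters; a lowercase first letter makes it a *-number."""
    if not letters:
        raise ValueError("empty number")
    is_star = letters[0] in LOWER
    # index() raises ValueError for letters outside A-J / a-j
    digits = "".join(str(UPPER.index(c.upper())) for c in letters)
    return ("*" if is_star else "") + str(int(digits))

print(decode_number("aBC"))  # *12
print(decode_number("BA"))   # 10
```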

The downfalls:
The main downfall I can see in this system is the DNA. The numbers can change dramatically without logic: a '0001' could mutate to become '9001'. But then again, maybe this could help evolution further, I don't know, and it is the job of natural selection / evolution to find the fittest! And remember that not all mutations are good for the survival of the organism.
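
How much a single point mutation moves a number depends entirely on which digit it hits. A small Python illustration, assuming the A-J = 0-9 letters from above (the helper names are made up):

```python
DIGITS = "ABCDEFGHIJ"  # A-J stand for 0-9

def value(letters):
    """Numeric value of a run of digit letters."""
    return int("".join(str(DIGITS.index(c)) for c in letters))

def point(letters, pos, new_letter):
    """Point mutation: replace the letter at pos."""
    return letters[:pos] + new_letter + letters[pos + 1:]

def shifts(letters, new_letter):
    """Change in value when each position in turn mutates to new_letter."""
    return [value(point(letters, pos, new_letter)) - value(letters)
            for pos in range(len(letters))]

original = "AAAB"             # 0001
print(shifts(original, "J"))  # [9000, 900, 90, 8]
```

The same kind of mutation is a nudge at the last digit and a jump at the first.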

This system is not perfect, and if implemented it could end up showing that this system really sucks.

I have relied heavily on a few books I have read about the topic and my own experience with developing AL simulations.

Overall you can still write the old code and use old bots, because we will create a small routine to convert the code to the DNA instructions so the program understands it.

Please comment.

shvarz:
I like it. It goes along the lines of junk DNA that Nums was proposing a while back, just gives it a more technical side. Here is what I wrote back then:

--- Quote ---
The way I see it, the system should work like this:

we have a "gene execution" flag that can be set to 0, +1, or -1

true condition sets the flag to +1
false condition sets the flag to -1

- when flag is set to +1, program scans for first available "start" and executes the gene, flag is set to 0

- when flag is set to -1, program scans for first available "else" and executes commands after it, then sets flag to 0

- when flag is set to 0, program executes neither "start" nor "else", just scans for the next "condition"

This way we can have "cond", "start" and "else" parts in any order we want.
--- End quote ---
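
The flag rules in the quote can be sketched as a tiny interpreter in Python. Two simplifications are assumptions of this sketch and not part of the proposal: each cond is followed by a literal true or false token in place of real condition evaluation, and a body runs until the next flow token.

```python
CONTROL = {"cond", "start", "else", "stop"}

def run(tokens):
    """Return the commands that get executed under the three-state flag rules."""
    flag = 0          # 0 = idle, +1 = look for start, -1 = look for else
    executed = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "cond":
            # Stand-in for evaluating the condition: read a literal outcome.
            outcome = tokens[i + 1] if i + 1 < len(tokens) else "false"
            flag = 1 if outcome == "true" else -1
            i += 2
        elif (tok == "start" and flag == 1) or (tok == "else" and flag == -1):
            i += 1
            while i < len(tokens) and tokens[i] not in CONTROL:
                executed.append(tokens[i])
                i += 1
            flag = 0  # body done, go back to scanning for the next cond
        else:
            i += 1    # not wanted right now: skipped, behaves like junk
    return executed

print(run(["cond", "true", "else", "flee", "start", "move"]))   # ['move']
print(run(["cond", "false", "start", "move", "else", "flee"]))  # ['flee']
```

In both calls the parts appear in a different order, and the flag still picks out the right body.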

The idea of substituting the genotype with the phenotype is good, as it would save processor cycles. Also, it would make the phenotype much more readable.

The "string vs array" thing is fine with me. This is purely a programmer's issue that should not have major effects on the way the program works.

PurpleYouko:
I like it too.
Particularly that the genotype will contain possible junk DNA but for the life of the robot it will only see the phenotype which will be the executable code presently used in DB.

We will obviously have to make it work such that saving the robot's DNA actually saves the file as the genotype version. We will probably need to make a small stand-alone decoder utility also, so that we can convert genotype to phenotype and back while coding a robot.
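
A rough idea of what the decoding half of such a utility might look like, in Python. The letter table is Greven's arbitrary example (with X taken as >), the function name is hypothetical, and treating unassigned letters as junk that emits nothing is an assumption of this sketch.

```python
DIGITS = "ABCDEFGHIJ"                          # A-J stand for 0-9
WORDS = {"Z": "cond", "Y": "start", "X": ">"}  # assumed letter assignments

def decode(genome):
    """Turn a genome string back into readable DB code."""
    tokens = []
    number = ""
    for letter in genome + "n":  # a trailing separator flushes a final number
        if letter in DIGITS:
            number += str(DIGITS.index(letter))
            continue
        if number:
            tokens.append(str(int(number)))
            number = ""
        if letter in WORDS:
            tokens.append(WORDS[letter])
        # the separator and unassigned letters emit nothing
    return " ".join(tokens)

print(decode("ZBAnFAXY"))  # cond 10 50 > start
```

Going the other way, from phenotype back to genotype, cannot restore junk that was never expressed, so the saved file would have to be the genotype itself.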

Might even make file transfer quicker and more efficient since the text file of the genotype will be considerably smaller than the existing DNA file.

I would also like to see the idea that Shvarz outlined above, implemented. As he said, this would allow us to have a much less formalized structure of DNA writing that could lead to some very interesting behaviours.

Good start B)

Numsgil:
Okay, this is me just listing the major points to help clarify it:

Current Version:
DNA is an array of elements that are of the type:

type as integer
value as integer

where type is whether it's a number, control command, etc. and value is which particular number or control command it is.

Greven's idea:

DNA is a string of characters where each letter is a DNA element, where type and value are both determined by the same value.
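
Side by side, the two representations might look like this in Python. The type codes and the letter table are invented for illustration; they are not DB's actual numbering.

```python
from dataclasses import dataclass

NUMBER, FLOW, CONDITION = 0, 1, 2   # hypothetical type codes

@dataclass(frozen=True)
class Element:
    """Current version: every DNA element carries a type and a value."""
    type: int
    value: int

# Greven's idea: one letter fixes both type and value at once.
LETTERS = {
    "Z": Element(FLOW, 0),       # cond
    "Y": Element(FLOW, 1),       # start
    "X": Element(CONDITION, 0),  # a comparison
}
for digit, letter in enumerate("ABCDEFGHIJ"):
    LETTERS[letter] = Element(NUMBER, digit)

def to_array(genome):
    """Look each letter up; letters without an entry are skipped."""
    return [LETTERS[c] for c in genome if c in LETTERS]

print(to_array("ZBY"))
```

Seen this way, the string is a compact spelling of the same array, with the lookup table as the only new piece.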

Okay, now my critique based on the above summary:

1. If each command is based on a letter, then you are limited to 52 letters. Or 256 (actually less, since not all character codes are printable) if you use all possible character codes. I don't know if that's a problem now, but if it ever does become a problem (we add more stuff to the language), then we are in huge trouble. The only way to fix it would be to change to either two-character codes (which is basically what it is now) or change to an integer array (basically what it is now).

2. You said that in the current version each gene is an arbitrary unit to the program. However, this is an upper level distinction. In the bowels of the program it's just an array.

3. Using a string makes manipulation of the DNA easier (strings have built in insertion and deletion routines).

I see your idea as more of a paradigm for how to approach the existing DNA structure in DB than a reason to drastically change the basic structure. That is, we keep the array of two-integer elements.

We get rid of distinctions between conditions, bodies, and outside of bodies. That's a good idea. We, as you say, have major types of mutations:
* Insertion
* Deletion
* Single Point Change
* Duplication

(I don't know that reversal is either realistic or particularly useful, but we can argue that point separately.)

(Note that deletion and duplication should be allowed to work on large tracts of DNA as well).

However, we should have a subsection within each that controls what type of command (the type of the DNA element) can be inserted/deleted/changed (you, the user, don't have to mess with this if you don't want, unless you want greater control over the mutations).
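
A sketch of what that could mean on the existing array of (type, value) pairs, in Python. The type codes, the value ranges and the function names are all made up for illustration.

```python
import random

NUMBER, FLOW, OPERATOR = 0, 1, 2    # hypothetical type codes
VALUES = {                           # hypothetical value ranges per type
    NUMBER: range(0, 1000),
    FLOW: range(0, 3),
    OPERATOR: range(0, 4),
}

def point_change(dna, pos, allowed_types, rng):
    """Replace one element, drawing only from the element types the user allows."""
    new_type = rng.choice(sorted(allowed_types))
    new_value = rng.choice(VALUES[new_type])
    return dna[:pos] + [(new_type, new_value)] + dna[pos + 1:]

def duplicate(dna, start, length):
    """Copy a tract of DNA and place the copy right after the original."""
    end = start + length
    return dna[:end] + dna[start:end] + dna[end:]

dna = [(NUMBER, 1), (FLOW, 0), (NUMBER, 5)]
print(duplicate(dna, 0, 2))
print(point_change(dna, 1, {NUMBER}, random.Random(0)))
```

Leaving `allowed_types` as the full set of types gives the default behaviour, so only users who want the extra control need to touch it.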

shvarz:
OK, I have no idea what all those arrays and strings and what-not are.

I'll just add that reversion is a very reasonable type of mutation.
