Topic: Genome structure

EricL
Genome structure
« on: December 02, 2006, 12:50:33 AM »
Genes do not exist independently in a vacuum, in DB or in biology. The structure of the genome - that it consists of genes, that those genes have an order and proximity, that some genes are adjacent while others are far apart - that structure matters.

An end statement can spontaneously mutate into the middle of a genome. Whatever genes happen to come after it will no longer get expressed. The position of genes in the genome matters.

Mutations can duplicate or delete sequences of base pairs. Adjacent genes, or sections thereof, can get moved, copied or deleted together. A gene's environment includes other genes. Gene locality relative to other genes in the genome matters.

I think it is important for the DB genome structure to allow small genotypic mutations to have a potentially large phenotypic impact. Today, we have the ability for a single point mutation to cause a premature End to genome execution, a "clipping" of all the genes that might follow the End base pair sequence. What we don't have is an equivalent "Begin" base pair sequence.
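To make the clipping effect concrete, here is a minimal Python sketch. It assumes a genome is just a flat list of DNA tokens and that execution stops at the first `end`, as described in this post; it is a toy model, not the actual DarwinBots execution code, and the helper name `expressed` is made up for illustration.

```python
def expressed(genome):
    """Return the tokens that would be executed, assuming execution
    stops at the first 'end' token (toy model of the behaviour above)."""
    if "end" in genome:
        return genome[: genome.index("end")]
    return list(genome)


# Two toy genes written as flat token lists.
gene_a = ["cond", "*.eye5", "0", ">", "start", "10", ".up", "store", "stop"]
gene_b = ["cond", "*.nrg", "500", "<", "start", "20", ".aimdx", "store", "stop"]
genome = gene_a + gene_b

# An 'end' appearing between the genes clips everything after it.
mutated = gene_a + ["end"] + gene_b
print(len(expressed(genome)), len(expressed(mutated)))  # 18 9
```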

If we created a Begin base pair and modified the DNA execution logic appropriately, the result would essentially be introns and exons. Exons are the coding portions of a genome. The genes in exons get expressed, i.e. they get executed. Introns are non-coding sequences, the so-called 'junk DNA'. Introns don't get expressed.

Junk DNA is important. Genes get duplicated into junk DNA without harming the resulting phenotype, since the copy does not get expressed. These copies then mutate over time without being expressed. Then, down the road, a single mutation can switch that gene back on. By then it may be a heavily modified descendant of a copy of a working gene that now does something quite different. A single mutation might even switch on a whole sequence of genes, so that a small genotypic change, such as the deletion of an End base pair sequence, results in a large and dramatic phenotypic change.

By allowing multiple Begin and End base pair sequences in a DB genome, we would allow multiple independent exon sections of a genome: those between Begin and End base pairs. They would be separated by non-coding intron regions: those between End and Begin base pairs. Simple mutations could have a dramatic impact.
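The Begin/End scheme can be sketched in a few lines of Python. The token-list genome and the `begin` token are assumptions for illustration (there is no Begin base pair in DB today), and `split_exons` is a hypothetical helper. The rules are one possible reading of the proposal: the region before the first `end` stays coding, so genomes without any markers behave as they do now.

```python
def split_exons(genome, start_coding=True):
    """Split a token list into exons (coding regions).

    Assumed rules for a hypothetical 'begin' token: 'end' switches
    expression off, 'begin' switches it back on, and the markers
    themselves are never expressed. Tokens read while expression is
    off belong to introns. start_coding=True means a genome with no
    markers is fully expressed.
    """
    exons = []
    current = [] if start_coding else None  # None means "inside an intron"
    for tok in genome:
        if tok == "end":
            if current:  # close the open exon, skipping empty ones
                exons.append(current)
            current = None
        elif tok == "begin":
            if current is None:  # a 'begin' inside an exon changes nothing
                current = []
        elif current is not None:
            current.append(tok)
    if current:  # an exon left open at the end of the genome
        exons.append(current)
    return exons


genome = (["cond", "start", "A", "stop", "end"]       # exon 1, then switch off
          + ["cond", "start", "B", "stop"]            # intron: a whole silent gene
          + ["begin", "cond", "start", "C", "stop"])  # exon 2
print(split_exons(genome))  # [['cond', 'start', 'A', 'stop'], ['cond', 'start', 'C', 'stop']]
# Deleting the single 'end' leaves one exon containing genes A, B and C.
print(split_exons([t for t in genome if t != "end"]))
```

Gene B sits in the intron with its own `cond`, `start` and `stop` intact, and deleting one `end` is enough to express it again: the small-genotype, large-phenotype effect described earlier in this post.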

This concept of contiguous genome sections is important. I think we will need to build on it in several ways: using it as a unit for one level of mutation probability encoding within the genome itself, and perhaps tying in copy and deletion mutation probabilities so that copies and deletions are more likely to begin and end on exon or intron boundaries.
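One way to read "more likely to begin and end on exon or intron boundaries" is sketched below in Python. Everything here is an assumption for illustration: the token-list genome, the hypothetical `begin` marker, the helper names and the `boundary_bias` parameter. Cut points are taken to sit just before each `begin`/`end` marker and at both ends of the genome.

```python
import random

MARKERS = ("begin", "end")


def boundaries(genome):
    """Cut points at exon/intron boundaries: both ends of the genome plus
    the position just before every 'begin' or 'end' marker."""
    cuts = {0, len(genome)}
    cuts.update(i for i, tok in enumerate(genome) if tok in MARKERS)
    return sorted(cuts)


def biased_deletion(genome, rng, boundary_bias=0.8):
    """Delete one contiguous span genome[a:b].

    With probability boundary_bias both cut points are drawn from the
    boundaries; otherwise they are drawn uniformly from all positions.
    Returns the mutated genome and the (a, b) span that was removed.
    """
    if not genome:
        return [], (0, 0)
    if rng.random() < boundary_bias:
        a, b = sorted(rng.sample(boundaries(genome), 2))
    else:
        a, b = sorted(rng.sample(range(len(genome) + 1), 2))
    return genome[:a] + genome[b:], (a, b)


genome = ["cond", "start", "A", "stop", "end", "cond", "start", "B", "stop",
          "begin", "cond", "start", "C", "stop"]
mutant, span = biased_deletion(genome, random.Random(42), boundary_bias=1.0)
print(span, mutant)
```

A copy (duplication) mutation could reuse the same cut selection, inserting `genome[a:b]` elsewhere instead of removing it.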

Comments?
Many beers....

Jez
« Reply #1 on: December 02, 2006, 02:22:21 AM »
That could be fun. It seems sensible: if mutations can add or remove 'end' statements, they should be able to do the same for 'begin' statements. It should also allow greater diversity of mutating bots within a sim.
If you try and take a cat apart to see how it works, the first thing you have in your hands is a non-working cat.
Douglas Adams

Henk
« Reply #2 on: December 02, 2006, 03:13:23 AM »
Sounds like a great idea!

Do keep in mind that not all organisms have introns and exons; that's a eukaryotic thing. Prokaryotes can transcribe multiple genes (and thus multiple proteins) from a single mRNA.
So if this is a 'let's make DB more realistic' decision, then keep that in mind IMO.

But then again, Shvarz is probably more of an expert in this matter...
« Last Edit: December 02, 2006, 03:13:36 AM by Henk »
cond
*.DBbugs 0 =
start
.rejoice inc
stop

Numsgil
« Reply #3 on: December 02, 2006, 04:14:04 AM »
Quote from: EricL
Today, we have the ability for a single point mutation to cause a premature End to genome execution, a "clipping" of all the genes that might follow the End base pair sequence.

I don't think this is true, unless you've changed it, and I don't think I've ever seen you play with the mutations code so I don't think you have.

end is a relic from the old days when you needed an end to tell the program to stop reading in a DNA file. After 2.4, it became a member of the Master Flow commands. In fact, it is the only master flow command, and it existed primarily for backwards compatibility. The idea was that master flow commands weren't DNA. They were representations of the physical DNA shape: MetaDNA information. Originally I was going to have chromosomes' start and end statements also be master flow commands, but I've since abandoned the idea in favor of codules and physically separate DNA strands in memory.

That issue aside, what you're suggesting is already more or less implemented with the way bots can use their flow commands.  Parts of genes that end up between a stop and a cond or start command don't get executed.  They are effectively junk DNA.  Also, while personally I think junk DNA has an important role in the preservation of past genetic information for future generations (although I think this role is more convoluted than a simple time capsule type system), mainstream biology does not.  As shvarz pointed out before (and probably will again), junk DNA is too prone to point mutations that entirely scramble any semblance of its original coding.  And since natural selection isn't operating on it, these mutations don't go away.  They just accumulate.

All that said, I can see times when turning off whole sections of DNA through a mutation might be warranted. That is why I'm not against the idea of end statements being mutable and begin statements existing and also being mutable. I just don't think in the end it adds anything fundamental that we don't already have.
« Last Edit: December 02, 2006, 04:15:05 AM by Numsgil »

EricL
« Reply #4 on: December 02, 2006, 12:47:32 PM »
Quote from: Numsgil
I don't think this is true, unless you've changed it, and I don't think I've ever seen you play with the mutations code so I don't think you have.
You are correct.  ChangeDNA() does mutate the type of base pairs but TipoDetok() does not contain a case for type 10 - the end sequence (it does for all other types).  I stand corrected.  Mutations cannot produce an End.


Quote from: Numsgil
That issue aside, what you're suggesting is already more or less implemented with the way bots can use their flow commands.  Parts of genes that end up between a stop and a cond or start command don't get executed.  They are effectively junk DNA.

IMHO, we need to have the ability for whole genes - including Cond and Start and Stop base pair sequences - to appear in noncoding regions.  What we have today is insufficient as the first Cond is treated as the beginning of a coding region.  Whole genes cannot appear in noncoding regions today.
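Both points can be seen in a simplified Python model. It assumes a flat token list, treats only `cond`, `start` and `stop` as flow control (DarwinBots also has `else`, which is ignored here) and makes no attempt to reproduce how the real interpreter handles tokens before the first `cond`. The helper `unexecuted` is hypothetical.

```python
def unexecuted(genome):
    """Indices of tokens between a 'stop' and the next 'cond' or 'start'.

    In this simplified model those tokens are never executed, so they act
    as junk DNA. Any 'cond' switches execution back on.
    """
    junk = []
    after_stop = False
    for i, tok in enumerate(genome):
        if tok == "stop":
            after_stop = True
        elif tok in ("cond", "start"):
            after_stop = False
        elif after_stop:
            junk.append(i)
    return junk


live_gene = ["cond", "*.eye5", "0", ">", "start", "10", ".up", "store", "stop"]
# Loose tokens after a stop are skipped...
print(unexecuted(live_gene + ["5", ".aimdx", "store"]))  # [9, 10, 11]
# ...but a whole gene placed there is not: its own 'cond' turns it back on.
print(unexecuted(live_gene + live_gene))  # []
```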

Quote from: Numsgil
Also, while personally I think junk DNA has an important role in the preservation of past genetic information for future generations (although I think this role is more convoluted than a simple time capsule type system), mainstream biology does not.  As shvarz pointed out before (and probably will again), junk DNA is too prone to point mutations that entirely scramble any semblance of its original coding.  And since natural selection isn't operating on it, these mutations don't go away.  They just accumulate.
Shvarz is correct in part. For example, mutations accumulated in processed pseudogenes are used for determining genetic distance between species. But it is more complicated than this. Retrotransposons, for example, can copy sequences from noncoding regions into exons: they are transcribed to RNA and then reverse-transcribed back into DNA at a new location. This happens all the time. Junk DNA is a repository for genetic sequences that can be, and sometimes are, inserted into coding regions of the DNA. What I suggest is similar. A sequence of base pairs gets turned off for a while. Mutations occur which are not subject to selection. Then that sequence, or some portion thereof, gets turned back into a coding region. What fun!
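The silence-then-reactivate cycle, and Numsgil's counterpoint that unselected DNA simply accumulates damage, can both be seen in a toy Python model. Everything here (the token alphabet, uniform point mutations, the `drift` and `divergence` helpers) is a hypothetical illustration, not DarwinBots' mutation code.

```python
import random


def drift(segment, alphabet, n_mutations, rng):
    """Apply n point mutations to an unexpressed segment.

    Because the segment is not expressed, selection never removes a
    change: every mutation is kept. A mutation can pick the token that
    is already there, so some mutations are silent.
    """
    seq = list(segment)
    for _ in range(n_mutations):
        seq[rng.randrange(len(seq))] = rng.choice(alphabet)
    return seq


def divergence(a, b):
    """Fraction of positions at which two equal-length segments differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


gene = ["cond", "*.eye5", "0", ">", "start", "10", ".up", "store", "stop"]
alphabet = gene + [".aimdx", ".shoot", "inc", "dec", "5", "<", "="]
rng = random.Random(1)
for n in (0, 5, 50):
    silent_copy = drift(gene, alphabet, n, rng)
    print(n, round(divergence(gene, silent_copy), 2))
```

Re-expressing the copy after a few mutations gives a variant of the original gene; after many, it is close to a random token string, which is the tension between the two positions above.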

Quote from: Numsgil
All that said, I can see times when turning off whole sections of DNA through a mutation might be warranted. That is why I'm not against the idea of end statements being mutable and begin statements existing and also being mutable. I just don't think in the end it adds anything fundamental that we don't already have.

As above, I believe that whole genes and sequences of genes need to be able to exist in noncoding regions, and also that we should tie one aspect of mutation probability (one that is exposed in the genome and subject to selection itself) to this level of genome structure. (I think we should have other levels as well, down to the individual base pair sequence.)
« Last Edit: December 02, 2006, 01:01:08 PM by EricL »

EricL
« Reply #5 on: December 02, 2006, 01:18:06 PM »
Quote from: Henk
Do keep in mind that not all organisms have introns and exons; that's a eukaryotic thing. Prokaryotes can transcribe multiple genes (and thus multiple proteins) from a single mRNA.
So if this is a 'let's make DB more realistic' decision, then keep that in mind IMO.

I didn't know that.  Cool.

My motivations are less about paralleling biology or making things more realistic, and more about ensuring we have all the evolvability mechanisms we need in place. Still, making the same decisions that our ancestors did seems reasonable, since I assume we all want to see virtual organisms of similar complexity one day.

shvarz
« Reply #6 on: December 02, 2006, 02:48:11 PM »
Quote
Prokaryotes can transcribe multiple genes (and thus multiple proteins) from a single mRNA.

That would be "translate" not "transcribe".

Just being anal.
"Never underestimate the power of stupid things in big numbers" - Serious Sam

Henk
« Reply #7 on: December 02, 2006, 05:57:02 PM »
Quote from: shvarz
That would be "translate" not "transcribe".

Just being anal.

Yeah, whatever. I always mix those terms up. I really shouldn't, though, as I've got a cell biology/genetics exam next week.

Numsgil
« Reply #8 on: December 03, 2006, 12:12:29 AM »
Quote from: shvarz
That would be "translate" not "transcribe".

Just being anal.

Wow, that is really anal.  Congratulations!