Author Topic: Genetic Distance  (Read 4715 times)

Offline EricL

  • Administrator
  • Bot God
  • *****
  • Posts: 2266
    • View Profile
Genetic Distance
« on: December 06, 2007, 10:45:51 AM »
Every bot has a guaranteed unique ID (guaranteed unique within a single sim.  Not necessarily guaranteed unique in teleporter-connected multi-sims like internet mode.  Not yet, that is.  I'll work on it.).

For every extant bot I now maintain an ordered ancestor list of unique IDs for the last 100 ancestors and the number of mutations each ancestor had at the time it spawned the next descendant in the line leading to the extant bot.

I have implemented some routines internally that can find the most recent common ancestor of any two extant bots and determine the genetic distance between them.  (Well, okay.  Not actually genetic distance per se.  I'm not actually comparing the genomes and determining how they differ.  Rather, by distance here I mean the number of mutations that separate them on both lines of descent.  I walk up one side of the phylogeny and down the other and add up the number of mutations at each generation.  I have no way to know if those mutations impacted expression or not, were in coding regions or not, were large or small, etc.)  Also, viruses complicate things, as does non-asexual reproduction, but let's put those aside for a moment.
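To make the idea concrete, here is a minimal Python sketch of the ancestor list and the MRCA/distance lookup. It is not the actual DarwinBots code: the names `Bot`, `spawn`, and `mrca_and_distance` are hypothetical, and it assumes the stored mutation count is a running total for the lineage (if it were a per-generation count, you would sum the entries instead of subtracting).

```python
from collections import deque
from dataclasses import dataclass, field

MAX_ANCESTORS = 100  # list length mentioned in the post


@dataclass
class Bot:
    bot_id: int
    mutations: int = 0  # ASSUMPTION: cumulative count for the whole lineage
    # (ancestor_id, ancestor's mutation count when it spawned), newest first
    ancestors: deque = field(
        default_factory=lambda: deque(maxlen=MAX_ANCESTORS))

    def spawn(self, child_id, new_mutations=0):
        """Create a child that inherits this bot's ancestor list."""
        child = Bot(child_id, self.mutations + new_mutations)
        child.ancestors = deque(self.ancestors, maxlen=MAX_ANCESTORS)
        # appendleft on a full deque silently drops the oldest ancestor
        child.ancestors.appendleft((self.bot_id, self.mutations))
        return child


def mrca_and_distance(a, b):
    """Return (mrca_id, distance), or None if no common ancestor is recorded.

    Each lineage starts with the bot itself so that a living parent and
    its own descendant are handled too.
    """
    line_a = [(a.bot_id, a.mutations), *a.ancestors]
    line_b = [(b.bot_id, b.mutations), *b.ancestors]
    seen_b = dict(line_b)
    for ancestor_id, count_a in line_a:  # newest first, so first hit is the MRCA
        if ancestor_id in seen_b:
            # The MRCA may have mutated between its two spawns; measuring
            # from its earlier state counts those mutations exactly once.
            base = min(count_a, seen_b[ancestor_id])
            return ancestor_id, a.mutations + b.mutations - 2 * base
    return None


root = Bot(1)
left = root.spawn(2, new_mutations=2)
root.mutations += 3                      # root mutates before spawning again
right = root.spawn(3, new_mutations=1)
print(mrca_and_distance(left, right))    # (1, 6): 2 + 3 + 1 mutations apart
```

The lookup costs at most one pass over each list, so with 100 ancestors per bot a single comparison is cheap; it is the number of comparisons that gets expensive.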

I'm soliciting suggestions as to what to do with this information now that I have it.  What do people want to know, or what features do people want to see that leverage this?

Fair warning, some things may be computationally expensive.  I had thought of a graph that showed, for each species, the maximum genetic distance within the species.  This number climbing from a mean might indicate a speciation event.  But this graph would involve m(n^2) genetic distance compares (I think), where m is the number of species and n is the number of individuals per species, for every data point.  I'm working on ideas to shortcut this, but still, yuck.

Another thing I'm working towards is automatic species forking based on genetic distance.  If someone wants to do some deep thinking about when and how to efficiently recognize actual speciation, that would be helpful.

Other ideas are also welcome.  Things that operate from a single bot will be much more efficient than species-wide graphs.  E.g. a graph that charts the average genetic distance to the oldest bot of a species is way, way more efficient than the one I mention above.
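A rough Python sketch of why the two graphs cost so differently, for one species at one data point. The function names and the toy `distance` callback are hypothetical stand-ins for whatever distance routine is actually used; the bots here are just integers so the comparison count is easy to see.

```python
from itertools import combinations


def max_pairwise_distance(bots, distance):
    """Largest distance between any two bots: n*(n-1)/2 comparisons."""
    return max((distance(a, b) for a, b in combinations(bots, 2)), default=0)


def mean_distance_to_oldest(bots, distance):
    """Average distance to bots[0], assumed to be the oldest: n-1 comparisons."""
    oldest, others = bots[0], bots[1:]
    if not others:
        return 0.0
    return sum(distance(oldest, b) for b in others) / len(others)


calls = {"n": 0}  # counts how often the distance routine is invoked


def toy_distance(a, b):
    # Stand-in for a real genetic-distance routine (illustrative only)
    calls["n"] += 1
    return abs(a - b)


species = [0, 3, 4, 10]

calls["n"] = 0
widest = max_pairwise_distance(species, toy_distance)
pairwise_calls = calls["n"]          # 6 comparisons for 4 bots

calls["n"] = 0
average = mean_distance_to_oldest(species, toy_distance)
reference_calls = calls["n"]         # 3 comparisons for 4 bots
```

With 1000 bots in a species, that is 499,500 comparisons for the pairwise maximum versus 999 for the single-reference average, per species, per data point.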

And yes, I have plans to dump this to a CSV file for massaging in Excel, etc.
Many beers....

Offline Numsgil

  • Administrator
  • Bot God
  • *****
  • Posts: 7742
    • View Profile
Genetic Distance
« Reply #1 on: December 06, 2007, 12:43:47 PM »
A graph tree would be great.  You wouldn't need to update it other than when the user requests it.  At the top of the tree (bottom?) is the most recent common ancestor for the species.  As you travel up, you move forward in time and the tree branches to show new, well, branches in the evolutionary tree.

If you recorded the lineages of dead bots too (not sure how practical this is, though) you could demonstrate evolutionary dead ends.  But I won't hold my breath for this part.

The thickness of a branch of the tree would represent the number of individuals who trace their ancestry through that branch.  Or the amount of energy gathered by individuals who trace their ancestry, preferably, but this information probably isn't available.

This would allow an easy to interpret graphical representation of the evolution of your bots.
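As a sketch of the branch-thickness idea: if each living bot carries a list of ancestor IDs, the thickness of every branch falls out of a single counting pass. This is Python for illustration only; `branch_weights` and the dictionary layout are assumptions, not anything in the program today.

```python
from collections import Counter


def branch_weights(ancestor_lists):
    """Count, for every node in the tree, how many living bots descend from it.

    ancestor_lists maps a living bot's ID to its ancestor IDs
    (most recent first). A living bot counts toward its own node.
    """
    weights = Counter()
    for bot_id, ancestors in ancestor_lists.items():
        weights[bot_id] += 1
        weights.update(ancestors)  # one more descendant for each ancestor
    return weights


# Three living bots: 4 and 5 descend from 2, bot 6 descends from 3,
# and everything traces back to bot 1.
living = {4: [2, 1], 5: [2, 1], 6: [3, 1]}
weights = branch_weights(living)
```

Here the branch through bot 2 would be drawn twice as thick as the branch through bot 3, and the trunk at bot 1 carries all three bots. Only extant lineages are counted, so dead ends would not show up without also recording dead bots.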

Offline EricL

  • Administrator
  • Bot God
  • *****
  • Posts: 2266
    • View Profile
Genetic Distance
« Reply #2 on: December 06, 2007, 12:58:07 PM »
Quote from: Numsgil
A graph tree would be great....

Ooh Ooh, I like that very much.  Probably post 2.44, but probably the first thing I work on in January.
Many beers....

Offline shvarz

  • Bot God
  • *****
  • Posts: 1341
    • View Profile
Genetic Distance
« Reply #3 on: December 06, 2007, 02:30:50 PM »
Of course the most obvious thing is to automate the tracking of speciation. I don't think your genetic distance is much help, however, because you don't know whether these mutations are just drift or were selected for. And without sexual reproduction genetic distance does not matter for defining species. The mutation counter already tracks accumulation of mutations. I think that just from the practical point of view we just need to know if there are distinct groups of bots co-existing in a sim. Also for vanity reasons I'd like to see if a bot that evolved in my sim spreads through other sims  

So, I think the easiest thing to do is just create new species when bots don't have a common ancestor for N generations. We can start with 100 and see how it goes. I'm sure this will result in speciation, simply because sims run at different speeds.
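One possible reading of that rule, sketched in Python: put two bots in the same species whenever they share an ancestor within the last N generations, and merge groups with a union-find. The function name and data layout are hypothetical. Note that this is single-linkage grouping, so a chain of intermediate bots will pull distant relatives into one species.

```python
def group_by_recent_ancestor(ancestor_lists, n_generations=100):
    """Group bots that share an ancestor within the last n_generations.

    ancestor_lists maps bot ID -> ancestor IDs, most recent first.
    Returns a list of sets of bot IDs.
    """
    parent = {bot: bot for bot in ancestor_lists}

    def find(x):
        # Union-find lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}  # ancestor ID -> first bot found carrying it in its window
    for bot, ancestors in ancestor_lists.items():
        # Include the bot itself so a living parent groups with its offspring
        for anc in [bot, *ancestors[:n_generations]]:
            if anc in first_seen:
                parent[find(bot)] = find(first_seen[anc])
            else:
                first_seen[anc] = bot

    groups = {}
    for bot in ancestor_lists:
        groups.setdefault(find(bot), set()).add(bot)
    return list(groups.values())


bots = {10: [5, 1], 11: [5, 1], 12: [6, 2], 13: [7, 1]}
narrow = group_by_recent_ancestor(bots, n_generations=1)
wide = group_by_recent_ancestor(bots, n_generations=2)
```

With a window of 1, bots 10 and 11 group together and 12 and 13 stand alone; widening the window to 2 pulls 13 in through the shared ancestor 1. The work is proportional to (number of bots) x N rather than to the number of bot pairs.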
"Never underestimate the power of stupid things in big numbers" - Serious Sam

Offline Peter

  • Bot God
  • *****
  • Posts: 1177
    • View Profile
Genetic Distance
« Reply #4 on: December 06, 2007, 03:11:21 PM »
You mean something like this: screenshots of DB2.1, and at the bottom you see some text about a database file generated by DarwinBots and a graph made in Excel. Did earlier versions of DB generate a database file?
To me it seems now a little like we're reinventing the wheel, so where is the database?
« Last Edit: December 06, 2007, 03:13:51 PM by Peter »
Oh my god, who the hell cares.

Offline EricL

  • Administrator
  • Bot God
  • *****
  • Posts: 2266
    • View Profile
Genetic Distance
« Reply #5 on: December 06, 2007, 03:12:23 PM »
Quote from: shvarz
I don't think your genetic distance is much help, however, because you don't know whether these mutations are just drift or were selected for.
So, I'll push back a little bit.  It's true I can't tell drift from selection.  But I think it can tell us something about genome similarity and the number of generations of separation between bots or groups of extant bots.  I do think we can draw some conclusions.  For example, when the most recent common ancestor between two extant bots is far enough back that the probability of both of them being alive by chance is sufficiently low, I think it may be safe to conclude that something interesting has happened, even if it is simply geographic isolation or the founder effect.

We know nothing today about the ancestral relatedness of bots in an evo sim.  At least this is a first step towards getting some data.

Quote from: shvarz
And without sexual reproduction genetic distance does not matter for defining species.
I don't get that.  Seems to me genetic distance is the only way to define a species when we are talking about  asexual reproducers since the fuzzy definitions we use for sexually reproducing species such as "can successfully mate with each other" don't apply.

Quote from: shvarz
The mutation counter already tracks accumulation of mutations.
Yes, but how many of those mutations are shared in common with other extant bots?  We don't know today.

Quote from: shvarz
I think that just from the practical point of view we just need to know if there are distinct groups of bots co-existing in a sim.
Hopefully this will tell us that.

Quote from: shvarz
So, I think the easiest thing to do is just create new species when bots don't have a common ancestor for N generations. We can start with 100 and see how it goes. I'm sure this will result in speciation, simply because sims run at different speeds.
That is the first-blush algorithm, yes, but there are details to work out.  What distance do we use to define the members of the new species?  What if there are extant bots in the middle?  I'd really prefer something that leveraged relatedness clumps of bots.  Hopefully we can see them (if they exist - our sims may simply be too small and simple) given this capability.  My suspicion is that in a mixing population, the tree gets pruned pretty short.  Namely, that most or all bots in a sim generally share a relatively recent common ancestor unless physically isolated.

And yes, different sim speeds in teleporter-connected sims will need to be taken into account.  The distance to designate a new species should probably require at least several times the average number of generations between migration events.
Many beers....

Offline EricL

  • Administrator
  • Bot God
  • *****
  • Posts: 2266
    • View Profile
Genetic Distance
« Reply #6 on: December 06, 2007, 03:20:03 PM »
Quote from: Peter
You mean something like this: screenshots of DB2.1, and at the bottom you see some text about a database file generated by DarwinBots and a graph made in Excel. Did earlier versions of DB generate a database file?
To me it seems now a little like we're reinventing the wheel, so where is the database?
It was never a database.  It was a comma-separated file.  The term database is misleading.

The code is still there.  Look at the recording tab on the options dialog.  I've never mucked with it so it may work.  But it may not.

The info it records is limited and in particular does not record parent-offspring relationships nor does it maintain information about relatedness through extinct individuals.
Many beers....

Offline Numsgil

  • Administrator
  • Bot God
  • *****
  • Posts: 7742
    • View Profile
Genetic Distance
« Reply #7 on: December 06, 2007, 03:21:09 PM »
Quote from: Peter
You mean something like this: screenshots of DB2.1, and at the bottom you see some text about a database file generated by DarwinBots and a graph made in Excel. Did earlier versions of DB generate a database file?
To me it seems now a little like we're reinventing the wheel, so where is the database?

That's exactly what I'm talking about (well, the graph would look different, but the idea is the same), only the conclusions drawn from that graph are a little naive, I think.  You can still (unless it's been broken by accident) run the program with database logging enabled, it just takes up a lot of space.

Offline EricL

  • Administrator
  • Bot God
  • *****
  • Posts: 2266
    • View Profile
Genetic Distance
« Reply #8 on: December 06, 2007, 03:26:23 PM »
Quote from: EricL
and in particular does not record parent-offspring relationships nor does it maintain information about relatedness through extinct individuals.
Actually, I stand corrected.  That info is in there, one line for every bot that ever lived if you want it.
Many beers....

Offline Numsgil

  • Administrator
  • Bot God
  • *****
  • Posts: 7742
    • View Profile
Genetic Distance
« Reply #9 on: December 06, 2007, 03:39:06 PM »
In the distant future it would be neat for IM if we kept a database on the server.  When a program syncs itself with the database to upload and download bots, it would also update the database with all the bots that have died in its sim since last update.  You could then build phylogenic trees for the whole megasim, examining all sorts of neat data.

Offline shvarz

  • Bot God
  • *****
  • Posts: 1341
    • View Profile
Genetic Distance
« Reply #10 on: December 06, 2007, 07:35:32 PM »
Quote
I don't get that. Seems to me genetic distance is the only way to define a species when we are talking about asexual reproducers since the fuzzy definitions we use for sexually reproducing species such as "can successfully mate with each other" don't apply.

Sorry, you are correct here. I'm getting a bit sick, so I'll blame my lack of judgment on that.  I guess what I meant was that "can successfully mate" is an easy litmus test for genetic divergence. With asexual organisms every mutant is its own species. Which means that we will have to set an artificial limit on how far genetically a bot has to be from another bot to be called a new species. Which is fine, it should work.

A couple of concerns/questions:

You are going back 100 generations to look for MRCA and then calculate genetic distance from that MRCA. What are you going to do if bots don't have a common ancestor over 100 generations? Automatically call them different species?

How can you deal with gradients of diversity? What if bots A and B are different enough, but there is a bot C that is in between them and is not different from either A or B?  

Anyway, let's try it and see how this works in practice. It's much easier to spot things that work and don't work on something specific, rather than on an abstract idea.

Suggestion: Now bots change color as they mutate. If we implement this system, it would be neat to change bot color only when a new species is formed. Would make it much easier to track these events.
"Never underestimate the power of stupid things in big numbers" - Serious Sam

Offline EricL

  • Administrator
  • Bot God
  • *****
  • Posts: 2266
    • View Profile
Genetic Distance
« Reply #11 on: December 06, 2007, 07:57:37 PM »
Quote from: shvarz
You are going back 100 generations to look for MRCA and then calculate genetic distance from that MRCA. What are you going to do if bots don't have a common ancestor over 100 generations? Automatically call them different species?
100 is just an arbitrary number I chose.  I could bump it to 1000 or more if needed.  Just takes memory.

I don't know what we should do when no MRCA is found.  Depends on the feature that is utilizing these underlying data structures I guess.  For phylogenetic trees, I'd just have multiple trees.  For speciation, well, yea, I assume the UI will expose some knob that allows the human to specify the speciation distance up to the limit.

Quote from: shvarz
How can you deal with gradients of diversity? What if bots A and B are different enough, but there is a bot C that is in between them and is not different from either A or B?
An excellent question!  I have no idea!  Perhaps we don't speciate unless there is clear clumping of populations with no intermediaries.  My plan is first just to take a look at what there is to see.  Do we see well-defined clusters with short distances among them?  Or do we see gradients?  Our sims may just be too small...

Quote from: shvarz
Suggestion: Now bots change color as they mutate. If we implement this system, it would be neat to change bot color only when a new species is formed. Would make it much easier to track these events.
I like it.
Many beers....

Offline Sprotiel

  • Bot Destroyer
  • ***
  • Posts: 135
    • View Profile
Genetic Distance
« Reply #12 on: December 07, 2007, 12:14:14 AM »
Last time I was active in DB, I created a script to get phylogenetic trees from saved sims (see this thread). Eric, I believe you should reuse my algorithm, it needs in the average case only O(m*n log(n)) direct comparisons of mutations.

Basically, each node in the phylogenetic tree corresponds to a genome. The node stores the mutation history from its parent to itself and the number of living bots having the corresponding genome. To add a new bot, you compare recursively its mutation history with that of the nodes, starting from the root. When there's a full match with a node, you try to match the bot with the node's children. If there is no match (or no children), you create a new child. In the case where there's only a partial match (I'm not sure it can happen in your setting), you need to create a new node for the common ancestor of the bot and the node being matched. When a bot dies, you decrement the counter and prune nodes if it falls to zero.
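For anyone who wants to see the shape of it, here is a rough Python sketch of that tree (essentially a prefix tree over mutation histories). It is not the original script: the names `Node`, `add_bot`, and `remove_bot` are made up for illustration, mutations are assumed to be comparable tokens such as strings, and pruning only removes childless nodes so that shared ancestors of living bots stay in the tree.

```python
class Node:
    """One genome in the tree; edge holds the mutations since the parent."""

    def __init__(self, edge=(), parent=None):
        self.edge = tuple(edge)
        self.parent = parent
        self.children = []
        self.count = 0  # living bots with exactly this genome


def common_prefix_len(a, b):
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def add_bot(root, history):
    """Insert a bot by its full mutation history; return its node."""
    node, rest = root, tuple(history)
    while rest:
        for child in node.children:
            k = common_prefix_len(child.edge, rest)
            if k == 0:
                continue
            if k < len(child.edge):
                # Partial match: split off a node for the common ancestor
                mid = Node(child.edge[:k], node)
                node.children.remove(child)
                node.children.append(mid)
                child.edge = child.edge[k:]
                child.parent = mid
                mid.children.append(child)
                child = mid
            node, rest = child, rest[k:]
            break
        else:
            # No child matched: the remaining history becomes a new leaf
            leaf = Node(rest, node)
            node.children.append(leaf)
            node, rest = leaf, ()
    node.count += 1
    return node


def remove_bot(node):
    """Call when a bot dies: decrement, then prune empty childless nodes."""
    node.count -= 1
    while node.parent is not None and node.count == 0 and not node.children:
        node.parent.children.remove(node)
        node = node.parent


root = Node()
a = add_bot(root, ["m1", "m2"])
b = add_bot(root, ["m1", "m2", "m3"])   # full match with a, then a new child
c = add_bot(root, ["m1", "m4"])         # partial match: splits at "m1"
```

After the third insert the root has a single child holding `("m1",)` with a count of zero (the inferred common ancestor), and below it sit the node for bot a, now holding `("m2",)`, and the new node for bot c holding `("m4",)`.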

Offline Testlund

  • Bot God
  • *****
  • Posts: 1574
    • View Profile
Genetic Distance
« Reply #13 on: December 07, 2007, 05:28:20 AM »
Great ideas all over!   Hope to see it!
The internet is corrupt and controlled by criminally minded people.