### Author Topic: Genetic Distance is slow as hell  (Read 1550 times)

#### Botsareus

• Society makes it all backwards - there is a good reason for that
• Bot God
• Posts: 4483
##### Genetic Distance is slow as hell
« on: April 02, 2013, 02:12:01 PM »
I am running into an interesting problem w/ genetic distance graph. The update is lagging by a good minute with only ~7 robots on the screen. I guess linking it up to the crossover algorithm kills the speed.

Any ideas?

Here is the code, I do not think it will help much, it is really straight forward and works for crossover great:

Code: [Select]
`'*** the code (snip) for genetic distance graph ***   Case GENETIC_DIST_GRAPH    t = Flex.last(nomi)        For P = 1 To t      dati(P, GENETIC_DIST_GRAPH) = 0    Next P        For t = 1 To MaxRobs      With rob(t)      If .exist And Not .Corpse Then        P = Flex.Position(rob(t).FName, nomi)        For X = t + 1 To MaxRobs          If rob(X).exist And Not rob(X).Corpse And rob(X).FName = .FName Then ' Must exist and be of same species            l = DoGeneticDistance(t, X) * 1000            If l > dati(P, GENETIC_DIST_GRAPH) Then dati(P, GENETIC_DIST_GRAPH) = l          End If        Next X      End If      End With    Next t'*** the public function links up to graphs ***Public Function DoGeneticDistance(r1 As Integer, r2 As Integer) As SingleDim t As Integer'Step1 Create block2 from robots      Dim dna1() As block2      Dim dna2() As block2      ReDim dna1(UBound(rob(r1).DNA))      For t = 0 To UBound(dna1)       dna1(t).tipo = rob(r1).DNA(t).tipo       dna1(t).value = rob(r1).DNA(t).value      Next            ReDim dna2(UBound(rob(r2).DNA))      For t = 0 To UBound(dna2)       dna2(t).tipo = rob(r2).DNA(t).tipo       dna2(t).value = rob(r2).DNA(t).value      Next'Step2 Figure out genetic distanceiinc = 0FindLongestSequences dna1, dna2, 0, UBound(dna1), 0, UBound(dna2)DoGeneticDistance = GeneticDistance(dna1, dna2)End Function'*** The genetic distance function iteself ***Private Function GeneticDistance(ByRef rob1() As block2, ByRef rob2() As block2) As SingleDim diffcount As IntegerDim a As IntegerFor a = 0 To UBound(rob1)If rob1(a).match = 0 Then diffcount = diffcount + 1NextFor a = 0 To UBound(rob2)If rob2(a).match = 0 Then diffcount = diffcount + 1NextGeneticDistance = diffcount / (UBound(rob1) + UBound(rob2) + 2)End Function'*** The actual cross over figure out function ***'si = start index, ei = end index, iinc = layerPrivate Sub FindLongestSequences(ByRef rob1() As block2, ByRef rob2() As block2, si1 As Integer, ei1 As Integer, si2 As Integer, ei2 As Integer)'Step1 What index range is smaller?Dim searchlen As Integersearchlen = ei1 - si1If ei2 - si2 < searchlen Then searchlen = ei2 - si2'Step2 Recrusivelly sweep from largest to shortest searchlen until match is foundDim mylen As IntegerFor mylen = (searchlen + 1) To 1 Step -1    'Step2A The sweep itself    Dim sweep1 As Integer    Dim sweep2 As Integer        For sweep1 = si1 To ei1 - (mylen - 1)        For sweep2 = si2 To ei2 - (mylen - 1)            'the match algo            Dim lenloop As Integer            Dim allmatch As Boolean 'are all values the same for this sweep?            allmatch = True            For lenloop = 0 To mylen - 1                If rob1(lenloop + sweep1).tipo <> rob2(lenloop + sweep2).tipo Or rob1(lenloop + sweep1).value <> rob2(lenloop + sweep2).value Then                    allmatch = False                    Exit For                End If            Next            If allmatch Then            'match is found, goto step3                iinc = iinc + 1                For lenloop = 0 To mylen - 1                    rob1(lenloop + sweep1).match = iinc                    rob2(lenloop + sweep2).match = iinc                Next                GoTo step3            End If        Next    NextNextExit Substep3:'find lefthand subsequanceIf sweep1 > si1 And sweep2 > si2 Then FindLongestSequences rob1, rob2, si1, sweep1 - 1, si2, sweep2 - 1'find righthand subsequanceIf sweep1 + (mylen - 1) < ei1 And sweep2 + (mylen - 1) < ei2 Then FindLongestSequences rob1, rob2, sweep1 + mylen, ei1, sweep2 + mylen, ei2End Sub`
edit:

I even tried adding the (.LastMut + .Mutations) > 0 condition to all robots checked, that did not help much, and besides there are ways to load two different DNA files with the same file name.
« Last Edit: April 02, 2013, 02:53:43 PM by Botsareus »

#### Numsgil

• Bot God
• Posts: 7713
##### Re: Genetic Distance is slow as hell
« Reply #1 on: April 02, 2013, 10:54:36 PM »
Have you tried profiling it?  It's a semi-expensive algorithm, but it shouldn't lag that bad.

#### Botsareus

• Society makes it all backwards - there is a good reason for that
• Bot God
• Posts: 4483
##### Re: Genetic Distance is slow as hell
« Reply #2 on: April 03, 2013, 11:44:21 AM »
I don't think I can profile an isolated function with sleepy. Also, I know that the problem is in FindLongestSequances in robots the size of 'theone' or greater. It works exponentially faster for smaller robots.

Here is my temporary solution:

Code: [Select]
`    t = Flex.last(nomi)    For P = 1 To t      dati(P, GENETIC_DIST_GRAPH) = 0    Next P    GraphCounter.Visible = True    GraphCounter = "Updating Graph 0%"    DoEvents    Dim mymod As Integer    mymod = MaxRobs ^ 0.75    For t = 1 To MaxRobs      With rob(t)      If .exist And Not .Corpse And (t Mod mymod = 0 Or dati(P, GENETIC_DIST_GRAPH) = 0) Then        P = Flex.Position(rob(t).FName, nomi)               For x = t + 1 To MaxRobs          If rob(x).exist And Not rob(x).Corpse And rob(x).FName = .FName And (t Mod mymod = 0 Or dati(P, GENETIC_DIST_GRAPH) = 0) Then    ' Must exist and be of same species            l = DoGeneticDistance(t, x) * 1000            If l > dati(P, GENETIC_DIST_GRAPH) Then dati(P, GENETIC_DIST_GRAPH) = l          End If                   GraphCounter = "Updating Graph " & Int(t / MaxRobs * 100) & "." & Int(x / MaxRobs * 99) & "%"                   DoEvents        Next x              End If      End With    Next t    GraphCounter.Visible = False`
I am using 'mymod' to cut down on the number of checks at expense of accuracy.

I was hoping you can think of something better based on your statistics knowledge as I am aware that you have a masters in mathematics.

edit:

(I set the point mutations to 50 for these tests to emulate the worst case scenario)
« Last Edit: April 03, 2013, 01:23:11 PM by Botsareus »

#### Numsgil

• Bot God
• Posts: 7713
##### Re: Genetic Distance is slow as hell
« Reply #3 on: April 03, 2013, 04:12:28 PM »
You should be able to time an isolated function by searching for it in the sleepy result, and then viewing the children times.  I haven't used sleepy very much so I'm afraid I won't be much help on that front, but I'm pretty sure it's possible.

I don't actually have a degree :/  Never finished.

In terms of the algorithm, try something like Levenshtein distance.  It might be what we're doing already basically, but that's the canonical method for this sort of thing.  If you notice in the wiki article, to get reasonable speed you need to memoize (which is a fancy CS word for caching results that you already calculated).  That might be the thing we aren't doing right now that would make this fast.

Also, you don't want to be calculating the distance between all pairs of bots, since this is O(n^2), which is way too slow for a graph that doesn't change the sim at all.  Calculate the distance from the unmutated base species, or something like that.  Also, the distance only changes when a bot mutates, so you can cache the results across frames for most bots.
« Last Edit: April 03, 2013, 04:15:08 PM by Numsgil »

#### Botsareus

• Society makes it all backwards - there is a good reason for that
• Bot God
• Posts: 4483
##### Re: Genetic Distance is slow as hell
« Reply #4 on: April 04, 2013, 04:36:34 PM »
Here is my current idea:

Calculate max distance only for robots that acquired Dnalength / 10 mutations since the last check, otherwise use the old max distance. I am still going to need to ballpark that '10' value. This will add two more variables to the robot structure but I hope I will get better speed as the result.

I am going to try this solution over the weekend and let you know if it somewhat works.

Quote
Levenshtein distance

I see a couple of problems w/ this:

1.) This will mean permanently expending the DNA to include markers for all robots.
1a.) How will we save these?
1b.) Now I am slowing down the simulation under any conditions.

2.) I am not seeing an algorithm in my head that will accurately correct for mutations to memorized structures, especially because I am planning to make translocation and amplification work.
« Last Edit: April 04, 2013, 05:36:41 PM by Botsareus »

#### Numsgil

• Bot God
• Posts: 7713
##### Re: Genetic Distance is slow as hell
« Reply #5 on: April 04, 2013, 07:22:42 PM »
1.) This will mean permanently expending the DNA to include markers for all robots.
1a.) How will we save these?
1b.) Now I am slowing down the simulation under any conditions.

You shouldn't need any extra data stored with the DNA.  The algorithm just works on arrays of objects (could be characters, could just as easily be DNA base pairs).  All you need is "is-equal" and "is-not-equal".  And if you're only running it on bots that've changed it shouldn't slow the sim down at all.

Quote
2.) I am not seeing an algorithm in my head that will accurately correct for mutations to memorized structures, especially because I am planning to make translocation and amplification work.

Just set a dirty flag when you mutate the DNA.  If the dirty flag is set when you need the genetic distance, recompute it.  Otherwise use the cached value.

#### Botsareus

• Society makes it all backwards - there is a good reason for that
• Bot God
• Posts: 4483
##### Re: Genetic Distance is slow as hell
« Reply #6 on: April 07, 2013, 10:40:03 AM »
[Small bump]

Sorry guys, I got distracted watching BioShock Infinite on YouTube  . I promise I'll have something on Tuesday.

#### Botsareus

• Society makes it all backwards - there is a good reason for that
• Bot God
• Posts: 4483
##### Re: Genetic Distance is slow as hell
« Reply #7 on: April 09, 2013, 12:34:18 PM »
[bump]
Still a little slow, but I don't think people who are running the one with x50 point mutations will care much about genetic distance.
« Last Edit: April 09, 2013, 01:46:13 PM by Botsareus »

#### Botsareus

• Society makes it all backwards - there is a good reason for that
• Bot God
• Posts: 4483
##### Re: Genetic Distance is slow as hell
« Reply #8 on: April 13, 2013, 02:21:20 PM »
[bump]

Added another algorithm called 'Simple Genetic Distance' Still about as slow as the first one, but always checks all robots per graph update and a little less accurate. Here it is:

Code: [Select]
`Public Function GeneticDistanceSimple(ByVal r1 As Integer, ByVal r2 As Integer) As SingleDim b As IntegerDim a As IntegerDim mlst1() As IntegerDim mlst2() As IntegerReDim mlst1(UBound(rob(r1).DNA))ReDim mlst2(UBound(rob(r2).DNA))' match list is declaredDo 'loop until end of both DNA    'special case different dna length    If a > UBound(rob(r1).DNA) Then        If mlst2(b) = 0 Then mlst2(b) = -2        GoTo fine    End If    If b > UBound(rob(r2).DNA) Then        If mlst1(a) = 0 Then mlst1(a) = -2        GoTo fine    End If    If rob(r1).DNA(a).tipo = rob(r2).DNA(b).tipo And rob(r1).DNA(a).value = rob(r2).DNA(b).value Then        'the data is the same, store -1        mlst1(a) = -1        mlst2(b) = -1    Else        'lets see if data picks up later        Dim b2 As Integer        For b2 = b + 1 To UBound(rob(r2).DNA)            If rob(r2).DNA(b2).tipo = rob(r1).DNA(a).tipo And rob(r2).DNA(b2).value = rob(r1).DNA(a).value And mlst2(b2) < 1 Then                'data does pickup later                mlst1(a) = b2 + 1                mlst2(b2) = a + 1                b = b - 1                GoTo fine            End If        Next        'lets see if data picks up early        For b2 = b - 1 To 0 Step -1            If rob(r2).DNA(b2).tipo = rob(r1).DNA(a).tipo And rob(r2).DNA(b2).value = rob(r1).DNA(a).value And mlst2(b2) < 1 Then                'data does pickup later                mlst1(a) = b2 + 1                mlst2(b2) = a + 1                b = b - 1                GoTo fine            End If        Next        'the data is different, store -2        If mlst1(a) < 1 Then mlst1(a) = -2        If mlst2(b) < 1 Then mlst2(b) = -2    End Iffine:    a = a + 1    b = b + 1Loop Until a > UBound(rob(r1).DNA) And b > UBound(rob(r2).DNA)Dim diff As Integer 'holds diff data counterFor a = 0 To UBound(rob(r1).DNA)    If mlst1(a) = -2 Then        diff = diff + 1    ElseIf mlst1(a) > 0 Then 'data was moved        If a > 0 Then            If Abs(mlst1(a - 1) - mlst1(a)) > 1 Then 'only store if move unconsequtive                diff = diff + 1            End If        Else 'should we store the move automaticaly?            diff = diff + 1        End If    End IfNextFor a = 0 To UBound(rob(r2).DNA)    If mlst2(a) = -2 Then        diff = diff + 1    ElseIf mlst2(a) > 0 Then 'data was moved        If a > 0 Then            If Abs(mlst2(a - 1) - mlst2(a)) > 1 Then 'only store if move unconsequtive                diff = diff + 1            End If        Else 'should we store the move automaticaly?            diff = diff + 1        End If    End IfNextGeneticDistanceSimple = diff / (UBound(rob(r1).DNA) + UBound(rob(r2).DNA) + 2)End Function`
I check twice r1 <--- ---> r2, and select for the smallest genetic distance, because the algorithm calculates differently based on what order the dna is introduced, left handed or right handed.

It is up to the user if he wants better accuracy per robot or better accuracy for population, both "Simple genetic distance" and "Genetic distance" graphs are included.
« Last Edit: April 13, 2013, 03:35:15 PM by Botsareus »

#### Botsareus

• Society makes it all backwards - there is a good reason for that
• Bot God
• Posts: 4483
##### Re: Genetic Distance is slow as hell
« Reply #9 on: April 16, 2013, 02:34:35 PM »
It is happening people. I am running my first sexual reproduction sim. I will post it tomorrow and send a SVN package.

edit:

I can not believe what I am looking at! I have an eco-system developing within 55000 cycles.
This is one of the coolest experiments I have ever ran.
« Last Edit: April 16, 2013, 03:39:29 PM by Botsareus »

#### Peter

• Bot God
• Posts: 1177
##### Re: Genetic Distance is slow as hell
« Reply #10 on: April 16, 2013, 03:41:49 PM »
Is it slow in general or just in sexual reproduction sims?
Oh my god, who the hell cares.

#### Botsareus

• Society makes it all backwards - there is a good reason for that
• Bot God
• Posts: 4483
##### Re: Genetic Distance is slow as hell
« Reply #11 on: April 16, 2013, 03:47:13 PM »
no, It is only slow every time genetic distance needs to be calculate for two robots. This process takes about 1-3 seconds. So when these guys are reproducing (which is not very often) the screen does periodically lockup for like a second. However, if you want to figure out max genetic distance for all robots (as in the  max. genetic distance graph) it is reasonably slow as hell. Example: if we have 200 robots and we check them all that is 20100 checks, mult by 2 that is 40200 seconds. which is literally 11 hours.

#### Botsareus

• Society makes it all backwards - there is a good reason for that
• Bot God
• Posts: 4483
##### Re: Genetic Distance is slow as hell
« Reply #12 on: April 16, 2013, 04:00:16 PM »
update:

Just as things began to get interesting the robots fully lost ability to reproduce.
I may have to disable point mutations and increase ageing cost.

edit:

actually disabling aging cost completely may actually have a positive effect. Still looking at this.
« Last Edit: April 16, 2013, 04:38:59 PM by Botsareus »

#### Peter

• Bot God
• Posts: 1177
##### Re: Genetic Distance is slow as hell
« Reply #13 on: April 16, 2013, 05:01:06 PM »
Ah, I understand it now.

It is up to the user if he wants better accuracy per robot or better accuracy for population, both "Simple genetic distance" and "Genetic distance" graphs are included.
I prefer to have just one graph if they're basically doing the same. It would also seem easier in code maintenance.
Oh my god, who the hell cares.

#### Botsareus

• Society makes it all backwards - there is a good reason for that
• Bot God
• Posts: 4483
##### Re: Genetic Distance is slow as hell
« Reply #14 on: April 16, 2013, 06:10:06 PM »
That's the thing, the algorithm is very different for both graphs.
I need the simpler algorithm to generate a genetic distance matrix (later) for an idea I had.
The simpler algorithm is about x40 faster, therefor (based on my example) instead of waiting 11 hours I will only need to wait 16 minutes.
You can not use crossover with the simpler algorithm.
My thinking was since I am adding it anyway, why not make it available to the user?
« Last Edit: April 16, 2013, 06:43:27 PM by Botsareus »