Author Topic: Genetic Distance is slow as hell  (Read 5256 times)

Offline Botsareus

  • Society makes it all backwards - there is a good reason for that
  • Bot God
  • *****
  • Posts: 4483
    • View Profile
Genetic Distance is slow as hell
« on: April 02, 2013, 02:12:01 PM »
I am running into an interesting problem w/ genetic distance graph. The update is lagging by a good minute with only ~7 robots on the screen. I guess linking it up to the crossover algorithm kills the speed.

Any ideas?


Here is the code, I do not think it will help much, it is really straight forward and works for crossover great:

Code: [Select]

'*** the code (snip) for genetic distance graph ***

   Case GENETIC_DIST_GRAPH
    t = Flex.last(nomi)
   
    For P = 1 To t
      dati(P, GENETIC_DIST_GRAPH) = 0
    Next P
   
    For t = 1 To MaxRobs
      With rob(t)
      If .exist And Not .Corpse Then
        P = Flex.Position(rob(t).FName, nomi)
        For X = t + 1 To MaxRobs
          If rob(X).exist And Not rob(X).Corpse And rob(X).FName = .FName Then ' Must exist and be of same species
            l = DoGeneticDistance(t, X) * 1000
            If l > dati(P, GENETIC_DIST_GRAPH) Then dati(P, GENETIC_DIST_GRAPH) = l
          End If
        Next X
      End If
      End With
    Next t


'*** the public function links up to graphs ***

Public Function DoGeneticDistance(r1 As Integer, r2 As Integer) As Single
Dim t As Integer
'Step1 Create block2 from robots
      Dim dna1() As block2
      Dim dna2() As block2

      ReDim dna1(UBound(rob(r1).DNA))
      For t = 0 To UBound(dna1)
       dna1(t).tipo = rob(r1).DNA(t).tipo
       dna1(t).value = rob(r1).DNA(t).value
      Next
     
      ReDim dna2(UBound(rob(r2).DNA))
      For t = 0 To UBound(dna2)
       dna2(t).tipo = rob(r2).DNA(t).tipo
       dna2(t).value = rob(r2).DNA(t).value
      Next

'Step2 Figure out genetic distance
iinc = 0
FindLongestSequences dna1, dna2, 0, UBound(dna1), 0, UBound(dna2)
DoGeneticDistance = GeneticDistance(dna1, dna2)
End Function

'*** The genetic distance function iteself ***

Private Function GeneticDistance(ByRef rob1() As block2, ByRef rob2() As block2) As Single
Dim diffcount As Integer
Dim a As Integer
For a = 0 To UBound(rob1)
If rob1(a).match = 0 Then diffcount = diffcount + 1
Next
For a = 0 To UBound(rob2)
If rob2(a).match = 0 Then diffcount = diffcount + 1
Next
GeneticDistance = diffcount / (UBound(rob1) + UBound(rob2) + 2)
End Function

'*** The actual cross over figure out function ***

'si = start index, ei = end index, iinc = layer
Private Sub FindLongestSequences(ByRef rob1() As block2, ByRef rob2() As block2, si1 As Integer, ei1 As Integer, si2 As Integer, ei2 As Integer)
'Step1 What index range is smaller?
Dim searchlen As Integer
searchlen = ei1 - si1
If ei2 - si2 < searchlen Then searchlen = ei2 - si2
'Step2 Recrusivelly sweep from largest to shortest searchlen until match is found
Dim mylen As Integer
For mylen = (searchlen + 1) To 1 Step -1
    'Step2A The sweep itself
    Dim sweep1 As Integer
    Dim sweep2 As Integer
   
    For sweep1 = si1 To ei1 - (mylen - 1)
        For sweep2 = si2 To ei2 - (mylen - 1)
            'the match algo
            Dim lenloop As Integer
            Dim allmatch As Boolean 'are all values the same for this sweep?
            allmatch = True
            For lenloop = 0 To mylen - 1
                If rob1(lenloop + sweep1).tipo <> rob2(lenloop + sweep2).tipo Or rob1(lenloop + sweep1).value <> rob2(lenloop + sweep2).value Then
                    allmatch = False
                    Exit For
                End If
            Next
            If allmatch Then
            'match is found, goto step3
                iinc = iinc + 1
                For lenloop = 0 To mylen - 1
                    rob1(lenloop + sweep1).match = iinc
                    rob2(lenloop + sweep2).match = iinc
                Next
                GoTo step3
            End If
        Next
    Next
Next
Exit Sub
step3:
'find lefthand subsequance
If sweep1 > si1 And sweep2 > si2 Then FindLongestSequences rob1, rob2, si1, sweep1 - 1, si2, sweep2 - 1
'find righthand subsequance
If sweep1 + (mylen - 1) < ei1 And sweep2 + (mylen - 1) < ei2 Then FindLongestSequences rob1, rob2, sweep1 + mylen, ei1, sweep2 + mylen, ei2
End Sub

edit:

I even tried adding the (.LastMut + .Mutations) > 0 condition to all robots checked, that did not help much, and besides there are ways to load two different DNA files with the same file name.
« Last Edit: April 02, 2013, 02:53:43 PM by Botsareus »

Offline Numsgil

  • Administrator
  • Bot God
  • *****
  • Posts: 7742
    • View Profile
Re: Genetic Distance is slow as hell
« Reply #1 on: April 02, 2013, 10:54:36 PM »
Have you tried profiling it?  It's a semi-expensive algorithm, but it shouldn't lag that bad.

Offline Botsareus

  • Society makes it all backwards - there is a good reason for that
  • Bot God
  • *****
  • Posts: 4483
    • View Profile
Re: Genetic Distance is slow as hell
« Reply #2 on: April 03, 2013, 11:44:21 AM »
I don't think I can profile an isolated function with sleepy. Also, I know that the problem is in FindLongestSequances in robots the size of 'theone' or greater. It works exponentially faster for smaller robots.

Here is my temporary solution:

Code: [Select]
    t = Flex.last(nomi)
    For P = 1 To t
      dati(P, GENETIC_DIST_GRAPH) = 0
    Next P
    GraphCounter.Visible = True
    GraphCounter = "Updating Graph 0%"
    DoEvents
    Dim mymod As Integer
    mymod = MaxRobs ^ 0.75
    For t = 1 To MaxRobs
      With rob(t)
      If .exist And Not .Corpse And (t Mod mymod = 0 Or dati(P, GENETIC_DIST_GRAPH) = 0) Then
        P = Flex.Position(rob(t).FName, nomi)
       
        For x = t + 1 To MaxRobs
          If rob(x).exist And Not rob(x).Corpse And rob(x).FName = .FName And (t Mod mymod = 0 Or dati(P, GENETIC_DIST_GRAPH) = 0) Then    ' Must exist and be of same species
            l = DoGeneticDistance(t, x) * 1000
            If l > dati(P, GENETIC_DIST_GRAPH) Then dati(P, GENETIC_DIST_GRAPH) = l
          End If
                   GraphCounter = "Updating Graph " & Int(t / MaxRobs * 100) & "." & Int(x / MaxRobs * 99) & "%"
                   DoEvents
        Next x
       
      End If
      End With
    Next t
    GraphCounter.Visible = False

I am using 'mymod' to cut down on the number of checks at expense of accuracy.

I was hoping you can think of something better based on your statistics knowledge as I am aware that you have a masters in mathematics.

edit:

(I set the point mutations to 50 for these tests to emulate the worst case scenario)
« Last Edit: April 03, 2013, 01:23:11 PM by Botsareus »

Offline Numsgil

  • Administrator
  • Bot God
  • *****
  • Posts: 7742
    • View Profile
Re: Genetic Distance is slow as hell
« Reply #3 on: April 03, 2013, 04:12:28 PM »
You should be able to time an isolated function by searching for it in the sleepy result, and then viewing the children times.  I haven't used sleepy very much so I'm afraid I won't be much help on that front, but I'm pretty sure it's possible.

I don't actually have a degree :/  Never finished.

In terms of the algorithm, try something like Levenshtein distance.  It might be what we're doing already basically, but that's the canonical method for this sort of thing.  If you notice in the wiki article, to get reasonable speed you need to memoize (which is a fancy CS word for caching results that you already calculated).  That might be the thing we aren't doing right now that would make this fast. 

Also, you don't want to be calculating the distance between all pairs of bots, since this is O(n^2), which is way too slow for a graph that doesn't change the sim at all.  Calculate the distance from the unmutated base species, or something like that.  Also, the distance only changes when a bot mutates, so you can cache the results across frames for most bots.
« Last Edit: April 03, 2013, 04:15:08 PM by Numsgil »

Offline Botsareus

  • Society makes it all backwards - there is a good reason for that
  • Bot God
  • *****
  • Posts: 4483
    • View Profile
Re: Genetic Distance is slow as hell
« Reply #4 on: April 04, 2013, 04:36:34 PM »
I was actually thinking about this in almost the same way.
Here is my current idea:

Calculate max distance only for robots that acquired Dnalength / 10 mutations since the last check, otherwise use the old max distance. I am still going to need to ballpark that '10' value. This will add two more variables to the robot structure but I hope I will get better speed as the result.


I am going to try this solution over the weekend and let you know if it somewhat works.



Quote
Levenshtein distance

I see a couple of problems w/ this:

1.) This will mean permanently expending the DNA to include markers for all robots.
1a.) How will we save these?
1b.) Now I am slowing down the simulation under any conditions.

2.) I am not seeing an algorithm in my head that will accurately correct for mutations to memorized structures, especially because I am planning to make translocation and amplification work.
« Last Edit: April 04, 2013, 05:36:41 PM by Botsareus »

Offline Numsgil

  • Administrator
  • Bot God
  • *****
  • Posts: 7742
    • View Profile
Re: Genetic Distance is slow as hell
« Reply #5 on: April 04, 2013, 07:22:42 PM »
1.) This will mean permanently expending the DNA to include markers for all robots.
1a.) How will we save these?
1b.) Now I am slowing down the simulation under any conditions.

You shouldn't need any extra data stored with the DNA.  The algorithm just works on arrays of objects (could be characters, could just as easily be DNA base pairs).  All you need is "is-equal" and "is-not-equal".  And if you're only running it on bots that've changed it shouldn't slow the sim down at all.

Quote
2.) I am not seeing an algorithm in my head that will accurately correct for mutations to memorized structures, especially because I am planning to make translocation and amplification work.

Just set a dirty flag when you mutate the DNA.  If the dirty flag is set when you need the genetic distance, recompute it.  Otherwise use the cached value.

Offline Botsareus

  • Society makes it all backwards - there is a good reason for that
  • Bot God
  • *****
  • Posts: 4483
    • View Profile
Re: Genetic Distance is slow as hell
« Reply #6 on: April 07, 2013, 10:40:03 AM »
[Small bump]

Sorry guys, I got distracted watching BioShock Infinite on YouTube  :P. I promise I'll have something on Tuesday.

Offline Botsareus

  • Society makes it all backwards - there is a good reason for that
  • Bot God
  • *****
  • Posts: 4483
    • View Profile
Re: Genetic Distance is slow as hell
« Reply #7 on: April 09, 2013, 12:34:18 PM »
[bump]
Still a little slow, but I don't think people who are running the one with x50 point mutations will care much about genetic distance.
« Last Edit: April 09, 2013, 01:46:13 PM by Botsareus »

Offline Botsareus

  • Society makes it all backwards - there is a good reason for that
  • Bot God
  • *****
  • Posts: 4483
    • View Profile
Re: Genetic Distance is slow as hell
« Reply #8 on: April 13, 2013, 02:21:20 PM »
[bump]

Added another algorithm called 'Simple Genetic Distance' Still about as slow as the first one, but always checks all robots per graph update and a little less accurate. Here it is:

Code: [Select]
Public Function GeneticDistanceSimple(ByVal r1 As Integer, ByVal r2 As Integer) As Single
Dim b As Integer
Dim a As Integer
Dim mlst1() As Integer
Dim mlst2() As Integer
ReDim mlst1(UBound(rob(r1).DNA))
ReDim mlst2(UBound(rob(r2).DNA))
' match list is declared
Do 'loop until end of both DNA

    'special case different dna length
    If a > UBound(rob(r1).DNA) Then
        If mlst2(b) = 0 Then mlst2(b) = -2
        GoTo fine
    End If
    If b > UBound(rob(r2).DNA) Then
        If mlst1(a) = 0 Then mlst1(a) = -2
        GoTo fine
    End If

    If rob(r1).DNA(a).tipo = rob(r2).DNA(b).tipo And rob(r1).DNA(a).value = rob(r2).DNA(b).value Then
        'the data is the same, store -1
        mlst1(a) = -1
        mlst2(b) = -1
    Else
        'lets see if data picks up later
        Dim b2 As Integer
        For b2 = b + 1 To UBound(rob(r2).DNA)
            If rob(r2).DNA(b2).tipo = rob(r1).DNA(a).tipo And rob(r2).DNA(b2).value = rob(r1).DNA(a).value And mlst2(b2) < 1 Then
                'data does pickup later
                mlst1(a) = b2 + 1
                mlst2(b2) = a + 1
                b = b - 1
                GoTo fine
            End If
        Next
        'lets see if data picks up early
        For b2 = b - 1 To 0 Step -1
            If rob(r2).DNA(b2).tipo = rob(r1).DNA(a).tipo And rob(r2).DNA(b2).value = rob(r1).DNA(a).value And mlst2(b2) < 1 Then
                'data does pickup later
                mlst1(a) = b2 + 1
                mlst2(b2) = a + 1
                b = b - 1
                GoTo fine
            End If
        Next
        'the data is different, store -2
        If mlst1(a) < 1 Then mlst1(a) = -2
        If mlst2(b) < 1 Then mlst2(b) = -2
    End If

fine:
    a = a + 1
    b = b + 1
Loop Until a > UBound(rob(r1).DNA) And b > UBound(rob(r2).DNA)

Dim diff As Integer 'holds diff data counter

For a = 0 To UBound(rob(r1).DNA)
    If mlst1(a) = -2 Then
        diff = diff + 1
    ElseIf mlst1(a) > 0 Then 'data was moved
        If a > 0 Then
            If Abs(mlst1(a - 1) - mlst1(a)) > 1 Then 'only store if move unconsequtive
                diff = diff + 1
            End If
        Else 'should we store the move automaticaly?
            diff = diff + 1
        End If
    End If
Next

For a = 0 To UBound(rob(r2).DNA)
    If mlst2(a) = -2 Then
        diff = diff + 1
    ElseIf mlst2(a) > 0 Then 'data was moved
        If a > 0 Then
            If Abs(mlst2(a - 1) - mlst2(a)) > 1 Then 'only store if move unconsequtive
                diff = diff + 1
            End If
        Else 'should we store the move automaticaly?
            diff = diff + 1
        End If
    End If
Next

GeneticDistanceSimple = diff / (UBound(rob(r1).DNA) + UBound(rob(r2).DNA) + 2)
End Function

I check twice r1 <--- ---> r2, and select for the smallest genetic distance, because the algorithm calculates differently based on what order the dna is introduced, left handed or right handed.

It is up to the user if he wants better accuracy per robot or better accuracy for population, both "Simple genetic distance" and "Genetic distance" graphs are included.
« Last Edit: April 13, 2013, 03:35:15 PM by Botsareus »

Offline Botsareus

  • Society makes it all backwards - there is a good reason for that
  • Bot God
  • *****
  • Posts: 4483
    • View Profile
Re: Genetic Distance is slow as hell
« Reply #9 on: April 16, 2013, 02:34:35 PM »
It is happening people. I am running my first sexual reproduction sim. I will post it tomorrow and send a SVN package.  8)

edit:

I can not believe what I am looking at! I have an eco-system developing within 55000 cycles.
This is one of the coolest experiments I have ever ran.
« Last Edit: April 16, 2013, 03:39:29 PM by Botsareus »

Offline Peter

  • Bot God
  • *****
  • Posts: 1177
    • View Profile
Re: Genetic Distance is slow as hell
« Reply #10 on: April 16, 2013, 03:41:49 PM »
Is it slow in general or just in sexual reproduction sims?
Oh my god, who the hell cares.

Offline Botsareus

  • Society makes it all backwards - there is a good reason for that
  • Bot God
  • *****
  • Posts: 4483
    • View Profile
Re: Genetic Distance is slow as hell
« Reply #11 on: April 16, 2013, 03:47:13 PM »
no, It is only slow every time genetic distance needs to be calculate for two robots. This process takes about 1-3 seconds. So when these guys are reproducing (which is not very often) the screen does periodically lockup for like a second. However, if you want to figure out max genetic distance for all robots (as in the  max. genetic distance graph) it is reasonably slow as hell. Example: if we have 200 robots and we check them all that is 20100 checks, mult by 2 that is 40200 seconds. which is literally 11 hours.

Offline Botsareus

  • Society makes it all backwards - there is a good reason for that
  • Bot God
  • *****
  • Posts: 4483
    • View Profile
Re: Genetic Distance is slow as hell
« Reply #12 on: April 16, 2013, 04:00:16 PM »
update:

Just as things began to get interesting the robots fully lost ability to reproduce.
I may have to disable point mutations and increase ageing cost.

edit:

actually disabling aging cost completely may actually have a positive effect. Still looking at this.
« Last Edit: April 16, 2013, 04:38:59 PM by Botsareus »

Offline Peter

  • Bot God
  • *****
  • Posts: 1177
    • View Profile
Re: Genetic Distance is slow as hell
« Reply #13 on: April 16, 2013, 05:01:06 PM »
Ah, I understand it now.  :)

It is up to the user if he wants better accuracy per robot or better accuracy for population, both "Simple genetic distance" and "Genetic distance" graphs are included.
I prefer to have just one graph if they're basically doing the same. It would also seem easier in code maintenance.
Oh my god, who the hell cares.

Offline Botsareus

  • Society makes it all backwards - there is a good reason for that
  • Bot God
  • *****
  • Posts: 4483
    • View Profile
Re: Genetic Distance is slow as hell
« Reply #14 on: April 16, 2013, 06:10:06 PM »
That's the thing, the algorithm is very different for both graphs.
I need the simpler algorithm to generate a genetic distance matrix (later) for an idea I had.
The simpler algorithm is about x40 faster, therefor (based on my example) instead of waiting 11 hours I will only need to wait 16 minutes.
You can not use crossover with the simpler algorithm.
My thinking was since I am adding it anyway, why not make it available to the user?
« Last Edit: April 16, 2013, 06:43:27 PM by Botsareus »