Code center > Darwinbots Program Source Code

Genetic Distance is slow as hell

(1/3) > >>

Botsareus:
I am running into an interesting problem w/ genetic distance graph. The update is lagging by a good minute with only ~7 robots on the screen. I guess linking it up to the crossover algorithm kills the speed.

Any ideas?


Here is the code, I do not think it will help much, it is really straight forward and works for crossover great:


--- Code: ---
'*** the code (snip) for genetic distance graph ***

   Case GENETIC_DIST_GRAPH
    t = Flex.last(nomi)
   
    For P = 1 To t
      dati(P, GENETIC_DIST_GRAPH) = 0
    Next P
   
    For t = 1 To MaxRobs
      With rob(t)
      If .exist And Not .Corpse Then
        P = Flex.Position(rob(t).FName, nomi)
        For X = t + 1 To MaxRobs
          If rob(X).exist And Not rob(X).Corpse And rob(X).FName = .FName Then ' Must exist and be of same species
            l = DoGeneticDistance(t, X) * 1000
            If l > dati(P, GENETIC_DIST_GRAPH) Then dati(P, GENETIC_DIST_GRAPH) = l
          End If
        Next X
      End If
      End With
    Next t


'*** the public function links up to graphs ***

Public Function DoGeneticDistance(r1 As Integer, r2 As Integer) As Single
Dim t As Integer
'Step1 Create block2 from robots
      Dim dna1() As block2
      Dim dna2() As block2

      ReDim dna1(UBound(rob(r1).DNA))
      For t = 0 To UBound(dna1)
       dna1(t).tipo = rob(r1).DNA(t).tipo
       dna1(t).value = rob(r1).DNA(t).value
      Next
     
      ReDim dna2(UBound(rob(r2).DNA))
      For t = 0 To UBound(dna2)
       dna2(t).tipo = rob(r2).DNA(t).tipo
       dna2(t).value = rob(r2).DNA(t).value
      Next

'Step2 Figure out genetic distance
iinc = 0
FindLongestSequences dna1, dna2, 0, UBound(dna1), 0, UBound(dna2)
DoGeneticDistance = GeneticDistance(dna1, dna2)
End Function

'*** The genetic distance function iteself ***

Private Function GeneticDistance(ByRef rob1() As block2, ByRef rob2() As block2) As Single
Dim diffcount As Integer
Dim a As Integer
For a = 0 To UBound(rob1)
If rob1(a).match = 0 Then diffcount = diffcount + 1
Next
For a = 0 To UBound(rob2)
If rob2(a).match = 0 Then diffcount = diffcount + 1
Next
GeneticDistance = diffcount / (UBound(rob1) + UBound(rob2) + 2)
End Function

'*** The actual cross over figure out function ***

'si = start index, ei = end index, iinc = layer
Private Sub FindLongestSequences(ByRef rob1() As block2, ByRef rob2() As block2, si1 As Integer, ei1 As Integer, si2 As Integer, ei2 As Integer)
'Step1 What index range is smaller?
Dim searchlen As Integer
searchlen = ei1 - si1
If ei2 - si2 < searchlen Then searchlen = ei2 - si2
'Step2 Recrusivelly sweep from largest to shortest searchlen until match is found
Dim mylen As Integer
For mylen = (searchlen + 1) To 1 Step -1
    'Step2A The sweep itself
    Dim sweep1 As Integer
    Dim sweep2 As Integer
   
    For sweep1 = si1 To ei1 - (mylen - 1)
        For sweep2 = si2 To ei2 - (mylen - 1)
            'the match algo
            Dim lenloop As Integer
            Dim allmatch As Boolean 'are all values the same for this sweep?
            allmatch = True
            For lenloop = 0 To mylen - 1
                If rob1(lenloop + sweep1).tipo <> rob2(lenloop + sweep2).tipo Or rob1(lenloop + sweep1).value <> rob2(lenloop + sweep2).value Then
                    allmatch = False
                    Exit For
                End If
            Next
            If allmatch Then
            'match is found, goto step3
                iinc = iinc + 1
                For lenloop = 0 To mylen - 1
                    rob1(lenloop + sweep1).match = iinc
                    rob2(lenloop + sweep2).match = iinc
                Next
                GoTo step3
            End If
        Next
    Next
Next
Exit Sub
step3:
'find lefthand subsequance
If sweep1 > si1 And sweep2 > si2 Then FindLongestSequences rob1, rob2, si1, sweep1 - 1, si2, sweep2 - 1
'find righthand subsequance
If sweep1 + (mylen - 1) < ei1 And sweep2 + (mylen - 1) < ei2 Then FindLongestSequences rob1, rob2, sweep1 + mylen, ei1, sweep2 + mylen, ei2
End Sub

--- End code ---

edit:

I even tried adding the (.LastMut + .Mutations) > 0 condition to all robots checked, that did not help much, and besides there are ways to load two different DNA files with the same file name.

Numsgil:
Have you tried profiling it?  It's a semi-expensive algorithm, but it shouldn't lag that bad.

Botsareus:
I don't think I can profile an isolated function with sleepy. Also, I know that the problem is in FindLongestSequances in robots the size of 'theone' or greater. It works exponentially faster for smaller robots.

Here is my temporary solution:


--- Code: ---    t = Flex.last(nomi)
    For P = 1 To t
      dati(P, GENETIC_DIST_GRAPH) = 0
    Next P
    GraphCounter.Visible = True
    GraphCounter = "Updating Graph 0%"
    DoEvents
    Dim mymod As Integer
    mymod = MaxRobs ^ 0.75
    For t = 1 To MaxRobs
      With rob(t)
      If .exist And Not .Corpse And (t Mod mymod = 0 Or dati(P, GENETIC_DIST_GRAPH) = 0) Then
        P = Flex.Position(rob(t).FName, nomi)
       
        For x = t + 1 To MaxRobs
          If rob(x).exist And Not rob(x).Corpse And rob(x).FName = .FName And (t Mod mymod = 0 Or dati(P, GENETIC_DIST_GRAPH) = 0) Then    ' Must exist and be of same species
            l = DoGeneticDistance(t, x) * 1000
            If l > dati(P, GENETIC_DIST_GRAPH) Then dati(P, GENETIC_DIST_GRAPH) = l
          End If
                   GraphCounter = "Updating Graph " & Int(t / MaxRobs * 100) & "." & Int(x / MaxRobs * 99) & "%"
                   DoEvents
        Next x
       
      End If
      End With
    Next t
    GraphCounter.Visible = False

--- End code ---

I am using 'mymod' to cut down on the number of checks at expense of accuracy.

I was hoping you can think of something better based on your statistics knowledge as I am aware that you have a masters in mathematics.

edit:

(I set the point mutations to 50 for these tests to emulate the worst case scenario)

Numsgil:
You should be able to time an isolated function by searching for it in the sleepy result, and then viewing the children times.  I haven't used sleepy very much so I'm afraid I won't be much help on that front, but I'm pretty sure it's possible.

I don't actually have a degree :/  Never finished.

In terms of the algorithm, try something like Levenshtein distance.  It might be what we're doing already basically, but that's the canonical method for this sort of thing.  If you notice in the wiki article, to get reasonable speed you need to memoize (which is a fancy CS word for caching results that you already calculated).  That might be the thing we aren't doing right now that would make this fast. 

Also, you don't want to be calculating the distance between all pairs of bots, since this is O(n^2), which is way too slow for a graph that doesn't change the sim at all.  Calculate the distance from the unmutated base species, or something like that.  Also, the distance only changes when a bot mutates, so you can cache the results across frames for most bots.

Botsareus:
I was actually thinking about this in almost the same way.
Here is my current idea:

Calculate max distance only for robots that acquired Dnalength / 10 mutations since the last check, otherwise use the old max distance. I am still going to need to ballpark that '10' value. This will add two more variables to the robot structure but I hope I will get better speed as the result.


I am going to try this solution over the weekend and let you know if it somewhat works.



--- Quote ---Levenshtein distance
--- End quote ---

I see a couple of problems w/ this:

1.) This will mean permanently expending the DNA to include markers for all robots.
1a.) How will we save these?
1b.) Now I am slowing down the simulation under any conditions.

2.) I am not seeing an algorithm in my head that will accurately correct for mutations to memorized structures, especially because I am planning to make translocation and amplification work.

Navigation

[0] Message Index

[#] Next page

Go to full version