Darwinbots Forum
Code center => Darwinbots Program Source Code => Topic started by: Botsareus on April 02, 2013, 02:12:01 PM
-
I am running into an interesting problem w/ genetic distance graph. The update is lagging by a good minute with only ~7 robots on the screen. I guess linking it up to the crossover algorithm kills the speed.
Any ideas?
Here is the code, I do not think it will help much, it is really straight forward and works for crossover great:
'*** the code (snip) for genetic distance graph ***
Case GENETIC_DIST_GRAPH
t = Flex.last(nomi)
For P = 1 To t
dati(P, GENETIC_DIST_GRAPH) = 0
Next P
For t = 1 To MaxRobs
With rob(t)
If .exist And Not .Corpse Then
P = Flex.Position(rob(t).FName, nomi)
For X = t + 1 To MaxRobs
If rob(X).exist And Not rob(X).Corpse And rob(X).FName = .FName Then ' Must exist and be of same species
l = DoGeneticDistance(t, X) * 1000
If l > dati(P, GENETIC_DIST_GRAPH) Then dati(P, GENETIC_DIST_GRAPH) = l
End If
Next X
End If
End With
Next t
'*** the public function links up to graphs ***
Public Function DoGeneticDistance(r1 As Integer, r2 As Integer) As Single
Dim t As Integer
'Step1 Create block2 from robots
Dim dna1() As block2
Dim dna2() As block2
ReDim dna1(UBound(rob(r1).DNA))
For t = 0 To UBound(dna1)
dna1(t).tipo = rob(r1).DNA(t).tipo
dna1(t).value = rob(r1).DNA(t).value
Next
ReDim dna2(UBound(rob(r2).DNA))
For t = 0 To UBound(dna2)
dna2(t).tipo = rob(r2).DNA(t).tipo
dna2(t).value = rob(r2).DNA(t).value
Next
'Step2 Figure out genetic distance
iinc = 0
FindLongestSequences dna1, dna2, 0, UBound(dna1), 0, UBound(dna2)
DoGeneticDistance = GeneticDistance(dna1, dna2)
End Function
'*** The genetic distance function iteself ***
Private Function GeneticDistance(ByRef rob1() As block2, ByRef rob2() As block2) As Single
Dim diffcount As Integer
Dim a As Integer
For a = 0 To UBound(rob1)
If rob1(a).match = 0 Then diffcount = diffcount + 1
Next
For a = 0 To UBound(rob2)
If rob2(a).match = 0 Then diffcount = diffcount + 1
Next
GeneticDistance = diffcount / (UBound(rob1) + UBound(rob2) + 2)
End Function
'*** The actual cross over figure out function ***
'si = start index, ei = end index, iinc = layer
Private Sub FindLongestSequences(ByRef rob1() As block2, ByRef rob2() As block2, si1 As Integer, ei1 As Integer, si2 As Integer, ei2 As Integer)
'Step1 What index range is smaller?
Dim searchlen As Integer
searchlen = ei1 - si1
If ei2 - si2 < searchlen Then searchlen = ei2 - si2
'Step2 Recrusivelly sweep from largest to shortest searchlen until match is found
Dim mylen As Integer
For mylen = (searchlen + 1) To 1 Step -1
'Step2A The sweep itself
Dim sweep1 As Integer
Dim sweep2 As Integer
For sweep1 = si1 To ei1 - (mylen - 1)
For sweep2 = si2 To ei2 - (mylen - 1)
'the match algo
Dim lenloop As Integer
Dim allmatch As Boolean 'are all values the same for this sweep?
allmatch = True
For lenloop = 0 To mylen - 1
If rob1(lenloop + sweep1).tipo <> rob2(lenloop + sweep2).tipo Or rob1(lenloop + sweep1).value <> rob2(lenloop + sweep2).value Then
allmatch = False
Exit For
End If
Next
If allmatch Then
'match is found, goto step3
iinc = iinc + 1
For lenloop = 0 To mylen - 1
rob1(lenloop + sweep1).match = iinc
rob2(lenloop + sweep2).match = iinc
Next
GoTo step3
End If
Next
Next
Next
Exit Sub
step3:
'find lefthand subsequance
If sweep1 > si1 And sweep2 > si2 Then FindLongestSequences rob1, rob2, si1, sweep1 - 1, si2, sweep2 - 1
'find righthand subsequance
If sweep1 + (mylen - 1) < ei1 And sweep2 + (mylen - 1) < ei2 Then FindLongestSequences rob1, rob2, sweep1 + mylen, ei1, sweep2 + mylen, ei2
End Sub
edit:
I even tried adding the (.LastMut + .Mutations) > 0 condition to all robots checked, that did not help much, and besides there are ways to load two different DNA files with the same file name.
-
Have you tried profiling it? It's a semi-expensive algorithm, but it shouldn't lag that bad.
-
I don't think I can profile an isolated function with sleepy. Also, I know that the problem is in FindLongestSequances in robots the size of 'theone' or greater. It works exponentially faster for smaller robots.
Here is my temporary solution:
t = Flex.last(nomi)
For P = 1 To t
dati(P, GENETIC_DIST_GRAPH) = 0
Next P
GraphCounter.Visible = True
GraphCounter = "Updating Graph 0%"
DoEvents
Dim mymod As Integer
mymod = MaxRobs ^ 0.75
For t = 1 To MaxRobs
With rob(t)
If .exist And Not .Corpse And (t Mod mymod = 0 Or dati(P, GENETIC_DIST_GRAPH) = 0) Then
P = Flex.Position(rob(t).FName, nomi)
For x = t + 1 To MaxRobs
If rob(x).exist And Not rob(x).Corpse And rob(x).FName = .FName And (t Mod mymod = 0 Or dati(P, GENETIC_DIST_GRAPH) = 0) Then ' Must exist and be of same species
l = DoGeneticDistance(t, x) * 1000
If l > dati(P, GENETIC_DIST_GRAPH) Then dati(P, GENETIC_DIST_GRAPH) = l
End If
GraphCounter = "Updating Graph " & Int(t / MaxRobs * 100) & "." & Int(x / MaxRobs * 99) & "%"
DoEvents
Next x
End If
End With
Next t
GraphCounter.Visible = False
I am using 'mymod' to cut down on the number of checks at expense of accuracy.
I was hoping you can think of something better based on your statistics knowledge as I am aware that you have a masters in mathematics.
edit:
(I set the point mutations to 50 for these tests to emulate the worst case scenario)
-
You should be able to time an isolated function by searching for it in the sleepy result, and then viewing the children times. I haven't used sleepy very much so I'm afraid I won't be much help on that front, but I'm pretty sure it's possible.
I don't actually have a degree :/ Never finished.
In terms of the algorithm, try something like Levenshtein distance (http://en.wikipedia.org/wiki/Levenshtein_distance). It might be what we're doing already basically, but that's the canonical method for this sort of thing. If you notice in the wiki article, to get reasonable speed you need to memoize (which is a fancy CS word for caching results that you already calculated). That might be the thing we aren't doing right now that would make this fast.
Also, you don't want to be calculating the distance between all pairs of bots, since this is O(n^2), which is way too slow for a graph that doesn't change the sim at all. Calculate the distance from the unmutated base species, or something like that. Also, the distance only changes when a bot mutates, so you can cache the results across frames for most bots.
-
I was actually thinking about this in almost the same way.
Here is my current idea:
Calculate max distance only for robots that acquired Dnalength / 10 mutations since the last check, otherwise use the old max distance. I am still going to need to ballpark that '10' value. This will add two more variables to the robot structure but I hope I will get better speed as the result.
I am going to try this solution over the weekend and let you know if it somewhat works.
Levenshtein distance
I see a couple of problems w/ this:
1.) This will mean permanently expending the DNA to include markers for all robots.
1a.) How will we save these?
1b.) Now I am slowing down the simulation under any conditions.
2.) I am not seeing an algorithm in my head that will accurately correct for mutations to memorized structures, especially because I am planning to make translocation and amplification work.
-
1.) This will mean permanently expending the DNA to include markers for all robots.
1a.) How will we save these?
1b.) Now I am slowing down the simulation under any conditions.
You shouldn't need any extra data stored with the DNA. The algorithm just works on arrays of objects (could be characters, could just as easily be DNA base pairs). All you need is "is-equal" and "is-not-equal". And if you're only running it on bots that've changed it shouldn't slow the sim down at all.
2.) I am not seeing an algorithm in my head that will accurately correct for mutations to memorized structures, especially because I am planning to make translocation and amplification work.
Just set a dirty flag when you mutate the DNA. If the dirty flag is set when you need the genetic distance, recompute it. Otherwise use the cached value.
-
[Small bump]
Sorry guys, I got distracted watching BioShock Infinite on YouTube :P. I promise I'll have something on Tuesday.
-
[bump]
Still a little slow, but I don't think people who are running the one with x50 point mutations will care much about genetic distance.
-
[bump]
Added another algorithm called 'Simple Genetic Distance' Still about as slow as the first one, but always checks all robots per graph update and a little less accurate. Here it is:
Public Function GeneticDistanceSimple(ByVal r1 As Integer, ByVal r2 As Integer) As Single
Dim b As Integer
Dim a As Integer
Dim mlst1() As Integer
Dim mlst2() As Integer
ReDim mlst1(UBound(rob(r1).DNA))
ReDim mlst2(UBound(rob(r2).DNA))
' match list is declared
Do 'loop until end of both DNA
'special case different dna length
If a > UBound(rob(r1).DNA) Then
If mlst2(b) = 0 Then mlst2(b) = -2
GoTo fine
End If
If b > UBound(rob(r2).DNA) Then
If mlst1(a) = 0 Then mlst1(a) = -2
GoTo fine
End If
If rob(r1).DNA(a).tipo = rob(r2).DNA(b).tipo And rob(r1).DNA(a).value = rob(r2).DNA(b).value Then
'the data is the same, store -1
mlst1(a) = -1
mlst2(b) = -1
Else
'lets see if data picks up later
Dim b2 As Integer
For b2 = b + 1 To UBound(rob(r2).DNA)
If rob(r2).DNA(b2).tipo = rob(r1).DNA(a).tipo And rob(r2).DNA(b2).value = rob(r1).DNA(a).value And mlst2(b2) < 1 Then
'data does pickup later
mlst1(a) = b2 + 1
mlst2(b2) = a + 1
b = b - 1
GoTo fine
End If
Next
'lets see if data picks up early
For b2 = b - 1 To 0 Step -1
If rob(r2).DNA(b2).tipo = rob(r1).DNA(a).tipo And rob(r2).DNA(b2).value = rob(r1).DNA(a).value And mlst2(b2) < 1 Then
'data does pickup later
mlst1(a) = b2 + 1
mlst2(b2) = a + 1
b = b - 1
GoTo fine
End If
Next
'the data is different, store -2
If mlst1(a) < 1 Then mlst1(a) = -2
If mlst2(b) < 1 Then mlst2(b) = -2
End If
fine:
a = a + 1
b = b + 1
Loop Until a > UBound(rob(r1).DNA) And b > UBound(rob(r2).DNA)
Dim diff As Integer 'holds diff data counter
For a = 0 To UBound(rob(r1).DNA)
If mlst1(a) = -2 Then
diff = diff + 1
ElseIf mlst1(a) > 0 Then 'data was moved
If a > 0 Then
If Abs(mlst1(a - 1) - mlst1(a)) > 1 Then 'only store if move unconsequtive
diff = diff + 1
End If
Else 'should we store the move automaticaly?
diff = diff + 1
End If
End If
Next
For a = 0 To UBound(rob(r2).DNA)
If mlst2(a) = -2 Then
diff = diff + 1
ElseIf mlst2(a) > 0 Then 'data was moved
If a > 0 Then
If Abs(mlst2(a - 1) - mlst2(a)) > 1 Then 'only store if move unconsequtive
diff = diff + 1
End If
Else 'should we store the move automaticaly?
diff = diff + 1
End If
End If
Next
GeneticDistanceSimple = diff / (UBound(rob(r1).DNA) + UBound(rob(r2).DNA) + 2)
End Function
I check twice r1 <--- ---> r2, and select for the smallest genetic distance, because the algorithm calculates differently based on what order the dna is introduced, left handed or right handed.
It is up to the user if he wants better accuracy per robot or better accuracy for population, both "Simple genetic distance" and "Genetic distance" graphs are included.
-
It is happening people. I am running my first sexual reproduction sim. I will post it tomorrow and send a SVN package. 8)
edit:
I can not believe what I am looking at! I have an eco-system developing within 55000 cycles.
This is one of the coolest experiments I have ever ran.
-
Is it slow in general or just in sexual reproduction sims?
-
no, It is only slow every time genetic distance needs to be calculate for two robots. This process takes about 1-3 seconds. So when these guys are reproducing (which is not very often) the screen does periodically lockup for like a second. However, if you want to figure out max genetic distance for all robots (as in the max. genetic distance graph) it is reasonably slow as hell. Example: if we have 200 robots and we check them all that is 20100 checks, mult by 2 that is 40200 seconds. which is literally 11 hours.
-
update:
Just as things began to get interesting the robots fully lost ability to reproduce.
I may have to disable point mutations and increase ageing cost.
edit:
actually disabling aging cost completely may actually have a positive effect. Still looking at this.
-
Ah, I understand it now. :)
It is up to the user if he wants better accuracy per robot or better accuracy for population, both "Simple genetic distance" and "Genetic distance" graphs are included.
I prefer to have just one graph if they're basically doing the same. It would also seem easier in code maintenance.
-
That's the thing, the algorithm is very different for both graphs.
I need the simpler algorithm to generate a genetic distance matrix (later) for an idea I had.
The simpler algorithm is about x40 faster, therefor (based on my example) instead of waiting 11 hours I will only need to wait 16 minutes.
You can not use crossover with the simpler algorithm.
My thinking was since I am adding it anyway, why not make it available to the user?