
Need fastest possible code to compare DNA


JRaccoon:
So tipo is just Spanish for "type", then? I'm guessing that dates all the way back to Carlo's original code.

I was trying to interpret it as some special abbreviation, and couldn't come up with anything.

Botsareus:
Italian, actually :P

Botsareus:
Anyway, I am going to read up on Levenshtein distance on Wikipedia. If I still don't know what to do with the resulting matrix after that, I'll ask.
I'd still like to get Numsgil's view on this, though.

JRaccoon:
A couple of points about the edit distance produced by the Levenshtein algorithm:

* It finds the cost of transforming one string into another using not only substitutions but also insertions and deletions. For example, the edit distance between ACG and ATCG is 1, because a single insertion (the T) turns the first sequence into the second. I don't know if that is the behavior you want.
* The final result of the computation is a matrix whose cell (i, j) holds the edit distance between the first i elements of one sequence and the first j elements of the other. This lets you trace back, at the end, the exact sequence of edits that produced the shortest distance. The final edit distance is found at (I_MAX, J_MAX), where those are the lengths of the two arrays. If you only need the distance and not the path, you can keep just the current and previous rows of the matrix, which cuts the space from O(I_MAX * J_MAX) to O(min(I_MAX, J_MAX)).
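A minimal sketch of the two-row version described in the list above, written in Python for readability rather than VB. It assumes a DNA can be treated as any sequence of comparable items (a string, or a list of hypothetical `(tipo, value)` tuples); the name `edit_distance` is illustrative and not from the Darwinbots source. Because it only keeps two rows, it returns the distance but cannot reconstruct the edit path.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings, lists of tuples, ...).

    Keeps only two rows of the (len(a)+1) x (len(b)+1) matrix, so memory is
    O(min(len(a), len(b))) instead of O(len(a) * len(b)).
    """
    # The distance is symmetric, so walk the longer sequence in the outer loop
    # and keep the rows as short as possible.
    if len(a) < len(b):
        a, b = b, a
    # Row 0: turning an empty prefix of a into each prefix of b takes j insertions.
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        curr = [i] + [0] * len(b)  # column 0: i deletions
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # delete a[i-1]
                          curr[j - 1] + 1,     # insert b[j-1]
                          prev[j - 1] + cost)  # substitute (or match for free)
        prev = curr
    # The last cell of the last row is the (I_MAX, J_MAX) entry of the full matrix.
    return prev[len(b)]


dna1 = [(0, 5), (0, 3), (5, 1)]   # hypothetical (tipo, value) blocks
dna2 = [(0, 5), (0, 4), (5, 1)]
print(edit_distance("ACG", "ATCG"))  # prints 1 (one insertion)
print(edit_distance(dna1, dna2))     # prints 1 (one substitution)
```

Comparing whole tuples means a block only matches when both its type and its value match, which is the same effect the single-value packing idea aims for.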
I looked it up, and there is a double-width integral type in Visual Basic (LONG and ULONG), so my suggestion to simplify the algorithm (at the cost of at least a small performance hit) stands as an alternative: each block could become a single value for comparison. (I have virtually no experience with Visual Basic, so I apologize if the code below is gibberish.)

--- Code: ---
Dim nucleic As Long
nucleic = tipo << 32          ' Edit: You will probably need to cast tipo as a long for this to work, based on a cursory glance at the MSDN pages
nucleic = nucleic Or value

--- End code ---
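The packing idea can be sketched language-neutrally as arithmetic. This Python version is an illustration only; `pack` and its `bits` parameter are hypothetical names. It assumes `tipo` is non-negative and `value` fits in a signed `bits`-bit integer. Multiplying by 2**bits produces the same key as shifting left by `bits`, which is useful in dialects that have no shift operator.

```python
def pack(tipo, value, bits=32):
    """Combine a DNA block's type and value into a single integer key.

    Assumes tipo >= 0 and that value fits in a signed `bits`-bit integer;
    negative values are stored as their two's-complement low bits.
    Multiplying by 2**bits gives the same result as shifting left by `bits`,
    so the formula does not need a << operator, provided the result fits in
    the target integer type.
    """
    scale = 2 ** bits                      # same as 1 << bits
    return tipo * scale + (value % scale)  # Python's % is non-negative here


dna = [(0, 5), (0, -1), (5, 1)]            # hypothetical (tipo, value) blocks
keys = [pack(t, v) for t, v in dna]        # one comparable integer per block
print(keys)
print(pack(5, 1) == (5 << 32) | 1)         # shift-and-OR gives the same key: True
```

If the target Long is only 32 bits wide, as in classic VB6, a 32/32 split cannot fit. A 16/16 split (`bits=16`) would work only if both fields fit in 16 bits and `tipo` stays below 2**15, so the product does not overflow a signed 32-bit value.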

Botsareus:
<< is not supported in VB.


See pic. I kind of get what I'm looking at. What I don't understand is how to turn this data into a ratio of how similar two strings are, where, for example, 100% means identical and 0% means completely different. I also suspect some optimization is possible based on the type of result I need. Thanks.


Edit: I had to remember to actually upload my pic.
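One common way to get such a percentage is to divide the edit distance by the length of the longer DNA, since the Levenshtein distance can never exceed that length. This normalization is a choice and not something established in the thread. If the result is only needed as a yes/no "at least X% similar" test, the matrix can be abandoned early. The smallest value in a row never decreases from one row to the next, so once it exceeds the allowed distance the answer is already known. The sketch below is Python, and all function names are hypothetical.

```python
def bounded_edit_distance(a, b, limit=None):
    """Levenshtein distance, or None as soon as it is known to exceed `limit`."""
    if len(a) < len(b):
        a, b = b, a
    # At least len(a) - len(b) deletions are needed, whatever else happens.
    if limit is not None and len(a) - len(b) > limit:
        return None
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        curr = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost)
        # The row minimum never decreases, so the final distance is at least min(curr).
        if limit is not None and min(curr) > limit:
            return None
        prev = curr
    d = prev[len(b)]
    return d if limit is None or d <= limit else None


def similarity(a, b):
    """Percent similarity: 100.0 means identical, 0.0 means nothing in common."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0  # two empty DNAs count as identical
    return 100.0 * (longest - bounded_edit_distance(a, b)) / longest


def at_least_similar(a, b, percent):
    """True if similarity(a, b) >= percent (a whole number from 0 to 100)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return True
    # similarity >= percent  <=>  100 * distance <= (100 - percent) * longest
    limit = (100 - percent) * longest // 100
    return bounded_edit_distance(a, b, limit) is not None


print(similarity("ACG", "ATCG"))            # 75.0
print(at_least_similar("ACG", "ATCG", 90))  # False
```

Integer arithmetic is used for the threshold so that `at_least_similar` agrees exactly with `similarity` at the boundary, without floating-point rounding deciding the outcome.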
