Couldn't the kernels read tokenised DNA code from one buffer and memlocs from another, interpret the DNA, and write the new memloc values to a third buffer? There would be a lot of branching, which would reduce the benefit from SIMD processors, but maybe this could be mitigated by grouping similar bots so that threads running side by side mostly take the same branches. In my experience (at least in evosims), most bots spend the vast majority of their time doing one thing, like searching or spinning, so this might just work.
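
To make the idea concrete, here is a rough sketch of the three-buffer layout in plain Python. The loop stands in for a parallel kernel launch. Everything in it is made up for illustration: the five opcodes, the (opcode, operand) token pairs, `MEMLOCS_PER_BOT`, and grouping bots by where their DNA starts. A real version would more likely group by what each bot is currently doing.

```python
# Hypothetical opcodes for a tiny stack-based DNA; not from any real evosim.
PUSH, LOAD, STORE, ADD, END = range(5)

MEMLOCS_PER_BOT = 4  # assumed size of each bot's block of memlocs


def run_bot(bot, code, code_start, mem_in, mem_out):
    """Body of one 'kernel thread': interpret a single bot's DNA.

    Reads tokens from `code` and memlocs from `mem_in`; writes only to `mem_out`.
    """
    base = bot * MEMLOCS_PER_BOT
    # Copy first so memlocs the DNA never touches carry over unchanged.
    mem_out[base:base + MEMLOCS_PER_BOT] = mem_in[base:base + MEMLOCS_PER_BOT]
    stack = []
    pc = code_start[bot]
    while True:
        op, arg = code[pc], code[pc + 1]  # tokens are (opcode, operand) pairs
        pc += 2
        if op == PUSH:
            stack.append(arg)
        elif op == LOAD:
            stack.append(mem_in[base + arg])
        elif op == STORE:
            mem_out[base + arg] = stack.pop()
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == END:
            break


def grouped_order(code_start):
    """Bot indices ordered so bots sharing the same DNA sit next to each other."""
    return sorted(range(len(code_start)), key=lambda bot: code_start[bot])


def step(code, code_start, mem_in):
    """One simulation cycle: every bot runs its DNA against the old memlocs."""
    mem_out = [0] * len(mem_in)
    for bot in grouped_order(code_start):  # stand-in for the parallel launch
        run_bot(bot, code, code_start, mem_in, mem_out)
    return mem_out


# Two toy programs packed into one code buffer.
code = [
    LOAD, 0, PUSH, 1, ADD, 0, STORE, 0, END, 0,  # "spin": memloc0 += 1
    LOAD, 0, LOAD, 1, ADD, 0, STORE, 1, END, 0,  # "search": memloc1 += memloc0
]
code_start = [0, 10, 0]  # bots 0 and 2 share the "spin" DNA
mem_in = [5, 0, 0, 0,  2, 3, 0, 0,  7, 0, 0, 0]
mem_out = step(code, code_start, mem_in)
```

Because each bot only reads `mem_in` and only writes its own slice of `mem_out`, the bots do not depend on each other within a cycle, which is what would let them run in parallel. Grouping puts bots 0 and 2 next to each other, so neighbouring threads would follow the same opcode sequence.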