I like the idea of using the Elo system.
There are possibly issues of Elo inflation to contend with, and of bots doing better or worse on different versions, but those are similar to the issues real Elo systems face, and we have the advantage that bots don't age or really change much over time (except for the issues around different versions).
The bots shouldn't be powered differently over shorter periods of time, only over a longer period of time. Perhaps the concept of a season could be introduced (after an agreed number of versions, or periodically) to help prevent problems between versions. This might reduce the effects of deflation. However, I don't believe that deflation is too much of a problem, since the ratings would probably only be used to compare current bots rather than bots between versions, and since a bot is never removed from the rankings and never stops playing (unless the rules of the tournament are changed).
We wouldn't have a problem with bots "avoiding" playing, either.
Would it be sensible to have a round robin tournament (or something similar) periodically for the top n bots to give a clear ranking?
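To make the round robin idea concrete, here is a rough sketch of how the pairings for the top n bots could be generated. The function name, the ratings dictionary and the bot names are all made up for illustration; nothing here is taken from the actual league code.

```python
from itertools import combinations

def round_robin_pairings(ratings, n):
    """Return every pairing among the n highest-rated bots.

    `ratings` is a hypothetical dict mapping bot name -> current rating.
    """
    # Sort bot names by rating, highest first, and keep the top n.
    top = sorted(ratings, key=ratings.get, reverse=True)[:n]
    # Each pair of top bots meets exactly once: n * (n - 1) / 2 matches.
    return list(combinations(top, 2))

# Hypothetical ratings, purely for illustration.
example_ratings = {"BotA": 1600, "BotB": 1550, "BotC": 1500, "BotD": 1400}
pairings = round_robin_pairings(example_ratings, 3)
```

The number of matches grows as n(n-1)/2, so keeping n small is what makes a periodic round robin affordable.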
I'm still against Elo rating, for reasons that are very clear in some of the bots I wrote in DB2... Let's take a simple example with my 2 very iconic bots, Dizzy & All_hunter... All_hunter is superbly programmed and can take out most bots, even some of the top F1 bots when I've pitted it in 1 species v 1 species scenarios, but when it's pitted against Dizzy, it's obliterated. That's because All_hunter spreads out and doesn't form colonies, which allows it to target food across the map and is a failsafe against cannibots evolving in evolutionary simulations, and Dizzy just spins it. Hence the name Dizzy: it spins its prey... It's brilliant against any bot that tries to go solo, because it spins the prey so quickly that it actually breaks the eyes (some weird weakness in the code; in some sims they haven't even attacked a specific plant or bot, but its eyes go as if it's been spun?). Now, Elo rating would currently probably put Dizzy above All_hunter, yet if we pitted Dizzy against a 3rd bot of mine, RBM (russian babies mobilised), it'd be obliterated, and RBM in turn is easily wiped out by All_hunter. So which one is better? You don't necessarily save much CPU cost compared with running another type of league, or else you'd get really inaccurate readings...
You're forgetting that you're only really comparing these 3 bots with each other, where RBM > Dizzy > All_hunter (RBM ? All_hunter), and you aren't taking into account all of the other bots that each of them plays. With enough competitions, each bot should arrive at an appropriate rating. All_hunter's rating would be reduced slightly by Dizzy and, if it beats RBM, its Elo rating would then be increased. Elo rating doesn't work well when you're comparing just 3 bots that can beat each other, but works better when more bots can be compared.
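As a rough illustration of that point, here is a minimal sketch of the standard Elo update applied to the three-bot cycle. The starting rating of 1500, the K-factor of 32 and the function names are assumptions for the example, not anything the league has agreed on.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score for A against B (between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update(rating_a, rating_b, score_a, k=32):
    """Return new (A, B) ratings; score_a is 1 for an A win, 0 for a loss.

    k=32 is an assumed K-factor, a common default.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Whatever A gains, B loses, so the total rating pool is unchanged.
    return rating_a + delta, rating_b - delta

def play_cycle(ratings):
    """Play one rock-paper-scissors cycle: each (winner, loser) pair once."""
    results = [("Dizzy", "All_hunter"), ("RBM", "Dizzy"), ("All_hunter", "RBM")]
    for winner, loser in results:
        ratings[winner], ratings[loser] = update(ratings[winner], ratings[loser], 1.0)
    return ratings

# All three bots start from the same assumed rating.
ratings = play_cycle({"Dizzy": 1500.0, "All_hunter": 1500.0, "RBM": 1500.0})
```

In this sketch the three bots stay clustered near their starting rating, because each win in the cycle is cancelled by a loss. What separates them in practice would be their results against the rest of the field.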