Topic: DB3 Leagues (Read 20053 times)

Panda · « **on:** April 09, 2015, 02:02:51 PM »

I assume DB3 will have the concept of leagues. There are a few things to consider when it comes to leagues. For the most part, these are just musings on my part.

League design
A server-side league: an official league or set of leagues that can be computed periodically
- Low run costs: the top n winners from the previous league
- Central repository: lets users download all bots more easily; linked to the forum (e.g. whenever a bot is posted, a thread is created)
- Limiting user bot submissions: too many submitted bots could make the leagues difficult to run

Numsgil · « **Reply #1 on:** April 09, 2015, 02:51:18 PM »

When I first joined, we were using a simple ladder tournament, i.e. you start at the bottom and have to challenge your way up the ladder one rung at a time. Once you can't beat the guy ahead of you, that becomes your new place on the ladder and you basically stay there. It gets less reasonable as there are more and more bots.
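A minimal sketch of the ladder placement described above. The match itself is abstracted as a hypothetical `beats(a, b)` function (a real league would run a sim there), and the ladder is a list with the top rung first.

```python
from typing import Callable, List


def ladder_insert(ladder: List[str], challenger: str,
                  beats: Callable[[str, str], bool]) -> List[str]:
    """Place a challenger on a ladder (index 0 is the top rung).

    The challenger starts below the bottom rung and challenges the bot
    directly above it, one rung at a time. At the first bot it cannot beat,
    it stops and settles on the rung just below that bot.
    """
    position = len(ladder)  # start below the bottom rung
    while position > 0 and beats(challenger, ladder[position - 1]):
        position -= 1  # climbed past ladder[position - 1]
    return ladder[:position] + [challenger] + ladder[position:]


# Hypothetical stand-in for running a real match: higher strength wins.
strength = {"A": 4, "B": 3, "C": 2, "D": 1, "X": 2.5}
print(ladder_insert(["A", "B", "C", "D"], "X",
                    lambda a, b: strength[a] > strength[b]))
```

A new bot costs at most one match per rung, which is what makes the ladder cheap, but a single result per rung decides where it ends up.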

I think Bots favors something more like a round robin tournament, which is admittedly more fair but scales even worse as you add more bots.
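To put numbers on the scaling: a full round robin among N bots needs N(N-1)/2 pairings (before any repeats to reduce noise), while a bot joining an existing round robin needs N-1 new pairings if earlier results are kept. A short illustration:

```python
from itertools import combinations
from math import comb


def round_robin_schedule(bots):
    """Every unordered pair of bots meets exactly once."""
    return list(combinations(bots, 2))


# Total pairings grow quadratically with the number of bots.
for n in (10, 50, 200):
    print(n, "bots ->", comb(n, 2), "pairings")
```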

Another option is to take a statistical approach and give bots an Elo rating. You wouldn't need to challenge every other bot to be declared the best; you'd just need to fight enough different matches for the system to converge, with a reasonable confidence level, to an Elo rating for your bot. There are possibly issues of Elo inflation to contend with, and of bots doing better/worse on different versions, but those are similar to the issues real Elo systems face, and we have the advantage that bots don't age or really change much over time (except for the issues around different versions). The number of rounds you need to play is considerably smaller than even in the ladder system, which means it scales nicely.
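For reference, the standard Elo update is small enough to sketch. This assumes a K-factor of 32 (a common choice, not something decided in this thread) and allows fractional scores so that partial wins could be fed in:

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expected score of A against B (logistic curve, 400-point scale)."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0):
    """Return updated (r_a, r_b) after one match.

    score_a is 1 for an A win, 0 for a loss, 0.5 for a draw; fractional
    values in between also work. k = 32 is an assumed step size.
    """
    delta = k * (score_a - expected_score(r_a, r_b))
    return r_a + delta, r_b - delta


print(elo_update(1000.0, 1000.0, 1.0))  # (1016.0, 984.0)
```

Each match only moves the two ratings involved, so a bot's rating can settle without it facing every other bot.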

Botsareus · « **Reply #2 on:** April 09, 2015, 03:23:56 PM »

Actually, I think I did exactly as described. The challenger is ranked when it stops winning. It is not a requirement to fight everyone.

spike43884 · « **Reply #3 on:** April 09, 2015, 03:34:35 PM »

What about a "Multi-Floor League" of some form?
Multi-Floor is a concept halfway between stepladder and round robin.
It puts already-ranked bots into groups of N, say 9. The new bot goes against a group and gets a ranking there; then, if it's in the top, let's say, 3 of that tournament, it goes on to the next tournament (generally the floors overlap by one bot, so the top bot from the floor below moves up one).

It won't give an instantaneous result, of course; it's slightly slower than round robin, but it would work. Plus, you can have multiple bots entering at once, because you can run one competitor per floor. The bot's positions are averaged out, then it's given a specific placing. So if it gets No. 1 on all of the floors, it receives first place, but if it's No. 2 on all of the floors except the final one, it gets through because it's in the top 3, but it only gets second place, or possibly first. This is decided by each floor position giving X points, so 1st place on a floor gets 10 points and last place gets 1 point. Identical scores are resolved with the oldest bot ranked highest (as it will have won the most competitions). This also means that if a random update comes along, all bots end up in the correct positions, because scores can be updated to their latest values.
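One possible reading of the Multi-Floor scheme, sketched under several assumptions: floors do not overlap, a hypothetical `play_floor` hook runs one tournament and returns the entrants best first, points on a floor run from the number of entrants for 1st down to 1 for last (with 10 bots per floor that is the 10-to-1 scale above), per-floor points are summed rather than averaged, and ties go to the bot that entered earliest.

```python
from typing import Callable, Dict, List, Sequence


def climb_floors(challenger: str,
                 floors: Sequence[List[str]],
                 play_floor: Callable[[List[str]], List[str]],
                 advance_top: int = 3) -> int:
    """Run a challenger up the floors and return its total points.

    floors[0] is the bottom floor. play_floor is a hypothetical hook that
    runs one tournament and returns the entrants ordered best first.
    Points per floor: 1st place gets len(entrants), last place gets 1.
    """
    total = 0
    for floor in floors:
        entrants = list(floor) + [challenger]
        ranking = play_floor(entrants)
        place = ranking.index(challenger) + 1  # 1 = first place
        total += len(entrants) + 1 - place
        if place > advance_top:
            break  # the challenger stops on the first floor it struggles on
    return total


def final_order(points: Dict[str, int], entry_order: Dict[str, int]) -> List[str]:
    """Rank by points; ties go to the older bot (smaller entry_order)."""
    return sorted(points, key=lambda bot: (-points[bot], entry_order[bot]))
```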

Overall it's still a bit intensive, but not as intensive as round robin. This also overcomes the Elo rating problem that a bot might fare better against some bots than others, as it challenges all currently ranked bots. You can then have a step in the submission process that quickly scans whether any of the ACTIVE DNA (as we're not doing mutations in leagues yet) is already in that league, and whether any of the ACTIVE DNA violates the rules. Then you can also delete INACTIVE DNA, because we don't have mutations in the leagues, noticeably speeding up the sims.
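The duplicate check could start as simply as comparing fingerprints of the submitted DNA. This sketch assumes DNA is text made of whitespace-separated tokens, so only whitespace is normalised; it catches exact (100%) copies only and does not separate active from inactive DNA, which would need a real DNA parser.

```python
import hashlib
from typing import Iterable


def dna_fingerprint(dna: str) -> str:
    """SHA-256 of the DNA with whitespace normalised.

    Assumes whitespace-separated tokens, so reformatting the same code does
    not change the fingerprint. Comments are NOT stripped here.
    """
    canonical = " ".join(dna.split())
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_duplicate(new_dna: str, league_dna: Iterable[str]) -> bool:
    """True if new_dna is an exact copy of a bot already in the league."""
    seen = {dna_fingerprint(d) for d in league_dna}
    return dna_fingerprint(new_dna) in seen
```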

Botsareus · « **Reply #4 on:** April 09, 2015, 03:39:27 PM »

Cool, that is like my tournament mode mixed with stepladder.

Peter · « **Reply #5 on:** April 09, 2015, 05:40:31 PM »

A tournament like that would have to be repeated every time. That would drain CPU resources.

I like some parts of how it was done in the last Google AI Challenge.

They had an Elo rating.
New versions were treated like new bots, i.e. the Elo rating was reset.
There was a (simple) test a bot had to pass to be allowed into the league at all.
It was automatic.

spike43884 · « **Reply #6 on:** April 10, 2015, 06:30:10 AM »

Quote from: Peter on April 09, 2015, 05:40:31 PM

A tournament like that would have to be repeated every time. That would drain CPU resources.

I like some parts of how it was done in the last Google AI Challenge.

They had an Elo rating.
New versions were treated like new bots, i.e. the Elo rating was reset.
There was a (simple) test a bot had to pass to be allowed into the league at all.
It was automatic.

Automatic is covered by floors.
New versions treated like new bots: not too complicated, you just need to check for 100% duplicate DNA.
A simple test to be allowed into the leagues at all: as I said, check the active DNA for league rule violations.
Elo rating I'm still against, for reasons that are very clear in some of the bots I wrote in DB2. Let's take a simple example of my two very iconic bots, Dizzy and All_hunter. All_hunter is superbly programmed and can take out most bots, even some of the top F1 bots when I've pitted it in 1 species vs. 1 species scenarios, but when it's pitted against Dizzy, it's obliterated. All_hunter spreads out and doesn't form colonies, which allows it to target food across the map and is a failsafe in case cannibots evolve in evolutionary simulations, and Dizzy just spins it. Hence the name Dizzy: it spins its prey. It's brilliant against any bot that tries to go solo, because it spins the prey so quickly that it actually breaks the eyes (some weird weakness in the code; in some sims they haven't even attacked a specific plant or bot, but the eyes go as if they've been spun?). Now, an Elo rating would probably put Dizzy above All_hunter, yet if we pitted Dizzy against a third bot of mine, RBM (Russian Babies Mobilised), it would be obliterated, and RBM is easily wiped out by All_hunter. So which one is better? You don't necessarily save much CPU cost compared with another type of league, or else you'd get really inaccurate readings.

Your point about the entire tournament with floors having to be repeated is incorrect. A bot is entered, then it starts on floor one only; once floor one is finished, it goes to floor two if it scored highly on floor one, and that process continues until it hits a floor it struggles on. This also stops the problem of 1 species vs. 1 species scenarios cropping up. A floor only needs to be repeated once a bot is entered, which also allows all bots on that floor to be re-ranked. Maybe, to conserve CPU resources, we update the positions every 6 hours instead of after every tournament, just storing scores between those intervals.

Thanks for the support, Botsareus.

Panda · « **Reply #7 on:** April 10, 2015, 06:57:23 AM »

I like the idea of using the Elo system.

Quote from: Numsgil on April 09, 2015, 02:51:18 PM

There are possibly issues of Elo inflation to contend with, and of bots doing better/worse on different versions, but those are similar to the issues real Elo systems face, and we have the advantage that bots don't age or really change much over time (except for the issues around different versions).

Bots' strength shouldn't change over short periods of time, only over longer ones. Perhaps the concept of a season could be introduced (after an agreed number of versions, or periodically) to help prevent problems between versions. This might reduce the effects of deflation. However, I don't believe deflation is much of a problem, since the ratings would probably only be used to compare current bots rather than bots across versions, and a bot is never removed from the rankings and never stops playing (unless the rules of the tournament change).

We wouldn't have a problem with bots "avoiding" playing, either.

Would it be sensible to have a round robin tournament (or something similar) periodically for the top n bots to give a clear ranking?

Quote from: spike43884 on April 10, 2015, 06:30:10 AM

Elo rating I'm still against, for reasons that are very clear in some of the bots I wrote in DB2. Let's take a simple example of my two very iconic bots, Dizzy and All_hunter. All_hunter is superbly programmed and can take out most bots, even some of the top F1 bots when I've pitted it in 1 species vs. 1 species scenarios, but when it's pitted against Dizzy, it's obliterated. All_hunter spreads out and doesn't form colonies, which allows it to target food across the map and is a failsafe in case cannibots evolve in evolutionary simulations, and Dizzy just spins it. Hence the name Dizzy: it spins its prey. It's brilliant against any bot that tries to go solo, because it spins the prey so quickly that it actually breaks the eyes (some weird weakness in the code; in some sims they haven't even attacked a specific plant or bot, but the eyes go as if they've been spun?). Now, an Elo rating would probably put Dizzy above All_hunter, yet if we pitted Dizzy against a third bot of mine, RBM (Russian Babies Mobilised), it would be obliterated, and RBM is easily wiped out by All_hunter. So which one is better? You don't necessarily save much CPU cost compared with another type of league, or else you'd get really inaccurate readings.

You're forgetting that you're only really comparing these 3 bots to each other, where RBM > Dizzy > All_hunter (and All_hunter > RBM), and you aren't taking into account all of the other bots for any of them. With enough competitions, each bot should arrive at an appropriate rating. All_hunter's rating would be reduced slightly by Dizzy and (if it beats RBM) increased again. Elo rating doesn't work well when you're comparing 3 bots that can beat each other, but it works better when more bots can be compared.
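A quick way to see the three-bot problem is to run plain Elo updates (K = 32 assumed) on a pure cycle in which each bot always beats the next. The ratings stay close together instead of separating into an order; the sketch says nothing about how the same bots would rate once many other opponents are included.

```python
def expected_score(r_a, r_b):
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def elo_update(ratings, winner, loser, k=32.0):
    """Apply one decisive match result in place (k = 32 is assumed)."""
    delta = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
    ratings[winner] += delta
    ratings[loser] -= delta


# Pure cycle from the example: RBM > Dizzy > All_hunter > RBM.
cycle = [("RBM", "Dizzy"), ("Dizzy", "All_hunter"), ("All_hunter", "RBM")]
ratings = {"RBM": 1000.0, "Dizzy": 1000.0, "All_hunter": 1000.0}
for _ in range(1000):
    for winner, loser in cycle:
        elo_update(ratings, winner, loser)

spread = max(ratings.values()) - min(ratings.values())
print({bot: round(r) for bot, r in ratings.items()}, "spread:", round(spread))
```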

Peter · « **Reply #8 on:** April 10, 2015, 09:10:48 AM »

I assume everyone naming Elo means a statistical model rather than just Elo. From a quick search, Glicko, TrueSkill and Elo are the ones for which libraries are plentiful.

TrueSkill has the advantage that it can rate "free for all" fights, which may help newer bots rise quickly in the league table.

Also, how about multiple fight maps, with different sizes, shapes and physics? That means a bot needs to survive in different environments: an all-round bot.

A round-robin-like tournament for the top bots seems logical. Alternatively, requiring a higher confidence/reliability level for the top bots in the rating model might produce it naturally. I'm not sure it needs to happen often; once in place, positions would be rather static, which has the advantage that the round robin could be extended further down.

Panda · « **Reply #9 on:** April 10, 2015, 09:26:23 AM »

Quote from: Peter on April 10, 2015, 09:10:48 AM

I assume everyone naming Elo means a statistical model rather than just Elo. From a quick search, Glicko, TrueSkill and Elo are the ones for which libraries are plentiful.

TrueSkill has the advantage that it can rate "free for all" fights, which may help newer bots rise quickly in the league table.

I suppose I do. I was reading about a few of them earlier, and which statistical system to use would be a decision we'd have to make.

Quote from: Peter on April 10, 2015, 09:10:48 AM

Also, how about multiple fight maps, with different sizes, shapes and physics? That means a bot needs to survive in different environments: an all-round bot.

A specific league for an "all-round bot"? I think that's a brilliant idea, but I foresee problems with computation times.

Quote from: Peter on April 10, 2015, 09:10:48 AM

A round-robin-like tournament for the top bots seems logical. Alternatively, requiring a higher confidence/reliability level for the top bots in the rating model might produce it naturally. I'm not sure it needs to happen often; once in place, positions would be rather static, which has the advantage that the round robin could be extended further down.

Perhaps a system where, when a new bot reaches a certain rating (possibly above the worst bot in the "top league"), it is entered into the "top league", pitted against each of the bots, and then placed (or not placed) in the league accordingly? Unless that is what you were trying to say?

Peter · « **Reply #10 on:** April 10, 2015, 09:48:47 AM »

Quote from: Panda on April 10, 2015, 09:26:23 AM

A specific league for an "all-round bot"? I think that's a brilliant idea, but I foresee problems with computation times.

Not necessarily, the fights would just have another random attribute, the map and physics. It doesn't have to take more computation time.

I expect fights not to have a definite win/loss like in DB2, where rounds continue until you hit 95% confidence. Instead, let the rating system take care of it, so a fight is just one fight.

Quote from: Panda on April 10, 2015, 09:26:23 AM

Perhaps a system where, when a new bot reaches a certain rating (possibly above the worst bot in the "top league"), it is entered into the "top league", pitted against each of the bots, and then placed (or not placed) in the league accordingly? Unless that is what you were trying to say?

That's one possibility. You could also extend the "top league" by one more bot each time the ranking in the "top league" is rock solid, or even shrink it if multiple strong new bots appear.

Panda · « **Reply #11 on:** April 10, 2015, 11:38:27 AM »

Quote from: spike43884 on April 10, 2015, 06:30:10 AM

Your point about the entire tournament with floors having to be repeated is incorrect. A bot is entered, then it starts on floor one only; once floor one is finished, it goes to floor two if it scored highly on floor one, and that process continues until it hits a floor it struggles on. This also stops the problem of 1 species vs. 1 species scenarios cropping up. A floor only needs to be repeated once a bot is entered, which also allows all bots on that floor to be re-ranked. Maybe, to conserve CPU resources, we update the positions every 6 hours instead of after every tournament, just storing scores between those intervals.

Yeah, it would be 1v1, but one loss won't knock it out completely; it'll just lower its score a little (as it rightly should), and the score would be brought back up if that species obliterates many others.

I would err on the side of using a statistical system, since I trust the mathematics more than your reasoning (sorry).

Quote from: Peter on April 10, 2015, 09:48:47 AM

Quote from: Panda on April 10, 2015, 09:26:23 AM
A specific league for an "all-round bot"? I think that's a brilliant idea, but I foresee problems with computation times.
Not necessarily, the fights would just have another random attribute, the map and physics. It doesn't have to take more computation time.

So are you saying F1 should be "all-round", or would it be different?

Quote from: Peter on April 10, 2015, 09:48:47 AM

I expect fights not to have a definite win/loss like in DB2, where rounds continue until you hit 95% confidence. Instead, let the rating system take care of it, so a fight is just one fight.

Yeah, I agree that it will probably be the same and that the rating system will take care of it.

Peter · « **Reply #12 on:** April 10, 2015, 12:04:11 PM »

Quote from: Panda on April 10, 2015, 09:26:23 AM

Quote from: Peter on April 10, 2015, 09:48:47 AM
Quote from: Panda on April 10, 2015, 09:26:23 AM
A specific league for an "all-round bot"? I think that's a brilliant idea, but I foresee problems with computation times.
Not necessarily, the fights would just have another random attribute, the map and physics. It doesn't have to take more computation time.
So are you saying F1 should be "all-round", or would it be different?

Aye, as an F1 league. Although I like the idea of an all-round F1 league, it's one of those things that may seem good in theory but sucks in practice. Still, support for different league versions seems nice.

Numsgil · « **Reply #13 on:** April 10, 2015, 12:22:37 PM »

I've been reviewing the literature on how Elo, etc. work. I have some observations:

1. Because we can choose which bots fight which other bots, at the simplest we could choose a random sample (with repeats) of bots for a challenger to fight one round each against. From that we can get its win rate and a confidence bound on that win rate, so your global win rate is, say, 75% +/- 3%. And we can control what the worst-case +/- factor is by increasing the sample size. Also, as Peter pointed out, winning doesn't have to be binary. You could have an 80% win after 100k cycles because you control 80% of the biomass in the sim, so your overall win rate can factor that in pretty easily.
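A sketch of the sampling idea using only the standard library. The normal-approximation interval and the z = 1.96 (roughly 95%) default are choices made for illustration, and the worst-case bound uses the fact that a score confined to [0, 1] has variance at most 0.25.

```python
import math
import random
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def sample_opponents(league: Sequence[str], n: int, rng: random.Random) -> List[str]:
    """Random sample of opponents, with repeats; one round against each."""
    return rng.choices(league, k=n)


def win_rate_with_margin(scores: Sequence[float], z: float = 1.96) -> Tuple[float, float]:
    """Mean score and half-width of a normal-approximation confidence interval.

    Scores are in [0, 1]: 1 = win, 0 = loss, or a fraction such as the
    challenger's share of the biomass at the end of the round.
    """
    half_width = z * stdev(scores) / math.sqrt(len(scores))
    return mean(scores), half_width


def rounds_needed(margin: float, z: float = 1.96) -> int:
    """Worst-case rounds for a +/- margin (variance of a [0, 1] score <= 0.25)."""
    return math.ceil(z * z * 0.25 / margin ** 2)


print(rounds_needed(0.03))  # 1068
```

A real loop would call a hypothetical `play_round(challenger, opponent)` for each sampled opponent and pass the resulting scores to `win_rate_with_margin`; about a thousand rounds pins any challenger down to +/- 3% in the worst case.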

Your final win rate could be your rating, because it's basically an unbiased sample of the win rate you'd get if we ran an infinite number of rounds against all other bots, and it's ordered the same way the Elo ratings would be. The disadvantage is that the win rate will change as new bots are added to the league, so your rating is not constant over time. But every time a challenger is added, we only need to run the rounds necessary for it to get its global win rate percentage.

If we anchored a bot at a specific Elo (say the Animal Minimalis equivalent has 1000 Elo), we could probably figure out Elo ratings from the relative win percentages, I think. Something something math math.
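For a single anchor, the conversion is the Elo expected-score curve run backwards. This sketch assumes the challenger's win rate is measured head-to-head against the anchor bot (rated 1000, as in the example); turning a win rate against the whole league into a rating would take more than this:

```python
import math


def elo_from_win_rate(p: float, anchor_rating: float = 1000.0) -> float:
    """Rating implied by beating the anchor bot a fraction p of the time.

    Inverts the Elo expected-score curve E = 1 / (1 + 10**(-d / 400)),
    giving d = 400 * log10(p / (1 - p)). p must be strictly between 0 and 1.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("win rate must be strictly between 0 and 1")
    return anchor_rating + 400.0 * math.log10(p / (1.0 - p))
```

A 50% win rate lands exactly on the anchor, and beating the anchor 10 times out of 11 puts a bot 400 points above it.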

We'd have to rerun the leagues after every new version, though, as the win rates are pulled from old matches and no longer valid in that way. But that could form the seasons Panda was talking about.

2. There are N choose 2 ways to pair off N bots. If each pairing produces a probability that A wins over B (for pair (A, B)), call it P(A > B) (which is just A's win rate in the A-B match), we can take the inverse of the CDF of the standard normal distribution (call it Phi_Inv), get a system of N choose 2 equations, and solve it by least squares for Elo ratings. What I mean is: Phi_Inv(P(A > B)) = (s1 - s2) / (sqrt(2) Beta), where s1 and s2 are the Elo ratings of A and B respectively, and Beta is the square root of the variance in performance of A and B (assuming each bot has the same variance, which is a big assumption but makes the math easier). That's basically what Elo is trying to approximate, but we have the computing power to calculate it directly.
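The least-squares idea can be sketched directly. Assuming the table of pairwise win rates is complete (every pair has a measured rate), setting the gradient of the squared error to zero and requiring the ratings to sum to zero gives a closed form: each bot's rating is the sum of its target differences divided by N. This is a Thurstone-style paired-comparison (probit) fit; `beta` and the clamping constant `eps` are illustrative assumptions, and `statistics.NormalDist` needs Python 3.8+.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist


def thurstone_ratings(bots, p_win, beta=200.0, eps=1e-3):
    """Least-squares ratings from pairwise win rates, centred on zero.

    p_win[(a, b)] is the fraction of a-vs-b rounds that a won; each pair
    needs one entry, in either order. The closed form below only holds when
    every pair is present. beta is an arbitrary scale for illustration, and
    eps clamps rates of exactly 0 or 1, where Phi_Inv is infinite.
    """
    phi_inv = NormalDist().inv_cdf
    totals = {bot: 0.0 for bot in bots}
    for a, b in combinations(bots, 2):
        p = p_win[(a, b)] if (a, b) in p_win else 1.0 - p_win[(b, a)]
        p = min(max(p, eps), 1.0 - eps)
        target = sqrt(2.0) * beta * phi_inv(p)  # wanted value of s_a - s_b
        totals[a] += target
        totals[b] -= target
    # Zero gradient of sum((s_a - s_b - target)^2) plus sum(s) = 0
    # gives s_i = (sum of i's targets) / N.
    return {bot: totals[bot] / len(bots) for bot in bots}
```

Fed the Dizzy / All_hunter / RBM cycle with the same win rate on every edge, this returns equal ratings for all three, while a consistent ordering comes out ordered. An incomplete table would need a general least-squares solver instead of the closed form.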

I'm working through the articles on TrueSkill and Glickman's Glicko right now; they might be doing something more clever. At the very least they factor in confidence intervals. But I think that, because we can run exactly the rounds we want and no more, and can choose how the matches are picked, we can do a lot more global optimizing and a lot less incremental updating.

...

But yes, in principle I think the statistical approach is pretty compelling. I think that's the way to go for sure.

Numsgil · « **Reply #14 on:** April 10, 2015, 12:30:50 PM »

Oh, one more thought:

In rock-paper-scissors situations, where a (large) majority of players choose one strategy over another, the flat win rate I mentioned above would be artificially inflated for the minority strategy, I think. Would Elo have the same problem? I think not... I'd want to see some simulations, probably.
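The flat-win-rate concern can be probed with a toy population. This assumes idealised rock-paper-scissors bots whose results depend only on strategy, scoring 1 for a win, 0.5 for a draw and 0 for a loss against every other bot:

```python
from collections import Counter

BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def flat_win_rates(population):
    """Average score of each strategy against every *other* bot in the population.

    Win = 1, draw (same strategy) = 0.5, loss = 0. Assumes outcomes depend
    only on strategy vs strategy, like idealised rock-paper-scissors.
    """
    counts = Counter(population)
    others = len(population) - 1
    rates = {}
    for mine in counts:
        score = 0.0
        for theirs, n in counts.items():
            if theirs == mine:
                score += 0.5 * (n - 1)  # draws against the other bots of my kind
            elif BEATS[mine] == theirs:
                score += n
        rates[mine] = score / others
    return rates


print(flat_win_rates(["rock"] * 8 + ["paper", "scissors"]))
```

In this 8-rock mix, paper (a minority strategy that preys on the majority) gets a flat rate of about 0.89, scissors (the other minority strategy) about 0.11, and rock 0.5; in a balanced mix all three score 0.5. It does not answer the Elo question by itself.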

Darwinbots Forum
