Darwinbots Forum

Code center => Darwinbots3 => Topic started by: Panda on April 09, 2015, 02:02:51 PM

Title: DB3 Leagues
Post by: Panda on April 09, 2015, 02:02:51 PM
I assume DB3 will have the concept of leagues. There are a few things to consider when it comes to the leagues. For the most part, these are just musings on my part.
Title: Re: DB3 Leagues
Post by: Numsgil on April 09, 2015, 02:51:18 PM
When I first joined we were using a simple ladder tournament, i.e. you start at the bottom and have to challenge your way up the ladder one rung at a time.  Once you can't beat the guy ahead of you, that becomes your new place in the ladder and you basically stay there.  It gets less reasonable as there are more and more bots.

I think Bots favors something more like a round robin tournament (https://en.wikipedia.org/wiki/Round-robin_tournament), which is admittedly more fair but scales even worse as you add more bots.

Another option is to take a statistical approach and give bots an Elo rating (http://forum.darwinbots.com/index.php/topic,3371.msg1382241.html#msg1382241).  You wouldn't need to challenge every other bot to be declared the best; you'd just need to fight enough different matches for the system to converge, with a reasonable confidence level, to an Elo rating for your bot.  There are possibly issues of Elo inflation to contend with and bots doing better/worse on different versions, but those are similar to the issues real Elo systems face, and we have the advantage that bots don't age or really change much over time (except for the issues around different versions).  The number of rounds you need to play is considerably less than even the ladder system, which means it scales nicely.
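
For reference, a minimal sketch of the standard Elo update in Python. The 400-point scale and K-factor of 32 are the usual chess conventions and are assumptions here, not anything decided for DB3; the function names are made up for illustration.

```python
def expected_score(rating_a, rating_b):
    """Score (between 0 and 1) that Elo expects A to get against B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a, rating_b, score_a, k=32):
    """Return both new ratings after one match; score_a is 1, 0.5 or 0."""
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# A 1000-rated challenger gains more for beating a 1200-rated bot
# than for beating an equally rated one.
upset = update(1000, 1200, 1.0)
even = update(1000, 1000, 1.0)
print(upset, even)
```

Because the winner's gain equals the loser's loss, the total rating in the pool stays constant, which is where the inflation/deflation questions come from once bots enter or leave.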
Title: Re: DB3 Leagues
Post by: Botsareus on April 09, 2015, 03:23:56 PM
Actually, I think I did exactly as described. The challenger is ranked when it stops winning. It is not a requirement to fight everyone.
Title: Re: DB3 Leagues
Post by: spike43884 on April 09, 2015, 03:34:35 PM
What about a "Multi-Floor League" of some form?
Multi-Floor is a concept halfway between stepladder and round-robin.
It puts already ranked bots into groups of size N, say 9. The new bot goes against this group and gets a ranking there; then, if it's in the top, let's say, 3 of that tournament, it goes on to the next tournament (generally the floors overlap by one bot, so the top bot from the floor below goes up one).

It won't give an instantaneous result, of course; it's slightly slower than round robin, but it'd work. Plus you can have multiple bots entering at once, because you can run one competitor per floor. The bot's positions are averaged out and then it's given a specific placing. So if it gets No. 1 on all of the floors it receives first place, but if it's No. 2 on all of the floors except the final one, it still gets through as it's in the top 3, though it only gets 2nd place, or possibly first. This is decided by each floor position giving X points, so 1st place on a floor gets 10 points and last gets 1 point. Any identical scores are broken in favour of the oldest bot (as it'd have won the most competitions). This also means that if an update comes along, scores can be refreshed to their latest values, so all bots have correct positions.

Overall it's still a bit intensive, but not as intensive as round robin. It also overcomes the Elo rating problem that a bot might fare better against other bots, as it challenges all currently ranked bots. You can then have a check at submission which quickly scans whether any of the ACTIVE DNA (as we're not doing mutations in leagues yet) is already in that league, and whether any of the ACTIVE DNA violates the rules. You can also delete INACTIVE DNA, because we don't have mutations in the leagues, noticeably speeding up the sims.
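
A rough Python sketch of the points idea above, under some assumptions that are mine rather than part of the proposal: a floor holds 10 bots (9 ranked plus the challenger), points fall off linearly from 10 for first to 1 for last, and "oldest" is represented by a lower entry number. All names are hypothetical.

```python
def floor_points(placing, floor_size=10, top=10, bottom=1):
    """Points for finishing at `placing` (1 = best) on one floor."""
    step = (top - bottom) / (floor_size - 1)
    return top - (placing - 1) * step


def overall_score(placings):
    """Average points over every floor the bot competed on."""
    return sum(floor_points(p) for p in placings) / len(placings)


# name -> (entry number, placing on each floor); lower entry number = older bot.
bots = {"A": (1, [1, 1, 1]), "B": (2, [2, 2, 2]), "C": (3, [1, 1, 1])}

# Highest average first; identical scores go to the older bot.
table = sorted(bots, key=lambda name: (-overall_score(bots[name][1]), bots[name][0]))
print(table)
```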
Title: Re: DB3 Leagues
Post by: Botsareus on April 09, 2015, 03:39:27 PM
Cool, that is like my tournament mode mixed with stepladder.
Title: Re: DB3 Leagues
Post by: Peter on April 09, 2015, 05:40:31 PM
A tournament like that would have to repeat itself every time. That would drain CPU resources.

I like some parts of how it was done in the last Google AI Challenge.

They had an Elo rating.
New versions were treated like new bots, i.e. elo rating is reset.
There was a (simple) test to be allowed at all to the league.
It was automatic.
Title: Re: DB3 Leagues
Post by: spike43884 on April 10, 2015, 06:30:10 AM
Quote
A tournament like that would have to repeat itself every time. That would drain CPU resources.

I like some parts of how it was done in the last Google AI Challenge.

They had an Elo rating.
New versions were treated like new bots, i.e. elo rating is reset.
There was a (simple) test to be allowed at all to the league.
It was automatic.

Automatic is covered by floors.
New versions treated like new bots: not too complicated, you just need to check for 100% duplicate DNA.
Simple test to be allowed into the leagues at all: as I said, check the active DNA for league violations.
I'm still against Elo rating, for reasons that are very clear in some of the bots I wrote in DB2. Let's take a simple example of my 2 very iconic bots, Dizzy and All_hunter. All_hunter is superbly programmed and can take out most bots, even some of the top F1 bots when I've pitted it in 1 species v 1 species scenarios. But when it's pitted against Dizzy, it's obliterated, because All_hunter spreads out and doesn't form colonies (which allows it to target food across the map, and is a failsafe against cannibots evolving in evolutionary simulations), and Dizzy just spins it. Hence the name Dizzy: it spins its prey. It's brilliant against any bot that tries to go solo, because it spins the prey so quickly that it actually breaks the eyes (some weird weakness in the code; in some sims they haven't even attacked a specific plant or bot but its eyes go as if it's been spun?). Now, an Elo rating would probably put Dizzy above All_hunter, yet if we pitted Dizzy against a 3rd bot of mine, RBM (russian babies mobilised), it'd be obliterated, and yet RBM is easily wiped out by All_hunter. So which one is better? You don't necessarily save much more CPU cost than if you did another type of league, or you'd get really inaccurate readings...

Your point about the entire tournament with floors having to be repeated is incorrect. A bot is entered, then it starts on floor one only. Once floor one is finished it goes to floor 2 if it scored highly on floor one, and that process continues until it hits a floor which it struggles on. This stops the problem of 1 species v 1 species scenarios cropping up as well. A floor only needs to be repeated once a bot is entered on it, which also allows all bots on that floor to be re-ranked. Maybe, to conserve CPU resources, we update the positions every 6 hours instead of every tournament, just storing scores between those intervals...


Thanks for the support botsareus :D
Title: Re: DB3 Leagues
Post by: Panda on April 10, 2015, 06:57:23 AM
I like the idea of using the Elo system.

Quote
There are possibly issues of Elo inflation to contend with and bots doing better/worse on different versions, but those are similar to the issues real Elo systems face, and we have the advantage that bots don't age or really change much over time (except for the issues around different versions).

The bots shouldn't be powered differently over shorter periods of time, only over a longer period of time. Perhaps the concept of a season could be introduced (after an agreed number of versions, or periodically) to help prevent problems between versions. This might reduce the effects of deflation. However, I don't believe that deflation is too much of a problem, since the ratings would probably only be used to compare current bots rather than bots between versions, and since a bot is never removed from the rankings and never stops playing (unless the rules of the tournament are changed).

We wouldn't have a problem with bots "avoiding" playing, either.

Would it be sensible to have a round robin tournament (or something similar) periodically for the top n bots to give a clear ranking?

Quote
I'm still against Elo rating, for reasons that are very clear in some of the bots I wrote in DB2. Let's take a simple example of my 2 very iconic bots, Dizzy and All_hunter. All_hunter is superbly programmed and can take out most bots, even some of the top F1 bots when I've pitted it in 1 species v 1 species scenarios. But when it's pitted against Dizzy, it's obliterated, because All_hunter spreads out and doesn't form colonies (which allows it to target food across the map, and is a failsafe against cannibots evolving in evolutionary simulations), and Dizzy just spins it. Hence the name Dizzy: it spins its prey. It's brilliant against any bot that tries to go solo, because it spins the prey so quickly that it actually breaks the eyes (some weird weakness in the code; in some sims they haven't even attacked a specific plant or bot but its eyes go as if it's been spun?). Now, an Elo rating would probably put Dizzy above All_hunter, yet if we pitted Dizzy against a 3rd bot of mine, RBM (russian babies mobilised), it'd be obliterated, and yet RBM is easily wiped out by All_hunter. So which one is better? You don't necessarily save much more CPU cost than if you did another type of league, or you'd get really inaccurate readings...

You're forgetting that you're only really comparing these 3 bots with each other, where RBM > Dizzy > All_hunter (RBM ? All_hunter), and you aren't taking into account all of the other bots each of them plays. With enough competitions, each bot should arrive at an appropriate rating. All_hunter's rating would be reduced slightly by Dizzy and (if it beats RBM) its Elo would then be increased. Elo rating doesn't work well when you're only comparing 3 bots that can beat each other, but works better when more can be compared.
Title: Re: DB3 Leagues
Post by: Peter on April 10, 2015, 09:10:48 AM
I assume everyone naming Elo means a statistical model in general rather than just Elo. From a quick search, Glicko, TrueSkill and Elo are the ones for which libraries are readily available.

TrueSkill has the advantage that it can rate "free for all" fights, which may be handy for letting newer bots rise quickly up the league table.

Also, how about multiple fight maps, with different sizes, shapes and physics? That means a bot needs to survive in different environments: an all-round bot.

A round-robin-like for the top bots seems logical. Alternatively, enforcing a higher confidence/reliability level in the rating model for the top bots might create it naturally. I'm not sure it needs to happen often; once in place, positions would be rather static, which has the advantage that the round robin could be extended further down.
Title: Re: DB3 Leagues
Post by: Panda on April 10, 2015, 09:26:23 AM
Quote
I assume everyone naming Elo means a statistical model in general rather than just Elo. From a quick search, Glicko, TrueSkill and Elo are the ones for which libraries are readily available.

TrueSkill has the advantage that it can rate "free for all" fights, which may be handy for letting newer bots rise quickly up the league table.

I suppose I do. I was reading about a few of them earlier, and which statistical system we use would be a decision we'd have to make.

Quote
Also, how about multiple fight maps, with different sizes, shapes and physics? That means a bot needs to survive in different environments: an all-round bot.

A specific league for an "all round bot"? I think that's a brilliant idea but I foresee problems with computation times.

Quote
A round-robin-like for the top bots seems logical. Alternatively, enforcing a higher confidence/reliability level in the rating model for the top bots might create it naturally. I'm not sure it needs to happen often; once in place, positions would be rather static, which has the advantage that the round robin could be extended further down.

Perhaps a system where, when a new bot reaches a certain rating (possibly above the worst bot in the "top league"), it can be entered into the "top league" and pitted against each one of the bots and then placed (or not placed) in the league accordingly? Unless that is what you were trying to say?
Title: Re: DB3 Leagues
Post by: Peter on April 10, 2015, 09:48:47 AM
Quote
A specific league for an "all round bot"? I think that's a brilliant idea but I foresee problems with computation times.

Not necessarily; the fights would just have another random attribute, the map and physics. It doesn't have to take more computation time.

I expect fights not to have a definite win/lose like in DB2, where they continue until you hit 95% confidence. Instead, let the rating system take care of it, so a fight is just one fight.

Quote
Perhaps a system where, when a new bot reaches a certain rating (possibly above the worst bot in the "top league"), it can be entered into the "top league" and pitted against each one of the bots and then placed (or not placed) in the league accordingly? Unless that is what you were trying to say?

That's one possibility. You could also extend the "top league" by one more bot each time the ranking in the "top league" is rock solid, or even shrink it if multiple new strong bots appear.
Title: Re: DB3 Leagues
Post by: Panda on April 10, 2015, 11:38:27 AM
Quote
Your point about the entire tournament with floors having to be repeated is incorrect. A bot is entered, then it starts on floor one only. Once floor one is finished it goes to floor 2 if it scored highly on floor one, and that process continues until it hits a floor which it struggles on. This stops the problem of 1 species v 1 species scenarios cropping up as well. A floor only needs to be repeated once a bot is entered on it, which also allows all bots on that floor to be re-ranked. Maybe, to conserve CPU resources, we update the positions every 6 hours instead of every tournament, just storing scores between those intervals...

Yeah, it would be 1v1, but one loss won't mean a bot is knocked out completely; it'll just lower its score a little (as it rightly should), and the score would be brought back up if that one species obliterates many others.

I would err on the side of using a statistical system, since I trust the mathematics rather than your reasoning (sorry :().

Quote
A specific league for an "all round bot"? I think that's a brilliant idea but I foresee problems with computation times.
Not necessarily; the fights would just have another random attribute, the map and physics. It doesn't have to take more computation time.

So are you saying F1 should be "all round", or would it be a different league?

Quote
I expect fights not to have a definite win/lose like in DB2, where they continue until you hit 95% confidence. Instead, let the rating system take care of it, so a fight is just one fight.

Yeah, I agree that it will probably be the same and that the rating system will take care of it.
Title: Re: DB3 Leagues
Post by: Peter on April 10, 2015, 12:04:11 PM
Quote
A specific league for an "all round bot"? I think that's a brilliant idea but I foresee problems with computation times.
Not necessarily; the fights would just have another random attribute, the map and physics. It doesn't have to take more computation time.
So are you saying F1 should be "all round", or would it be a different league?

Aye, as an F1 league. Yet, although I like the idea of having an all-round F1 league, it's one of those things that may seem good in theory but sucks in practice. But support for different league versions seems nice.
Title: Re: DB3 Leagues
Post by: Numsgil on April 10, 2015, 12:22:37 PM
I've been reviewing the literature on how elo, etc. work.  I have some observations:

1.  Because we can choose which bots fight which other bots, at the simplest we could choose a random sample (with repeats) of bots for a challenger to fight a single round with.  From that we can get its win rate, and a confidence bound on that win rate.  So your global win rate is 75% +/- 3%, say.  And we can control what the worst case +/- factor is by increasing the sample size. Also as Peter pointed out, winning doesn't have to be binary.  You could have an 80% win after 100k cycles because you control 80% of the biomass in the sim, so your overall winrate can factor that in pretty easily. 

Your final win rate could be your rating, because it's basically an unbiased sample of your actual win rate if we ran an infinite number of rounds against all other bots, and that's ordered the same way that the Elo ratings would be.  The disadvantage here is that the win rate will change over time as new bots are added to the league, so your rating is not constant over time.  But every time a challenger is added we only need to run the rounds necessary for it to get its global win rate percentage.

If we anchored a bot with a specific elo (the animal minimalis equivalent has say 1000 elo) we could probably figure out elo ratings from the relative win percentages, I think.  Something something math math.

We'd have to rerun the leagues after every new version, though, as the win rates are pulled from old matches and no longer valid in that way.  But that could form the seasons Panda was talking about.
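
To make point 1 concrete, here is a small Python sketch of a win rate with a +/- bound, using the normal approximation for the 95% interval. The function names are made up for illustration, and treating each match as an independent score between 0 and 1 (so 0.8 could stand for "holds 80% of the biomass") is an assumption.

```python
import math
import random


def win_rate_with_margin(scores, z=1.96):
    """Mean score and approximate 95% margin from independent match scores."""
    n = len(scores)
    mean = sum(scores) / n
    variance = sum((s - mean) ** 2 for s in scores) / (n - 1)
    return mean, z * math.sqrt(variance / n)


def matches_needed(margin, z=1.96):
    """Worst-case sample size for the win rate to land within +/- margin.

    Scores in [0, 1] have variance of at most 0.25, reached at a 50% win rate.
    """
    return math.ceil((z / margin) ** 2 * 0.25)


# Simulated challenger that wins about 75% of its randomly sampled matches.
random.seed(1)
scores = [1.0 if random.random() < 0.75 else 0.0 for _ in range(800)]
rate, margin = win_rate_with_margin(scores)
print(round(rate, 3), round(margin, 3), matches_needed(0.03))
```

The worst-case figure is what lets you fix the +/- factor in advance: pick the margin you want and that bounds the number of rounds a challenger has to play.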

2.  There are N choose 2 ways to pair off N bots.  If each pairing produces a probability that A wins over B (for pair (A,B)), call it P(A > B) (which is just A's win rate for the A-B match), we can take the inverse of the CDF of the unit normal distribution (call it Phi_Inv) and get a system of N choose 2 equations, and least-squares solve it for Elo ratings.  What I mean is: Phi_Inv(P(A > B)) = (s1 - s2) / (sqrt(2) * Beta), where s1 and s2 are the Elo ratings of A and B respectively, and Beta is the sqrt of the variance in performance of A and B (assuming each bot has the same variance, which is a big assumption but makes the math easier).  That's basically what Elo is trying to approximate.  But we have the computing power to calculate it directly.
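
A sketch of that least-squares solve in Python, standard library only. It assumes every pair has been measured, in which case the least-squares solution (with the ratings centred on zero) is just each bot's average target gap to the others; a sparse set of pairings would need a general solver instead. Beta = 200, the clamping of 0%/100% records, and the anchor at 1000 are illustrative assumptions.

```python
import math
from statistics import NormalDist


def solve_ratings(win_prob, beta=200.0, anchor=0, anchor_rating=1000.0):
    """Least-squares ratings from a full table of pairwise win rates.

    win_prob[i][j] is the measured rate at which bot i beats bot j.
    """
    n = len(win_prob)
    phi_inv = NormalDist().inv_cdf
    scale = math.sqrt(2.0) * beta

    def gap(i, j):
        # Target rating difference s_i - s_j; clamp so 0% or 100% stays finite.
        p = min(max(win_prob[i][j], 0.001), 0.999)
        return scale * phi_inv(p)

    # Complete pairings: the normal equations reduce to an average of gaps.
    centred = [sum(gap(i, j) for j in range(n) if j != i) / n for i in range(n)]
    shift = anchor_rating - centred[anchor]
    return [s + shift for s in centred]


# Three bots; bot 0 is the anchor (the Animal Minimalis equivalent).
table = [
    [0.50, 0.30, 0.10],
    [0.70, 0.50, 0.25],
    [0.90, 0.75, 0.50],
]
ratings = solve_ratings(table)
print([round(s) for s in ratings])
```

Only rating differences are determined by the equations, which is why one bot has to be pinned to a fixed value.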

I'm working through the articles on TrueSkill and Glickman right now; they might be doing something more clever.  At the very least they factor in confidence intervals.  But I think, because we can run exactly the rounds we want to and no more, and can choose how the matches are chosen, we can do a lot more global optimizing and a lot less incremental updating.

...

But yes, in principle I think the statistical approach is pretty compelling.  I think that's the way to go for sure.
Title: Re: DB3 Leagues
Post by: Numsgil on April 10, 2015, 12:30:50 PM
Oh, one more thought:

In situations of rock-paper-scissors, where there's a (large) margin of players choosing one strategy over another, the flat win rate I mentioned above would artificially inflate for the minority strategy, I think.  Would elo have that same problem?  I think not...  I'd want to see some simulations, probably.
Title: Re: DB3 Leagues
Post by: Panda on April 10, 2015, 02:35:58 PM
Quote
I'm working through the articles on TrueSkill and Glickman right now; they might be doing something more clever.  At the very least they factor in confidence intervals.  But I think, because we can run exactly the rounds we want to and no more, and can choose how the matches are chosen, we can do a lot more global optimizing and a lot less incremental updating.

The confidence decreases over time? This idea could be used to ignore differences between versions, so there would be no reruns, and bots with a low confidence could be prioritised for a rerun?

Quote
In situations of rock-paper-scissors, where there's a (large) margin of players choosing one strategy over another, the flat win rate I mentioned above would artificially inflate for the minority strategy, I think.  Would elo have that same problem?  I think not...  I'd want to see some simulations, probably.

I can't work out your reasoning for this (don't judge my lack of statistical knowledge).
Title: Re: DB3 Leagues
Post by: Numsgil on April 10, 2015, 05:49:25 PM
Quote
I'm working through the articles on TrueSkill and Glickman right now; they might be doing something more clever.  At the very least they factor in confidence intervals.  But I think, because we can run exactly the rounds we want to and no more, and can choose how the matches are chosen, we can do a lot more global optimizing and a lot less incremental updating.
The confidence decreases over time? This idea could be used to ignore differences between versions so no reruns and bots with a low confidence could be prioritised for a rerun?

Confidence is a matter of how many games someone's played.  If they've played 2 games and beaten a grandmaster, that doesn't necessarily mean they are a grandmaster.  It might be that they got very, very lucky.  Elo handles it by handwaving if you have fewer than 20 games.  But I think TrueSkill handles it explicitly.

For new versions (of DB3) we'd probably want to wipe the slate clean and redo everything.  If you perform differently in one version (of DB3) over the next, it means the stats we gathered on you in one version are moot in the version we're currently running on.

For new versions of the bot, you could take the past performance as the starting values into some Bayesian stats stuff, I think.  A bit beyond me at the moment, but something I'm studying.

Quote
In situations of rock-paper-scissors, where there's a (large) margin of players choosing one strategy over another, the flat win rate I mentioned above would artificially inflate for the minority strategy, I think.  Would elo have that same problem?  I think not...  I'd want to see some simulations, probably.
I can't work out your reasoning for this (don't judge my lack of statistical knowledge).

Suppose you have three types of players: one always plays rock, one always plays scissors, and one always plays paper.  Suppose the ratios between them are 1:1:1.  I'd expect everyone's win percentages and Elo rating to be roughly the same, since everyone wins a third of their games and ties a third of their games.  Now suppose the ratios are something like 2:1:1.  There are more rock players, so the paper players' win rate will be higher than the scissors players' win rate (paper players win half their games and tie a quarter; scissors players win a quarter and tie a quarter), even though scissors always wins against paper.  Which is an odd artifact of using straight win rates.
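
The 2:1:1 arithmetic can be checked with a few lines of Python. This is only a sketch: it assumes opponents are drawn in proportion to the population shares (a pool large enough that meeting your own type is as likely as its share), and the names are illustrative.

```python
from fractions import Fraction

# Which strategy each strategy beats.
BEATS = {"rock": "scissors", "scissors": "paper", "paper": "rock"}


def outcome_rates(population):
    """Win and tie rate per strategy when opponents follow the population mix."""
    total = sum(population.values())
    rates = {}
    for strategy in population:
        win = Fraction(population[BEATS[strategy]], total)
        tie = Fraction(population[strategy], total)
        rates[strategy] = (win, tie)
    return rates


rates = outcome_rates({"rock": 2, "paper": 1, "scissors": 1})
for strategy, (win, tie) in rates.items():
    print(strategy, "wins", win, "ties", tie)
```

Paper ends up with twice the win rate of scissors purely because of how many rock players there are to feed on.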

I'm not sure but I think Elo might handle that case better.  But if it doesn't, it means I might lean towards using the straight win rates because there's no math involved in that.  But I'd have to play with it to know for sure.
Title: Re: DB3 Leagues
Post by: spike43884 on April 11, 2015, 08:21:14 AM
Just to throw this in here, because a major limiting factor is the CPU time we can dedicate to the leagues:
What about running the league finale every week, or every other week? It's not too long a wait, but it means we're not constantly dedicating resources to it. Maybe have it as a round robin of 15 different bots (maybe 7 already ranked top bots, with the rest being the entrants or challengers). Then we use bots spaced along the full ranking table as 'checkpoints', so it's 2 already ranked species and the challenger, and those run slightly more often (maybe daily). Each species in each battle then gets a rating out of 100 depending on how it fared throughout the simulation (I'd love ranking to take place maybe every quarter of the simulation, so that a bot that is doing really well until the very last few cycles still has some chance of a better rating). Also, I'd like to see some slight variation from one league to the next: maybe occasionally 1 or 2 shapes, or a slightly larger simulation, or a tiny bit more friction, as we have slight variations in the real-life environment?
Title: Re: DB3 Leagues
Post by: Numsgil on April 11, 2015, 01:21:14 PM
Here's a quick stab at some python to simulate some rock-paper-scissors type matches with asymmetric numbers of players playing each type.  Right now winrates are high for players with strategies that happen to beat the dominant strategy, as you'd expect.  Later I'm going to try and add in elo and see what it would do.  I found some python packages for it, but I'm feeling too lazy right now to figure it out.

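Code:
import numpy
# import skills  # rating packages for the planned Elo experiment; unused so
# import elo     # far, so commented out to keep the script runnable as-is

# Payoff of the row type against the column type: 1 win, -1 loss, 0 tie.
# (Rows reordered so the labels follow the real rules: rock beats scissors.)
wintable = numpy.array([
    [ 0, -1,  1],
    [ 1,  0, -1],
    [-1,  1,  0],
])

typenames = ["rock    ", "paper   ", "scissors"]

def random_type():
    # Draw 0..3 and fold 3 into 2, so scissors is twice as common (1:1:2).
    return min(numpy.random.randint(0, 4), 2)

print('type, "score", \t \t winrate')
for player in range(10):
    player_type = random_type()
    score = 0
    matches = 0
    wins = 0

    for match in range(10000):
        opponent_type = random_type()
        score = score + wintable[player_type, opponent_type]
        matches = matches + 1
        wins = wins + (1 if wintable[player_type, opponent_type] > 0 else 0)

    # True division: needs Python 3 (Python 2 truncates wins/matches to 0).
    print(typenames[player_type], (score / matches), "\t", (wins / matches))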

Title: Re: DB3 Leagues
Post by: Peter on April 11, 2015, 01:36:30 PM
Quote
I'm not sure but I think Elo might handle that case better.  But if it doesn't, it means I might lean towards using the straight win rates because there's no math involved in that.  But I'd have to play with it to know for sure.

A big advantage of Elo and the like over win rates is that wins against higher ranked players are valued more. With win rates, if you beat the #2 and lost to the #1, you're mediocre. If you beat 2 bots that can't even survive, you're amazing!

Edit: the code has a mistake. It's rounding wins/matches to an int, getting zero.
Title: Re: DB3 Leagues
Post by: Numsgil on April 11, 2015, 02:38:35 PM
Quote
I'm not sure but I think Elo might handle that case better.  But if it doesn't, it means I might lean towards using the straight win rates because there's no math involved in that.  But I'd have to play with it to know for sure.
A big advantage of Elo and the like over win rates is that wins against higher ranked players are valued more. With win rates, if you beat the #2 and lost to the #1, you're mediocre. If you beat 2 bots that can't even survive, you're amazing!

And that's really important if the matches you're given information on aren't a random sample.  But I think if the matches you play are randomly chosen from the space of all possible matches, the win rate you get will be representative of your global win rate, which should correspond to elo, or at least have the same relative ordering, assuming sample sizes are big enough for either.

But I'm of two minds about it w.r.t. rock-paper-scissors situations.

Quote
Edit: the code has a mistake. It's rounding wins/matches to an int, getting zero.

That's odd, it works on my machine.  Which version of Python are you running?  I'm using 3.4 I think.
Title: Re: DB3 Leagues
Post by: Peter on April 11, 2015, 02:59:37 PM
Python 2; apparently it's still the default that starts up on my machine. Good to see they fixed it in Python 3.

I don't think opponents should be randomly chosen from all bots. I'd like the upper bots to fight each other enough to give a good league table. I think fights should be randomly picked between nearby positions: if a bot is ranked #100, its opponent may be between #50 and #150. Fights are more even and will tell us more, pinpointing the strength more clearly.
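
A minimal Python sketch of that kind of windowed matchmaking. The window of 50 places comes from the #50 to #150 example; clipping the window at the top and bottom of the table is an assumption, as are the function and parameter names.

```python
import random


def pick_opponent(rank, league_size, window=50, rng=random):
    """Pick a random opponent rank within `window` places of `rank` (1-based)."""
    low = max(1, rank - window)
    high = min(league_size, rank + window)
    candidates = [r for r in range(low, high + 1) if r != rank]
    return rng.choice(candidates)


rng = random.Random(0)
print([pick_opponent(100, 500, rng=rng) for _ in range(5)])
```

Near the top of the table the window is one-sided, so the top bots end up fighting each other more often, which is the effect asked for here.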

Assume you've got a large number of weak bots. A decent (but not great) bot can get a high win percentage with some luck. That can be compensated for eventually with more random fights, but an Elo-like system wouldn't give it that advantage in the first place.
Title: Re: DB3 Leagues
Post by: Panda on April 12, 2015, 07:25:41 AM
Quote
Here's a quick stab at some python to simulate some rock-paper-scissors type matches with asymmetric numbers of players playing each type.  Right now winrates are high for players with strategies that happen to beat the dominant strategy, as you'd expect.  Later I'm going to try and add in elo and see what it would do.  I found some python packages for it, but I'm feeling too lazy right now to figure it out.

Code:
import numpy
# import skills  # rating packages for the planned Elo experiment; unused so
# import elo     # far, so commented out to keep the script runnable as-is

# Payoff of the row type against the column type: 1 win, -1 loss, 0 tie.
# (Rows reordered so the labels follow the real rules: rock beats scissors.)
wintable = numpy.array([
    [ 0, -1,  1],
    [ 1,  0, -1],
    [-1,  1,  0],
])

typenames = ["rock    ", "paper   ", "scissors"]

def random_type():
    # Draw 0..3 and fold 3 into 2, so scissors is twice as common (1:1:2).
    return min(numpy.random.randint(0, 4), 2)

print('type, "score", \t \t winrate')
for player in range(10):
    player_type = random_type()
    score = 0
    matches = 0
    wins = 0

    for match in range(10000):
        opponent_type = random_type()
        score = score + wintable[player_type, opponent_type]
        matches = matches + 1
        wins = wins + (1 if wintable[player_type, opponent_type] > 0 else 0)

    # True division: needs Python 3 (Python 2 truncates wins/matches to 0).
    print(typenames[player_type], (score / matches), "\t", (wins / matches))

Is this 1 player vs 4 players (including the original player) pitted against each other randomly 10000 times, or 1 player pitted against 10000 players? Just trying to work out how you'd do the Elo in the second situation.
Title: Re: DB3 Leagues
Post by: spike43884 on April 12, 2015, 07:34:47 AM
Just to point out: did anyone actually read my point about only repeating leagues on a certain timescale? We know where most of our users come from by the language they speak, so run the leagues when they're offline.
Title: Re: DB3 Leagues
Post by: Peter on April 12, 2015, 10:41:11 AM
Quote
Is this 1 player vs 4 players (including the original player) pitted against each other randomly 10000 times, or 1 player pitted against 10000 players? Just trying to work out how you'd do the Elo in the second situation.

The second. You can throw nearly all of it away if you want to do Elo.  :P
Title: Re: DB3 Leagues
Post by: Panda on April 12, 2015, 10:58:46 AM
We don't know the elo of each of the 10000 players when we're trying to simulate that, unless we just assume they're all new?
Title: Re: DB3 Leagues
Post by: Peter on April 12, 2015, 11:10:52 AM
There's need of some boilerplate code to keep track of all players.

Btw, I'm curious how well Elo/TrueSkill/Glicko/win% compare when matches are picked at random.
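
As a starting point, here is a rough Python sketch of the bookkeeping: a fixed pool of rock/paper/scissors players who all start as new players at 1000, with randomly picked pairs and a plain Elo update after each match. The pool sizes, K = 16, the 400-point scale and all names are assumptions for illustration; it is not meant as the league code.

```python
import random

BEATS = {"rock": "scissors", "scissors": "paper", "paper": "rock"}


def play(a, b):
    """Score for strategy a against strategy b: 1 win, 0.5 tie, 0 loss."""
    if a == b:
        return 0.5
    return 1.0 if BEATS[a] == b else 0.0


def simulate(counts, matches=20000, k=16, seed=0):
    rng = random.Random(seed)
    players = [s for s, n in counts.items() for _ in range(n)]
    ratings = [1000.0] * len(players)  # everyone starts as a new player
    for _ in range(matches):
        i, j = rng.sample(range(len(players)), 2)  # random pairing
        expected = 1.0 / (1.0 + 10 ** ((ratings[j] - ratings[i]) / 400.0))
        delta = k * (play(players[i], players[j]) - expected)
        ratings[i] += delta
        ratings[j] -= delta
    # Average rating per strategy, to compare against the flat win rates.
    means = {}
    for s in counts:
        group = [r for p, r in zip(players, ratings) if p == s]
        means[s] = sum(group) / len(group)
    return means, ratings


means, ratings = simulate({"rock": 10, "paper": 10, "scissors": 20})
print({s: round(m) for s, m in means.items()})
```

Swapping the random pairing for another matchmaking rule, or the Elo update for a TrueSkill/Glicko library call, would be the way to compare the systems side by side.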
Title: Re: DB3 Leagues
Post by: Panda on April 12, 2015, 11:51:14 AM
Yeah, there is boilerplate code for it but I'm just trying to work out how to do it.

Each of these systems is designed so that matches are picked at random, isn't it? You basically want to take a small sample that represents the global population.
Title: Re: DB3 Leagues
Post by: Peter on April 12, 2015, 01:10:18 PM
Quote
Each of these systems is designed so that matches are picked at random, isn't it? You basically want to take a small sample that represents the global population.

They're not. Otherwise a chess grandmaster would have to play low ranked players often. They're designed to calculate the right strength of players, including how to deal with matches not being a random sample.

Edit: I was playing around with TrueSkill. And the library I took even has issues with high vs. low ranked players. Calculating skill as NaN in some cases...

edit2:
Code:
using System;
using System.Collections.Generic;
using Moserware.Skills;
using System.Diagnostics;

namespace TrueSkillTest
{
    class Program
    {
        static void Main(string[] args)
        {
            GameInfo defaultInfo = GameInfo.DefaultGameInfo;
            List<BotPlayer> botPlayers = new List<BotPlayer>();
           
            for (int i = 0; i < 10; i++)
            {
                var newb = new BotPlayer(i);
                newb.rating = defaultInfo.DefaultRating;
                botPlayers.Add(newb);

            }
            var random = new Random(1234);
            for (int i = 0; i < 250; i++)
            {
                int id1 = random.Next(10);
                int id2 = random.Next(10);
                if (id1 == id2)
                    continue;
                // Order the pair so that the lower id is always the winner.
                if (id1 > id2)
                {
                    int switchId = id1;
                    id1 = id2;
                    id2 = switchId;
                }

                var team1 = new Team(botPlayers[id1], botPlayers[id1].rating);
                var team2 = new Team(botPlayers[id2], botPlayers[id2].rating);
                var teams = Teams.Concat(team1, team2);

                // Ranks 1 and 2: team1 (the lower id) finishes first.
                var results = TrueSkillCalculator.CalculateNewRatings(defaultInfo, teams, 1, 2);
                if (double.IsNaN(results[botPlayers[id2]].ConservativeRating) || double.IsNaN(results[botPlayers[id1]].ConservativeRating))
                {
                    // Skip the match instead of storing a broken rating.
                    Debug.WriteLine("NaN happened  " + id1 + " " + id2);
                    continue;
                }
                botPlayers[id2].rating = results[botPlayers[id2]];
                botPlayers[id1].rating = results[botPlayers[id1]];
                botPlayers[id1].wins++;
                botPlayers[id2].losses++;
                botPlayers[id1].games++;
                botPlayers[id2].games++;
            }

            Debug.WriteLine( "id \t ConservativeRating  \t  Mean \t\t  StandardDeviation  \t win% ");

            for (int i = 0; i < 10; i++)
            {
                var rat = botPlayers[i].rating;
                Debug.WriteLine(i + "\t" + rat.ConservativeRating + "\t" + rat.Mean + "\t" + rat.StandardDeviation + "\t" + ((float)botPlayers[i].wins / (float) botPlayers[i].games)*100 +"% ");
            }

        }
    }

    public class BotPlayer : Player
    {
        public Rating rating;
        public int games, wins, losses;

        public BotPlayer(int i)
            :base(i)
        {
        }

    }

}

TrueSkill ratings and win% after 250 random matches.

Players are ordered by strength, id 0>1>2>3 etc. (the lower id always wins).

id   ConservativeRating   Mean               StandardDeviation   win%
0    36.6018093613847     44.3298552173074   2.57601528530759    100%
1    32.3380042534196     39.2166471449276   2.29288096383602    91.42857%
2    32.2120554191209     38.422586286346    2.07017695574173    89.58334%
3    24.4339328188783     29.6447006659815   1.73692261570105    50.9434%
4    23.2533477507645     29.1590048994883   1.96855238290791    68.18182%
5    18.3016520660433     23.4776208283092   1.72532292075527    44.64286%
6    13.2212442413914     19.1699259868905   1.98289391516637    26.82927%
7    10.0415863402407     15.7679791596343   1.90879760646453    25%
8    4.18826635383718     10.6105575553436   2.14076373383547    10.41667%
9    -2.27059972504378    4.86224661720219   2.37761544741532    0%
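
For anyone wondering how the columns relate: the numbers above are consistent with ConservativeRating being the Mean minus three StandardDeviations. A small sketch to check that, with the class and method names made up for the example and the multiplier of 3 treated as an assumption read off the table:

```csharp
using System;

public static class ConservativeRatingSketch
{
    // Conservative estimate: mean minus a multiple of the standard deviation.
    // The multiplier of 3 is an assumption that matches the posted numbers.
    public static double Conservative(double mean, double stdDev, double multiplier = 3.0)
    {
        return mean - multiplier * stdDev;
    }

    public static void Main()
    {
        // Mean and StandardDeviation for id 0 and id 9, copied from the table above.
        Console.WriteLine(Conservative(44.3298552173074, 2.57601528530759));
        Console.WriteLine(Conservative(4.86224661720219, 2.37761544741532));
    }
}
```

Both results agree with the ConservativeRating column for those rows, up to floating-point rounding in the last digits.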

edit: RPS (rock, paper, scissors) with TrueSkill, same number of matches.
The major strategy is Scissor, with Paper and Rock as minor strategies. As you can see, the minor strategy that beats the major strategy does come out ahead.

Rock Paper Scissor
id   ConservativeRating   Mean               StandardDeviation   win%        games   type
0    22.2911857672074     25.635469626594    1.11476128646221    29.54545%   44      Scissor
1    21.7841748039772     25.0038555837073   1.07322692657669    17.77778%   45      Scissor
2    19.0423325347074     22.3747330556796   1.11080017365737    26.92308%   52      Paper
3    24.8168168325261     28.5840867091271   1.25575662553368    68.88889%   45      Rock
4    22.2261144113132     25.2695586097977   1.01448139949485    23.07692%   52      Scissor
5    21.3862248458399     24.6455366523104   1.08643726882351    15.55556%   45      Scissor
6    20.652998113402      24.0505831870944   1.13252835789746    11.62791%   43      Scissor
7    17.4614961926684     21.3511453077313   1.29654970502095    20%         40      Paper
8    22.4296910660544     25.6949142849584   1.08840773963469    34%         50      Scissor
9    24.2526401114872     28.2686183545096   1.3386594143408     71.42857%   42      Rock
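
For clarity, this is roughly how a match between two strategies can be turned into finishing places for the rating calculation. It is a sketch, not the exact code behind the table: the enum and method names are made up, and it assumes two players with the same strategy draw, with a draw expressed as equal places.

```csharp
using System;

public enum Strategy { Rock, Paper, Scissor }

public static class RpsSketch
{
    // Returns +1 if a beats b, -1 if b beats a, 0 for a draw (same strategy).
    public static int Compare(Strategy a, Strategy b)
    {
        if (a == b) return 0;
        bool aWins = (a == Strategy.Rock && b == Strategy.Scissor)
                  || (a == Strategy.Scissor && b == Strategy.Paper)
                  || (a == Strategy.Paper && b == Strategy.Rock);
        return aWins ? 1 : -1;
    }

    // Finishing places (1 = first) for the two players; equal places mean a draw.
    public static int[] Ranks(Strategy a, Strategy b)
    {
        int outcome = Compare(a, b);
        if (outcome == 0) return new[] { 1, 1 };
        return outcome > 0 ? new[] { 1, 2 } : new[] { 2, 1 };
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(",", Ranks(Strategy.Rock, Strategy.Scissor)));    // 1,2
        Console.WriteLine(string.Join(",", Ranks(Strategy.Scissor, Strategy.Scissor))); // 1,1
        Console.WriteLine(string.Join(",", Ranks(Strategy.Scissor, Strategy.Rock)));    // 2,1
    }
}
```

With Scissor as the majority strategy, most pairings end in a draw, which would explain the low win% of the Scissor players.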
Title: Re: DB3 Leagues
Post by: Numsgil on April 12, 2015, 04:39:12 PM
Looks like it places them all around the same skill? I think that's what we'd want if so. It's hard to get a sense of the relative scales. Can you add in a strategy that has an 80% chance to win against any of RPS? I'd like to see if it's noticeably higher than all the others.
Title: Re: DB3 Leagues
Post by: spike43884 on April 13, 2015, 08:09:48 AM
I've been mulling over the Elo thing. What about scoring bots on multiple factors across the entire simulation, relative to their own performance in the simulation instead of relative to their opponents? Do it for multiple battles, maybe even average out the scores, and then rank them?
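Something like the following is what I have in mind, as a very rough sketch. How a single battle score gets computed from the "multiple factors" is left open; the bot names and scores below are made up purely for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AverageScoreSketch
{
    // Ranks bots by their mean score over several battles, best first.
    // Bots with no recorded battles are left out.
    public static List<string> RankByAverage(Dictionary<string, List<double>> scoresByBot)
    {
        return scoresByBot
            .Where(kv => kv.Value.Count > 0)
            .OrderByDescending(kv => kv.Value.Average())
            .Select(kv => kv.Key)
            .ToList();
    }

    public static void Main()
    {
        // Hypothetical per-battle scores for three hypothetical bots.
        var scores = new Dictionary<string, List<double>>
        {
            { "BotA", new List<double> { 10, 30, 20 } },
            { "BotB", new List<double> { 25, 25, 25 } },
            { "BotC", new List<double> { 5, 15, 10 } },
        };
        Console.WriteLine(string.Join(" > ", RankByAverage(scores))); // BotB > BotA > BotC
    }
}
```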
Title: Re: DB3 Leagues
Post by: Peter on April 13, 2015, 12:16:44 PM
Added.
Well, the Rocks are placed higher than the Scissors, and the Scissors higher than the Papers, but the difference isn't large.

id     ConservativeRating    Mean         StandardDeviation      win%    wins   draws   losses   matches   type
0   22.0095591377625    25.2836566772087     1.09136584648208     16.66667%   7   26   7   42   Scissor
1   21.7821811891753    24.8081119506593     1.00864358716132     23.07692%   12   23   11   52   Scissor
2   19.2005201463236    22.5465603332125     1.11534672896296     31.37255%   16   4   28   51   Paper
3   23.1122029152767    26.3987975437787     1.095531542834     53.84615%   28   5   13   52   Rock
4   22.1717569038829    25.5748763424698     1.13437314619566     21.95122%   9   23   4   41   Scissor
5   21.9605295488926    25.3588763504315     1.13278226717963     21.95122%   9   18   8   41   Scissor
6   21.1505654852868    24.269753597743     1.03972937081875     16.66667%   8   24   12   48   Scissor
7   18.2254824660445    21.8923336948571     1.22228374293755     22.72727%   10   4   26   44   Paper
8   21.4013812078553    24.7129946001344     1.1038711307597     20.93023%   9   20   9   43   Scissor
9   22.9477399984015    26.4990987016865     1.18378623442832     51.11111%   23   5   13   45   Rock
10   23.6013905336762    26.8589475968437     1.08585235438918     73.33334%   33   0   12   45   Percent80
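
A player like Percent80 can be simulated along these lines. This is only a sketch of the idea, not the code used for the table: the names are made up, and it assumes the player wins with a fixed 80% chance regardless of the opponent and never draws.

```csharp
using System;

public static class Percent80Sketch
{
    // True if the Percent80 player wins this match; it never draws.
    public static bool Percent80Wins(Random random, double winChance = 0.8)
    {
        return random.NextDouble() < winChance;
    }

    // Fraction of wins over a number of simulated matches.
    public static double SimulatedWinRate(int matches, int seed)
    {
        var random = new Random(seed);
        int wins = 0;
        for (int i = 0; i < matches; i++)
        {
            if (Percent80Wins(random)) wins++;
        }
        return (double)wins / matches;
    }

    public static void Main()
    {
        // Should come out close to 0.8 for a large number of matches.
        Console.WriteLine(SimulatedWinRate(100000, 1234));
    }
}
```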

After a million random matches, the upper 2-sigma bound of Scissor (mean + 2 standard deviations) is still higher than the lower 2-sigma bound of Rock (mean - 2 standard deviations).
id     ConservativeRating    Mean         StandardDeviation      win%    wins   draws   losses   matches   type
0   22.4396128116937    24.6603104463431     0.740232544883134     20.1069%   33217   82356   33136   165202   Scissor
1   22.3980369678499    24.6155831330575     0.739182055069193     19.99733%   32991   82504   32976   164977   Scissor
2   18.6285481147599    21.0068128521428     0.792754912460967     19.95815%   33000   16463   99321   165346   Paper
3   24.7068243962618    27.0176793425063     0.770284982081475     59.98304%   99038   16415   33058   165110   Rock
4   22.8829721831836    25.1137388028982     0.743588873238181     20.0285%   33030   82475   32812   164915   Scissor
5   23.0007569918914    25.2302209591881     0.743154655765566     19.97823%   33042   82536   33128   165390   Scissor
6   22.8817029092318    25.1019168680217     0.740071319596633     20.11042%   33330   82598   33106   165735   Scissor
7   20.4789112234694    22.7848573440528     0.768648706861115     19.92037%   32920   16463   99293   165258   Paper
8   23.2252037617677    25.4660024444581     0.746932894230129     19.98256%   33004   82413   33157   165164   Scissor
9   25.138645723613    27.4710506811453     0.777468319177446     60.21605%   99277   16415   32862   164868   Rock
10   24.5903878471413    26.8158904155278     0.741834189462153     79.99807%   132500   0   33129   165629   Percent80
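
The 2-sigma comparison can be checked directly from the Mean and StandardDeviation columns. A small sketch using the rows for id 8 (Scissor) and id 3 (Rock) as posted above; the class and method names are made up for the example.

```csharp
using System;

public static class SigmaOverlapSketch
{
    // Upper and lower bounds at a given number of standard deviations from the mean.
    public static double Upper(double mean, double stdDev, double sigmas = 2.0) => mean + sigmas * stdDev;
    public static double Lower(double mean, double stdDev, double sigmas = 2.0) => mean - sigmas * stdDev;

    public static void Main()
    {
        // id 8 (Scissor) and id 3 (Rock), copied from the million-match table above.
        double scissorUpper = Upper(25.4660024444581, 0.746932894230129);
        double rockLower = Lower(27.0176793425063, 0.770284982081475);
        Console.WriteLine(scissorUpper > rockLower); // True: the two intervals overlap
    }
}
```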

Edit: these numbers are wrong (corrected stats are posted further down).
Title: Re: DB3 Leagues
Post by: Numsgil on April 13, 2015, 12:26:47 PM
I find it interesting that after a million matches the one that wins 80% of the time is actually a bit lower than the second rock player. I guess the system finds its occasional losses to "weak" bots confusing. Also, the flat win rate treats a draw the same as a loss, which is probably over-penalizing the rock strategy.

Interesting results, I'll need to mull it over.
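
To make the draw point concrete, here is a sketch comparing the flat win rate with one that counts a draw as half a win. The half-point convention is an assumption borrowed from chess scoring, and the counts are the ones posted in the million-match table (id 3 Rock, id 0 Scissor).

```csharp
using System;

public static class DrawAwareWinRate
{
    // Flat win rate: draws count the same as losses.
    public static double Flat(int wins, int matches) => (double)wins / matches;

    // Score rate: a draw counts as half a win (assumed convention).
    public static double WithHalfDraws(int wins, int draws, int matches) => (wins + 0.5 * draws) / matches;

    public static void Main()
    {
        // id 3 (Rock): wins, draws, matches as posted.
        Console.WriteLine(Flat(99038, 165110));
        Console.WriteLine(WithHalfDraws(99038, 16415, 165110));
        // id 0 (Scissor): wins, draws, matches as posted.
        Console.WriteLine(Flat(33217, 165202));
        Console.WriteLine(WithHalfDraws(33217, 82356, 165202));
    }
}
```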
Title: Re: DB3 Leagues
Post by: Peter on April 13, 2015, 12:56:58 PM
For reference, the results for players 0>1>2 (the lower id always wins) with a million games.

1. The difference is quite big in comparison with RPS + random.
2. Fights with a big difference in rating cause NaN errors. Games with NaN errors weren't counted in the final results. You can see that in the matches played: the top and bottom players had fewer games registered. The NaN errors may be fixable somewhere in the settings, but at least this implementation doesn't like huge power differences. This could also be the reason for the flattened win rates, as a big portion of games got skipped. I used the default rating settings; different settings might fix it.
id     ConservativeRating    Mean         StandardDeviation         win%       wins   draws   losses   matches
0   157.364160859048    175.093666523749     5.90983522156717     100%      93357   0   0   93357   
1   122.187090794375    137.148117096133     4.98700876725273     68.71486%   94351   0   42957   137308   
2   89.1558840881495    103.337093844302     4.72706991871741     52.63669%   95334   0   85783   181117   
3   57.9310423376475    71.3621156175018     4.47702442661809     51.34298%   95290   0   90305   185595   
4   27.50057942478       40.4213063289044     4.30690896804144     50.3262%   94266   0   93044   187310   
5   -3.32350541905488    9.68240334301635     4.33530292069041     49.87444%   93547   0   94018   187565   
6   -34.637104314333    -21.3219560268339     4.4383827624997     48.72458%   90999   0   95763   186762   
7   -67.5653127637627    -53.3291196269003     4.74539771228745     47.14106%   85512   0   95884   181396   
8   -102.079416500229    -87.1381894801877     4.98040900668033     31.22995%   43028   0   94750   137778   
9   -142.80346240494    -125.061759330556     5.91390102479457     0%      0   0   93180   93180   
Title: Re: DB3 Leagues
Post by: Peter on April 13, 2015, 02:41:14 PM
Uh, there was a mistake in how wins were generated for the 80% win player: half of his wins were registered as draws. :redface:

Corrected stats.

RPS/random
id     ConservativeRating    Mean         StandardDeviation      win%    wins   draws   losses   matches   type
0   21.2025014802368    24.5072604715912     1.10158633045147     19.04762%   8   27   7   42   Scissor
1   21.1703789746567    24.2605458085435     1.03005561129561     22%   11   24   15   50   Scissor
2   19.3873917419835    22.8479451170323     1.15351779168295     38.77551%   19   3   27   49   Paper
3   23.2809913533546    26.6242733109851     1.11442731921019     58%   29   10   11   50   Rock
4   22.1327373676122    25.5369245974301     1.13472907660596     26.31579%   10   25   3   38   Scissor
5   20.8980341485925    24.3963410032841     1.16610228489721     23.07692%   9   21   9   39   Scissor
6   21.6943165763599    24.9387952612258     1.0814928949553     34.78261%   16   23   7   46   Scissor
7   17.9036343796341    22.1480159446247     1.41479385499687     40%   14   3   18   35   Paper
8   20.7780899350452    24.0535665005142     1.09182552182302     26.08696%   12   24   10   46   Scissor
9   22.3978387144379    25.7676782802645     1.12327985527554     49.01961%   25   10   16   51   Rock
10   26.8815568846738    31.1951332533136     1.43785878954662     76.19048%   32   0   10   42   Percent80

RPS/random after a million games

id     ConservativeRating    Mean         StandardDeviation      win%    wins   draws   losses   matches   type
0   21.8673281444733    24.1097779335181     0.747483263014924     27.83857%   45947   82858   36243   165048   Scissor
1   22.0131180929267    24.2333995808674     0.740093829313591     27.82333%   45948   82799   36395   165142   Scissor
2   17.8442112961662    20.2239537778002     0.793247493877987     27.95814%   46141   16516   102379   165036   Paper
3   24.4196078416116    26.7526319905644     0.777674716317609     68.05129%   112237   16466   36227   164930   Rock
4   21.8505862259689    24.0823444478475     0.743919407292882     28.11717%   46485   82358   36483   165326   Scissor
5   21.9716403885154    24.2016352220766     0.743331611187086     28.10022%   46489   82414   36537   165440   Scissor
6   21.7202167758399    23.9643481135206     0.748043779226886     28.10896%   46479   82355   36519   165353   Scissor
7   19.2396494056951    21.5913457018039     0.783898765369623     27.98824%   46360   16516   102765   165641   Paper
8   21.3340998118027    23.5937401326201     0.753213440272473     28.09844%   46422   82480   36310   165212   Scissor
9   24.2867711234825    26.617695694018     0.776974856845171     68.22602%   112824   16466   36078   165368   Rock
10   26.9820148326741    29.4765897709184     0.831524979414755     80.06544%   132638   0   33024   165662   Percent80
Title: Re: DB3 Leagues
Post by: Numsgil on April 13, 2015, 04:03:47 PM
I like those numbers much more :)
Title: Re: DB3 Leagues
Post by: spike43884 on April 14, 2015, 06:54:11 AM
[Feeling totally ignored]
Title: Re: DB3 Leagues
Post by: Panda on April 14, 2015, 10:05:01 AM
It's clear that the conversation is going in a different direction, spike43884, isn't it?
Title: Re: DB3 Leagues
Post by: spike43884 on April 15, 2015, 07:36:12 AM
It is, but we still have a sort of unresolved point.
Title: Re: DB3 Leagues
Post by: Botsareus on June 04, 2016, 09:08:11 PM
I am actually not totally against the ranking model proposed here, but I think it will take close to forever to run. What I can take away from this is that maybe I should test the next 9 robots down after testing the top robot. This way, if a challenger beats the top robot but loses to one of the next 9 robots, it is still ranked appropriately. I am also planning to take the 700 population limit out, because the evolutions have no such limit and take a long time anyway. Finally, on the issue of robots simply outlasting the time limit: I think outlasting a 'dynamic' time limit is perfectly legit as currently configured. edit: Also, to reduce the luck factor a little, I will probably make each league go up to a score of 2, but I will not go as far as any statistical draw because that also takes forever.

Anyway this is what I took away from this post as far as DB3 is concerned. Unfortunately I do not think my ideas on the topic will make it into DB3, and I think the current DB2 config. is good enough. Maybe this will make it into my DB version if I ever get to write one.
Title: Re: DB3 Leagues
Post by: spike43884 on June 24, 2016, 03:38:57 AM
Just throwing in a random idea for a structure of it.
Split the bots randomly into groups of 3, 4, 5 (whatever you want really, perhaps even relative to the total number of bots?). Winners move on to the next stage, the 2nd places play against each other in groups of X, then the 2nd-place winners and the winners of round 1 fight in groups of X. Repeat until X bots remain. A final with all the remaining bots fighting?
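
A rough sketch of just the first step, the random split into groups. The names are made up and the bots are plain strings here; the later rounds are not covered.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupStageSketch
{
    // Shuffles the bots (Fisher-Yates) and deals them into groups of at most groupSize.
    // The last group may be smaller if the bots do not divide evenly.
    public static List<List<string>> SplitIntoGroups(IEnumerable<string> bots, int groupSize, Random random)
    {
        if (groupSize < 1) throw new ArgumentOutOfRangeException(nameof(groupSize));

        var shuffled = bots.ToList();
        for (int i = shuffled.Count - 1; i > 0; i--)
        {
            int j = random.Next(i + 1);
            string tmp = shuffled[i];
            shuffled[i] = shuffled[j];
            shuffled[j] = tmp;
        }

        var groups = new List<List<string>>();
        for (int i = 0; i < shuffled.Count; i += groupSize)
        {
            groups.Add(shuffled.Skip(i).Take(groupSize).ToList());
        }
        return groups;
    }

    public static void Main()
    {
        // Ten hypothetical bots split into groups of four.
        var bots = Enumerable.Range(0, 10).Select(i => "Bot" + i);
        foreach (var group in SplitIntoGroups(bots, 4, new Random(1234)))
        {
            Console.WriteLine(string.Join(", ", group));
        }
    }
}
```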

Title: Re: DB3 Leagues
Post by: Botsareus on June 24, 2016, 12:49:26 PM
To reduce some confusion: Spike is talking about the current tournament league that I am planning to phase out. Also, 7 robots down sounds better than 10 and should still do the job.
Title: Re: DB3 Leagues
Post by: spike43884 on June 25, 2016, 04:06:59 AM
Oh wow, that's actually the current tournament setup? (I've never had the need to run a tournament locally. I just hound the bestiary with bots.)