Code center > Darwinbots3
DB3 Leagues
Panda:
--- Quote from: Numsgil on April 10, 2015, 12:22:37 PM ---I'm working through the articles on TrueSkill and Glickman right now; they might be doing something more clever. At the very least they factor in confidence intervals. But I think, because we can run exactly the rounds we want to and no more, and can choose how the matches are chosen, we can do a lot more global optimizing and a lot less incremental updating.
--- End quote ---
Does the confidence decrease over time? Could that idea be used to absorb differences between versions, so no full reruns are needed and bots with low confidence are just prioritised for a rerun?
--- Quote from: Numsgil on April 10, 2015, 12:30:50 PM ---In situations of rock-paper-scissors, where there's a (large) margin of players choosing one strategy over another, the flat win rate I mentioned above would artificially inflate for the minority strategy, I think. Would elo have that same problem? I think not... I'd want to see some simulations, probably.
--- End quote ---
I can't work out your reasoning for this (don't judge my lack of statistical knowledge).
Numsgil:
--- Quote from: Panda on April 10, 2015, 02:35:58 PM ---
--- Quote from: Numsgil on April 10, 2015, 12:22:37 PM ---I'm working through the articles on TrueSkill and Glickman right now; they might be doing something more clever. At the very least they factor in confidence intervals. But I think, because we can run exactly the rounds we want to and no more, and can choose how the matches are chosen, we can do a lot more global optimizing and a lot less incremental updating.
--- End quote ---
Does the confidence decrease over time? Could that idea be used to absorb differences between versions, so no full reruns are needed and bots with low confidence are just prioritised for a rerun?
--- End quote ---
Confidence is a matter of how many games someone has played. If they've played 2 games and beaten a grandmaster, that doesn't necessarily mean they're a grandmaster; they might have got very, very lucky. Elo handles it by handwaving: ratings are treated as provisional if you have fewer than 20 games. But I think TrueSkill handles it explicitly.
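To illustrate why two games tell you almost nothing, here's a minimal sketch. This is not how TrueSkill actually models things (TrueSkill keeps a Gaussian belief over a latent skill); it just puts a Beta posterior on a raw win probability, which is enough to show the uncertainty shrinking with more games:

```python
# Illustrative only: model a player's unknown win probability with a
# Beta posterior, starting from a uniform Beta(1, 1) prior. The posterior
# standard deviation shrinks as games accumulate, which is the "confidence"
# TrueSkill-like systems track.

def beta_std(wins, losses):
    # Posterior is Beta(1 + wins, 1 + losses); return its standard deviation.
    a, b = 1 + wins, 1 + losses
    var = (a * b) / ((a + b) ** 2 * (a + b + 1))
    return var ** 0.5

few = beta_std(2, 0)     # 2 games, both wins: std ~0.19, still very unsure
many = beta_std(10, 10)  # 20 games, even record: std ~0.10, noticeably tighter
```

Even after beating a grandmaster twice, the posterior spread is about 0.19, so the "true" win probability is barely pinned down at all.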
For new versions (of DB3) we'd probably want to wipe the slate clean and redo everything. If you perform differently in one version (of DB3) over the next, it means the stats we gathered on you in one version are moot in the version we're currently running on.
For new versions of the bot, you could feed the past performance in as the starting values to some Bayesian stats stuff, I think. A bit beyond me at the moment, but something I'm studying.
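A minimal sketch of that carry-forward idea: keep the old mean as the prior, but inflate the uncertainty so fresh results can move the rating quickly. The names, the 1.5 inflation factor, and the 25/3 sigma cap (roughly TrueSkill's default starting sigma) are all illustrative assumptions, not anything decided for DB3:

```python
# Hypothetical sketch: carry a bot's rating across bot versions by keeping
# the mean but widening the spread. Numbers here are made up, not DB3 policy.

class Rating:
    def __init__(self, mu, sigma):
        self.mu = mu        # estimated skill
        self.sigma = sigma  # how unsure we are about mu

def carry_forward(old, inflation=1.5, max_sigma=25.0 / 3.0):
    # Same best guess, wider spread; never wider than a brand-new bot's prior.
    return Rating(old.mu, min(old.sigma * inflation, max_sigma))
```

So a well-established bot keeps most of its standing, while the league remains free to revise it quickly if the new version behaves differently.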
--- Quote ---
--- Quote from: Numsgil on April 10, 2015, 12:30:50 PM ---In situations of rock-paper-scissors, where there's a (large) margin of players choosing one strategy over another, the flat win rate I mentioned above would artificially inflate for the minority strategy, I think. Would elo have that same problem? I think not... I'd want to see some simulations, probably.
--- End quote ---
I can't work out your reasoning for this (don't judge my lack of statistical knowledge).
--- End quote ---
Suppose you have three types of players: one always plays rock, one always plays scissors, and one always plays paper. Suppose the ratios between them are 1:1:1. I'd expect everyone's win percentage and Elo rating to be roughly the same, since everyone wins a third of their games and ties a third of their games. Now suppose the ratios are something like 2:1:1. There are more rock players, so the paper players' win rate will be higher than the scissors players' win rate (paper wins half its games and ties a quarter; scissors wins a quarter and ties a quarter), even though scissors always beats paper. Which is an odd artifact of using straight win rates.
I'm not sure but I think Elo might handle that case better. But if it doesn't, it means I might lean towards using the straight win rates because there's no math involved in that. But I'd have to play with it to know for sure.
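For reference, the standard Elo update is only a couple of lines. This sketch uses the conventional 400-point logistic curve and K = 32; those are the usual defaults, not anything specific to a particular package:

```python
# Standard Elo update: the winner gains more when the opponent was rated higher.

def elo_expected(r_a, r_b):
    # Expected score of A against B under the Elo logistic model.
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def elo_update(r_a, r_b, score_a, k=32):
    # score_a: 1 for an A win, 0.5 for a draw, 0 for an A loss.
    e_a = elo_expected(r_a, r_b)
    r_a_new = r_a + k * (score_a - e_a)
    r_b_new = r_b + k * ((1 - score_a) - (1 - e_a))
    return r_a_new, r_b_new
```

With this update, beating an equally rated player earns 16 points, while beating someone rated 400 points higher earns about 29; that's the "wins against stronger opponents count for more" property that straight win rates lack.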
spike43884:
Just to throw this in here: a major limiting factor is the CPU time we can dedicate to the leagues.
What if we run the league finale every week, or every other week? It's not too long once a week, and it means we're not constantly dedicating resources to it. Maybe make it a round robin of 15 different bots (perhaps 7 already-ranked top bots, with the rest being entries or challengers). Then we use bots spaced along the full ranking table as "checkpoints", so each battle is 2 already-ranked species plus the challenger; these run slightly more often (maybe daily), with each species in each battle getting a rating out of 100 depending on how it fared throughout the simulation. (I'd love ranking to take place maybe every quarter of the simulation, so a bot that does really well until the very last few cycles still has some chance at a better rating.) Also, I'd like to see some slight variation from one league run to the next: maybe occasionally 1 or 2 shapes, a slightly larger simulation, or a tiny bit more friction, since we have slight variations in the real-life environment.
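The round-robin part of this proposal is at least cheap to schedule. A quick sketch; the 7-seed/8-challenger split and all bot names are just placeholders, not an agreed format:

```python
from itertools import combinations

# Sketch of scheduling a 15-bot round robin: 7 seeded top bots plus
# 8 challengers (names are made up). Every pair meets exactly once.
seeds = ["seed%d" % i for i in range(1, 8)]
challengers = ["challenger%d" % i for i in range(1, 9)]
matches = list(combinations(seeds + challengers, 2))  # C(15, 2) = 105 matches
```

105 matches per finale gives a concrete number to budget CPU time against when deciding between weekly and fortnightly runs.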
Numsgil:
Here's a quick stab at some Python to simulate rock-paper-scissors-type matches with asymmetric numbers of players playing each type. Right now win rates are high for players whose strategy happens to beat the dominant strategy, as you'd expect. Later I'm going to try to add in Elo and see what it does. I found some Python packages for it, but I'm feeling too lazy right now to figure them out.
--- Code: ---import numpy

# The skills and elo packages found so far aren't wired in yet:
# import skills
# import elo

# Standard rock-paper-scissors payoffs: +1 win, 0 tie, -1 loss.
wintable = numpy.array([
    # rock  paper  scissors
    [  0,    -1,     1],   # rock
    [  1,     0,    -1],   # paper
    [ -1,     1,     0],   # scissors
])
typenames = [
    "rock    ",
    "paper   ",
    "scissors",
]

print('type, "score", \t \t winrate')
for player in range(0, 10):
    # randint(0, 4) clamped to 2 makes scissors twice as common as the
    # others: the asymmetric, dominant strategy.
    player_type = min(numpy.random.randint(0, 4), 2)
    score = 0
    matches = 0
    wins = 0
    for match in range(0, 10000):
        opponent_type = min(numpy.random.randint(0, 4), 2)
        score += wintable[player_type, opponent_type]
        matches += 1
        wins += 1 if wintable[player_type, opponent_type] > 0 else 0
    # float() keeps Python 2 integer division from truncating the ratios to 0
    print(typenames[player_type], float(score) / matches, "\t", float(wins) / matches)
--- End code ---
Peter:
--- Quote from: Numsgil on April 10, 2015, 05:49:25 PM ---I'm not sure but I think Elo might handle that case better. But if it doesn't, it means I might lean towards using the straight win rates because there's no math involved in that. But I'd have to play with it to know for sure.
--- End quote ---
A big advantage of Elo and the like over win rates is that wins against higher-ranked players are valued more. With win rates, if you beat the #2 and lost to the #1, you're mediocre; if you beat 2 bots that can't even survive, you're amazing!
Edit: the code has a mistake. Under Python 2, wins/matches is integer division, so it rounds to an int and prints zero.