Author Topic: DB3 Leagues (Read 20111 times)

Panda · « **Reply #15 on:** April 10, 2015, 02:35:58 PM »

Quote from: Numsgil on April 10, 2015, 12:22:37 PM

I'm working through the articles on TrueSkill and Glickman right now; they might be doing something more clever. At the very least they factor in confidence intervals. But I think, because we can run exactly the rounds we want to and no more, and can choose how the matches are chosen, we can do a lot more global optimizing and a lot less incremental updating.

The confidence decreases over time? This idea could be used to ignore differences between versions so no reruns and bots with a low confidence could be prioritised for a rerun?

Quote from: Numsgil on April 10, 2015, 12:30:50 PM

In situations of rock-paper-scissors, where there's a (large) margin of players choosing one strategy over another, the flat win rate I mentioned above would artificially inflate for the minority strategy, I think. Would elo have that same problem? I think not... I'd want to see some simulations, probably.

I can't work out your reasoning for this (don't judge my lack of statistical knowledge).

Numsgil · « **Reply #16 on:** April 10, 2015, 05:49:25 PM »

Quote from: Panda on April 10, 2015, 02:35:58 PM

Quote from: Numsgil on April 10, 2015, 12:22:37 PM
I'm working through the articles on TrueSkill and Glickman right now; they might be doing something more clever. At the very least they factor in confidence intervals. But I think, because we can run exactly the rounds we want to and no more, and can choose how the matches are chosen, we can do a lot more global optimizing and a lot less incremental updating.
The confidence decreases over time? This idea could be used to ignore differences between versions so no reruns and bots with a low confidence could be prioritised for a rerun?

Confidence is a matter of how many games someone's played. If they've played twice and beaten a grandmaster, that doesn't necessarily mean they're a grandmaster. They might just have gotten very, very lucky. Elo handles it by handwaving if you have fewer than 20 games. But I think TrueSkill handles it explicitly.
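To make the "two games against a grandmaster" point concrete, here is a minimal sketch of how uncertainty shrinks with games played. It uses a Wilson score interval on a plain win rate, which is an assumption for illustration only; it is not how Elo or TrueSkill model confidence. The function name `wilson_interval` is hypothetical.

```python
import math

def wilson_interval(wins, games, z=1.96):
    """Approximate 95% Wilson score interval for a win rate (illustrative only)."""
    if games == 0:
        return 0.0, 1.0  # no games played: nothing is known yet
    p = wins / games
    denom = 1 + z * z / games
    centre = (p + z * z / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z * z / (4 * games * games)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)

# A perfect record means much less after 2 games than after 200.
for wins, games in [(2, 2), (20, 20), (200, 200)]:
    low, high = wilson_interval(wins, games)
    print(f"{wins}/{games}: {low:.2f} to {high:.2f}")
```

The lower bound for a 2-0 record stays far below the bound for a 200-0 record, which is the intuition behind prioritising low-confidence bots for more matches.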

For new versions (of DB3) we'd probably want to wipe the slate clean and redo everything. If you perform differently in one version (of DB3) over the next, it means the stats we gathered on you in one version are moot in the version we're currently running on.

For new versions of the bot, you could take the past performance as the starting values for some Bayesian stats stuff, I think. A bit beyond me at the moment, but something I'm studying.

Quote

Quote from: Numsgil on April 10, 2015, 12:30:50 PM
In situations of rock-paper-scissors, where there's a (large) margin of players choosing one strategy over another, the flat win rate I mentioned above would artificially inflate for the minority strategy, I think. Would elo have that same problem? I think not... I'd want to see some simulations, probably.
I can't work out your reasoning for this (don't judge my lack of statistical knowledge).

Suppose you have three types of players: one always plays rock, one always plays scissors, and one always plays paper. Suppose the ratios between them are 1:1:1. I'd expect everyone's win percentages and Elo ratings to be roughly the same, since everyone wins a third of their games and ties a third of their games. Now suppose the ratios are something like 2:1:1. There are more rock players, so the paper players' win rate will be higher than the scissors players' win rate (paper players win half their games and tie a quarter; scissors players win a quarter and tie a quarter), even though scissors always wins against paper. Which is an odd artifact of using straight win rates.
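The arithmetic above can be checked with a short sketch. It assumes opponents are drawn at random from a large population (so a player can effectively meet a copy of itself), and the names `BEATS` and `expected_rates` are made up for this example.

```python
from fractions import Fraction

# BEATS[a] is the strategy that a defeats (standard rock-paper-scissors).
BEATS = {"rock": "scissors", "scissors": "paper", "paper": "rock"}

def expected_rates(population):
    """Return {strategy: (win_rate, tie_rate)} against a random opponent."""
    total = sum(population.values())
    rates = {}
    for strategy in population:
        win = Fraction(population[BEATS[strategy]], total)  # share of opponents it beats
        tie = Fraction(population[strategy], total)         # share playing the same thing
        rates[strategy] = (win, tie)
    return rates

# The 2:1:1 case from the post: rock is twice as common as the others.
for name, (win, tie) in expected_rates({"rock": 2, "scissors": 1, "paper": 1}).items():
    print(name, "wins", win, "ties", tie)
```

With the 2:1:1 split, paper wins half its games while scissors wins only a quarter, matching the figures in the post.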

I'm not sure but I think Elo might handle that case better. But if it doesn't, it means I might lean towards using the straight win rates because there's no math involved in that. But I'd have to play with it to know for sure.

spike43884 · « **Reply #17 on:** April 11, 2015, 08:21:14 AM »

Just to throw this in here, since a major limiting factor is the CPU time we can dedicate to the leagues:
What about running the league finale every week, or every other week? It wouldn't take too long each week, and it means we're not constantly dedicating resources to it. Maybe make it a round robin of 15 different bots (maybe 7 already-ranked top bots, with the rest being entrants or challengers). Then we use bots spaced along the full ranking table as 'checkpoints', so a battle is 2 already-ranked species plus the challenger, and these run slightly more often (maybe daily). Each species in each battle then gets a rating out of 100 depending on how it fared throughout the simulation. (I'd love ranking to take place maybe every quarter of the simulation, so a bot that does really well until the very last few cycles still has some chance at a better rating.) Also, I'd like to see some slight variation from one league to the next: maybe occasionally 1 or 2 shapes, or a slightly larger simulation, or a tiny bit more friction, since we have slight variations in the real-life environment.

Numsgil · « **Reply #18 on:** April 11, 2015, 01:21:14 PM »

Here's a quick stab at some python to simulate some rock-paper-scissors type matches with asymmetric numbers of players playing each type. Right now winrates are high for players with strategies that happen to beat the dominant strategy, as you'd expect. Later I'm going to try and add in elo and see what it would do. I found some python packages for it, but I'm feeling too lazy right now to figure it out.

Code: [Select]

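# Requires Python 3: "/" below relies on true division.
import numpy
# import skills  # not used yet; rating packages for the planned Elo comparison
# import elo

# wintable[a, b] is +1 if type a beats type b, -1 if it loses, 0 for a tie.
wintable = numpy.array([
    [ 0, -1,  1],  # rock: loses to paper, beats scissors
    [ 1,  0, -1],  # paper: beats rock, loses to scissors
    [-1,  1,  0],  # scissors: loses to rock, beats paper
])

typenames = [
    "rock    ",
    "paper   ",
    "scissors"
]

def random_type():
    # randint(0, 4) yields 0..3; folding 3 into 2 makes scissors twice as common.
    return min(numpy.random.randint(0, 4), 2)

print('type, "score", \t \t winrate')
for player in range(0, 10):
    player_type = random_type()

    score = 0
    matches = 0
    wins = 0

    # Each player meets 10000 randomly drawn opponents.
    for match in range(0, 10000):
        opponent_type = random_type()
        result = wintable[player_type, opponent_type]

        score = score + result
        matches = matches + 1
        wins = wins + (1 if result > 0 else 0)

    print(typenames[player_type], (score / matches), "\t", (wins / matches))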

Peter · « **Reply #19 on:** April 11, 2015, 01:36:30 PM »

Quote from: Numsgil on April 10, 2015, 05:49:25 PM

I'm not sure but I think Elo might handle that case better. But if it doesn't, it means I might lean towards using the straight win rates because there's no math involved in that. But I'd have to play with it to know for sure.

A big advantage of Elo and the like over win rates is that wins against higher-ranked players are valued more. With win rates, if you beat the #2 and lost to the #1, you're mediocre. If you beat 2 bots that can't even survive, you're amazing!
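A minimal sketch of that weighting, using the standard Elo expected-score formula. The K-factor of 32 and the example ratings are assumptions chosen for illustration, not values anyone in this thread has settled on.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expectation: the score A is predicted to get against B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def elo_update(rating_a, rating_b, score_a, k=32):
    """score_a is 1 for a win, 0.5 for a draw, 0 for a loss; returns new (a, b)."""
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta

# The same 1500-rated bot wins once against a much stronger
# opponent and once against a much weaker one.
print(elo_update(1500, 1900, 1))
print(elo_update(1500, 1100, 1))
```

The upset win moves the rating roughly ten times as far as the expected win, whereas a flat win rate counts both games the same.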

Edit: the code has a mistake. It's rounding wins/matches to an int, which gives zero.

Numsgil · « **Reply #20 on:** April 11, 2015, 02:38:35 PM »

Quote from: Peter on April 11, 2015, 01:36:30 PM

Quote from: Numsgil on April 10, 2015, 05:49:25 PM
I'm not sure but I think Elo might handle that case better. But if it doesn't, it means I might lean towards using the straight win rates because there's no math involved in that. But I'd have to play with it to know for sure.
A big advantage of Elo and the like over win rates is that wins against higher-ranked players are valued more. With win rates, if you beat the #2 and lost to the #1, you're mediocre. If you beat 2 bots that can't even survive, you're amazing!

And that's really important if the matches you're given information on aren't a random sample. But I think if the matches you play are randomly chosen from the space of all possible matches, the win rate you get will be representative of your global win rate, which should correspond to elo, or at least have the same relative ordering, assuming sample sizes are big enough for either.

But I'm of two minds about it w.r.t. rock-paper-scissors situations.

Quote

Edit: the code has a mistake. It's rounding wins/matches to an int, which gives zero.

That's odd, it works on my machine. Which version of Python are you running? I'm using 3.4 I think.

Peter · « **Reply #21 on:** April 11, 2015, 02:59:37 PM »

Python 2; apparently it's still the default that starts up on my machine. Good to see they fixed that in Python 3.
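For reference, the difference is that `/` between two ints floors in Python 2 and is true division in Python 3. A small sketch that behaves the same on both follows; the helper name `win_rate` is made up here.

```python
def win_rate(wins, matches):
    """Fraction of matches won, independent of the Python version."""
    # float() forces true division on Python 2 and is harmless on Python 3.
    return wins / float(matches) if matches else 0.0

print(win_rate(2503, 10000))
print(2503 // 10000)  # explicit floor division: the zero Python 2 printed
```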

I don't think opponents should be randomly chosen from all bots. I'd like the upper bots to fight each other often enough to get a good league table. I think fights should be randomly picked between nearby positions: if a bot is ranked #100, its opponent might be anywhere between #50 and #150. The fights are more even and tell you more, pinpointing the strength more clearly.
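A rough sketch of that kind of windowed matchmaking. The window of 50 places comes from the #50 to #150 example; the function name, the 1-based ranks, and the clamping at the ends of the table are assumptions.

```python
import random

def pick_opponent(rank, total_bots, window=50, rng=random):
    """Pick an opponent rank within `window` places of `rank` (1-based), never itself."""
    low = max(1, rank - window)             # clamp at the top of the table
    high = min(total_bots, rank + window)   # clamp at the bottom of the table
    candidates = [r for r in range(low, high + 1) if r != rank]
    if not candidates:
        raise ValueError("no opponents available")
    return rng.choice(candidates)

# The bot ranked #100 in a table of 500 meets someone between #50 and #150.
print(pick_opponent(100, 500))
```

Bots near the top or bottom of the table get a lopsided window under this scheme, which may or may not be what the league wants.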

Suppose a large share of the bots are weak. A decent (but not great) bot can then get a high win percentage with some luck. That can eventually be compensated for with more random fights, but an Elo-like system wouldn't give it that advantage in the first place.

Panda · « **Reply #22 on:** April 12, 2015, 07:25:41 AM »

Quote from: Numsgil on April 11, 2015, 01:21:14 PM

Here's a quick stab at some python to simulate some rock-paper-scissors type matches with asymmetric numbers of players playing each type. Right now winrates are high for players with strategies that happen to beat the dominant strategy, as you'd expect. Later I'm going to try and add in elo and see what it would do. I found some python packages for it, but I'm feeling too lazy right now to figure it out.

Code: [Select]
import numpy
import skills
import elo

winrates = { }

wintable = numpy.matrix([
    [ 0, 1, -1, ],
    [-1, 0, 1, ],
    [ 1, -1, 0 ]
])

typenames = [
    "rock    ",
    "paper   ",
    "scissors"
]

print('type, "score", \t \t winrate')
for i in range(0, 10):
    type = numpy.random.randint(0, 4)

    if type > 2: type = 2

    score = 0
    matches = 0
    wins = 0

    for i in range(0, 10000):
        opponent_type = numpy.random.randint(0, 4)
        if opponent_type > 2: opponent_type = 2

        score = score + wintable[type, opponent_type]
        matches = matches + 1
        wins = wins + (1 if wintable[type, opponent_type] > 0 else 0)

    print(typenames[type], (score/matches), "\t", (wins/matches))

Is this 1 player vs 4 players (including the original player) pitted against each other randomly 10000 times, or 1 player pitted against 10000 players? Just trying to work out how you'd do the Elo in the second situation.

spike43884 · « **Reply #23 on:** April 12, 2015, 07:34:47 AM »

Just to point out, did anyone actually read my point about only repeating leagues on a certain timescale? We know where most of our users come from by the language they speak, so we could run the leagues when they're offline.

Peter · « **Reply #24 on:** April 12, 2015, 10:41:11 AM »

Quote from: Panda on April 12, 2015, 07:25:41 AM

Is this 1 player vs 4 players (including the original player) pitted against each other randomly 10000 times, or 1 player pitted against 10000 players? Just trying to work out how you'd do the Elo in the second situation.

The second. You'd have to throw nearly all of it away if you want to do Elo.

Panda · « **Reply #25 on:** April 12, 2015, 10:58:46 AM »

We don't know the elo of each of the 10000 players when we're trying to simulate that, unless we just assume they're all new?

Peter · « **Reply #26 on:** April 12, 2015, 11:10:52 AM »

You'd need some boilerplate code to keep track of all the players.
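One way that bookkeeping might look, as a sketch only: a fixed pool of players with fixed strategies, everyone starting at the same rating, and random pairings updated with the standard Elo formula. The starting rating of 1500, K-factor of 16, match count, and the 6:2:2 strategy mix are all assumptions.

```python
import random

# BEATS[a] is the strategy that a defeats (standard rock-paper-scissors).
BEATS = {"rock": "scissors", "scissors": "paper", "paper": "rock"}

def score(a, b):
    """1 if strategy a beats b, 0.5 for a tie, 0 for a loss."""
    if a == b:
        return 0.5
    return 1.0 if BEATS[a] == b else 0.0

def simulate(strategies, matches=5000, k=16, start=1500.0, seed=1234):
    """Play random pairings and return one Elo rating per player."""
    rng = random.Random(seed)
    ratings = [start] * len(strategies)  # treat everyone as new
    for _ in range(matches):
        i, j = rng.sample(range(len(strategies)), 2)  # two distinct players
        expected_i = 1.0 / (1.0 + 10 ** ((ratings[j] - ratings[i]) / 400.0))
        delta = k * (score(strategies[i], strategies[j]) - expected_i)
        ratings[i] += delta
        ratings[j] -= delta  # zero-sum: the total rating pool stays constant
    return ratings

strategies = ["scissors"] * 6 + ["rock"] * 2 + ["paper"] * 2
for strategy, rating in zip(strategies, simulate(strategies)):
    print(strategy, round(rating))
```

Because every player is tracked from the first match, nobody's rating has to be known in advance; the ratings only become meaningful once each player has enough games.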

Btw, I'm curious how well Elo/TrueSkill/Glicko/win% compare when matches are picked at random.

Panda · « **Reply #27 on:** April 12, 2015, 11:51:14 AM »

Yeah, there is boilerplate code for it but I'm just trying to work out how to do it.

Each of these systems is designed so that matches are picked at random, isn't it? You basically want to take a small sample that represents the global population.

Peter · « **Reply #28 on:** April 12, 2015, 01:10:18 PM »

Quote from: Panda on April 12, 2015, 11:51:14 AM

Each of these systems is designed so that matches are picked at random, isn't it? You basically want to take a small sample that represents the global population.

They're not. Otherwise a chess grandmaster would often have to play a low-ranked player. They're designed to calculate the right strength of players, including how to deal with matches that aren't a random sample.

Edit: I was playing around with TrueSkill. And the library I took even has issues with high vs. low ranked players. Calculating skill as NaN in some cases...

edit2:

Code: [Select]

using System;
using System.Collections.Generic;
using Moserware.Skills;
using System.Diagnostics;

namespace TrueSkillTest
{
    class Program
    {
        static void Main(string[] args)
        {
            GameInfo defaultInfo = GameInfo.DefaultGameInfo;
            List<BotPlayer> botPlayers = new List<BotPlayer>();

            // Ten bots, all starting at the default rating.
            for (int i = 0; i < 10; i++)
            {
                var newb = new BotPlayer(i);
                newb.rating = defaultInfo.DefaultRating;
                botPlayers.Add(newb);
            }

            var random = new Random(1234);
            for (int i = 0; i < 250; i++)
            {
                int id1 = random.Next(10);
                int id2 = random.Next(10);
                if (id1 == id2)
                    continue;

                // Order the pair so that id1 < id2: the lower id always wins.
                if (id1 > id2)
                {
                    int switchId = id1;
                    id1 = id2;
                    id2 = switchId;
                }

                var team1 = new Team(botPlayers[id1], botPlayers[id1].rating);
                var team2 = new Team(botPlayers[id2], botPlayers[id2].rating);
                var teams = Teams.Concat(team1, team2);

                // Ranks 1, 2: team1 finishes first, team2 second.
                var results = TrueSkillCalculator.CalculateNewRatings(defaultInfo, teams, 1, 2);

                // Skip the match if the library produced NaN for either rating.
                if (double.IsNaN(results[botPlayers[id2]].ConservativeRating) || double.IsNaN(results[botPlayers[id1]].ConservativeRating))
                {
                    Debug.WriteLine("NaN happened  " + id1 + " " + id2);
                    continue;
                }

                botPlayers[id2].rating = results[botPlayers[id2]];
                botPlayers[id1].rating = results[botPlayers[id1]];
                botPlayers[id1].wins++;
                botPlayers[id2].losses++;
                botPlayers[id1].games++;
                botPlayers[id2].games++;
            }

            Debug.WriteLine("id \t ConservativeRating  \t  Mean \t\t  StandardDeviation  \t win% ");

            for (int i = 0; i < 10; i++)
            {
                var rat = botPlayers[i].rating;
                Debug.WriteLine(i + "\t" + rat.ConservativeRating + "\t" + rat.Mean + "\t" + rat.StandardDeviation + "\t" + ((float)botPlayers[i].wins / (float)botPlayers[i].games) * 100 + "% ");
            }
        }
    }

    // A player that carries its current rating and win/loss record.
    public class BotPlayer : Player
    {
        public Rating rating;
        public int games, wins, losses;

        public BotPlayer(int i)
            : base(i)
        {
        }
    }
}

TrueSkill ratings and win% after 250 random matches.

The player with the lower id always wins: 0 > 1 > 2 > 3, etc.

id  ConservativeRating  Mean              StandardDeviation  win%
0   36.6018093613847    44.3298552173074  2.57601528530759   100%
1   32.3380042534196    39.2166471449276  2.29288096383602   91.42857%
2   32.2120554191209    38.422586286346   2.07017695574173   89.58334%
3   24.4339328188783    29.6447006659815  1.73692261570105   50.9434%
4   23.2533477507645    29.1590048994883  1.96855238290791   68.18182%
5   18.3016520660433    23.4776208283092  1.72532292075527   44.64286%
6   13.2212442413914    19.1699259868905  1.98289391516637   26.82927%
7   10.0415863402407    15.7679791596343  1.90879760646453   25%
8   4.18826635383718    10.6105575553436  2.14076373383547   10.41667%
9   -2.27059972504378   4.86224661720219  2.37761544741532   0%
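For anyone reading the columns: the ConservativeRating values here are consistent with the usual TrueSkill convention of mean minus three standard deviations, which is why the bottom bot can go negative. A quick check against the first row, with a hypothetical helper name:

```python
def conservative_rating(mean, std_dev, multiplier=3.0):
    """Lower bound on skill: the mean minus a multiple of the uncertainty."""
    return mean - multiplier * std_dev

# Mean and StandardDeviation for id 0 in the table above.
print(conservative_rating(44.3298552173074, 2.57601528530759))
```

So two bots with the same mean can rank differently if one has played fewer informative games and still has a larger standard deviation.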

edit: RPS with TrueSkill, same number of matches.
Scissors is the majority strategy, with Paper and Rock as minority strategies. As you can see, the minority strategy that beats the majority strategy does get a boost.

Rock Paper Scissors
id  ConservativeRating  Mean              StandardDeviation  win%       games  strategy
0   22.2911857672074    25.635469626594   1.11476128646221   29.54545%  44     Scissors
1   21.7841748039772    25.0038555837073  1.07322692657669   17.77778%  45     Scissors
2   19.0423325347074    22.3747330556796  1.11080017365737   26.92308%  52     Paper
3   24.8168168325261    28.5840867091271  1.25575662553368   68.88889%  45     Rock
4   22.2261144113132    25.2695586097977  1.01448139949485   23.07692%  52     Scissors
5   21.3862248458399    24.6455366523104  1.08643726882351   15.55556%  45     Scissors
6   20.652998113402     24.0505831870944  1.13252835789746   11.62791%  43     Scissors
7   17.4614961926684    21.3511453077313  1.29654970502095   20%        40     Paper
8   22.4296910660544    25.6949142849584  1.08840773963469   34%        50     Scissors
9   24.2526401114872    28.2686183545096  1.3386594143408    71.42857%  42     Rock

Numsgil · « **Reply #29 on:** April 12, 2015, 04:39:12 PM »

Quote from: Peter on April 12, 2015, 01:10:18 PM

edit: RPS with TrueSkill, same number of matches.
Scissors is the majority strategy, with Paper and Rock as minority strategies. As you can see, the minority strategy that beats the majority strategy does get a boost.

Rock Paper Scissors
id  ConservativeRating  Mean              StandardDeviation  win%       games  strategy
0   22.2911857672074    25.635469626594   1.11476128646221   29.54545%  44     Scissors
1   21.7841748039772    25.0038555837073  1.07322692657669   17.77778%  45     Scissors
2   19.0423325347074    22.3747330556796  1.11080017365737   26.92308%  52     Paper
3   24.8168168325261    28.5840867091271  1.25575662553368   68.88889%  45     Rock
4   22.2261144113132    25.2695586097977  1.01448139949485   23.07692%  52     Scissors
5   21.3862248458399    24.6455366523104  1.08643726882351   15.55556%  45     Scissors
6   20.652998113402     24.0505831870944  1.13252835789746   11.62791%  43     Scissors
7   17.4614961926684    21.3511453077313  1.29654970502095   20%        40     Paper
8   22.4296910660544    25.6949142849584  1.08840773963469   34%        50     Scissors
9   24.2526401114872    28.2686183545096  1.3386594143408    71.42857%  42     Rock

Looks like it places them all at around the same skill? I think that's what we'd want if so. It's hard to get a sense of the relative scales. Can you add in a strategy that has an 80% chance to win against any of RPS? I'd like to see if it's noticeably higher than all the others.
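A sketch of what the match outcome for such a fourth strategy could look like. The name "strong" is made up, and two details are assumptions the post leaves open: the remaining 20% of its games against R/P/S are counted as losses rather than ties, and a strong-vs-strong mirror match is a tie.

```python
import random

# BEATS[a] is the strategy that a defeats (standard rock-paper-scissors).
BEATS = {"rock": "scissors", "scissors": "paper", "paper": "rock"}

def play(a, b, rng, strong_win_chance=0.8):
    """Return 1 if a wins, 0 for a tie, -1 if a loses."""
    if a == "strong" and b == "strong":
        return 0  # assumption: mirror matches are ties
    if a == "strong":
        return 1 if rng.random() < strong_win_chance else -1
    if b == "strong":
        return -1 if rng.random() < strong_win_chance else 1
    if a == b:
        return 0
    return 1 if BEATS[a] == b else -1

# Rough check that "strong" wins about 80% of its games against rock.
rng = random.Random(0)
wins = sum(play("strong", "rock", rng) == 1 for _ in range(10000))
print(wins / 10000)
```

Feeding these outcomes into whichever rating system is being tested should show whether an 80% strategy separates clearly from the R/P/S cluster.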
