
League Problems


Jez:
Griz, I have already changed the league far more than I intended at the start. In the interests of continuity and out of respect for all the good work PY has done in the past, it will remain a challenge league while I am running it.
I have already trodden on the toes of tradition by starting a rerun. The rules the league was set up under were that it was a challenge match, and that bug-exploiting bots, once they had gained their position, stayed there until a newer bot beat all the other bots underneath them.
The very first league was just a list on the forum and was run in the same way. I can't/won't change that of my own accord.
Out of respect for all the hard work Eric puts into chasing bugs down as quickly as possible, and given that some of the older bots will survive in the league longer if these 'cheat' bots are removed (plus the previous complaints about non-working bots being in the league at all), I thought a quick shake-up of the current standings would be acceptable, though.

The current order of things in the leagues has been determined by more than four years of matches.
Re-entering the bots in the current shake-up by order of age would mean D. Scarab (2004) would be entered before Kyushu (2006).

A round-robin-style league, once the ability is built into the program, would make for an interesting alternative style of league. Until then, with or without the win button, it would be a manually intensive, long-drawn-out way to decide the results.
I know you have done a lot of work along these lines; if you want to work out what the results would be, I would be happy to create a round-robin league so you can post the results. It is no good just setting up the initial rankings this way if you don't allow new entries the same chance, after all.

I do understand the point you are making, but in the context of challenge matches the results will be both accurate and fair.

Numsgil:
About the statistical significance of the leagues, each match is a statistical test with n trials that attempts to test the null hypothesis that two bots are evenly matched.  The number of trials n increases until that null hypothesis is rejected.

A contender is given the title of victor in a league match if it wins n/2 + sqrt(n) rounds, rounded up.  It should be easy to see that eventually this reduces to: whichever bot wins the majority of rounds.  This happens as n approaches infinity (because O(n/2 + sqrt(n)) = O(n)).

Now, I don't know what sort of confidence level this is using.  The thing with statistics is that when you reject the null hypothesis you can never be sure you haven't made an error.  Usually a 95% confidence level is used, which means that you'll be wrong when rejecting the null hypothesis 5% of the time.  So when you run the leagues over, you might get a different result, because there's still that 5% error.  Or 1%, or 0.001%, or whatever the confidence level happens to be.  I'm going over my stats notes now to find out what confidence level we're using.
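In the meantime, the error rate can be computed directly: under the null hypothesis every round is a fair coin flip, so the chance that an evenly matched contender reaches the ceil(n/2 + sqrt(n)) threshold by luck alone is just a binomial tail.  A minimal stand-alone sketch (an illustration only, not the program's actual code):

--- Code: ---
import math

def win_threshold(n: int) -> int:
    """Rounds needed to take a match of n rounds: ceil(n/2 + sqrt(n))."""
    return math.ceil(n / 2 + math.sqrt(n))

def false_positive_rate(n: int) -> float:
    """P(X >= threshold) for X ~ Binomial(n, 0.5): the odds an evenly
    matched contender gets crowned victor purely by luck."""
    k = win_threshold(n)
    return sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n

for n in (10, 20, 50, 100):
    print(n, win_threshold(n), round(false_positive_rate(n), 4))
--- End code ---

Since the standard deviation of wins under the null is sqrt(n)/2, the sqrt(n) margin is a two-sigma cutoff, so the one-sided error rate lands in the 1-3% range depending on n, roughly comparable to the usual 95% confidence level rather than any exact textbook figure.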

The final league standings don't represent which bot is the "best".  It should be easy to see that the only possible ranking in a round-robin tournament is by groups, i.e. 0 losses, 1 loss, etc.  What a ladder represents, rather, is a somewhat arbitrary but *fast* way of ranking contenders.  In general, the relative rankings represent relative strength.  But there are exceptions.  Run any real-world ladder twice and you'll get two different results.

If we want to rank bots based on their absolute strength, we would need to have n^2 matches, where n is the number of bots.  We would need to use a chi-squared test to rank them.

I'll present a well-researched report on probability models to use in the leagues in a few hours.

EricL:
Well said, both of you.

Without getting too far off topic, I would ultimately like to see a real-time 'league' of internet-connected sims where aggregated statistics on the overall populations of each species in the distributed, connected "eco-verse" were available in real time.  Imagine 20 (or 200, or 2000) distributed, teleporter-connected sims, running on each of our machines.  They could come and go, but some subset would always be running (a screen saver would help increase this number).  Each would report local population statistics to a central repository that everyone could access via a web page.

To compete, you simply build (or evolve) your bot, testing it offline in your own sim, and when you are ready, you connect your sim to the eco-verse.  How well your bot does, how high it places, would be a function of the overall population it achieves relative to all the others.  Now, each sim need not be running the same conditions.  There would be niches.  Some bots would not survive long in some local sims; perhaps no single species could be designed to fully dominate every corner of the eco-verse, yet some would achieve higher populations than others.  Combat would be a viable strategy, but so would running and hiding and multiplying quietly.
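Nothing like this exists yet, so purely for illustration: the reporting half could be as simple as each sim periodically posting its local species counts and the repository merging them.  Everything below (the report shape, the sim names, the population numbers) is hypothetical:

--- Code: ---
from collections import defaultdict

def aggregate(reports):
    """Merge per-sim species populations into eco-verse-wide totals."""
    totals = defaultdict(int)
    for report in reports:                    # one report per connected sim
        for species, population in report["populations"].items():
            totals[species] += population
    return dict(totals)

# Two imaginary connected sims reporting their local populations:
reports = [
    {"sim": "sim-A", "populations": {"Kyushu": 120, "D. Scarab": 40}},
    {"sim": "sim-B", "populations": {"Kyushu": 15, "D. Scarab": 300}},
]
print(aggregate(reports))   # {'Kyushu': 135, 'D. Scarab': 340}
--- End code ---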

I'm all for the tradition of 1:1 all-out combat, but the real test would be to survive and compete simultaneously against the vast variety of bots in a vast variety of environmental conditions in the eco-verse.  Ohh, I get goose flesh just thinking about it!

Griz:

--- Quote from: Numsgil ---About the statistical significance of the leagues, each match is a statistical test with n trials that attempts to test the null hypothesis that two bots are evenly matched.  The number of trials n increases until that null hypothesis is rejected.

A contender is given the title of victor in a league match if it wins n/2 + sqrt(n) rounds, rounded up.  It should be easy to see that eventually this reduces to: whichever bot wins the majority of rounds.  This happens as n approaches infinity (because O(n/2 + sqrt(n)) = O(n)).

Now, I don't know what sort of confidence level this is using.  The thing with statistics is that when you reject the null hypothesis you can never be sure you haven't made an error.  Usually a 95% confidence level is used, which means that you'll be wrong when rejecting the null hypothesis 5% of the time.  So when you run the leagues over, you might get a different result, because there's still that 5% error.  Or 1%, or 0.001%, or whatever the confidence level happens to be.  I'm going over my stats notes now to find out what confidence level we're using.

The final league standings don't represent which bot is the "best".  It should be easy to see that the only possible ranking in a round-robin tournament is by groups, i.e. 0 losses, 1 loss, etc.  What a ladder represents, rather, is a somewhat arbitrary but *fast* way of ranking contenders.  In general, the relative rankings represent relative strength.  But there are exceptions.  Run any real-world ladder twice and you'll get two different results.

If we want to rank bots based on their absolute strength, we would need to have n^2 matches, where n is the number of bots.  We would need to use a chi-squared test to rank them.
--- End quote ---
not so ...
you only need n(n+1)/2 ...
which, in the case of 30 bots, is 465.
btw, a league of 10 is only 55, a reasonable number
to run without taking days to do so.

and the chi-squared test wouldn't rank them ...
its purpose is to tell you whether your results fall within a
range that is acceptable, one that you can be confident in.
please tell me exactly what data you would be using to
run this chi-squared test in this case?
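For what it's worth, one plausible reading of the chi-squared suggestion is a goodness-of-fit test on each bot's win total across a full round robin, against the null that all bots are evenly matched.  A hypothetical sketch, with made-up tallies:

--- Code: ---
def chi_squared_stat(wins):
    """Chi-squared goodness-of-fit statistic against 'all bots are even'."""
    total = sum(wins.values())
    expected = total / len(wins)      # evenly matched bots win equally often
    return sum((w - expected) ** 2 / expected for w in wins.values())

wins = {"Kyushu": 25, "D. Scarab": 14, "Bot C": 9, "Bot D": 12}  # invented numbers
print(round(chi_squared_stat(wins), 2))   # 9.73; compare to the table value, df = 3
--- End code ---

Note that the statistic only measures how far the field departs from evenness; any ordering would still have to come from the win totals themselves.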

and once again ...
people, please hear what I am saying ...
I don't care how precise you think you are being in
calculating how many rounds it takes to find a statistically
valid winner of a given match ...
when the 'arbitrary' initial order that you start the bots with ...
upon attempting to establish league standings ...
will have a much greater effect on their ranking than does
all your playing with numbers ...
unless every bot is allowed to go up against every other.
that is all I am saying.

now you can dance around that all day long ...
it won't change a thing.
don't get so caught up in the details that you miss
the bigger picture here.

and don't shoot the messenger ...
just 'cause you don't like the message.

do it however you want ....
but please ...
don't pretend it's statistically valid as it is now.
it isn't.

Griz:

--- Quote from: Jez ---it would be a manually intensive, long-drawn-out way to decide the results.
--- End quote ---

not really ...
regardless of what method you use to
run a league, a 30-bot league is going to take you
one hell of a long time ... IF you ever even get thru
it without a crash somewhere along the way.
[let me know if/when you ever get it completed]  lol

and as I've pointed out ...
for each bot to meet every other ...
it's n(n+1)/2 ... not n^2
so ...
30 bots = 465 matches
20 bots = 210
10 bots = 55
8 bots = 36
and I still think we would be much better off making
4 leagues of 8 bots, 32 all told ...
then allowing the top one or two in each ...
to challenge the league ranked above them.
a new bot could start off challenging the lowest league ...
and be allowed to move up if it is good enough
to 'make the cut'.
compiling the rankings in this way ...
would not only be more accurate ...
but would actually make the project manageable.
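The arithmetic is easy to check.  A small sketch reproducing the totals above, and assuming Griz's four divisions of eight; for reference, distinct pairings alone would be n(n-1)/2 (435 for 30 bots), so these figures count one extra match per bot:

--- Code: ---
def matches(n):
    """Matches for every bot to meet every other: n(n+1)/2 as counted above."""
    return n * (n + 1) // 2

for n in (30, 20, 10, 8):
    print(n, "bots =", matches(n), "matches")   # 465, 210, 55, 36

flat = matches(30)          # one flat 30-bot league: 465 matches
tiered = 4 * matches(8)     # four divisions of eight: 144 matches,
print(flat, tiered)         # plus a handful of promotion challenges
--- End code ---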
 

--- Quote ---I do understand the point you are making, but in the context of challenge matches the results will be both accurate and fair.
--- End quote ---
well ... close perhaps, but I don't see how you can say accurate.
of course one could always take a given bot that has been 'stopped' ...
and put it up against a higher-ranked bot outside of the leagues just
to see if it was a fluke, and if so, present the results and request
a rematch or re-evaluation.

but ... whatever.

Act as you will.
Go on as you feel.
This is the incomparable way.

I've got all those pesky real world things
that better deserve my attention anyway.

good luck.
