Epistemic Status: True story
The plan is that this post will begin the sequence Zbybpu’f Nezl.
In college I once took a class called Rational Choice. Because obviously.
Each week we got the rules for, played and discussed a game. It was awesome.
For the grand finale, and to determine the winner of the prestigious Golden Shark Award for best overall performance, we submitted computer program architectures (we’d tell the professor what our program did, within reason, and he’d code it for us) to play The Darwin Game.
The Darwin Game is a variation slash extension of the iterated prisoner’s dilemma. It works like this:
For the first round, each player gets 100 copies of their program in the pool, and the pool pairs those programs at random. You can and often will play against yourself.
Each pair now plays an iterated prisoner’s dilemma variation, as follows. Each turn, each player simultaneously submits a number from 0 to 5. If the two numbers add up to 5 or less, both players earn points equal to their number. If the two numbers add up to 6 or more, neither player gets points. This game then lasts for a large but unknown number of turns, so no one knows when the game is about to end; for us this turned out to be 102 turns.
Each pairing is independent of every other pairing. You do not know what round of the game it is, whether you are facing a copy of yourself, or any history of the game to this point. Your decision algorithm does the same thing each pairing.
At the end of the round, all of the points scored by all of your copies are combined. Your percentage of all the points scored by all programs becomes the percentage of the pool your program gets in the next round. So if you score 10% more points, you get 10% more copies next round, and over time successful programs will displace less successful programs. Hence the name, The Darwin Game.
Your goal is to have as many copies in the pool at the end of the 200th round as possible, or failing that, to survive as many rounds as possible with at least one copy.
If both players coordinate to split the pot, they will score 2.5 per round.
To create some common terminology for discussions, ‘attack’ or means to submit 3 (or a higher number) more than half the time against an opponent willing to not do that, and to ‘fold’ or ‘surrender’ is to submit 2 (or a lower number) more than half the time, with ‘full surrender’ being to always submit 2. To ‘cooperate’ is to alternate 2 and 3 such that you each score 2.5 per round.
In this particular example we expected and got about 30 entries, and I was in second place in the points standings, so to win The Golden Shark, I had to beat David by a substantial amount and not lose horribly to the students in third or fourth.
What program do you submit?
(I recommend actually taking some time to think about this before you proceed.)
Some basic considerations I thought about:
1. The late game can come down to very small advantages that compound over time.
2. You need to survive the early game and win the late game. This means you need to succeed in a pool of mostly-not-smart programs, and then win in a pool of smart programs, and then in a pool of smart programs that outlasted other smart programs.
3. Scoring the maximum now regardless of what your opponent scores helps you early, but kills you late. In the late game, not letting your opponent score more points than you is very important, especially once you are down to two or three programs.
4. In the late game, how efficiently you cooperate with yourself is very important.
5. Your reaction to different programs in the mid game will help determine your opponents in the end game. If an opponent that outscores you in a pairing survives into the late game, and co-operates with itself, you lose.
6. It is all right to surrender, even fully surrender, to an opponent if and only if they will be wiped out by other programs before you become too big a portion of the pool, provided you can’t do better.
7. It is much more important to get to a good steady state than to get there quickly, although see point one. Getting people to surrender to you would be big game.
8. Some of the people entering care way more than others. Some programs will be complex and consider many cases and be trying hard to win, others will be very simple and not trying to be optimal.
9. It is hard to tell what others will interpret as cooperation and defection, and it might be easy to accidentally make them think you’re attacking them.
10. There will be some deeply silly programs out there at the start. One cannot assume early programs are doing remotely sensible things.
That leaves out many other considerations, including at least one central one. Next time, I’ll go over what happened on game day.
Starting with a simpler problem, let's say that I want to build a bot that cooperates with itself as well as possible and also tries to distribute points as evenly as possible between copies of itself. This is the best that I've come up with:
Always play either 2 (low) or 3 (high).
On the first turn, play 2 with probability p and 3 with probability 1-p.
If we got 5 points last turn, then play what my opponent played last turn. (This makes mutual cooperation continue forever, while alternating high-low.)
If we got less than 5 points last turn, then play 2 with probability q and 3 with probability 1-q.
If we got went over 5 last turn, then play 2 with probability r and 3 with probability 1-r.
Let's call this fairbot(p,q,r).
If my calculations are correct, expected score in self-play is maximized with p=q=r=0.69 (to the nearest .01). On average this brings in a combined 2.24 total points less than the fair maximum (5 per turn) for the two bots, because it might take a few turns to coordinate on which copy plays 3 vs. 2. (The quickest coordination happens with p=q=r=0.5, but that would miss out on 3 points because half the coordination failures cost the whole pot; with more weight on 2 the coordination failure happens more often but usually only shrinks the pot by 1 point.)
Starting to think about how fairbot plays against other bots:
Fairbot never settles into a stable pattern against "always 3" or "always 2"; it continues to switch between 2 and 3 (and plays the same thing as its opponent more often than not). Presumably it would be better it stick with 3s against "always 2" and to either stick with 2s or stick with 3s against "always 3" (depending on whether I prioritize relative or absolute score).
If I knew that fairbot(0.69,0.69,0.69) was my opponent, I think I'd want start by playing 3s, and then settle into mutual alternating coordination once we got a pot of 5. I would do slightly better than this bot does against me or against itself. I could get a much larger relative score by playing always 3, but it would hurt my absolute score by a lot. Or possibly I could do something in between.
An obvious way to make fairbot(p,q,r) more aggressive / less exploitable is to reduce p, q, or especially r.
We can add other rules which cover situations that never arise in self-play, such as when the opponent plays something other than 2 or 3, or when 5-point mutual cooperation gets broken (a non-5 pot after the previous turn had a pot of 5).
I think I found a way to do slightly better than .69 for 2 and .31 for 3 when cooperating with yourself:
Play 1 with probability .12, 2 with .60, and 3 with .28. If you played the same number, repeat. If you played different numbers, the higher one can play a 3 and the lower one can play a 2 (or whichever cooperating pattern you choose), and they can start alternating from there.
If my calculations are correct, this only loses 2.10 points on average from the maximum possible, which seems to be the best you can do. The goal of the randomization is to pick a ... (read more)