AlphaGo Zero and the Foom Debate

AlphaGo Zero uses 4 TPUs, is built entirely out of neural nets with no handcrafted features, doesn't pretrain against expert games or anything else human, reaches a superhuman level after 3 days of self-play, and is the strongest version of AlphaGo yet.

The architecture has been simplified. Previous AlphaGo had a policy net that predicted good plays, and a value net that evaluated positions, both feeding into lookahead using MCTS (random probability-weighted plays out to the end of a game). AlphaGo Zero has one neural net that both selects moves and evaluates positions, and this net is trained by Paul Christiano-style capability amplification, playing out games against itself to learn new probabilities for winning moves.
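To make the amplify-then-distill loop concrete, here is a minimal sketch on a toy game, not AlphaGo Zero's actual method. It assumes a Nim-like game (take 1 to 3 stones, taking the last stone wins), swaps in a lookup table for the neural net, and swaps in exhaustive one-ply lookahead over deterministic self-play playouts for MCTS. All names (`mover_wins`, `amplify`, `train`) are hypothetical.

```python
MAX_TAKE = 3  # assumed toy rule: take 1-3 stones; taking the last stone wins


def legal_moves(stones):
    return range(1, min(MAX_TAKE, stones) + 1)


def mover_wins(stones, policy):
    """Play the game out against itself using `policy` for both sides.

    Returns True if the player to move at `stones` takes the last stone.
    """
    mover_turn = True
    while stones > 0:
        stones -= policy[stones]
        if stones == 0:
            return mover_turn
        mover_turn = not mover_turn
    return False  # no stones left: the player to move has already lost


def amplify(stones, policy):
    """Lookahead on top of the current policy (the 'amplified' player).

    Prefer a move after which the opponent loses the played-out game;
    if no such move exists, keep the current policy's move.
    """
    for move in legal_moves(stones):
        if not mover_wins(stones - move, policy):
            return move
    return policy[stones]


def train(max_stones=21, rounds=21):
    # Naive starting policy: always take one stone.
    policy = {s: 1 for s in range(1, max_stones + 1)}
    for _ in range(rounds):
        # Distillation: the amplified choices become the new policy.
        policy = {s: amplify(s, policy) for s in policy}
    return policy


if __name__ == "__main__":
    learned = train()
    print({s: learned[s] for s in (5, 6, 7, 21)})
```

With no game knowledge beyond the rules, the table converges to the known winning strategy for this toy game: leave the opponent a multiple of four stones. The point of the sketch is only the shape of the loop: search makes the current policy stronger, and the stronger behaviour is then written back into the policy.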

As others have also remarked, this seems to me to be an element of evidence that favors the Yudkowskian position over the Hansonian position in my and Robin Hanson's AI-foom debate.

As I recall and as I understood:

- Hanson doubted that what he calls "architecture" is much of a big deal, compared to (Hanson said) elements like cumulative domain knowledge, or special-purpose components built by specialized companies in what he expects to be an ecology of companies serving an AI economy.

- When I remarked upon how it sure looked to me like humans had an architectural improvement over chimpanzees that counted for a lot, Hanson replied that this seemed to him like a one-time gain from allowing the cultural accumulation of knowledge.

I emphasize how all the mighty human edifice of Go knowledge, the joseki and tactics developed over centuries of play, the experts teaching children from an early age, was entirely discarded by AlphaGo Zero with a subsequent performance improvement. These mighty edifices of human knowledge, as I understand the Hansonian thesis, are supposed to be the bulwark against rapid gains in AI capability across multiple domains at once. I was like "Human intelligence is crap and our accumulated skills are crap" and this appears to have been borne out.

Similarly, single research labs like DeepMind are not supposed to pull far ahead of the general ecology, because adapting AI to any particular domain is supposed to require lots of components developed all over the place by a market ecology that makes those components available to other companies. AlphaGo Zero is much simpler than that. To the extent that nobody else can run out and build AlphaGo Zero, it's either because Google has Tensor Processing Units that aren't generally available, or because DeepMind has a silo of expertise for being able to actually make use of existing ideas like ResNets, or both.

Sheer speed of capability gain should also be highlighted here. Most of my argument for FOOM in the Y-H debate was about self-improvement and what happens when an optimization loop is folded in on itself. Though it wasn't necessary to my argument, the fact that Go play went from "nobody has come close to winning against a professional" to "so strongly superhuman they're not really bothering any more" over two years just because that's what happens when you improve and simplify the architecture, says you don't even need self-improvement to get things that look like FOOM.

Yes, Go is a closed system allowing for self-play. It still took humans centuries to learn how to play it. Perhaps the new Hansonian bulwark against rapid capability gain can be that the environment has lots of empirical bits that are supposed to be very hard to learn, even in the limit of AI thoughts fast enough to blow past centuries of human-style learning in 3 days; and that humans have learned these vital bits over centuries of cultural accumulation of knowledge, even though we know that humans take centuries to do 3 days of AI learning when humans have all the empirical bits they need; and that AIs cannot absorb this knowledge very quickly using "architecture", even though humans learn it from each other using architecture. If so, then let's write down this new world-wrecking assumption (that is, the world ends if the assumption is false) and be on the lookout for further evidence that this assumption might perhaps be wrong.

Tl;dr: As others are already remarking, the situation with AlphaGo Zero looks nothing like the Hansonian hypothesis and a heck of a lot more like the Yudkowskian one.

Added: AlphaGo clearly isn't a general AI. There's obviously stuff humans do that makes us much more general than AlphaGo, and AlphaGo obviously doesn't do that. However, if even with the human special sauce we're to expect AGI capabilities to be slow, domain-specific, and requiring feed-in from a big market ecology, then the situation we see without human-equivalent generality special sauce should not look like this.

To put it another way, I put a lot of emphasis in my debate on recursive self-improvement and the remarkable jump in generality across the change from primate intelligence to human intelligence. It doesn't mean we can't get info about speed of capability gains without self-improvement. It doesn't mean we can't get info about the importance and generality of algorithms without the general intelligence trick. The debate can start to settle for fast capability gains before we even get to what I saw as the good parts; I wouldn't have predicted AlphaGo and lost money betting against the speed of its capability gains, because reality held a more extreme position than I did on the Yudkowsky-Hanson spectrum.


Crossposted here.

14 comments

These might not be the actually important or central questions, but given my background, it's where my brain keeps going...

I need to read up more about the techniques used, but they are clearly very general and basic. Given that, what other games or problems would we expect this system to master, versus ones that it would not master?

Go is a strange mix of a super hard problem and a strangely easy problem.

The game takes two hundred moves over a 19x19 board where large portions of that board might constitute not-obviously-wrong moves at any given time, often with sharp discontinuities in value where a slight adjustment changes groups from alive to dead or vice versa. Thus, using pure ply search and considering all possibilities is a non-starter and you need to have some way of figuring out what moves to consider at all. Doing anything with raw calculation doesn't work.

On the other hand, Go has some things that make it a good target for this algorithm and for ML in general. Go is well-defined and bounded. Go has a limited number of possible moves - yes it's very large, but you can know all of them and see what your ML function thinks of them for at least 1 ply. Go is complete information and deterministic. Go has a well-defined victory condition and no in-between goals (which AlphaGo exploits quite well). Go has no state - if you know the board and number of captured stones you don't care about how you got there, which goes with complete information since you don't need a prior on hidden information.

This means that Go has a lot of nice properties that allow you to use such a simple architecture. What would happen with other games?

This should work on Chess, with almost no modifications, so I'm wondering why they haven't done that, and whether we'll see it in a week. Would it crush the best computer chess programs out there? Wouldn't shock me if it did. Another thing I'd want to see is if it's better in the middle/end game versus the opening and by how much. I'd assume that AlphaChess would be basically perfect in the endgame and super strong in the middle game, whereas the opening is where we'd find out if past knowledge is actually worth a damn at all. Note that even if AlphaChess crushes everything after three days, we might learn interesting things by looking at it when it's only worked for less time than that and isn't yet crushing existing programs (even if that's a pretty short window, you can find it by continuously testing it against Stockfish or whatever between cycles). Similarly, I bet we'd get some interesting results if we saw what partially trained versions of AlphaGo-Zero do well versus poorly in Go, as well.

One of the nice properties of such games is that playing as if you are playing against an expert is basically correct, whereas in games with hidden information that is no longer a safe assumption. Would that prevent the system from being able to train against itself without getting exploited by strategies it had trained itself not to use? Could you get it to play Hold 'em this way without getting exploited?

In some senses I'm super impressed and scared with DeepMind smashing Go this badly, while in other ways it seems cool but not scary.

I wouldn't be so sure it'll work well on chess without modifications. In go, the board is fairly large relative to many important configurations that occur on it. This is a good fit for convolutional NNs, which I believe AlphaGo uses. In chess, the board is tiny. In go, even a crude evaluation of a position is subtle and probably expensive, and much of the time strategic considerations outweigh tactical ones. In chess, a simple material count is really cheap to maintain and tells a substantial fraction of the truth about who's ahead, and tactics commonly dominate. This makes expensive "leaf" evaluation with an NN much more attractive in go than in chess. In go, the raw branching factor is rather large (hundreds of legal moves) but it's "easy" (ha!) to prune it down a lot and be reasonably sure you aren't missing important moves. In chess, the raw branching factor isn't so large but almost any legal move could turn out to be best. For all these reasons, fairly naive searching + fairly simple evaluation is much more effective in chess, and fancy deep-NN evaluation (and move selection) seems less likely to make a good tradeoff. I'm sure AlphaGo-like chess programs are possible and would play reasonably well, but I'd be surprised if they could avoid just getting outsearched by something like Stockfish on "equal" hardware.

(I am not a very strong player of either game, nor have I implemented a program that plays either game well. I could well be wrong. But I think the above is the conventional wisdom, and it seems plausible to me.)
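As a rough illustration of how cheap the "simple material count" mentioned above is, here is a minimal sketch. The piece values (pawn 1, knight and bishop 3, rook 5, queen 9) are a common convention assumed here, not something taken from the comment, and `material_balance` is a hypothetical helper that reads the board field of a FEN string.

```python
# Conventional piece values (an assumption; engines tune these differently).
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9, "k": 0}


def material_balance(fen):
    """Return White's material minus Black's material for a FEN position."""
    board = fen.split()[0]  # first FEN field is the piece placement
    balance = 0
    for ch in board:
        value = PIECE_VALUES.get(ch.lower())
        if value is None:
            continue  # digits (empty squares) and '/' (rank separators)
        # Uppercase letters are White pieces, lowercase are Black.
        balance += value if ch.isupper() else -value
    return balance


if __name__ == "__main__":
    start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
    print(material_balance(start))
```

A single pass over at most 64 squares is all it takes, which is the contrast being drawn with an expensive neural-net evaluation at every leaf.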

I do think Go is more tactical / sensitive to detailed changes than all that, and also that it's not as easy as it looks to narrow the range of plausible moves especially when there's nothing tactical going on, so this working in Go makes me optimistic about Chess. I agree that it's not a slam dunk but I'd certainly like to see it tried. One frustrating thing is that negative results aren't reported, so there might be lots of things that interestingly didn't work and they wouldn't tell us.

Giraffe is the most successful attempt I know of to build a NN-based chess program. It played pretty well but much weaker (on standard PC hardware) than more standard fast-searching rival programs. Of course it's entirely possible that the DeepMind team could do better. Giraffe's NN is a fairly simple and shallow one; I forget what AlphaGo's looks like but I think it's much bigger. (That doesn't mean a bigger one would be better for chess; as I indicated above, I suspect the optimal size of a NN for playing chess may be zero.)

https://arxiv.org/pdf/1712.01815.pdf

Shows AlphaZero (the generalization of AlphaGo Zero) training for a day, and then beating Stockfish at Chess. Also answers the question of why they hadn't applied it to Chess, since they have now done that (and to Shogi).

Yup. Of course it's not playing on "equal" hardware -- their TPUs pack a big punch -- but for sure this latest work is impressive. (They're beating Stockfish while searching something like 1000x fewer positions per second.)

Let's see. AlphaZero was using four TPUs, each of which does about 180 teraflops. Stockfish was using 64 threads, and let's suppose each of those gets about 2 GIPS. Naively equating flops to CPU operations, that says AZ had ~5000x more processing power. My hazy memory is that the last time I looked, which was several years ago, a 2x increase in speed bought you about 50 Elo rating points; I bet there are some diminishing returns so let's suppose the figure is 20 points per 2x speed increase at the Stockfish/AZ level. With all these simplistic assumptions, throwing AZ's processing power at Stockfish by more conventional means might buy you about 240 points of Elo rating, corresponding to a win probability (taking a draw as half a win) of about 0.8. Which is a little better than AZ is reported as getting against Stockfish.

Tentative conclusion: AZ is an impressive piece of work, especially as it learns entirely from self-play, but it's not clear that it's doing any better against Stockfish than the same amount of hardware would if dedicated to running Stockfish faster; so the latest evidence is still (just about) consistent with the hypothesis that the optimal size of a neural network for computer chess is zero.
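The arithmetic behind that estimate can be reproduced with a short sketch. Every constant is one of the rough guesses stated in the comment above (180 teraflops per TPU, 2 GIPS per thread, 20 Elo per doubling of speed), and the expected score uses the standard logistic Elo formula; the function names are hypothetical.

```python
import math

# Rough figures quoted in the comment above; all of them are guesses.
TPU_COUNT = 4
FLOPS_PER_TPU = 180e12        # ~180 teraflops per TPU
STOCKFISH_THREADS = 64
OPS_PER_THREAD = 2e9          # ~2 GIPS per thread
ELO_PER_DOUBLING = 20         # assumed gain per 2x speed increase


def compute_ratio():
    """Naive ratio of AZ's raw operations per second to Stockfish's."""
    return (TPU_COUNT * FLOPS_PER_TPU) / (STOCKFISH_THREADS * OPS_PER_THREAD)


def elo_gain(ratio, per_doubling=ELO_PER_DOUBLING):
    """Elo gained from a speedup of `ratio`, at a fixed gain per doubling."""
    return per_doubling * math.log2(ratio)


def expected_score(elo_diff):
    """Standard Elo expected score, counting a draw as half a win."""
    return 1.0 / (1.0 + 10 ** (-elo_diff / 400))


if __name__ == "__main__":
    ratio = compute_ratio()    # a few thousand
    gain = elo_gain(ratio)     # roughly 250 Elo
    print(f"ratio ~{ratio:.0f}x, gain ~{gain:.0f} Elo, "
          f"expected score ~{expected_score(gain):.2f}")
```

Changing `ELO_PER_DOUBLING` shows how sensitive the conclusion is to the diminishing-returns guess: at 50 points per doubling the hypothetical faster Stockfish would be far stronger, and at 10 points it would fall short of AZ's reported result.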

I think it is interesting to look at the information efficiency. So we can ask how much better it is at absorbing information compared to human civilization. So if we gave it and humanity a new problem, how many trials/tests would we expect it to need compared to humans?

I'm guessing serious go players (who may have contributed to human knowledge) play an average of 2 games per day over a 60-year lifespan, which gives about 44k games per person over their lifetime. There are around 60 million go players today. I'm assuming 120 million over the lifespan of humanity, because of the rapid growth in our population. This gives roughly 5.25 x 10^12 games.


AlphaGo Zero played 5 million games against itself. So it is roughly 1 million times more efficient than historical civilization (at learning Go).

These numbers are very much a back-of-the-envelope kind of thing. If anyone knows more about the likely number of games an average person plays I would be interested.
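The estimate above, written out as a sketch so the inputs are easy to vary. Every constant is one of the guesses stated in the comment, not measured data, and the variable names are hypothetical.

```python
# Guesses taken from the comment above; none of these are measured figures.
GAMES_PER_DAY = 2
PLAYING_YEARS = 60
PLAYERS_EVER = 120e6          # assumed total serious players in history
SELF_PLAY_GAMES = 5e6         # games AlphaGo Zero played against itself

games_per_person = GAMES_PER_DAY * 365 * PLAYING_YEARS   # about 44k
human_games = games_per_person * PLAYERS_EVER            # about 5.25e12
efficiency_ratio = human_games / SELF_PLAY_GAMES         # about one million

if __name__ == "__main__":
    print(f"games per person: {games_per_person}")
    print(f"human games in total: {human_games:.3g}")
    print(f"human games per self-play game: {efficiency_ratio:.3g}")
```

Halving the games per day or the number of historical players only halves the ratio, so the "roughly a million" figure is robust to within an order of magnitude under these assumptions.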

Caveats:

If we were both starting to learn a game from this point in history, we would expect humans to be more efficient at information extraction than we were historically, since historically communication between people was bad and knowledge might have had to be re-learnt many times.

Go might be an unusually bad game for transferring knowledge gained between people. We could look at people learning go and see how much they improved by playing the game vs by reading strategies/openings. We could look at other important tasks (like physics and programming) and see how much people improved by doing the activity vs reading about the activity.

Edit: So we should be really worried if a new variant of AlphaGo comes out that only needs 44k games to get as good as it did.

Given my Go knowledge I'm not sure that the number of games matters. Part of what playing go is about is reading the variation tree. I don't think there's significantly more learning by playing 10 games in four hours than by playing one game in the same timeframe.

You make an assumption that the only activities that humans use when learning go are either playing go or reading about playing go. I don't think that's correct. A lot of what training to be a high level go player is about is solving go problems. I remember reading that some professional players have solved all published go problems.

Knowledge also gets passed from player to player by discussing games together which is an activity that also includes reading out positions.

I haven't played that much Go but this is very much correct for every other board or card game I know. Number of games doesn't matter much for making a human good, and number of games across humans means basically nothing. What matters, once you've played enough to get 'a handle' on things and give you enough food for further thought, is analysis, work, deliberate practice.

Thanks for the context, I've not played any significant go myself.

Would I be right to lump both solving go problems and discussing games in the knowledge transfer portion of the equation? Solving go problems is a form of knowledge transfer, I assume, because experienced go players think they will be salient for solving problems in go generally.

Okay so I think we can classify two types of behaviour, directly playing the game and other knowledge transfer (e.g. doing puzzles that people think are salient to the game, reading about strategies or discussing a particular interesting game; all these are done because other people good at the game thought they might be useful for other people to do).

The exact number of games might not be the important part, but I imagine experience is somewhat proportional to playing games. I'd be surprised if you could play no games and just do the knowledge transfer stuff, or only play one game and spend a long time thinking about it. I'm assuming different games will teach you (and humanity if you codify/spread your knowledge) about different parts of the game space.


AlphaGo Zero got better than humanity just by doing the first. I think it would help our predictions about foom if we could look to see how much direct interaction is important vs knowledge transfer in humans for different skills.

I've become substantially more confident in the past two years about Robin's position on the differences between humans and chimps.

Henrich's book The Secret Of Our Success presents a model in which social learning is the key advantage that caused humans to diverge from other apes. This model is more effective than any other model I've seen at explaining when human intelligence evolved.

To summarize chapter 2: Three year old humans score slightly worse than three year old chimpanzees on most subsets of IQ-like tests. "Social learning" is the only category where the humans outperform chimpanzees. Adult humans outperform chimpanzees on many but not all categories of IQ-like tests.

If humans have an important improvement in general intelligence over chimpanzees, why are its effects hard to observe at age 3?

Promoted to Featured for incorporating new and relevant evidence into the Foom Debate.

(Obviously I meant if you are not Chinese and have not been in China.)

"However, if even with the human special sauce we're to expect AGI capabilities to be slow, domain-specific, and requiring feed-in from a big market ecology, then the situation we see without human-equivalent generality special sauce should not look like this."

Actually, most of what humans know is from other humans. This is not unrelated to generality: you cannot deduce facts about China from your experiences if you exclude what you have heard and read from other humans. Any general AI will be in the same situation. Most of what they know will have to come from the testimony of human beings. This is unlike the situation with Go, where everything is already available by considering the board and the moves.