How big a deal is this? What, if anything, does it signal about when we get smarter-than-human AI?

[Link] AlphaGo: Mastering the ancient game of Go with Machine Learning

DeepMind's Go AI, called AlphaGo, has beaten the European champion with a score of 5-0. A match against the top-ranked human, Lee Se-dol, is scheduled for March.

 

Games are a great testing ground for developing smarter, more flexible algorithms that have the ability to tackle problems in ways similar to humans. Creating programs that are able to play games better than the best humans has a long history

[...]

But one game has thwarted A.I. research thus far: the ancient game of Go.


http://googleresearch.blogspot.com/2016/01/alphago-mastering-ancient-game-of-go.html


How big a deal is this? What, if anything, does it signal about when we get smarter-than-human AI?

It's a big deal for Go, but I don't think it's a very big deal for AGI.

Conceptually, Go is like Chess or Checkers: they are all fully deterministic, perfect-information, two-player games.

Go is more challenging for computers because the search space (and in particular the average branching factor) is larger and known position evaluation heuristics are not as good, so traditional alpha-beta minimax search becomes infeasible.

The first big innovation, already in use by most Go programs for a decade (although the idea is older), was Monte Carlo tree search, which addresses the high-branching-factor issue: while traditional search either does not expand a node or expands it and recursively evaluates all its children, MCTS evaluates nodes stochastically, with a probability that depends on how promising they look according to some heuristic.
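For concreteness, here is a minimal sketch of plain MCTS with UCT selection. The `state` interface (`legal_moves`, `play`, `is_terminal`, `result`) and all names are illustrative rather than taken from any real Go engine, and sign handling for alternating players is omitted:

```python
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = []
        self.untried = list(state.legal_moves())  # moves not yet expanded
        self.visits = 0
        self.value_sum = 0.0

def uct(child, parent_visits, c=1.4):
    # Unvisited children are tried first; otherwise trade off mean value
    # (exploitation) against a visit-count bonus (exploration).
    if child.visits == 0:
        return float("inf")
    return (child.value_sum / child.visits
            + c * math.sqrt(math.log(parent_visits) / child.visits))

def mcts(root, n_simulations):
    for _ in range(n_simulations):
        node = root
        # 1. Selection: descend while the node is fully expanded.
        while not node.untried and node.children:
            node = max(node.children, key=lambda ch: uct(ch, node.visits))
        # 2. Expansion: add one child for an untried move.
        if node.untried:
            move = node.untried.pop()
            child = Node(node.state.play(move), parent=node)
            node.children.append(child)
            node = child
        # 3. Simulation: random playout to the end of the game.
        state = node.state
        while not state.is_terminal():
            state = state.play(random.choice(list(state.legal_moves())))
        reward = state.result()
        # 4. Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.value_sum += reward
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits)  # most robust move
```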

DeepMind's innovation consists of using a neural network to learn a good position-evaluation heuristic: first training it in a supervised fashion on a large database of professional games, then refining it with reinforcement learning in "greedy" self-play mode, and finally using both the refined heuristic and the supervised heuristic inside an MCTS engine.
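Schematically, the pipeline described above looks something like the sketch below; every function here is an illustrative placeholder of mine, not DeepMind's code:

```python
# Hypothetical skeleton of the three training stages described above.
# These are trivially runnable stubs, not DeepMind's API.

def fit_supervised(pro_positions):
    """Stage 1: learn to predict the expert's move from the position
    (supervised learning on a database of professional games)."""
    return {"stage": "supervised policy", "data": pro_positions}

def refine_by_self_play(policy):
    """Stage 2: refine the policy with reinforcement learning by
    playing it against (earlier versions of) itself."""
    return {"stage": "RL policy", "init": policy}

def fit_value(rl_policy):
    """Stage 3: train a position-evaluation network on the outcomes
    of self-play games generated by the refined policy."""
    return {"stage": "value net", "from": rl_policy}

supervised_policy = fit_supervised("KGS positions")
rl_policy = refine_by_self_play(supervised_policy)
value_net = fit_value(rl_policy)
# At play time, the supervised policy proposes moves and the learned
# evaluation scores positions inside an MCTS engine.
```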

Their approach essentially relies on big data and big hardware. From an engineering point of view, it is a major advance in neural network technology because of the sheer scale, and in particular the speed, of the system, which required significant, non-trivial parallelization. But the core techniques aren't particularly new, and I doubt that they can scale well to more general domains with non-determinism and partial observability. However, neural networks may be more robust to noise and certain kinds of disturbances than hand-coded heuristics, so take this with a grain of salt.

So, to the extent that AGI will rely on large and fast neural networks, this work is a significant step towards practical AGI engineering, but to the extent that AGI will rely on some "master algorithm", this work is probably not a very big step towards the discovery of such an algorithm, at least compared to previously known techniques.

I think it is a bigger deal than chess because it doesn't rely on brute force but mostly on learning. It is not the breakthrough in AGI, but it is telling that this approach thoroughly beats all the other Go programs (it lost only 1 of 500 games, even at a four-stone handicap). And they say that it still improves with training.

Thanks. Key quote:

What this indicates is not that deep learning in particular is going to be the Game Over algorithm. Rather, the background variables are looking more like "Human neural intelligence is not that complicated and current algorithms are touching on keystone, foundational aspects of it." What's alarming is not this particular breakthrough, but what it implies about the general background settings of the computational universe.

His argument proves too much.

You could easily transpose it to the time when Checkers or Chess programs beat professional players: back then the "keystone, foundational aspect" of intelligence was thought to be the ability to do combinatorial search in large solution spaces, and scaling up to AGI was "just" a matter of engineering better heuristics. Sure, it hadn't worked on Go yet, but Go players were not using a different cortical algorithm than Chess players, were they?

Or you could transpose it to the time when MCTS Go programs reached "dan" (advanced amateur) level. They still couldn't beat professional players, but professional players were not using a different cortical algorithm than advanced amateur players, were they?

AlphaGo succeeded at the current achievement by using artificial neural networks in a regime where they are known to do well. But this regime, and games like Go, Chess, Checkers, Othello, etc., represent a small part of the range of human cognitive tasks. In fact, we probably find these kinds of board games fascinating precisely because they are very different from the usual cognitive stimuli we deal with in everyday life.

It's tempting to assume that the "keystone, foundational aspect" of intelligence is learning essentially the same way that artificial neural networks learn. But humans can do things like "one-shot" learning, learning from weak supervision, learning in non-stationary environments, etc., which no current neural network can do, and not just because of scale or architectural "details". Researchers generally don't know how to make neural networks, or really any other kind of machine learning algorithm, do these things, except with massive task-specific engineering. Thus I think it's fair to say that we still don't know what the foundational aspects of intelligence are.

In the brain, the same circuitry that is used to solve vision is used to solve most of the rest of cognition - vision is 10% of the cortex. Going from superhuman vision to superhuman Go suggests superhuman anything/everything is getting near.

The reason being that strong Go requires both deep slow inference over huge data/time (which DL excels in, similar to what the cortex/cerebellum specialize in), combined with fast/low data inference (the MCTS part here). There is still much room for improvement in generalizing beyond current MCTS techniques, and better integration into larger scale ANNs, but that is increasingly looking straightforward.

It's tempting to assume that the "keystone, foundational aspect" of intelligence is learning essentially the same way that artificial neural networks learn.

Yes, but only because "ANN" is enormously broad (tensor/linear algebra program space), and basically includes all possible routes to AGI (all possible approximations of bayesian inference).

But humans can do things like "one-shot" learning, learning from weak supervision, learning in non-stationary environments, etc., which no current neural network can do, and not just because of scale or architectural "details".

Bayesian methods excel at one shot learning, and are steadily integrating themselves into ANN techniques (providing the foundation needed to derive new learning and inference rules). Progress in transfer and semi-supervised learning is also progressing rapidly and the theory is all there. I don't know about non-stationary as much, but I'd be pretty surprised if there wasn't progress there as well.

Thus I think it's fair to say that we still don't know what the foundational aspects of intelligence are.

LOL. Generalized DL + MCTS is - rather obviously - a practical approximation of universal intelligence like AIXI. I doubt MCTS scales to all domains well enough, but the obvious next step is for DL to eat MCTS techniques (so that new, more complex heuristic-search techniques can be learned automatically).

In the brain, the same circuitry that is used to solve vision is used to solve most of the rest of cognition

And in a laptop the same circuitry that is used to run a spreadsheet is used to play a video game.

Systems that are Turing-complete (in the limit of infinite resources) tend to have an independence between hardware and possibly many layers of software (a program running on a VM running on a VM running on a VM, and so on). Things that look similar at some levels may differ greatly at other levels, and thus things that look simple at some levels can have lots of hidden complexity at other levels.

Going from superhuman vision

Human-level (perhaps weakly superhuman) vision is achieved only in very specific tasks where large supervised datasets are available. This is not very surprising, since even traditional "hand-coded" computer vision could achieve superhuman performances in some narrow and clearly specified tasks.

Yes, but only because "ANN" is enormously broad (tensor/linear algebra program space), and basically includes all possible routes to AGI (all possible approximations of bayesian inference).

Again, ANNs are Turing-complete, therefore in principle they include literally everything, but so does brute-force search over C programs.

In practice, if you try to generate C programs by brute-force search you get stuck pretty fast, while ANNs with gradient-descent training empirically work well on various kinds of practical problems, but not on all the kinds of practical problems that humans are good at, and how to make them work on those problems, if it is even efficiently possible, is a whole open research field.

Bayesian methods excel at one shot learning

With lots of task-specific engineering.

Generalized DL + MCTS is - rather obviously - a practical approximation of universal intelligence like AIXI.

So are things like AIXI-tl, Hutter-search, Gödel machine, and so on. Yet I would not consider any of them as the "foundational aspect" of intelligence.

And in a laptop the same circuitry that is used to run a spreadsheet is used to play a video game.

Exactly, and this is a good analogy to illustrate my point. Discovering that the cortical circuitry is universal vs. task-specific (like an ASIC) was a key discovery.

Human-level (perhaps weakly superhuman) vision is achieved only in very specific tasks where large supervised datasets are available.

Note I didn't say that we have solved vision to a superhuman level. But the quoted claim is simply not true: current SOTA nets can achieve human-level performance in at least some domains using modest amounts of unsupervised data combined with small amounts of supervised data.

Human vision builds on enormous amounts of unsupervised data - much larger than ImageNet. Learning in the brain is complex and multi-objective, but perhaps best described as self-supervised (unsupervised meta-learning of sub-objective functions which then can be used for supervised learning).

A five-year-old will have experienced perhaps 50 million seconds worth of video data. ImageNet consists of 1 million images, which is vaguely equivalent to 1 million seconds of video if we include a 30x amplification for small translations/rotations.
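A quick back-of-envelope check of those figures; the viewing-hours and frame-rate assumptions here are mine:

```python
# Assumptions (mine, not the comment's): ~8 hours of attentive viewing
# per day, and video at 30 frames per second.
human_secs = 5 * 365 * 8 * 3600      # ~52.6 million seconds by age five
imagenet_frames = 1_000_000 * 30     # 1M images x 30 jittered copies
video_equiv = imagenet_frames / 30   # ~1 million "seconds" at 30 fps
print(human_secs, imagenet_frames, video_equiv)
```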

The brain's vision system is about 100x larger than current 'large' vision ANNs. But if DeepMind decided to spend the cash on that and made it a huge one-off research priority, do you really doubt that they could build a superhuman general vision system that learns with a similar dataset and training duration?

So are things like AIXI-tl, Hutter-search, Gödel machine, and so on. Yet I would not consider any of them as the "foundational aspect" of intelligence.

The foundation of intelligence is just inference - simply because universal inference is sufficient to solve any other problem. AIXI is already simple, but you can make it even simpler by replacing the planning component with inference over high EV actions, or even just inference over program space to learn approx planning.

So it all boils down to efficient inference. The new exciting progress in DL - for me at least - is in understanding how successful empirical optimization techniques can be derived as approx inference update schemes with various types of priors. This is what I referred to as new and upcoming "Bayesian methods" - bayesian grounded DL.

There are other big deals. The MS ImageNet win also contained frightening progress on the training meta level.

The other issue is that constructing this kind of mega-neural net is tremendously difficult. Landing on a particular set of algorithms—determining how each layer should operate and how it should talk to the next layer—is an almost epic task. But Microsoft has a trick here, too. It has designed a computing system that can help build these networks.

As Jian Sun explains it, researchers can identify a promising arrangement for massive neural networks, and then the system can cycle through a range of similar possibilities until it settles on the best one. “In most cases, after a number of tries, the researchers learn [something], reflect, and make a new decision on the next try,” he says. “You can view this as ‘human-assisted search.’”

-- extracted from very readable summary at wired: http://www.wired.com/2016/01/microsoft-neural-net-shows-deep-learning-can-get-way-deeper/

Going by that description, it is much, much less important than residual learning, because hyperparameter optimization is not new. There are a lot of approaches: grid search, random search, Gaussian processes. Some hyperparameter optimization baked into MSR's deep learning framework would save some researcher time and effort, certainly, but I don't know that it would have made any big difference unless they have something quite unusual going on.
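(For reference, the random-search baseline is only a few lines; `train_and_evaluate` below is a stand-in of mine for an expensive training run that returns a validation score, and the search space is made up for illustration:)

```python
import random

# Each entry samples one hyperparameter; ranges are illustrative only.
space = {
    "lr": lambda: 10 ** random.uniform(-5, -1),       # log-uniform learning rate
    "layers": lambda: random.randint(10, 150),        # network depth
    "batch": lambda: random.choice([32, 64, 128, 256]),
}

def random_search(train_and_evaluate, n_trials=20):
    """Try n_trials random configs, keep the best validation score."""
    best, best_score = None, float("-inf")
    for _ in range(n_trials):
        config = {name: sample() for name, sample in space.items()}
        score = train_and_evaluate(config)
        if score > best_score:
            best, best_score = config, score
    return best, best_score
```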

(I liked one paper which took a Bayesian multi-armed-bandit approach and treated error curves as partial information about final performance; it would switch between different networks being trained based on performance, regularly 'freezing' and 'thawing' networks as the probability that each network would become the best performer changed with information from additional mini-batches/epochs.) Probably the single coolest one: last year some researchers showed that it is possible to somewhat efficiently backpropagate on hyperparameters! So hyperparameters just become more parameters to learn, and you can load up on all sorts of stuff without worrying about it making your hyperparameter optimization futile or having to train a billion times. This would both save people a lot of time (for using vanilla networks) and allow exploring extremely complicated and heavily parameterized families of architectures, which would be a big deal. Unfortunately, it's still not efficient enough for the giant networks we want to train. :(

How big a deal is this? What, if anything, does it signal about when we get smarter-than-human AI?

It shows that Monte Carlo tree search meshes remarkably well with neural-network-driven evaluation ("value networks") and decision pruning/policy selection ("policy networks"). This means that if you have a planning task to which MCTS can be usefully applied, sufficient data to train networks for state evaluation and policy selection, and substantial computational power (a distributed cluster, in AlphaGo's case), you can significantly improve performance on your task (from "strong amateur" to "human champion" level). It's not an AGI-complete result, however, any more than Deep Blue or TD-Gammon were.
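Concretely, the networks enter the tree search through the selection rule. Here is a minimal sketch of an AlphaGo-style (PUCT-like) selection step, assuming each child node carries `prior` (set by the policy network at expansion time), `visits`, and `value_sum` (fed by the value network and/or rollouts); this is my reconstruction, not code from the paper:

```python
import math

def puct_score(child, parent_visits, c_puct=1.0):
    # Mean value so far (value-network/rollout estimate) ...
    q = child.value_sum / child.visits if child.visits else 0.0
    # ... plus a bonus that is large for moves the policy network
    # favors (high prior) and decays as the move gets explored.
    u = c_puct * child.prior * math.sqrt(parent_visits) / (1 + child.visits)
    return q + u

def select_child(node):
    return max(node.children, key=lambda ch: puct_score(ch, node.visits))
```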

The "training data" factor is a biggie; we lack this kind of data entirely for things like automated theorem proving, which would otherwise be quite amenable to this 'planning search + complex learned heuristics' approach. In particular, writing provably-correct computer code is a minor variation on automated theorem proving. (Neural networks can already write incorrect code, but this is not good enough if you want a provably Friendly AGI.)

Humans need extensive training to become competent, as will AGI, and this should have been obvious to anyone with a good understanding of ML.

This is a big deal, and it is another sign that AGI is near.

Intelligence boils down to inference. Go is an interesting case because good play for both humans and bots like AlphaGo requires two specialized types of inference operating over very different timescales:

  • rapid combinatoric inference over move sequences during a game (planning). AlphaGo uses MCTS for this, whereas the human brain uses a complex network of modules involving the basal ganglia, hippocampus, and PFC.
  • slow, deep inference over a huge amount of experience to develop strong pattern recognition and intuitions (deep learning). AlphaGo uses deep supervised and reinforcement learning via SGD over a CNN for this. The human brain uses the cortex.

Machines have been strong in planning/search-style inference for a while. It is only recently that the slower learning component (2nd-order inference over circuit/program structure) has started to approach and surpass human level.

Critics like to point out that DL requires tons of data, but so does the human brain. A more accurate comparison requires quantifying the dataset human pro go players train on.

A 30-year-old Asian pro will have perhaps 40,000 hours of playing experience (20 years × 50 weeks/year × 40 hours/week). The average game duration is perhaps an hour and consists of 200 moves. In addition, pros (and even fans) study published games. Reading a game takes less time, perhaps as little as 5 minutes or so.

So we can estimate very roughly that a top pro will have absorbed between 100,000 and 1 million games, and between 20 and 200 million individual positions (around 200 moves per game).

AlphaGo was trained on the KGS dataset: 160,000 games and 29 million positions. So it did not train on significantly more data than a human pro. The data quantities are actually very similar.
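The estimate, spelled out (every input is a rough guess):

```python
# Rough reconstruction of the estimate above; all inputs are guesses.
play_hours = 20 * 50 * 40        # 20 years x 50 weeks x 40 hours = 40,000
games_played = play_hours        # at roughly one hour per game
for games in (100_000, 1_000_000):   # played + studied, low/high bound
    print(games, "games ->", games * 200, "positions")  # 20M to 200M
# Compare AlphaGo's KGS training set: 160,000 games, ~29M positions.
```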

Furthermore, the human pro's dataset is perhaps of higher quality, as it will consist mainly of pro-level games, whereas the AlphaGo dataset is mostly amateur level.

The main difference is speed. The human brain's 'clock rate' or equivalent is about 100 Hz, whereas AlphaGo's various CNNs can run at roughly 1,000 Hz during training on a single machine, and at perhaps a 10,000 Hz equivalent distributed across hundreds of machines. 40,000 hours, a lifetime of experience, can be compressed 100x or more into just a couple of weeks for a machine. This is the key lesson here.
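In numbers, using the comment's rough rates:

```python
human_hours = 40_000                   # a pro's lifetime playing experience
speedup = 10_000 / 100                 # ~10,000 Hz cluster vs ~100 Hz brain
machine_days = human_hours / speedup / 24
print(machine_days)                    # ~16.7 days: "a couple of weeks"
```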

The classification CNN trained on KGS was run for 340 million steps, which is about 10 iterations per unique position in the database.

The ANNs that AlphaGo uses are much, much smaller than a human brain, but the brain has to do a huge number of other tasks, and also has to solve complex vision and motor problems just to play the game. AlphaGo's ANNs get to focus purely on Go.

A few hundred TitanXs can muster up perhaps a petaflop of compute. The high-end estimate of the brain is 10 petaflops (100 trillion synapses × 100 Hz max firing rate). The more realistic estimate is 100 teraflops (100 trillion synapses × 1 Hz average firing rate), and the lower end is a tenth of that or less.
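Those three estimates, spelled out (6.1 teraflops is the TitanX's peak single-precision throughput, per the exchange further down):

```python
synapses = 100e12                # 10^14 synapses
high = synapses * 100            # 100 Hz max rate -> 1e16 = 10 petaflops
realistic = synapses * 1         # 1 Hz average    -> 1e14 = 100 teraflops
low = realistic / 10             # -> 1e13 = 10 teraflops
gpu_cluster = 200 * 6.1e12       # ~200 TitanX-class cards -> ~1.2 petaflops
```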

So why is this a big deal? Because it suggests that training a DL AI to master other economically key tasks, such as becoming an expert-level programmer, could be much closer than people think.

The techniques used here are nowhere near their optimal form yet in terms of efficiency. When Deep Blue beat Kasparov in 1997, it required a specialized supercomputer and a huge team. Ten years later, chess bots written by individual programmers and running on modest PCs soared past Deep Blue, thanks to more efficient algorithms and implementations.

A 30-year-old Asian pro will have perhaps 40,000 hours of playing experience (20 years × 50 weeks/year × 40 hours/week). The average game duration is perhaps an hour and consists of 200 moves. In addition, pros (and even fans) study published games. Reading a game takes less time, perhaps as little as 5 minutes or so.

So we can estimate very roughly that a top pro will have absorbed between 100,000 and 1 million games, and between 20 and 200 million individual positions (around 200 moves per game).

I asked a pro player I know whether these numbers sounded reasonable. He replied:

At least the order of magnitude should be more or less right. Hours of playing weekly is probably somewhat lower on average (say 20-30 hours), and I'd also use 10-15 minutes to read a game instead of five. Just 300 seconds to place 200 stones sounds pretty tough. Still, I'd imagine that a 30-year-old professional has seen at least 50 000 games, and possibly many more.

Critics like to point out that DL requires tons of data, but so does the human brain.

Both deep networks and the human brain require lots of data, but the kind of data they require is not the same. Humans engage mostly in semi-supervised learning, where supervised data comprises a small fraction of the total. They also manage feats of "one-shot learning" (making critically important generalizations from single datapoints) that are simply not feasible for neural networks, or indeed other 'machine learning' methods.

A few hundred TitanX's can muster up perhaps a petaflop of compute.

Could you elaborate? I think this number is too high by roughly one order of magnitude.

The high-end estimate of the brain is 10 petaflops (100 trillion synapses × 100 Hz max firing rate).

Estimating the computational capability of the human brain is very difficult. Among other things, we don't know what the neuroglia cells may be up to, and these are just as numerous as neurons.

Both deep networks and the human brain require lots of data, but the kind of data they require is not the same. Humans engage mostly in semi-supervised learning, where supervised data comprises a small fraction of the total.

This is probably a misconception for several reasons. Firstly, given that we don't fully understand the learning mechanisms in the brain yet, it's unlikely that it's mostly one thing. Secondly, we have some pretty good evidence for reinforcement learning in the cortex, hippocampus, and basal ganglia. We have evidence for internally supervised learning in the cerebellum, and unsupervised learning in the cortex.

The point being: these labels aren't all that useful. Efficient learning is multi-objective and doesn't cleanly divide into these narrow categories.

The best current guess for questions like this is almost always that the brain's solution is highly efficient, given its constraints.

In the situation where a Go player experiences/watches a game between two other players far above their own current skill, the optimal learning update is probably going to be an SL-style update. Even if you can't understand the reasons behind the moves yet, it's best to compress them into the cortex for later. If you can do a local search to understand why the move is good, then that is even better and it becomes more like RL, but again, these hard divisions are arbitrary and limiting.

A few hundred TitanX's can muster up perhaps a petaflop of compute.

Could you elaborate? I think this number is too high by roughly one order of magnitude.

The GTX TitanX has a peak performance of 6.1 teraflops, so you'd need only a few hundred to get a petaflop supercomputer (more specifically, around 164).
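Checking the arithmetic:

```python
cards = 1e15 / 6.1e12    # petaflop target / peak flops per TitanX
print(round(cards))      # ~164 cards, assuming peak throughput
```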

The high-end estimate of the brain is 10 petaflops (100 trillion synapses × 100 Hz max firing rate).

Estimating the computational capability of the human brain is very difficult. Among other things, we don't know what the neuroglia cells may be up to, and these are just as numerous as neurons.

It's just a circuit, and it obeys the same physical laws. We have this urge to mystify it for various reasons. Neuroglia cannot possibly contribute more to the total compute power than the neurons, based on simple physics/energy arguments. It's another stupid red herring, like quantum woo.

These estimates are only validated when you can use them to make predictions. And if you have the right estimates (brain roughly equivalent to 100 teraflops, give or take an order of magnitude), you can roughly predict the outcome of many comparisons between brain circuits and equivalent ANN circuits (more accurately than with the wrong estimates).

I think it's a giant leap for go and one small step for mankind.

(Well, I don't know how giant a leap it is. But it's a hell of an achievement.)

I should say, getting this working is very impressive, and took an enormous amount of effort. +1 to the team!

An interesting comment:

The European champion of Go is not the world champion, or even close. The BBC, for example, reported that “Google achieves AI ‘breakthrough’ by beating Go champion,” and hundreds of other news outlets picked up essentially the same headline. But Go is scarcely a sport in Europe; and the champion in question is ranked only #633 in the world. A robot that beat the 633rd-ranked tennis pro would be impressive, but it still wouldn’t be fair to say that it had “mastered” the game. DeepMind made major progress, but the Go journey is still not over; a fascinating thread at YCombinator suggests that the program — a work in progress — would currently be ranked #279.

Does one have to be the master to be a master?

I would be amazingly impressed by a robot beating the 633rd-ranked tennis pro. That would easily put it in the top 1% of the top 1% of those who play tennis. How close to the top of a sport or game would a human have to be before we would call them a master of it? Surely not that high!

Imagine the following exchange:

"I'm the best blacksmith in Britain."

"Oh. Well, this is awkward. You see, I was looking for a master blacksmith..."