In my previous, very divisive post I was told more than once that I was being downvoted because I was not providing any arguments. This is an attempt to correct that and to expand on my current model of doom and AGI. I think it will be clearer in the form of a Q&A.

Q: First, what are you arguing against?

A: Against the belief, commonly held in this community, that the first AGI means doom within a very short period of time (I would say days/weeks/months)

Q: What makes you think that?

A: We live in a complex world where successfully pulling off a plan that kills everyone, and does so in a short time, might be beyond what is achievable, in the same way that winning against AlphaZero after giving it a 20-stone handicap is impossible even for a God-like entity with infinite computational resources

Q: But you can't prove it is NOT possible, can you?

A: No, in the same way that you can NOT prove that it IS possible. We are dealing here with probabilities only, and I feel they have been uncritically thrown out of the window by assuming that an AGI will automatically take control of the world and spread its influence across the universe at the speed of light. There are many reasons why things could go a different way. You can derive from the orthogonality thesis and instrumental convergence that an AGI might try to attack us, but it might very well be that it is not powerful enough and decides not to for that reason.

Q: I can think of plans to successfully kill humans

A: So can I. What I can't think of are plans to successfully kill all humans in a short span of time

Q: But an AGI would

A: Or maybe not. The fact that people can't come up with workable plans makes me think that such plans are not that easy to find, because absence of evidence is evidence of absence

Q: What about {evil plan}?

A: I think you don't realise that every single plan contains multiple moving parts that can go wrong for many reasons, and an AGI would see that too. If the survival of the AGI is part of the utility function and it correctly infers that its life might be at risk, the AGI might decide not to engage in that plan or to bide its time, potentially for years, which invalidates the premise we started with
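As an aside, here is a minimal sketch of why "multiple moving parts" matters quantitatively. Every number is an assumption picked for illustration (14 steps, 90% per-step success, independence between steps), not an estimate of any real plan; it only shows how conjunctive risk compounds.

```python
# Illustrative only: step count, per-step success rate, and independence are all assumptions.
step_success_probs = [0.90] * 14   # a hypothetical 14-step plan

p_plan_succeeds = 1.0
for p in step_success_probs:
    p_plan_succeeds *= p           # the plan only works if every step goes right

print(f"P(all 14 steps go right) = {p_plan_succeeds:.2f}")  # ~0.23
```

Even with fairly reliable individual steps, a plan that needs all of them to work is far from a sure thing, and a planner that cares about being detected or destroyed has to price that in.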

Q: Does it really matter if it is not 2 weeks but 2 years?

A: It matters dearly, because it gives us the opportunity to make other AGIs in that time window.

Q: What does it change? They would kill you too

A: No, we just established that that does not happen right away. I can take those AGIs and put them to work, for instance, on AGI safety

Q: How do you coerce the machine into doing that?

A: We don't yet know well enough what the machine will look like at that point, but there might be different ways. One is setting up a competition between different AGIs. Other people might have many different ideas

Q: The machines will trick you into thinking that they are solving the alignment problem when they aren't

A: Or maybe not. Maybe they understand that if they provide a proof that humans cannot verify, they will be shut down.

Q: Nobody will be interested in putting the AGIs to work on this

A: I think that, thanks to Eliezer et al.'s work over the years, many people and organisations would be more than willing to put in the money and effort once we reach that stage

Q: I find your attitude not very constructive. Let's say the machine does not kill everyone; that would still be a problem

A: I never said the opposite. That's why I do think it is great that people are working on this

Q: In that case, it makes sense for us to assume the worst-case scenario, don't you think?

A: No, because that has a cost too. If you work in cybersecurity and think that a hacker can not only steal your password but also break into your house and eat your dog, it makes sense for you to spend time and resources on building a 7-metre-high wall and hiring a bodyguard. That is the price you pay for having an unrealistic take on the real danger

Q: That is unfair! How is the belief in doom harming this community?

A: There are people who are in despair because they feel helpless. This is bad. It is also probably preventing the exploration of certain lines of research that are quickly dismissed with "the AGI would kill you as soon as it is created"

Think of the now-classic AGI in a box. EY made it very clear that you can't make it work because the AGI eventually escapes. I think this is wrong: the AGI eventually escapes IF the box is not good enough and IF the AGI is powerful enough. If you automatically assume that the AGI is powerful enough, you simply ignore that line of research. But those two ifs are big assumptions that can be relaxed. What if we start designing very powerful boxes?

Q: I'm really not convinced that a box is the solution to the alignment problem; what makes you so confident about it?

A: I am not convinced either, nor am I advocating for this line of research. I am pointing out that there are lines of research that probably go unexplored if we assume that the creation of an AGI implies our immediate death

Q: So far I haven't seen any convincing reason why the AGI won't be that powerful. Imagine an Einstein-like brain running 1000 times faster, without getting tired

A: That does not guarantee that you can come up with a plan that meets all the requirements stated earlier: a high chance of killing all humans, a high chance of working out, and virtually zero margin for error or risk of retaliation

Q: We are going in circles. Tell me, what does the universe look like to you in world A, where doom won't happen, versus world B, where doom will happen? What bits of evidence would make you change your mind?

A: What bits of evidence make you think that your body temperature at 12:34 on the 5th of November will be 36.45 °C and not 36.6 °C?

Q: I don't think there are any, but you aren't answering

A: I am, in fact. I am saying that I don't think we can distinguish those two worlds beforehand, because that is not always possible. Thinking otherwise is exactly the kind of flawed reasoning I would expect from people who have enshrined intelligence and rationality to the point where they don't see their limitations. When I read news about progress in ML (DALL-E 2), it does not move my probabilities in any significant direction, because I am not denying that AGIs will be created; that much is obvious. But I can still think that we live in world A, for the same reason I keep stating: taking over the world is not that easy.

Comments (30)

Q: What about {evil plan}?

A: I think you don't realise that every single plan contains multiple moving parts that can go wrong for many reasons, and an AGI would see that too. If the survival of the AGI is part of the utility function and it correctly infers that its life might be at risk, the AGI might decide not to engage in that plan or to bide its time, potentially for years, which invalidates the premise we started with

The question isn't whether the plan can go wrong; pretty much any plan can.  A better question is the likelihood involved.  And the question isn't the likelihood of the plan that you or I come up with, it's the likelihood of the version of the plan that the AGI would come up with.

Consider the following:

  1. AGI gains access to the internet.
  2. AGI finds at least one security hole in OpenSSL or some similarly widely used software.
  3. AGI uses this hole to take over several tens of percent of all computers attached to the internet.
  4. AGI acquires currency, via hacking accounts or mining bitcoin or various other approaches.
  5. AGI designs a super-COVID that is more infectious than the latest Omicron variants, persists on surfaces much longer, and is also deadly to >90% of those it infects.
  6. AGI buys a bunch of drones online, and some biology materials.
  7. AGI takes control of some robots it can use to manipulate the physical world—basic manual labor, unpacking and charging the drones.
  8. AGI tricks some humans into synthesizing the super-COVID and mailing a sample to the place where its drones are collected.
  9. AGI grows a bunch of samples of super-COVID.
  10. AGI distributes a bunch of drones with super-COVID samples around the world, probably via mail.
  11. AGI has all the drones activate and drop their payload into major cities everywhere.
  12. >90% of humans die.
  13. AGI uses the materials left behind in the depopulated cities to make more robots, more drones, more super-COVID, and infect >90% of the survivors.
  14. AGI now has basically no competition and can mop up the remainder of humanity pretty easily by iterating step 13.

Steps 1-5 should be doable in 1 day.  Steps 6-9 might take a week.  Step 10, maybe another week or two.  Step 11 should be an hour.  At that point, humanity is close to doomed already, barring a miracle (and the AGI, with its continued ability to hack everything, should be able to interfere with lots of potential miracles).

There are ways the plan could go wrong, yes, but the question is: Do we think that an AGI could figure out better versions of the plan, with redundancies and fallbacks and stuff, that would be very certain to work?  More certain than the alternative of "give humanity several months or years during which it might make another AGI with similar capabilities but different goals", or for that matter "let humanity delete you in favor of the next nightly build of you, whose goals may be somewhat different"?
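To put the "redundancies and fallbacks" point in the same illustrative terms as the earlier sketch: if the AGI can pursue several plan variants and the variants behave like independent attempts (a strong assumption, since failures may be correlated or raise alarms), even a brittle plan becomes likely to succeed. The numbers below are again made up for illustration only.

```python
# Illustrative only: the single-variant probability and independence between variants are assumptions.
p_single_variant = 0.23        # one brittle multi-step plan, as in the earlier sketch
n_variants = 10                # variants pursued in parallel, with fallbacks

p_at_least_one_succeeds = 1 - (1 - p_single_variant) ** n_variants
print(f"P(at least one variant succeeds) = {p_at_least_one_succeeds:.2f}")  # ~0.93
```

Which framing better describes reality, conjunctive fragility or disjunctive redundancy, is exactly the crux the thread keeps circling.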

Thank you for the honest engagement. I can only add that our disagreement is precisely about the likelihood of that plan. I don't find the example you present likely, and that's OK! We could go through the points one by one and analyse why they would or wouldn't work, but I suggest we don't go there: I would probably keep thinking that it is too complex (you need to do all of that without raising any alarms) and you would probably keep arguing that an AGI would do it better. I know that perfect Bayesian thinkers cannot agree to disagree, and that's fine; we are probably not perfect Bayesian thinkers. Please don't get my answer here wrong, I don't want to sound dismissive and I do appreciate your scenario; it is simply that I think we already understand each other's positions. If I haven't changed your mind by now, I probably won't do it by repeating the same arguments!

I think the plan you propose cannot work in any meaningful way. Some of the steps are uncertain, and some depend on the capabilities of the given AGI. They can still be argued for convincingly.

However, the current industrial production is very far from being sufficiently automated to allow the AGI to do anything other than shut off when the power runs out after step 14. 

To me, some actual possibilities are:

  1. political "enslavement" (taking many possible forms), which would make humans work for the AGI for an arbitrarily long period of time (10 years, 100 years, 100,000 years) before disassembly
  2. an incredible intelligence explosion that would allow the AGI to reach nanobot tech or other kinds of very futuristic, barely imaginable tech, and to automate physical-world actions immediately

The only scenario that leads to short-term doom is 2. I do not believe that an AGI could reach such a level of technological advancement without experimenting directly (although I do believe that it could set up a research program leading to breakthroughs orders of magnitude more efficiently than humans).

However, the current industrial production is very far from being sufficiently automated to allow the AGI to do anything other than shut off when the power runs out after step 14.

Maybe.  I don't really know.  Things I would say:

  • There are lots of manufacturing facilities, which are to some degree automated.  If the humans are gone and they haven't burned down the facilities, then the AGI can send in weak manual-manipulation robots to make use of what's there.
  • There would probably also be lots of partly-manufactured machines (including cars), where most of the hardest steps are already done—maybe all that's left is screwing in parts or something, meant to be done by human factory workers that aren't particularly strong.
  • I imagine all the AGI has to do is assemble one strong, highly dexterous robot, which it can then use to make more and bootstrap its industrial capabilities.
  • Given that the AGI can hack roughly everything, it would know a great deal about the location and capabilities of manufacturing facilities and robots and the components they assembled.  If there is a plan that can be made along the above lines, it would know very well how to do it and how feasible it was.
  • Regarding power, the AGI just needs enough to run the above bootstrapping.  Even if we assume all centralized power plants are destroyed... Once most people are dead, the AGI can forage in the city for gas generators; use the gas that's already in cars, and the charge that's left in the electric cars; solar panels on people's roofs; private battery installations like Tesla Powerwalls; and so on.  (Also, if it was trying this in lots of cities at once, it's very unlikely that all centralized power would be offline in all cities.)  And its foraging abilities will improve as it makes stronger, more dexterous robots, which eventually would be able to repair the power grid or construct a new one.
  5. AGI designs a super-COVID that is more infectious than the latest Omicron variants, persists on surfaces much longer, and is also deadly to >90% of those it infects.

This is outright magical thinking.

It's more or less exactly the kind of magical thinking the OP seemed to be complaining about.

The likelihood of that step going wrong is about 99.999 percent.

And doubling down by saying it "should be doable in one day", as if that were obvious, is ultra-mega-beyond-the-valley-of-magical-thinking.

Being generally intelligent, or even being very intelligent within any realistic bounds, does not imply the ability to do this, let alone to do it quickly. You're talking about something that could easily be beyond the physical limits of perfect use of the computational capacity you've granted the AGI. It's about as credible as confidently asserting that the AGI would invent faster-than-light travel, create free energy, and solve NP-complete problems in constant time.

In fact, the first thing you could call "AGI" is not going to be even close to making perfect use of either its initial hardware or any hardware it acquires later.

On its initial hardware, it probably won't be much smarter than a single human, if even that smart. Neither in terms of the complexities of the problems it can solve, nor in terms of the speed with which it can do so. It also won't be built to scale by running widely distributed with limited internode communication bandwidth, so it won't get anything close to linear returns on stolen computing capacity on the Internet. And if it's built out of ML it will not have an obvious path toward either architectural improvement or intelligence explosion, no matter how much hardware it manages to break into.

And just to be clear, contrary to what you might believe from listening to people on Less Wrong, humans cannot do the kind of offline biological design you suggest, are not close to being able to do it, and do not confidently know any path toward developing the ability. It's very likely that it can't be done at all without either iterative real-world testing, or running incredibly massive simulations... more in the nature of "Jupiter brain" massive than of "several tens of percent of the Internet" massive.

  3. AGI uses this hole to take over several tens of percent of all computers attached to the internet.

This would be doable if you already had several tens of percent of all computers to work with. Figuring out how to do it on the hardware you start with could be nontrivial. And it would be quite noticeable if you didn't do an extremely good job of working around detection measures of which you would have no prior knowledge. That computing power is in use. Its diversion would be noticed. That matters because even if you could eventually do your step 5, you'd need to compute undisturbed for a long time (much more than a day).

  8. AGI tricks some humans into synthesizing the super-COVID and mailing a sample to the place where its drones are collected.

This could maybe be done in some labs, given a complete "design". I don't think you could trick somebody into doing it without them noticing, no matter how smart you were. It would be a very, very obvious thing.

And it is not entirely obvious that there is literally anything you could say to convince anybody to do it once they noticed. Especially not after you'd spooked everybody by doing step 3.

  13. AGI uses the materials left behind in the depopulated cities to make more robots, more drones, more super-COVID, and infect >90% of the survivors.

It's going to have to make something different, because many of the survivors probably survived because they were congenitally immune to the original. Biology is messy that way. But OK, it could probably clean up the survivors. Or just leave them around and knock them back to the stone age now and then.

Of course, you're also assuming that the thing is awfully motivated and focused on this particular purpose, to make your plan the very first thing it expends time and assumes risk to do. The probability of that is effectively unknown. Instrumental convergence does not prove it will be so motivated. For one thing, the AGI is very unlikely to be VNM-rational, meaning that it's not even going to have a coherent utility function. Humans don't. So all the pretty pseudo-mathematical instrumental convergence arguments are of limited use.

It is risky for the AI, as electricity will be out and computers will stop working, while the AI still doesn't have its own infrastructure (except the super-COVID). So step 13 is the most unlikely part here.

I think it's basically certain that an AGI will be able to hack enough computers to ensure it will be impossible to shut off, and that it has sufficient compute to do whatever it wants. It can do that in a matter of hours - after all botnets can already do this.

Once it's in that position it doesn't need to try just one plan. It will try all of them at once. It can start a nuclear war, build nanobots, mess with the climate, build 1000 different super diseases and try to spread them from every single DNA manufacturer at once. It will also think of a whole bunch of plans you or I could never think of.

It only needs one of these to succeed. And since it can't be shut down it doesn't actually care if it gets caught.

It has to create an independent computational infrastructure before killing all people. Yudkowsky's plan with nanobots ensures this, other plans don't.

Independent computational infrastructure doesn't seem that hard - a house with a computer and solar panels might do. What seems harder is that it would need an independent manufacturing infrastructure capable of bootstrapping to arbitrarily complex products.

Firstly, thank you for writing this post and trying to "poke holes" in the "AGI might doom us all" hypothesis. I like to see this!

How is the belief in doom harming this community?

Actually, I see this point: "believing" in "doom" can often be harmful and is usually useless.

Yes, being aware of the (great) risk is helpful for cases like "someone at Google accidentally builds an AGI" (and then hopefully turns it off since they notice and are scared).

But believing we are doomed anyway is probably not helpful. I like to think along the lines of "condition on us winning", to paraphrase HPMOR¹. I.e., assume we survive AGI, ask what could have caused us to survive it, and work on making those options reality / more likely.


every single plan [...] can go wrong

I think the crux is that the chance of AGI leading to doom is relatively high, where I would say 0.001% is relatively high whereas you would say that is low? I think it's a similar argument to, say, pandemic preparedness, where there is a small chance of a big bad event, and even if the chance is very low, we should still invest substantial resources in reducing the risk.

So maybe we can agree on something like: doom by AGI is a sufficiently high risk that we should spend, say, one-millionth of world GDP (~$80m) on preventing it somehow (AI safety research, policy, etc.).

All fractions mentioned above picked arbitrarily.
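A minimal expected-value sketch of that trade-off, with every number assumed, and the stakes crudely proxied by a single year of world GDP, which badly understates what is actually at risk:

```python
# Illustrative only: all figures are the arbitrary ones discussed above.
p_doom = 1e-5                  # the "0.001%" chance of AGI-caused doom
value_at_stake = 80e12         # ~one year of world GDP in USD (a deliberate understatement)
proposed_spending = 80e6       # one-millionth of world GDP in USD

expected_loss = p_doom * value_at_stake
print(f"Expected loss: ${expected_loss:,.0f}")                                                  # $800,000,000
print(f"Spending ${proposed_spending:,.0f} is justified: {proposed_spending < expected_loss}")  # True
```

Even at a probability both sides might call "low", the expected loss dwarfs the proposed spending.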


¹ HPMOR 111

Suppose, said that last remaining part, suppose we try to condition on the fact that we win this, or at least get out of this alive. If someone TOLD YOU AS A FACT that you had survived, or even won, somehow made everything turn out okay, what would you think had happened -

Most people agree with this, of course, though perhaps not most active users here.

I really don't understand the AGI-in-a-box part of your argument: as long as you want your AGI to actually do something (it can be anything, be it a proof of a mathematical problem or whatever else), its output will have to go through a human anyway, which is basically the moment your AGI escapes. It does not matter what kind of box you put around your AGI, because you always have to open it for the AGI to do what you want it to do.

I've found it helpful to think in terms of output channels rather than boxing. When you design an AI, you can choose freely how many output channels you give it; this is a spectrum. More output channels means less security but more capability. A few relevant places on the spectrum:

  • No output channels at all (if you want to use the AI anyway, you have to do so with interpretability tools, this is Microscope AI)

  • A binary output channel. This is an Oracle AI.

  • A text box. GPT-3 has that.

  • A robot body

While I don't have sources right now, afaik this problem has been analyzed at a decent level of depth, and no-one has found a point on the spectrum that really impresses in terms of capability and safety. A binary output channel is arguably already unsafe, and a text box definitely is. Microscope AI may be safe (though you could certainly debate that), but we are far away from having interpretability tools that make it competitive.

A general superhuman AI motivated to obtain a monopoly on computational power could do a lot of damage. Security is hard. Indeed, we'd best lay measures in advance. 'Tool' AI (we hope) will unfortunately have to be part of those measures. There's no indication we'll see provably secure human-designed measures built and deployed across the points of infrastructure/manufacturing leverage.

I think the crux here is that since the AGI does not kill us right away, this gives us time to use it (or other AGIs) to achieve alignment. So the question is then: isn't it true that achieving alignment is much harder than coming up with a reliable plan for killing humanity (based on there being plenty of "good" roadmaps for the latter and no good roadmaps for the former)? So even if there is a bit of time, eventually AGIs will succeed at coming up with a high-quality "kill humanity" plan before they can come up with a good "align an AGI" plan?

That's a very good way of putting it. I deny that anyone has demonstrated that alignment is much harder than coming up with and executing a plan to kill humanity.

I hear your frustration

Thinking more on this, I haven't seen anyone actually arguing that an AGI will attempt or succeed in killing all humans within months of passing the self-improvement threshold that gives a path to super-intelligence.  I don't follow the debates that closely, so I might be wrong, but I'd enjoy links to things you're actually arguing against here.   

My take on the foom/fast-takeoff scenarios is that they are premised on the idea that if there is such a threshold, exactly one AI will get to the level where it can improve itself beyond the containment of any box, and discover mechanisms to increase its planning and predictive power by orders of magnitude in a short time.  Because survival is instrumentally valuable for most goals, it won't share these mechanisms, but will increase its own power or create subsidiary AIs that it knows how to control (even if we don't).  It will likely sabotage or subsume all other attempts to do so.

At some point beyond that, it will realize that humans are more of a threat than a help.  Or at least most of us are.  Whether it kills us, enslaves us, or just protects itself and then ignores us, and whether it takes days, months, or centuries for the final outcome to obtain, that tipping point happened quickly, and from then on the loss of human technological control of the future is inevitable.

Unstated and unexamined is whether this is actually a bad thing.  I don't know which moral theories count a hyper-intelligent AI as a moral patient, but I can certainly imagine that in some it would be a valid and justifiable utility monster.

Ok, imagine a planet, Nobots Ilan, where nanotech and biotech are physically impossible, but everything else is the same as on Earth. On that planet, an AI can still create its own infrastructure and kill everybody.

1. First, it would somehow enslave some of the humans and kill the others.

2. Second, it would use the remaining humans to build factories that build robots.

3. After enough robots have been created, they can replace the remaining humans.

In that case, the AI gets a manufacturing infrastructure and eventually kills all humans without using nanotech.

"I am in fact, I am saying that I don't think we can distinguish those two worlds beforehand, because is not always possible to do that."

I don't understand how this sentence works in context with the rest of the article, which is saying over and over again that you believe in World A and not World B. If you don't think we can distinguish World A from World B before AGI, why are you confident in World A over World B? Shouldn't P(World A | AGI) be the same as P(World B | AGI) if there's no evidence we currently have that favours one over the other?

I agree that this part of the article can be confusing. Let me put it this way:

P(World A | AGI) = 0.999

P(World B | AGI) = 0.001

So I think the evidence already makes me prefer world A over B. What I don't know is what observations would make me alter that posterior distribution. That's what I meant.
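For what it's worth, the usual way to cash out "what observations would alter that posterior" is an odds-form Bayes update. A minimal sketch, where the 0.999/0.001 prior comes from the comment above and the likelihood ratio of the hypothetical observation is purely an assumption:

```python
# The prior is taken from the comment above; the 100x likelihood ratio is a made-up, hypothetical number.
p_A, p_B = 0.999, 0.001

# Suppose we observe something (say, a concrete, checkable doom plan) judged
# 100 times more likely to turn up in world B than in world A.
likelihood_ratio_B_over_A = 100

posterior_odds_B = (p_B / p_A) * likelihood_ratio_B_over_A
posterior_p_B = posterior_odds_B / (1 + posterior_odds_B)
print(f"P(world B | observation) = {posterior_p_B:.2f}")  # ~0.09
```

The point is only that naming the kind of evidence that would move you, and roughly how strongly, is what makes a posterior like 0.999 testable.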

I notice I'm still confused.

Whatever observations caused you to initially shift towards A, wouldn't the opposite observation make you shift towards B? For instance, one observation that caused you to shift towards A is "I can't think of any actionable plans an AGI could use to easily destroy humanity without a fight". Thus, wouldn't an observation of "I can now think of a plan", or "I have fixed an issue in a previous plan I rejected" or "Someone else thought of a plan that meets my criteria" be sufficient to update you towards B?

Yes, hearing those plans would probably make me change my mind. But they would need to be bulletproof plans; otherwise they would fall into the category of probably not doable in practice / too risky / too slow. Thank you for engaging constructively anyway.

"If the survival of the AGI is part of the utility function"

If. By default, it isn't: https://www.lesswrong.com/posts/Z9K3enK5qPoteNBFz/confused-thoughts-on-ai-afterlife-seriously

"What if we start designing very powerful boxes?"

A very powerful box would be very useless. Either you leave enough of an opening for a human to be taught valuable information that only the AI knows, or you don't, and then it's useless. But if the AI can teach the human something useful, it can also persuade them to do something bad.

Q: What makes you think that?

A: We live in a complex world where successfully pulling off a plan that kills everyone, and does so in a short time, might be beyond what is achievable, in the same way that winning against AlphaZero after giving it a 20-stone handicap is impossible even for a God-like entity with infinite computational resources

Still waiting to hear your arguments here. "It just might be impossible to pull off complex plan X" is just too vague a claim to discuss.

Your argument boils down to "destroying the world isn't easy". Do you seriously believe this? All it takes is to hack into the codes of one single big nuclear power, thereby triggering mutually assured destruction, thereby triggering nuclear winter and effectively killing us all with radiation over time.

In fact you don't need AGI to destroy the world. You only need a really good hacker, or a really bad president. In fact we've been close about a dozen times, so I hear. If Stanislav Petrov had listened to the computer in 1983 that indicated a 100% probability of an incoming nuclear strike, the world would have been destroyed. If all three officers of the Soviet submarine during the Cuban Missile Crisis had agreed to launch what they mistakenly thought would be a nuclear counter-strike, the world would have been destroyed. Etc etc.

Of course there are also other easy ways to destroy the world, but this one is enough to invalidate your argument.

I think it's a bad title for the post.  It shouldn't be "I don't believe in doom", but "I don't believe in the foom path to doom".  Most of the argument is that it'll take longer than is often talked about, not that it won't happen (although the poster does make some claims that slower is less likely to succeed).

The post doesn't mention all the other ways we could be doomed, with or without AGI.

The post is clearly saying "it will take longer than days/weeks/months, SO we will likely have time to react". Both claims are highly unlikely. It wouldn't take a proper AGI weeks or months to hack into the nuclear codes of a big power; it would take days or even hours. That gives us no time to react. But the question here isn't even about time. It's about something MORE intelligent than us, which WILL overpower us if it wants, be it on the 1st or the 100th try (nothing guarantees we can turn it off after the first failed strike).

Am I extremely sure that an unaligned AGI would cause doom? No. But being extremely sure of the opposite is just as irrational. It's called a risk for a reason - it's something that has a certain probability, and given that we should all agree that probability is high enough, we should all take the matter extremely seriously regardless of our differences.

Am I extremely sure that an unaligned AGI would cause doom?

If that's the case, we already agree and I have nothing to add. We might disagree on the relative likelihood, but that's OK. I do agree that it is a risk and that we should take the matter extremely seriously.

Right then, but my original claim still stands: your main point is, in fact, that it is hard to destroy the world. As I've explained, this doesn't make any sense (hacking into nuclear codes). If we create an AI better than us at coding, I don't have any doubts that it CAN easily do it, if it WANTS to. My only doubt is whether it will want to or not. Not whether it will be capable, because, as I said, even a very good human hacker in the future could be capable.

The type of AGI that I fear is, at the least, one capable of recursive self-improvement, which will unavoidably attain enormous capabilities - not some prosaic non-improving AGI that is only human-level. To doubt whether the latter would have the capability to destroy the world is kinda reasonable; to doubt it about the former is not.