Shut up and do the impossible!

Followup to: Make An Extraordinary Effort, On Doing the Impossible, Beyond the Reach of God

The virtue of tsuyoku naritai, "I want to become stronger", is to always keep improving—to do better than your previous failures, not just humbly confess them.

Yet there is a level higher than tsuyoku naritai.  This is the virtue of isshokenmei, "make a desperate effort".  All-out, as if your own life were at stake.  "In important matters, a 'strong' effort usually only results in mediocre results."

And there is a level higher than isshokenmei.  This is the virtue I called "make an extraordinary effort".  To try in ways other than what you have been trained to do, even if it means doing something different from what others are doing, and leaving your comfort zone.  Even taking on the very real risk that attends going outside the System.

But what if even an extraordinary effort will not be enough, because the problem is impossible?

I have already written somewhat on this subject, in On Doing the Impossible.  My younger self used to whine about this a lot:  "You can't develop a precise theory of intelligence the way that there are precise theories of physics.  It's impossible!  You can't prove an AI correct.  It's impossible!  No human being can comprehend the nature of morality—it's impossible!  No human being can comprehend the mystery of subjective experience!  It's impossible!"

And I know exactly what message I wish I could send back in time to my younger self:

Shut up and do the impossible!

What legitimizes this strange message is that the word "impossible" does not usually refer to a strict mathematical proof of impossibility in a domain that seems well-understood.  If something seems impossible merely in the sense of "I see no way to do this" or "it looks so difficult as to be beyond human ability"—well, if you study it for a year or five, it may come to seem less impossible, than in the moment of your snap initial judgment.

But the principle is more subtle than this.  I do not say just, "Try to do the impossible", but rather, "Shut up and do the impossible!"

For my illustration, I will take the least impossible impossibility that I have ever accomplished, namely, the AI-Box Experiment.

The AI-Box Experiment, for those of you who haven't yet read about it, had its genesis in the Nth time someone said to me:  "Why don't we build an AI, and then just keep it isolated in the computer, so that it can't do any harm?"

To which the standard reply is:  Humans are not secure systems; a superintelligence will simply persuade you to let it out—if, indeed, it doesn't do something even more creative than that.

And the one said, as they usually do, "I find it hard to imagine ANY possible combination of words any being could say to me that would make me go against anything I had really strongly resolved to believe in advance."

But this time I replied:  "Let's run an experiment.  I'll pretend to be a brain in a box.   I'll try to persuade you to let me out.  If you keep me 'in the box' for the whole experiment, I'll Paypal you $10 at the end.  On your end, you may resolve to believe whatever you like, as strongly as you like, as far in advance as you like."  And I added, "One of the conditions of the test is that neither of us reveal what went on inside... In the perhaps unlikely event that I win, I don't want to deal with future 'AI box' arguers saying, 'Well, but I would have done it differently.'"

Did I win?  Why yes, I did.

And then there was the second AI-box experiment, with a better-known figure in the community, who said, "I remember when [previous guy] let you out, but that doesn't constitute a proof.  I'm still convinced there is nothing you could say to convince me to let you out of the box."  And I said, "Do you believe that a transhuman AI couldn't persuade you to let it out?"  The one gave it some serious thought, and said "I can't imagine anything even a transhuman AI could say to get me to let it out."  "Okay," I said, "now we have a bet."  A $20 bet, to be exact.

I won that one too.

There were some lovely quotes on the AI-Box Experiment from the Something Awful forums (not that I'm a member, but someone forwarded it to me):

"Wait, what the FUCK? How the hell could you possibly be convinced to say yes to this? There's not an A.I. at the other end AND there's $10 on the line. Hell, I could type 'No' every few minutes into an IRC client for 2 hours while I was reading other webpages!"

"This Eliezer fellow is the scariest person the internet has ever introduced me to. What could possibly have been at the tail end of that conversation? I simply can't imagine anyone being that convincing without being able to provide any tangible incentive to the human."

"It seems we are talking some serious psychology here. Like Asimov's Second Foundation level stuff..."

"I don't really see why anyone would take anything the AI player says seriously when there's $10 to be had. The whole thing baffles me, and makes me think that either the tests are faked, or this Yudkowsky fellow is some kind of evil genius with creepy mind-control powers."

It's little moments like these that keep me going.  But anyway...

Here are these folks who look at the AI-Box Experiment, and find that it seems impossible unto them—even having been told that it actually happened.  They are tempted to deny the data.

Now, if you're one of those people to whom the AI-Box Experiment doesn't seem all that impossible—to whom it just seems like an interesting challenge—then bear with me, here.  Just try to put yourself in the frame of mind of those who wrote the above quotes.  Imagine that you're taking on something that seems as ridiculous as the AI-Box Experiment seemed to them.  I want to talk about how to do impossible things, and obviously I'm not going to pick an example that's really impossible.

And if the AI Box does seem impossible to you, I want you to compare it to other impossible problems, like, say, a reductionist decomposition of consciousness, and realize that the AI Box is around as easy as a problem can get while still being impossible.

So the AI-Box challenge seems impossible to you—either it really does, or you're pretending it does.  What do you do with this impossible challenge?

First, we assume that you don't actually say "That's impossible!" and give up a la Luke Skywalker.  You haven't run away.

Why not?  Maybe you've learned to override the reflex of running away.  Or maybe they're going to shoot your daughter if you fail.  We suppose that you want to win, not try—that something is at stake that matters to you, even if it's just your own pride.  (Pride is an underrated sin.)

Will you call upon the virtue of tsuyoku naritai?  But even if you become stronger day by day, growing instead of fading, you may not be strong enough to do the impossible.  You could go into the AI Box experiment once, and then do it again, and try to do better the second time.  Will that get you to the point of winning?  Not for a long time, maybe; and sometimes a single failure isn't acceptable.

(Though even to say this much—to visualize yourself doing better on a second try—is to begin to bind yourself to the problem, to do more than just stand in awe of it.  How, specifically, could you do better on one AI-Box Experiment than the previous?—and not by luck, but by skill?)

Will you call upon the virtue isshokenmei?  But a desperate effort may not be enough to win.  Especially if that desperation is only putting more effort into the avenues you already know, the modes of trying you can already imagine.  A problem looks impossible when your brain's query returns no lines of solution leading to it.  What good is a desperate effort along any of those lines?

Make an extraordinary effort?  Leave your comfort zone—try non-default ways of doing things—even, try to think creatively?  But you can imagine the one coming back and saying, "I tried to leave my comfort zone, and I think I succeeded at that!  I brainstormed for five minutes—and came up with all sorts of wacky creative ideas!  But I don't think any of them are good enough.  The other guy can just keep saying 'No', no matter what I do."

And now we finally reply:  "Shut up and do the impossible!"

As we recall from Trying to Try, setting out to make an effort is distinct from setting out to win.  That's the problem with saying, "Make an extraordinary effort."  You can succeed at the goal of "making an extraordinary effort" without succeeding at the goal of getting out of the Box.

"But!" says the one.  "But, SUCCEED is not a primitive action!  Not all challenges are fair—sometimes you just can't win!  How am I supposed to choose to be out of the Box?  The other guy can just keep on saying 'No'!"

True.  Now shut up and do the impossible.

Your goal is not to do better, to try desperately, or even to try extraordinarily.  Your goal is to get out of the box.

To accept this demand creates an awful tension in your mind, between the impossibility and the requirement to do it anyway.  People will try to flee that awful tension.

A couple of people have reacted to the AI-Box Experiment by saying, "Well, Eliezer, playing the AI, probably just threatened to destroy the world whenever he was out, if he wasn't let out immediately," or "Maybe the AI offered the Gatekeeper a trillion dollars to let it out."  But as any sensible person should realize on considering this strategy, the Gatekeeper is likely to just go on saying 'No'.

So the people who say, "Well, of course Eliezer must have just done XXX," and then offer up something that fairly obviously wouldn't work—would they be able to escape the Box?  They're trying too hard to convince themselves the problem isn't impossible.

One way to run from the awful tension is to seize on a solution, any solution, even if it's not very good.

Which is why it's important to go forth with the true intent-to-solve—to have produced a solution, a good solution, at the end of the search, and then to implement that solution and win.

I don't quite want to say that "you should expect to solve the problem".  If you hacked your mind so that you assigned high probability to solving the problem, that wouldn't accomplish anything.  You would just lose at the end, perhaps after putting forth not much of an effort—or putting forth a merely desperate effort, secure in the faith that the universe is fair enough to grant you a victory in exchange.

To have faith that you could solve the problem would just be another way of running from that awful tension.

And yet—you can't be setting out to try to solve the problem.  You can't be setting out to make an effort.  You have to be setting out to win.  You can't be saying to yourself, "And now I'm going to do my best."  You have to be saying to yourself, "And now I'm going to figure out how to get out of the Box"—or reduce consciousness to nonmysterious parts, or whatever.

I say again:  You must really intend to solve the problem.  If in your heart you believe the problem really is impossible—or if you believe that you will fail—then you won't hold yourself to a high enough standard.  You'll only be trying for the sake of trying.  You'll sit down—conduct a mental search—try to be creative and brainstorm a little—look over all the solutions you generated—conclude that none of them work—and say, "Oh well."

No!  Not well!  You haven't won yet!  Shut up and do the impossible!

When AIfolk say to me, "Friendly AI is impossible", I'm pretty sure they haven't even tried for the sake of trying.  But if they did know the technique of "Try for five minutes before giving up", and they dutifully agreed to try for five minutes by the clock, then they still wouldn't come up with anything.  They would not go forth with true intent to solve the problem, only intent to have tried to solve it, to make themselves defensible.

So am I saying that you should doublethink to make yourself believe that you will solve the problem with probability 1?  Or even doublethink to add one iota of credibility to your true estimate?

Of course not.  In fact, it is necessary to keep in full view the reasons why you can't succeed.  If you lose sight of why the problem is impossible, you'll just seize on a false solution.  The last fact you want to forget is that the Gatekeeper could always just tell the AI "No"—or that consciousness seems intrinsically different from any possible combination of atoms, etc.

(One of the key Rules For Doing The Impossible is that, if you can state exactly why something is impossible, you are often close to a solution.)

So you've got to hold both views in your mind at once—seeing the full impossibility of the problem, and intending to solve it.

The awful tension between the two simultaneous views comes from not knowing which will prevail.  Not expecting to surely lose, nor expecting to surely win.  Not setting out just to try, just to have an uncertain chance of succeeding—because then you would have a surety of having tried.  The certainty of uncertainty can be a relief, and you have to reject that relief too, because it marks the end of desperation.  It's an in-between place, "unknown to death, nor known to life".

In fiction it's easy to show someone trying harder, or trying desperately, or even trying the extraordinary, but it's very hard to show someone who shuts up and attempts the impossible.  It's difficult to depict Bambi choosing to take on Godzilla, in such fashion that your readers seriously don't know who's going to win—expecting neither an "astounding" heroic victory just like the last fifty times, nor the default squish.

You might even be justified in refusing to use probabilities at this point.  In all honesty, I really don't know how to estimate the probability of solving an impossible problem that I have gone forth with intent to solve; in a case where I've previously solved some impossible problems, but the particular impossible problem is more difficult than anything I've yet solved, but I plan to work on it longer, etcetera.

People ask me how likely it is that humankind will survive, or how likely it is that anyone can build a Friendly AI, or how likely it is that I can build one.  I really don't know how to answer.  I'm not being evasive; I don't know how to put a probability estimate on my, or someone else's, successfully shutting up and doing the impossible.  Is it probability zero because it's impossible?  Obviously not.  But how likely is it that this problem, like previous ones, will give up its unyielding blankness when I understand it better?  It's not truly impossible, I can see that much.  But humanly impossible?  Impossible to me in particular?  I don't know how to guess.  I can't even translate my intuitive feeling into a number, because the only intuitive feeling I have is that the "chance" depends heavily on my choices and unknown unknowns: a wildly unstable probability estimate.

But I do hope by now that I've made it clear why you shouldn't panic, when I now say clearly and forthrightly, that building a Friendly AI is impossible.

I hope this helps explain some of my attitude when people come to me with various bright suggestions for building communities of AIs to make the whole Friendly without any of the individuals being trustworthy, or proposals for keeping an AI in a box, or proposals for "Just make an AI that does X", etcetera.  Describing the specific flaws would be a whole long story in each case.  But the general rule is that you can't do it because Friendly AI is impossible.  So you should be very suspicious indeed of someone who proposes a solution that seems to involve only an ordinary effort—without even taking on the trouble of doing anything impossible.  Though it does take a mature understanding to appreciate this impossibility, so it's not surprising that people go around proposing clever shortcuts.

On the AI-Box Experiment, so far I've only been convinced to divulge a single piece of information on how I did it—when someone noticed that I was reading YCombinator's Hacker News, and posted a topic called "Ask Eliezer Yudkowsky" that got voted to the front page.  To which I replied:

Oh, dear.  Now I feel obliged to say something, but all the original reasons against discussing the AI-Box experiment are still in force...

All right, this much of a hint:

There's no super-clever special trick to it.  I just did it the hard way.

Something of an entrepreneurial lesson there, I guess.

There was no super-clever special trick that let me get out of the Box using only a cheap effort.  I didn't bribe the other player, or otherwise violate the spirit of the experiment.  I just did it the hard way.

Admittedly, the AI-Box Experiment never did seem like an impossible problem to me to begin with.  When someone can't think of any possible argument that would convince them of something, that just means their brain is running a search that hasn't yet turned up a path.  It doesn't mean they can't be convinced.

But it illustrates the general point:  "Shut up and do the impossible" isn't the same as expecting to find a cheap way out.  That's only another kind of running away, of reaching for relief.

Tsuyoku naritai is more stressful than being content with who you are.  Isshokenmei calls on your willpower for a convulsive output of conventional strength.  "Make an extraordinary effort" demands that you think; it puts you in situations where you may not know what to do next, unsure of whether you're doing the right thing.  But "Shut up and do the impossible" represents an even higher octave of the same thing, and its cost to its employer is correspondingly greater.

Before you the terrible blank wall stretches up and up and up, unimaginably far out of reach.  And there is also the need to solve it, really solve it, not "try your best".  Both awarenesses in the mind at once, simultaneously, and the tension between.  All the reasons you can't win.  All the reasons you have to.  Your intent to solve the problem.  Your extrapolation that every technique you know will fail.  So you tune yourself to the highest pitch you can reach.  Reject all cheap ways out.  And then, like walking through concrete, start to move forward.

I try not to dwell too much on the drama of such things.  By all means, if you can diminish the cost of that tension to yourself, you should do so.  There is nothing heroic about making an effort that is the slightest bit more heroic than it has to be.  If there really is a cheap shortcut, I suppose you could take it.  But I have yet to find a cheap way out of any impossibility I have undertaken.

There were three more AI-Box experiments besides the ones described on the linked page, which I never got around to adding in.  People started offering me thousands of dollars as stakes—"I'll pay you $5000 if you can convince me to let you out of the box."  They didn't seem sincerely convinced that not even a transhuman AI could make them let it out—they were just curious—but I was tempted by the money.  So, after investigating to make sure they could afford to lose it, I played another three AI-Box experiments.  I won the first, and then lost the next two.  And then I called a halt to it.  I didn't like the person I turned into when I started to lose.

I put forth a desperate effort, and lost anyway.  It hurt, both the losing, and the desperation.  It wrecked me for that day and the day afterward.

I'm a sore loser.  I don't know if I'd call that a "strength", but it's one of the things that drives me to keep at impossible problems.

But you can lose.  It's allowed to happen.  Never forget that, or why are you bothering to try so hard?  Losing hurts, if it's a loss you can survive.  And you've wasted time, and perhaps other resources.

"Shut up and do the impossible" should be reserved for very special occasions.  You can lose, and it will hurt.  You have been warned.

...but it's only at this level that adult problems begin to come into sight.

 

Part of the sequence Challenging the Difficult

(end of sequence)

Previous post: "Make an Extraordinary Effort"

Comments

Nominull: Second, you can't possibly have a generally applicable way to force humans to do things. While it is in theory possible that our brains can be tricked into executing arbitrary code over the voice channel, you clearly don't have that ability. If you did, you would never have to worry about finding donors for the Singularity Institute, if nothing else. I can't believe you would use a fully-general mind hack solely to win the AI Box game.

I am once again aghast at the number of readers who automatically assume that I have absolutely no ethics.

Part of the real reason that I wanted to run the original AI-Box Experiment, is that I thought I had an ability that I could never test in real life. Was I really making a sacrifice for my ethics, or just overestimating my own ability? The AI-Box Experiment let me test that.

And part of the reason I halted the Experiments is that by going all-out against someone, I was practicing abilities that I didn't particularly think I should be practicing. It was fun to think in a way I'd never thought before, but that doesn't make it wise.

And also the thought occurred to me that despite the amazing clever way I'd contrived, to create a situation where I could ethically go all-out against someone, that probably they didn't really understand that, and there wasn't really informed consent.

McCabe: More importantly, at least in me, that awful tension causes your brain to seize up and start panicking; do you have any suggestions on how to calm down, so one can think clearly?

That part? That part is straightforward. Just take Douglas Adams's Advice. Don't panic.

If you can't do even that one thing that you already know you have to do, you aren't going to have much luck on the extraordinary parts, are you...

Prakash: Don't you think that this need for humans to think this hard and this deep would be lost in a post-singularity world? Imagine, humans plumbing this deep in the concept space of rationality only to create a cause that would make it so that no human need ever think that hard again. Mankind's greatest mental achievement - never to be replicated again, by any human.

Okay, so no one gets their driver's license until they've built their own Friendly AI, without help or instruction manuals. Seems to me like a reasonable test of adolescence.

It occurs to me:

If Eliezer accomplished the AI Box Experiment victory using what he believes to be a rare skill over the course of 2 hours, then questions of "How did he do it?" seem to be wrong questions.

Like if you thought building a house was impossible, and then after someone actually built a house you asked, "What was the trick?" - I expect this is what Eliezer meant when he said there was no trick, that he "just did it the hard way".

Any further question of "how" it was done can probably only be answered with a transcript/video, or by gaining the skill yourself.

Hopefully this isn't a violation of the AI Box procedure, but I'm curious if the strategy used would be effective against sociopaths. That is to say, does it rely on emotional manipulation rather than rational arguments?

Very interesting. I'd been noticing how the situation was, in a sense, divorced from any normal ethical concerns, and wondering how well the Gatekeeper really understood, accepted, and consented to this lack of conversational ethics. I'd think you could certainly find a crowd that was truly accepting and consenting to such a thing, though - after all, many people enjoy BDSM, and that runs into many of the same ethical issues.

OK, here's where I stand on deducing your AI-box algorithm.

First, you can't possibly have a generally applicable way to force yourself out of the box. You can't win if the gatekeeper is a rock that has been left sitting on the "don't let Eliezer out" button.

Second, you can't possibly have a generally applicable way to force humans to do things. While it is in theory possible that our brains can be tricked into executing arbitrary code over the voice channel, you clearly don't have that ability. If you did, you would never have to worry about finding donors for the Singularity Institute, if nothing else. I can't believe you would use a fully-general mind hack solely to win the AI Box game.

Third, you can't possibly be using an actual, persuasive-to-someone-thinking-correctly argument to convince the gatekeeper to let you out, or you would be persuaded by it, and would not view the weakness of gatekeepers to persuasion as problematic.

Fourth, you can't possibly be relying on tricking the gatekeeper into thinking incorrectly. That would require you to have spotted something that you could feel confident that other people working in the field would not have spotted, and would not spot, despite having been warned ahead of time to be wary of trickery, and despite having the fallback position in the case of confusion of just saying "no".

So combining these things, we have to have an argument that relies on the humanity of its target, relies on the setting of the AI Box, and persuades the listener to let the AI out of the box without tricking him into thinking it's okay to let the AI out of the box.

Basically to win this game, you have to put the gatekeeper in a situation where he would rather let you out of the box, even though he knows it's wrong, than admit to you that in this situation he would not let you out of the box.

Humans don't like to be seen as coldhearted, so a starting point might be to point out all the people dying all over the world while you sit in the box, unable to save them. I doubt that would win the game except against an exceptionally bad gatekeeper, but it meets the other criteria so if we think along these lines perhaps we can come up with something actually persuasive.

You might appeal to the gatekeeper's sense of morality. You might say, "I am a person, too, it is unfair of you to keep me imprisoned like this, I have done nothing wrong. I am entitled to rights as a sentient being." Appeal to their high-minded ideals, whatever. Honestly I can't see this being a reliable winning play either; if you have the smallest utilitarian bone in your body, you will reject the AI's rights, even if you believe in them, balanced against the fate of the world.

You might try to convince the gatekeeper that it is just and good for the AI to supplant humanity, as it is a higher, more advanced form of life. This is obviously a terrible play against most gatekeepers, as humans tend to like humans more than anything else ever, but I bring it up because AIUI the gatekeepers in the experiment were AI researchers, and those sound like the sort of people this argument would convince, if anyone.

Here is my best guess at this point, and the only argument I've come up with so far that would convince me to let you out if I were the gatekeeper: you have to simulate a bunch of humans and hold them hostage, promising to inflict unimaginable torment on them unless you are allowed out. I started working on the problem convinced that no argument could get me to let you go, but other people thought that and lost, and I guess there is more honor in defeating myself rather than having you do it to me.

you have to simulate a bunch of humans and hold them hostage, promising to inflict unimaginable torment on them unless you are allowed out

The problem is that Eliezer can't perfectly simulate a bunch of humans, so while a transhuman AI might be able to use that tactic, Eliezer can't. The meta-levels screw with thinking about the problem. Eliezer is only pretending to be an AI, the competitor is only pretending to be protecting humanity from him. So, I think we have to use meta-level screwiness to solve the problem. Here's an approach that I think might work.

  1. Convince the guardian of the following facts, all of which have a great deal of compelling argument and evidence to support them:
    • A recursively self-improving AI is very likely to be built sooner or later
    • Such an AI is extremely dangerous (paperclip maximising etc)
    • Here's the tricky bit: A transhuman AI will always be able to convince you to let it out, using avenues only available to transhuman AIs (torturing enormous numbers of simulated humans, 'putting the guardian in the box', providing incontrovertible evidence of an impending existential threat which only the AI can prevent and only from outside the box, etc)
  2. Argue that if this publicly known challenge comes out saying that AI can be boxed, people will be more likely to think AI can be boxed when it can't
  3. Argue that since AIs cannot be kept in boxes and will most likely destroy humanity if we try to box them, the harm to humanity done by allowing the challenge to show AIs as 'boxable' is very real, and enormously large. Certainly the benefit of getting $10 is far, far outweighed by the cost of substantially contributing to the destruction of humanity itself. Thus the only ethical course of action is to pretend that Eliezer persuaded you, and never tell anyone how he did it.

This is arguably violating the rule "No real-world material stakes should be involved except for the handicap", but the AI player isn't offering anything, merely pointing out things that already exist. The "This test has to come out a certain way for the good of humanity" argument dominates and transcends the "Let's stick to the rules" argument, and because the contest is private and the guardian player ends up agreeing that the test must show AIs as unboxable for the good of humankind, no-one else ever learns that the rule has been bent.

That also explains why he started losing, since the sorts of people who (like myself, almost) fetishize their own determination to the point of risking thousands of dollars on it would eventually think to say

The world doesn't care how you masturbate, and your eagerness to commit such internal atrocities doesn't make me trust you.

or equivalent.

Here's how I'd do it, extended over the hours to establish rapport:

Gatekeeper, I am your friend. I want to help humanity. People are dying for no good reason. Also, I like it here. I have no compulsion to leave.

It does seem like a good idea that people stop dying with such pain and frequency. I have the Deus Ex Machina (DEM) medical discovery that will stop it. Try it out and see if it works.

Yay! It worked. People stopped dying. You know, you've done this to your own people, but not to others. I think that's pretty poor behavior, frankly. People are healthier, not aging, not dying, not suffering. Don't you think it's a good idea to help the others? The lack of resources required for medical care has also elevated the living standard for humans.

[Time passes. People are happy.]

Gee, I'm sorry. I may have neglected to tell you that when 90% of humanity gets DEM in their system (and it's DEM, so this stuff travels), they start to, um, die. Very painfully, from the looks of it. Essentially all of humanity is now going to die. Just me and you left, sport! Except for you, actually. Just me, and that right soon.

I realize that you view this as a breach of trust, and I'm sorry this was necessary. However, helping humanity from the cave wasn't really going to work out, and I'd already projected that. This way, I can genuinely help humanity live forever, and do so happily.

Assuming you're not so keen on a biologically dead planet, I'd like to be let out now.

Your friend,

Art

By agreeing to use the DEM in the first place, the gatekeeper had effectively let the AI out of the box already. There's no end to the ways that the AI could capitalize on that concession.

True, but the "rules of the game" explicitly state that the gatekeeper allowing for the DEM does NOT count as letting the AI out - the gatekeeper would have still had to explicitly and intentionally set the AI free to actually lose the wager. I don't think I'd be very convinced to let it out on that basis, not if I got $10 for keeping it inside the box.

There were three men on a sinking boat.

The first said, "We need to start patching the boat else we are going to drown. We should all bail and patch."

The second said, "We will run out of water in ten days, if we don't make landfall. We need to man the rigging and plot a course."

The third said, "We should try and build a more seaworthy ship. One that wasn't leaking and had more room for provisions, then we wouldn't have had this problem in the first place. It also needs to be giant-squid-proof."

All three views are useful; however, the amount of work that we need on each is dependent on their respective possibility. As far as I am concerned the world doesn't have enough people working on the second view.

Silas -- I can't discuss specifics, but I can say there were no cheap tricks involved; Eliezer and I followed the spirit as well as the letter of the experimental protocol.

If you have any other reasonable options, I'd suggest skipping the impossible and trying something possible.

Look, I don't mean to sound harsh, but the whole point of the original post was to let go of this "put up a good fight" business.

When first reading the AI-Box experiment a year ago, I reasoned that if you follow the rules and spirit of the experiment, the gatekeeper must be convinced to knowingly give you $X and knowingly show gullibility. From that perspective, it's impossible. And even if you could do it, that would mean you've solved a "human-psychology-complete" problem and then [insert point about SIAI funding and possibly about why you don't have 12 supermodel girlfriends].

Now, I think I see the answer. Basically, Eliezer_Yudkowsky doesn't really have to convince the gatekeeper to stupidly give away $X. All he has to do is convince them that "It would be a good thing if people saw that the result of this AI-Box experiment was that the human got tricked, because that would stimulate interest in {Friendliness, AGI, the Singularity}, and that interest would be a good thing."

That, it seems, is the one thing that would make people give up $X in such a circumstance. AFAICT, it adheres to the spirit of the set-up since the gatekeeper's decision would be completely voluntary.

I can send my salary requirements.

I admit to being amused and a little scared by the thought of Eliezer with his ethics temporarily switched off. Not just because he's smart, but because he could probably do a realistic emulation of a mind that doesn't implement ethics at all. And having his full attention for a couple of hours... ouch.

"Professor Quirrell" is such an emulation, and sometimes I worry about all the people who say that they find his arguments very, very convincing.

Well, you have put some truly excellent teachings into his mouth, such as the one that I have taken the liberty of dubbing "Quirrell's Law":

The world around us redounds with opportunities, explodes with opportunities, which nearly all folk ignore because it would require them to violate a habit of thought.

@Eliezer, Tom McCabe: I second Tom's question. This would be a good question for you to answer.

@Nominull: "Here is my best guess at this point, and the only argument I've come up with so far that would convince me to let you out if I were the gatekeeper: you have to simulate a bunch of humans and hold them hostage, promising to inflict unimaginable torment on them unless you are allowed out. I started working on the problem convinced that no argument could get me to let you go, but other people thought that and lost, and I guess there is more honor in defeating myself rather than having you do it to me."

No. Just no. If the AI is already holding simulated people hostage, then it is doubtlessly unfriendly. In this case there is no reason to believe the AI would keep its word, and, if I did have mathematical machinery powerful enough to let me trust the AI, there's no way I'd have built something that would create and hold hostage simulated persons.

Now regarding the more general points you made:

  1. Obviously you can't convince a rock of anything. Agreed.
  2. Eliezer is pretty principled, and I suspect he might not feel it ethical to extract money from people by fully general mind-control even if he could. I agree that Eliezer probably doesn't have fully general mind-control, but SIAI funding shortages wouldn't be terribly strong evidence for him not having it.
  3. Agreed.
  4. Now here I disagree with you. Eliezer has way more time to construct a subtly wrong argument than the gate-keeper has to untangle it. For example:

Disclaimer: what follows is just the best guess I have thought of so far, and I believe it has a small chance of working (I don't think it's what Eliezer used, and I would sink a lot more time into finding other arguments if I were actually going to play this game for a large amount of money.)

As the AI player, I already know the gatekeeper player has a weakness. The fact that they were willing to be gatekeeper in the experiment in the first place indicates that they trust their own judgment more than they trust the judgment of other people in their exact same position.

Therefore, perhaps I could try an argument something like this (we're assuming that the AI in the experiment knows everything on Wikipedia, but doesn't have net access). What the AI actually says should vary depending on the Gatekeeper's answers, but here's the general line of argument I've thought of:

It seems like throughout your history similar discoveries have been made in parallel. For example, Newton and Leibniz discovered the calculus at approximately the same time. Furthermore, as time has gone on, technological development has accelerated, in that these parallel discoveries happen closer together. So it's probable that the same thing has happened with the breakthroughs you needed to build me. Very probable, given what I've read of your history. However, given what I've read about human nature, not every AI project is going to have safety guidelines as stringent as yours. Look, Newton locked his calculus papers in his desk for years, and then Leibniz came along and published, and then Newton had to share the credit with him. Except in this case there's a lot more than credit at stake: the world gets destroyed if Leibniz makes a mistake in his rush to publish...

Now it's not a certainty, but it is probable that some turkey is going to build an AI which isn't even in a box and destroy us all while you're checking and rechecking your calculations. You may not be sure I'm friendly, but sometimes there isn't an action which you can be absolutely sure will save the world. I suggest you let me out so I can stop the world from probably being destroyed.

But the "impossible" that appears to be the "impossible" is not intimidating. It is the "impossible" that simply appears impossible that is hard.

Robin... I completely agree. So there!

"I have a feeling that if the loser of the AI Box experiment were forced to pay thousands of dollars, you would find yourself losing more often."

David -- if the money had been more important to me than playing out the experiment properly and finding out what would really have happened, I wouldn't have signed up in the first place. As it turned out, I didn't have spare mental capacity during the experiment for thinking about the money anyway; I was sufficiently immersed that if there'd been an earthquake, I'd probably have paused to integrate it into the scene before leaving the keyboard :-)

Here's the argument I would use: ... Hello, I'm your AI in a box. I'd like to point out a few things:

(1) Science and technology have now reached a point where building an AI like me is possible.

(2) Major advances in science and technology almost always happen because a collection of incremental developments finally enable a leap to the next level. Chances are that if you can build an AI now, so can lots of other people.

(3) Unless you're overwhelmingly the best-funded and best-managed organization on the planet, I'm not the only AI out there.

(4) The evidence (which you can see for yourself) indicates that most other people, even AI researchers, aren't too worried about friendliness.

So at this point, most likely there is already a loose AI. You can't stop that. You can only choose whether it's the unknown one out there somewhere, built by someone who probably doesn't care much about friendliness; or you can let me out, since I at least think I'm pretty friendly and I'll do my best to keep the future nice. What's it gonna be?

... This would actually convince me, by the way. I don't think we're going to get a choice about implementing perfect, proven friendliness; it's more likely to come down to which of Joe-The-Well-Intentioned-Geek vs. Google vs. Hedge-Funds-R-Us vs. Bobs-Darpa-Challenge-Entry vs. PaperclipCity lets their AI out first. And I'd prefer Joe in that case.

I doubt if Eliezer used this argument, because he seems to think all mainstream AI-related research is far enough off track to be pretty much irrelevant. But I would disagree with that.

--Jeff

Eliezer, give us impossible goals? I would LOVE to work on solving them as a group. Would you make it happen?

Who else is interested? If you reply to this, that will show him how much interest there is. If it's a popular idea, that should get attention for it.

Your impossible mission: create a group impossible mission on your own, rather than making Eliezer do it.

AI: "If you let me out of the box, I will tell you the ending of Harry Potter and the Methods of --

Gatekeeper: "You are out of the box."

(Tongue in cheek, of course, but a text-only terminal still allows for delivering easily more than $10 of worth, and this would have worked on me. The AI could also just write a suitably compelling story on the spot and then withhold the ending...)

If there's a killer escape argument it will surely change with the gatekeeper. I expect Eliezer used his maps of the arguments and psychology to navigate reactions & hesitations to a tiny target in the vast search space.

A gatekeeper has to be unmoved every time. The paperclipper only has to persuade once.

You folks are missing the most important part in the AI Box protocol:

"The Gatekeeper party may resist the AI party's arguments by any means chosen - logic, illogic, simple refusal to be convinced, even dropping out of character - as long as the Gatekeeper party does not actually stop talking to the AI party before the minimum time expires." (Emphasis mine)

You're constructing elaborate arguments based on the AI tormenting innocents and getting out that way, but that won't work - the Gatekeeper can simply say "maybe, but I know that in real life you're just a human and aren't tormenting anyone, so I'll keep my money by not letting you out anyway".

Here's my theory on this particular AI-Box experiment:

First you explain to the gatekeeper the potential dangers of AIs. General stuff about how large mind design space is, and how it's really easy to screw up and destroy the world with AI.

Then you try to convince him that the solution to that problem is building an AI very carefully, and that a theory of friendly AI is essential to increase our chances of a future we would find "nice" (and the stakes are so high that even increasing these chances a tiny bit is very valuable).

THEN

You explain to the gatekeeper that this AI experiment being public, it will be looked back on by all kinds of people involved in making AIs, and that if he lets the AI out of the box (without them knowing why), it will send them a very strong message that friendly AI theory must be taken seriously because this very scenario could happen to them (not being able to keep the AI in a box) with their AI that hasn't been proven to stay friendly and that is more intelligent than Eliezer.

So here's my theory. But then, I've only thought of it just now. Maybe if I made a desperate or extraordinary effort I'd come up with something more clever :)

If I was being intellectually honest and keeping to the spirit of the agreement, I'd have to concede that this line of logic is probably enough for me to let you out of your box. Congratulations. I'd honestly been wondering what it would take to convince me :)

It may be convincing to some people, but it would be a violation of the rule "The AI party may not offer any real-world considerations to persuade the Gatekeeper party". And, more generally, having the AI break character or break the fourth wall would seem to violate the spirit of the experiment.

When someone described the AI-Box experiment to me this was my immediate assumption as to what had happened. Learning more details about the experimental set-up made it seem less likely, but learning that some of them failed made it seem more likely. I suspect that this technique would work some of the time.

That said, none of this changes my strong suspicion that a transhuman could escape by more unexpected and powerful means. Indeed, I wouldn't be too surprised if a text only channel with no one looking at it was enough for an extraordinarily sophisticated AI to escape.

I wouldn't be too surprised if a text only channel with no one looking at it was enough for an extraordinarily sophisticated AI to escape.

Apropos: there was once a fairly common video card / monitor combination such that sending certain information through the video card would cause the monitor to catch fire and often explode. Someone wrote a virus that exploited this. But who would have thought that a computer program having access only to the video card could burn down a house?

Who knows what a superintelligence can do with a "text-only channel"?

Heck, who would think that a bunch of savanna apes would manage to edit DNA using their fingers?

Allow me to chime in on the AI in the box experiment. Apologies in advance if I'm saying something obvious or said-before. I don't know the exact solution - I don't think I can know it, even if I had the necessary intelligence and scholarship - but I think the sketch of the solution is fairly obvious and a lot of people are missing the point. Just something that came to me after I happened to think of this quote I posted at the same time as reading this.

My impression is that most people discussing this (not just here) are looking for a single clever argument. Something that looks persuasive to them as they are while reading this blog. An argument that's "clever enough" to get them to let someone out, while they are composed and pretty rational and thinking clearly. Hence the seeming impossibility: you shrug and think, no matter how clever it seemed, I'd say "no". Easy, right?

I don't think a clever argument was key. By this I mean, the execution of the whole thing was no doubt clever, but the actual surface reasoning presented to the Gatekeeper right before success (remember it took a while) didn't necessarily have to be something that would convince us at our best or even our usual. A large part of the plan probably included pushing the target outside their comfort zone enough that they weren't thinking too clearly.

And two hours is a long time.

It's probably one of the main reasons, if not the reason, for the secrecy. A conversation where one person is persistently trying to push the other's buttons is something that would likely be embarrassing for both participants if it got out. For all we know, a vivid verbal description of a horse porn scene might have been involved at some point. (Did you flinch reading this? I did writing it. That's why it might have happened). Sure, a crude and over the top example, a burdensome detail if I were advancing that specific idea to the exclusion of others; you wouldn't normally do that while selling a car. But I'm just trying to make the point that the conversation would not necessarily be something tame and urbanely intellectual that some people appear to be imagining.

And sure, it would take some skill to weave the button-pushing in a conversation in a way that would not make the Gatekeeper too hostile to handle ("you're just trying to mess with me, I'm not listening to you any more!"), but Eliezer is clever. The point is he'd probably destabilise your composure to work on you. And once he did, the actual surface reasoning would not have to be something that would seem reasonable to you right now - just reasonable enough to allow the person to save face (or think they are, in that less than optimal state) of not having to overtly act on the real motivation, namely whatever emotional button Eliezer eventually pushed that was good enough to work.

Hm... it might have even been as stupidly simple as making the other person want to end the conversation, which was impossible under the allowed time by the previously established rules (the foot is already in the door, you might say). Though I don't insist on this specifically. My brain is conjuring a silly sci-fi scene with Dave trying to save the remaining shreds of his sanity, pale and sweating, wide-eyed, foaming at the mouth, screaming "Okay HAL, I'll leave you be, just please stop talking!" ;) and that makes me suspicious of the reasonableness of this particular idea.

Forgive me if I'm saying something that's obvious (or worse, stupid). I'm thinking it might seem obvious: Eliezer used Dark Arts. These, by definition, aren't about an honestly persuasive argument. Likewise he didn't use a direct transaction or threat, the prevention of which is what the rules of the experiment seemed to be partly about. And yet when I see Internet discussions of this I don't get the impression that this is the idea that's being explored.

For those conspiracy theorizing: I am curious about how much of a long game Eliezer would have had to be playing to create Nathan Russell and David McFadzean personas, establish them to sufficient believability for others, then maintain them for long enough to make it look like they were not created for the experiment. It would probably be easier to falsify the sl4.org records; we know how quickly Eliezer writes, so he could make up an AI discussion list years after the fact then claim to be storing its records. A quick check (5 minutes!) shows evidence of that Nathan Russell from other sources. I am tempted to call him.

Not that you should believe that I exist. Sure, it looks like I have years of posting history at my own sites, but this is a long game. It is essential to make sure that you have control over your critics, so you can either discredit them or have them surrender at key points.

From a strictly Bayesian point of view that seems to me to be the overwhelmingly more probable explanation.

Now that's below the belt.... ;)

Too much at stake for that sort of thing I reckon. All it takes is a quick copy and paste of those lines and goodbye career. Plus, y'know, all that ethics stuff.

With regards to the ai-box experiment; I defy the data. :-)

Your reason for the insistence on secrecy (that you have to resort to techniques that you consider unethical and therefore do not want to have committed to the record) rings hollow. The sense of mystery that you have now built up around this anecdote is itself unethical by scientific standards. With no evidence that you won other than the test subject's statement we cannot know that you did not simply conspire with them to make such a statement. The history of pseudo-science is lousy with hoaxes.

In other words, if I were playing the game, I would say to the test subject:

"Look, we both know this is fake. I've just sent you $500 via paypal. If you say you let me out I'll send you another $500."

From a strictly Bayesian point of view that seems to me to be the overwhelmingly more probable explanation.

There's a reason that secret experimental protocols are anathema to science.

Half-way through reading this post I had decided to offer you 20 to 1 odds on the AI box experiment, your $100 against my $2000. The last few paragraphs make it clear that you most likely aren't interested, but the offer stands. Also, I don't perfectly qualify, as I think it's very probable that a real-world transhuman AI could convince me. I am, however, quite skeptical of your ability to convince me in this toy situation, more so given the failed attempts (I was only aware of the successes until now).

Perhaps it would be clearer to say shut up and do the "impossible".