Pascal's Mugging: Tiny Probabilities of Vast Utilities

The most common formalizations of Occam's Razor, Solomonoff induction and Minimum Description Length, measure the program size of a computation used in a hypothesis, but don't measure the running time or space requirements of the computation.  What if this makes a mind vulnerable to finite forms of Pascal's Wager?  A compactly specified wager can grow in size much faster than it grows in complexity.  The utility of a Turing machine can grow much faster than its prior probability shrinks.

Consider Knuth's up-arrow notation:

  • 3^3 = 3*3*3 = 27
  • 3^^3 = (3^(3^3)) = 3^27 = 3*3*3*3*3*3*3*3*3*3*3*3*3*3*3*3*3*3*3*3*3*3*3*3*3*3*3 = 7625597484987
  • 3^^^3 = (3^^(3^^3)) = 3^^7625597484987 = 3^(3^(3^(... 7625597484987 times ...)))

In other words:  3^^^3 describes an exponential tower of threes 7625597484987 layers tall.  Since this number can be computed by a simple Turing machine, it contains very little information and requires a very short message to describe.  This, even though writing out 3^^^3 in base 10 would require enormously more writing material than there are atoms in the known universe (a paltry 10^80).
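For concreteness, here is a minimal sketch (in Python; my own illustration, not part of the original post) of how Knuth's up-arrow notation can be defined recursively. Everything past 3^^3 is hopeless to evaluate in practice, which is rather the point.

    def up_arrow(a, n, b):
        """Knuth's up-arrow: a followed by n up-arrows followed by b, for n >= 1."""
        if n == 1:
            return a ** b                  # one arrow is ordinary exponentiation
        if b == 0:
            return 1                       # base case of the recursion
        return up_arrow(a, n - 1, up_arrow(a, n, b - 1))

    print(up_arrow(3, 1, 3))   # 3^3  = 27
    print(up_arrow(3, 2, 3))   # 3^^3 = 7625597484987
    # up_arrow(3, 3, 3) would be an exponential tower 7625597484987 levels tall,
    # far too large to compute, even though its definition is only a few lines.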

Now suppose someone comes to me and says, "Give me five dollars, or I'll use my magic powers from outside the Matrix to run a Turing machine that simulates and kills 3^^^^3 people."

Call this Pascal's Mugging.

"Magic powers from outside the Matrix" are easier said than done - we have to suppose that our world is a computing simulation run from within an environment that can afford simulation of arbitrarily large finite Turing machines, and that the would-be wizard has been spliced into our own Turing tape and is in continuing communication with an outside operator, etc.

Thus the Kolmogorov complexity of "magic powers from outside the Matrix" is larger than the mere English words would indicate.  Therefore the Solomonoff-inducted probability, two to the negative Kolmogorov complexity, is exponentially tinier than one might naively think.

But, small as this probability is, it isn't anywhere near as small as 3^^^^3 is large.  If you take a decimal point, followed by a number of zeros equal to the length of the Bible, followed by a 1, and multiply this unimaginably tiny fraction by 3^^^^3, the result is pretty much 3^^^^3.
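To put rough numbers on that claim, here is a back-of-the-envelope sketch (Python; my own illustration). The Bible-length fraction is taken to be roughly 10^-4,000,000, an assumed order of magnitude only, and 3^^4 stands in for 3^^^^3, since nothing past that level is representable even in logarithms.

    import math

    log10_probability = -4_000_000   # assumed order of magnitude for the Bible-length fraction

    # 3^^4 = 3^(3^27) = 3^7625597484987, a stand-in for the incomparably larger 3^^^^3.
    log10_payoff = 7_625_597_484_987 * math.log10(3)   # about 3.6e12, i.e. trillions of digits

    print(f"{log10_probability + log10_payoff:.3e}")    # about +3.6e12

The "unimaginably tiny" fraction shaves a few million digits off a number with trillions of them; against 3^^^^3 itself the dent would be even less noticeable.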

Most people, I think, envision an "infinite" God that is nowhere near as large as 3^^^^3.  "Infinity" is reassuringly featureless and blank.  "Eternal life in Heaven" is nowhere near as intimidating as the thought of spending 3^^^^3 years on one of those fluffy clouds.  The notion that the diversity of life on Earth springs from God's infinite creativity, sounds more plausible than the notion that life on Earth was created by a superintelligence 3^^^^3 bits large.  Similarly for envisioning an "infinite" God interested in whether women wear men's clothing, versus a superintelligence of 3^^^^3 bits, etc.

The original version of Pascal's Wager is easily dealt with by the gigantic multiplicity of possible gods, an Allah for every Christ and a Zeus for every Allah, including the "Professor God" who places only atheists in Heaven.   And since all the expected utilities here are allegedly "infinite", it's easy enough to argue that they cancel out.  Infinities, being featureless and blank, are all the same size.

But suppose I built an AI which worked by some bounded analogue of Solomonoff induction - an AI sufficiently Bayesian to insist on calculating complexities and assessing probabilities, rather than just waving them off as "large" or "small".

If the probabilities of various scenarios considered did not exactly cancel out, the AI's action in the case of Pascal's Mugging would be overwhelmingly dominated by whatever tiny differentials existed in the various tiny probabilities under which 3^^^^3 units of expected utility were actually at stake.

You or I would probably wave off the whole matter with a laugh, planning according to the dominant mainline probability:  Pascal's Mugger is just a philosopher out for a fast buck.

But a silicon chip does not look over the code fed to it, assess it for reasonableness, and correct it if not.  An AI is not given its code like a human servant given instructions.  An AI is its code.  What if a philosopher tries Pascal's Mugging on the AI for a joke, and the tiny probabilities of 3^^^^3 lives being at stake, override everything else in the AI's calculations?   What is the mere Earth at stake, compared to a tiny probability of 3^^^^3 lives?

How do I know to be worried by this line of reasoning?  How do I know to rationalize reasons a Bayesian shouldn't work that way?  A mind that worked strictly by Solomonoff induction would not know to rationalize reasons that Pascal's Mugging mattered less than Earth's existence.  It would simply go by whatever answer Solomonoff induction obtained.

It would seem, then, that I've implicitly declared my existence as a mind that does not work by the logic of Solomonoff, at least not the way I've described it.  What am I comparing Solomonoff's answer to, to determine whether Solomonoff induction got it "right" or "wrong"?

Why do I think it's unreasonable to focus my entire attention on the magic-bearing possible worlds, faced with a Pascal's Mugging?  Do I have an instinct to resist exploitation by arguments "anyone could make"?  Am I unsatisfied by any visualization in which the dominant mainline probability leads to a loss?  Do I drop sufficiently small probabilities from consideration entirely?  Would an AI that lacks these instincts be exploitable by Pascal's Mugging?

Is it me who's wrong?  Should I worry more about the possibility of some Unseen Magical Prankster of very tiny probability taking this post literally, than about the fate of the human species in the "mainline" probabilities?

It doesn't feel to me like 3^^^^3 lives are really at stake, even at very tiny probability.  I'd sooner question my grasp of "rationality" than give five dollars to a Pascal's Mugger because I thought it was "rational".

Should we penalize computations with large space and time requirements?  This is a hack that solves the problem, but is it true? Are computationally costly explanations less likely?  Should I think the universe is probably a coarse-grained simulation of my mind rather than real quantum physics, because a coarse-grained human mind is exponentially cheaper than real quantum physics?  Should I think the galaxies are tiny lights on a painted backdrop, because that Turing machine would require less space to compute?

Given that, in general, a Turing machine can increase in utility vastly faster than it increases in complexity, how should an Occam-abiding mind avoid being dominated by tiny probabilities of vast utilities?

If I could formalize whichever internal criterion was telling me I didn't want this to happen, I might have an answer.

I talked over a variant of this problem with Nick Hay, Peter de Blanc, and Marcello Herreshoff in summer of 2006.  I don't feel I have a satisfactory resolution as yet, so I'm throwing it open to any analytic philosophers who might happen to read Overcoming Bias.

Comments


Why would not giving him $5 make it more likely that people would die, as opposed to less likely? The two would seem to cancel out. It's the same old "what if we are living in a simulation?" argument - it is, at least, possible that me hitting the sequence of letters "QWERTYUIOP" leads to a near-infinity of death and suffering in the "real world", due to AGI overlords with wacky programming. Yet I do not refrain from hitting those letters, because there's no entanglement which drives the probabilities in that direction as opposed to some other random direction; my actions do not alter the expected future state of the universe. You could just as easily wind up saving lives as killing people.

Because he said so, and people tend to be true to their word more often than dictated by chance.

The mugger claims to not be a 'person' in the conventional sense, but rather an entity with outside-Matrix powers. If this statement is true, then generalized observations about the reference class of 'people' cannot necessarily be considered applicable.

Conversely, if it is false, then this is not a randomly-selected person, but rather someone who has started off the conversation with an outrageous profit-motivated lie, and as such cannot be trusted.

They claim to not be a human. They're still a person, in the sense of a sapient being. As a larger class, you'd expect lower correlation, but it would still be above zero.

"Would any commenters care to mug Tiiba? I can't quite bring myself to do it, but it needs doing."

If you don't donate $5 to SIAI, some random guy in China will die of a heart attack because we couldn't build FAI fast enough. Please donate today.

Tom and Andrew, it seems very implausible that someone saying "I will kill 3^^^^3 people unless X" is literally zero Bayesian evidence that they will kill 3^^^^3 people unless X. Though I guess it could plausibly be weak enough to take much of the force out of the problem.

Andrew, if we're in a simulation, the world containing the simulation could be able to support 3^^^^3 people. If you knew (magically) that it couldn't, you could substitute something on the order of 10^50, which is vastly less forceful but may still lead to the same problem.

Andrew and Steve, you could replace "kill 3^^^^3 people" with "create 3^^^^3 units of disutility according to your utility function". (I respectfully suggest that we all start using this form of the problem.)

Michael Vassar has suggested that we should consider any number of identical lives to have the same utility as one life. That could be a solution, as it's impossible to create 3^^^^3 distinct humans. But, this also is irrelevant to the create-3^^^^3-disutility-units form.

IIRC, Peter de Blanc told me that any consistent utility function must have an upper bound (meaning that we must discount lives like Steve suggests). The problem disappears if your upper bound is low enough. Hopefully any realistic utility function has such a low upper bound, but it'd still be a good idea to solve the general problem.
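As a toy illustration of the bounded-utility escape route (my own example with made-up numbers, not de Blanc's argument): with a saturating utility function, even 3^^^^3 lives contribute at most the bound, and a tiny probability times that bound is negligible.

    import math

    U_MAX = 1_000_000.0   # assumed upper bound on total utility
    SCALE = 1.0e9         # assumed number of lives at which utility nears the bound

    def bounded_utility(lives):
        # Saturating utility: each additional life matters less; nothing exceeds U_MAX.
        return U_MAX * (1.0 - math.exp(-lives / SCALE))

    print(bounded_utility(10 ** 6))              # about 999.5, far from the bound
    print(bounded_utility(10 ** 300))            # stand-in for 3^^^^3: pinned at U_MAX
    print(1e-20 * bounded_utility(10 ** 300))    # mugger's expected stake: about 1e-14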

I see a similarity to the police chief example. Adopting a policy of paying attention to any Pascalian muggings would encourage others to manipulate you using them. At first it doesn't seem like this would have nearly enough disutility to justify ignoring muggings, but it might when you consider that it would interfere with responding to any real threat (unlikely as it is) of 3^^^^3 deaths.

create 3^^^^3 units of disutility according to your utility function

For all X:

If your utility function assigns values to outcomes that differ by a factor of X, then you are vulnerable to becoming a fanatic who banks on scenarios that only occur with probability 1/X. As simple as that.

If you think that banking on scenarios that only occur with probability 1/X is silly, then you have implicitly revealed that your utility function only assigns values in the range [1,Y], where Y<X, and where 1 is the lowest utility you assign.

If you think that banking on scenarios that only occur with probability 1/X is silly, then you have implicitly revealed that your utility function only assigns values in the range [1,Y], where Y<X, and where 1 is the lowest utility you assign.

... or your judgments of silliness are out of line with your utility function.

However clever your algorithm, at that level, something's bound to confuse it.

Odd, I've been reading moral paradoxes for many years and my brain never crashed once, nor have I turned evil. I've been confused but never catastrophically so (though I have to admit my younger self came close). My algorithm must be "beyond clever".

That's a remarkable level of resilience for a brain design which is, speaking professionally, a damn ugly mess. If I can't aspire to do at least that well, I may as well hang up my shingle and move in with the ducks.

The modern human nervous system is the result of upwards of a hundred thousand years of brutal field-testing. The basic components, and even whole submodules, can be traced back even further. A certain amount of resiliency is to be expected. If you want to start from scratch and aspire to the same or higher standards of performance, it might be sensible to be prepared to invest the same amount of time and capital that the BIG did.

That you have not yet been crippled by a moral paradox or other standard rhetorical trick is comparable to saying that a server remains secure after a child spent an afternoon poking around with it and trying out lists of default passwords: a good sign, certainly, and a test many would fail, but not in itself proof of perfection.

It seems to me that the cancellation is an artifact of the particular example, and that it would be easy to come up with an example in which the cancellation does not occur. For example, maybe you have previous experience with the mugger. He has mugged you before about minor things and sometimes you have paid him and sometimes not. In all cases he has been true to his word. This would seem to tip the probabilities at least slightly in favor of him being truthful about his current much larger threat.

3. Even if you don't accept 1 and 2 above, there's no reason to expect that the person is telling the truth. He might kill the people even if you give him the $5, or conversely he might not kill them even if you don't give him the $5.

But if a Bayesian AI actually calculates these probabilities by assessing their Kolmogorov complexity - or any other technique you like, for that matter - without desiring that they come out exactly equal, can you rely on them coming out exactly equal? If not, an expected utility differential of 2 to the negative googolplex times 3^^^^3 still equals 3^^^^3, so whatever tiny probability differences exist will dominate all calculations based on what we think of as the "real world" (the mainline of probability with no wizards).

if you have the imagination to imagine X to be super-huge, you should be able to have the imagination to imagine p to be super-small

But we can't just set the probability to anything we like. We have to calculate it, and Kolmogorov complexity, the standard accepted method, will not be anywhere near that super-small.

Tom and Andrew, it seems very implausible that someone saying "I will kill 3^^^^3 people unless X" is literally zero Bayesian evidence that they will kill 3^^^^3 people unless X. Though I guess it could plausibly be weak enough to take much of the force out of the problem.

Nothing could possibly be that weak.

Tom is right that the possibility that typing QWERTYUIOP will destroy the universe can be safely ignored; there is no evidence either way, so the probability equals the prior, and the Solomonoff prior that typing QWERTYUIOP will save the universe is, as far as we know, exactly the same.

Exactly the same? These are different scenarios. What happens if an AI actually calculates the prior probabilities, using a Solomonoff technique, without any a priori desire that things should exactly cancel out?

I'm hereby anti-mugging you all. If any of you give in to a Pascal's Mugging scenario, I'll do something much worse than whatever the mugger threatened. Consider yourself warned!

Konrad: In computational terms, you can't avoid using a 'hack'. Maybe not the hack you described, but something, somewhere has to be hard-coded.

Well, yes. The alternative to code is not solipsism, but a rock, and even a rock can be viewed as being hard-coded as a rock. But we would prefer that the code be elegant and make sense, rather than using a local patch to fix specific problems as they come to mind, because the latter approach is guaranteed to fail if the AI becomes more powerful than you and refuses to be patched.

Andrew: You're saying that your priors have to come from some rigorous procedure

The priors have to come from some computable procedure. We would prefer it to be a good one, as agents with nonsense priors will not attain sensible posteriors.

but your utility comes from simply transcribing what some dude says to you.

No. Certain hypothetical scenarios, which we describe using the formalism of Turing machines, have fixed utilities - that is, if some description of the universe is true, it has a certain utility.

The problem with this scenario is not that we believe everything the dude tells us. The problem is that the description of a certain very large universe with a very large utility does not have a correspondingly tiny prior probability if we use Solomonoff's prior. And then as soon as we see any evidence, no matter how tiny, whose entanglement with the scenario is not as small as the universe is large, that expected utility differential instantly wipes out all other factors in our decision process.

Second, even if for some reason you really want to work with the utility of 3^^^^3, there's no good reason for you not to consider the possibility that it's really -3^^^^3, and so you should be doing the opposite.

A Solomonoff inductor might indeed consider it, though there's the problem of any bounded rationalist not being able to consider all computations. It seems "reasonable" for a bounded mind to consider it here; you did, after all.

The issue is not that two huge numbers will exactly cancel out; the point is that you're making up all the numbers here but are artificially constraining the expected utility differential to be positive.

Let the differential be negative. Same problem. If the differential is not zero, the AI will exhibit unreasonable behavior. If the AI literally thinks in Solomonoff induction (as I have described), it won't want the differential to be zero, it will just compute it.

It might be more promising to assume that states with many people hurt have a low correlation with what any random person claims to be able to effect.

Robin: Great point about states with many people having low correlations with what one random person can effect. This is fairly trivially provable.

Aha!

For some reason, that didn't click in my mind when Robin said it, but it clicked when Vassar said it. Maybe it was because Robin specified "many people hurt" rather than "many people", or because Vassar's part about being "provable" caused me to actually look for a reason. When I read Robin's statement, it came through as just "Arbitrarily penalize probabilities for a lot of people getting hurt."

But, yes, if you've got 3^^^^3 people running around they can't all have sole control over each other's existence. So in a scenario where lots and lots of people exist, one has to penalize by a proportional factor the probability that any one person's binary decision can solely control the whole bunch.

Even if the Matrix-claimant says that the 3^^^^3 minds created will be unlike you, with information that tells them they're powerless, if you're in a generalized scenario where anyone has and uses that kind of power, the vast majority of mind-instantiations are in leaves rather than roots.

This seems to me to go right to the root of the problem, not a full-fledged formal answer but it feels right as a starting point. Any objections?
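One way to make that penalty concrete (my own toy formalization with made-up numbers, not anything stated in the thread): cap the probability that a single decision controls N lives at 1/N, so the expected stake of any mugging never exceeds the value of one life.

    def penalized_expected_utility(claimed_lives, naive_probability, utility_per_life=1.0):
        # At most a 1/claimed_lives share of that many people can have their existence
        # hang on this one person's decision, so cap the probability there.
        p = min(naive_probability, 1.0 / claimed_lives)
        return p * claimed_lives * utility_per_life

    # Even with a generous naive probability, the result never exceeds utility_per_life.
    print(penalized_expected_utility(claimed_lives=10 ** 100, naive_probability=1e-20))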

People have been talking about assuming that states with many people hurt have a low (prior) probability. It might be more promising to assume that states with many people hurt have a low correlation with what any random person claims to be able to effect.

g: killing 3^^^^^3 puppies doesn't feel much worse than killing 3^^^^3 puppies

...

..........................

I hereby award G the All-Time Grand Bull Moose Prize for Non-Extensional Reasoning and Scope Insensitivity.

Clough: On the contrary, I think it is not only that weak but actually far weaker. If you are willing to consider the existence of things like 3^^^3 units of disutility without considering the existence of chances like 1/4^^^4, then I believe that is the problem that is causing you so much trouble.

I'm certainly willing to consider the existence of chances like that, but to arrive at such a calculation, I can't be using Solomonoff induction.

Consider the plight of the first nuclear physicists, trying to calculate whether an atomic bomb could ignite the atmosphere. Yes, they had to do this calculation! Should they have not even bothered, because it would have killed so many people that the prior probability must be very low? The essential problem is that the universe doesn't care one way or the other and therefore events do not in fact have probabilities that diminish with increasing disutility.

Likewise, physics does not contain a clause prohibiting comparatively small events from having large effects. Consider the first replicator in the seas of ancient Earth.

Tiiba: You don't want an AI to think like this because you don't want it to kill you. Meanwhile, to a true altruist, it would make perfect sense.

So you're biting the bullet and saying that, faced with a Pascal's Mugger, you should give him the five dollars?

Would any commenters care to mug Tiiba? I can't quite bring myself to do it, but it needs doing.

Krishnaswami: Utility functions have to be bounded basically because genuine martingales screw up decision theory -- see the St. Petersburg Paradox for an example.

One deals with the St. Petersburg Paradox by observing that the resources of the casino are finite; it is not necessary to bound the utility function itself when you can bound the game within your world-model.
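A quick numerical illustration of that observation (a sketch only; the 2^40-dollar bankroll is an assumed cap, and the game is the standard double-or-nothing St. Petersburg payout):

    BANKROLL = 2 ** 40   # assumed: the most the casino can ever pay out

    expected_value = 0.0
    k, payout = 1, 2
    while payout <= BANKROLL:
        # With probability 1/2^k the first tails comes on flip k, paying 2^k dollars.
        expected_value += (0.5 ** k) * payout
        k, payout = k + 1, payout * 2

    # Any longer run of heads just pays out the whole bankroll.
    expected_value += (0.5 ** (k - 1)) * BANKROLL

    print(expected_value)   # 41.0: finite, even though the unbounded game diverges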

This is an instance of the general problem of attaching a probability to matrix scenarios. And you can pascal-mug yourself, without anyone showing up to assert or demand anything - just think: what if things are set up so that whether I do, or do not do, something, determines whether those 3^^^^3 people will be created and destroyed? It's just as possible as the situation in which a messenger from Outside shows up and tells you so.

The obvious way to attach probabilities to matrix scenarios is to have a unified notion of possible world capacious enough to encompass both matrix worlds and worlds in which your current experiences are veridical; and then you look at relative frequencies or portions of world-measure for the two classes of possibility. For example, you could assume the correctness of our current physics across all possible worlds, and then make a Drake/Bostrom-like guesstimate of the frequency of matrix construction across all those universes, and of the "demography" and "political character" of those simulations. Garbage in, garbage out; but you really can get an answer if you make enough assumptions. In that regard, it is not too different to any other complicated decision made against a background of profound uncertainty.

[Late edit: I have since retracted this solution as wrong, see comments below; left here for completeness. The ACTUAL solution that really works I've written in a different comment :) ]

I do believe I've solved this. Don't know if anyone is still reading or not after all this time, but here goes.

Eliezer speaks of the symmetry of Pascal's wager; I'm going to use something very similar here to solve the issue. The number of things that could happen next - say, in the next nanosecond - is infinite, or at the very least incalculable. A lot of mundane things could happen, or a lot of unforeseen things could happen. It could happen that a car would go through my living room and kill me. Or it could happen that the laws of energy conservation were violated and the whole world would turn into bleu cheese. Each of these possibilities could, in theory, have a probability assigned to it, given our priors.

But! We only have enough computing power to calculate a finite number of outcomes at any given moment. That means that we CANNOT go around assigning probabilities by calculation. Rather, we're going to need some heuristic to deal with all the probabilities we do NOT calculate.

Suppose our AI is very good at predicting things. It manages to assign SOME probability to what will happen next about 99% of the time (Note: My solution works equally well for anything from 0% to 100% minus epsilon - and I shouldn't have to explain why a Bayesian AI should never be 100% certain that it got an answer right). That means that 1% of the time, something REALLY surprises it; it just did not assign any probability at all. Now, because the number of things that could be in that category is infinite, they cancel out. Sure, we could all turn to cheese if it says "abracadabra". Or we could turn to cheese UNLESS it says so. The utility functions will always end in 0 for the uncalculated mass of probabilities.

That means that the AI always works under the assumptions that "or something I didn't see coming will happen; but I must be neutral regarding such an outcome until I know more about it".

Now. Say the AI manages to consider 1 million possibilities per prediction it makes (how it still gets 1% of them wrong is beyond me but again, the exact number doesn't matter for my solution). So any outcome that has NOT been calculated could, in fact, be considered to have a probability of 1% / 1 million - not because there are only a million possibilities the AI hasn't considered, but because that is how many it could TRY to consider.

This number is your cutoff. Before you multiply a probability with a utility function, you subtract this number from the probability, first. So now if someone comes up to you and says it'll kill 3^^^^3 people and you decide to actually spend the cycles to consider how likely that is, and you get 1/googol, that number is LESS than the background noise of everything you don't have time to calculate. You round it down to zero, not because it is arbitrarily small enough, but because anything you have not considered for calculation must be considered to have higher probability - and like in Pascal's wager, those options' utility is infinite and can counter any number that Pascal's Mugger can throw at me. You subtract, not an arbitrary number, but rather a number depending on how long the AI is thinking about the problem; how many possibilities it takes into account.

Does this solve the problem? I think it does.

(By the way: ChrisA's way also works against this problem, except that coding your AI so that it may disregard value and morality if certain conditions are met seems like a pretty risky proposition).
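A minimal sketch of the cutoff rule described above (my own formalization with made-up numbers; the commenter later retracted the idea, but this is roughly what it proposes):

    SURPRISE_RATE = 0.01                # assumed: how often nothing the AI foresaw happens
    HYPOTHESES_CONSIDERED = 1_000_000   # assumed: outcomes the AI can evaluate per decision

    NOISE_FLOOR = SURPRISE_RATE / HYPOTHESES_CONSIDERED   # 1e-8 with these numbers

    def adjusted_expected_utility(outcomes):
        # outcomes: (probability, utility) pairs the AI explicitly evaluated.
        total = 0.0
        for p, u in outcomes:
            # Anything below the noise floor of unconsidered hypotheses rounds to zero.
            total += max(p - NOISE_FLOOR, 0.0) * u
        return total

    # The mugger's scenario at 1/googol falls below the floor and contributes nothing,
    # however large its (here merely float-sized) stand-in disutility.
    print(adjusted_expected_utility([(1e-100, -1e300), (0.999, 5.0)]))   # about 4.995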

Benquo, replace "kill 3^^^^3 people" with "create 3^^^^3 disutility units" and the problem reappears.

Michael, do you really think the mugger's statement is zero evidence?

Very interesting thought experiment!

One place where it might fall down is that our disutility for causing deaths is probably not linear in the number of deaths, just as our utility for money flattens out as the amount gets large. In fact, I could imagine that its value is connected to our ability to intuitively grasp the numbers involved. The disutility might flatten out really quickly so that the disutility of causing the death of 3^^^^3 people, while large, is still small enough that the small probabilities from the induction are not overwhelmed by it.

The guy with the button could threaten to make an extra-planar factory farm containing 3^^^^^3 pigs instead of killing 3^^^^3 humans. If utilities are additive, that would be worse.

Congratulations, you made my brain asplode.

I think that Robin's point solves this problem, but doesn't solve the more general problem of an AGI's reaction to low probability high utility possibilities and the attendant problems of non-convergence.

Robin: Great point about states with many people having low correlations with what one random person can effect. This is fairly trivially provable.

Utilitarian: Equal priors due to complexity, equal posteriors due to lack of entanglement between claims and facts.

Wei Dai, Eliezer, Stephen, g: This is a great thread, but it's getting very long, so it seems likely to be lost to posterity in practice. Why don't the three of you read the paper Neel Krishnaswami referenced, have a chat, and post it on the blog, possibly edited, as a main post?

"The paper I referenced:

Vann McGee (1999)
An airtight Dutch book
Analysis 59 (264), 257–265.

Posted by: Neel Krishnaswami | October 20, 2007 at 06:29 PM"

Is the value of my existence steadily shrinking as the universe expands and it requires more information to locate me in space?

If I make a large uniquely structured arrow pointing at myself from orbit so that a very simple Turing machine can scan the universe and locate me, does the value of my existence go up?

I am skeptical that this solution makes moral sense, however convenient it might be as a patch to this particular problem.

You could always just give up being a consequentialist and deontologically refuse to give in to the demands of anyone taking part in a Pascal mugging, because consistently doing so would lead to the breakdown of society.

pdf23ds, under certain straightforward physical assumptions, 3^^^^3 people wouldn't even fit in anyone's future light-cone, in which case the probability is literally zero. So the assumption that our apparent physics is the physics of the real world too, really could serve to decide this question. The only problem is that that assumption itself is not very reasonable.

Lacking for the moment a rational way to delimit the range of possible worlds, one can utilize what I'll call a Chalmers prior, which simply specifies directly how much time you will spend thinking about matrix scenarios. (I name it for David Chalmers because I once heard him give an estimate of the odds that we are in a matrix; I think it was about 10%.) The rationality of having a Chalmers prior can be justified by observing one's own cognitive resource-boundedness, and the apparently endless amount of time one could spend thinking about matrix scenarios. (Is there a name for this sort of scheduling meta-heuristic, in which one limits the processing time available for potentially nonterminating lines of thought?)

This case seems to suggest the existence of new interesting rationality constraints, which would go into choosing rational probabilities and utilities. It might be worth working out what constraints one would have to impose to make an agent immune to such a mugging.

Incidentally: How would it affect your intuition if you instead could participate in the Intergalactic Utilium Lottery, where probabilities and payoffs are the same but where you trust the organizers that they do what they promise?

Wei: You do not treat people in these two branches as equals, but instead value the people in the higher-weight branch more, right? Can you answer why you consider that to be the right thing to do?

Robin Hanson's guess about mangled worlds seems very elegant to me, since it means that I can run a (large) computer with conventional quantum mechanics programmed into it, no magic in its transistors, and the resulting simulation will contain sentient beings who experience the same probabilities we do.

Even so, I'd have to confess myself confused about why I find myself in a simple universe rather than a noisy one.