Related to: Can Counterfactuals Be True?, Newcomb's Problem and Regret of Rationality.

Imagine that one day, Omega comes to you and says that it has just tossed a fair coin, and given that the coin came up tails, it decided to ask you to give it $100. Whatever you do in this situation, nothing else will happen differently in reality as a result. Naturally you don't want to give up your $100. But see, Omega tells you that if the coin came up heads instead of tails, it'd give you $10000, but only if you'd agree to give it $100 if the coin came up tails.

Omega can predict what you would decide if it asked you to give it $100, even if that request never actually happens; it can compute the counterfactual truth. Omega is also known to be absolutely honest and trustworthy, with no word-twisting, so the facts are really as it says: it really tossed a coin and really would have given you $10000.

From your current position, it seems absurd to give up your $100. Nothing good happens if you do that, the coin has already landed tails up, you'll never see the counterfactual $10000. But look at this situation from your point of view before Omega tossed the coin. There, you have two possible branches ahead of you, of equal probability. On one branch, you are asked to part with $100, and on the other branch, you are conditionally given $10000. If you decide to keep $100, the expected gain from this decision is $0: there is no exchange of money, you don't give Omega anything on the first branch, and as a result Omega doesn't give you anything on the second branch. If you decide to give $100 on the first branch, then Omega gives you $10000 on the second branch, so the expected gain from this decision is

-$100 * 0.5 + $10000 * 0.5 = $4950

So, this straightforward calculation tells you that you ought to give up your $100. It looks like a good idea before the coin toss, but it starts to look like a bad idea after the coin came up tails. Had you known about the deal in advance, one possible course of action would be to set up a precommitment. You contract a third party, agreeing that you'll lose $1000 if you don't give $100 to Omega, in case it asks for that. In this case, you leave yourself no other choice.
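As a rough sketch of the comparison above, the two policies can be scored from the point of view before the toss, and the same choices can be scored again once tails is known. The dollar amounts and the fair coin come from the post; the function and constant names are illustrative assumptions, not anything Omega specifies.

```python
# Illustrative sketch only: amounts are from the post, names are made up here.
P_TAILS = 0.5      # fair coin
ASK = 100          # what Omega asks for on tails
REWARD = 10_000    # paid on heads, only to agents who would pay on tails
PENALTY = 1_000    # third-party contract penalty for refusing


def expected_gain(pays_on_tails: bool) -> float:
    """Expected gain of a policy, evaluated before the coin is tossed."""
    tails_branch = -ASK if pays_on_tails else 0
    heads_branch = REWARD if pays_on_tails else 0
    return P_TAILS * tails_branch + (1 - P_TAILS) * heads_branch


def gain_after_tails(pays: bool, precommitted: bool = False) -> int:
    """Gain once the coin is already known to have landed tails."""
    if pays:
        return -ASK
    # Refusing is free unless a precommitment contract is in force.
    return -PENALTY if precommitted else 0


print(expected_gain(True))    # 4950.0
print(expected_gain(False))   # 0.0
print(gain_after_tails(False, precommitted=True))  # -1000
```

The second function shows what the contract does: with the penalty in place, refusing after tails costs more than paying, so the after-the-fact view and the before-the-toss view recommend the same action.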

But in this game, explicit precommitment is not an option: you didn't know about Omega's little game until the coin was already tossed and the outcome of the toss was given to you. The only thing that stands between Omega and your $100 is your ritual of cognition. And so I ask you all: is the decision to give up $100 when you have no real benefit from it, only counterfactual benefit, an example of winning?

P.S. Let's assume that the coin is deterministic, that in the overwhelming measure of the MWI worlds it gives the same outcome. You don't care about a fraction that sees a different result, in all reality the result is that Omega won't even consider giving you $10000, it only asks for your $100. Also, the deal is unique, you won't see Omega ever again.

Comments

The counterfactual anti-mugging: One day No-mega appears. No-mega is completely trustworthy etc. No-mega describes the counterfactual mugging to you, and predicts what you would have done in that situation not having met No-mega, if Omega had asked you for $100.

If you would have given Omega the $100, No-mega gives you nothing. If you would not have given Omega $100, No-mega gives you $10000. No-mega doesn't ask you any questions or offer you any choices. Do you get the money? Would an ideal rationalist get the money?

Okay, next scenario: you have a magic box with a number p inscribed on it. When you open it, either No-mega comes out (probability p) and performs a counterfactual anti-mugging, or Omega comes out (probability 1-p), flips a fair coin and proceeds to either ask for $100, give you $10000, or give you nothing, as in the counterfactual mugging.

Before you open the box, you have a chance to precommit. What do you do?
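One way to look at the box question is to score each precommitment as a function of p. The sketch below assumes that the precommitment fixes both what you would do for Omega and what No-mega predicts you would have done; that reading, and all the names, are assumptions rather than part of the scenario as stated.

```python
def box_value(p: float, pays_omega: bool) -> float:
    """Expected gain from opening the box under a fixed precommitment.

    p is the probability that No-mega comes out; Omega comes out otherwise.
    """
    # Omega branch: fair coin, -$100 on tails and +$10000 on heads for payers.
    mugging = (0.5 * -100 + 0.5 * 10_000) if pays_omega else 0.0
    # No-mega branch: $10000 only for those who would have refused Omega.
    anti_mugging = 0.0 if pays_omega else 10_000.0
    return p * anti_mugging + (1 - p) * mugging


# Setting 4950 * (1 - p) equal to 10000 * p gives the break-even probability.
BREAK_EVEN = 4950 / 14950

for p in (0.1, BREAK_EVEN, 0.9):
    print(p, box_value(p, True), box_value(p, False))
```

Under these assumptions the break-even point is roughly p = 0.33: below it, precommitting to pay Omega has the higher expectation, and above it, precommitting to refuse does.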

Imagine that one day you come home to see your neighbors milling about your house and the Publishers Clearing House (PCH) van just pulling away. You know that PCH has been running a new schtick recently of selling $100 lottery tickets to win $10,000 instead of just giving money away. In fact, you've used that very contest as a teachable moment with your kids to explain how, once the first ticket of the 100 printed was sold, scratched, and determined not to be the winner, the average expected value of the remaining tickets was greater than their cost, so they were increasingly worth buying. Now it's weeks later; most of the tickets have been sold and scratched, none of them winners, and PCH came to your house. In fact, there were only two tickets remaining. And you weren't home. Fortunately, your neighbor and best friend Bob asked if he could buy the ticket for you. Sensing a great human interest story (and lots of publicity), PCH said yes. Unfortunately, Bob picked the wrong ticket. After all your neighbors disperse and Bob and you are alone, Bob says that he'd really appreciate it if he could get his hundred dollars back. Is he mugging you? Or, do you give it to him?
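The ticket arithmetic in this story can be checked with a short sketch. It assumes exactly one winning ticket among the 100 printed and that every ticket scratched so far has lost; the function name and defaults are illustrative.

```python
from fractions import Fraction


def ticket_value(losers_scratched: int, printed: int = 100,
                 prize: int = 10_000) -> Fraction:
    """Expected value of one remaining ticket, given only losers so far."""
    remaining = printed - losers_scratched
    if remaining <= 0:
        raise ValueError("no tickets left")
    # The single prize is equally likely to sit behind any remaining ticket.
    return Fraction(prize, remaining)


print(ticket_value(0))    # 100: fair price before any ticket is scratched
print(ticket_value(1))    # 10000/99: already above the $100 cost
print(ticket_value(98))   # 5000: two tickets left
```

With two tickets left, each is worth $5000 in expectation against a $100 price, which is why Bob's purchase looked like a good bet before the ticket was scratched.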

Philosopher Kenny Easwaran reported in 2007 that:

Josh von Korff, a physics grad student here at Berkeley, and versions of Newcomb’s problem. He shared my general intuition that one should choose only one box in the standard version of Newcomb’s problem, but that one should smoke in the smoking lesion example. However, he took this intuition seriously enough that he was able to come up with a decision-theoretic protocol that actually seems to make these recommendations. It ends up making some other really strange predictions, but it seems interesting to consider, and also ends up resembling something Kantian!

The basic idea is that right now, I should plan all my future decisions in such a way that they maximize my expected utility right now, and stick to those decisions. In some sense, this policy obviously has the highest expectation overall, because of how it’s designed.

Korff also reinvents counterfactual mugging:

Here’s another situation that Josh described that started to make things seem a little more weird. In Ancient Greece, while wandering on the road, every day one either encounters a beggar or a god. If one encounters a beggar, then one can choose to either give the beggar a penny or not. But if one encounters a god, then the god will give one a gold coin iff, had there been a beggar instead, one would have given a penny. On encountering a beggar, it now seems intuitive that (speaking only out of self-interest), one shouldn’t give the penny. But (assuming that gods and beggars are randomly encountered with some middling probability distribution) the decision protocol outlined above recommends giving the penny anyway.

In a sense, what’s happening here is that I’m giving the penny in the actual world, so that my closest counterpart that runs into a god will receive a gold coin. It seems very odd to behave like this, but from the point of view before I know whether or not I’ll encounter a god, this seems to be the best overall plan. But as Josh points out, if this was the only way people got food, then people would see that the generous were doing well, and generosity would spread quickly.

And he looks into generalizing to the algorithmic version:

If we now imagine a multi-agent situation, we can get even stronger (and perhaps stranger) results. If two agents are playing in a prisoner’s dilemma, and they have common knowledge that they are both following this decision protocol, then it looks like they should both cooperate. In general, if this decision protocol is somehow constitutive of rationality, then rational agents should always act according to a maxim that they can intend (consistently with their goals) to be followed by all rational agents. To get either of these conclusions, one has to condition one’s expectations on the proposition that other agents following this procedure will arrive at the same choices.
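The prisoner's dilemma claim can be illustrated with a small sketch. The payoff numbers are the conventional textbook values, not taken from the quoted post, and both function names are hypothetical. The first function picks a best reply with the opponent's action held fixed; the second conditions on the other agent, who follows the same procedure, arriving at the same choice.

```python
# Row player's payoff for (my_action, their_action); values are assumed.
PAYOFF = {
    ("C", "C"): 3,
    ("C", "D"): 0,
    ("D", "C"): 5,
    ("D", "D"): 1,
}


def best_reply(opponent_action: str) -> str:
    """Best action when the opponent's action is treated as fixed."""
    return max("CD", key=lambda mine: PAYOFF[(mine, opponent_action)])


def mirrored_choice() -> str:
    """Best action when the opponent is expected to choose as I do."""
    return max("CD", key=lambda mine: PAYOFF[(mine, mine)])


print(best_reply("C"), best_reply("D"))  # D D
print(mirrored_choice())                 # C
```

Defection dominates in the first framing, while the second framing only ever compares mutual cooperation with mutual defection, so it selects cooperation.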

Korff is now an Asst. Prof. at Georgia State.

Hi,

My name is Omega. You may have heard of me.

Anyway, I have just tossed a fair coin, and given that the coin came up tails, I'm gonna have to ask each of you to give me $100. Whatever you do in this situation, nothing else will happen differently in reality as a result. Naturally you don't want to give up your $100. But see, if the coin came up heads instead of tails, I'd have given you each $10000, but only to those that would agree to give me $100 if the coin came up tails.

You forgot to add that we have sufficient reason to believe everything you say.

My two bits: Omega's request is unreasonable.

Precommitting is something that you can only do before the coin is flipped. That's what the "pre" means. Omega's game rewards a precommitment, but Omega is asking for a commitment.

Precommitting is a rational thing to do because before the coin toss, the result is unknown and unknowable, even by Omega (I assume that's what "fair coin" means). This is a completely different course of action than committing after the coin toss is known! The utility computation for precommitment is not and should not be the same as the one for commitment.

In the example, you have access to information that pre-you doesn't (the outcome of the flip). If rationalists are supposed to update on new information, then it is irrational for you to behave like pre-you.

We're assuming Omega is trustworthy? I'd give it the $100, of course.

Had the coin come up differently, Omega might have explained the secrets of friendly artificial general intelligence. However, he now asks that you murder 15 people.

Omega remains completely trustworthy, if a bit sick.

Ha, I'll re-raise: Had the coin come up differently, Omega would have filled ten Hubble volumes with CEV-output. However, he now asks that you blow up this Hubble volume.

(Not only do you blow up the universe (ending humanity for eternity), you're glad that Omega showed up to offer this transparently excellent deal. Morbid, ne?)

Can you please explain the reasoning behind this? Given all of the restrictions mentioned (no iterations, no possible benefit to this self) I can't see any reason to part with my hard earned cash. My "gut" says "Hell no!" but I'm curious to see if I'm missing something.

There are various intuition pumps to explain the answer.

The simplest is to imagine that a moment from now, Omega walks up to you and says "I'm sorry, I would have given you $10000, except I simulated what would happen if I asked you for $100 and you refused". In that case, you would certainly wish you had been the sort of person to give up the $100.

Which means that right now, with both scenarios equally probable, you should want to be the sort of person who will give up the $100, since if you are that sort of person, there's half a chance you'll get $10000.

If you want to be the sort of person who'll do X given Y, then when Y turns up, you'd better bloody well do X.

If you want to be the sort of person who'll do X given Y, then when Y turns up, you'd better bloody well do X.

Well said. That's a lot of the motivation behind my choice of decision theory in a nutshell.

Thanks, it's good to know I'm on the right track =)

I think this core insight is one of the clearest changes in my thought process since starting to read OB/LW -- I can't imagine myself leaping to "well, I'd hand him $100, of course" a couple years ago.

If you want to be the sort of person who'll do X given Y, then when Y turns up, you'd better bloody well do X.

I think this describes one of the core principles of virtue theory under any ethical system.

I wonder how much it depends upon accidents of human psychology, like our tendency to form habits, and how much of it is definitional (if you don't X when Y, then you're simply not the sort of person who Xes when Y)

I work on AI. In particular, on decision systems stable under self-modification. Any agent who does not give the $100 in situations like this will self-modify to give $100 in situations like this. I don't spend a whole lot of time thinking about decision theories that are unstable under reflection. QED.

I really fail to see why you're all so fascinated by Newcomb-like problems. When you break causality, all logic based on causality stops working. If you try to model it mathematically, you will always get an inconsistent model.

There's no need to break causality. You are a being implemented in chaotic wetware. However, there's no reason to think we couldn't have rational agents implemented in much more predictable form, as Python routines for example, so that any being with superior computation power could simply inspect the source and determine what the output would be.

In such a case, Newcomb-like problems would arise, perfectly lawfully, under normal physics.

In fact, Newcomb-like problems fall naturally out of any ability to simulate and predict the actions of other agents. Omega as described is essentially the limit as predictive power goes to infinity.
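As a toy illustration of the point about agents implemented as routines, the sketch below makes an agent a plain deterministic function and lets the predictor "inspect" it by calling it on the hypothetical request. Everything here (the `Agent` type, the agent names, the `omega` function) is a made-up sketch, not a description of how Omega is supposed to work.

```python
from typing import Callable

# An agent maps the situation it observes to a decision: hand over $100?
Agent = Callable[[str], bool]


def paying_agent(observation: str) -> bool:
    """Hypothetical agent that pays whenever Omega asks."""
    return True


def refusing_agent(observation: str) -> bool:
    """Hypothetical agent that keeps its money once it sees tails."""
    return False


def omega(agent: Agent, coin: str) -> int:
    """Return the agent's gain; prediction is just running the agent."""
    would_pay = agent("tails: Omega asks for $100")
    if coin == "tails":
        # The request is made for real; a deterministic agent acts as predicted.
        return -100 if would_pay else 0
    # Heads: the reward depends only on the counterfactual answer.
    return 10_000 if would_pay else 0


for agent in (paying_agent, refusing_agent):
    print(agent.__name__, omega(agent, "heads"), omega(agent, "tails"))
```

Nothing acausal happens in this setup: the heads payout depends on the agent's code, and that code is fixed before the coin is tossed.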

You know, if Omega is truly doing a full simulation of my cognitive algorithm, then it seems my interactions with him should be dominated by my desire for him to stop it, since he is effectively creating and murdering copies of me.

The decision doesn't need to be read off from a straightforward simulation; it can be an on-demand, so to say, reconstruction of the outcome from the counterfactual. I believe it should be possible to calculate just your decision, without constructing a morally significant computation. Knowing your decision may be as simple as checking whether you adhere to a certain decision theory.

I convinced myself to one-box in Newcomb by simply treating it as if the contents of the boxes magically change when I make my decision. Simply draw the decision tree and maximize u-value.

I convinced myself to cooperate in the Prisoner's Dilemma by treating it as if whatever decision I made the other person would magically make too. Simply draw the decision tree and maximize u-value.

It seems that Omega is different because I actually have the information, where in the others I don't.

For example, in Newcomb, if we could see the contents of both boxes, then I should two-box, no? In the Prisoner's Dilemma, if my opponent decides before me and I observe the decision, then I should defect, no?

I suspect that this means that my thought process in Newcomb and the Prisoner's Dilemma is incorrect. That there is a better way to think about them that makes them more like Omega. Am I correct? Does this make sense?

Yes, the objective in designing this puzzle was to construct an example where, according to my understanding of the correct way to make decisions, the correct decision looks like losing. In other cases you may say that you close your eyes, pretend that your decision determines the past or other agents' actions, and just make the decision that gives the best outcome. In this case, you choose the worst outcome. The argument is that on reflection it still looks like the best outcome, and you are given an opportunity to think about what's the correct perspective from which it's the best outcome. It binds the state of reality to your subjective perspective, where in many other thought experiments you may dispense with this connection and focus solely on the reality, without paying any special attention to the decision-maker.