Newcomb's Problem: A problem for Causal Decision Theories

This is part of a sequence titled "Introduction to decision theory"


The previous post is "An introduction to decision theory"


In the previous post I introduced evidential and causal decision theories. The principal question that needs resolving with regards to these is whether using them leads to rational decisions. The next two posts will show that both causal and evidential decision theories fail to do so, and will set the scene so that it's clear why so much focus is given on Less Wrong to developing new decision theories.

 

Newcomb’s Problem

 

Newcomb’s Problem asks us to imagine the following situation:

 

Omega, an unquestionably honest, all-knowing agent with perfect powers of prediction, appears, along with two boxes. Omega tells you that it has placed a certain sum of money in each of the boxes. It has already placed the money and will not now change the amount. You are then asked whether you want to take just the money in the left hand box, or the money in both boxes.

 

However, here’s where it becomes complicated. Using its perfect powers of prediction, Omega predicted whether you would take just the left box (called “one boxing”) or whether you would take both boxes (called “two boxing”). Either way, Omega put $1000 in the right hand box, but it filled the left hand box as follows:

 

  1. If it predicted you would take only the left hand box, it put $1 000 000 in the left hand box.

  2. If it predicted you would take both boxes, it put $0 in the left hand box.

 

Should you take just the left hand box or should you take both boxes?

 

An answer to Newcomb’s Problem

 

One argument goes as follows: By the time you are asked to choose what to do, the money is already in the boxes. Whatever decision you make, it won’t change what’s in the boxes. So the boxes can be in one of two states:

  1. Left box, $0. Right box, $1000.
  2. Left box, $1 000 000. Right box, $1000.

Whichever state the boxes are in, you get more money if you take both boxes than if you take one. In game-theoretic terms, the strategy of taking both boxes strictly dominates the strategy of taking only one box. You can never lose by choosing both boxes.
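If it helps to see that spelled out, here is a minimal Python sketch of the dominance argument (the state labels are just the two states listed above): holding the contents of the boxes fixed, two boxing always yields exactly $1000 more.

```python
# Contents of the left hand box in each possible state, as listed above.
LEFT_BOX = {"state 1": 0, "state 2": 1_000_000}
RIGHT_BOX = 1000  # the right hand box always contains $1000

for state, left in LEFT_BOX.items():
    one_box = left
    two_box = left + RIGHT_BOX
    print(f"{state}: one box = ${one_box:,}, two boxes = ${two_box:,}")
    assert two_box == one_box + 1000  # two boxing wins within each fixed state
```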


The only problem is, you do lose. If you take both boxes, Omega will have predicted this, so the boxes are in state 1 and you get only $1000. If you had taken only the left box, the boxes would have been in state 2 and you would have got $1 000 000.

 

To many people, this may be enough to make it obvious that the rational decision is to take only the left box. If so, you might want to skip the next paragraph.

 

Taking only the left box didn’t seem rational to me for a long time. The reasoning described above for taking both boxes seemed so powerful that the only rational decision appeared to be to take both, and so I saw Newcomb’s Problem as proof that it was sometimes beneficial to be irrational. I changed my mind when I realised that I’d been asking the wrong question. I had been asking which decision would give the best payoff at the moment of choice, and calling it rational to make that decision. Instead, I should have been asking which decision theory leads to the greatest payoff. From that perspective, it is rational to use a decision theory that tells you to take only the left box, because that is the decision theory with the highest payoff. Taking only the left box leads to a higher payoff, and it is also a rational decision if you ask, “What decision theory is it rational for me to use?” and then decide according to the theory you conclude it is rational to follow.

 

What follows will presume that a good decision theory should one box on Newcomb’s Problem.

 

Causal Decision Theory and Newcomb’s Problem

 

Remember that decision theory tells us to calculate the expected utility of an action by summing the utility of each possible outcome of that action multiplied by its probability. In Causal Decision Theory, this probability is defined causally (something we haven’t formalised and won’t formalise in this introductory sequence, but which we have at least some grasp of). So Causal Decision Theory will act as if the probability that the boxes are in state 1 or state 2 above is not influenced by your decision to one box or two box. Let’s say the probability that the boxes are in state 1 is P and the probability that they’re in state 2 is Q, regardless of your decision (so P + Q = 1).

 

So if you undertake the action of choosing only the left box, your expected utility will be equal to: (0 x P) + (1 000 000 x Q) = 1 000 000 x Q

 

And if you choose both boxes, the expected utility will be equal to: (1000 x P) + (1 001 000 x Q).
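Put as a short Python sketch (P and Q here are the fixed state probabilities described above; the particular values looped over are just illustrative), the two expected utilities differ by exactly $1000 no matter what P and Q are:

```python
def cdt_expected_utility(action, p, q):
    """Causal expected utility: the state probabilities p (state 1)
    and q (state 2) do not depend on the action taken."""
    if action == "one box":
        return 0 * p + 1_000_000 * q
    return 1000 * p + 1_001_000 * q

# Whatever p and q are, two boxing comes out exactly $1000 ahead:
for p in (0.1, 0.5, 0.9):
    q = 1 - p
    gap = cdt_expected_utility("two box", p, q) - cdt_expected_utility("one box", p, q)
    assert abs(gap - 1000) < 1e-6
```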

 

So Causal Decision Theory will lead to the decision to take both boxes and hence, if you accept that you should one box on Newcomb’s Problem, Causal Decision Theory is flawed.

 

Evidential Decision Theory and Newcomb’s Problem

 

Evidential Decision Theory, on the other hand, will take your decision to one box as evidence that Omega put the boxes in state 2, to give an expected utility of (1 x 1 000 000) + (0 x 0) = 1 000 000.

 

It will similarly take your decision to take both boxes as evidence that Omega put the boxes into state 1, to give an expected utility of (0 x (1 000 000 + 1000)) + (1 x (0 + 1000)) = 1000.
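The same two calculations as a Python sketch, under the post's assumption that Omega's prediction is perfect, so that your action counts as conclusive evidence about the state of the boxes:

```python
def edt_expected_utility(action):
    # The action is treated as (here, perfect) evidence about the state:
    # one boxing means state 2 is certain, two boxing means state 1 is.
    p_state2 = 1.0 if action == "one box" else 0.0
    p_state1 = 1.0 - p_state2
    if action == "one box":
        return p_state1 * 0 + p_state2 * 1_000_000
    return p_state1 * (0 + 1000) + p_state2 * (1_000_000 + 1000)

assert edt_expected_utility("one box") == 1_000_000
assert edt_expected_utility("two box") == 1000
```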

 

As such, Evidential Decision Theory will suggest that you one box, and hence it passes the test posed by Newcomb’s Problem. We will look at a more challenging scenario for Evidential Decision Theory in the next post. For now, we’re part way along the route of seeing why there’s still a need to search for a decision theory that makes the rational decision in a wide range of situations.

 

Appendix 1: Important notes

 

While the consensus on Less Wrong is that one boxing on Newcomb’s Problem is the rational decision, my understanding is that this opinion is not held uniformly amongst philosophers (see, for example, the Stanford Encyclopedia of Philosophy’s article on Causal Decision Theory). I’d welcome corrections on this if I’m wrong, but otherwise it seems important to acknowledge where the level of consensus on Less Wrong differs from that in the broader community.

 

For more details on this, see the results of the PhilPapers Survey, where 61% of respondents who specialised in decision theory chose to two box and only 26% chose to one box (the rest were uncertain). Thanks to Unnamed for the link.

 

If Newcomb's Problem doesn't seem realistic enough to be worth considering then read the responses to this comment.

 

Appendix 2: Existing posts on Newcomb's Problem

 

Newcomb's Problem has been widely discussed on Less Wrong, generally by people with more knowledge of the subject than me (this post is included as part of the sequence because I want to make sure no one is left behind, and because it frames the problem in a slightly different way). Good previous posts include:

 

A post by Eliezer introducing the problem and discussing the issue of whether one boxing is irrational.

 

A link to Marion Ledwig's detailed thesis on the issue.

 

An exploration of the links between Newcomb's Problem and the prisoner's dilemma.

 

A post about formalising Newcomb's Problem.


And a Less Wrong wiki article on the problem with further links.

 

Comments

While the consensus on Less Wrong is that one boxing on Newcomb’s Problem is the rational decision, my understanding is that this opinion is not necessarily held uniformly amongst philosophers

That's correct. See, for instance, the PhilPapers Survey of 931 philosophy professors, which found that only 21% favored one boxing vs. 31% who favored two boxing; 43% said other (mostly undecided or insufficiently familiar with the issue). Among the 31 philosophers who specialize in decision theory, there was a big shift from other (down to 13%) to two boxing (up to 61%), and still only 26% favored one boxing.

An issue that often occurs to me when discussing these questions: I one-box, cooperate in one-shot PDs, and pay the guy who gave me a lift out of the desert. I've no idea what decision theory I'm using when I decide to do these things, but I still know that I'd do them. I'm pretty sure that's what most other people would do as well.

Does anyone actually know how Human Decision Theory works? I know there are examples of problems where HDT fails miserably and CDT comes out better, but is there a coherent explanation for how HDT manages to get all of these problems right? Has anyone attempted to develop a decision theory which successfully solves these sorts of problems by mimicking the way in which people successfully solve them?

I don't think most people one-box. Maybe most LW readers one-box.

I don't think most people one-box. Maybe most LW readers one-box.

I have two real boxes, labelled with Newcomb's problem and using 1 and 4 quarters in place of the $10k and $1M. I have shown them to people at Less Wrong meetups, and also to various friends of mine, a total of about 20 people.

Almost everyone I've tried it on has one-boxed. Even though I left out the part in the description about being a really accurate predictor, and pre-seeded the boxes before I even knew who would be the one choosing. Maybe it would be different with $10k instead of $0.25. Maybe my friends are unusual and a different demographic would two-box. Maybe it's due to a quirk of how I present them. But unless someone presents contrary evidence, I have to conclude that most people are one-boxers.

Almost everyone I've tried it on has one-boxed. Even though I left out the part in the description about being a really accurate predictor, and pre-seeded the boxes before I even knew who would be the one choosing.

What?!? You offer people two boxes with essentially random amounts of money in them, and they choose to take one of the boxes instead of both? And these people are otherwise completely sane?

Could you maybe give us details of how exactly you present the problem? I can't imagine any presentation that would make anyone even slightly tempted to one-box this variant. (Maybe if I knew I'd get to play again one day...)

That seems bizarre to me too. But if Jimrandomh is filling his boxes on the basis of what most people would do, and most people do one-box, then perhaps they are just behaving as rational, highly correlated, timeless decisionmakers.

A signalling explanation might explain this behavior: people would rather be seen as having gotten the problem correct, or signal non-greediness, than get an extra $0.25. As evidence for this conclusion, some people turn down the $1.00 in box one.

No one's given the real correct solution, which is "inspect the boxes more thoroughly". One of them has an extra label on the bottom, offering an extra $1.00 for finding it if you haven't opened any boxes yet, which I've never had to pay out on. The moral is supposed to be that theory is hard to transfer into the real world and to question assumptions.

You let people inspect the boxes? Wouldn't they be distinguishable by weight?

Reminds me of a story, set in a lazy Mark Twain river town. Two friends walking down the street. First says to second, "See that kid? He is really stupid." Second asks, "Why do you say that?" First answers, "Watch". Approaches kid. Holds out nickel in one hand and dime in the other. Asks kid which he prefers. "I'll take the nickel. It's bigger". Man hands nickel to kid with smirk, and the two friends continue on.

Later the second man comes back and attempts to instruct the kid. "A dime is worth twice the value, that is it buys more candy", says he, "even though the nickel looks bigger." The kid gives the man a pitying look. "Ok, if you say so. But I've made seven nickels so far this month. How many dimes have you made?"

Which brings me to my real point - empirical research, which I'm sure you have seen, in which player 1 is asked to specify a split of $10 between himself and player 2. Player 2 then chooses to accept or reject; if he rejects, neither player gets anything. As I recall, when a greedy player 1 specifies more than about 70% for himself, player 2 frequently rejects, even though doing so costs him money. This can only be understood in classical "rational agent" game theory by postulating that player 2 does not believe the researchers' claims that the game is a one-shot.

What is the point? Well, perhaps people who have read about Newcomb problems are assuming (like most people in the research) that, somehow or other, greed will be punished.

empirical research, I'm sure you have seen it, in which player 1 is asked to specify a split of $10 between himself and player 2.

Punishing unfair behavior even when it is costly to do so is called altruistic punishment, and this particular experiment is called the Ultimatum Game.

pre-seeded the boxes before I even knew who would be the one choosing

If I met someone in real life who was doing this trick (at least before I started spilling my opinions to the universe through my comments to this blog), I would strongly suspect that you were doing exactly this. And then I would definitely pick both boxes. (Well, first I'd try to figure out if you're likely to offer me any more games, and I'd pick two boxes if I was fairly confident that you would not.) And I would get all of the money, since you would have predicted that I would pick only one box (assuming that you really seed them based on your honest best prediction).

On the other hand, if the situation is not presented as a game (even when I still don't expect any iteration), I pretty consistently cooperate on all of the standard examples (prisoner's dilemma, etc). But since feeling like a moral and cooperative person (except when playing games, of course) has high utility for me, I'm not really playing prisoner's dilemma (etc) after all, so never mind.

is there a coherent explanation for how HDT manages to get all of these problems right?

Unilaterally cooperating in one-shot true PDs is not right. If the other player's decision is not correlated with yours, you should defect.

Thanks for a great post, Adam; I'm looking forward to the rest of the series.

This might be missing the point, but I just can't get past it. How does a rational agent come to believe that the being they're facing is "an unquestionably honest, all-knowing agent with perfect powers of prediction"?

I have the suspicion that a lot of the bizarreness of this problem comes out of transporting our agent into an epistemologically unattainable state.

Is there a way to phrase a problem of this type in a way that does not require such a state?

It's not perfect, per se, but try this:

There's a fellow named James Omega who (with the funding of certain powerful philosophy departments) travels around the country offering random individuals the chance to participate in Newcomb's problem, with James as Omega. Rather than scanning your brain with his magic powers, he spends a day observing you in your daily life, and uses this info to make his decision. Here's the catch: he's done this 300 times, and never once mis-predicted. He's gone up against philosophers and lay-people, people that knew they were being observed and people that didn't, but it makes no difference: he just has an intuition that good. When it comes time to do the experiment, it's set up in such a way that you can be totally sure (and other very prestigious parties have verified) that the amounts in the boxes do not change after your decision.

So when you're selected, what do you do? Nothing quite supernatural is going on: we just have this James fellow with an amazing track record, and you with no particular reason to believe that you'll be his first failure. Even if he is just human, isn't it rational to assume the ridiculously likely thing (a 301/302 chance, according to Laplace's Law) that he'll guess you correctly? Even if we adjust for the possibility of error, the payoff matrix is still so lopsided that it seems crazy to two-box.
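As a quick check of that figure, and of how lopsided the payoffs remain, here is a small Python sketch (using the original post's box amounts, which are an assumption here rather than anything specified about James):

```python
# Laplace's rule of succession: after n successes in n trials, the
# probability assigned to another success is (n + 1) / (n + 2).
n = 300
p_correct = (n + 1) / (n + 2)  # 301/302, about 0.9967

# Expected payoffs against a predictor who is right with probability p_correct:
one_box = p_correct * 1_000_000
two_box = p_correct * 1000 + (1 - p_correct) * 1_001_000
print(f"one box: ${one_box:,.0f}, two boxes: ${two_box:,.0f}")  # ~$996,689 vs ~$4,311
```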

See if that helps, and of course everyone else is free to offer improvements if I've missed something. You know, help get this Least Convenient Possible World going.

Now I want to read a series of stories starring James Omega in miscellaneous interesting situations. The kind of ability implied by accuracy at Newcomb's Dilemma would seem to imply capability in other situations as well. (If nothing else, he would kill at rock-paper-scissors.)

How does a rational agent come to believe that the being they're facing is "an unquestionably honest, all-knowing agent with perfect powers of prediction"?

Let's make things clearer by asking the meta-question: is the predictor's implementation, and the process by which we learn of it, relevant to the problem? Let's unpack "relevant": should the answer to Newcomb's Problem depend on these extraneous details about the predictor? And let's unpack "should": if decision theory A tells you to one-box in approximately-Newcomb-like scenarios without requiring further information, and decision theory B says the problem is "underspecified" and the answer is "unstable" and you can't pin it down without learning more about the real-world situation... which decision theory do you like more?

Newcomb's Problem still holds in much more realistic situations. So say someone who knows you really, really well comes up to you and makes the same offer. Imagine you don't mind taking their money and you reckon they know you well enough that they're 80% likely to be correct in their bet. One boxing is still the right decision because you have the following gain from one boxing:

(.8 x 1 000 000) + (.2 x 0) = 800 000

and for two boxing:

(.8 x 1000) + (.2 x 1 001 000) = 800 + 200 200 = 201 000

But Causal Decision Theory will still undertake the same reasoning because your decision still doesn't have a causal influence on whether the boxes are in state 1 or 2. So Causal Decision Theory will still two box.

So Newcomb's Problem still holds in more realistic situations.
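For anyone who wants the general version of that arithmetic, here is a small Python sketch (the helper names and the `accuracy` parameter are mine, not part of the problem): it reproduces the 80% figures above and shows that one boxing wins as soon as the predictor is right just over half the time.

```python
def one_box_ev(accuracy):
    # The predictor fills the left box iff it (correctly) predicts one boxing.
    return accuracy * 1_000_000

def two_box_ev(accuracy):
    # Two boxing pays $1000, plus $1 000 000 when the predictor guesses wrong.
    return accuracy * 1000 + (1 - accuracy) * 1_001_000

assert abs(one_box_ev(0.8) - 800_000) < 1e-6
assert abs(two_box_ev(0.8) - 201_000) < 1e-6  # 800 + 200 200, as above

# Break-even point: one boxing is better whenever
# accuracy > 1 001 000 / 2 000 000 = 0.5005.
assert one_box_ev(0.51) > two_box_ev(0.51)
assert one_box_ev(0.50) < two_box_ev(0.50)
```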

Is that the sort of thing you were looking for or have I missed the point?

Please link to previous discussions of Newcomb's Problem on LW. They contain many valuable insights that new readers will otherwise have to regenerate (possibly poorly).

A kind of funny way in which something like this might (just about) happen in reality occurs to me: a possible time delay in human awareness of decision making. Suppose that when you make a conscious decision, your brain starts to become committed to that decision before you become aware of it, so that if you suddenly decide to press a button, your brain was already committing you to pressing it before you knew you were going to. That would mean that every time you took a conscious decision to act on some rational grounds, you should really have wanted to be the person who was predisposed to act that way a short time earlier, when the neural machinery was pushing you towards that decision. I'm not saying this resolves any big issues, but maybe it can be amusingly uncomfortable for a few people - especially given some (admittedly controversial) experiments. In fact, with some brainwave monitoring equipment, a clever experiment design, and a very short experiment duration, you might even be able to set up something slightly resembling Newcomb's paradox!

I have a description here of a practical demonstration of Newcomb's paradox that might just be possible, with current or near-future technology. It would rely, simply, on the brain being more predictable over a short span of time. I would be interested to see what people think about the feasibility.

A test subject sits at a desk. On the desk are two buttons: button "O" corresponds to opening one box, and button "B" corresponds to opening both boxes. There is a computer with a display screen. The boxes are computer simulated: a program in the computer has a variable for the amount of money in each box.

This is how an experimental run proceeds:

  1. The subject sits at the desk for some random amount of time, during which nothing happens.

  2. A "Decision Imminent" message appears on the computer screen, warning the subject that his/her decision is about to be demanded.

  3. A short time after (maybe a second or two, or a few seconds), the computer program decides how much money will go in each box and sets the variables accordingly, without showing the subject. As soon as that is done, a "Select a box NOW" message appears on the screen. The subject then has a (very) limited amount of time to press either button "O" or "B" to select one or both boxes before the offer is withdrawn.

  4. The subject is then shown the amount of money that was in each box.

Now, here is the catch (and everyone here will have guessed it).

The subject is also wired up to brain monitoring equipment, which is connected to the computer. When the "Decision Imminent" message appeared, the computer started examining the subject's brainwaves, trying to see the decision to press one of the buttons being formed. Just before the "Select a box NOW" message appeared, it used this information to load the simulated boxes according to the rules of the Newcomb's paradox game being discussed here.
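To make the timeline concrete, here is a toy Python sketch of the control flow just described. Everything in it is hypothetical: `read_brainwaves`, `get_button_press` and `predict_one_box` are stand-ins for the monitoring hardware, the button interface and the classifier, the box amounts are borrowed from the original post, and the placeholder prediction is a coin flip rather than a real signal.

```python
import random
import time

def predict_one_box(brainwave_window):
    """Hypothetical stand-in for the brainwave classifier: returns True
    if the forming decision looks like a one-box press. A real version
    would analyse the monitoring data; this placeholder just guesses."""
    return random.random() < 0.5

def run_trial(read_brainwaves, get_button_press):
    time.sleep(random.uniform(2, 10))        # idle period: nothing happens
    print("Decision Imminent")               # warn the subject
    window = read_brainwaves()               # monitor during the warning period
    will_one_box = predict_one_box(window)   # prediction made, and boxes
    left = 1_000_000 if will_one_box else 0  # seeded, just before the
    right = 1000                             # choice is demanded
    print("Select a box NOW")
    choice = get_button_press()              # must be "O" or "B", almost immediately
    payout = left if choice == "O" else left + right
    print(f"Left box: ${left:,}. Right box: ${right:,}. Payout: ${payout:,}.")
```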

I have no idea what level of accuracy could be achieved now, but it may be that some people could be made to have a worrying experience.