Newcomb's Problem and Regret of Rationality

Followup to: Something to Protect

The following may well be the most controversial dilemma in the history of decision theory:

A superintelligence from another galaxy, whom we shall call Omega, comes to Earth and sets about playing a strange little game.  In this game, Omega selects a human being, sets down two boxes in front of them, and flies away.

Box A is transparent and contains a thousand dollars.
Box B is opaque, and contains either a million dollars, or nothing.

You can take both boxes, or take only box B.

And the twist is that Omega has put a million dollars in box B iff Omega has predicted that you will take only box B.

Omega has been correct on each of 100 observed occasions so far - everyone who took both boxes has found box B empty and received only a thousand dollars; everyone who took only box B has found B containing a million dollars.  (We assume that box A vanishes in a puff of smoke if you take only box B; no one else can take box A afterward.)

Before you make your choice, Omega has flown off and moved on to its next game.  Box B is already empty or already full.

Omega drops two boxes on the ground in front of you and flies off.

Do you take both boxes, or only box B?

And the standard philosophical conversation runs thusly:

One-boxer:  "I take only box B, of course.  I'd rather have a million than a thousand."

Two-boxer:  "Omega has already left.  Either box B is already full or already empty.  If box B is already empty, then taking both boxes nets me $1000, taking only box B nets me $0.  If box B is already full, then taking both boxes nets $1,001,000, taking only box B nets $1,000,000.  In either case I do better by taking both boxes, and worse by leaving a thousand dollars on the table - so I will be rational, and take both boxes."

One-boxer:  "If you're so rational, why ain'cha rich?"

Two-boxer:  "It's not my fault Omega chooses to reward only people with irrational dispositions, but it's already too late for me to do anything about that."
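
To make the "why ain'cha rich?" exchange concrete, here is a small expected-value sketch (the 99.5% predictor accuracy is only an illustrative assumption, not something the problem statement specifies):

    # Expected winnings in Newcomb's Problem as a function of predictor
    # accuracy (illustrative sketch; the accuracy p is an assumption).

    def expected_winnings(choice, p=0.995):
        """Expected dollars, assuming Omega predicts your actual choice
        with probability p and fills box B iff it predicts one-boxing."""
        if choice == "one-box":
            return p * 1_000_000 + (1 - p) * 0
        if choice == "two-box":
            return p * 1_000 + (1 - p) * 1_001_000
        raise ValueError(choice)

    print(round(expected_winnings("one-box")))  # 995000
    print(round(expected_winnings("two-box")))  # 6000

On this way of scoring the choices, one-boxing comes out ahead whenever the predictor's accuracy exceeds about 50.05%; the dominance argument only looks decisive if the prediction and the choice are treated as independent.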

There is a large literature on the topic of Newcomblike problems - especially if you consider the Prisoner's Dilemma as a special case, which it is generally held to be.  "Paradoxes of Rationality and Cooperation" is an edited volume that includes Newcomb's original essay.  For those who read only online material, this PhD thesis summarizes the major standard positions.

I'm not going to go into the whole literature, but the dominant consensus in modern decision theory is that one should two-box, and Omega is just rewarding agents with irrational dispositions.  This dominant view goes by the name of "causal decision theory".

As you know, the primary reason I'm blogging is that I am an incredibly slow writer when I try to work in any other format.  So I'm not going to try to present my own analysis here.  Way too long a story, even by my standards.

But it is agreed even among causal decision theorists that if you have the power to precommit yourself to take only one box in Newcomb's Problem, then you should do so.  If you can precommit yourself before Omega examines you, then you are directly causing box B to be filled.

Now in my field - which, in case you have forgotten, is self-modifying AI - this works out to saying that if you build an AI that two-boxes on Newcomb's Problem, it will self-modify to one-box on Newcomb's Problem, if the AI considers in advance that it might face such a situation.  Agents with free access to their own source code have access to a cheap method of precommitment.
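
As a loose illustration of that last point - a toy sketch with hypothetical names, not the formal theory discussed below - an agent with access to its own source code can implement this precommitment simply by overwriting the policy it will later execute, before the predictor inspects it:

    # Toy sketch of precommitment via self-modification (illustrative only;
    # the class and method names are hypothetical, not from the post).

    class Agent:
        def __init__(self):
            # Default policy: the causal-decision-theory answer.
            self.policy = "two-box"

        def anticipate_newcomblike_problem(self):
            # Self-modification: since the predictor reads whatever policy
            # is installed at prediction time, installing "one-box" now is
            # what causes box B to be filled later.
            self.policy = "one-box"

        def act(self):
            return self.policy

    agent = Agent()
    agent.anticipate_newcomblike_problem()  # runs before Omega's scan
    assert agent.act() == "one-box"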

What if you expect that you might, in general, face a Newcomblike problem, without knowing the exact form of the problem?  Then you would have to modify yourself into a sort of agent whose disposition was such that it would generally receive high rewards on Newcomblike problems.

But what does an agent with a disposition generally-well-suited to Newcomblike problems look like?  Can this be formally specified?

Yes, but when I tried to write it up, I realized that I was starting to write a small book.  And it wasn't the most important book I had to write, so I shelved it.  My slow writing speed really is the bane of my existence.  The theory I worked out seems, to me, to have many nice properties besides being well-suited to Newcomblike problems.  It would make a nice PhD thesis, if I could get someone to accept it as my PhD thesis.  But that's pretty much what it would take to make me unshelve the project.  Otherwise I can't justify the time expenditure, not at the speed I currently write books.

I say all this, because there's a common attitude that "Verbal arguments for one-boxing are easy to come by, what's hard is developing a good decision theory that one-boxes" - coherent math which one-boxes on Newcomb's Problem without producing absurd results elsewhere.  So I do understand that, and I did set out to develop such a theory, but my writing speed on big papers is so slow that I can't publish it.  Believe it or not, it's true.

Nonetheless, I would like to present some of my motivations on Newcomb's Problem - the reasons I felt impelled to seek a new theory - because they illustrate my source-attitudes toward rationality.  Even if I can't present the theory that these motivations motivate...

First, foremost, fundamentally, above all else:

Rational agents should WIN.

Don't mistake me, and think that I'm talking about the Hollywood Rationality stereotype that rationalists should be selfish or shortsighted.  If your utility function has a term in it for others, then win their happiness.  If your utility function has a term in it for a million years hence, then win the eon.

But at any rate, WIN.  Don't lose reasonably, WIN.

Now there are defenders of causal decision theory who argue that the two-boxers are doing their best to win, and cannot help it if they have been cursed by a Predictor who favors irrationalists.  I will talk about this defense in a moment.  But first, I want to draw a distinction between causal decision theorists who believe that two-boxers are genuinely doing their best to win, and those who think that two-boxing is the reasonable or the rational thing to do, but that the reasonable move just happens to predictably lose, in this case.  There are a lot of people out there who think that rationality predictably loses on various problems - that, too, is part of the Hollywood Rationality stereotype, that Kirk is predictably superior to Spock.

Next, let's turn to the charge that Omega favors irrationalists.  I can conceive of a superbeing who rewards only people born with a particular gene, regardless of their choices.  I can conceive of a superbeing who rewards people whose brains inscribe the particular algorithm of "Describe your options in English and choose the last option when ordered alphabetically," but who does not reward anyone who chooses the same option for a different reason.  But Omega rewards people who choose to take only box B, regardless of which algorithm they use to arrive at this decision, and this is why I don't buy the charge that Omega is rewarding the irrational.  Omega doesn't care whether or not you follow some particular ritual of cognition; Omega only cares about your predicted decision.

We can choose whatever reasoning algorithm we like, and will be rewarded or punished only according to that algorithm's choices, with no other dependency - Omega just cares where we go, not how we got there.

It is precisely the notion that Nature does not care about our algorithm that frees us up to pursue the winning Way - without attachment to any particular ritual of cognition, apart from our belief that it wins.  Every rule is up for grabs, except the rule of winning.

As Miyamoto Musashi said - it's really worth repeating:

"You can win with a long weapon, and yet you can also win with a short weapon.  In short, the Way of the Ichi school is the spirit of winning, whatever the weapon and whatever its size."

(Another example:  It was argued by McGee that we must adopt bounded utility functions or be subject to "Dutch books" over infinite times.  But:  The utility function is not up for grabs.  I love life without limit or upper bound:  There is no finite amount of life lived N where I would prefer an 80.0001% probability of living N years to a 0.0001% chance of living a googolplex years and an 80% chance of living forever.  This is a sufficient condition to imply that my utility function is unbounded.  So I just have to figure out how to optimize for that morality.  You can't tell me, first, that above all I must conform to a particular ritual of cognition, and then that, if I conform to that ritual, I must change my morality to avoid being Dutch-booked.  Toss out the losing ritual; don't change the definition of winning.  That's like deciding to prefer $1000 to $1,000,000 so that Newcomb's Problem doesn't make your preferred ritual of cognition look bad.)
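
Spelled out as an inequality (treating the unstated remainder of each gamble as an outcome with utility zero), that stated preference says that for every finite N:

    0.800001 * u(live N years)  ≤  0.000001 * u(live a googolplex years) + 0.8 * u(live forever)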

"But," says the causal decision theorist, "to take only one box, you must somehow believe that your choice can affect whether box B is empty or full - and that's unreasonable!  Omega has already left!  It's physically impossible!"

Unreasonable?  I am a rationalist: what do I care about being unreasonable?  I don't have to conform to a particular ritual of cognition.  I don't have to take only box B because I believe my choice affects the box, even though Omega has already left.  I can just... take only box B.

I do have a proposed alternative ritual of cognition which computes this decision, which this margin is too small to contain; but I shouldn't need to show this to you.  The point is not to have an elegant theory of winning - the point is to win; elegance is a side effect.

Or to look at it another way:  Rather than starting with a concept of what is the reasonable decision, and then asking whether "reasonable" agents leave with a lot of money, start by looking at the agents who leave with a lot of money, develop a theory of which agents tend to leave with the most money, and from this theory, try to figure out what is "reasonable".  "Reasonable" may just refer to decisions in conformance with our current ritual of cognition - what else would determine whether something seems "reasonable" or not?

From James Joyce (no relation), Foundations of Causal Decision Theory:

Rachel has a perfectly good answer to the "Why ain't you rich?" question.  "I am not rich," she will say, "because I am not the kind of person the psychologist thinks will refuse the money.  I'm just not like you, Irene.  Given that I know that I am the type who takes the money, and given that the psychologist knows that I am this type, it was reasonable of me to think that the $1,000,000 was not in my account.  The $1,000 was the most I was going to get no matter what I did.  So the only reasonable thing for me to do was to take it."

Irene may want to press the point here by asking, "But don't you wish you were like me, Rachel?  Don't you wish that you were the refusing type?"  There is a tendency to think that Rachel, a committed causal decision theorist, must answer this question in the negative, which seems obviously wrong (given that being like Irene would have made her rich).  This is not the case.  Rachel can and should admit that she does wish she were more like Irene.  "It would have been better for me," she might concede, "had I been the refusing type."  At this point Irene will exclaim, "You've admitted it!  It wasn't so smart to take the money after all."  Unfortunately for Irene, her conclusion does not follow from Rachel's premise.  Rachel will patiently explain that wishing to be a refuser in a Newcomb problem is not inconsistent with thinking that one should take the $1,000 whatever type one is.  When Rachel wishes she was Irene's type she is wishing for Irene's options, not sanctioning her choice.

It is, I would say, a general principle of rationality - indeed, part of how I define rationality - that you never end up envying someone else's mere choices.  You might envy someone their genes, if Omega rewards genes, or if the genes give you a generally happier disposition.  But Rachel, above, envies Irene her choice, and only her choice, irrespective of what algorithm Irene used to make it.  Rachel wishes just that she had a disposition to choose differently.

You shouldn't claim to be more rational than someone and simultaneously envy them their choice - only their choice.  Just do the act you envy.

I keep trying to say that rationality is the winning-Way, but causal decision theorists insist that taking both boxes is what really wins, because you can't possibly do better by leaving $1000 on the table... even though the single-boxers leave the experiment with more money.  Be careful of this sort of argument, any time you find yourself defining the "winner" as someone other than the agent who is currently smiling from on top of a giant heap of utility.

Yes, there are various thought experiments in which some agents start out with an advantage - but if the task is to, say, decide whether to jump off a cliff, you want to be careful not to define cliff-refraining agents as having an unfair prior advantage over cliff-jumping agents, by virtue of their unfair refusal to jump off cliffs.  At this point you have covertly redefined "winning" as conformance to a particular ritual of cognition.  Pay attention to the money!

Or here's another way of looking at it:  Faced with Newcomb's Problem, would you want to look really hard for a reason to believe that it was perfectly reasonable and rational to take only box B; because, if such a line of argument existed, you would take only box B and find it full of money?  Would you spend an extra hour thinking it through, if you were confident that, at the end of the hour, you would be able to convince yourself that box B was the rational choice?  This too is a rather odd position to be in.  Ordinarily, the work of rationality goes into figuring out which choice is the best - not finding a reason to believe that a particular choice is the best.

Maybe it's too easy to say that you "ought to" two-box on Newcomb's Problem, that this is the "reasonable" thing to do, so long as the money isn't actually in front of you.  Maybe you're just numb to philosophical dilemmas, at this point.  What if your daughter had a 90% fatal disease, and box A contained a serum with a 20% chance of curing her, and box B might contain a serum with a 95% chance of curing her?  What if there was an asteroid rushing toward Earth, and box A contained an asteroid deflector that worked 10% of the time, and box B might contain an asteroid deflector that worked 100% of the time?

Would you, at that point, find yourself tempted to make an unreasonable choice?

If the stake in box B was something you could not leave behind?  Something overwhelmingly more important to you than being reasonable?  If you absolutely had to win - really win, not just be defined as winning?

Would you wish with all your power that the "reasonable" decision was to take only box B?

Then maybe it's time to update your definition of reasonableness.

Alleged rationalists should not find themselves envying the mere decisions of alleged nonrationalists, because your decision can be whatever you like.  When you find yourself in a position like this, you shouldn't chide the other person for failing to conform to your concepts of reasonableness.  You should realize you got the Way wrong.

So, too, if you ever find yourself keeping separate track of the "reasonable" belief, versus the belief that seems likely to be actually true.  Either you have misunderstood reasonableness, or your second intuition is just wrong.

Now one can't simultaneously define "rationality" as the winning Way, and define "rationality" as Bayesian probability theory and decision theory.  But it is the argument that I am putting forth, and the moral of my advice to Trust In Bayes, that the laws governing winning have indeed proven to be math.  If it ever turns out that Bayes fails - receives systematically lower rewards on some problem, relative to a superior alternative, in virtue of its mere decisions - then Bayes has to go out the window.  "Rationality" is just the label I use for my beliefs about the winning Way - the Way of the agent smiling from on top of the giant heap of utility.  Currently, that label refers to Bayescraft.

I realize that this is not a knockdown criticism of causal decision theory - that would take the actual book and/or PhD thesis - but I hope it illustrates some of my underlying attitude toward this notion of "rationality".

You shouldn't find yourself distinguishing the winning choice from the reasonable choice.  Nor should you find yourself distinguishing the reasonable belief from the belief that is most likely to be true.

That is why I use the word "rational" to denote my beliefs about accuracy and winning - not to denote verbal reasoning, or strategies which yield certain success, or that which is logically provable, or that which is publicly demonstrable, or that which is reasonable.

As Miyamoto Musashi said:

"The primary thing when you take a sword in your hands is your intention to cut the enemy, whatever the means. Whenever you parry, hit, spring, strike or touch the enemy's cutting sword, you must cut the enemy in the same movement. It is essential to attain this. If you think only of hitting, springing, striking or touching the enemy, you will not be able actually to cut him."

Comments


This dilemma seems like it can be reduced to:

  1. If you take both boxes, you will get $1000
  2. If you only take box B, you will get $1,000,000

Which is a rather easy decision.

There's a seemingly-impossible but vital premise, namely, that your action was already known before you acted. Even if this is completely impossible, it's a premise, so there's no point arguing it.

Another way of thinking of it is that, when someone says, "The boxes are already there, so your decision cannot affect what's in them," he is wrong. It has been assumed that your decision does affect what's in them, so the fact that you cannot imagine how that is possible is wholly irrelevant.

In short, I don't understand how this is controversial when the decider has all the information that was provided.

People seem to have pretty strong opinions about Newcomb's Problem. I don't have any trouble believing that a superintelligence could scan you and predict your reaction with 99.5% accuracy.

I mean, a superintelligence would have no trouble at all predicting that I would one-box... even if I hadn't encountered the problem before, I suspect.

Ultimately you either interpret "superintelligence" as being sufficient to predict your reaction with significant accuracy, or not. If not, the problem is just a straightforward probability question, as explained here, and becomes uninteresting.

Otherwise, if you interpret "superintelligence" as being sufficient to predict your reaction with significant accuracy (especially a high accuracy like >99.5%), the words of this sentence...

And the twist is that Omega has put a million dollars in box B iff Omega has predicted that you will take only box B.

...simply mean "One-box to win, with high confidence."

Summary: After disambiguating "superintelligence" (making the belief that Omega is a superintelligence pay rent), Newcomb's problem turns into either a straightforward probability question or a fairly simple issue of rearranging the words in equivalent ways to make the winning answer readily apparent.

I think the two-box person is confused about what it is to be rational: it does not mean "make a fancy argument," it means start with the facts, abstract from them, and reason about your abstractions.

In this case if you start with the facts you see that 100% of people who take only box B win big, so rationally, you do the same. Why would anyone be surprised that reason divorced from facts gives the wrong answer?

Either box B is already full or already empty.

I'm not going to go into the whole literature, but the dominant consensus in modern decision theory is that one should two-box, and Omega is just rewarding agents with irrational dispositions. This dominant view goes by the name of "causal decision theory".

I suppose causal decision theory assumes causality only works in one temporal direction. Confronted with a predictor that was right 100 out of 100 times, I would think it very likely that backward-in-time causation exists, and take only B. I assume this would, as you say, produce absurd results elsewhere.

Decisions aren't physical.

The above statement is at least hard to defend.  Your decisions are physical and occur inside of you... So these two-boxers are using the wrong model of the two (see the drawings at http://lesswrong.com/lw/r0/thou_art_physics/).

If you are a part of physics, so is your decision, so your model must account for the correlation between your thought processes and the superintelligence's prediction.  Once it accounts for that, you decide to one-box, because you understand the entanglement between the computation done by Omega and the physical process going on inside your skull.

If the entanglement is there, you are not looking at it from the outside, you are inside the process.

Our minds have a quirk that makes us think there are two moments: you decide, and then you cheat and get to decide again.  But if you are only allowed to decide once - which is the case - then one-boxing is the rational choice.

Well, I fail to see any need for backward-in-time causation to get the prediction right 100 out of 100 times.

As far as I understand, similar experiments have been performed in practice, and homo sapiens are fairly evenly split into two groups, 'one-boxers' and 'two-boxers', who generally have strong preferences toward one answer or the other due to differences in education, experience with logic, genetics, reasoning style, or other factors that are somewhat stable and specific to the individual.

Perfect predictive power (or even the possibility of it existing) is implied and suggested, but it's not really given, it's not really necessary, and IMHO it's neither possible nor useful to rely on this 'perfect predictive power' in any reasoning here.

From the given data in the situation (100 out of 100 that you saw), you know that Omega is a super-intelligent sorter who somehow manages to achieve 99.5% or better accuracy in sorting people into one-boxers and two-boxers.

This accuracy also seems higher than the accuracy of most (all?) people at self-evaluation; i.e., as in many other decision scenarios, there is a significant difference between what people believe they would decide in situation X and what they actually decide when it happens. [citation might be needed, but I don't have one at the moment; I do recall reading papers about such experiments.] The 'everybody is a perfect logician/rationalist and behaves as such' assumption often doesn't hold up in real life, even for self-described perfect rationalists who make a strong conscious effort to live up to it.

In effect, the data suggests that Omega probably knows your traits and decision chances (taking into account you taking all this into account) better than you do - it's simply smarter than homo sapiens.  Assuming that this is really so, it's better for you to take only box B.  Assuming that this is not so, and you believe that you can out-analyze Omega's perception of yourself, then you should choose the opposite of whatever Omega would predict of you (gaining $1,000,000 instead of $1,000, or $1,001,000 instead of $1,000,000).  If you don't know what Omega knows about you, then you don't get this bonus.

There is no finite amount of life lived N where I would prefer an 80.0001% probability of living N years to a 0.0001% chance of living a googolplex years and an 80% chance of living forever. This is a sufficient condition to imply that my utility function is unbounded.

Wait a second, the following bounded utility function can explain the quoted preferences:

  • U(live googolplex years) = 99
  • limit as N goes to infinity of U(live N years) = 100
  • U(live forever) = 101
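
As a quick arithmetic check (assuming the remaining probability mass in each gamble goes to an outcome with utility 0, and that U(live N years) increases toward its limit of 100): for every finite N,

    0.800001 * U(live N years) < 0.800001 * 100 = 80.0001
    0.000001 * 99 + 0.8 * 101 = 80.800099

so the second gamble is preferred for every finite N, exactly as the quoted preference requires.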

Benja Fallenstein gave an alternative formulation that does imply an unbounded utility function:

For all n, there is an even larger n' such that (p+q)*u(live n years) < p*u(live n' years) + q*u(live a googolplex years).

But these preferences are pretty counter-intuitive to me. If U(live n years) is unbounded, then the above must hold for any nonzero p, q, and with "googolplex" replaced by any finite number. For example, let p = 1/3^^^3, q = .8, n = 3^^^3, and replace "googolplex" with "0". Would you really be willing to give up .8 probability of 3^^^3 years of life for a 1/3^^^3 chance at a longer (but still finite) one? And that's true no matter how many up-arrows we add to these numbers?

I suppose I might still be missing something, but this still seems to me just a simple example of time inconsistency, where you'd like to commit ahead of time to something that later you'd like to violate if you could. You want to commit to taking the one box, but you also want to take the two boxes later if you could. A more familiar example is that we'd like to commit ahead of time to spending effort to punish people who hurt us, but after they hurt us we'd rather avoid spending that effort as the harm is already done.

Hanson: I suppose I might still be missing something, but this still seems to me just a simple example of time inconsistency

In my motivations and in my decision theory, dynamic inconsistency is Always Wrong. Among other things, it always implies an agent unstable under reflection.

A more familiar example is that we'd like to commit ahead of time to spending effort to punish people who hurt us, but after they hurt us we'd rather avoid spending that effort as the harm is already done.

But a self-modifying agent would modify to not rather avoid it.

Gowder: If a), then practical reason is meaningless anyway: you'll do what you'll do, so stop stressing about it.

Deterministic != meaningless. Your action is determined by your motivations, and by your decision process, which may include your stressing about it. It makes perfect sense to say: "My future decision is determined, and my stressing about it is determined; but if-counterfactual I didn't stress about it, then-counterfactual my future decision would be different, so it makes perfect sense for me to stress about this, which is why I am deterministically doing it."

The past can't change - does not even have the illusion of potential change - but that doesn't mean that people who, in the past, committed a crime, are not held responsible just because their action and the crime are now "fixed". It works just the same way for the future. That is: a fixed future should present no more problem for theories of moral responsibility than a fixed past.

Fascinating. A few days after I read this, it struck me that a form of Newcomb's Problem actually occurs in real life--voting in a large election. Here's what I mean.

Say you're sitting at home pondering whether to vote. If you decide to stay home, you benefit by avoiding the minor inconvenience of driving and standing in line. (Like gaining $1000.) If you decide to vote, you'll fail to avoid the inconvenience, meanwhile you know your individual vote almost certainly won't make a statistical difference in getting your candidate elected. (Which would be like winning $1,000,000.) So rationally, stay at home and hope your candidate wins, right? And then you'll have avoided the inconvenience too. Take both boxes.

But here's the twist. If you muster the will to vote, it stands to reason that those of a similar mind to you (a potentially statistically significant number of people) would also muster the will to vote, because of their similarity to you. So knowing this, why not stay home anyway, avoid the inconvenience, and trust all those others to vote and win the election? They're going to do what they're going to do. Your actions can't change that. The contents of the boxes can't be changed by your actions. Well, if you don't vote, perhaps that means neither will the others, and so it goes. Therein lies the similarity to Newcomb's problem.
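
As a toy way to put numbers on the analogy (every number below is an arbitrary assumption, including the assumption that the like-minded bloc's decision is perfectly correlated with yours):

    # Toy expected-utility sketch of voting as a Newcomblike problem
    # (illustrative only; every number here is an arbitrary assumption).

    INCONVENIENCE = 1          # utility cost of driving and standing in line
    VALUE_OF_WIN  = 1_000_000  # utility of your candidate winning

    def expected_utility(you_vote, p_bloc_swings_election=0.01):
        # Assumption: the like-minded bloc's decision is perfectly correlated
        # with yours - they all vote iff you do, like Omega's prediction.
        p_win_from_bloc = p_bloc_swings_election if you_vote else 0.0
        cost = INCONVENIENCE if you_vote else 0
        return p_win_from_bloc * VALUE_OF_WIN - cost

    print(round(expected_utility(True)))   # 9999
    print(round(expected_utility(False)))  # 0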

A very good point. I'm the type to stay home from the polls. But I'd also one-box..... hm.

I think it may have to do with the very weak correlation between my choice to vote and the choice of those of a similar mind to me to vote as opposed to the very strong correlation between my choice to one-box and Omega's choice to put $1,000,000 in box B.

I don't know the literature around Newcomb's problem very well, so excuse me if this is stupid. BUT: why not just reason as follows:

  1. If the superintelligence can predict your action, one of the following two things must be the case:

a) the state of affairs whether you pick the box or not is already absolutely determined (i.e. we live in a fatalistic universe, at least with respect to your box-picking)

b) your box picking is not determined, but it has backwards causal force, i.e. something is moving backwards through time.

If a), then practical reason is meaningless anyway: you'll do what you'll do, so stop stressing about it.

If b), then you should be a one-boxer for perfectly ordinary rational reasons, namely that it brings it about that you get a million bucks with probability 1.

So there's no problem!

I'd just like to note that as with most of the rationality material in Eliezer's sequences, the position in this post is a pretty common mainstream position among cognitive scientists. E.g. here is Jonathan Baron on page 61 of his popular textbook Thinking and Deciding:

the best kind of thinking, which we shall call rational thinking, is whatever kind of thinking best helps people achieve their goals. If it should turn out that following the rules of formal logic leads to eternal happiness, then it is rational thinking to follow the laws of logic (assuming that we all want eternal happiness). If it should turn out, on the other hand, that carefully violating the laws of logic at every turn leads to eternal happiness, then it is these violations that we shall call rational.

This view is quoted and endorsed in, for example, Stanovich 2010, p. 3.

To quote E.T. Jaynes:

"This example shows also that the major premise, “If A then B” expresses B only as a logical consequence of A; and not necessarily a causal physical consequence, which could be effective only at a later time. The rain at 10 AM is not the physical cause of the clouds at 9:45 AM. Nevertheless, the proper logical connection is not in the uncertain causal direction (clouds ⇒ rain), but rather (rain ⇒ clouds) which is certain, although noncausal. We emphasize at the outset that we are concerned here with logical connections, because some discussions and applications of inference have fallen into serious error through failure to see the distinction between logical implication and physical causation. The distinction is analyzed in some depth by H. A. Simon and N. Rescher (1966), who note that all attempts to interpret implication as expressing physical causation founder on the lack of contraposition expressed by the second syllogism (1–2). That is, if we tried to interpret the major premise as “A is the physical cause of B,” then we would hardly be able to accept that “not-B is the physical cause of not-A.” In Chapter 3 we shall see that attempts to interpret plausible inferences in terms of physical causation fare no better."

It took me a week to think about it. Then I read all the comments, and thought about it some more. And now I think I have this "problem" well in hand. I also think that, incidentally, I arrived at Eliezer's answer as well, though since he never spelled it out I can't be sure.

To be clear - a lot of people have said that the decision depends on the problem parameters, so I'll explain just what it is I'm solving. See, Eliezer wants our decision theory to WIN. That implies that we have all the relevant information - we can think of a lot of situations where we make the wisest decision possible based on available information and it turns out to be wrong; the universe is not fair, we know this already. So I will assume we have all the relevant information needed to win. We will also assume that Omega does have the capability to accurately predict my actions; and that causality is not violated (rationality cannot be expected to win if causality is violated!).

Assuming this, I can have a conversation with Omega before it leaves. Mind you, it's not a real conversation, but having sufficient information about the problem means I can simulate its part of the conversation even if Omega itself refuses to participate and/or there isn't enough time for such a conversation to take place. So it goes like this...

Me: "I do want to gain as much as possible in this problem. For that effect I will want you to put as much money in the box as possible. How do I do that?"

Omega: "I will put $1,000,000 in the box if you take only it; and nothing if you take both."

Me: "Ah, but we're not violating causality here, are we? That would be cheating!"

Omega: "True, causality is not violated. To rephrase, my decision on how much money to put in the box will depend on my prediction of what you will do. Since I have this capacity, we can consider these synonymous."

Me: "Suppose I'm not convinced that they are truly synonymous. All right then. I intend to take only the one box".

Omega: "Remember that I have the capability to predict your actions. As such I know if you are sincere or not."

Me: "You got me. Alright, I'll convince myself really hard to take only the one box."

Omega: "Though you are sincere now, in the future you will reconsider this decision. As such, I will still place nothing in the box."

Me: "And you are predicting all this from my current state, right? After all, this is one of the parameters in the problem - that after you've placed money in the boxes, you are gone and can't come back to change it".

Omega: "That is correct; I am predicting a future state from information on your current state".

Me: "Aha! That means I do have a choice here, even before you have left. If I change my state so that I am unable or unwilling to two-box once you've left, then your prediction of my future "decision" will be different. In effect, I will be hardwired to one-box. And since I still want to retain my rationality, I will make sure that this hardwiring is strictly temporary."

*fiddling with my own brain a bit*

Omega: "I have now determined that you are unwilling to take both boxes. As such, I will put the $1,000,000 in the box."

*Omega departs*

*I walk unthinkingly toward the boxes and take just the one*

Voila. Victory is achieved.

My main conclusion here is that any decision theory that does not allow for changing strategies is a poor decision theory indeed. This IS essentially the Friendly AI problem: you can rationally one-box, but you need to have access to your own source code in order to do so. Not having that access would be so inflexible as to be the equivalent of an Iterated Prisoner's Dilemma program that can only defect or only cooperate; that is, a very bad one.

The reason this is not obvious is that the way the problem is phrased is misleading. Omega supposedly leaves "before you make your choice", but in fact there is not a single choice here (one-box or two-box). Rather, there are two decisions to be made, if you can modify your own thinking process:

  1. Whether or not to have the ability and inclination to make decision #2 "rationally" once Omega has left, and
  2. Whether to one-box or two-box.

...Where decision #1 can and should be made prior to Omega's leaving, and obviously DOES influence what's in the box. Decision #2 does not influence what's in the box, but the state in which I approach that decision does. This is very confusing initially.
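
As a toy sketch of that two-decision structure (hypothetical names; just an illustration of the framing, not a real decision theory):

    # Toy model of the two-decision framing (illustrative sketch).

    def omega_fills_box_b(state):
        # Omega predicts from your state at scan time, before decision #2.
        return state == "hardwired-to-one-box"

    def decision_2(state):
        # Decision #2: made after Omega has left, from the installed state.
        return "one-box" if state == "hardwired-to-one-box" else "two-box"

    def payoff(state):
        box_b = 1_000_000 if omega_fills_box_b(state) else 0
        return box_b if decision_2(state) == "one-box" else box_b + 1_000

    print(payoff("hardwired-to-one-box"))  # 1000000
    print(payoff("default"))               # 1000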

Now, I don't really know CDT too well, but it seems to me that presented as these two decisions, even it would be able to correctly one-box on Newcomb's problem. Am I wrong?

Eliezer - if you are still reading these comments so long after the article was published - I don't think it's an inconsistency in the AI's decision making if the AI's decision making is influenced by its internal state. In fact I expect that to be the case. What am I missing here?