Why do theists, undergrads, and Less Wrongers favor one-boxing on Newcomb?

Follow-up to: Normative uncertainty in Newcomb's problem

Philosophers and atheists break for two-boxing; theists and Less Wrongers break for one-boxing
Personally, I would one-box on Newcomb's Problem. Conditional on choosing it for lawful reasons, one-boxing earns $1,000,000, while two-boxing, chosen for equally lawful reasons, delivers only $1,000. But this seems to be firmly a minority view in philosophy, and numerous heuristics about expert opinion suggest that I should re-examine the view.

In the PhilPapers survey, philosophy undergraduates start off divided roughly evenly between one-boxing and two-boxing:

Newcomb's problem: one box or two boxes?

Other 142 / 217 (65.4%)
Accept or lean toward: one box 40 / 217 (18.4%)
Accept or lean toward: two boxes 35 / 217 (16.1%)

But philosophy faculty, who have learned more (and are less likely to have no opinion) and have been subject to further selection, break in favor of two-boxing:

Newcomb's problem: one box or two boxes?

Other 441 / 931 (47.4%)
Accept or lean toward: two boxes 292 / 931 (31.4%)
Accept or lean toward: one box 198 / 931 (21.3%)

Specialists in decision theory (who are also more atheistic, more compatibilist about free will, and more physicalist than faculty in general) are even more convinced:

Newcomb's problem: one box or two boxes?

Accept or lean toward: two boxes 19 / 31 (61.3%)
Accept or lean toward: one box 8 / 31 (25.8%)
Other 4 / 31 (12.9%)

Looking at the correlates of answers about Newcomb's problem, two-boxers are more likely to believe in physicalism about consciousness, atheism about religion, and other positions generally popular around here (which are also usually, but not always, in the direction of philosophical opinion). Zooming in on one correlate, most theists with an opinion are one-boxers, while atheists break for two-boxing:

Newcomb's problem: two boxes (0.125)

            One box            Two boxes
Atheism     28.6% (145/506)    48.8% (247/506)
Theism      40.8% (40/98)      31.6% (31/98)

Response pairs: 655   p-value: 0.001

Less Wrong breaks overwhelmingly for one-boxing in survey answers for 2012:

NEWCOMB'S PROBLEM
One-box: 726, 61.4%
Two-box: 78, 6.6%
Not sure: 53, 4.5%
Don't understand: 86, 7.3%
No answer: 240, 20.3%

When I elicited LW confidence levels in a poll, a majority indicated 99%+ confidence in one-boxing, and 77% of respondents indicated 80%+ confidence.

What's going on?

I would like to understand what is driving this difference of opinion. My poll was a (weak) test of the hypothesis that Less Wrongers were more likely to account for uncertainty about decision theory: since on the standard Newcomb's problem one-boxers get $1,000,000 while two-boxers get $1,000, even a modest credence that the correct theory recommends one-boxing could justify the act of one-boxing.

If new graduate students read the computer science literature on program equilibrium, including some local contributions like Robust Cooperation in the Prisoner's Dilemma and A Comparison of Decision Algorithms on Newcomblike Problems, I would guess they would tend to shift more towards one-boxing. Thinking about what sort of decision algorithms it is rational to program, or what decision algorithms would prosper over numerous one-shot Prisoner's Dilemmas with visible source code, could also shift intuitions. A number of philosophers I have spoken with have indicated that frameworks like the use of causal models with nodes for logical uncertainty are meaningful contributions to thinking about decision theory. However, I doubt that for those with opinions, the balance would swing from almost 3:1 for two-boxing to 9:1 for one-boxing, even concentrating on new decision theory graduate students.

On the other hand, there may be an effect of unbalanced presentation to non-experts. Less Wrong is on average less philosophically sophisticated than professional philosophers. Since philosophical training is associated with a shift towards two-boxing, some of the difference in opinion could reflect a difference in training. Further, postings here on decision theory have almost all either argued for or assumed one-boxing as the correct response to Newcomb's problem. It might be that if academic decision theorists were making arguments for two-boxing here, or if there were a reduction in pro-one-boxing social pressure, Less Wrong opinion would shift towards two-boxing.

Less Wrongers, what's going on here? What are the relative causal roles of these and other factors in this divergence?

ETA: The SEP article on Causal Decision Theory.

Comments


What's going on is that Eliezer Yudkowsky has argued forcefully for one-boxing, in terms of his "way of winning" thing, which, for readers who had also absorbed the other things he wrote in that vein (like the "nameless virtue"), probably set off a "why aren't you winning?" alarm bell in their heads.

Most philosophers haven't been introduced to the problem by Eliezer Yudkowsky.

To me, Newcomb's problem seemed like a contrived trick to punish CDT, and it seemed that any other decision theory was just as likely to run into some other strange scenario to punish it, until I started thinking about AIs that could simulate you accurately, something else that differentiates LessWrong from professional philosophers.

When I realized that the only criterion by which a "best decision theory" could be crowned was winning in as many realistic scenarios as possible, stopped caring that "acausal control" sounded like an oxymoron, saw that there could potentially be Newcomblike problems to face in real life and that there were decision theories that could win on Newcomb's problem without bungling the smoker's lesion problem, and read this:

What if your daughter had a 90% fatal disease, and box A contained a serum with a 20% chance of curing her, and box B might contain a serum with a 95% chance of curing her?

that convinced me to one-box.

Addendum:

About atheists vs theists and undergrads vs philosophers, I think two-boxing is a position that preys on your self-image as a rationalist. It feels like you are getting punished for being rational, like you are losing not because of your choice, but because of who you are (I would say your choice is embedded in who you are, so there is no difference). One-boxing feels like magical thinking. Atheists and philosophers have stronger self-images as rationalists. Most haven't grokked this:

How can you improve your conception of rationality? Not by saying to yourself, “It is my duty to be rational.” By this you only enshrine your mistaken conception. Perhaps your conception of rationality is that it is rational to believe the words of the Great Teacher, and the Great Teacher says, “The sky is green,” and you look up at the sky and see blue. If you think: “It may look like the sky is blue, but rationality is to believe the words of the Great Teacher,” you lose a chance to discover your mistake.

Will's link has an Asimov quote that supports the "self-image vs right answer" idea, at least for Asimov:

I would, without hesitation, take both boxes . . . I am myself a determinist, but it is perfectly clear to me that any human being worthy of being considered a human being (including most certainly myself) would prefer free will, if such a thing could exist. . . Now, then, suppose you take both boxes and it turns out (as it almost certainly will) that God has foreseen this and placed nothing in the second box. You will then, at least, have expressed your willingness to gamble on his nonomniscience and on your own free will and will have willingly given up a million dollars for the sake of that willingness - itself a snap of the finger in the face of the Almighty and a vote, however futile, for free will. . . And, of course, if God has muffed and left a million dollars in the box, then not only will you have gained that million, but far more important you will have demonstrated God's nonomniscience.

Seems like Asimov isn't taking the stakes seriously enough. Maybe we should replace "a million dollars" with "your daughter here gets to live."

And only coincidentally signalling that his status is worth more than a million dollars.

But losing the million dollars also shoves in your face your ultimate predictability.

Voluntarily taking a loss in order to insult yourself doesn't seem rational to me.

Plus, that's not a form of free will I even care about. I like that my insides obey laws. I'm not fond of the massive privacy violation, but that'd be there or not regardless of my choice.

Adding to your story, it's not just Eliezer Yudkowsky's introduction to Newcomb's problem. It's the entire Bayesian / Less Wrong mindset. Here, Eliezer wrote:

That was when I discovered that I was of the type called 'Bayesian'. As far as I can tell, I was born that way.

I felt something similar when I was reading through the sequences. Everything "clicked" for me - it just made sense. I couldn't imagine thinking another way.

Same with Newcomb's problem. I wasn't introduced to it by Eliezer, but I still thought one-boxing was obvious; it works.

Many Less Wrongers that have stuck around probably have had a similar experience; the Bayesian standpoint seems intuitive. Eliezer's support certainly helps to propagate one-boxing, but LessWrongers seem to be a self-selecting group.

It may well be the strength of the arguments. It could also be the lead of a very influential and respected figure and the power of groupthink. In my experience, two forums with similar mission statements ('political debating sites', say, or 'atheist sites') often end up with distinct positions on all sorts of things that most of their posters converge around. The same is true of any group, although if 'theists' reflects a genuine survey of theists the world over, it is at least a far more representative group.

It would be very interesting to add a control group in some way: for instance, confront someone of a typical LessWrong demographic who hadn't read anything about Newcomb on Less Wrong with the problem.

If it's not just a quality of this sort of groupthink, my best guess is that it's to do with the greater practical focus (or at least theoretical belief in practical focus!) on LessWrong. I suspect most people automatically parse this sort of philosophical question as 'what is more abstractly logical', whereas people on here probably parse it more as 'what should I do to win'. But I think these sorts of 'our inherent group qualities' explanations are almost always locatable, yet often unnecessary in light of groupthink.

I've been reading a little of the philosophical literature on decision theory lately, and at least some two-boxers have an intuition I hadn't thought about before: that Newcomb's problem is "unfair." That is, for a wide range of pairs of decision theories X and Y, you could imagine a problem which essentially takes the form "Omega punishes agents who use decision theory X and rewards agents who use decision theory Y," and this is not a "fair" test of the relative merits of the two decision theories.

The idea that rationalists should win, in this context, has a specific name: it's called the Why Ain'cha Rich defense, and I think what I've said above is the intuition powering counterarguments to it.

I'm a little more sympathetic to this objection than I was before delving into the literature. A complete counterargument to it should at least attempt to define what fair means and argue that Newcomb is in fact a fair problem. (This seems related to the issue of defining what a fair opponent is in modal combat.)

TDT's reply to this is a bit more specific.

Informally: Since Omega represents a setup which rewards agents who make a certain decision X, and reality doesn't care why or by what exact algorithm you arrive at X so long as you arrive at X, the problem is fair. Unfair would be "We'll examine your source code and punish you iff you're a CDT agent, but we won't punish another agent who two-boxes as the output of a different algorithm even though your two algorithms had the same output." The problem should not care whether you arrive at your decisions by maximizing expected utility or by picking the first option in English alphabetical order, so long as you arrive at the same decision either way.

More formally: TDT corresponds to maximizing on the class of problems whose payoff is determined by 'the sort of decision you make in the world that you actually encounter, having the algorithm that you do'. CDT corresponds to maximizing over a fair problem class consisting of scenarios whose payoff is determined only by your physical act, and would be a good strategy in the real world if no other agent ever had an algorithm similar to yours (you must be the only CDT-agent in the universe, so that your algorithm only acts at one physical point) and where no other agent could gain any info about your algorithm except by observing your controllable physical acts (tallness being correlated with intelligence is not allowed). UDT allows for maximizing over classes of scenarios where your payoff can depend on actions you would have taken in universes you could have encountered but didn't, i.e., the Counterfactual Mugging. (Parfit's Hitchhiker is outside TDT's problem class, and in UDT, because the car-driver asks "What will this hitchhiker do if I take them to town?" so that a dishonorable hitchhiker who is left in the desert is getting a payoff which depends on what they would have done in a situation they did not actually encounter. Likewise the transparent Newcomb's Box. We can clearly see how to maximize on the problem but it's in UDT's class of 'fair' scenarios, not TDT's class.)

If the scenario handed to the TDT algorithm is that only one copy of your algorithm exists within the scenario, acting at one physical point, and no other agent in the scenario has any knowledge of your algorithm apart from acts you can maximize over, then TDT reduces to CDT and outputs the same action as CDT, which is implied by CDT maximizing over its problem class and TDT's class of 'fair' problems strictly including all CDT-fair problems.

If Omega rewards having particular algorithms independently of their outputs, by examining the source code without running it, the only way to maximize is to have the most rewarded algorithm regardless of its output. But this is uninteresting.

If a setup rewards some algorithms more than others because of their different outputs, this is just life. You might as well claim that a cliff punishes people who rationally choose to jump off it.

This situation is interestingly blurred in modal combat where an algorithm may perhaps do better than another because its properties were more transparent (more provable) to another algorithm examining it. Of this I can only say that if, in real life, we end up with AIs examining each other's source code and trying to prove things about each other, calling this 'unfair' is uninteresting. Reality is always the most important domain to maximize over.

I'd just like to say that this comparison of CDT, TDT, and UDT was a very good explanation of the differences. Thanks for that.

Agreed. Found the distinction between TDT and UDT especially clear here.

This explanation makes UDT seem strictly more powerful than TDT (if UDT can handle Parfit's Hitchhiker and TDT can't).

If that's the case, then is there a point in still focusing on developing TDT? Is it meant as just a stepping stone to an even better decision theory (possibly UDT itself) down the line? Or do you believe UDT's advantages to be counterbalanced by disadvantages?

UDT doesn't handle non-base-level maximization vantage points (previously "epistemic vantage points") for blackmail - you can blackmail a UDT agent because it assumes your strategy is fixed, and doesn't realize you're only blackmailing it because you're simulating it being blackmailable. As currently formulated UDT is also non-naturalistic and assumes the universe is divided into a not-you environment and a UDT algorithm in a Cartesian bubble, which is something TDT is supposed to be better at (though we don't actually have good fill-in for the general-logical-consequence algorithm TDT is supposed to call).

I expect the ultimate theory to look more like "TDT modded to handle UDT's class of problems and blackmail and anything else we end up throwing at it" than "UDT modded to be naturalistic and etc", but I could be wrong - others have different intuitions about this.

As currently formulated UDT is also non-naturalistic and assumes the universe is divided into a not-you environment and a UDT algorithm in a Cartesian bubble, which is something TDT is supposed to be better at (though we don't actually have good fill-in for the general-logical-consequence algorithm TDT is supposed to call).

UDT was designed to move away from the kind of Cartesian dualism as represented in AIXI. I don't understand where it's assuming its own Cartesian bubble. Can you explain?

(Relevant. (hint hint commenters you should read this before speculating about the origins of theists' intuitions about Newcomb's problem))

The most charitable interpretation would just be that there happened to be a convincing technical theory which said you should two-box, because it took an even more technical theory to explain why you should one-box and this was not constructed, along with the rest of the edifice to explain what one-boxing means in terms of epistemic models, concepts of instrumental rationality, the relation to traditional philosophy's 'free will problem', etcetera. In other words, they simply bad-lucked onto an edifice of persuasive, technical, but ultimately incorrect argument.

We could guess other motives for people to two-box, like memetic pressure for partial counterintuitiveness, but why go to that effort now? Better TDT writeups are on the way, and eventually we'll get to see what the field says about the improved TDT writeups. If it's important to know what other hidden motives might be at work, we'll have a better idea after we negate the usually-stated motive of, "The only good technical theory we have says you should two-box." Perhaps the field will experience a large conversion once presented with a good enough writeup and then we'll know there weren't any other significant motives.

Better TDT writeups are on the way, and eventually we'll get to see what the field says about the improved TDT writeups.

Do you have an ETA on that? All my HPMoR anticipations combined don't equal my desire to see this published and discussed.

If you are actually wondering, most Lesswrongers one-box because Eliezer promotes one-boxing. That's it.

This should be taken seriously as a hypothesis. However, it can be broken down a bit:

1. LW readers one-box more because they are more likely to have read strong arguments in favor of one-boxing — namely Eliezer's — than most philosophers are.
2. LW readers one-box more because LW disproportionately attracts or retains people who already had a predilection for one-boxing, because people like to affiliate with those who will confirm their beliefs.
3. LW readers one-box more because they are guessing the teacher's password (or, more generally, parroting a "charismatic leader" or "high-status individual") by copying Eliezer's ideas.

To these I'll add some variants:

4. LW readers one-box more than most atheists because for many atheists, two-boxing is a way of saying that they are serious about their atheism, by denying Omega's godlike predictive ability; but LWers distinguish godlike AI from supernatural gods due to greater familiarity with Singularity ideas (or science fiction).
5. LW readers one-box to identify as (meta)contrarians among atheists / materialists.
6. LW readers one-box because they have absorbed the tribal belief that one-boxing makes you a better person.

The hypothesis that we don't dare take seriously I may as well explicitly state:

7. LW readers one-box more because one-boxing is the right answer.

Good breakdown. #7 is not an explanation unless coupled with a hypothesis on why LW readers are more adept than mainstream philosophers and decision theorists at spotting the right answer on this problem. Unless one claims that LWers just have a generally higher IQ (implausible), an explanation for this would probably go back to #1 or something like it.

Personally, I think the answer is a combination of #1, #2, and #3. I'm not sure about the relative roles played by each of them (which have a decreasing level of "rationality"), but here is an analogy:

Suppose you know that there is a controversy between two views A and B in philosophy (or economics, or psychology, or another area which is not a hard science), that University X has in its department a leading proponent of theory A, and that a bunch of theorists have clustered around her. It is surely not surprising that there are more A proponents among this group than among the general discipline. As possible explanations, the same factors apply in this general case: we could hypothesize that philosophers at X are exposed to unusually strong arguments for A, or that B-proponents disproportionately go to other universities, or that philosophers at X are slavishly following their leader. I contend that the question about LW is no different in essence from this general one, and that whatever view about the interplay of sociology, memetic theory and rationality you have as your explanation of "many A-ers at X" should also apply to "many one-boxers at LW".

Anecdotally, there are two probability games that convinced me to one-box: the Monty Hall game and the rock-paper-scissors bot at the NY Times.

The RPS bot is a good real-world example of how it is theoretically possible to have an AI (or "Omega") that accurately predicts my decisions. The RPS bot predicted my decision about 2 out of 3 times, so I don't see any conceptual reason why an even better-designed robot/AI couldn't beat me 999/1000 times at RPS. I tried really hard to outsmart the RPS bot and even still I lost more than I won. It was only when I randomized my choices using a hashing algorithm of sorts that I started to win.

The only reason I knew about the RPS game at the NYT was due to participation on Less Wrong, so maybe anecdotes like mine are the reason for the link. I also don't have any emotional attachment to the idea of free will.

I only recently really worked through this, and I'm a firm one-boxer. After a few discussions with two-boxers, I came to understand why: I consider myself predictable and deterministic. Two-boxers do not.

For me, the idea that Omega can predict my behaviour accurately is pretty much a no-brainer. I already think it possible to upload into digital form and make multiple copies of myself (which are all simultaneously "me"), and running bulk numbers of predictions using simulations seems perfectly reasonable. Two-boxers, on the other hand, think of consciousness and their sense of self as some mystical, magical thing that can't be reliably predicted.

The reason I would pick only one box is roughly: the more strongly I want to pick one box, the more I convince myself to only pick one box, the more likely it is that simulations of me will also pick one box.

Note that by reasoning this out in advance, prior to being presented with the actual decision, I have in all probability raised my odds of walking away with the million dollar box. I now have an established pattern of cached thoughts with a preference for selecting one box, which may improve the odds that simulated copies will also one-box.

This note also implies another side effect: if Omega has a high accuracy rate even when people are caught flat-footed (without prior exposure to the problem), then my estimate of Omega's predictive powers increases dramatically.

The high accuracy rate itself implies something, though I'm not quite sure what: with an extremely high accuracy rate, either people are disinclined to choose completely randomly, or Omega's predictor is good enough that true random number generation is very difficult for a human.

Is the Predictor omniscient or making a prediction?

A tangent: when I worked at a teen homeless shelter there would sometimes be a choice for clients to get a little something now or more later. Now won every time, later never. Anything close to a bird in hand was valued more than a billion ultra birds not in the hand. A lifetime of being betrayed by adults, or poor future skills, or both and more might be why that happened. Two boxes without any doubt for those guys. As Predictors they would always predict two boxes and be right.