Timeless Decision Theory: Problems I Can't Solve

Suppose you're out in the desert, running out of water, and soon to die - when someone in a motor vehicle drives up next to you.  Furthermore, the driver of the motor vehicle is a perfectly selfish ideal game-theoretic agent, and even further, so are you; and what's more, the driver is Paul Ekman, who's really, really good at reading facial microexpressions.  The driver says, "Well, I'll convey you to town if it's in my interest to do so - so will you give me $100 from an ATM when we reach town?"

Now of course you wish you could answer "Yes", but as an ideal game theorist yourself, you realize that, once you actually reach town, you'll have no further motive to pay off the driver.  "Yes," you say.  "You're lying," says the driver, and drives off leaving you to die.

If only you weren't so rational!

This is the dilemma of Parfit's Hitchhiker, and the above is the standard resolution according to mainstream philosophy's causal decision theory, which also two-boxes on Newcomb's Problem and defects in the Prisoner's Dilemma.  Of course, any self-modifying agent who expects to face such problems - in general, or in particular - will soon self-modify into an agent that doesn't regret its "rationality" so much.  So from the perspective of a self-modifying-AI-theorist, classical causal decision theory is a wash.  And indeed I've worked out a theory, tentatively labeled "timeless decision theory", which covers these three Newcomblike problems and delivers a first-order answer that is already reflectively consistent, without need to explicitly consider such notions as "precommitment".  Unfortunately this "timeless decision theory" would require a long sequence to write up, and it's not my current highest writing priority unless someone offers to let me do a PhD thesis on it.

However, there are some other timeless decision problems for which I do not possess a general theory.

For example, there's a problem introduced to me by Gary Drescher's marvelous Good and Real (OOPS: The below formulation was independently invented by Vladimir Nesov; Drescher's book actually contains a related dilemma in which box B is transparent, and only contains $1M if Omega predicts you will one-box whether B appears full or empty, and Omega has a 1% error rate) which runs as follows:

Suppose Omega (the same superagent from Newcomb's Problem, who is known to be honest about how it poses these sorts of dilemmas) comes to you and says:

"I just flipped a fair coin.  I decided, before I flipped the coin, that if it came up heads, I would ask you for $1000.  And if it came up tails, I would give you $1,000,000 if and only if I predicted that you would give me $1000 if the coin had come up heads.  The coin came up heads - can I have $1000?"

Obviously, the only reflectively consistent answer in this case is "Yes - here's the $1000", because if you're an agent who expects to encounter many problems like this in the future, you will self-modify to be the sort of agent who answers "Yes" to this sort of question - just like with Newcomb's Problem or Parfit's Hitchhiker.

But I don't have a general theory which replies "Yes".  At the point where Omega asks me this question, I already know that the coin came up heads, so I already know I'm not going to get the million.  It seems like I want to decide "as if" I don't know whether the coin came up heads or tails, and then implement that decision even if I know the coin came up heads.  But I don't have a good formal way of talking about how my decision in one state of knowledge has to be determined by the decision I would make if I occupied a different epistemic state, conditioning using the probability previously possessed by events I have since learned the outcome of...  Again, it's easy to talk informally about why you have to reply "Yes" in this case, but that's not the same as being able to exhibit a general algorithm.

Another stumper was presented to me by Robin Hanson at an OBLW meetup.  Suppose you have ten ideal game-theoretic selfish agents and a pie to be divided by majority vote.  Let's say that six of them form a coalition and decide to vote to divide the pie among themselves, one-sixth each.  But then two of them think, "Hey, this leaves four agents out in the cold.  We'll get together with those four agents and offer them to divide half the pie among the four of them, leaving one quarter apiece for the two of us.  We get a larger share than one-sixth that way, and they get a larger share than zero, so it's an improvement from the perspectives of all six of us - they should take the deal."  And those six then form a new coalition and redivide the pie.  Then another two of the agents think:  "The two of us are getting one-eighth apiece, while four other agents are getting zero - we should form a coalition with them, and by majority vote, give each of us one-sixth."

And so it goes on:  Every majority coalition and division of the pie, is dominated by another majority coalition in which each agent of the new majority gets more pie.  There does not appear to be any such thing as a dominant majority vote.

(Robin Hanson actually used this to suggest that if you set up a Constitution which governs a society of humans and AIs, the AIs will be unable to conspire among themselves to change the constitution and leave the humans out in the cold, because then the new compact would be dominated by yet other compacts and there would be chaos, and therefore any constitution stays in place forever.  Or something along those lines.  Needless to say, I do not intend to rely on such, but it would be nice to have a formal theory in hand which shows how ideal reflectively consistent decision agents will act in such cases (so we can prove they'll shed the old "constitution" like used snakeskin.))

Here's yet another problem whose proper formulation I'm still not sure of, and it runs as follows.  First, consider the Prisoner's Dilemma.  Informally, two timeless decision agents with common knowledge of the other's timeless decision agency, but no way to communicate or make binding commitments, will both Cooperate because they know that the other agent is in a similar epistemic state, running a similar decision algorithm, and will end up doing the same thing that they themselves do.  In general, on the True Prisoner's Dilemma, facing an opponent who can accurately predict your own decisions, you want to cooperate only if the other agent will cooperate if and only if they predict that you will cooperate.  And the other agent is reasoning similarly:  They want to cooperate only if you will cooperate if and only if you accurately predict that they will cooperate.

But there's actually an infinite regress here which is being glossed over - you won't cooperate just because you predict that they will cooperate, you will only cooperate if you predict they will cooperate if and only if you cooperate.  So the other agent needs to cooperate if they predict that you will cooperate if you predict that they will cooperate... (...only if they predict that you will cooperate, etcetera).

On the Prisoner's Dilemma in particular, this infinite regress can be cut short by expecting that the other agent is doing symmetrical reasoning on a symmetrical problem and will come to a symmetrical conclusion, so that you can expect their action to be the symmetrical analogue of your own - in which case (C, C) is preferable to (D, D).  But what if you're facing a more general decision problem, with many agents having asymmetrical choices, and everyone wants to have their decisions depend on how they predict that other agents' decisions depend on their own predicted decisions?  Is there a general way of resolving the regress?

On Parfit's Hitchhiker and Newcomb's Problem, we're told how the other behaves as a direct function of our own predicted decision - Omega rewards you if you (are predicted to) one-box, the driver in Parfit's Hitchhiker saves you if you (are predicted to) pay $100 on reaching the city.  My timeless decision theory only functions in cases where the other agents' decisions can be viewed as functions of one argument, that argument being your own choice in that particular case - either by specification (as in Newcomb's Problem) or by symmetry (as in the Prisoner's Dilemma).  If their decision is allowed to depend on how your decision depends on their decision - like saying, "I'll cooperate, not 'if the other agent cooperates', but only if the other agent cooperates if and only if I cooperate - if I predict the other agent to cooperate unconditionally, then I'll just defect" - then in general I do not know how to resolve the resulting infinite regress of conditionality, except in the special case of predictable symmetry.

You perceive that there is a definite note of "timelessness" in all these problems.

Any offered solution may assume that a timeless decision theory for direct cases already exists - that is, if you can reduce the problem to one of "I can predict that if (the other agent predicts) I choose strategy X, then the other agent will implement strategy Y, and my expected payoff is Z", then I already have a reflectively consistent solution which this margin is unfortunately too small to contain.

(In case you're wondering, I'm writing this up because one of the SIAI Summer Project people asked if there was any Friendly AI problem that could be modularized and handed off and potentially written up afterward, and the answer to this is almost always "No", but this is actually the one exception that I can think of.  (Anyone actually taking a shot at this should probably familiarize themselves with the existing literature on Newcomblike problems - the edited volume "Paradoxes of Rationality and Cooperation" should be a sufficient start (and I believe there's a copy at the SIAI Summer Project house.)))

Comments

sorted by
magical algorithm
Highlighting new comments since Today at 10:35 PM
Select new highlight date
All comments loaded

There does not appear to be any such thing as a dominant majority vote.

Eliezer, are you aware that there's an academic field studying issues like this? It's called Social Choice Theory, and happens to be covered in chapter 4 of Hervé Moulin's Fair Division and Collective Welfare, which I recommended in my post about Cooperative Game Theory.

I know you're probably approaching this problem from a different angle, but it should still be helpful to read what other researchers have written about it.

A separate comment I want to make is that if you want others to help you solve problems in "timeless decision theory", you really need to publish the results you've got already. What you're doing now is like if Einstein had asked people to help him predict the temperature of black holes before having published the general theory of relativity.

As far as needing a long sequence, are you assuming that the reader has no background in decision theory? What if you just write to an audience of professional decision theorists, or someone who has at least read "The Foundations of Causal Decision Theory" or the equivalent?

Seconded. I, for one, would be perfectly OK with posts requiring a lot of unfamiliar background math as long as they're correct and give references. For example, Scott Aaronson isn't afraid of scary topics and I'm not afraid of using his posts as entry points into the maze.

There does not appear to be any such thing as a dominant majority vote.

I want to note that this is also know in Cooperative Game Theory as an "empty core". (In Social Choice Theory it's studied under "majority cycling".) See http://www.research.rmutp.ac.th/research/Game%20Theory%20and%20Economic%20Analysis.pdf#page=85 for a good explanation of how Cooperative Game Theory views the problem. Unfortunately it doesn't look like anyone has a really good solution.

Unfortunately this "timeless decision theory" would require a long sequence to write up, and it's not my current highest writing priority unless someone offers to let me do a PhD thesis on it.

  • But it is the writeup most-frequently requested of you, and also, I think, the thing you have done that you refer to the most often.

  • Nobody's going to offer. You have to ask them.

In case you're wondering, I'm writing this up because one of the SIAI Summer Project people asked if there was any Friendly AI problem that could be modularized and handed off and potentially written up afterward, and the answer to this is almost always "No"

Does it mean that the problem isn't reduced enough to reasonably modularize? It would be nice if you written up the outline of state of research at SIAI (even a brief one with unexplained labels) or an explanation of why you won't.

Hanson's example of ten people dividing the pie seems to hinge on arbitrarily passive actors who get to accept and make propositions instead of being able to solicit other deals or make counter proposals, and it is also contingent on infinite and costless bargaining time. The bargaining time bit may be a fair (if unrealistic) assumption, but the passivity does not make sense. It really depends on the kind of commitments and bargains players are able to make and enforce, and the degree/order of proposals from outgroup and ingroup members.

When the first two defectors say, "Hey, you each get an eighth if you join us," the four could pick another two of the in-crowd and say, "Hey, they offered us Y apiece, but we'll join you instead if you each give us X, Y<X (which is actually profitable to the other four so long as X < 1/4 - they get cut out entirely if they can't bargain)." No matter how it is divided, there will always be a subgroup in the in-crowd that could profitably bargain with the out-crowd, and there will always be a different subgroup in the in-crowd that will be able to make a better offer. So long as there is an out-crowd, there are people who can bargain profitably, and so longer as the in-crowd is > 6, people can be profitably removed.

If bargaining time is finite (or especially if it has non-zero cost), I suspect, but can't prove (for lack of effort/technical proficiency, not saying it's unprovable) that each actor will opt for the even 10-person split (especially if risk-averse) because it is (statistically) equivalent (or superior) to the sum*probability of other potential arrangements.

For example, there's a problem introduced to me by Gary Drescher's marvelous Good and Real (OOPS: The below formulation was independently invented by Vladimir Nesov

For a moment I was wondering how the Optimally Ordered Problem Solver was relevant.

Here's a comment that took me way too long to formulate:

On the Prisoner's Dilemma in particular, this infinite regress can be cut short by expecting that the other agent is doing symmetrical reasoning on a symmetrical problem and will come to a symmetrical conclusion...

Eliezer, if such reasoning from symmetry is allowed, then we sure don't need your "TDT" to solve the PD!

TDT allows you to use whatever you can prove mathematically. If you can prove that two computations have the same output because their global structures are isomorphic, it doesn't matter if the internal structure is twisty or involves regresses you haven't yet resolved. However, you need a license to use that sort of mathematical reasoning in the first place, which is provided by TDT but not CDT.

Strategies are probability (density) functions over choices. Behaviors are the choices themselves. Proving that two strategies are identical (by symmetry, say) doesn't license you to assume that the behaviors are the same. And it is behaviors you seem to need here. Two random variables over the same PDF are not equal.

Seldin got a Nobel for re-introducing time into game theory (with the concept of subgame perfect equilibrium as a refinement of Nash equilibrium). I think he deserved the prize. If you think that you can overturn Seldin's work with your TDT, then I say "To hell with a PhD. Write it up and go straight to Stockholm."

"I believe X to be like me" => "whatever I decide, X will decide also" seems tenuous without some proof of likeness that is beyond any guarantee possible in humans.

I can accept your analysis in the context of actors who have irrevocably committed to some mechanically predictable decision rule, which, along with perfect information on all the causal inputs to the rule, gives me perfect predictions of their behavior, but I'm not sure such an actor could ever trust its understanding of an actual human.

Maybe you could aspire to such determinism in a proven-correct software system running on proven-robust hardware.

"I believe X to be like me" => "whatever I decide, X will decide also" seems tenuous without some proof of likeness that is beyond any guarantee possible in humans...

Maybe you could aspire to such determinism in a proven-correct software system running on proven-robust hardware.

Well, yeah, this is primarily a theory for AIs dealing with other AIs.

You could possibly talk about human applications if you knew that the N of you had the same training as rationalists, or if you assigned probabilities to the others having such training.

As a first off-the-cuff thought, the infinite regress of conditionality sounds suspiciously close to general recursion. Do you have any guarantee that a fully general theory that gives a decision wouldn't be equivalent to a Halting Oracle?

ETA: If you don't have such a guarantee, I would submit that the first priority should be either securing one, or proving isomorphism to the Entscheidungsproblem and, thus, the impossibility of the fully general solution.

Is Parfit's Hitchhiker essentially the same as Kavka's toxin, or is there some substantial difference between the two I'm missing?

Is your majority vote problem related to Condorcet's paradox? It smells so, but I can't put a handle on why.

I cheated the PD infinite regress problem with a quine trick in Re-formalizing PD. The asymmetric case seems to be hard because fair division of utility is hard, not because quining is hard. Given a division procedure that everyone accepts as fair, the quine trick seems to solve the asymmetric case just as well.

Post your "timeless decision theory" already. If it's correct, it shouldn't be that complex. With your intelligence you can always write a PhD on some other AI topic should the opportunity arise. But after conversations with Vladimir Nesov I was kinda under the impression that you could solve the asymmetric PD-like cases too; if not, I'm a little disappointed in advance. :-(

Here's a crack at the coin problem.

Firstly TDT seems to answer correctly under one condition, if P(some agent will use my choice as evidence about how I am going to act in these situations and make this offer.) = 0. Then certainly, our AI shouldn't give omega any money. On the other hand, if P(some agent will use my choice as evidence about how I am going to act in these situations and make this offer.) = 0.5, then the expected utility =-100 + 0.5 ( 0.5 (1,000,000) + 0.5(-100)) So my general solution is this, add a node that represents the probability of repeating one of these trials, keep track of its value like any other node, carefully and gradually. Giving money would only be winning if you had the opportunity to make more money later because omega or someone else knows you give money, otherwise you shouldn't give money.

Another stumper was presented to me by Robin Hanson at an OBLW meetup. Suppose you have ten ideal game-theoretic selfish agents and a pie to be divided by majority vote. Let's say that six of them form a coalition and decide to vote to divide the pie among themselves, one-sixth each. But then two of them think, "Hey, this leaves four agents out in the cold. We'll get together with those four agents and offer them to divide half the pie among the four of them, leaving one quarter apiece for the two of us. We get a larger share than one-sixth that way, and they get a larger share than zero, so it's an improvement from the perspectives of all six of us - they should take the deal." And those six then form a new coalition and redivide the pie. Then another two of the agents think: "The two of us are getting one-eighth apiece, while four other agents are getting zero - we should form a coalition with them, and by majority vote, give each of us one-sixth."

How I would approach this problem:

Suppose that it is easier to adjust the proportions within your existing coalitions than to switch coalitions. An agent will not consider switching coalitions until it cannot improve its share in its present coalition. Therefore, any coalition will reach a stable configuration before you need consider agents switching to another coalition. If you can show that the only stable configuration is an equal division, then there will be no coalition-switching.

You can probably show that any agent receiving less than its share can receive a larger share by switching to a different coalition. Assume the other agents know this proof. You may then be able to show that they can hold onto a larger share by giving that agent its fair share than by letting it quit the coalition. You may need to use derivatives to do this. Or not.

Unfortunately this "timeless decision theory" would require a long sequence to write up, and it's not my current highest writing priority unless someone offers to let me do a PhD thesis on it.

Can someone tell me the matrix of pay-offs for taking on Eleizer as a PhD student?

Well, for starters you have a 1/3^^^3 chance of 3^^^3 utils...

If the ten pie-sharers is to be more than a theoretical puzzle, but something with applicability to real decision problems, then certain expansions of the problem suggest themselves. For example, some of the players might conspire to forcibly exclude the others entirely. And then a subset of the conspirators do the same.

This is the plot of "For a Few Dollars More".

How do criminals arrange these matters in real life?

I swear I'll give you a PhD if you write the thesis. On fancy paper and everything.

Would timeless decision theory handle negotiation with your future self? For example if a timeless decision agent likes paperclips today but you knows it is going to be modified to like apples tomorrow, (and not care a bit about paperclips,) will it abstain from destroying the apple orchard, and its future self abstain from destroying the paperclips in exchange?

And is negotiation the right way to think about reconciling the difference between what I now want and what a predicted smarter, grown up, more knowledgeable version of me would want? or am I going the wrong way?

to talk about turning a paperclip maximizer into an apple maximizer is needlessly confusing. Better to talk about destroying a paperclip maximizer and creating an apple maximizer. And yes, timeless decision theory should allow these two agents to negotiate, though it gets confusing fast.

But I don't have a general theory which replies "Yes" [to a counterfactual mugging].

You don't? I was sure you'd handled this case with Timeless Decision Theory.

I will try to write up a sketch of my idea, which involves using a Markov State Machine to represent world states that transition into one another. Then you distinguish evidence about the structure of the MSM, from evidence of your historical path through the MSM. And the best decision to make in a world state is defined as the decision which is part of a policy that maximizes expected utility for the whole MSM.

OK, I just tried for four hours but couldn't successfully describe a useful formalism that provides a good analysis of counterfactual mugging. Will keep trying later.

If I were forced to pay $100 upon losing, I'd have a net gain of $4950 each time I play the game, on average. Transitioning from this into the game as it currently stands, I've merely been given an additional option. As a rationalist, I should not regret being one. Even knowing I won't get the $10,000, as the coin came up heads, I'm basically paying $100 for the other quantum me to receive $10,000. As the other quantum me, who saw the coin come up tails, my desire to have had the first quantum me pay $100 outweighs the other quantum me's desire to not lose $100. I'm not going to defect against myself for want of $100, and if you (general you, not eliezer specifically) would you should probably carry a weapon and a means of defending yourself against it because you can't trust yourself.

I THINK I SOLVED ONE - EDIT - Sorry, not quite.

"Suppose Omega (the same superagent from Newcomb's Problem, who is known to be honest about how it poses these sorts of dilemmas) comes to you and says: "I just flipped a fair coin. I decided, before I flipped the coin, that if it came up heads, I would ask you for $1000. And if it came up tails, I would give you $1,000,000 if and only if I predicted that you would give me $1000 if the coin had come up heads. The coin came up heads - can I have $1000?" Obviously, the only reflectively consistent answer in this case is "Yes - here's the $1000", because if you're an agent who expects to encounter many problems like this in the future, you will self-modify to be the sort of agent who answers "Yes" to this sort of question - just like with Newcomb's Problem or Parfit's Hitchhiker."

To me, this seems like losing. I disagree that a reflectively consistent agent would give Omega money in these instances.

Since Omega always tells the truth, we should program ourselves to give him $1000 if and only if we don't know that the coin came up heads. If we know that the coin came up heads, it no longer makes sense to give him $1000 dollars.

This in no way prevents us from maximizing utility. An opponent of this strategy would contend that this prevents us from receiving the larger cash prize. That assertion would be false, because this strategy only occurs in situations where we have literally zero possibility of receiving any money from Omega. "Not giving Omega $1000" in instance H does not mean that we wouldn't give Omega $1000 in instances where we don't know whether or not H.

The fact that Omega has already told us it came up heads completely precludes any possibility of a reward. The fact that we choose to not give Omega money in situations where we know the coin comes up heads in no way precludes us from giving him $1000 in scenarios where we have a chance that we'll receive the larger cash prize. By not giving $1000 when H, there is still no precedent set which precludes the larger reward. Therefore we should give Omega $1000 if and only if we do not know that the coin landed on heads.

Yay. That seems too easy, I'm kind of worried I made a super obvious logical mistake. But I think it's right.

Sorry for not using the quote feature, but I'm awful at editing. I even tried using the sandbox and couldn't get it right.

EDIT: So, unfortunately, I don't think this solves the issue. It technically does, but it really amounts to more of a reason that Omega is stupid and should rephrase his statements. Instead of Omega saying "I would give you $1,000,000 if and only if I predicted that you would give me $1000 if the coin had come up heads" he should say "I would give you $1,000,000 if and only if I predicted that you would give me $1000 if the coin had come up heads and I told you that the coin came up heads".

So it's a minor improvement but it's nothing really important.

Just prefix the quote with a single greater-than sign.

Suppose Omega (the same superagent from Newcomb's Problem, who is known to be honest about how it poses these sorts of dilemmas) comes to you and says:

"I just flipped a fair coin. I decided, before I flipped the coin, that if it came up heads, I would ask you for $1000. And if it came up tails, I would give you $1,000,000 if and only if I predicted that you would give me $1000 if the coin had come up heads. The coin came up heads - can I have $1000?"

Obviously, the only reflectively consistent answer in this case is "Yes - here's the $1000", because if you're an agent who expects to encounter many problems like this in the future, you will self-modify to be the sort of agent who answers "Yes" to this sort of question - just like with Newcomb's Problem or Parfit's Hitchhiker.

Compute the probabilities P(0)..P(n) that this deal will be offered to you again n times in the future. Sum over 499500 P(n) (n) for all n and agree to pay if the sum is greater than 1,000.