Evidential Decision Theory, Selection Bias, and Reference Classes

See also: Does Evidential Decision Theory really fail Solomon's Problem?, What's Wrong with Evidential Decision Theory?

It seems to me that the examples usually given of decision problems where EDT makes the wrong decisions are really examples of performing Bayesian updates incorrectly. The basic problem seems to be that naive EDT ignores a selection bias when it assumes that an agent that has just performed an action should be treated as a random sample from the population of all agents who have performed that action. Said another way, naive EDT agents make some unjustified assumptions about what reference classes they should put themselves into when considering counterfactuals. A more sophisticated Bayesian agent should make neither of these mistakes, and correcting them should not in principle require moving beyond EDT but just becoming less naive in applying it. 

Elaboration

Recall that an EDT agent attempts to maximize conditional expected utility. The main criticism of EDT is that naively computing conditional probabilities leads to the conclusion that you should perform actions which are good news upon learning that they happened, as opposed to actions which cause good outcomes (what CDT attempts to do instead). For a concrete example of the difference, let's take the smoking lesion problem:

Smoking is strongly correlated with lung cancer, but in the world of the Smoker's Lesion this correlation is understood to be the result of a common cause: a genetic lesion that tends to cause both smoking and cancer. Once we fix the presence or absence of the lesion, there is no additional correlation between smoking and cancer.

Suppose you prefer smoking without cancer to not smoking without cancer, and prefer smoking with cancer to not smoking with cancer. Should you smoke?

In the smoking lesion problem, smoking is bad news, but it doesn't cause a bad outcome: learning that someone smokes, in the absence of further information, increases your posterior probability that they have the lesion and therefore cancer, but choosing to smoke cannot in fact alter whether you have the lesion / cancer or not. Naive EDT recommends not smoking, but naive CDT recommends smoking, and in this case it seems that naive CDT's recommendation is correct and naive EDT's recommendation is not. 
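To make the disagreement concrete, here is a minimal numerical sketch, with all probabilities and utilities invented for illustration: naive EDT conditions on the action as if it were an observation about the lesion, while CDT holds the lesion probability fixed at its prior.

```python
# Invented parameters for the smoking lesion problem.
P_LESION = 0.1
P_SMOKE_GIVEN_LESION = 0.9      # the lesion tends to cause smoking
P_SMOKE_GIVEN_NO_LESION = 0.2
P_CANCER_GIVEN_LESION = 0.8     # cancer depends only on the lesion
P_CANCER_GIVEN_NO_LESION = 0.05

# Utilities: smoking is worth +10 regardless of cancer status.
U = {("smoke", "cancer"): 10, ("smoke", "healthy"): 110,
     ("abstain", "cancer"): 0, ("abstain", "healthy"): 100}

def p_lesion_given(action):
    """Naive EDT update: treat my own action as evidence about the lesion."""
    p_act_l = P_SMOKE_GIVEN_LESION if action == "smoke" else 1 - P_SMOKE_GIVEN_LESION
    p_act_nl = P_SMOKE_GIVEN_NO_LESION if action == "smoke" else 1 - P_SMOKE_GIVEN_NO_LESION
    joint_l = P_LESION * p_act_l
    return joint_l / (joint_l + (1 - P_LESION) * p_act_nl)

def expected_utility(action, p_lesion):
    p_cancer = (p_lesion * P_CANCER_GIVEN_LESION
                + (1 - p_lesion) * P_CANCER_GIVEN_NO_LESION)
    return p_cancer * U[(action, "cancer")] + (1 - p_cancer) * U[(action, "healthy")]

# Naive EDT: condition on the action as if it were an observation.
edt = {a: expected_utility(a, p_lesion_given(a)) for a in ("smoke", "abstain")}
# CDT: intervening on the action leaves P(lesion) at its prior.
cdt = {a: expected_utility(a, P_LESION) for a in ("smoke", "abstain")}

print(edt)  # abstaining wins: smoking is "bad news" about the lesion
print(cdt)  # smoking wins: +10 at an unchanged cancer risk
```

With these numbers, naive EDT computes P(lesion | smoke) = 1/3 versus P(lesion | abstain) ≈ 0.014 and so refuses to smoke, while CDT uses the prior P(lesion) = 0.1 for both actions and takes the free +10.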

The naive EDT agent's reasoning process involves considering the following counterfactual: "if I observe myself smoking, that increases my posterior probability that I have the lesion and therefore cancer, and that would be bad. Therefore I will not smoke." But it seems to me that in this counterfactual, the naive EDT agent -- who smokes and then glumly concludes that there is an increased probability that they have cancer -- is performing a Bayesian update incorrectly, and that the incorrectness of this Bayesian update, rather than any fundamental problem with making decisions based on conditional probabilities, is what causes the naive EDT agent to perform poorly. 

Here are some other examples of this kind of Bayesian update, all of which seem obviously incorrect to me. They lead to silly decisions because they are silly updates. 

  • "If I observe myself throwing away expensive things, that increases my posterior probability that I am rich and can afford to throw away expensive things, and that would be good. Therefore I will throw away expensive things." (This example requires that you have some uncertainty about your finances -- perhaps you never check your bank statement and never ask your boss what your salary is.)
  • "If I observe myself not showering, that increases my posterior probability that I am clean and do not need to shower, and that would be good. Therefore I will not shower." (This example requires that you have some uncertainty about how clean you are -- perhaps you don't have a sense of smell or a mirror.)
  • "If I observe myself playing video games, that increases my posterior probability that I don't have any work to do, and that would be good. Therefore I will play video games." (This example requires that you have some uncertainty about how much work you have to do -- perhaps you write this information down and then forget it.) 

Selection Bias

Earlier I said that in the absence of further information, learning that someone smokes increases your posterior probability that they have the lesion and therefore cancer in the smoking lesion problem. But when a naive EDT agent is deciding what to do, they have further information: in the counterfactual where they're smoking, they know that they're smoking because they're in a counterfactual about what would happen if they smoked (or something like that). This information should screen off inferences about other possible causes of smoking, which is perhaps clearer in the bulleted examples above. If you consider what would happen if you threw away expensive things, you know that you're doing so because you're considering what would happen if you threw away expensive things and not because you're rich. 

Failure to take this information into account is a kind of selection bias: a naive EDT agent considering the counterfactual in which it performs some action treats itself as a random sample from the population of similar agents who have performed such actions, but it is not in fact such a random sample! The sampling procedure, which consists of actually performing an action, is undoubtedly biased. 

Reference Classes

Another way to think about the above situation is that a naive EDT agent chooses inappropriate reference classes: when an agent performs an action, the appropriate reference class is not all other agents who have performed that action. It's unclear to me exactly what it is, but at the very least it's something like "other sufficiently similar agents who have performed that action under sufficiently similar circumstances." 

This is actually very easy to see in the smoker's lesion problem because of the following observation (which I think I found in Eliezer's old TDT writeup): suppose the world of the smoker's lesion is populated entirely by naive EDT agents who do not know whether or not they have the lesion. Then the above argument suggests that none of them will choose to smoke. But if that's the case, then where does the correlation between the lesion and smoking come from? Any agents who smoke are either not naive EDT agents or know whether they have the lesion. In either case, that makes them inappropriate members of the reference class any reasonable Bayesian agent should be using.

Furthermore, if the naive EDT agents collectively decide to become slightly less naive and restrict their reference class to each other, they now find that smoking no longer gives any information about whether they have the lesion or not! This is a kind of reflective inconsistency: the naive recommendation not to smoke in the smoker's lesion problem has the property that, if adopted by a population of naive EDT agents, it breaks the correlations upon which the recommendation is based. 
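The broken correlation can be checked by simulation. In the sketch below (all numbers invented; the lesion is assumed to work through an "urge" to smoke), one population smokes exactly when it feels the urge, reproducing the problem's correlation, while the other smokes by an urge-independent coin flip, standing in for a reference class of agents who all apply the same policy regardless of the urge. In the latter class, smoking carries no information about the lesion.

```python
import random
random.seed(0)

def sample_population(n, policy):
    """policy(urge) -> smokes?  The lesion causes an urge; cancer would
    depend only on the lesion, never on the act of smoking itself."""
    people = []
    for _ in range(n):
        lesion = random.random() < 0.1
        urge = random.random() < (0.9 if lesion else 0.2)
        people.append((lesion, policy(urge)))
    return people

def p_lesion_given_smoking(people):
    smokers = [lesion for lesion, smokes in people if smokes]
    return sum(smokers) / len(smokers)

# Urge-followers reproduce the problem's lesion-smoking correlation...
followers = sample_population(100_000, policy=lambda urge: urge)
# ...but agents whose policy ignores the urge destroy it.
flippers = sample_population(100_000, policy=lambda urge: random.random() < 0.5)

print(p_lesion_given_smoking(followers))  # well above the 10% base rate
print(p_lesion_given_smoking(flippers))   # close to the 10% base rate
```

Among the urge-followers, P(lesion | smoking) comes out near 1/3; among the coin-flippers it stays near the 10% base rate, which is the reflective-inconsistency point: a policy adopted by the whole reference class erases the very correlation the naive recommendation relied on.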

The Tickle Defense

As it happens, there is a standard counterargument in the decision theory literature to the claim that EDT recommends not smoking in the smoking lesion problem. It is known as the "tickle defense," and runs as follows: in the smoking lesion problem, what an EDT agent should be updating on is not the action of smoking but an internal desire, or "tickle," prompting it to smoke, and once the presence or absence of such a tickle has been updated on, it screens off any information gained by updating on the act of smoking or not smoking. So EDT + Tickles smokes on the smoking lesion problem. (Note that this prescription also has the effect of breaking the correlation claimed in the setup of the smoking lesion problem among a population of EDT + Tickles agents who don't know whether they have the lesion or not. So maybe there's just something wrong with the smoking lesion problem.) 

The tickle defense is good in that it encourages ignoring less information than naive EDT, but it strikes me as a patch covering up part of a more general problem, namely the problem of how to choose appropriate reference classes when performing Bayesian updates (or something like that). So I don't find it a satisfactory rescuing of EDT. It doesn't help that there's a more sophisticated version known as the "meta-tickle defense" that recommends two-boxing on Newcomb's problem.

Sophisticated EDT?

What does a more sophisticated version of EDT, taking the above observations into account, look like? I don't know. I suspect that it looks like some version of TDT / UDT, where TDT corresponds to something like trying to update on "being the kind of agent who outputs this action in this situation" and UDT corresponds to something more mysterious that I haven't been able to find a good explanation of yet, but I haven't thought about this much. If someone else has, let me know.

Here are some vague thoughts. First, I think this comment by Stuart_Armstrong is right on the money:

I've found that, in practice, most versions of EDT are underspecified, and people use their intuitions to fill the gaps in one direction or the other.

A "true" EDT agent needs to update on all the evidence they've ever observed, and it's very unclear to me how to do this in practice. So it seems that it's difficult to claim with much certainty that EDT will or will not do a particular thing in a particular situation.

CDT-via-causal-networks and TDT-via-causal-networks seem like reasonable candidates for more sophisticated versions of EDT in that they formalize the intuition above about screening off possible causes of a particular action. TDT seems like it better captures this intuition in that it better attempts to update on the cause of an action in a hypothetical about that action (the cause being that TDT outputs that action). My intuition here is that it should be possible to see causal networks as arising naturally out of Bayesian considerations, although I haven't thought about this much either. 

AIXI might be another candidate. Unfortunately, AIXI can't handle the smoking lesion problem because it models itself as separate from the environment, whereas a key point in the smoking lesion problem is that an agent in the world of the smoking lesion has some uncertainty about its innards, regarded as part of its environment. Fully specifying sophisticated EDT might involve finding a version of AIXI that models itself as part of its environment. 

Comments


Look, HIV patients who get HAART die more often (because people who get HAART are already very sick). We don't get to see the health status confounder because we don't get to observe everything we want. Given this, is HAART in fact killing people, or not?

EDT does the wrong thing here. Any attempt to not handle the confounder properly does the wrong thing here. If something does handle the confounder properly, it's not EDT anymore (because it's not going to look at E[death|HAART]). If you are willing to call such a thing "EDT", then EDT can mean whatever you want it to mean.


Here's the specific example to work out using whatever version of EDT you want:

People get HAART over time (let's restrict to 2 time slices for simplicity). The first time HAART is given (A0) it is randomized. The second time HAART is given (A1), it is given by a doctor according to some (known) policy based on vitals after A0 was given and some time passed (L0). Then we see if the patient dies or not (Y). The graph is this:

A0 -> L0 -> A1 -> Y, with A0 -> A1 and A0 -> Y. There is also health status confounding between L0 and Y (a common cause we don't get to see). Based on this data, how do we determine whether giving people HAART at A0 and A1 is a good idea?
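One standard answer to the commenter's challenge, not given in the comment itself, is Robins's g-computation formula: E[Y | do(a0, a1)] = sum over l0 of P(l0 | a0) * E[Y | a0, l0, a1], which is identified here because A1 depends only on the observed history (A0, L0). The sketch below simulates an invented data-generating process matching the graph (every parameter is made up) and compares the naive conditional estimate, which makes treatment look harmful, with the g-formula estimate, which recovers the true benefit.

```python
import random
random.seed(1)

def simulate(n):
    """Invented process for the graph A0 -> L0 -> A1 -> Y, with A0 -> A1,
    A0 -> Y, and a hidden health confounder U -> L0, U -> Y."""
    data = []
    for _ in range(n):
        u = random.random() < 0.5                    # hidden: 1 = frail
        a0 = random.random() < 0.5                   # randomized first dose
        l0 = random.random() < (0.7 if u else 0.2) - (0.1 if a0 else 0.0)
        a1 = random.random() < (0.85 if l0 else 0.1) + (0.05 if a0 else 0.0)
        p_death = 0.4 * u + 0.2 * l0 - 0.1 * a0 - 0.1 * a1 + 0.2
        data.append((a0, l0, a1, random.random() < p_death))
    return data

def mean_y(rows):
    return sum(y for _, _, _, y in rows) / len(rows)

def g_formula(data, a0, a1):
    """E[Y | do(A0=a0, A1=a1)] = sum_l0 P(l0 | a0) * E[Y | a0, l0, a1]."""
    a0_rows = [r for r in data if r[0] == a0]
    total = 0.0
    for l0 in (False, True):
        stratum = [r for r in a0_rows if r[1] == l0]
        cell = [r for r in stratum if r[2] == a1]
        total += (len(stratum) / len(a0_rows)) * mean_y(cell)
    return total

data = simulate(200_000)

naive_treated = mean_y([r for r in data if r[0] and r[2]])
naive_untreated = mean_y([r for r in data if not r[0] and not r[2]])

print(naive_treated, naive_untreated)  # treated patients die more often...
print(g_formula(data, True, True), g_formula(data, False, False))  # ...yet treatment helps
```

Here the naive comparison E[death | treated] vs. E[death | untreated] reproduces the comment's trap (sicker patients get treated), while the g-formula, which explicitly uses the causal structure, reverses the verdict.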


It's true that you can formalize, say fluid dynamics in set theory if you wanted. Does this then mean fluid dynamics is set theoretic? One needs to pick the right level of abstraction.


I think discussions of AIXI, source-code aware agents, etc. in the context of decision theories are a bit sterile because they are very far from actual problems people want to solve (e.g. is this actual non-hypothetical drug killing actual non-hypothetical people?)

EDT does the wrong thing here. Any attempt to not handle the confounder properly does the wrong thing here. If something does handle the confounder properly, it's not EDT anymore (because it's not going to look at E[death|HAART])

According to the wikipedia page, EDT uses conditional probabilities. I.e.

V(HAART) = P(death|HAART)U(death) + P(!death|HAART)U(!death).

The problem is not with this EDT formula in general, but with how these probabilities are defined and estimated. In reality, they are based on a sample, and we are making a decision for a particular patient, i.e.

V(HAART-patient1) = P(death-patient1|HAART-patient1)U(death-patient1) + P(!death-patient1|HAART-patient1)U(!death-patient1).

We don't know any of these probabilities exactly, since you will not find out whether the patient dies until after you give or not give him the treatment. So instead, you estimate the probabilities based on other patients. A completely brain-dead model would use the reference class of all people, and conclude that HAART kills. But a more sophisticated model would include something like P(patient1 is similar to patient2) to define a better reference class, and it would also take into account confounders.
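The similarity-weighted reference class suggested here can be sketched as follows, with a toy kernel and toy records that are purely illustrative (no real HAART data):

```python
def similarity(x, y):
    """Toy kernel: fraction of matching binary covariates."""
    return sum(a == b for a, b in zip(x, y)) / len(x)

def weighted_death_rate(patient, records):
    """Estimate P(death | HAART) for `patient` from treated patients only,
    weighting each by similarity to the target patient."""
    treated = [(similarity(patient, cov), died)
               for cov, got_haart, died in records if got_haart]
    total = sum(w for w, _ in treated)
    return sum(w * died for w, died in treated) / total

records = [
    # (covariates = (very_sick, old), got_haart, died)
    ((1, 1), True, True), ((1, 0), True, True), ((1, 1), True, False),
    ((0, 0), True, False), ((0, 1), True, False), ((0, 0), False, False),
    ((1, 0), False, True),
]

treated_deaths = [died for cov, got, died in records if got]
print(sum(treated_deaths) / len(treated_deaths))  # unweighted reference class
print(weighted_death_rate((0, 0), records))       # weighted toward a healthy target
```

For a not-very-sick target the weighted estimate is lower than the unweighted one, since the unweighted reference class of all treated patients is dominated by the very sick; this is the comment's "brain-dead model" versus "more sophisticated model" contrast in miniature, though a serious version would also need to handle confounders rather than just covariate similarity.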

I keep hoping my "toxoplasmosis problem" alternative to the Smoking Lesion will take off!

The toxoplasmosis problem is a scenario that demonstrates a failing of EDT and a success of CDT. Toxoplasma gondii is a single-celled parasite carried by a significant fraction of humanity. It affects mammals in general and is primarily hosted by cats. Infection can have a wide range of negative effects (though most show no symptoms). It has also been observed that infected rats will be less afraid of cats, and even attracted to cat urine. Correlations have been shown between psychiatric disorders and toxoplasmosis, and it has been speculated (but not tested) that the disease may cause people to be more risk-taking and more attracted to cats. Neurological mechanisms have been proposed (Flegr 2007).

http://intelligence.org/2013/04/19/altairs-timeless-decision-theory-paper-published/

Other alternatives to the Smoking Lesion Problem:

Eliezer has one with chewing gum and throat abscesses (PDF). "I have avoided [the Smoking Lesion] variant because in real life, smoking does cause lung cancer."

(According to that same document this class of problem is known as Solomon's Problem.)

orthonormal proposes the Aspirin Paradox.

The toxoplasmosis version has the drawback that in the real world there is presumably also a causal link from adoring cats to getting infected, which has to be disregarded for The Toxoplasmosis Problem, just as the real causal effect of smoking on cancer must be disregarded in The Smoking Lesion.

I suspect that it looks like some version of TDT / UDT, where TDT corresponds to something like trying to update on "being the kind of agent who outputs this action in this situation" and UDT corresponds to something more mysterious that I haven't been able to find a good explanation of yet, but I haven't thought about this much.

I can try to explain UDT a bit more if you say what you find mysterious about it. Or if you just want to think about it some more, keep in mind that UDT was designed to solve a bunch of problems at the same time, so if you see some feature of it that seems unmotivated, it might be trying to solve a problem that you haven't focused on yet.

Another thing to keep in mind is that UDT is currently formulated mainly for AI rather than human use (whereas you seem to be thinking mostly in human terms). For example it assumes that the agent has full "bit-level" access to its own source code, memories and sensory data, which allows UDT to conceptualize a decision (the thing you're deriving consequences from, or conditioning upon) as a logical fact about the input/output map implemented by a certain piece of code. It avoids human concepts like "being the kind of", "agent", or "situation", which might be hard to fully specify and unambiguously translate to code. The downside is that it's hard for humans (who do not have full introspective access to their own minds and do think in terms of high level concepts) to apply UDT.

Even more than an explanation, I would appreciate an explanation on the LessWrong Wiki because there currently isn't one! I've just reread through the LW posts I could find about UDT and I guess I should let them stew for a while. I might also ask people at the current MIRI workshop for their thoughts in person.

Another thing to keep in mind is that UDT is currently formulated mainly for AI rather than human use (whereas you seem to be thinking mostly in human terms).

Only as an intuition pump; when it's time to get down to brass tacks I'm much happier to talk about a well-specified program than a poorly-specified human.

I wrote a brief mathematical write-up of "bare bones" UDT1 and UDT1.1. The write-up describes the version that Wei Dai gave in his original posts. The write-up doesn't get into more advanced versions that invoke proof-length limits, try to "play chicken with the universe", or otherwise develop how the "mathematical intuition module" is supposed to work.

Without trying to make too much of the analogy, I think that I would describe TDT as "non-naive" CDT, and UDT as "non-naive" EDT.

UDT corresponds to something more mysterious

Don't update at all, but instead optimize yourself, viewed as a function from observations to actions, over all possible worlds.

There are tons of details, but it doesn't seem impossible to summarize in a sentence.

My intuition here is that it should be possible to see causal networks as arising naturally out of Bayesian considerations

You disagree, then, with Pearl's dictum that causality is a primitive concept, not reducible to any statistical construction?

The Smoker's Lesion problem is completely dissolved by using the causal information about the lesion. Without that information it cannot be. The correlations among Smoking, Lesion, and Cancer, on their own, allow of the alternative causal possibilities that Smoking causes Lesion, which causes Cancer, or that Cancer causes Lesion, which causes Smoking (even in the presence of the usual causal assumptions of DAG, Markov, and Faithfulness). These three causal graphs cannot be distinguished by the observational statistics. The causal information given in the problem is an essential part of its statement, and no decision theory which ignores causation can solve it.

EDT recommends the action "which, conditional on your having chosen it, gives you the best expectations for the outcome." That formulation glosses over whether that conditional expectation is based on the statistical correlations observed in the population (i.e. ignoring causation), or the correlations resulting from considering the actions as interventions in a causal network. It is generally understood as the former; attempts to fix it consist of changing it to use the latter. The only differences among these various attempts is how willing their proposers are to simply say "do causal reasoning".

When you talk about selection bias, you talk about counterfactuals (do-actions, in Pearl's notation, a causal concept). The Tickle defence introduces a causal hypothesis (the tickle prompting, i.e. causing smoking). I don't follow the reference class part, but it doesn't seem to cover the situation of an EDT reasoner advising someone else who professes an inclination to smoke. That is just as much a problem for EDT as the original version. It is also a problem that AIXI can be set to solving. What might its answer be?

How useful is it to clarify EDT until it becomes some decision theory with a different, previously determined name?

It would be useful for my mental organization of how decision theory works. I don't know if it would be useful to anyone else though.

I don't much care what we call the thing, but exploring the logical relations between conventional EDT and other anti-CDT options could be extremely useful for persuading EDTists to adopt TDT, UDT, or some other novel theory. Framing matters even for academics.

Approximately this point appears to have been made in the decision theory literature already, in Against causal decision theory by Huw Price.