David Chapman criticizes "pop Bayesianism" as just common-sense rationality dressed up as intimidating math[1]:

Bayesianism boils down to “don’t be so sure of your beliefs; be less sure when you see contradictory evidence.”

Now that is just common sense. Why does anyone need to be told this? And how does [Bayes'] formula help?

[...]

The leaders of the movement presumably do understand probability. But I’m wondering whether they simply use Bayes’ formula to intimidate lesser minds into accepting “don’t be so sure of your beliefs.” (In which case, Bayesianism is not about Bayes’ Rule, after all.)

I don’t think I’d approve of that. “Don’t be so sure” is a valuable lesson, but I’d rather teach it in a way people can understand, rather than by invoking a Holy Mystery.

What does Bayes's formula have to teach us about how to do epistemology, beyond obvious things like "never be absolutely certain; update your credences when you see new evidence"?

I list below some of the specific things that I learned from Bayesianism. Some of these are examples of mistakes I'd made that Bayesianism corrected. Others are things that I just hadn't thought about explicitly before encountering Bayesianism, but which now seem important to me.

I'm interested in hearing what other people here would put on their own lists of things Bayesianism taught them. (Different people would make different lists, depending on how they had already thought about epistemology when they first encountered "pop Bayesianism".)

I'm interested especially in those lessons that you think followed more-or-less directly from taking Bayesianism seriously as a normative epistemology (plus maybe the idea of making decisions based on expected utility). The LW memeplex contains many other valuable lessons (e.g., avoid the mind-projection fallacy, be mindful of inferential gaps, the MW interpretation of QM has a lot going for it, decision theory should take into account "logical causation", etc.). However, these seem further afield or more speculative than what I think of as "bare-bones Bayesianism".

So, without further ado, here are some things that Bayesianism taught me.

  1. Banish talk like "There is absolutely no evidence for that belief". P(E | H) > P(E) if and only if P(H | E) > P(H). The fact that there are myths about Zeus is evidence that Zeus exists. Zeus's existing would make it more likely for myths about him to arise, so the arising of myths about him must make it more likely that he exists. A related mistake I made was to be impressed by the cleverness of the aphorism "The plural of 'anecdote' is not 'data'." There may be a helpful distinction between scientific evidence and Bayesian evidence. But anecdotal evidence is evidence, and it ought to sway my beliefs.
  2. Banish talk like "I don't know anything about that". See the post "I don't know."
  3. Banish talk of "thresholds of belief". Probabilities go up or down, but there is no magic threshold beyond which they change qualitatively into "knowledge". I used to make the mistake of saying things like, "I'm not absolutely certain that atheism is true, but it is my working hypothesis. I'm confident enough to act as though it's true." I assign a certain probability to atheism, which is less than 1.0. I ought to act as though I am just that confident, and no more. I should never just assume that I am in the possible world that I think is most likely, even if I think that that possible world is overwhelmingly likely. (However, perhaps I could be so confident that my behavior would not be practically discernible from absolute confidence.)
  4. Absence of evidence is evidence of absence. P(H | E) > P(H) if and only if P(H | ~E) < P(H). Absence of evidence may be very weak evidence of absence, but it is evidence nonetheless. (However, you may not be entitled to a particular kind of evidence.)
  5. Many bits of "common sense" rationality can be precisely stated and easily proved within the austere framework of Bayesian probability.  As noted by Jaynes in Probability Theory: The Logic of Science, "[P]robability theory as extended logic reproduces many aspects of human mental activity, sometimes in surprising and even disturbing detail." While these things might be "common knowledge", the fact that they are readily deducible from a few simple premises is significant. Here are some examples:
    • It is possible for the opinions of different people to diverge after they rationally update on the same evidence. Jaynes discusses this phenomenon in Section 5.3 of PT:TLoS.
    • Popper's falsification criterion, and other Popperian principles of "good explanation", such as that good explanations should be "hard to vary", follow from Bayes's formula. Eliezer discusses this in An Intuitive Explanation of Bayes' Theorem and A Technical Explanation of Technical Explanation.
    • Occam's razor. This can be formalized using Solomonoff induction. (However, perhaps this shouldn't be on my list, because Solomonoff induction goes beyond just Bayes's formula. It also has several problems.)
  6. You cannot expect[2] that future evidence will sway you in a particular direction. "For every expectation of evidence, there is an equal and opposite expectation of counterevidence."
  7. Abandon all the meta-epistemological intuitions about the concept of knowledge on which Gettier-style paradoxes rely. Keep track of how confident your beliefs are when you update on the evidence. Keep track of the extent to which other people's beliefs are good evidence for what they believe. Don't worry about whether, in addition, these beliefs qualify as "knowledge".
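
Items 1, 4, and 6 can be checked numerically. What follows is only a minimal sketch, not part of the original argument: the helper name `posteriors` and all of the numbers for the Zeus example are made up for illustration. It uses exact fractions so that the equality in item 6 is not blurred by rounding.

```python
from fractions import Fraction


def posteriors(prior_h, p_e_given_h, p_e_given_not_h):
    """Return (P(E), P(H|E), P(H|~E)) from a prior and two likelihoods."""
    # Law of total probability.
    p_e = prior_h * p_e_given_h + (1 - prior_h) * p_e_given_not_h
    # Bayes's formula, applied once for E and once for ~E.
    p_h_given_e = prior_h * p_e_given_h / p_e
    p_h_given_not_e = prior_h * (1 - p_e_given_h) / (1 - p_e)
    return p_e, p_h_given_e, p_h_given_not_e


# Hypothetical numbers: H = "Zeus exists", E = "there are myths about Zeus".
prior_h = Fraction(1, 1_000_000)
p_e_given_h = Fraction(99, 100)      # myths are very likely if Zeus exists
p_e_given_not_h = Fraction(90, 100)  # myths are still likely if he does not

p_e, post_e, post_not_e = posteriors(prior_h, p_e_given_h, p_e_given_not_h)

# Item 1: P(E|H) > P(E) goes together with P(H|E) > P(H).
assert p_e_given_h > p_e and post_e > prior_h
# Item 4: if E raises P(H), then not seeing E must lower it.
assert post_not_e < prior_h
# Item 6: the probability-weighted average of the posteriors is the prior.
assert p_e * post_e + (1 - p_e) * post_not_e == prior_h

# Ratio of posterior to prior after observing E.
print(float(post_e / prior_h))
```

With these assumed numbers the myths multiply a tiny prior by a factor of roughly 1.1: real evidence, but nowhere near enough to make the hypothesis likely.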

What items would you put on your list?

ETA: ChrisHallquist's post Bayesianism for Humans lists other "directly applicable corollaries to Bayesianism".


[1]  See also Yvain's reaction to David Chapman's criticisms.

[2]  ETA: My wording here is potentially misleading.  See this comment thread.

Comments


A related mistake I made was to be impressed by the cleverness of the aphorism "The plural of 'anecdote' is not 'data'." There may be a helpful distinction between scientific evidence and Bayesian evidence. But anecdotal evidence is evidence, and it ought to sway my beliefs.

Anecdotal evidence is filtered evidence. People often cite the anecdote that supports their belief, while not remembering or not mentioning events that contradict them. You can find people saying anecdotes on any side of a debate, and I see no reason the people who are right would cite anecdotes more.

Of course, if you witness an anecdote with your own eyes, that is not filtered, and you should adjust your beliefs accordingly.

Of course, if you witness an anecdote with your own eyes, that is not filtered

Unless you too selectively (mis)remember things.

Unless you too selectively (mis)remember things.

Or selectively expose yourself to situations.

I think the value of anecdotes often doesn't lie so much in changing probabilities of belief but in illustrating what a belief actually is about.

That, and existence/possibility proofs, and, in the very early phases of investigation, providing a direction for inquiry.

Anecdotal evidence is filtered evidence.

Right, the existence of the anecdote is the evidence, not the occurrence of the events that it alleges.

You can find people saying anecdotes on any side of a debate, and I see no reason the people who are right would cite anecdotes more.

It is true that, if a hypothesis has reached the point of being seriously debated, then there are probably anecdotes being offered in support of it. (... assuming that we're talking about the kinds of hypotheses that would ever have an anecdote offered in support of them.) Therefore, learning of the existence of anecdotes probably won't move much probability around among the hypotheses being seriously debated.

However, hypothesis space is vast. Many hypotheses have never even been brought up for debate. The overwhelming majority should never come to our attention at all.

In particular, hypothesis space contains hypotheses for which no anecdote has ever been offered. If you learned that a particular hypothesis H were true, you would increase your probability that H was among those hypotheses that are supported by anecdotes. (Right? The alternative is that which hypotheses get anecdotes is determined by mechanisms that have absolutely no correlation, or even negative correlation, with the truth.) Therefore, the existence of an anecdote is evidence for the hypothesis that the anecdote alleges is true.

A typical situation is that there's a contentious issue, and some anecdotes reach your attention that support one of the competing hypotheses.

You have three ways to respond:

  1. You can under-update your belief in the hypothesis, ignoring the anecdotes completely
  2. You can update by precisely the measure warranted by the existence of these anecdotes and the fact that they reached you.
  3. You can over-update by adding too much credence to the hypothesis.

In almost every situation you're likely to encounter, the real danger is 3. Well-known biases are at work pulling you towards 3. These biases are often known to work even when you're aware of them and trying to counteract them. Moreover, the harm from reaching 3 is typically far greater than the harm from reaching 1. This is because the correct added amount of credence in 2 is very tiny, particularly because you're already likely to know that the competing hypotheses for this issue are all likely to have anecdotes going for them. In real-life situations, you don't usually hear anecdotes supporting an incredibly unlikely-seeming hypothesis which you'd otherwise be inclined to think as capable of nurturing no anecdotes at all. So forgoing that tiny amount of credence is not nearly as bad as choosing 3 and updating, typically, by a large amount.

The saying "The plural of anecdotes is not data" exists to steer you away from 3. It works to counteract the very strong biases pulling you towards 3. Its danger, you are saying, is that it pulls you towards 1 rather than the correct 2. That may be pedantically correct, but is a very poor reason to criticize the saying. Even with its help, you're almost always very likely to over-update - all it's doing is lessening the blow.

Perhaps this is an example of "things Bayesianism has taught you" that are harming your epistemic rationality?

A similar thing I noticed is disdain towards "correlation does not imply causation" from enlightened Bayesians. It is counter-productive.

These biases are often known to work even when you're aware of them and trying to counteract them.

This is the problem. I know, as an epistemic matter of fact, that anecdotes are evidence. I could try to ignore this knowledge, with the goal of counteracting the biases to which you refer. That is, I could try to suppress the Bayesian update or to undo it after it has happened. I could try to push my credence back to where it was "manually". However, as you point out, counteracting biases in this way doesn't work.

Far better, it seems to me, to habituate myself to the fact that updates can be minuscule. Credence is quantitative, not qualitative, and so can change by arbitrarily small amounts. "Update Yourself Incrementally". Granting that someone has evidence for their claims can be an arbitrarily small concession. Updating on the evidence doesn't need to move my credences by even a subjectively discernible amount. Nonetheless, I am obliged to acknowledge that the anecdote would move the credences of an ideal Bayesian agent by some nonzero amount.

A typical situation is that there's a contentious issue, and some anecdotes reach your attention that support one of the competing hypotheses.

It is interesting that you think of this as typical, or at least typical enough to be exclusionary of non-contentious issues. I avoid discussions about politics and possibly other contentious issues, and when I think of people providing anecdotes I usually think of them in support of neutral issues, like the efficacy of understudied nutritional supplements. If someone tells you, "I ate dinner at Joe's Crab Shack and I had intense gastrointestinal distress," I wouldn't think it's necessarily justified to ignore it on the basis that it's anecdotal. If you have 3 more friends who all report the same thing to you, you should rightly become very suspicious of the sanitation at Joe's Crab Shack. I think the fact that you are talking about contentious issues specifically is an important and interesting point of clarification.

Anecdotal evidence is filtered evidence.

Still evidence.

After accounting for the filtering, which way does it point? If you're left with a delta log-odds of zero, it's "evidence" only in the sense that if you have no apples you have "some" apples.
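
To make the "delta log-odds" framing concrete, here is a minimal sketch. The function name `delta_log_odds` and the probabilities are illustrative assumptions, not figures from this thread. The shift in log-odds from hearing an anecdote is the log of the likelihood ratio: how much more likely the anecdote was to reach you if the hypothesis is true than if it is false.

```python
import math


def delta_log_odds(p_report_given_h, p_report_given_not_h):
    """Change in the log-odds of H, in bits, from hearing the report."""
    return math.log2(p_report_given_h / p_report_given_not_h)


# Hypothetical case 1: filtering is so strong that the anecdote reaches you
# equally often whether or not H is true. The update is exactly zero.
fully_filtered = delta_log_odds(0.8, 0.8)

# Hypothetical case 2: the anecdote is slightly more likely to reach you
# when H is true. The update is small but positive.
slightly_informative = delta_log_odds(0.8, 0.75)

print(fully_filtered, slightly_informative)
```

On this reading, the disagreement above is about which case is typical, not about the arithmetic.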

You cannot expect that future evidence will sway you in a particular direction. "For every expectation of evidence, there is an equal and opposite expectation of counterevidence."

The (related) way I would expand this is "if you know what you will believe in the future, then you ought to believe that now."

Quoting myself from Yvain's blog:

Here’s a short and incomplete list of habits I would include in qualitative Bayes:

  1. Base rate attention.
  2. Consider alternative hypotheses.
  3. Compare hypotheses by likelihood ratios, not likelihoods.
  4. Search for experiments with high information content (measured by likelihood ratio) and low cost.
  5. Conservation of evidence.
  6. Competing values should have some tradeoff between them.

Each one of those is a full post to explain, I think. I also think they’re strongly reinforcing; 3 and 4 are listed as separate insights, here, but I don’t think one is very useful without the other.

Maybe this wasn't your intent, but framing this post as a rebuttal of Chapman doesn't seem right to me. His main point isn't "Bayesianism isn't useful"--more like "the Less Wrong memeplex has an unjustified fetish for Bayes' Rule" which still seems pretty true.

[1] See also Yvain's reaction to David Chapman's criticisms.

Chapman's follow-up.

I didn't get a lot out of Bayes at the first CFAR workshop, when the class involved mentally calculating odds ratios. It's hard for me to abstractly move numbers around in my head. But the second workshop I volunteered at used a Bayes-in-everyday-life method where you drew (or visualized) a square, and drew a vertical line to divide it according to the base rates of X versus not-X, and then drew a horizontal line to divide each of the slices according to how likely you were to see evidence H in the world where X was true, and the world where not-X was true. Then you could basically see whether the evidence had a big impact on your belief, just by looking at the relative size of the various rectangles. I have a strong ability to visualize, so this is helpful.

I visualize this square with some frequency when I notice an empirical claim about thing X presented with evidence H. Other than that, I query myself "what's the base rate of this?" a lot, or ask myself the question "is H actually more likely in the world where X is true versus false? Not really? Okay, it's not strong evidence."
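
The square can also be written down as four rectangle areas. The following is only a sketch of the method as described above, keeping this comment's labels (X for the claim, H for the evidence); the function names and the example numbers are made up for illustration.

```python
def square_areas(base_rate_x, p_h_given_x, p_h_given_not_x):
    """Areas of the four rectangles in a unit square.

    The vertical line splits the square by the base rate of X; each slice is
    then split horizontally by how likely evidence H is in that slice.
    """
    return {
        ("X", "H"): base_rate_x * p_h_given_x,
        ("X", "not H"): base_rate_x * (1 - p_h_given_x),
        ("not X", "H"): (1 - base_rate_x) * p_h_given_not_x,
        ("not X", "not H"): (1 - base_rate_x) * (1 - p_h_given_not_x),
    }


def posterior_x_given_h(areas):
    """Share of the 'H was seen' area that lies in the X slice."""
    seen = areas[("X", "H")] + areas[("not X", "H")]
    return areas[("X", "H")] / seen


# Hypothetical numbers: X has a 10% base rate; H shows up 80% of the time
# when X is true and 20% of the time when it is false.
areas = square_areas(0.1, 0.8, 0.2)
posterior = posterior_x_given_h(areas)
print(areas, posterior)
```

In this made-up case the evidence is four times as likely under X, yet the posterior only reaches about 31%, because the "not X" slice started out nine times as wide.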

"Absence of evidence isn't evidence of absence" is such a ubiquitous cached thought in rationalist communities (that I've been involved with) that its antithesis was probably the most important thing I learned from Bayesianism.

I find it interesting that Sir Arthur Conan Doyle, the author of the Sherlock Holmes stories, seems to have understood this concept. In his story "Silver Blaze" he has the following conversation between Holmes and a Scotland Yard detective:

Gregory (Scotland Yard detective): "Is there any other point to which you would wish to draw my attention?"

Holmes: "To the curious incident of the dog in the night-time."

Gregory: "The dog did nothing in the night-time."

Holmes: "That was the curious incident."

I am confused. I always thought that the "Bayes" in Bayesianism refers to the Bayesian Probability Model. Bayes' rule is a powerful theorem, but it is just one theorem, and is not what Bayesianism is all about. I understand that the video being criticized was specifically talking about Bayes' rule, but I do not think that is what Bayesianism is about at all. The Bayesian probability model basically says that probability is a degree of belief (as opposed to other models that only really work with possible worlds or repeatable experiments). I always thought the main thesis of Bayesianism was "The best language to talk about uncertainty is probability theory," which agrees perfectly with the interpretation that the name comes from the Bayesian probability model, and has nothing to do with Bayes' rule. Am I using the word differently than everyone else?

Banish talk of "thresholds of belief" ... However, perhaps I could be so confident that my behavior would not be practically discernible from absolute confidence.

While this is true mathematically, I'm not sure it's useful for people. Complex mental models have overhead, and if something is unlikely enough then you can do better to stop thinking about it. Maybe someone broke into my office and when I get there on Monday I won't be able to work. This is unlikely, but I could look up the robbery statistics for Cambridge and see that this does happen. Mathematically, I should be considering this in making plans for tomorrow, but practically it's a waste of time thinking about it.

(There's also the issue that we're not good at thinking about small probabilities. It's very hard to keep unlikely possibilities from taking on undue weight except by just not thinking about them.)

Maybe someone broke into my office and when I get there on Monday I won't be able to work. This is unlikely, but I could look up the robbery statistics for Cambridge and see that this does happen. Mathematically, I should be considering this in making plans for tomorrow, but practically it's a waste of time thinking about it.

I think about such things every time I lock a door. Or at least, I lock doors because I have thought about such things, even if they're not at the forefront of my mind when I do them. Do you not lock yours? Do you have an off-site backup for your data? Insurance against the place burning down?

Having taken such precautions as you think useful, thinking further about it is, to use Eliezer's useful concept, wasted motion. It is a thought that, predictably at the time you think it, will as events transpire turn out to not have contributed in any useful way. You will go to work anyway, and see then whether thieves have been in the night.

Tiny probabilities do not, in general, map to tiny changes in actions. Decisions are typically discontinuous functions of the probabilities.
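
A toy expected-utility rule shows the discontinuity. This is a sketch under stated assumptions: the function name, the loss of 2000, and the cost of 1 are all hypothetical, and the precaution is assumed to prevent the loss entirely.

```python
def should_take_precaution(p_bad_event, loss_if_unprotected, cost_of_precaution):
    """Act when the expected loss avoided exceeds the cost of acting.

    Assumes, for simplicity, that the precaution fully prevents the loss.
    """
    return p_bad_event * loss_if_unprotected > cost_of_precaution


# Hypothetical stakes: a break-in would cost 2000, locking up costs 1,
# so the decision flips at a probability of 1/2000 = 0.0005.
probabilities = (0.0001, 0.0004, 0.0006, 0.001)
decisions = [should_take_precaution(p, 2000, 1) for p in probabilities]
print(decisions)  # [False, False, True, True]
```

The probability moves smoothly from 0.0004 to 0.0006, but the action jumps from "don't bother" to "lock the door".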

You cannot expect that future evidence will sway you in a particular direction. "For every expectation of evidence, there is an equal and opposite expectation of counterevidence."

Well ... you can have an expected direction, just not if you account for magnitudes.

For example if I'm estimating the bias on a weighted die, and so far I've seen 2/10 rolls give 6's, if I roll again I expect most of the time to get a non-6 and revise down my estimate of the probability of a 6; however on the occasions when I do roll a 6 I will revise up my estimate by a larger amount.

Sometimes it's useful to have this distinction.
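
The die example can be worked through exactly. This sketch assumes a uniform Beta(1, 1) prior over the probability of a 6, which the comment above does not specify, and the function name `posterior_mean` is made up for illustration.

```python
from fractions import Fraction


def posterior_mean(sixes, rolls, prior_a=1, prior_b=1):
    """Posterior mean of P(six) under an assumed Beta(prior_a, prior_b) prior."""
    return Fraction(sixes + prior_a, rolls + prior_a + prior_b)


current = posterior_mean(2, 10)      # estimate after 2 sixes in 10 rolls: 1/4
after_six = posterior_mean(3, 11)    # estimate if the next roll is a 6: 4/13
after_other = posterior_mean(2, 11)  # estimate if it is not a 6: 3/13

# The predictive probability of a 6 on the next roll is the current estimate.
p_six = current

up = after_six - current      # +3/52, the rarer but larger move
down = after_other - current  # -1/52, the more common but smaller move
expected_change = p_six * up + (1 - p_six) * down

print(up, down, expected_change)  # 3/52 -1/52 0
```

Three times out of four the estimate moves down, but the upward move is three times as large, so the expected change is exactly zero.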

Well ... you can have an expected direction, just not if you account for magnitudes.

Yes, on reflection it was a poor choice of words. I was using "expect" in that sense according to which one expects a parameter to equal zero if the expected value of that parameter is zero. However, while "expected value" has a well-established technical meaning, "expect" alone may not. It is certainly reasonably natural to read what I wrote as meaning "my opinion is equally likely to be swayed in either direction," which, as you point out, is incorrect. I've added a footnote to clarify my meaning.

So to summarise in pop Bayesian terms, akin to "don’t be so sure of your beliefs; be less sure when you see contradictory evidence":

  1. There is always evidence; if it looks like the contrary, you are using too high a bar. (The plural of 'anecdote' is 'qualitative study'.)
  2. You can always give a guess; even if it later turns out incorrect, you have no way of knowing now.
  3. The only thing that matters is the prediction; hunches, gut feelings, hard numbers or academic knowledge, it all boils down to probabilities.
  4. Absence of evidence is evidence of absence, but you can't be picky with evidence.
  5. The maths don't lie; if it works, it is because somewhere there are numbers and rigour saying it should. (Notice the direction of the implication "it works" => "it has maths".)
  6. The more confident you are, the more surprised you can be; if you are unsure it means you expect anything.
  7. "Knowledge" is just a fancy-sounding word; ahead-of-time predictions or bust!

ETA:

  1. Choosing to believe is wishful thinking.

I'll add the Bayesian definition of evidence and an awareness of selection effects to the list.

Reading this clarified something for me. In particular, "Banish talk like 'There is absolutely no evidence for that belief'."

OK, I can see that mathematically there can be very small amounts of evidence for some propositions (e.g. the existence of the deity Thor.) However in practice there is a limit to how small evidence can be for me to make any practical use of it. If we assign certainties to our beliefs on a scale of 0 to 100, then what can I realistically do with a bit of evidence that moves me from 87 to 87.01? or 86.99? I don't think I can estimate my certainty accurately to 1 decimal place--in fact I'm not sure I can get it to within one significant digit on many issues--and yet there's a lot of evidence in the world that should move my beliefs by a lot less than that.

Mathematically it makes sense to update on all evidence. Practically, there is a fuzzy threshold beyond which I need to just ignore very weak evidence, unless there's so much of it that the sum total crosses the bounds of significance.

I suppose we all came across Bayesianism from different points of view - my list is quite a bit different.

For me the biggest one is that the degree to which I should believe in something is basically determined entirely by the evidence, and IS NOT A MATTER OF CHOICE or personal belief. If I believe something with degree of probability X, and see Y happen that is evidence for it, then the degree of probability Z with which I should then believe it is a mathematical matter, and not a "matter of opinion."

The prior seems to be a get-out clause here, but since all updates are in principle layered on top of the first prior I had before receiving any evidence of any kind, it surely seems a mistake to give it too much weight.

My own personal view is also that often it's not optimal to update optimally. Why? Lack of computing power between the ears. Rather than straining the grey matter to get the most out of the evidence you have, it's often best to just go out and get more evidence to compensate. Quantity of evidence beats out all sorts of problems with priors or analysis errors, and makes it more difficult to reach the wrong conclusions.

On a non-Bayesian note, I have a rule to be careful of cases which consist of lots of small bits of evidence combined together. This looks fine mathematically until someone points out the lots of little bits of evidence pointing to something else which I just ignored or didn't even see. Selection effects apply more strongly to cases which consist of lots of little parts.

Of course, if you have the chance to actually do Bayesian mathematics rather than working informally with the brain, you can update exactly as you should, and use lots of little bits of evidence to form a case. But without a formal framework you can expect your innate wetware to mess up this type of analysis.

We should unpack "banish talk of X" to mean that we should avoid assessments/analysis that would naturally be expressed in such surface terms.

Since most of us don't do deep thinking unless we use some notation or words, "banish talk of" is a good heuristic for such training, if you can notice yourself (or others can catch you) doing it.