Confidence levels inside and outside an argument

Related to: Infinite Certainty

Suppose the people at FiveThirtyEight have created a model to predict the results of an important election. After crunching poll data, area demographics, and all the usual things one crunches in such a situation, their model returns a greater than 999,999,999 in a billion chance that the incumbent wins the election. Suppose further that the results of this model are your only data and you know nothing else about the election. What is your confidence level that the incumbent wins the election?

Mine would be significantly less than 999,999,999 in a billion.

When an argument gives a probability of 999,999,999 in a billion for an event, then probably the majority of the probability of the event is no longer in "But that still leaves a one in a billion chance, right?". The majority of the probability is in "That argument is flawed". Even if you have no particular reason to believe the argument is flawed, the background chance of an argument being flawed is still greater than one in a billion.


More than one in a billion times a political scientist writes a model, ey will get completely confused and write something with no relation to reality. More than one in a billion times a programmer writes a program to crunch political statistics, there will be a bug that completely invalidates the results. More than one in a billion times a staffer at a website publishes the results of a political calculation online, ey will accidentally switch which candidate goes with which chance of winning.

So one must distinguish between levels of confidence internal and external to a specific model or argument. Here the model's internal level of confidence is 999,999,999/billion. But my external level of confidence should be lower, even if the model is my only evidence, by an amount that depends on how much I distrust the model.
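A minimal Python sketch of the internal/external distinction, under one simplifying assumption of my own (not stated in the post): if the model is flawed, it carries no information, so in that branch we fall back to the 50% prior. The trust values are purely illustrative. Even at 99.9999% trust in the model, nearly all of the remaining probability that the incumbent loses comes from the "model is flawed" branch rather than from the model's own one-in-a-billion.

```python
from fractions import Fraction

PRIOR = Fraction(1, 2)  # 50-50 prior for a two-candidate race


def external_confidence(internal, trust, prior=PRIOR):
    """Probability of the event once we allow for the model being flawed.

    internal: the probability the model itself reports.
    trust:    our probability that the model is sound (an illustrative input).
    Assumption: a flawed model carries no information, so in that branch
    we fall back to the prior.
    """
    return trust * internal + (1 - trust) * prior


def share_of_doubt_from_flaw(internal, trust, prior=PRIOR):
    """Fraction of P(incumbent loses) that comes from the 'model is flawed' branch."""
    p_lose = 1 - external_confidence(internal, trust, prior)
    return (1 - trust) * (1 - prior) / p_lose


internal = Fraction(999_999_999, 10**9)
for trust in (Fraction(99, 100), Fraction(999_999, 10**6)):
    p = external_confidence(internal, trust)
    print(f"trust={float(trust)}  P(loses)={float(1 - p):.3g}  "
          f"from flaw={float(share_of_doubt_from_flaw(internal, trust)):.4f}")
```

Exact fractions are used so that the one-in-a-billion term is not lost to floating-point rounding.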



Is That Really True?

One might be tempted to respond "But there's an equal chance that the false model is too high, versus that it is too low." Maybe there was a bug in the computer program, but perhaps the bug prevented it from reporting the incumbent's real chance of 999,999,999,999 out of a trillion.

The prior probability of a candidate winning a two-candidate election is 50%.[1] We need information to push us away from this probability in either direction. To push significantly away from this probability, we need strong information. Any weakness in the information weakens its ability to push away from the prior. If there's a flaw in FiveThirtyEight's model, that takes us away from their probability of 999,999,999 in a billion, and back closer to the prior probability of 50%.

We can confirm this with a quick sanity check. Suppose we know nothing about the election (i.e., we still think it's 50-50) until an insane person reports a hallucination that an angel has declared the incumbent to have a 999,999,999/billion chance. We would not be tempted to accept this figure on the grounds that it is equally likely to be too high as too low.

A second objection covers situations such as a lottery. I would like to say the chance that Bob wins a lottery with one billion players is 1/1 billion. Do I have to adjust this upward to cover the possibility that my model for how lotteries work is somehow flawed? No. Even if I am misunderstanding the lottery, I have not departed from my prior. Here, new information really does have an equal chance of going against Bob as of going in his favor. For example, the lottery may be fixed (meaning my original model of how to determine lottery winners is fatally flawed), but there is no greater reason to believe it is fixed in favor of Bob than anyone else.[2]
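The lottery case can be checked the same way. In this Python sketch, my own modeling assumption is that "the lottery is fixed" means some single beneficiary is chosen, and that with no information about who, that beneficiary is equally likely to be any of the players. Under that assumption, doubt about the model leaves Bob's chance at exactly 1/1 billion; it only moves if we have a specific reason to think the fix favors Bob.

```python
from fractions import Fraction


def p_bob_wins(n_players, p_fixed, p_fix_favors_bob=None):
    """P(Bob wins), mixing the 'fair lottery' model with a 'lottery is fixed' branch.

    p_fixed: probability that my model of the lottery is wrong because it is rigged.
    p_fix_favors_bob: P(Bob is the beneficiary | rigged). If we have no reason
    to single Bob out, it defaults to 1 / n_players (the beneficiary is
    equally likely to be anyone).
    """
    fair = Fraction(1, n_players)
    if p_fix_favors_bob is None:
        p_fix_favors_bob = Fraction(1, n_players)
    return (1 - p_fixed) * fair + p_fixed * p_fix_favors_bob


N = 10**9
for p_fixed in (Fraction(0), Fraction(1, 100), Fraction(1, 2)):
    print(p_fixed, p_bob_wins(N, p_fixed))
```

Passing an explicit p_fix_favors_bob (say, Fraction(1, 10) if Bob is the commissioner's nephew) is what breaks the symmetry and pushes the answer upward.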

Spotted in the Wild

The recent Pascal's Mugging thread spawned a discussion of the Large Hadron Collider destroying the universe, which also got continued on an older LHC thread from a few years ago. Everyone involved agreed the chances of the LHC destroying the world were less than one in a million, but several people gave extraordinarily low chances based on cosmic ray collisions. The argument was that since cosmic rays have been performing particle collisions similar to the LHC's zillions of times per year, the chance that the LHC will destroy the world is either literally zero, or else a number related to the probability that there's some chance of a cosmic ray destroying the world so minuscule that it hasn't gotten actualized in zillions of cosmic ray collisions. Of the commenters mentioning this argument, one gave a probability of 1/(3*10^22), another suggested 1/10^25, both of which may be good numbers for the internal confidence of this argument.

But the connection between this argument and the general LHC argument flows through statements like "collisions produced by cosmic rays will be exactly like those produced by the LHC", "our understanding of the properties of cosmic rays is largely correct", and "I'm not high on drugs right now, staring at a package of M&Ms and mistaking it for a really intelligent argument that bears on the LHC question", and the chance that any one of these is false is probably greater than 1/10^20. So instead of saying "the probability of an LHC apocalypse is now 1/10^20", say "I have an argument that has an internal probability of an LHC apocalypse as 1/10^20, which lowers my probability a bit depending on how much I trust that argument".

In fact, the argument has a potential flaw: according to Giddings and Mangano, the physicists officially tasked with investigating LHC risks, black holes from cosmic rays might have enough momentum to fly through Earth without harming it, and black holes from the LHC might not.[3] This was predictable: this was a simple argument in a complex area trying to prove a negative, and it would have been presumptuous to believe with greater than 99% probability that it was flawless. If you can only give 99% probability to the argument being sound, then it can only reduce your probability in the conclusion by a factor of a hundred, not a factor of 10^20.
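The "factor of a hundred" follows from the law of total probability. A minimal Python sketch, where the one-in-a-million figure for "what I'd believe if the cosmic-ray argument is flawed" is an illustrative assumption borrowed from the upper bound everyone in the thread agreed on:

```python
from fractions import Fraction


def conclusion_probability(p_sound, p_if_sound, p_if_flawed):
    """Law of total probability over 'the argument is sound' vs 'it is flawed'."""
    return p_sound * p_if_sound + (1 - p_sound) * p_if_flawed


# Illustrative inputs (assumptions): a one-in-a-million probability of
# disaster if the cosmic-ray argument is flawed, and the argument's internal 1/10^20.
p_without_argument = Fraction(1, 10**6)
p_inside_argument = Fraction(1, 10**20)

p = conclusion_probability(Fraction(99, 100), p_inside_argument, p_without_argument)
print(float(p))                         # roughly 1e-8
print(float(p_without_argument / p))    # reduction factor: just under 100
```

However small the internal figure gets, the 1% "flawed" branch keeps the answer near one hundredth of the starting probability.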

But it's hard for me to be properly outraged about this, since the LHC did not destroy the world. A better example might be the following, taken from an online discussion of creationism[4] and apparently based on something by Fred Hoyle:

In order for a single cell to live, all of the parts of the cell must be assembled before life starts. This involves 60,000 proteins that are assembled in roughly 100 different combinations. The probability that these complex groupings of proteins could have happened just by chance is extremely small. It is about 1 chance in 10 to the 4,478,296 power. The probability of a living cell being assembled just by chance is so small, that you may as well consider it to be impossible. This means that the probability that the living cell is created by an intelligent creator, that designed it, is extremely large. The probability that God created the living cell is 10 to the 4,478,296 power to 1.

Note that someone just gave a confidence level of 10^4478296 to one and was wrong. This is the sort of thing that should never ever happen. This is possibly the most wrong anyone has ever been.

It is hard to say in words exactly how wrong this is. Saying "This person would be willing to bet the entire world GDP for a thousand years if evolution were true against a one in one million chance of receiving a single penny if creationism were true" doesn't even begin to cover it: a mere 1/10^25 would suffice there. Saying "This person believes he could make one statement about an issue as difficult as the origin of cellular life per Planck interval, every Planck interval from the Big Bang to the present day, and not be wrong even once" only brings us to 1/10^61 or so. If the chance of getting Ganser's Syndrome, the extraordinarily rare psychiatric condition that manifests in a compulsion to say false statements, is one in a hundred million, and the world's top hundred thousand biologists all agree that evolution is true, then this person should preferentially believe it is more likely that all hundred thousand have simultaneously come down with Ganser's Syndrome than that they are doing good biology.[5]
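The three comparisons can be checked in log10 space, since the numbers are far too small for ordinary floating point. In this Python sketch the world-GDP figure, the Planck time, and the age of the universe are my own rough inputs, not figures from the post; the Ganser's rate and the number of biologists are the post's, and the biologists are assumed independent so their log-probabilities add.

```python
import math

# Rough inputs; these are my assumptions, not figures from the post.
WORLD_GDP_USD_PER_YEAR = 1e14              # on the order of $100 trillion
PLANCK_TIME_S = 5.39e-44
AGE_OF_UNIVERSE_S = 13.8e9 * 3.156e7       # ~13.8 billion years, in seconds
GANSER_RATE = 1e-8                         # the post's one-in-a-hundred-million
N_BIOLOGISTS = 100_000

CREATIONIST_LOG10 = -4_478_296             # log10 of 1 / 10^4478296


def log10_gdp_bet():
    # Stake: a thousand years of world GDP, in pennies, against a
    # one-in-a-million chance of winning a single penny.
    pennies = WORLD_GDP_USD_PER_YEAR * 1000 * 100
    return -math.log10(pennies / 1e-6)


def log10_planck_statements():
    # One statement per Planck time since the Big Bang, none of them wrong.
    return -math.log10(AGE_OF_UNIVERSE_S / PLANCK_TIME_S)


def log10_all_biologists_ganser():
    # Independent cases, so the log-probabilities add.
    return N_BIOLOGISTS * math.log10(GANSER_RATE)


for name, value in [("GDP bet", log10_gdp_bet()),
                    ("Planck statements", log10_planck_statements()),
                    ("all biologists have Ganser's", log10_all_biologists_ganser())]:
    print(f"{name}: about 10^{value:.0f}, still 10^{value - CREATIONIST_LOG10:.0f} "
          f"times more probable than the creationist's figure")
```

Even the most extreme of the three, 10^-800000, sits millions of orders of magnitude above 10^-4478296.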

This creationist's flaw wasn't mathematical; the math probably does return that number. The flaw was confusing the internal probability (that complex life would form completely at random in a way that can be represented with this particular algorithm) with the external probability (that life could form without God). He should have added a term representing the chance that his knockdown argument just didn't apply.

Finally, consider the question of whether you can assign 100% certainty to a mathematical theorem for which a proof exists. Eliezer has already examined this issue and come out against it (citing as an example this story of Peter de Blanc's). In fact, this is just the specific case of differentiating internal versus external probability when internal probability is equal to 100%. Now your probability that the theorem is false is entirely based on the probability that you've made some mistake.

The many mathematical proofs that were later overturned provide practical justification for this mindset.

This is not a fully general argument against giving very high levels of confidence: very complex situations and situations with many exclusive possible outcomes (like the lottery example) may still make it to the 1/10^20 level, albeit probably not the 1/10^4478296. But in other sorts of cases, giving a very high level of confidence requires a check that you're not confusing the probability inside one argument with the probability of the question as a whole.

Footnotes

1. Although technically we know we're talking about an incumbent, who typically has a much higher chance, around 90% in Congress.

2. A particularly devious objection might be: "What if the lottery commissioner, in a fit of political correctness, decides that 'everyone is a winner' and splits the jackpot a billion ways?" If this would satisfy your criteria for "winning the lottery", then this mere possibility should indeed move your probability upward. In fact, since there is probably greater than a one in one billion chance of this happening, the majority of your probability for Bob winning the lottery should concentrate here!

3. Giddings and Mangano then go on to re-prove the original "won't cause an apocalypse" argument using a more complicated method involving white dwarf stars.

4. While searching creationist websites for the half-remembered argument I was looking for, I found what may be my new favorite quote: "Mathematicians generally agree that, statistically, any odds beyond 1 in 10 to the 50th have a zero probability of ever happening." 

5. I'm a little worried that five years from now I'll see this quoted on some creationist website as an actual argument.

Comments


While searching creationist websites for the half-remembered argument I was looking for, I found what may be my new favorite quote: "Mathematicians generally agree that, statistically, any odds beyond 1 in 10 to the 50th have a zero probability of ever happening."

That reminds me of one of my favourites, from a pro-abstinence blog:

When you play with fire, there is a 50/50 chance something will go wrong, and nine times out of ten it does.

In Terry Pratchett's Discworld series, it is a law of narrative causality that 1 in a million chances work out 9 times out of 10. Some characters once made a difficult thing they were attempting artificially harder, to try to make the probability exactly 1 in a million and invoke this trope.

When you play with fire, there is a 50/50 chance something will go wrong, and nine times out of ten it does.

They are only admitting their poor calibration.

Heh.

Though, admitting poor calibration that way is like saying "I incorrectly believe X to be true, it's actually Y".

Note that someone just gave a confidence level of 10^4478296 to one and was wrong. This is the sort of thing that should never ever happen. This is possibly the most wrong anyone has ever been.

I was in some discussion at SIAI once and made an estimate that ended up being off by something like three hundred trillion orders of magnitude. (Something about giant look-up tables, but still.) Anyone outdo me?

Wow. The worst I've ever done is giving 9 orders of magnitude inside my 90% confidence interval for the velocity of the earth and being wrong. (It turns out the earth doesn't move faster than the speed of light!)

Surely declaring "x is impossible", before witnessing x, would be the most wrong you could be?

I take more issue with the people who incredulously shout "That's impossible!" after witnessing x.

I don't. You can witness a magician, e.g., violating conservation of matter, and still declare "that's impossible!"

Basically, you're stating that you don't believe that the signals your senses reported to you are accurate.

The colloquial meaning of "x is impossible" is probably closer to "x has probability <0.1%" than "x has probability 0".

Probabilities of 1 and 0 are considered rule violations and discarded.

Probabilities of 1 and 0 are considered rule violations and discarded.

What should we take for P(X|X) then?

And then what can I put you down for the probability that Bayes' Theorem is actually false? (I mean the theorem itself, not any particular deployment of it in an argument.)

What should we take for P(X|X) then?

He's addressed that:

The one that I confess is giving me the most trouble is P(A|A). But I would prefer to call that a syntactic elimination rule for probabilistic reasoning, or perhaps a set equality between events, rather than claiming that there's some specific proposition that has "Probability 1".

and then

Huh, I must be slowed down because it's late at night... P(A|A) is the simplest case of all. P(x|y) is defined as P(x,y)/P(y). P(A|A) is defined as P(A,A)/P(A) = P(A)/P(A) = 1. The ratio of these two probabilities may be 1, but I deny that there's any actual probability that's equal to 1. P(|) is a mere notational convenience, nothing more. Just because we conventionally write this ratio using a "P" symbol doesn't make it a probability.

Ah, thanks for the pointer. Someone's tried to answer the question about the reliability of Bayes' Theorem itself too, I see. But I'm afraid I'm going to have to pass on this, because I don't see how calling something a syntactic elimination rule instead of a law of logic saves you from incoherence.

That's quite a lot. Can you tell us what the estimate was?

I'm a bit irked by the continued persistence of "LHC might destroy the world" noise. Given no evidence, the prior probability that microscopic black holes can form at all, across all possible systems of physics, is extremely small. The same theory (String Theory[1]) that has led us to suggest that microscopic black holes might form at all is also quite adamant that all black holes evaporate, and equally adamant that microscopic ones evaporate faster than larger ones by a precise factor of the mass ratio cubed. If we think the theory is talking complete nonsense, then the posterior probability of an LHC disaster goes down, because we favor the ignorant prior of a universe where microscopic black holes don't exist at all.

Thus, the "LHC might destroy the world" noise boils down to the possibility that (A) there is some mathematically consistent post-GR, microscopic-black-hole-predicting theory that has massively slower evaporation, (B) this unnamed and possibly non-existent theory is less Kolmogorov-complex and hence more posterior-probable than the one that scientists are currently using[2], and (C) scientists have completely overlooked this unnamed and possibly non-existent theory for decades, strongly suggesting that it has a large Levenshtein distance from the currently favored theory. The simultaneous satisfaction of these three criteria seems... pretty f-ing unlikely, since each tends to reject the others. A/B: it's hard to imagine a theory that predicts post-GR physics with LHC-scale microscopic black holes that's more Kolmogorov-simple than String Theory, which can actually be specified pretty damn compactly. B/C: people already have explored the Kolmogorov-simple space of post-Newtonian theories pretty heavily, and even the simple post-GR theories are pretty well explored, making it unlikely that even a theory with large edit distance from either ST or SM+GR has been overlooked. C/A: it seems like a hell of a coincidence that a large-edit-distance theory, i.e. one extremely dissimilar to ST, would just happen to also predict the formation of LHC-scale microscopic black holes, then go on to predict that they're stable on the order of hours or more by throwing out the mass-cubed rule[3], then go on to explain why we don't see them by the billions despite their claimed stability. (If the ones from cosmic rays are so fast that the resulting black holes zip through Earth, why haven't they eaten Jupiter, the Sun, or other nearby stars yet? Bombardment by cosmic rays is not unique to Earth, and there are plenty of celestial bodies that would be heavy enough to capture the products.)

[1] It's worth noting that our best theory, the Standard Model with General Relativity, does not predict microscopic black holes at LHC energies. Only String Theory does: ST's 11-dimensional compactified space is supposed to suddenly decompactify at high energy scales, making gravity much more powerful at small scales than GR predicts, thus allowing black hole formation at abnormally low energies, i.e. those accessible to LHC. And naked GR (minus the SM) doesn't predict microscopic black holes. At all. Instead, naked GR only predicts supernova-sized black holes and larger.

[2] The biggest pain of SM+GR is that, even though we're pretty damn sure that that train wreck can't be right, we haven't been able to find any disconfirming data that would lead the way to a better theory. This means that, if the correct theory were more Kolmogorov-complex than SM+GR, then we would still be forced as rationalists to trust SM+GR over the correct theory, because there wouldn't be enough Bayesian evidence to discriminate the complex-but-correct theory from the countless complex-but-wrong theories. Thus, if we are to be convinced by some alternative to SM+GR, either that alternative must be Kolmogorov-simpler (like String Theory, if that pans out), or that alternative must suggest a clear experiment that leads to a direct disconfirmation of SM+GR. (The more-complex alternative must also somehow attract our attention, and also hint that it's worth our time to calculate what the clear experiment would be. Simple theories get eyeballs, but there are lots of more-complex theories that we never bother to ponder because that solution-space doesn't look like it's worth our time.)

[3] Even if they were stable on the order of seconds to minutes, they wouldn't destroy the Earth: the resulting black holes would be smaller than an atom, in fact smaller than a proton, and since atoms are mostly empty space the black hole would sail through atoms with low probability of collision. I recall that someone familiar with the physics did the math and calculated that an LHC-sized black hole could swing like a pendulum through the Earth at least a hundred times before gobbling up even a single proton, and the same calculation showed it would take over 100 years before the black hole grew large enough to start collapsing the Earth due to tidal forces, assuming zero evaporation. Keep in mind that the relevant computation, t = (5120 × π × G^2 × M^3) ÷ (ℏ × c^4), shows that a 1-second evaporation time corresponds to a mass of 2.28e8 grams[3a], i.e. about 228 metric tons (roughly 250 US tons), and the resulting radius, r = 2 × G × M ÷ c^2, is 3.39e-22 meters[3b], or about 0.4 millionths of a proton radius[3c]. That one-second-duration black hole, despite being tiny, is vastly larger than the ones that might be created by LHC: 10^28 times larger by mass, in fact[3d]. (FWIW, the Schwarzschild radius calculation relies only on GR, with no quantum stuff, while the time-to-evaporate calculation depends on some basic QM as well. String Theory and the Standard Model both leave that particular bit of QM untouched.)

[3a] Google Calculator: "(((1 s) h c^4) / (2pi 5120pi G^2)) ^ (1/3) in grams"

[3b] Google Calculator: "2 G 2.28e8 grams / c^2 in meters"

[3c] Google Calculator: "3.3856695e-22 m / 0.8768 femtometers", where 0.8768 femtometers is the experimentally accepted charge radius of a proton

[3d] Google Calculator: "(2.28e8 g * c^2) / 14 TeV", where 14 TeV is the LHC's maximum energy (7 TeV per beam in a head-on proton-proton collision)
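The arithmetic in [3] and [3a] to [3d] can be reproduced without Google Calculator. This Python sketch uses CODATA 2018 values for the constants (my choice, not the comment's) and the proton charge radius the comment quotes; it applies the evaporation formula exactly as written above, with no corrections.

```python
import math

G = 6.67430e-11          # gravitational constant, m^3 kg^-1 s^-2
HBAR = 1.054571817e-34   # reduced Planck constant, J s
C = 2.99792458e8         # speed of light, m/s
EV = 1.602176634e-19     # one electronvolt, J
PROTON_RADIUS_M = 0.8768e-15   # the charge radius quoted in [3c]
LHC_ENERGY_J = 14e12 * EV      # 14 TeV, as in [3d]


def mass_for_evaporation_time(t_seconds):
    """Invert t = 5120*pi*G^2*M^3 / (hbar*c^4) to get M in kilograms."""
    return (t_seconds * HBAR * C**4 / (5120 * math.pi * G**2)) ** (1 / 3)


def schwarzschild_radius(mass_kg):
    """r = 2*G*M / c^2, in meters."""
    return 2 * G * mass_kg / C**2


m = mass_for_evaporation_time(1.0)
r = schwarzschild_radius(m)
print(f"mass for a 1 s lifetime: {m * 1e3:.3g} g")
print(f"Schwarzschild radius: {r:.3g} m")
print(f"radius / proton radius: {r / PROTON_RADIUS_M:.2g}")
print(f"mass-energy / 14 TeV: {m * C**2 / LHC_ENERGY_J:.2g}")
```

Working in SI units throughout avoids the unit juggling in the calculator strings; grams appear only in the final print.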

I wonder how the anti-LHC arguments on this site might look if we substitute cryptography for the LHC. Mathematicians might say the idea of mathematics destroying the world is ridiculous, but after all we have to trust that all mathematicians announcing opinions on the subject are sane, and we know the number of insane mathematicians in general is greater than zero. And anyway, their arguments would (almost) certainly involve assuming the probability of mathematics destroying the world is 0, so should obviously be disregarded. Thus, the danger of running OpenSSH needs to be calculated as an existential risk taking in our future possible light cone. (Though handily, this would be a spectacular tour de force against DRM.) For an encore, we need someone to calculate the existential risk of getting up in the morning to go to work. Also, did switching on the LHC send back tachyons to cause 9/11? I think we need to be told.

One might be tempted to respond "But there's an equal chance that the false model is too high, versus that it is too low."

I'm not sure why one might be tempted to make this response. Is the idea that, when making any calculation at all, one is equally likely to get a number that is too big as one that is too small? But then, that's before you have looked at the number.

Yet another counter-response is that even if the response were true, the false model could be much too high, but it can only be slightly too low, since 1-10^-9 is quite close to 1.

There are really two claims here. The first one -- that if some guy on the Internet has a model predicting X with 99.99% certainty, then you should assign less probability to X, absent other evidence -- seems interesting, but relatively easy to accept. I'm pretty sure I've been reasoning this way in the past.

The second claim is exactly the same, but applied to oneself. "If I have come up with an argument that predicts X with 99.99% certainty, I should be less than 99.99% certain of X." This is not something that people do by default. I doubt that I do it unless prompted. Great post!

Stylistic nitpick, though: things like "999,999,999 in a billion" are tricky to parse, especially when compared to "999,999,999,999 in a trillion" (which I initially read as approximately 1 in 1000 before counting the 9s) or "1/1 billion". Counting the 9s is part of the problem; the other part is that the numerator is a number and the denominator is a word. What's wrong with writing 99.99% and 99.9999%? These are different from the original values in the post, but still carry the argument, and are easier to read.

I personally find the best way to deal with such numbers is to talk about nines.

999,999,999 in a billion = 99.9 999 999% = 9 nines

999,999,999,999 in a trillion = 99.9 999 999 999% = 12 nines
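Counting nines is just -log10(1 - p). A small Python sketch (the function name is mine) that also illustrates the earlier point that a model can be much too high but only slightly too low: the two success probabilities differ by less than one part in a billion, while the failure probabilities differ by a factor of a thousand.

```python
import math
from fractions import Fraction


def nines(numerator, denominator):
    """How many nines p = numerator/denominator has, i.e. -log10(1 - p)."""
    return math.log10(denominator) - math.log10(denominator - numerator)


billion = Fraction(999_999_999, 10**9)
trillion = Fraction(999_999_999_999, 10**12)
print(nines(999_999_999, 10**9), nines(999_999_999_999, 10**12))
# Success probabilities: nearly identical. Failure probabilities: 1000x apart.
print(float(trillion - billion), (1 - billion) / (1 - trillion))
```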

This raises the question: Should scientific journals adjust the p-value that they require from an experiment, to be no larger than the probability (found empirically) that a peer-reviewed article contains a factual, logical, methodological, experimental, or typographical error?

The meta-science part would change with time, e.g. how many people read the article and found no mistakes. Doesn't seem to mix well with a fixed result.

Maybe some separate, online thing that just reported on the probability of claims could handle the meta-science.

First, great post. Second, general injunctions against giving very low probabilities to things seem to be taken by many casual readers as endorsements of the (bad) behavior "privilege the hypothesis", e.g. moving the probability that God exists from very small to moderately small. That's not right, but I don't have excellent arguments for why it's not right. I'd love it if you wrote an article on choosing good priors.

Cosma Shalizi has done some technical work that seems (to my incompetent eye) to be relevant:

http://projecteuclid.org/DPubS?verb=Display&version=1.0&service=UI&handle=euclid.ejs/1256822130&page=record

That is, he takes Bayesian updating, which requires modeling the world, and answers the question 'when would it be okay to use Bayesian updating, even though we know the model is definitely wrong - e.g. too simple?'. (Of course, making your model "not obviously wrong" by adding complexity isn't a solution.)

I am still confused about how small the probability I should use in the God question is. I understand the argument about privileging the hypothesis and about intelligent beings being very complex and fantastically unlikely.

But I also feel that if I tried to use an argument at least that subtle, when applied to something I am at least as confused about as how ontologically complex a first cause should be, to disprove things at least as widely believed as religion, a million times, I would be wrong at least once.

But I also feel that if I tried to use an argument at least that subtle, when applied to something I am at least as confused about as how ontologically complex a first cause should be, to disprove things at least as widely believed as religion, a million times, I would be wrong at least once.

See Advancing Certainty. The fact that this statement sounds comfortably modest does not exempt it from the scrutiny of the Fundamental Question of Rationality (why do you believe what you believe?). I respectfully submit that if the answer is "because I have been wrong before, where I was equally confident, in previous eras of my life when I wasn't using arguments this powerful (they just felt powerful to me at the time)", that doesn't suffice -- for the same reason that the Lord Kelvin argument doesn't suffice to show that arguments from physics can't be trusted (unless you don't think physics has learned anything since Kelvin).

I've got to admit I disagree with a lot of Advancing Certainty. The proper reference class for a modern physicist who is well acquainted with the mistakes of Lord Kelvin and won't make them again is "past scientists who were well acquainted with the mistakes of their predecessors and planned not to make them again", which I imagine has less than a hundred percent success rate and which might have included Kelvin.

It would be a useful exercise to see whether the most rational physicists of 1950 have more successful predictions as of 2000 than the most rational physicists of 1850 did as of 1900. It wouldn't surprise me if this were true, and if so, then the physicists of 2000 could justly put themselves in a new reference class and guess they will be even more successful as of 2050 than the 1950ers were in 2000. But if the success rate after fifty years remains constant, I wouldn't want to say "Yeah, well, we've probably solved all those problems now, so we'll do better".

I've got to admit I disagree with a lot of Advancing Certainty

Do you actually disagree with any particular claim in Advancing Certainty, or does it just seem "off" to you in its emphasis? Because when I read your post, I felt myself "disagreeing" (and panicking at the rapid upvoting), but reflection revealed that I was really having something more like an ADBOC reaction. It felt to me that the intent of your post was to say "Boo confident probabilities!", while I tend to be on the side of "Yay confident probabilities!" -- not because I'm in favor of overconfidence, but rather because I think many worries about overconfidence here tend to be ill-founded (I suppose I'm something of a third-leveler on this issue.)

And indeed, when you see people complaining about overconfidence on LW, it's not usually because someone thinks that some political candidate has a 0.999999999 chance of winning an election; almost nobody here would think that a reasonable estimate. Instead, what you get is people saying that 0.0000000001 is too low a probability that God exists -- on the basis of nothing else than general worry about human overconfidence.

I think my anti-anti-overconfidence vigilance started when I realized I had been socially intimidated into backing off from my estimate of 0.001 in the Amanda Knox case, when in fact that was and remains an entirely reasonable number given my detailed knowledge of the case. The mistake I made was to present this number as if it were something that participants in my survey should have arrived at from a few minutes of reading. Those states -- the ones that survey participants were in, with reference classes like "highly controversial conviction with very plausible defense arguments" -- are what probabilities like 0.1 or 0.3 are for. My state, on the other hand, was more like "highly confident inside-view conclusion bolstered by LW survey results decisively on the same side of 50%".

But this isn't what the overconfidence-hawks argued. What they said, in essence, was that 0.001 was just somehow "inherently" too confident. Only "irrational" people wear the attire of "P(X) = 0.001"; We Here, by contrast, are Aware Of Biases Like Overconfidence, and only give Measured, Calm, Reasonable Probabilities.

That is the mistake I want to fight, now that I have the courage to do so. Though I can't find much to literally disagree about in your post, it unfortunately feels to me like ammunition for the enemy.

I definitely did have the "ammunition for the enemy" feeling about your post, and the "belief attire" point is a good one, but I think the broad emotional disagreement does express itself in a few specific claims:

  1. Even if you were to control for getting tired and hungry and so on, even if you were to load your intelligence into a computer and have it do the hard work, I still don't think you could judge a thousand such trials and be wrong only once. I admit this may not be as real a disagreement as I'm thinking, because it may be a confusion on what sort of reference class we should use to pick trials for you.

  2. I think we might disagree on the Lord Kelvin claim. I think I would predict more of today's physical theories are wrong than you would.

  3. I think my probability that God exists would be several orders of magnitude higher than yours, even though I think you probably know about the same number of good arguments on the issue as I do.

Maybe our disagreement can be resolved empirically - if we were to do enough problems where we gave confidence levels on questions like "The area of Canada is greater than the area of the Mediterranean Sea" and use log odds scoring we might find one of us doing significantly better than the other - although we would have to do quite a few to close off my possible argument that we just didn't hit that one "black swan" question on which you'd say you're one in a million confident and then get it wrong. Would you agree that this would get to the heart of our disagreement, or do you think it revolves solely around more confusing philosophical questions?

(I took a test like that yesterday to test something and I came out overconfident, missing 2/10 questions at the 96% probability level. I don't know how that translates to more real-world questions and higher confidence levels, but it sure makes me reluctant to say I'm chronically underconfident)
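Here is a minimal Python sketch of the proposed test, assuming "log odds scoring" means the logarithmic scoring rule (natural log of the probability assigned to what actually happened). The two forecasters and the 1000 questions are hypothetical: one says one-in-a-million on everything, the other one-in-a-thousand. It shows why the black-swan worry needs many questions: the bolder forecaster wins unless the single rare miss shows up, and then loses badly. The last line checks the parenthetical result: a forecaster perfectly calibrated at 96% would miss two or more out of ten about 6% of the time.

```python
import math


def log_score(predictions):
    """Total natural-log score. Each item is (p, outcome): p is the probability
    assigned to the statement being true, outcome is whether it was true.
    0 is a perfect score; more negative is worse."""
    return sum(math.log(p if outcome else 1 - p) for p, outcome in predictions)


def p_at_least_k_misses(n, k, p_correct):
    """Chance of k or more misses out of n independent questions for a
    forecaster who is perfectly calibrated at p_correct."""
    p_miss = 1 - p_correct
    return sum(math.comb(n, j) * p_miss**j * p_correct**(n - j)
               for j in range(k, n + 1))


def run(confidence, black_swan):
    """Hypothetical test: 1000 statements, all true except possibly the last."""
    answers = [(confidence, True)] * 999 + [(confidence, not black_swan)]
    return log_score(answers)


for black_swan in (False, True):
    print(black_swan, run(1 - 1e-6, black_swan), run(0.999, black_swan))

print(p_at_least_k_misses(10, 2, 0.96))
```

Ten questions is a small sample, so 2/10 misses at 96% is suggestive of overconfidence rather than decisive.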