0 And 1 Are Not Probabilities


Followup to: Infinite Certainty

1, 2, and 3 are all integers, and so is -4.  If you keep counting up, or keep counting down, you're bound to encounter a whole lot more integers.  You will not, however, encounter anything called "positive infinity" or "negative infinity", so these are not integers.

Positive and negative infinity are not integers, but rather special symbols for talking about the behavior of integers.  People sometimes say something like, "5 + infinity = infinity", because if you start at 5 and keep counting up without ever stopping, you'll get higher and higher numbers without limit.  But it doesn't follow from this that "infinity - infinity = 5".  You can't count up from 0 without ever stopping, and then count down without ever stopping, and then find yourself at 5 when you're done.

From this we can see that infinity is not only not-an-integer, it doesn't even behave like an integer.  If you unwisely try to mix up infinities with integers, you'll need all sorts of special new inconsistent-seeming behaviors which you don't need for 1, 2, 3 and other actual integers.

Even though infinity isn't an integer, you don't have to worry about being left at a loss for numbers.  Although people have seen five sheep, millions of grains of sand, and septillions of atoms, no one has ever counted an infinity of anything. The same with continuous quantities—people have measured dust specks a millimeter across, animals a meter across, cities kilometers across, and galaxies thousands of lightyears across, but no one has ever measured anything an infinity across.  In the real world, you don't need a whole lot of infinity.

(I should note for the more sophisticated readers in the audience that they do not need to write me with elaborate explanations of, say, the difference between ordinal numbers and cardinal numbers.  Yes, I possess various advanced set-theoretic definitions of infinity, but I don't see a good use for them in probability theory.  See below.)

In the usual way of writing probabilities, probabilities are between 0 and 1.  A coin might have a probability of 0.5 of coming up tails, or the weatherman might assign probability 0.9 to rain tomorrow.

This isn't the only way of writing probabilities, though.  For example, you can transform probabilities into odds via the transformation O = (P / (1 - P)).  So a probability of 50% would go to odds of 0.5/0.5 or 1, usually written 1:1, while a probability of 0.9 would go to odds of 0.9/0.1 or 9, usually written 9:1.  To take odds back to probabilities you use P = (O / (1 + O)), and this is perfectly reversible, so the transformation is an isomorphism—a two-way reversible mapping.  Thus, probabilities and odds are isomorphic, and you can use one or the other according to convenience.
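As a minimal sketch of the two transformations just described, here is the round trip in Python. The function names prob_to_odds and odds_to_prob are illustrative and not from the original post, and the sketch assumes we reject P = 1 outright, since the formula would divide by zero there.

```python
def prob_to_odds(p: float) -> float:
    """Map a probability in [0, 1) to odds via O = P / (1 - P)."""
    if not 0.0 <= p < 1.0:
        # Assumption of this sketch: P = 1 is rejected rather than mapped to infinity.
        raise ValueError("p must satisfy 0 <= p < 1; P = 1 has no finite odds")
    return p / (1.0 - p)


def odds_to_prob(o: float) -> float:
    """Map finite non-negative odds back to a probability via P = O / (1 + O)."""
    if o < 0.0:
        raise ValueError("odds must be non-negative")
    return o / (1.0 + o)


if __name__ == "__main__":
    # The two examples from the text: 50% and 0.9.
    for p in (0.5, 0.9):
        o = prob_to_odds(p)
        print(p, "->", o, "->", odds_to_prob(o))
```

Up to floating-point rounding, 0.5 maps to odds of 1 and 0.9 maps to odds of 9, and converting back recovers the starting probability.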

For example, it's more convenient to use odds when you're doing Bayesian updates.  Let's say that I roll a six-sided die:  If any face except 1 comes up, there's a 10% chance of hearing a bell, but if the face 1 comes up, there's a 20% chance of hearing the bell.  Now I roll the die, and hear a bell.  What are the odds that the face showing is 1?  Well, the prior odds are 1:5 (corresponding to the real number 1/5 = 0.20) and the likelihood ratio is 0.2:0.1 (corresponding to the real number 2) and I can just multiply these two together to get the posterior odds 2:5 (corresponding to the real number 2/5 or 0.40).  Then I convert back into a probability, if I like, and get (0.4 / 1.4) = 2/7 = ~29%.
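The die-and-bell arithmetic can be checked with exact fractions. This is only a sketch: the variable names are mine, and the cross-check at the end uses the long form of Bayes's Theorem to confirm that both routes give the same answer.

```python
from fractions import Fraction

# Odds form: posterior odds = prior odds * likelihood ratio.
prior_odds = Fraction(1, 5)                                  # face 1 vs. the other five faces
likelihood_ratio = Fraction(20, 100) / Fraction(10, 100)     # P(bell | 1) / P(bell | not 1)
posterior_odds = prior_odds * likelihood_ratio               # 2/5
posterior_prob = posterior_odds / (1 + posterior_odds)       # 2/7

# Cross-check with the probability form of Bayes's Theorem.
p_one = Fraction(1, 6)
p_bell_given_one = Fraction(20, 100)
p_bell_given_other = Fraction(10, 100)
p_bell = p_one * p_bell_given_one + (1 - p_one) * p_bell_given_other
bayes_prob = p_one * p_bell_given_one / p_bell

print(posterior_odds, posterior_prob, bayes_prob)
```

The odds route is a single multiplication, while the probability route needs the full denominator P(bell).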

So odds are more manageable for Bayesian updates—if you use probabilities, you've got to deploy Bayes's Theorem in its complicated version.  But probabilities are more convenient for answering questions like "If I roll a six-sided die, what's the chance of seeing a number from 1 to 4?"  You can add up the probabilities of 1/6 for each side and get 4/6, but you can't add up the odds ratios of 0.2 for each side and get an odds ratio of 0.8.
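The same kind of check works for the "1 to 4" question. A small sketch, again with exact fractions and illustrative variable names, shows that the probabilities add up correctly while the naive sum of odds does not match the true odds of the combined event.

```python
from fractions import Fraction

p_side = Fraction(1, 6)
p_one_to_four = 4 * p_side                  # probabilities of disjoint outcomes add: 4/6

odds_side = p_side / (1 - p_side)           # 1:5, i.e. 0.2 per side
naive_odds_sum = 4 * odds_side              # 0.8, which is NOT the right answer
true_odds = p_one_to_four / (1 - p_one_to_four)  # odds of seeing 1 to 4 at all

print(p_one_to_four, naive_odds_sum, true_odds)
```

The true odds of rolling 1 through 4 are 2:1, not 0.8, which is why addition belongs to the probability representation.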

Why am I saying all this?  To show that "odds ratios" are just as legitimate a way of mapping uncertainties onto real numbers as "probabilities".  Odds ratios are more convenient for some operations, probabilities are more convenient for others.  A famous proof called Cox's Theorem (plus various extensions and refinements thereof) shows that all ways of representing uncertainties that obey some reasonable-sounding constraints, end up isomorphic to each other.

Why does it matter that odds ratios are just as legitimate as probabilities?  Probabilities as ordinarily written are between 0 and 1, and both 0 and 1 look like they ought to be readily reachable quantities—it's easy to see 1 zebra or 0 unicorns.  But when you transform probabilities onto odds ratios, 0 goes to 0, but 1 goes to positive infinity.  Now absolute truth doesn't look like it should be so easy to reach.

A representation that makes it even simpler to do Bayesian updates is the log odds—this is how E. T. Jaynes recommended thinking about probabilities.  For example, let's say that the prior probability of a proposition is 0.0001—this corresponds to a log odds of around -40 decibels.  Then you see evidence that seems 100 times more likely if the proposition is true than if it is false.  This is 20 decibels of evidence.  So the posterior odds are around -40 db + 20 db = -20 db, that is, the posterior probability is ~0.01.
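Here is a rough sketch of the decibel calculation, assuming the convention that log odds in decibels means 10 * log10(odds), which is consistent with the numbers in the example. The helper names prob_to_decibels and decibels_to_prob are hypothetical.

```python
import math


def prob_to_decibels(p: float) -> float:
    """Log odds in decibels: 10 * log10(P / (1 - P)). Undefined at P = 0 and P = 1."""
    return 10.0 * math.log10(p / (1.0 - p))


def decibels_to_prob(db: float) -> float:
    """Invert the mapping: recover odds from decibels, then odds to probability."""
    odds = 10.0 ** (db / 10.0)
    return odds / (1.0 + odds)


prior_db = prob_to_decibels(0.0001)        # close to -40 dB
evidence_db = 10.0 * math.log10(100.0)     # a likelihood ratio of 100 is 20 dB
posterior_db = prior_db + evidence_db      # updating is plain addition
posterior_prob = decibels_to_prob(posterior_db)

print(round(prior_db, 2), round(posterior_db, 2), round(posterior_prob, 4))
```

In this representation a Bayesian update is just adding the decibels of evidence to the decibels of prior belief.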

When you transform probabilities to log odds, 0 goes onto negative infinity and 1 goes onto positive infinity.  Now both infinite certainty and infinite improbability seem a bit more out-of-reach. 

In probabilities, 0.9999 and 0.99999 seem to be only 0.00009 apart, so that 0.502 is much further away from 0.503 than 0.9999 is from 0.99999.  To get to probability 1 from probability 0.99999, it seems like you should need to travel a distance of merely 0.00001.

But when you transform to odds ratios, 0.502 and 0.503 go to 1.008 and 1.012, and 0.9999 and 0.99999 go to 9,999 and 99,999.  And when you transform to log odds, 0.502 and 0.503 go to 0.03 decibels and 0.05 decibels, but 0.9999 and 0.99999 go to 40 decibels and 50 decibels.
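These figures are easy to reproduce. The sketch below tabulates the four probabilities under both transformations, assuming the same 10 * log10(odds) decibel convention; the helper names are illustrative.

```python
import math


def odds(p: float) -> float:
    """Odds ratio O = P / (1 - P)."""
    return p / (1.0 - p)


def decibels(p: float) -> float:
    """Log odds in decibels, 10 * log10(O)."""
    return 10.0 * math.log10(odds(p))


# One row per probability discussed in the text.
rows = [(p, odds(p), decibels(p)) for p in (0.502, 0.503, 0.9999, 0.99999)]
for p, o, db in rows:
    print(f"p={p:<8} odds={o:>12.3f}  log odds={db:>6.2f} dB")
```

The gap between 0.502 and 0.503 is about 0.02 decibels, while the gap between 0.9999 and 0.99999 is about 10 decibels.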

When you work in log odds, the distance between any two degrees of uncertainty equals the amount of evidence you would need to go from one to the other.  That is, the log odds gives us a natural measure of spacing among degrees of confidence.

Using the log odds exposes the fact that reaching infinite certainty requires infinitely strong evidence, just as infinite absurdity requires infinitely strong counterevidence.

Furthermore, all sorts of standard theorems in probability have special cases if you try to plug 1s or 0s into them—like what happens if you try to do a Bayesian update on an observation to which you assigned probability 0.
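To make the special cases concrete, here is a small sketch of a two-hypothesis Bayesian update. The function bayes_posterior is a hypothetical helper, and raising an error when the observation had probability 0 is a choice made for this sketch, not something the theorem dictates.

```python
def bayes_posterior(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Return P(H | E) = P(E | H) P(H) / P(E) for a hypothesis H and its negation."""
    p_e = prior * p_e_given_h + (1.0 - prior) * p_e_given_not_h
    if p_e == 0.0:
        # Special case: the observation was assigned probability 0, so the ratio is 0/0.
        raise ZeroDivisionError("observation had probability 0; posterior is undefined")
    return prior * p_e_given_h / p_e


# Ordinary case: the die-and-bell example gives 2/7.
print(bayes_posterior(1 / 6, 0.2, 0.1))

# Special case: a prior of exactly 1 cannot be moved by any evidence with nonzero likelihood.
print(bayes_posterior(1.0, 0.001, 0.999))
```

With a prior of 1 the posterior stays at 1 even when the evidence is 999 times more likely under the alternative, and with an observation of probability 0 the update has no defined answer at all.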

So I propose that it makes sense to say that 1 and 0 are not in the probabilities; just as negative and positive infinity, which do not obey the field axioms, are not in the real numbers.

The main reason this would upset probability theorists is that we would need to rederive theorems previously obtained by assuming that we can marginalize over a joint probability by adding up all the pieces and having them sum to 1.

However, in the real world, when you roll a die, it doesn't literally have infinite certainty of coming up some number between 1 and 6.  The die might land on its edge; or get struck by a meteor; or the Dark Lords of the Matrix might reach in and write "37" on one side.

If you made a magical symbol to stand for "all possibilities I haven't considered", then you could marginalize over the events including this magical symbol, and arrive at a magical symbol "T" that stands for infinite certainty.

But I would rather ask whether there's some way to derive a theorem without using magic symbols with special behaviors.  That would be more elegant.  Just as there are mathematicians who refuse to believe in double negation or infinite sets, I would like to be a probability theorist who doesn't believe in absolute certainty.

PS:  Here's Peter de Blanc's "mathematical certainty" anecdote.  (I told him not to do it again.)

 

Part of the Overly Convenient Excuses subsequence of How To Actually Change Your Mind

Next post: "Feeling Rational" (start of next subsequence)

Previous post: "Infinite Certainty"

Comments


If there's a map you're trying to use all over the place (and you do seem to), then I claim it makes no sense to put a little region on the map labelled "maybe this map doesn't make any sense at all". If the map seems to make sense and you're still following it for everything, you'll have to ignore that region anyway.

Janos, are you saying that it is in fact impossible that your map in fact doesn't make any sense? Because I do, indeed, have a little section of my map labelled "maybe this map doesn't make any sense at all", and every now and then, I think about it a little, because there are so many fundamental premises of which I am unsure even in their definitions. (E.g: "the universe exists", and "but why?") Just because this area of my map drops out of my everyday decision theory due to failure to generate coherent advice on preferences, does not mean it is absent from my map. "You must ignore" or rather "You should usually ignore" is decision theory, and probability theory should usually be firewalled off from preferences.

Computable numbers are the largest countable class I know of.

Either all countable sets are the same size anyway, or you can generate a larger set by saying "all computable reals plus the halting probability". How about computable with various oracles?

What's false is the tempting statement that probability 0 events are impossible. It's only the converse that's true: impossible events have probability 0.

If you cannot repose probability 1 in the statement "all events to which I assign probability 0 are impossible" you should apply a correction and stop reposing probability 0 to those events. Do you mean to say that all impossible events have probability 0, plus some more possible events also have probability 0? This makes no sense, especially as a justification for using "probability 0" in a meaningfully calibrated sense.

To use "probability 0" without a finite expectation of being infinitely surprised, you must repose probability 1 in the belief that you use "probability 0" only for actually impossible events; but not necessarily believe that you assign probability 0 to every impossible event (satisfying both conditions implies logical omniscience).

I should mention that I'm also an infinite set atheist.

It's difficult to assign probability to incoherent statements, because since we can't mean anything by them, we can't assert a referent to the statement -- in that sense, the probability is indeterminate (additionally, one could easily imagine a language in which a statement such as "the green is either" has a perfectly coherent meaning -- and we can't say that's not what we meant, since we didn't mean anything). Recall also that each probability zero statement implies a probability one statement by its denial and vice versa, so one is equally capable of imagining them, if in a contrived way.

I agree with cumulant. The mathematical subject of probability is based on measure theory, which loses a ton of convergence theorems if we exclude 0 and 1. We can agree that things that are not known a priori can't have probability 0 or 1, but I think we must also agree that "an impossible thing will happen soon" has probability 0, because it's a contradiction. An alternate universe in which the number 7 (in the same kind of number system as ours, etc.) is prime is damn-near inconceivable, but an alternate universe in which impossible things are possible is purely absurd.

If our mathematical reasoning is coherent enough for it to be meaningful to make probability assignments then certainly we are not so fundamentally flawed that what we consider tautologies could be false. If you are willing to accept that maybe 0 is 1, then you can't do any of your probability adjustments, or use Bayes' Theorem, or anything of the sort without having a (possibly unstated) caveat that probability theory might be complete nonsense. But what's the probability that probability theory is nonsense (i.e. false or inconsistent)? What does that even mean? We can only assign a probability if that makes sense, so conditioned on the sentence making sense, probability theory must be nonsense with probability 0, no? So averaged over all possible universes (those where probability theory makes sense, and those where it doesn't) the sentence "probability makes sense with probability 1" better approximates the truth value of probability making sense than "probability makes sense with probability p" for p0. If it's not, it's still not worse, but what the hell are we even saying?

I didn't get it.

Easy: it's a demonstration of how you can never be certain that you haven't made an error even on the things you're really sure about.

It's a cheap, dirty demonstration, but one nevertheless.

I'm kinda surprised that it's only been mentioned once in the comments (I only just discovered this site, really really great, by the way) and one from 2010 at that, but it seems to me that "a magical symbol to stand for 'all possibilities I haven't considered'" does exist: the symbol "~" (i.e. not). Even the commenter who does mention it makes things complicated for himself: P(Q or ~Q)=1 is the simplest example of a proposition with probability 1.

The proposition is of course a tautology. I do think (but I'm not sure) that that is the only sort of statement that receives probability 1. This is in sync with Eliezer's "amount of evidence" interpretation. A bayesian update can only generate 1 if the initial proposition was of probability 1 or if the evidence was tautological (i.e. if Q then Q or, slightly less lame, if "Q or R" and "~R" then Q, where "Q or R" and "~R" are the evidence).

Skimming the comments, I saw two other proposals for "sure bets", the runner who clocked a negative time and the golf ball landing in a particular spot. That last one degenerated pretty quickly into a discussion about how many points there are in a field and on a ball. I think that's typical of such arguments: it depends on your model. Once you have your model specified the probability becomes 1 (or not) if the statement is (or isn't) tautological in the model. If the model isn't specified, then neither is the statement (what is a precise point?) and hence the probability. Ask the next man what the probability is of a runner clocking a negative time and he'll rightly respond: "Huh?" (unless he is a particularly obfuscatory know-it-all, in which case he might start blabbering about the speed of light. But then too, he makes a claim because he can ascribe meaning to the question, that is, he picks his model). So these are also tautological examples.

I think Eliezer's argument holds up pretty well for propositions that aren't tautological and hence empirical in nature: they require evidence and only tautological evidence will suffice for certainty.

About the problem of inserting 0's in certain standard theorems: I don't see a problem with Bayes' theorem (I'm curious about other examples). Dividing by 0 is not defined, so the probability of it raining when hell freezes over is not defined. That seems like a satisfactory arrangement.

My intuition as a mathematician declares that nobody will ever develop an elegant mathematical formulation of probability theory that does not allow for statements that are logically impossible or certain, such as statements of the form p AND NOT p. And it is necessary, if the theory is to be isomorphic to the usual one, that these statements have probability 0 (if impossible) or 1 (if certain). However, I believe that it is quite reasonable to declare, as a condition demanded of any prior deemed rational, that only truly impossible or certain statements have those probabilities. I think that this gives you what you want.

It's obvious that you can make this very demand when working with discrete probability distributions. It may not be obvious that you can make this demand when working with continuous probability distributions. Certainly the usual theory of these, based on so-called ‘measure spaces’ and ‘σ-algebras’ (I mention those in case they jog the reader's memory), cannot tolerate this requirement, at least not if anything at all similar to the usual examples of continuous distributions are allowed.

One answer is that only discrete probability distributions apply to the real world, in which one can never make measurements with infinite precision or observe an infinite sequence of events. Even if the world has infinite size or is continuous to infinitesimal scales, you will never observe that, so you don't need to predict anything about that.

However, even if you don't buy this argument, never fear! There is a mathematical theory of probability based on ‘pointless measure spaces’ and ‘abstract σ-algebras’. In this theory, it again makes perfect sense to demand that any prior must assign probability 0 or 1 only to impossible or certain events. The idea is that if something can never be observed, even in principle, then it is effectively impossible, and the abstract pointless theory allows one to treat it as such.

Then I agree that one should require, as a condition on considering a prior to be rational, that it should assign probability 0 only to these impossible events and assign probability 1 only to their certain complements.

"On some golf courses, the fairway is readily accessible, and the sand traps are not. The green is either."

I think Eliezer's "infinite set atheism" is a belief that infinite sets, although well-defined mathematically, do not exist in the "real world"; in other words, that any physical phenomenon that actually occurs can be described using a finite number of bits. (This can include numbers with infinite decimal expansions, as long as they can be generated by a finitely long computer program. Therefore, using pi in equations is not prohibited, because you're using the symbol "pi" to represent the program, which is finite.)

A consequence of "infinite set atheism" seems to be that the universe is a finite state machine (although one that is not necessarily deterministic). Am I understanding this properly?

One answer would be that an incoherent proposition is not a proposition, and so doesn't have any probability (not even zero, if zero is a probability.)

Another answer would be that there is some probability that you are wrong that the proposition is incoherent (you might be forgetting your knowledge of English), and therefore also some probability that "the green is either" is both coherent and true.

For any state of information X, we have P(A or not A | X) = 1 and P(A and not A | X) = 0. We have to have 0 and 1 as probabilities for probability theory even to work. I think you're taking a reasonable idea -- that P(A | X) should be neither 0 nor 1 when A is a statement about the concrete physical world -- and trying to apply it beyond its applicable domain.

What is the probability of a golf ball landing at any exact point? Zero.

Wrong.

I don't know which is more painful: Eliezer's errors, or those of his detractors.

Probabilities of 0 and 1 are perhaps more like the perfectly massless, perfectly inelastic rods we learn about in high school physics - they are useful as part of an idealized model which is often sufficient to accurately predict real-world events, but we know that they are idealizations that will never be seen in real life.

However, I think we can assign the primeness of 7 a value of "so close to 1 that there's no point in worrying about it".

"When you work in log odds, the distance between any two degrees of uncertainty equals the amount of evidence you would need to go from one to the other. That is, the log odds gives us a natural measure of spacing among degrees of confidence."

That observation is so useful and intuition-friendly it probably deserves its own blog post, and a prominent place in your book.

"The ("Bayesian") framework explored in these essays replaces the two Cartesian options, affirmation and denial, by a continuum of judgmental probabilities in the interval from 0 to 1, endpoints included, or -- what comes to the same thing -- a continuum of judgmental odds in the interval from 0 to infinity, endpoints included. Zero and 1 are probabilities no less than 1/2 and 99/100 are. Probability 1 corresponds to infinite odds, 1:0. That's a reason for thinking in terms of odds: to remember how momentous it may be to assign probability 1 to a hypothesis."

Richard Jeffrey, "Probability and the art of judgement".

I leave it as an exercise to correctly state the relationships between Eliezer's article, the Jeffrey quote, and the value of P(A|A).

(Note: Jeffrey is not to be confused with Jeffreys, although both were Bayesian probability theorists.)

Digging up an old thread here, but an interesting point I want to bring up: a friend of mine claims that he internally assigns probability 1 (i.e. an undisprovable belief) only to one statement: that the universe is coherent. Because if not, then mnergarblewtf. Is it reasonable to say that even though no statement can actually have probability 1 if you're a true Bayesian, it's reasonable to internally establish an axiom which, if negated, would just make the universe completely stupid and not worth living in any more?

There's a lot of logic to that. For extremely unlikely possibilities you can often get away with setting their probability to 0 to make the calculations a lot simpler. For possibilities where predicted utility is independent of your actions (like "reality is just completely random") it can also be worthwhile setting their probability to 0 (ie. ignoring them), since they're approximately a constant term in expected utility. These are good ways of approximating actual expected utility so you can still mostly make the right decisions, which bounded rationality requires.

Usually, if a die lands on edge we say it was a spoiled throw and do it over. Similarly if a Dark Lord writes 37 on the face that lands on top, we complain that the Dark Lord is spoiling our game and we don't count it.

We count 6 possibilities for a 6-sided die, 5 possibilities for a 5-sided die, 2 possibilities for a 2-sided die, and if you have a die with just one face -- a spherical die -- what's the chance that face will come up?

I think it would be interesting to develop probability theory with no boundaries, with no 0 and 1. It works fine to do it the way it's done now, and the alternative might turn up something interesting too.

Caledonian: Not wrong. Take the field you're swinging at to be a plane. There are infinitely many points in that plane; that's just the density of the reals.

Now say there is some probability density of landing spots; and, let's say no one spot is special in that it attracts golf balls more than points immediately nearby (i.e. our pdf is continuous and non-atomic). Right there, you need every point (as a singleton) to have measure 0.

Go pick up Billingsley: measure 0 is not the same as impossible nor does it cause any problems.

Eliezer:

I'm not sure what an "infinite set atheist" is, but it seems from your post that you use different notions of probability than what I think of as standard modern measure theory, which surprises me. Utilitarian's example of a uniform r.v. on [0, 1] is perfect: it must take some value in [0, 1], but for all x it takes value x with probability 0. Clearly you can't say that for all x it's impossible for the r.v. to take value x, because it must in fact take one of those values. But the probabilities are still 0. Pragmatically the way this comes out is that "probability 0" doesn't imply impossible. If you perform an experiment countably-infinitely many times with the probability of a certain outcome being 0 each time, the probability of ever getting that outcome is 0; in this sense you can say the outcome is almost impossible. However it's possible that each outcome individually is almost impossible, even though of course the experiment will have an outcome.

You can object that such experiments are physically impossible e.g. because you can only actually measure/observe countably many outcomes. That's fine; that just means you can get by with only discrete measures. But such assumptions about the real world are not known a priori; I like usual measure theory better, and it seems to do quite a good job of encompassing what I would want to mean by "probability", certainly including the discrete probability spaces in which "probability 0" can safely be interpreted to mean "impossible".

You're right, it's not that hard to come up with larger countable classes of reals than the computables; I just meant that all of the usual, "rolls-off-the-tip-of-your-tongue" classes seem to be subsets of the computables. But maybe Nick is right, and the definables are broader. I haven't studied this either.

And yes, I also sometimes think about how assumptions I make about life and the perceptible universe could be wrong, but I do not do this much for mathematics that I've studied deeply enough, because I'm almost as convinced of its "truth" as I am of my own ability to reason, and I don't see the use in reasoning about what to do if I can't reason. This is doubly true if the statements I'm contemplating are nonsense unless the math works.

If the map seems to make sense and you're still following it for everything, you'll have to ignore that region anyway.

Just cos it's not a very nice place to visit, doesn't mean it ain't on the map. ;)

A real mathematician got in a debate with EY over this post, and made some really good points: https://np.reddit.com/r/badmathematics/comments/2bazyc/0_and_1_are_not_probabilities_any_more_than/cj43y8k

Maybe this doesn't stand up mathematically, but I really like the intuition of log odds instead of probability. And this post explained it quite well. And the main point that you shouldn't believe in absolute certainties is still true. An ideal AI using probability theory would probably use log odds, and not have a 0 or 1.

O = (P / (1 - P))

probabilities and odds are isomorphic

This is undefined for P = 1. If you claim that that function is a real-valued bijection between probabilities and odds then P = 1 doesn't work so you're begging the question. Always take care to not divide by zero.

Whether or not real-world events can have a probability of 0 or 1 is a different question than "are 0 and 1 probabilities?". They most certainly are.

Interesting Log-Odds paper by Brian Lee and Jacob Sanders, November 2011.

P(A&B)+P(A&~B)+P(~A&B)+P(~A&~B)=1

Isn't the "1" above a probability?

Disallowing a symbol for "all events" breaks the definition of a probability space. It's probably easier to allow extended reals and break some field axioms than to figure out how to do rigorous probability without a sigma-algebra.

Haha, very nice CGD. Shows how much those philosophers of language know about golf. :-)

Although... hmm... interesting. I think that gives us a way to think about another probability 1 statement: statements that occupy the entire logical space. Example: "either there are probability 1 statements, or there are not probability 1 statements." That statement seems to be true with probability 1...

Caledonian, every undergraduate who has ever taken a statistics class knows that the probability of any single point in a continuous distribution is zero. Probabilities in continuous space are measured on intervals. Basic calculus...