(Subjective Bayesianism vs. Frequentism) vs. Formalism

One of the core aims of the philosophy of probability is to explain the relationship between frequency and probability. The frequentist proposes identity as the relationship. This use of identity is highly dubious. We know how to check for identity between numbers, or even how to check for the weaker copula relation between particular objects; but how would we test the identity of frequency and probability? It is not immediately obvious that there is some simple value out there which is modeled by probability, in the way that position and mass are values modeled by Newton's Principia. You can actually check whether density * volume = mass by taking separate measurements of mass, density, and volume; but what would you measure to check a frequency against a probability?

There is a certain appeal to frequentist philosophy: we would like to say that if a bag has 100 balls in it, only 1 of which is white, then the probability of drawing the white ball is 1/100, and that if we take a non-white ball out, the probability of drawing the white ball is now 1/99. Frequentism would make the philosophical justification of that inference trivial. But of course, anything a frequentist can do, a Bayesian can do (better). I mean that literally: it's the stronger magic.
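The inference can at least be checked empirically in the frequentist's own terms. Here is a minimal Monte Carlo sketch; the function name and trial count are arbitrary choices for illustration, not anything from the post:

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

def draw_frequency(n_white, n_total, trials=100_000):
    """Estimate the long-run frequency of drawing a white ball
    from a bag with n_white white balls out of n_total."""
    hits = sum(random.randrange(n_total) < n_white for _ in range(trials))
    return hits / trials

print(draw_frequency(1, 100))  # estimate close to 1/100 = 0.01
print(draw_frequency(1, 99))   # estimate close to 1/99 ≈ 0.0101
```

The long-run frequencies come out near 1/100 and 1/99, which is exactly the pattern the frequentist takes as definitional.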

A subjective Bayesian, more or less, says that the reason frequencies are related to probabilities is that when you learn a frequency you thereby learn a fact about the world, and one must update one's degrees of belief on every available fact. The subjective Bayesian actually uses the copula in another strange way:

Probability is subjective degree of belief.

and subjective Bayesians also claim:

Probabilities are not in the world, they are in your mind.

These two statements are brilliantly championed in Probability is Subjectively Objective. But ultimately, the formalism which I would like to suggest denies both of these statements. Formalists do not ontologically commit themselves to probabilities, just as they do not say that numbers exist; hence we don't locate probabilities in the mind or anywhere else; we only commit ourselves to number theory and probability theory. Mathematical theories are simply repeatable processes which construct certain sequences of squiggles called "theorems", by changing the squiggles of other theorems, according to certain rules called "inferences". Inferences always take as input certain sequences of squiggles called premises, and output a sequence of squiggles called the conclusion. The only thing an inference ever does is add squiggles to a theorem, take away squiggles from a theorem, or both. It turns out that these squiggle sequences mixed with inferences can talk about almost anything, certainly any computable thing. The formalist does not need to ontologically commit to numbers to assert "There is a prime greater than 10000," even though "There is x such that" is a flat assertion of existence; because for the formalist "There is a prime greater than 10000" simply means that number theory contains a theorem which is interpreted as "there is a prime greater than 10000". When you state a mathematical fact in English, you are interpreting a theorem from a formal theory. If, under your suggested interpretation, all of the theorems of the theory are true, then whatever system/mechanism your interpretation of the theory talks about is said to be modeled by the theory.

So, what is the relation between frequency and probability proposed by formalism? Theorems of probability may be interpreted as true statements about frequencies, when you assign certain squiggles certain words and claim the resulting natural language sentence. Or for short we can say: "Probability theory models frequency." It is trivial to show that Kolmogorov's theory models frequency, since it also models fractions; it is an algebra after all. More interestingly, probability theory models rational distributions of subjective degree of belief, and the optimal updating of degrees of belief given new information. This is somewhat harder to show; Dutch-book arguments do nicely to at least provide some intuitive understanding of the relation between degree of belief, betting, and probability, but there is still work to be done here. If Bayesian probability theory really does model rational belief, as many believe it does, then that is likely the most interesting thing we are ever going to be able to model with probability. But probability theory also models spatial measurement. Why not add the position that probability is volume to the debating lines of the philosophy of probability?
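That relative frequencies satisfy the Kolmogorov axioms can be checked directly. Here is a toy sketch, assuming a small made-up table of observed counts (the outcome names and counts are arbitrary):

```python
# Hypothetical observed counts over a finite outcome space.
# The total is 8, so every relative frequency is an exact binary fraction.
counts = {"a": 1, "b": 5, "c": 2}
total = sum(counts.values())

def freq(event):
    """Relative frequency of an event, i.e. a set of outcomes."""
    return sum(counts[o] for o in event) / total

outcomes = set(counts)
assert all(freq({o}) >= 0 for o in outcomes)          # non-negativity
assert freq(outcomes) == 1.0                          # normalization
assert freq({"a", "b"}) == freq({"a"}) + freq({"b"})  # additivity on disjoint events
```

Under this interpretation (events as sets of outcomes, probability as relative frequency), the axioms come out true, which is all "probability theory models frequency" asserts.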

Why are frequentism's and subjective Bayesianism's misuses of the copula not as obvious as volumeism's? Because what the Bayesian and the frequentist are really arguing about is statistical methodology; they've just disguised the argument as an argument about what probability is. Your interpretation of probability theory will determine how you model uncertainty, and hence determine your statistical methodology. Volumeism cannot handle uncertainty in any obvious way; the Bayesian and frequentist interpretations of probability theory, however, imply two radically different ways of handling uncertainty.

The easiest way to understand the philosophical dispute between the frequentist and the subjective Bayesian is to look at the classic biased coin:

A subjective Bayesian and a frequentist are at a bar, and the bartender (being rather bored) tells the two that he has a biased coin, and asks them: "What is the probability that the coin will come up heads on the first flip?" The frequentist says that for the coin to be biased means for it not to have a 50% chance of coming up heads, so all we know is that the probability is not equal to 50%. The Bayesian says that any evidence for the coin coming up heads is also evidence for it coming up tails, since we know nothing about one outcome that doesn't hold for its negation, and the only value which respects that symmetry is 50%.

I ask you: what is the difference between these two, and the poor souls engaged in endless debate over realism about sound at the beginning of Making Beliefs Pay Rent?

If a tree falls in a forest and no one hears it, does it make a sound? One says, "Yes it does, for it makes vibrations in the air." Another says, "No it does not, for there is no auditory processing in any brain."

One is being asked: "Are there pressure waves in the air if we aren't around?" The other is being asked: "Are there auditory experiences if we are not around?" The problem is that "sound" is being used to stand for both "auditory experience" and "pressure waves through air". They are both giving the right answers to their respective questions. But they are failing to Replace the Symbol with the Substance, and they're using one word with two different meanings in different places. In the exact same way, "probability" is being used to stand for both "frequency of occurrence" and "rational degree of belief" in the dispute between the Bayesian and the frequentist. The correct answer to the question "If the coin is flipped an infinite number of times, how frequently would we expect it to land heads?" is "All we know is that it wouldn't be 50%," because that is what it means for the coin to be biased. The correct answer to the question "What is the optimal degree of belief that we should assign to the first trial being heads?" is "Precisely 50%," because of the symmetrical evidential support the outcomes get from our background information. How we should actually model the situation as statisticians depends on our goal. But remember that Bayesianism is the stronger magic, and the only contender for perfection in the competition.
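The Bayesian's symmetry answer can be made concrete. The sketch below assumes a hypothetical symmetric prior over a discrete grid of candidate biases (the grid values are arbitrary, and 1/2 is deliberately excluded since the coin is known to be biased); the predictive probability for the first flip is the prior expectation of the bias, which comes out to exactly 1/2:

```python
from fractions import Fraction as F

# Candidate biases, symmetric about 1/2 but excluding it: the coin IS biased.
biases = [F(1, 10), F(3, 10), F(7, 10), F(9, 10)]
prior = [F(1, len(biases))] * len(biases)  # uniform prior over the grid

# Predictive probability of heads on the first flip = E[bias] under the prior.
p_heads = sum(p * w for p, w in zip(biases, prior))
print(p_heads)  # 1/2
```

Any prior symmetric under swapping heads and tails gives the same answer, which is the point: 50% reflects our state of knowledge, not the coin's physical bias.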

For us formalists, probabilities are not anywhere. Technically, we do not even believe in probability; we only believe in probability theory. The only coherent uses of "probability" in natural language are purely syncategorematic. We should be very careful when we colloquially use "probability" as a noun or verb, and be clear about what we mean by this word play. Probability theory models many things, including degree of belief and frequency. Whatever we may learn about rationality, frequency, measure, or any of the other mechanisms that probability theory models, through the interpretation of probability theorems, we learn because probability theory is isomorphic to those mechanisms. When you use the copula like the frequentist or the subjective Bayesian, it is hard to notice that probability theory modeling both frequency and degree of belief is not a contradiction. If we use "is" instead of "models", it is clear that frequency is not degree of belief, so if probability is belief, then it is not frequency. Though frequency is not degree of belief, frequency does model degree of belief; so if probability models frequency, it must also model degree of belief.

Comments


In other words, the OP has mixed up the quotation and the referent (or the representation and the referent).

It seems to me that I am the one proposing a sharp distinction between probability theory (the representation) and rational degree of belief (the referent). If you say that probability is degree of belief, you destroy the distinction between the model and the modeled. If by "probability" you mean subjective degree of belief, I don't really care what you call it. But know that "probability" has been used in ways which are not consistent with that synonymy claim. Since we do not have 100% belief that Bayes models ideal inference under uncertainty, Bayesian probability is not, given our knowledge, identical to subjective degree of belief. If X is identical to Y, then X is isomorphic to / models Y. Because we can still conceive of Bayes failing to perfectly model rationality, without implying a contradiction, our current state of knowledge does not include that Bayes is identical to subjective degree of belief.

We learn that something is probability by looking at probability theory, not by looking at subjective belief. If rational subjective belief turned out to not be modeled by probability theory, then we would say that subjective degree of belief was not like probability, not that probability theory does not define probability.

The first person to formulate Bayesian probability theory may have been thinking about rationality when he/she first created the system, or he/she may have been thinking about spatial measurements, or about finite frequencies, and he/she would have made the same formal system in every case. The interpretations would have been different, but the formal system would in each case be the one identical probability theory. Which one the actual creator was thinking of is irrelevant. What spaces, beliefs, and finite frequencies all have in common is that they are modeled by probability theory. To use "probability" to refer to one of these over another is a completely arbitrary choice (mind you, I said finite frequency).

If we lose nothing by using "models" instead of "is", why would we ever use "is"? "Is" is a much stronger claim than "models". And frankly, I know how to check whether or not a given object is an animal, for instance; how do I check if a given object is a probability? I see if it satisfies the probability axioms. Finite frequency, measure, and rational degree of belief all seem to follow the probability axioms and inferences under specific, though similar, interpretations of probability theory.

Why can't a frequentist say: "Bayesians are conflating probability with subjective degree of belief." ? They were here first after all.

Probability theory does model frequency, and it does model subjective degree of belief, and this is not a contradiction. Using the copula is the problem, obviously: if subjective degree of belief is not frequency, and probability is frequency, then probability is not subjective degree of belief. Analogously, if subjective degree of belief is not frequency, and probability is subjective degree of belief, then probability is not frequency.

The problem is that both sides conflate "probability" with something else: the Bayesian conflates probability with subjective degree of belief; the frequentist conflates probability with frequency.

The frequentist/Bayesian dispute is of real import, because ad-hoc frequentist statistical methods often break down in extreme cases, throw away useful data, only work well with Gaussian sampling distributions etc.

The debate over whether to use Bayesian methods or frequentist methods is of import. I think potato was trying to say this here:

How we should actually model the situation as a probability distribution depends on our goal. But remember that Bayesianism is the stronger magic.

But the question of whether probability is frequency, or whether it is subjective degree of belief, is just as silly as a dispute over whether numbers are quantities or orders. The answer is that numbers model both, and are neither.

The frequentist/Bayesian dispute is of real import, because ad-hoc frequentist statistical methods often break down in extreme cases, throw away useful data, only work well with Gaussian sampling distributions etc.

I think you have this backwards. Frequentist techniques typically come with adversarial guarantees (i.e., "as long as the underlying distribution has bounded variance, this method will work"), whereas Bayesian techniques, by choosing a specific prior (such as a Gaussian prior), are making an assumption that will hurt them in extreme cases or when the data is not drawn from the prior. The tradeoff is that frequentist methods tend to be much more conservative as a result (requiring more data to come to the same conclusion).

If you have a reasonable Bayesian generative model, then using it will probably give you better results with less data. But if you really can't even build the model (i.e. specify a prior that you trust) then frequentist techniques might actually be appropriate. Note that the distinction I'm drawing is between Bayesian and frequentist techniques, as opposed to Bayesian and frequentist interpretations of probability. In the former case, there are actual reasons to use both. In the latter case, I agree with you that the Bayesian interpretation is obviously correct.

Bayesian techniques, by choosing a specific prior (such as a Gaussian prior), are making an assumption that will hurt them in extreme cases or when the data is not drawn from the prior. The tradeoff is that frequentist methods tend to be much more conservative as a result (requiring more data to come to the same conclusion).

Bayesian methods with uninformative (possibly improper) priors agree with frequentist methods whenever the latter make sense.
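As a small illustration of that agreement (with made-up data and an assumed known noise level), a flat improper prior on a Gaussian mean yields a posterior that is Normal with mean equal to the sample mean and standard deviation equal to the standard error, so the 95% credible interval coincides numerically with the standard frequentist confidence interval:

```python
import math

# Hypothetical data: n observations with known noise sigma.
data = [4.8, 5.1, 5.3, 4.9, 5.2]
sigma = 0.2
n = len(data)
xbar = sum(data) / n
se = sigma / math.sqrt(n)

# Frequentist 95% confidence interval for the unknown mean.
ci = (xbar - 1.96 * se, xbar + 1.96 * se)

# With a flat (improper) prior, the posterior for the mean is
# Normal(xbar, se**2), so the 95% credible interval uses the same numbers.
post_mean, post_sd = xbar, se
credible = (post_mean - 1.96 * post_sd, post_mean + 1.96 * post_sd)

assert credible == ci
```

The two intervals differ in interpretation (coverage under repeated sampling vs. posterior belief about the mean), but not in the numbers they report.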

Thanks for the clarification.

What I was trying to emphasise is that, pace "potato", the frequentist/Bayesian dispute isn't just an argument about words but actually has ramifications for how one is likely to approach statistical inference, so it shouldn't be compared to the definitional dispute "If a tree falls in a forest and no one hears it, does it make a sound?"

If someone treated frequentist approaches as though they were equivalent to Bayesian methods in general, then he would occasionally be drastically in error. PT:TLoS offers many examples of this (for example the comparison of a Bayesian "psi test" and the chi-squared test on page 300). My comment about the Gaussian distribution had in mind Jaynes's discussion of "pre-data and post-data considerations" starting on page 499, in which he discusses the fact that orthodox practice answers the wrong question: it gives correct answers to "if the hypothesis being tested is in fact true, what is the probability that we shall get data indicating that it is true?" when the real problems of scientific inference are concerned with the question "what is the probability conditional on the data that the hypothesis is true?", and this problem is the result of frequentist philosophy's failure to admit the existence of prior and posterior probabilities for a fixed parameter or an hypothesis. He suggests that this conflation goes somewhat unnoticed because in the case of the commonly encountered Gaussian sampling distribution the difference is relatively unimportant, but compares another case (Cauchy sampling distributions) in which the Bayesian analysis is far superior.

On the other hand the interlocutors in the standard definitional dispute have no substantive disagreement, i.e. they actually anticipate the same things, so their disagreement amounts to nothing apart from the fact that they waste their time arguing about words.

I'll defer to your opinion (which is probably much better informed than mine) on whether frequentist methods work well when their limitations are borne in mind.

User:potato smartly linked to Wei Dai's post "Frequentist Magic vs. Bayesian Magic", which along with its commentary would perhaps be the next thing to read after this post. (Wei Dai and user:potato think that (algorithmic) Bayesian magic is stronger but I agree with Toby Ord's and Vladimir Slepnev's points and would caution against hasty conclusions.)

More interestingly, probability theory models rational distributions of subjective degree of belief, and the optimal updating of degrees of belief given new information. This is somewhat harder to show; Dutch-book arguments do nicely to at least provide some intuitive understanding of the relation between degree of belief, betting, and probability, but there is still work to be done here.

In a sense there has been highly important and insightful work done in the vein of the Dutch book arguments, looking into e.g. how decisions should be made under what is called "indexical" or "anthropic" uncertainty, a situation where Bayesian reasoning seems like it ought to work but doesn't as such. This work has been done in large part by Less Wrong-cognizant folk. One easy-to-understand summary of some of the motivations for such research can be found in User:ata's analysis of the classic "sleeping beauty" problem, which is similar to User:potato's disambiguating approach used in the above post (and note Vladimir Nesov's similar warning against equivocation on the word "probability"). If you're interested in the foundations of justification (whether epistemic or moral, or what's the difference?), then you'd be wise to look into the big open questions in decision theory. There are a lot of 'em.

Based on reading the comment threads here, it seems as though some folks are missing something important in this post. So, I'll try to restate it differently in case that helps. Below, double-quotes are used to refer to the word, while single-quotes are used merely to highlight and separate tokens.

There is a mathematical formalism, which we can call "probability", which does a good job of modeling various sorts of things, like 'subjective degrees of belief', 'frequencies', 'volume', and 'area'.

'Bayesians' think that "probability" should refer to 'subjective degrees of belief'. 'Frequentists' think that "probability" should refer to 'frequencies'. One alleged problem with 'Frequentists' is that they seem to carelessly slip between 'subjective degrees of belief' and 'frequencies', due to the natural-language meaning of "probability".

One suggestion proposed in the comment threads was to simply use "probability" to refer to 'subjective degrees of belief' as the 'Bayesians' would have us do, since we already have the word "frequencies" to refer to 'frequencies'. I intuit that the OP's objection to this suggestion is that it leaves us with no word for the mathematical formalism called "probability" above, which can apply just as well to 'frequencies', 'subjective degrees of belief', and many other things.

I intuit that the OP's objection to this suggestion is that it leaves us with no word for the mathematical formalism called "probability" above, which can apply just as well to 'frequencies', 'subjective degrees of belief', and many other things.

"Probability theory"

Upvoted for resolving a mysterious question I resolved a few days ago myself.