Like The Cognitive Science of Rationality, this is a post for beginners. Send the link to your friends!

Science is broken. We know why, and we know how to fix it. What we lack is the will to change things.

 

In 2005, several analyses suggested that most published results in medicine are false. A 2008 review showed that perhaps 80% of academic journal articles mistake "statistical significance" for "significance" in the colloquial meaning of the word, an elementary error every introductory statistics textbook warns against. This year, a detailed investigation showed that half of published neuroscience papers contain one particular simple statistical mistake.

Also this year, a respected senior psychologist published in a leading journal a study claiming to show evidence of precognition. The editors explained that the paper was accepted because it was written clearly and followed the usual standards for experimental design and statistical methods.

Science writer Jonah Lehrer asks: "Is there something wrong with the scientific method?"

Yes, there is.

This shouldn't be a surprise. What we currently call "science" isn't the best method for uncovering nature's secrets; it's just the first set of methods we've collected that wasn't totally useless like personal anecdote and authority generally are.

As time passes we learn new things about how to do science better. The Ancient Greeks practiced some science, but few scientists tested hypotheses against mathematical models before Ibn al-Haytham's 11th-century Book of Optics (which also contained hints of Occam's razor and positivism). Around the same time, Al-Biruni emphasized the importance of repeated trials for reducing the effect of accidents and errors. Galileo brought mathematics to greater prominence in scientific method, Bacon described eliminative induction, Newton demonstrated the power of consilience (unification), Peirce clarified the roles of deduction, induction, and abduction, and Popper emphasized the importance of falsification. We've also discovered the usefulness of peer review, control groups, blind and double-blind studies, plus a variety of statistical methods, and added these to "the" scientific method.

In many ways, the best science done today is better than ever — but it still has problems, and most science is done poorly. The good news is that we know what these problems are and we know multiple ways to fix them. What we lack is the will to change things.

This post won't list all the problems with science, nor will it list all the promising solutions for any of these problems. (Here's one I left out.) Below, I only describe a few of the basics.

 

Problem 1: Publication bias

When the study claiming to show evidence of precognition was published, psychologist Richard Wiseman set up a registry for advance announcement of new attempts to replicate the study.

Carl Shulman explains:

A replication registry guards against publication bias, and at least 5 attempts were registered. As far as I can tell, all of the subsequent replications have, unsurprisingly, failed to replicate Bem's results. However, JPSP and the other high-end psychology journals refused to publish the results, citing standing policies of not publishing straight replications.

From the journals' point of view, this (common) policy makes sense: bold new claims will tend to be cited more and raise journal prestige (which depends on citations per article), even though this means most of the 'discoveries' they publish will be false despite their low p-values (high statistical significance). However, this means that overall the journals are giving career incentives for scientists to massage and mine their data for bogus results, but not to challenge bogus results presented by others.

This is an example of publication bias:

Publication bias is the term for what occurs whenever the research that appears in the published literature is systematically unrepresentative of the population of completed studies. Simply put, when the research that is readily available differs in its results from the results of all the research that has been done in an area, readers and reviewers of that research are in danger of drawing the wrong conclusion about what that body of research shows. In some cases this can have dramatic consequences, as when an ineffective or dangerous treatment is falsely viewed as safe and effective. [Rothstein et al. 2005]
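The arithmetic behind Shulman's claim, that most published "discoveries" can be false despite low p-values, is short enough to sketch. In the Python below, the share of tested hypotheses that are true, the statistical power, and the significance threshold are all illustrative assumptions, not figures from any study cited here; the function name is hypothetical too.

```python
def share_of_false_discoveries(prior_true, power, alpha):
    """Fraction of statistically significant results that are false positives.

    prior_true: assumed fraction of tested hypotheses that are actually true
    power:      assumed probability that a true effect reaches significance
    alpha:      significance threshold (false positive rate for null effects)
    """
    true_positives = prior_true * power
    false_positives = (1 - prior_true) * alpha
    return false_positives / (true_positives + false_positives)


# Hypothetical field: 1 in 20 tested hypotheses is true, studies have 35% power.
false_share = share_of_false_discoveries(prior_true=0.05, power=0.35, alpha=0.05)
print(f"False share among significant results: {false_share:.0%}")
```

Under these made-up inputs, roughly three quarters of the significant results are false positives. If journals publish only the significant results and refuse the failed replications, that is the mix readers see.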

Sometimes, publication bias can be more deliberate. The anti-inflammatory drug Rofecoxib (Vioxx) is a famous case. The drug was prescribed to 80 million people, but it was later revealed that its maker, Merck, had withheld evidence of the drug's risks. Merck was forced to recall the drug, but it had already resulted in 88,000-144,000 cases of serious heart disease.

 

Example partial solution

One way to combat publication bias is for journals to only accept experiments that were registered in a public database before they began. This allows scientists to see which experiments were conducted but never reported (perhaps due to negative results). Several prominent medical journals (e.g. The Lancet and JAMA) now operate this way, but this protocol is not as widespread as it could be.

 

Problem 2: Experimenter bias

Scientists are humans. Humans are affected by cognitive heuristics and biases (or, really, humans just are cognitive heuristics and biases), and they respond to incentives that may not align with an optimal pursuit of truth. Thus, we should expect experimenter bias in the practice of science.

There are many stages in research during which experimenter bias can occur:

  1. in reading-up on the field,
  2. in specifying and selecting the study sample,
  3. in [performing the experiment],
  4. in measuring exposures and outcomes,
  5. in analyzing the data,
  6. in interpreting the analysis, and
  7. in publishing the results. [Sackett 1979]

Common biases have been covered elsewhere on Less Wrong, so I'll let those articles explain how biases work.

 

Example partial solution

There is some evidence that the skills of rationality (e.g. cognitive override) are teachable. Training scientists to notice and meliorate biases that arise in their thinking may help them to reduce the magnitude and frequency of the thinking errors that may derail truth-seeking attempts during each stage of the scientific process.

 

Problem 3: Bad statistics

I remember when my statistics professor first taught me the reasoning behind "null hypothesis significance testing" (NHST), the standard technique for evaluating experimental results. NHST uses "p-values," which are statements about the probability of getting some data (e.g. one's experimental results) given the hypothesis being tested. I asked my professor, "But don't we want to know the probability of the hypothesis we're testing given the data, not the other way around?" The reply was something about how this was the best we could do. (But that's false, as we'll see in a moment.)
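The gap between "probability of the data given the hypothesis" and "probability of the hypothesis given the data" can be seen in a toy case. The sketch below is my own illustration, not the professor's example: the coin-flip data, the two candidate hypotheses, and the 99% prior are all assumptions. For simplicity it uses the probability of the exact data rather than a tail probability, which is what a real p-value reports.

```python
from math import comb


def binomial_likelihood(heads, flips, p_heads):
    """P(exactly `heads` heads in `flips` flips | coin lands heads with p_heads)."""
    return comb(flips, heads) * p_heads**heads * (1 - p_heads) ** (flips - heads)


# Hypothetical setup: 8 heads in 10 flips, and two candidate hypotheses.
heads, flips = 8, 10
prior_fair = 0.99            # assumed prior: almost all coins are fair
prior_biased = 1 - prior_fair

like_fair = binomial_likelihood(heads, flips, 0.5)     # P(data | fair)
like_biased = binomial_likelihood(heads, flips, 0.75)  # P(data | biased)

# Bayes' theorem: P(fair | data) = P(data | fair) * P(fair) / P(data)
evidence = like_fair * prior_fair + like_biased * prior_biased
posterior_fair = like_fair * prior_fair / evidence

print(f"P(data | fair) = {like_fair:.3f}")
print(f"P(fair | data) = {posterior_fair:.3f}")
```

With these inputs the data are improbable under the fair-coin hypothesis (under 5%), yet the fair-coin hypothesis remains very probable given the data (over 90%). The two quantities answer different questions, and only the second is the one I asked my professor about.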

Another problem is that NHST computes the probability of getting data as unusual as the data one collected by considering what might be expected if that particular experiment was repeated many, many times. But how do we know anything about these imaginary repetitions? If I want to know something about a particular earthquake, am I supposed to imagine a few dozen repetitions of that earthquake? What does that even mean?

I tried to answer these questions on my own, but all my textbooks assumed the soundness of the mistaken NHST framework for scientific practice. It's too bad I didn't have a class with biostatistician Steven Goodman, who says:

The p-value is almost nothing sensible you can think of. I tell students to give up trying.

The sad part is that the logical errors of NHST are old news, and have been known ever since Ronald Fisher began advocating NHST in the 1920s. By 1960, Fisher had out-advocated his critics, and philosopher William Rozeboom remarked:

Despite the awesome pre-eminence [NHST] has attained... it is based upon a fundamental misunderstanding of the nature of rational inference, and is seldom if ever appropriate to the aims of scientific research.

There are many more problems with NHST and with "frequentist" statistics in general, but the central one is this: NHST does not follow from the axioms (foundational logical rules) of probability theory. It is a grab-bag of techniques that, depending on how those techniques are applied, can lead to different results when analyzing the same data — something that should horrify every mathematician.
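A standard textbook illustration of "different results from the same data" is the stopping-rule problem. The sketch below uses hypothetical coin-flip data (9 heads, 3 tails) and computes a one-sided p-value against a fair coin under two assumed experimental plans. The flips are identical in both cases; only the experimenter's intention differs.

```python
from math import comb

heads, tails = 9, 3  # hypothetical data: 12 flips of a possibly biased coin

# Design A: the experimenter planned exactly 12 flips.
# p-value = P(9 or more heads in 12 flips | fair coin).
n = heads + tails
p_fixed_n = sum(comb(n, k) for k in range(heads, n + 1)) / 2**n

# Design B: the experimenter planned to flip until the 3rd tail appeared.
# Seeing 9+ heads before the 3rd tail means the first 11 flips held at most 2 tails.
first = heads + tails - 1
p_stop_at_tails = sum(comb(first, t) for t in range(tails)) / 2**first

print(f"Fixed number of flips: p = {p_fixed_n:.3f}")
print(f"Flip until third tail: p = {p_stop_at_tails:.3f}")
```

Design A gives p of about 0.073 and Design B about 0.033, so the same twelve flips are "not significant" or "significant" at the 0.05 level depending on a plan that existed only in the experimenter's head.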

The inferential method that solves the problems with frequentism — and, more importantly, follows deductively from the axioms of probability theory — is Bayesian inference.

So why aren't all scientists using Bayesian inference instead of frequentist inference? Partly, we can blame the vigor of NHST's early advocates. But we can also attribute NHST's success to the simple fact that Bayesian calculations can be more difficult than frequentist calculations. Luckily, new software tools like WinBUGS let computers do most of the heavy lifting required for Bayesian inference.

There's also the problem of sheer momentum. Once a practice is enshrined, it's hard to dislodge it, even for good reasons. I took three statistics courses in university and none of my textbooks mentioned Bayesian inference. I didn't learn about it until I dropped out of university and studied science and probability theory on my own.

Remember the study about precognition? Not surprisingly, it was done using NHST. A later Bayesian analysis of the data disconfirmed the original startling conclusion.
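To give a feel for how a "significant" NHST result can look unimpressive to a Bayesian, here is a minimal sketch. The data are hypothetical and are not taken from the precognition study, and the alternative hypothesis (a uniform prior over hit rates) is my own simplifying assumption, not the model used in the published reanalysis.

```python
from math import comb

# Hypothetical data, NOT from the precognition study: 60 hits in 100 guesses,
# where pure chance predicts a 50% hit rate.
hits, trials = 60, 100

# Frequentist side: one-sided p-value, P(60 or more hits | chance).
p_value = sum(comb(trials, k) for k in range(hits, trials + 1)) / 2**trials

# Bayesian side: compare how well each hypothesis predicted the exact data.
# H0: hit rate is exactly 0.5.
likelihood_h0 = comb(trials, hits) / 2**trials
# H1 (assumed): every hit rate from 0 to 1 is equally plausible beforehand.
# Averaging the binomial likelihood over a uniform prior gives 1 / (trials + 1).
likelihood_h1 = 1 / (trials + 1)

bayes_factor_h0_over_h1 = likelihood_h0 / likelihood_h1

print(f"one-sided p-value: {p_value:.3f}")
print(f"Bayes factor (H0 over H1): {bayes_factor_h0_over_h1:.2f}")
```

Under these assumptions the p-value falls below 0.05 while the Bayes factor is close to 1, meaning the data barely discriminate between the two hypotheses. A different choice of H1 would give a different Bayes factor, which is exactly the kind of assumption a Bayesian analysis has to state openly.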

 

Example partial solution

This one is obvious: teach students probability theory instead of NHST. Retrain current scientists in Bayesian methods. Make Bayesian software tools easier to use and more widespread.

 

Conclusion

If I'm right that there is unambiguous low-hanging fruit for improving scientific practice, this suggests that particular departments, universities, or private research institutions can (probabilistically) out-perform their rivals (in terms of actual discoveries, not just publications) given similar resources.

I'll conclude with one particular specific hypothesis. If I'm right, then a research group should be able to hire researchers trained in Bayesian reasoning and in catching publication bias and experimenter bias, and have them extract from the existing literature valuable medical truths that the mainstream medical community doesn't yet know about. This prediction, in fact, is about to be tested.

Comments


I only had time to double-check one of the scary links at the top, and I wasn't too impressed with what I found:

In 2010, a careful review showed that published industry-sponsored trials are four times more likely to show positive results than published independent studies, even though the industry-sponsored trials tend to use better experimental designs.

But the careful review you link to claims that studies funded by the industry report 85% positive results, compared to 72% positive by independent organizations and 50% positive by government - which is not what I think of when I hear four times! They also give a lot of reasons to think the difference may be benign: industry tends to do different kinds of studies than independent orgs. The industry studies are mainly Phase III/IV - a part of the approval process where drugs that have already been shown to work in smaller studies are tested on a larger population; the nonprofit and government studies are more often Phase I/II - the first check to see whether a promising new chemical works at all. It makes sense that studies on a drug which has already been found to probably work are more positive than the first studies on a totally new chemical. And the degree to which pharma studies are more likely to be late-phase is greater than the degree to which pharma companies are more likely to show positive results, and the article doesn't give stats comparing like to like! The same review finds with p < .001 that pharma studies are bigger, which again would make them more likely to find a result where one exists.

The only mention of the "4x more likely" number is buried in the Discussion section and cites a completely different study, Lexchin et al.

Lexchin reports an odds ratio of 4, which I think is what your first study meant when they say "industry studies are four times more likely to be positive". Odds ratios have always been one of my least favorite statistical concepts, and I always feel like I'm misunderstanding them somehow, but I don't think "odds ratio of 4" and "four times more likely" are connotatively similar (someone smarter, please back me up on this?!). For example, the largest study in Lexchin's meta-analysis, Yaphe et al, finds that 87% of industry studies are positive versus 65% of independent studies, for an odds ratio of 3.45x. But when I hear something like "X is four times more likely than Y", I think of Y being 20% likely and X being 80% likely; not 65% vs. 87%.

This means Lexchin's results are very very similar to those of the original study you cite, which provides some confirmation that those are probably the true numbers. Lexchin also provides another hypothesis for what's going on. He says that "the research methods of trials sponsored by drug companies is at least as good as that of non-industry funded research and in many cases better", but that along with publication bias, industry fudges the results by comparing their drug to another drug, and then giving the other drug incorrectly. For example, if your company makes Drug X, you sponsor a study to prove that it's better than Drug Y, but give patients Drug Y at a dose that's too low to do any good (or so high that it produces side effects). Then you conduct that study absolutely perfectly and get the correct result that your drug is better than another drug at the wrong dosage. This doesn't seem like the sort of thing Bayesian statistics could fix; in fact, it sounds like it means study interpretation would require domain-specific medical knowledge; someone who could say "Wait a second, that's not how we usually give penicillin!" I don't know whether this means industry studies that compare their drug against a placebo are more trustworthy.

So, summary. Industry studies seem to hover around 85% positive, non-industry studies around 65%. Part of this is probably because industry studies are more likely to be on drugs for which there's already some evidence that they work, and not due to scientific misconduct at all. More of it is due to publication bias and to getting the right answer to a wrong question like "Does this work better than another drug when the other is given improperly?".

Phrases like "Industry studies are four times more likely to show positive results" are connotatively inaccurate and don't support any of these proposals at all, except maybe the one to reduce publication bias.

This reinforces my prejudice that a lot of the literature on how misleading the literature is, is itself among the best examples of how misleading the literature is.

Thanks for this. I've removed the offending sentence.

Yes, "four times as likely" is not the same as an odds ratio of four. And the problem here is the same as the problem in army1987's LL link that odds ratios get mangled in transmission.

But I like odds ratios. In the limit of small probability, odds ratios are the same as "times as likely." But there's nothing 4x as likely as 50%. Does that mean that 50% is very similar to all larger probabilities? Odds ratios are unchanged (or inverted) by taking complements: 4% to 1% is an odds ratio of about 4; 99% to 96% is also 4 (actually 4.1 in both cases). Complementation is exactly what's going on here. The drug companies get 1.2x-1.3x more positive results than the independent studies. That doesn't sound so big, but everyone is likely to get positive results. If we speak in terms of negative results, the independent studies are 2-3x as likely to get negative results as the drug companies. Now it sounds like a big effect.
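The complement symmetry is easy to check numerically. A small sketch using the 4% vs 1% and 99% vs 96% pairs from this comment (the function names are my own):

```python
def odds(p):
    """Convert a probability to odds in favor."""
    return p / (1 - p)


def odds_ratio(p_a, p_b):
    """Ratio of the odds of A to the odds of B."""
    return odds(p_a) / odds(p_b)


def risk_ratio(p_a, p_b):
    """The everyday 'times as likely' comparison."""
    return p_a / p_b


for p_a, p_b in [(0.04, 0.01), (0.99, 0.96)]:
    print(f"{p_a:.0%} vs {p_b:.0%}: "
          f"odds ratio {odds_ratio(p_a, p_b):.2f}, "
          f"times as likely {risk_ratio(p_a, p_b):.2f}")
```

Both pairs give the same odds ratio of about 4.1, while the "times as likely" figure is 4 for the first pair and only about 1.03 for the second. That is why the two phrasings feel so different in the middle and upper range of probabilities.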

Odds ratios give a canonical distance between probabilities that doesn't let people cherry-pick between 34% more positives and 3x more negatives. They give us a way to compare any two probabilities that is the obvious one for very small probabilities and is the obvious one of the complement for very large probabilities. The cost of interpolating between the ends is that they are confusing in the middle. In particular, this "3x more negatives" turns into an odds ratio of 4.

Sometimes 50% really is similar to all larger probabilities. Sometimes you have a specific view on things and should use that, rather than the off-the-shelf odds ratio. But that doesn't seem to be true here.

This reinforces my prejudice that a lot of the literature on how misleading the literature is, is itself among the best examples of how misleading the literature is.

At the least, it allows one to argue that the claim "scientific papers are generally reliable" is self-undermining. The prior probability is also high, given the revolving door of "study of the week" science reporting we all are regularly exposed to.

This reinforces my prejudice that a lot of the literature on how misleading the literature is, is itself among the best examples of how misleading the literature is.

A lot of the literature on cognitive biases is itself among the best examples of how biased people are (though unfortunately not usually in ways that would prove their point, with the obvious exception of confirmation bias).

I am skeptical of the teaching solution section under 2), relative to institutional shifts (favoring confirmatory vs exploratory studies, etc). Section 3 could also bear mention of some of the many ways of abusing Bayesian statistical analyses (e.g. reporting results based on gerrymandered priors, selecting which likelihood ratio to highlight in the abstract and get media attention for, etc). Cosma Shalizi would have a lot to say about it.

I do like the spirit of the post, but it comes across a bit boosterish.

Section 3 could also bear mention of some of the many ways of abusing Bayesian statistical analyses

On this note, I predict that if Bayesian statistical analyses ever displaced NHST as the mainstream standard, they would be misused about as much as NHST.

Currently there's a selection bias: NHST is much more widely taught than Bayesian analyses, so NHST users are much more likely to be lowest common denominator crank-turners who don't really understand statistics generally. By contrast, if you've managed to find out how to do Bayesian inference, you're probably better at statistics than the average researcher and therefore less likely to screw up whatever analysis you choose to do. If every researcher were taught Bayesian inference this would no longer be true.

Still, I think Bayesian methods are superior enough that the net benefit of that would be positive. (Also, proper Bayesian training would also cover how to construct ignorance priors, and I suspect nefariously chosen priors would be easier to spot than nefarious frequentist mistreatment of data.)

Bayesian methods are better in a number of ways, but ignorant people using a better tool won't necessarily get better results. I don't think the net effect of a mass switch to Bayesian methods would be negative, but I do think it'd be very small unless it involved raising the general statistical competence of scientists.

Even when Bayesian methods get so commonplace that they could be used just by pushing a button in SPSS, researchers will still have many tricks at their disposal to skew their conclusions. Not bothering to publish contrary data, only publishing subgroup analyses that show a desired result, ruling out inconvenient data points as "outliers", wilful misinterpretation of past work, failing to correct for doing multiple statistical tests (and this can be an issue with Bayesian t-tests, like those in the Wagenmakers et al. reanalysis lukeprog linked above), and so on.
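To make the multiple-testing point concrete, here is a rough sketch of how quickly false positives pile up when several tests are run without correction. It assumes independent tests and true null hypotheses throughout, and it uses the Bonferroni correction purely as one familiar frequentist remedy; that choice is mine, not something the reanalysis mentioned above relies on.

```python
def chance_of_false_positive(num_tests, alpha=0.05):
    """P(at least one 'significant' result) when every null hypothesis is true
    and the tests are independent (an assumption of this sketch)."""
    return 1 - (1 - alpha) ** num_tests


def bonferroni_threshold(num_tests, alpha=0.05):
    """Per-test threshold that keeps the overall false positive rate at or below alpha."""
    return alpha / num_tests


for num_tests in (1, 5, 20):
    print(num_tests,
          round(chance_of_false_positive(num_tests), 3),
          bonferroni_threshold(num_tests))
```

With 20 uncorrected tests at the 0.05 level, the chance of at least one spurious "finding" is around 64%. A researcher who reports only that one test has not made an arithmetic error in any single analysis, which is part of why this trick is hard to spot from the paper alone.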

As a biologist, I can say that most statistical errors are just that: errors. They are not tricks. If researchers understand the statistics that they are using, a lot of these problems will go away.

A person has to learn a hell of a lot before they can do molecular biology research, and statistics happens to be fairly low on the priority list for most molecular biologists. In many situations we are able to get around the statistical complexities by generating data with very little noise.

We've also discovered the usefulness of peer review

I object, for reasons wonderfully stated by gwern here

Why do we need the process of peer review? Peer review is not robust against even low levels of collusion (http://arxiv.org/abs/1008.4324v1). Scientists who win the Nobel Prize find their other work suddenly being heavily cited (http://www.nature.com/news/2011/110506/full/news.2011.270.ht...), suggesting either that the community badly failed in recognizing the work's true value or that they are now sucking up & attempting to look better by the halo effect. (A mathematician once told me that often, to boost a paper's acceptance chance, they would add citations to papers by the journal's editors - a practice that will surprise none familiar with Goodhart's law and the use of citations in tenure & grants.) Physicist Michael Nielsen points out (http://michaelnielsen.org/blog/three-myths-about-scientific-...) that peer review is historically rare (just one of Einstein's 300 papers was peer reviewed! the famous Nature did not institute peer review until 1967), has been poorly studied (http://jama.ama-assn.org/cgi/content/abstract/287/21/2784) & not shown to be effective, is nationally biased (http://jama.ama-assn.org/cgi/content/full/295/14/1675), erroneously rejects many historic discoveries (one study lists "34 Nobel Laureates whose awarded work was rejected by peer review" (http://www.canonicalscience.org/publications/canonicalscienc...); Horrobin 1990 (http://jama.ama-assn.org/content/263/10/1438.abstract) lists others like the discovery of quarks), and catches only a small fraction (http://jama.ama-assn.org/cgi/content/abstract/280/3/237) of errors. And fraud, like the one we just saw in psychology? Forget about it (http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjourna...); "A pooled weighted average of 1.97% (N = 7, 95%CI: 0.86–4.45) of scientists admitted to have fabricated, falsified or modified data or results at least once –a serious form of misconduct by any standard– and up to 33.7% admitted other questionable research practices.
In surveys asking about the behaviour of colleagues, admission rates were 14.12% (N = 12, 95% CI: 9.91–19.72) for falsification, and up to 72% for other questionable research practices....When these factors were controlled for, misconduct was reported more frequently by medical/pharmacological researchers than others." No, peer review is not the secret sauce of science. Replication is more like it. (Emphasis not in original)

That was actually just a slightly-edited-for-Hacker-News excerpt from my standing mini-essay explaining why we can't trust science too much; the whole thing currently lives at http://www.gwern.net/DNB%20FAQ#fn51

I asked my professor, "But don't we want to know the probability of the hypothesis we're testing given the data, not the other way around?" The reply was something about how this was the best we could do.

One senses that the author (the one in the student role) neither has understood the relative-frequency theory of probability nor has performed any empirical research using statistics--lending the essay the tone of an arrogant neophyte. The same perhaps for the professor. (Which institution is on report here?) Frequentists reject the very concept of "the probability of the theory given the data." They take probabilities to be objective, so they think it a category error to remark about the probability of a theory: the theory is either true or false, and probability has nothing to do with it.

You can reject relative-frequentism (I do), but you can't successfully understand it in Bayesian terms. As a first approximation, it may be better understood in falsificationist terms. (Falsificationism keeps getting trotted out by Bayesians, but that construct has no place in a Bayesian account. These confusions are embarrassingly amateurish.) The Fisher paradigm is that you want to show that a variable made a real difference--that what you discovered wasn't due to chance. However, there's always the possibility that chance intervened, so the experimenter settles for a low probability that chance alone was responsible for the result. If the probability (the p value) is low enough, you treat it as sufficiently unlikely not to be worth worrying about, and you can reject the hypothesis that the variable made no difference.

If, like me, you think it makes sense to speak of subjective probabilities (whether exclusively or along with objective probabilities), you will usually find an estimate of the probabilities of the hypothesis given the data, as generated by Bayesian analysis, more useful. That doesn't mean it's easy or even possible to do a Bayesian analysis that will be acceptable to other scientists. To get subjective probabilities out, you must put subjective probabilities in. Often the worry is said to be the infamous problem of estimating priors, but in practice the likelihood ratios are more troublesome.

Let's say I'm doing a study of the effect of arrogance on a neophyte's confidence that he knows how to fix science. I develop and norm a test of Arrogance/Narcissism and also an inventory of how strongly held a subject's views are in the philosophy of science and the theory of evidence. I divide the subjects in two groups according to whether they fall above or below the A/N median. I then use Fisherian methods to determine whether there's an above-chance level of unwarranted smugness among the high A/N group. Easy enough, but limited. It doesn't tell me what I most want to know: how much credence I should put in the results. I've shown there's evidence for an effect, but there's always evidence for some effect: the null hypothesis, strictly speaking, is always false. No two entities outside of fundamental physics are exactly the same.

Bayesian analysis promises more, but whereas other scientists will respect my crude frequentist analysis as such (although many will denigrate its real significance), many will reject my Bayesian analysis out of hand due to what must go into it. Let's consider just one of the factors that must enter the Bayesian analysis. I must estimate the probability that the 'high-Arrogance' subjects will score higher on Smugness if my theory is wrong, that is, if arrogance really has no effect on Smugness. Certainly my Arrogance/Narcissism test doesn't measure the intended construct without impurities. I must estimate the probability that all the impurities combined or any of them confound the results. Maybe high-Arrogant scorers are dumber in addition to being more arrogant, and that is what's responsible for some of the correlation. Somehow, I must come up with a responsible way to estimate the probability of getting my results if Arrogance had nothing to do with Smugness. Perhaps I can make an informed approximation, but it will be unlikely to dovetail with the estimates of other scientists. Soon we'll be arguing about my assumptions--and what we'll be doing will be more like philosophy than empirical science.

The lead essay provides a biased picture of the advantages of Bayesian methods by completely ignoring its problems. A poor diet for budding rationalists.

Frequentists reject the very concept of "the probability of the theory given the data." They take probabilities to be objective, so they think it a category error to remark about the probability of a theory:

Then they should also reject the very concept of "the probability of the data given the theory", since that quantity has "the probability of the theory" explicitly in the denominator.

Then they should also reject the very concept of "the probability of the data given the theory", since that quantity has "the probability of the theory" explicitly in the denominator.

You are reading "the probability of the data D given the theory T" to mean p(D | T), which in turn is short for a ratio p(D & T)/p(T) of probabilities with respect to some universal prior p. But, for the frequentist, there is no universal prior p being invoked.

Rather, each theory comes with its own probability distribution p_T over data, and "the probability of the data D given the theory T" just means p_T(D). The different distributions provided by different theories don't have any relationship with one another. In particular, the different distributions are not the result of conditioning on a common prior. They are incommensurable, so to speak.

The different theories are just more or less correct. There is a "true" probability of the data, which describes the objective propensity of reality to yield those data. The different distributions from the different theories are comparable only in the sense that they each get that true distribution more or less right.

There are many more problems with NHST and with "frequentist" statistics in general, but the central one is this: NHST does not follow from the axioms (foundational logical rules) of probability theory. It is a grab-bag of techniques that, depending on how those techniques are applied, can lead to different results when analyzing the same data — something that should horrify every mathematician.

The inferential method that solves the problems with frequentism — and, more importantly, follows deductively from the axioms of probability theory — is Bayesian inference.

But two Bayesian inferences from the same data can also give different results. How could this be a non-issue for Bayesian inference while being indicative of a central problem for NHST? (If the answer is that Bayesian inference is rigorously deduced from probability theory's axioms but NHST is not, then the fact that NHST can give different results for the same data is not a true objection, and you might want to rephrase.)

By a coincidence of dubious humor, I recently read a paper on exactly this topic, how NHST is completely misunderstood and employed wrongly and what can be improved! I was only reading it for a funny & insightful quote, but Jacob Cohen (as in, 'Cohen's d') on pp. 5-6 of "The Earth Is Round (p < 0.05)" tells us that we shouldn't seek to replace NHST with a "magic alternative" because "it doesn't exist". What we should do is focus on understanding the data with graphics and datamining techniques; report confidence limits on effect sizes, which give us various things I haven't looked up; and finally, place way more emphasis on replication than we currently do.

An admirable program; we don't have to shift all the way to Bayesian reasoning to improve matters. Incidentally, what Bayesian inferences are you talking about? I thought the usual proposals/methods involved principally reporting log odds, to avoid exactly the issue of people having varying priors and updating on trials to get varying posteriors.

I thought the usual proposals/methods involved principally reporting log odds, to avoid exactly the issue of people having varying priors and updating on trials to get varying posteriors.

This only works in extremely simple cases.

Incidentally, what Bayesian inferences are you talking about? I thought the usual proposals/methods involved principally reporting log odds, to avoid exactly the issue of people having varying priors and updating on trials to get varying posteriors.

I didn't have any specific examples in mind. But more generally, posteriors are a function of both priors and likelihoods. So even if one avoids using priors entirely by reporting only likelihoods (or some function of the likelihoods, like the log of the likelihood ratio), the resulting implied inferences can change if one's likelihoods change, which can happen by calculating likelihoods with a different model.
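
To make the point about model-dependent likelihoods concrete, here is a minimal sketch. The data values, the two noise models (normal and Cauchy, both with unit scale), and the hypotheses "location = 1" versus "location = 0" are all made up for illustration and do not come from any paper discussed here. The same data yield a different log likelihood ratio under each model:

```python
import math

def normal_pdf(x, loc, scale):
    """Density of a normal distribution with mean `loc` and std dev `scale`."""
    z = (x - loc) / scale
    return math.exp(-0.5 * z * z) / (scale * math.sqrt(2 * math.pi))

def cauchy_pdf(x, loc, scale):
    """Density of a Cauchy distribution (heavy tails, so outliers count for less)."""
    z = (x - loc) / scale
    return 1.0 / (math.pi * scale * (1 + z * z))

def log_likelihood_ratio(data, pdf, h1_loc, h0_loc, scale=1.0):
    """Sum over observations of log p(x | H1) - log p(x | H0) under one noise model."""
    return sum(
        math.log(pdf(x, h1_loc, scale)) - math.log(pdf(x, h0_loc, scale))
        for x in data
    )

# Hypothetical measurements; 4.0 is an outlier relative to the rest.
data = [0.2, 1.1, 0.9, 4.0]

llr_normal = log_likelihood_ratio(data, normal_pdf, h1_loc=1.0, h0_loc=0.0)
llr_cauchy = log_likelihood_ratio(data, cauchy_pdf, h1_loc=1.0, h0_loc=0.0)

print(f"log LR under normal noise: {llr_normal:.3f}")
print(f"log LR under Cauchy noise: {llr_cauchy:.3f}")
```

Both models favor the "location = 1" hypothesis here, but the normal model does so much more strongly because it treats the outlier as strong evidence. Two analysts who report only log likelihood ratios can therefore still disagree if they chose different models.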

This shouldn't be a surprise. What we currently call "science" isn't the best method for uncovering nature's secrets; it's just the first set of methods we've collected that wasn't totally useless like personal anecdote and authority generally are.

It's ridiculous to call non-scientific methods "useless". Our civilization is based on such non-scientific methods. Observation, anecdotal evidence, trial and error, markets etc. are all deeply unscientific and extremely useful ways of gaining useful knowledge. Next to these, Science is really a fairly minor pursuit.

Our civilization is based on such non-scientific methods.

I'd say that existing folk practices and institutions (what I think you mean by "our civilization") are based on the non-survival of rival practices and institutions. Our civilization has the institutions it has, for the same reason that we have two eyes and not three — not because two eyes are better than three, but because any three-eyed rivals to prototypical two-eyed ancestors happened not to survive.

Folk practices have typically been selected at the speed of generations, with cultures surviving or dying out — the latter sometimes due to war or disease; but sometimes just as the youth choose to convert to a more successful culture. Science aims at improving knowledge at a faster rate than folk practice selection.

But as with the problem of global warming and its known solutions, what we lack is the will to change things.

Please don't insert gratuitous politics into LessWrong posts.

What David_G said. Global warming is a scientific issue. Maybe "what we lack is the will to change things" is the right analysis of the policy problems, but among climate change experts there's a whole lot more consensus about global warming than there is among AI researchers about the Singularity. "You can't say controversial things about global warming, but can say even more controversial things about AI" is a rule that makes about as much sense as "teach the controversy" about evolution.

Global warming is a scientific issue.

It's also a political issue, to a much greater extent than the possibility and nature of a technological singularity.

Evolution is also a political issue. Shall we now refrain from talking about evolution, or mentioning what widespread refusal to accept evolution, up to the point of there being a strong movement to undermine the teaching of evolution in US schools, says about human rationality?

I get that it can be especially hard to think rationally about politics. And I agree with what Eliezer has written about government policy being complex and almost always involving some trade-offs, so that we should be careful about thinking there's an obvious "rationalist view" on policy questions.

However, a ban on discussing issues that happen to be politicized is idiotic, because it puts us at the mercy of contingent facts about what forms of irrationality happen to be prevalent in political discussion at this time. Evolution is a prime example of this. Also, if the singularity became a political issue, would we ban discussion of that from LessWrong?

We should not insert political issues which are not relevant to the topic, because the more political issues one brings to the discussion, the less rational it becomes. It would be safest to discuss all issues separately, but sometimes it is not possible, e.g. when the topic being discussed relies heavily on evolution.

One part of trying to be rational is to accept that people are not rational, and act accordingly. For every political topic there are a number of people whose minds will turn off if they read something they disagree with. It does not mean we should be quiet on the topic, but we should not insert it where it is not relevant.

Explaining why X is true, in a separate article, is the correct approach. Saying or suggesting something like "by the way, people who don't think X is true are wrong" in an unrelated topic is the wrong approach. Why is it so? In the first example you expect your proof of X to be discussed in the comments, because it is the issue. In the second example, discussions about X in comments are off-topic. Asserting X in a place where discussion of X is unwelcome is a kind of Dark Arts; we should avoid it even if we think X is true.

The topic of evolution, unlike the topic of climate change, is entangled with human psychology, AI, and many other important topics; not discussing it would be highly costly. Moreover, if anyone on LessWrong disagrees with evolution, it's probably along Newsomian eccentric lines, not along tribal political lines. Also, lukeprog's comments on the subject made implicit claims about the policy implications of the science, not just about the science itself, which in turn is less clear-cut than the scientific case against a hypothesis requiring a supernatural agent, though for God's sake please nobody start arguing about exactly how clear-cut.

As a matter of basic netiquette, please use words like "mistaken" or "harmful" instead of "idiotic" to describe views you disagree with.

This post is mostly directed at newbies, who can't be expected to have trained themselves to keep their brains from shutting down whenever the "politics" pattern matcher goes off.

In other words, it could cause some readers to stop reading before they get to the gist of the post. Even at Hacker News, I sometimes see "I stopped reading at this point" posts.

Also, I see zero benefit from mentioning global warming specifically in this post. Even a slight drawback outweighs zero benefit.

(I don't necessarily disagree with your points, I was simply making a relevant factual claim; yet you seem to have unhesitatingly interpreted my factual claim as automatically implying all sorts of things about what policies I would or would not endorse. Hm...)

Please don't insert gratuitous politics into LessWrong posts.

Global warming is a scientific issue.

...and what to do about it is a political issue.

Also, "will" may be the wrong concept.

How about "not enough people with the power to change things see sufficient reasons to do so"?

http://www.medicineispersonal.com/

I hope they're not using that landing page for anything important. It's not clear what product (if any) they're selling, there's no call to action, and in general it looks to me like it's doing a terrible job of overcoming inferential distances. I'd say you did a far better job of selling them than they did. Someone needs to read half a dozen blog posts about how customers only think of themselves, etc.

Great post by the way, Luke.

The inferential method that solves the problems with frequentism — and, more importantly, follows deductively from the axioms of probability theory — is Bayesian inference.

You seem to be conflating Bayesian inference with Bayes Theorem. Bayesian inference is a method, not a proposition, so cannot be the conclusion of a deductive argument. Perhaps the conclusion you have in mind is something like "We should use Bayesian inference for..." or "Bayesian inference is the best method for...". But such propositions cannot follow from mathematical axioms alone.

Moreover, the fact that Bayes Theorem follows from certain axioms of probability doesn't automatically show that it's true. Axiomatic systems have no relevance to the real world unless we have established (whether explicitly or implicitly) some mapping of the language of that system onto the real world. Unless we've done that, the word "probability" as used in Bayes Theorem is just a symbol without relevance to the world, and to say that Bayes Theorem is "true" is merely to say that it is a valid statement in the language of that axiomatic system.

In practice, we are liable to take the word "probability" (as used in the mathematical axioms of probability) as having the same meaning as "probability" (as we previously used that word). That meaning has some relevance to the real world. But if we do that, we cannot simply take the axioms (and consequently Bayes Theorem) as automatically true. We must consider whether they are true given our meaning of the word "probability". But "probability" is a notoriously tricky word, with multiple "interpretations" (i.e. meanings). We may have good reason to think that the axioms of probability (and hence Bayes Theorem) are true for one meaning of "probability" (e.g. frequentist). But it doesn't automatically follow that they are also true for other meanings of "probability" (e.g. Bayesian).

I'm not denying that Bayesian inference is a valuable method, or that it has some sort of justification. But justifying it is not nearly so straightforward as your comment suggests, Luke.

disclaimer: I'm not very knowledgeable in this subject to say the least.

This seems relevant: Share likelihood ratios, not posterior beliefs

It would seem useful for them to publish p(data|hypothesis) because then I can use my priors for p(hypothesis) and p(data) to calculate p(hypothesis|data).

Otherwise, depending on what information they updated on to get their priors I might end up updating on something twice.
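
A minimal sketch of that idea, using the odds form of Bayes' theorem so that p(data) never has to be computed separately: posterior odds = prior odds × likelihood ratio. The function name, the shared likelihood ratio of 4, and the two priors are illustrative assumptions, not numbers from any study.

```python
def posterior_probability(prior, likelihood_ratio):
    """Update a prior probability for a hypothesis using a reported likelihood ratio.

    likelihood_ratio = p(data | hypothesis) / p(data | not hypothesis)
    """
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

# One published likelihood ratio, two readers with different priors (all hypothetical).
shared_lr = 4.0
skeptic = posterior_probability(0.1, shared_lr)
believer = posterior_probability(0.5, shared_lr)

print(f"skeptic:  {skeptic:.3f}")
print(f"believer: {believer:.3f}")
```

Each reader applies the same reported evidence to their own prior exactly once, which is what avoids the double-updating worry. Their posteriors still differ, because their priors do.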

What should I read to get a good defense of Bayesianism--that isn't just pointing out difficulties with frequentism, NHST, or whatever? I understand the math, but am skeptical that it can be universally applied, due to problems with coming up with the relevant priors and likelihoods.

It's like the problem with simple deduction in philosophy. Yes, if your premises are right, valid deductions will lead you to true conclusions, but the problem is knowing whether the premises used by the old metaphysicians (or modern ones, for that matter) are true. Bayesianism fails to solve this problem for many cases (though I'm not denying that you do sometimes know the relevant probabilities).

I do definitely plan on getting my hands on a copy of Richard Carrier's new book when it comes out, so if that's currently the best defense of Bayesianism out there, I'll just wait another two months.

What we currently call "science" isn't the best method for uncovering nature's secrets; it's just the first set of methods we've collected that wasn't totally useless like personal anecdote and authority generally are.

Prior methods weren't completely useless. Humans went from hunter-gatherers to civilization without the scientific method or a general notion of science. It is probably more fair to say that science was just much better than all previous methods.

NHST uses "p-values," which are statements about the probability of getting some data (e.g. one's experimental results) given the hypothesis being tested.

Wait, that confused me. I thought the p-value was the chance of the data given the null hypothesis.
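
For what it's worth, here is a small sketch of the usual textbook definition, in which the p-value is the probability, assuming the null hypothesis is true, of data at least as extreme as what was observed. The coin-flip setup (9 heads in 10 flips, null hypothesis of a fair coin, one-sided test) and the function name are made up for illustration.

```python
from math import comb

def one_sided_p_value(heads, flips, null_p=0.5):
    """P(at least `heads` heads in `flips` flips | coin lands heads with probability null_p)."""
    return sum(
        comb(flips, k) * null_p**k * (1 - null_p) ** (flips - k)
        for k in range(heads, flips + 1)
    )

# Hypothetical experiment: 9 heads out of 10 flips, tested against a fair coin.
p = one_sided_p_value(9, 10)
print(f"p-value: {p:.4f}")
```

Note that this is a statement about the data given the null hypothesis, not about how probable the null hypothesis is given the data.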

I really like the discussions of the problems, but I would have loved to see more discussions of the solutions. How do we know, more specifically, that they will solve things? What are the obstacles to putting them into effect -- why, more specifically, do people just not want to do it? I assume it's something a bit more complex than a bunch of people going around saying "Yeah, I know science is flawed, but I don't really feel like fixing it." (Or maybe it isn't?)