Even if you have a nail, not all hammers are the same

(Related to Over-encapsulation and Subtext is not invariant under linear transformation)

Between 2004 and 2007, Goran Bjelakovic et al. published three famous meta-analyses of vitamin supplements, concluding that vitamins don't help people but instead kill them.  This is now the accepted dogma; and if you ask your doctor about vitamins, she's likely to tell you not to take them, based on reading one of these articles, or one of the many summaries of them in secondary sources like The Mayo Clinic Journal.

The 2007 study claims that beta-carotene and vitamins A and E are positively correlated with death - the more you take, the more likely you are to die. Therefore, vitamins kill.  The conclusion on E requires a little explanation, but the data on beta-carotene and A is simple and specific:

Univariate meta-regression analyses revealed significant influences of dose of beta carotene (Relative Risk (RR), 1.004; 95% CI, 1.001-1.007; P = .012), dose of vitamin A (RR, 1.000006; 95% CI, 1.000002-1.000009; P = .003), ... on mortality.

This appears to mean that, for each mg of beta carotene that you take, your risk of death increases by a factor (RR) of 1.004; for each IU of vitamin A that you take, by a factor of 1.000006.  "95% CI, 1.001-1.007" means that the 95% confidence interval for the true RR runs from 1.001 to 1.007.  "P = .012" means that there's only a 1.2% chance that you would be so unlucky as to get a sample giving that result, if in fact the true RR were 1.

A risk factor of 1.000006 doesn't sound like much; but I'm taking 2,500 IU of vitamin A per day.  That gives a 1.5% increase in my chance of death!  (Per 3.3 years.)  And look at those P-values: .012, .003!

So why do I still take vitamins?

What all of these articles do, in excruciating detail with regard to sample selection (though not so much with regard to the math), is to run a linear regression on a lot of data from studies of patients taking vitamins.  A linear regression takes a set of data where each datapoint looks like this:

     Y = a1X1 + c

and a multiple linear regression takes a set of data where each datapoint usually looks like this:

     Y = a1X1 + a2X2 + ... anXn + c

where Y and all the Xi's are known.  In this case, Y is a 1 for someone who died and a 0 for someone who didn't, and each Xi is the amount of some vitamin taken.  In either case, the regression finds the values of a1, ..., an, c that best fit the data, meaning they minimize the sum, over all data points, of the squared error of the value predicted for Y: (Y - (a1X1 + a2X2 + ... + anXn + c))^2.
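The least-squares fit described above can be sketched in a few lines of NumPy. The data here are invented for illustration, not taken from any of the studies:

```python
import numpy as np

# Hypothetical (dose, outcome) data, made up for this sketch.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 4.2, 5.9, 8.1, 9.8])

# Fit Y = a*X + c by minimizing the sum of squared errors.
A = np.column_stack([X, np.ones_like(X)])
(a, c), *_ = np.linalg.lstsq(A, Y, rcond=None)

predictions = A @ np.array([a, c])
sse = np.sum((Y - predictions) ** 2)  # the quantity the regression minimizes
```

The multiple-regression case is the same, with one column per Xi in the design matrix A.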

Scientists love linear regression.  It's simple, fast, and mathematically pure.  There are lots of tools available to perform it for you.  It's a powerful hammer in a scientist's toolbox.

But not everything is a nail.  And even for a nail, not every hammer is the right hammer.  You shouldn't use linear regression just because it's the "default regression analysis".  When a paper says they performed "a regression", beware.

A linear analysis assumes that if 10 milligrams is good for you, then 100 milligrams is ten times as good for you, and 1000 milligrams is one-hundred times as good for you.

This is not how vitamins work.  Vitamin A is toxic in doses over 15,000 IU/day, and vitamin E is toxic in doses over 400 IU/day (Miller et al. 2004, Meta-Analysis: High-Dosage Vitamin E Supplementation May Increase All-Cause Mortality;  Berson et al. 1993, Randomized trial of vitamin A and vitamin E supplementation for retinitis pigmentosa.). The RDA for vitamin A is 2500 IU/day for adults. Good dosage levels for vitamin A appear to be under 10,000 IU/day, and for E, less than 300 IU/day. (Sadly, studies rarely discriminate in their conclusions between dosage levels for men and women.  Doing so would give more useful results, but make it harder to reach the coveted P < .05 or P < .01.)

Quoting from the 2007 JAMA article:

The dose and regimen of the antioxidant supplements were: beta carotene 1.2 to 50.0 mg (mean, 17.8 mg) , vitamin A 1333 to 200 000 IU (mean, 20 219 IU), vitamin C 60 to 2000 mg (mean, 488 mg), vitamin E 10 to 5000 IU (mean, 569 IU), and selenium 20 to 200 μg (mean 99 μg) daily or on alternate days for 28 days to 12 years (mean 2.7 years).

The mean doses used in the studies of both A and E are in ranges known to be toxic. The maximum doses were more than ten times the known toxic levels, and about 20 times the beneficial levels.

17.8 mg of beta-carotene translates to about 30,000 IUs of vitamin A, if it were converted to vitamin A. This is also a toxic value. It is surprising that beta-carotene showed toxicity, though, since common wisdom is that beta-carotene is converted to vitamin A only as needed.

Vitamins, like any medicine, have an inverted-J-shaped response curve. If you graph their health effects, with dosage on the horizontal axis, and some measure of their effects - say, change to average lifespan - on the vertical axis, you would get an upside-down J. (If you graph the death rate on the vertical axis, as in this study, you would get a right-side-up J.) That is, taking a moderate amount has some good effect; taking a huge amount has a large bad effect.

If you then try to draw the straight line that best matches the J, you get a line showing detrimental effects increasing gradually with dosage. The results are exactly what we should expect. Their conclusion, that "Treatment with beta carotene, vitamin A, and vitamin E may increase mortality," is technically correct. Treatment with anything may increase mortality, if you take ten times the toxic dose.
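This effect is easy to demonstrate numerically. The J-shaped curve below is invented for the demonstration (it is not fit to the paper's data), but the qualitative result holds for any curve of this shape:

```python
import numpy as np

doses = np.linspace(0, 10, 50)
# An invented J-shaped dose-response: beneficial (risk < 1) at low doses,
# toxic at high doses.
risk = 1.0 - 0.1 * doses + 0.02 * doses ** 2

# Fit a straight line: risk = slope * dose + intercept.
slope, intercept = np.polyfit(doses, risk, 1)
# The fitted slope is positive: the linear model reports "risk increases
# with dose" even though low doses reduce risk.
```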

For a headache, some people take four 200 mg tablets of aspirin. Ten tablets of aspirin might be toxic. If you made a study averaging in people who took from 1 to 100 tablets of aspirin for a headache, you would find that "aspirin increases mortality".

(JAMA later published 4 letters criticizing the 2007 article.  None of them mentioned the use of linear regression as a problem.  They didn't publish my letter - perhaps because I didn't write it until nearly 2 months after the article was published.)

Anyone reading the study should have been alerted to this by the fact that all of the water-soluble vitamins in the study showed no harmful effects, while all of the fat-soluble vitamins "showed" harmful effects. Fat-soluble vitamins are stored in the fat, so they build up to toxic levels when people take too much for a long time.

A better methodology would have been to use piecewise (or "hockey-stick") regression, which assumes the data is broken into 2 sections (typically one sloping downwards and one sloping upwards), and tries to find the right breakpoint, and perform a separate linear regression on each side of the break that meets at the break.  (I almost called this "The case of the missing hockey-stick", but thought that would give the answer away.)
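A rough sketch of hockey-stick regression, using synthetic data with a known break (the breakpoint grid, noise level, and break location are all invented for the demonstration): for each candidate breakpoint, fit a line whose slope is allowed to change at the break while staying continuous there, and keep the breakpoint that minimizes the squared error.

```python
import numpy as np

rng = np.random.default_rng(0)
dose = np.linspace(0, 10, 100)
# Synthetic data: flat below dose 5, rising above it, plus a little noise.
y = np.where(dose < 5, 0.0, 0.5 * (dose - 5)) + rng.normal(0, 0.05, dose.size)

def fit_hockey_stick(x, y, candidates):
    best = None
    for k in candidates:
        # Model: y = b0 + b1*x + b2*max(0, x - k).
        # The max(0, x - k) term adds a slope change at k while keeping
        # the two line segments continuous at the break.
        A = np.column_stack([np.ones_like(x), x, np.maximum(0, x - k)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        sse = np.sum((y - A @ coef) ** 2)
        if best is None or sse < best[0]:
            best = (sse, k, coef)
    return best

sse, breakpoint, coef = fit_hockey_stick(dose, y, np.linspace(1, 9, 81))
# breakpoint should land near the true break at 5, and coef[1] + coef[2]
# should recover the post-break slope of 0.5.
```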

Would these articles have been accepted by the most-respected journals in medicine if they evaluated a pharmaceutical in the same way?  I doubt it; or else we wouldn't have any pharmaceuticals.  Bias against vitamins?  You be the judge.

Meaningful results have meaningful interpretations

The paper states the mortality risk in terms of "relative risk" (RR).  But relative risk is used for studies of 0/1 conditions, like smoking/no smoking, not for studies that use regression on different dosage levels.  How do you interpret the RR value for different dosages?  Is it RR × dosage?  Or RR^dosage (each unit multiplies risk by RR)?  The difference between these interpretations is trivial for standard dosages.  But can you say you understand the paper if you can't interpret the results?

To answer this question, you have to ask exactly what type of regression the authors used.  Even if a linear non-piecewise regression were correct, the best regression analysis to use in this case would be a logistic regression, which estimates the probability of a binary outcome conditioned on the regression variables. The authors didn't consider it necessary to report what type of regression analysis they performed; they reported only the computer program (STATA) and the command ("metareg").  The  STATA metareg manual  is not easy to understand, but three things are clear:

  • It doesn't use the word "logistic" anywhere, and it doesn't use the logistic function, so it isn't logistic regression.
  • It does regression on the log of the risk ratio between two binary cases, a "treatment" case and a "no-treatment" case; and computes regression coefficients for possibly-correlated continuous treatment variables (such as vitamin doses).
  • It doesn't directly give relative risk for the correlated variables.  It gives regression coefficients telling the change in log relative risk per unit of (in this case) beta carotene or vitamin A.  If anything, the reported RR is probably e^r, where r is the computed regression coefficient.  This means the interpretation is that risk is proportional to RR^dosage.
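A quick check of how much the two interpretations differ, using the paper's reported per-IU RR for vitamin A (1.000006) and a 2,500 IU/day dose:

```python
# Per-unit relative risk for vitamin A, as reported in the 2007 paper.
rr_per_iu = 1.000006
dose_iu = 2500

multiplicative = rr_per_iu ** dose_iu      # RR^dosage interpretation
additive = 1 + (rr_per_iu - 1) * dose_iu   # RR x dosage interpretation

# Both come out to roughly 1.015 - a ~1.5% risk increase - so at ordinary
# doses the two interpretations are numerically indistinguishable.
```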

Since there is no "treatment/no treatment" case for this study, but only the variables that would be correlated with treatment/no treatment, it would have been impossible to put the data into a form that metareg can use.  So what test, exactly, did the authors perform?  And what do the results mean?  It remains a mystery to me - and, I'm willing to bet, to every other reader of the paper.

References

Bjelakovic et al. 2007, "Mortality in randomized trials of antioxidant supplements for primary and secondary prevention: Systematic review and meta-analysis", Journal of the American Medical Association, Feb. 28 2007.

Bjelakovic et al. 2006, "Meta-analysis: Antioxidant supplements for primary and secondary prevention of colorectal adenoma", Alimentary Pharmacology & Therapeutics 24, 281-291.

Bjelakovic et al. 2004, "Antioxidant supplements for prevention of gastrointestinal cancers: A systematic review and meta-analysis," The Lancet 364, Oct. 2 2004.

Comments


I don't understand. Why is "they used the wrong statistical formula" worth 47 upvotes on the main article? Because people here are interested in supplementation? Because it's a fun math problem?

In the other comments, people are discussing which algorithm would be more appropriate, and debating the nuances of each particular method. Not willing to take the time to understand the math, it comes across as, "This could be right, or wrong, depending on such-and-such, and boy isn't that stupid..."

I run into this problem every time I read anything on health or medicine (it seems limited to these topics). Someone says it's good for you, someone says it's bad for you, both sides attack the other's (complex, expert) methods, and the non-expert is left even more confused than when they first started looking into the matter. And it doesn't help that personal outcomes can be drastically different regardless of the normal result.

To me, this topic is still confusing, with a slight update toward "take more vitamins." Without taking classes in statistics and/or medicine, how can I become less wrong on problems like this? Who can I trust, and why?

Why is "they used the wrong statistical formula" worth 47 upvotes on the main article?

47 votes doesn't mean "This is a great article". It means 47 more people liked it than disliked it. Peanut butter gets more karma than caviar.

I don't understand. Why is "they used the wrong statistical formula" worth 47 upvotes on the main article? Because people here are interested in supplementation? Because it's a fun math problem?

Both. It's instrumental in that vitamin supplementation is a concern many here have. It's also useful as an example of how studies can have flaws, and how these flaws can be found with surprisingly little analysis. Dissections of bad studies help us avoid similar flaws in our own conclusions. And there are indeed researchers on LessWrong, as well as motivated laymen who can follow the math, and even run their own regressions. This truly is valuable.

Not willing to take the time to understand the math, ... Without taking classes in statistics and/or medicine, how can I become less wrong on problems like this?

You can't learn to be less wrong about mathematical questions without learning more math. (By definition.)

You can't learn to be less wrong about mathematical questions without learning more math. (By definition.)

Depends. I could become less wrong about mathematical questions by learning to listen to people who are less wrong about math. (More generally: I may be able to improve my chance of answering a question correctly even if I can't directly answer it myself.)

The way traditional rationalists without special training relate to scientific findings is usually by uncritically accepting them as authoritative. One can become less wrong by learning that scientists are not close to perfect. They make mistakes and sometimes deceive themselves and others. Probably the single most common way this happens is through statistical malpractice. This post, in excellent detail and language non-experts can comprehend, explains one such case and identifies a general type of statistical screw-up: using the wrong tool.

Who can I trust, and why?

Trust no one. Learn a little math.

You don't need to be able to solve the problems on your own, just enough to understand the arguments. I'm not a math guy either but only some of the statistical stuff is totally out of my grasp. Did you understand the math in this post?

In the other comments, people are discussing which algorithm would be more appropriate, and debating the nuances of each particular method. Not willing to take the time to understand the math, it comes across as, "This could be right, or wrong, depending on such-and-such, and boy isn't that stupid..."

It shouldn't. It should come across as "this is wrong, it can make people die, it is evil, don't listen to it."

I run into this problem every time I read anything on health or medicine (it seems limited to these topics).

This is an interesting point in itself. Why health and medicine?

Maybe causal inference is straight up more difficult in health and medicine: effects are smaller and more ambiguous than in hard sciences, and have many hard-to-manipulate causes that blur the signal.

There are borderline results in fields like physics, obviously, but they're usually more esoteric and tend to have relatively clear cut theory behind them (which is why I'd guess you're not too worried about, say, last year's ambiguous results from the Cryogenic Dark Matter Search), so they don't provoke so much back-and-forthing.

This leads me to a prediction: you'd have as much difficulty reading up on results in psychology and sociology as you do in health and medicine. As for what to do about it? Uh...not sure. I'm still chewing over this thread.

This is an interesting point in itself. Why health and medicine?

Maybe causal inference is straight up more difficult in health and medicine: effects are smaller and more ambiguous than in hard sciences, and have many hard-to-manipulate causes that blur the signal.

My self-serving explanation is that health/ medicine/ biology select for people who enjoy (or better tolerate) rote memorization (Levels 0-1 in my hierarchy) rather than "how it works"-type understanding (Levels 2-3). This gets the group of intelligent people with worse ability to know the broader meaning of what they're doing, a skill that tends to curtail questionable statistical practices.

Yes, I know it sounds insulting, but what really turned me off from taking more biology in high school and college, and from med school, is that it's so much more memorization-oriented rather than generative-model oriented. This suspicion is confirmed when I hear about e.g. ecologists just now getting around to using the method of adjacency matrix eigenvectors (i.e., Google's PageRank) to identify key organisms in ecosystems.

And an alternate alternate explanation: Poor priorities. Doctors want to hear all the clinical details, and are mentally worn out by the time they finish with those. There's just no time or energy to do the math too.

When I used to work for NASA in theoretical air traffic management, I'd try to explain some abstract point about turbulent or chaotic traffic flow to operational FAA guys, and they would get bogged-down in details about what kind of planes we were talking about, what altitudes they were flying at, which airlines they belonged to, and on and on.

Here's an alternate insulting explanation, based on many (but not all) of the doctors I've met:

Doctors are bad at being unsure of themselves.

Wow, I just read Robin's writeup on this and it caused me to significantly lower the amount of credence I place on his other positions (but very slightly lower my opinion of supplements). It just struck me as overwhelmingly sloppy and rhetorical. Particularly his justification attempt in response to this thread. (But I suppose Robin's responses to criticism have never impressed me anyway.)

Here is what is bothering me:

The only causal hypothesis I've heard for the supplements leading to a higher rate of death is Phil's claim that the doses are too high and are resulting in vitamin toxicity. If we accept Robin's claim that "vitamins kill" (that is, that the causal claim is true and the results aren't just the result of uncontrolled-for correlations) one of three things has to be true: vitamins kill no matter how the body obtains them; there is something involved in taking 'supplements' that kills; or Phil is right and vitamin toxicity is the cause. (Is there an option I'm missing?) It seems extremely clear to me that absent any hypothesis for the second of these, vitamin toxicity is by far the most likely explanation. And the best part about this hypothesis is that it is also the easiest to test. All you have to do is look for hockey-sticks!

Why refuse to test the one explanation for the results we have?

Another possibility is that cancers have higher nutritional needs than normal cells, and some vitamins might be feeding cancer more than they're feeding the person.

One more explanation: some supplements might come from untrustworthy manufacturers and be contaminated with an unidentified toxin. That's a tough one to test, since it's unlikely that any of the studies saved samples of the pills they used or even documented where they came from.

One more hypothesis: supplements are risky because you may end up with vitamins and minerals which are out of proportion with each other.

They do the same kind of thing with ionizing radiation: a lot of organizations assume that the health effects of radiation are completely linear, even far below the range where we've been able to measure, despite the lack of evidence for this (and some evidence suggesting a J-shaped curve). Other organizations refuse to extrapolate to extremely low doses, citing the lack of evidence.

The issue is just way too politicized.

There's a general principle that very small doses of toxins or stresses of any kind - vaccines, radiation, oxidants, poisons, alcohol, heat, cold, exercise - are beneficial, because they provoke the body to a protective overreaction. One of the talks at the 2007 DC conference on cognitive aging even suggested that this is responsible for why people who think more have fewer memory problems as they age.

(This suggests that our bodies are lazy - they could maintain themselves better than they do on every dimension. Or it might be that, if we measured all the responses simultaneously, we'd find that mounting a protective response to radiation made us more vulnerable to infection, alcohol, and all the rest.)

Or it might be that, if we measured all the responses simultaneously, we'd find that mounting a protective response to radiation made us more vulnerable to infection, alcohol, and all the rest.

Or maybe it would just require the expenditure of energy.

The fact that this assumption is made so explicit makes it much less problematic than the problem this article is talking about.

Thanks Phil. I am suitably outraged that both the authors and the journal published this.

I'm not sure whether 'benefit of the doubt' in this instance suggests 'political motivation' or 'incompetence'. I'll give them whichever benefit of the doubt they prefer. The most basic knowledge of the field suggests that the prior probability of a fat-soluble vitamin having a linear response to dosage is negligible.

I think the simplest hypothesis is that this was a case of pushbutton statistics - get a statistics package, read the documentation, and feed it numbers until it gives you numbers back.

The papers overwhelm the reader with so many details about how to categorize and treat the different samples in the meta-study, that it's easy to feel like they've "done enough" and just wave the math through.

It might be that, in order to pay more attention to statistical correctness, you've got to pay less attention to other details. A person has only so much mental energy! So it may reflect not poor statistics skills so much as poor priorities. Doctors want to hear all the clinical details; but there's little time and mental energy left for anything else.

You seem to have a mild case of pushbutton statistics

Negligence hurts people. In this case it hurts people at the margin, where nutritional advice from misinformed doctors tips the scales. Yet while negligence in surgery is a PR nightmare, there is still a net benefit of prestige to having papers published, read and referenced even when it can be shown that the research is flawed. If only negligent publications came with a commensurate penalty to the credibility of the author and journal, even if only until they published a suitable retraction.

For anyone interested, here is a decent algorithm for getting the "correct" number of lines in your linear regression.

http://www.cs.princeton.edu/~wayne/kleinberg-tardos/06dynamic-programming-2x2.pdf

Pages 5 and 6.
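The algorithm in those slides is segmented least squares. A minimal sketch of the same idea (not a transcription of the slides; the penalty value and test data here are invented): charge a fixed penalty per segment, and use dynamic programming to choose the number and placement of segments that minimize total error plus penalties.

```python
import numpy as np

def segment_error(x, y, i, j):
    """Least-squares error of one line fit through points i..j (inclusive)."""
    xs, ys = x[i:j + 1], y[i:j + 1]
    if xs.size < 3:
        return 0.0  # one or two points fit a line exactly
    slope, intercept = np.polyfit(xs, ys, 1)
    return float(np.sum((ys - (slope * xs + intercept)) ** 2))

def segmented_least_squares(x, y, penalty):
    """Return (total cost, number of segments) of the optimal segmentation."""
    n = x.size
    opt = [0.0] * (n + 1)   # opt[j]: best cost for the first j points
    segs = [0] * (n + 1)
    for j in range(1, n + 1):
        candidates = [
            (segment_error(x, y, i, j - 1) + penalty + opt[i], segs[i] + 1)
            for i in range(j)
        ]
        opt[j], segs[j] = min(candidates)
    return opt[n], segs[n]

# Noise-free V-shaped data should be explained by exactly two segments.
x = np.linspace(0, 10, 21)
y = np.abs(x - 5)
cost, n_segments = segmented_least_squares(x, y, penalty=0.5)
```

Raising the penalty trades accuracy for fewer segments, which is how the "correct" number of lines gets chosen.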

So why do you still take vitamins? If you look at their Figure 2, there aren't many studies that 'favored antioxidants', and some of those studies had low doses.

"A linear analysis assumes that if 10 milligrams is good for you, then 100 milligrams is ten times as good for you, and 1000 milligrams is one-hundred times as good for you." That's only true if the range of data included both 10 milligrams and 1000 milligrams. Linearity is only assumed within the range of data of the data sets.

The hockey stick approach seems too restrictive as well. Just use a p-spline.

There doesn't appear to be a statistician on the paper. This study really needed one. Using meta-regression to estimate a dose effect is challenging, especially when you don't have access to the original data (just using aggregate, study-level covariates). In fact, the dose effect and the concept of study heterogeneity are conflated here.

I agree with you that it's unclear what they actually did.

What vitamins does everyone take? I take a no-iron multi-vitamin, extra vitamin D, and fish oil, all from cheap sources. I would be especially curious if anyone takes/has evidence for more expensive vitamins that are better absorbed.

A better methodology would have been to use piecewise (or "hockey-stick") regression, which assumes the data is broken into 2 sections (typically one sloping downwards and one sloping upwards), and tries to find the right breakpoint, and perform a separate linear regression on each side of the break that meets at the break. (I almost called this "The case of the missing hockey-stick", but thought that would give the answer away.)

An even better methodology would be to allow for higher-order terms in the regression model. Adding squared terms, the model would look like this:

     Y = a1X1 + a2X1^2 + c

or, with multiple explanatory variables:

     Y = a1X1 + a2X1^2 + ... + a2n-1Xn + a2nXn^2 + c

This would allow for those nice-looking curves you were talking about. And it can be combined with logistic regression. Really, regression is very flexible; there's no excuse for what they did.

Also, the scientists could have done a little model checking. If what Phil says about the U/J shaped response curve is true, the first order model would have been rejected by some sensible model selection criterion (AIC, BIC, stepwise selection, lack-of-fit F test, etc)
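A sketch of that model check: fit first- and second-order models to J-shaped data (invented here, not the paper's) and compare AIC. If the true curve is J-shaped, the second-order model wins decisively:

```python
import numpy as np

rng = np.random.default_rng(1)
dose = np.linspace(0, 10, 200)
# Invented J-shaped data with a little noise.
risk = 1.0 - 0.1 * dose + 0.02 * dose ** 2 + rng.normal(0, 0.02, dose.size)

def aic(y, y_hat, n_params):
    # Gaussian-likelihood AIC, up to an additive constant.
    n = y.size
    sse = np.sum((y - y_hat) ** 2)
    return n * np.log(sse / n) + 2 * n_params

linear = np.polyval(np.polyfit(dose, risk, 1), dose)
quadratic = np.polyval(np.polyfit(dose, risk, 2), dose)

aic_linear = aic(risk, linear, 2)
aic_quadratic = aic(risk, quadratic, 3)
# aic_quadratic comes out far lower: the first-order model is rejected.
```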

related side note: In my grad stat classes, "Linear Regression" usually includes things like my example above - i.e., linear functions of the (potentially transformed) explanatory variables, including higher-order terms. Is this different from how the term is widely used?

unrelated side note: is there a way to type pretty math in the comments?

followup question: are scientists outside of the field of statistics really this dumb when it comes to statistics? It seems like they see their standard methods (i.e., regression) as black boxes that take data as an input and then output answers. Maybe my impression is skewed by the examples popping up here on LW.

is there a way to type pretty math in the comments?

Yes: Comment formatting

One supplement that is now being widely prescribed is Vitamin D. Testing to see if one is Vitamin D deficient is common here in the Boston area. It is being suggested that insufficient Vitamin D is linked to multiple health ailments- autoimmune diseases and increased risk of heart attacks in particular. My doctor prescribed Vitamin D supplements for me.