Prior Probability: the Dirty Little Secret of “Evidence-Based Alternative Medicine”—Continued

This is an addendum to my previous entry on Bayesian statistics for clinical research.† After that posting, a few comments made it clear that I needed to add some words about estimating prior probabilities of therapeutic hypotheses. This is a huge topic that I will discuss briefly. In that, happily, I am abetted by my own ignorance. Thus I apologize in advance for simplistic or incomplete explanations. Also, when I mention misconceptions about either Bayesian or “frequentist” statistics, I am not doing so with particular readers in mind, even if certain comments may have triggered my thinking. I am quite willing to give readers credit for more insight into these issues than might be apparent from my own comments, which reflect common, initial difficulties in digesting the differences between the two inferential approaches. Those include my own difficulties, after years of assuming that the “frequentist” approach was both comprehensive and rational—while I had only a cursory understanding of it. That, I imagine, placed me well within two standard deviations of the mean level of statistical knowledge held by physicians in general.

Opinions are Inescapable

First, a couple of general observations. As Steven Goodman and other Bayesian advocates have argued, we do not now avoid subjective beliefs regarding clinical hypotheses, nor can we. There is not a legitimate choice between making subjective estimates of prior probabilities and relying exclusively on “objective” data, notwithstanding the wishes of those who zealously cling to the “frequentist” school. The inescapable fact is that there is no logical resolution to the age-old epistemological dilemma of induction vs. deduction. “Frequentist” statistical tools, designed for deductive inference, do not address the question of how data from a particular experiment affect the probability of a hypothesis being correct—an inductive exercise.

The simplest illustration of the problem, as discussed by Goodman and others, is the “P-value fallacy”: while most physicians and many biomedical researchers think that a “P” of 0.05 for a clinical trial means that there is only a 5% chance that the null hypothesis is true, that is not the case. Here is what “P=0.05” actually means: if many similar trials are performed testing the same novel hypothesis, and if the null hypothesis is true, then results at least as extreme as those observed will turn up, and the null will be falsely rejected, in 5% of those trials. For any single trial, it doesn’t tell us how likely the null hypothesis is to be true.
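The long-run reading of a P-value can be made concrete with a small simulation. What follows is a minimal sketch, not anything taken from Goodman’s papers: it assumes a made-up two-arm trial with normally distributed outcomes of known unit variance, a null hypothesis that is actually true (no difference between the arms), and a two-sided z-test. The function name, sample sizes, and seed are illustrative choices.

```python
import random
from statistics import NormalDist, fmean


def false_rejection_rate(n_trials=5000, n_per_arm=20, alpha=0.05, seed=1):
    """Simulate many trials in which the null is true; return the fraction rejected."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    # Standard error of the difference in means when each arm has variance 1.
    se = (2.0 / n_per_arm) ** 0.5
    rejections = 0
    for _ in range(n_trials):
        # Both arms are drawn from the same distribution: the null is true.
        treated = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
        z = (fmean(treated) - fmean(control)) / se
        p_value = 2.0 * (1.0 - std_normal.cdf(abs(z)))  # two-sided P
        if p_value < alpha:
            rejections += 1
    return rejections / n_trials


if __name__ == "__main__":
    rate = false_rejection_rate()
    print(f"Fraction of true-null trials with P < 0.05: {rate:.3f}")
```

The printed fraction should hover near 0.05. It describes how often the procedure misfires across many true-null trials; it says nothing about the probability that the null is true in any one of them.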

“Non-Informative Priors”: Bayes Factors are the new P-values

Turning to the matter at hand, it’s important to reiterate that the usefulness of Bayesian analysis does not depend exclusively on unique or known prior probabilities, or even on ranges that a group of experts might agree upon. The power of Bayes’ Theorem is to show how data from an investigation alter any prior probability to generate a new, “posterior probability.” Thus one class of prior probabilities, dubbed “non-informative” or “reference priors,” is arbitrary. As discussed in Goodman’s second article (pp. 1006-8), their usefulness stems from a term in Bayes’ Theorem that is entirely objective: the Bayes Factor. Recall that the Bayes Factor is a “likelihood ratio,” which in its simplest form compares how well the data reflect the best supported hypothesis to how well the data reflect the null hypothesis. This means that the Bayes Factor can be the useful “measure of evidence” for a single trial that most of us had previously assumed, incorrectly, the “P-value” to be. Goodman advocates this use and develops an “exchange rate” between the two values (p. 1007).

This can be illustrated by an example. A few years ago a report concluded: “we found that supplementary, remote, blinded, intercessory prayer produced a measurable improvement in the medical outcomes of critically ill patients.” The effect was said to be demonstrated by lower “CCU course scores” in the prayed-for group compared to the controls, which the authors deemed “statistically significant” because “P=.04.”

Although there were numerous flaws in the study, this discussion will be restricted to the P-value claim. The previous couple of paragraphs will have already alerted the reader to the fallacy of the “P=.04” assertion, but let’s put the cards on the table:

  1. The authors appeared to believe, and seemed to expect readers to believe, that their data showed that there was only a 4% chance of a “null” effect.
  2. Referring to Goodman’s assumptions for “exchange rates” between P values and Bayes Factors, P=.04 can be shown to be equivalent to a “minimum” Bayes factor (i.e., the one most consistent with the data) of slightly less than 0.15 (Table 2, p. 1008).
  3. Thus picking an arbitrary, “non-informative” prior probability of 50% (even odds) would yield a posterior probability (by Bayes’ Theorem) of the “null” effect of about 12%: far greater than the ≤5% that we have been taught to think of as “statistically significant” evidence of a treatment effect.
  4. Using the same table, it can be seen that to be truly 96% confident of a non-null effect after the study, the prior probability of “intercessory prayer”—people praying from a distance for patients they do not know and who are unaware of it—would have to have been about 70%. (That is, the prior probability of the null hypothesis would have to have been about 30%).
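The arithmetic behind points 2 through 4 can be sketched in a few lines. This is a rough illustration rather than a reproduction of Goodman’s table: it assumes a two-sided P-value from a normally distributed test statistic and uses the minimum Bayes factor exp(-Z^2/2) described in his second article. The function names are invented for the example, and values computed this way may differ somewhat from figures read off the printed table.

```python
from math import exp
from statistics import NormalDist


def min_bayes_factor(p_two_sided):
    """Minimum Bayes factor (null vs. best-supported alternative), exp(-z^2/2)."""
    # Assumption: the P-value is two-sided and comes from a normal test statistic.
    z = NormalDist().inv_cdf(1.0 - p_two_sided / 2.0)
    return exp(-z * z / 2.0)


def posterior_null_probability(prior_null, bayes_factor):
    """Bayes' Theorem in odds form: posterior odds = prior odds x Bayes factor."""
    prior_odds = prior_null / (1.0 - prior_null)
    posterior_odds = prior_odds * bayes_factor
    return posterior_odds / (1.0 + posterior_odds)


def prior_null_needed(target_posterior_null, bayes_factor):
    """Prior probability of the null that yields the target posterior probability."""
    posterior_odds = target_posterior_null / (1.0 - target_posterior_null)
    prior_odds = posterior_odds / bayes_factor
    return prior_odds / (1.0 + prior_odds)


if __name__ == "__main__":
    bf = min_bayes_factor(0.04)
    print(f"Minimum Bayes factor for P=.04: {bf:.3f}")
    print(f"Posterior P(null) from a 50% prior: {posterior_null_probability(0.5, bf):.3f}")
    print(f"Prior P(null) needed for a 4% posterior: {prior_null_needed(0.04, bf):.3f}")
```

Under these assumptions the sketch lands in the same neighborhood as the figures quoted above: even with the most favorable Bayes factor and an even-odds prior, the posterior probability of the null stays above 10%.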

By starting with a “neutral” prior, using Bayesian statistics in this case demonstrates that the data did not show what the investigators concluded. Alternatively, there needn’t have been a unique prior probability estimate, or the investigators might have been asked to offer their own estimates—even if those might have seemed unrealistically “enthusiastic”—which could then have been compared to “skeptical” priors offered by others. Both of those terms appear in Bayesian literature as examples of arbitrary priors—or so I gather. Whatever the estimates, they would have been available for others to scrutinize.

Extrapolating to more reasonable prior probability realms (see below), it would have required far more dramatic findings to reach a posterior probability of ≥95% of a non-null effect. If we also impose varying estimates of bias, the posterior probabilities of a non-null effect become lower, maybe vastly lower.
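To get a feel for how much more dramatic the findings would need to be, one can run Bayes’ Theorem backwards. The sketch below is illustrative only: the prior probabilities in the loop are made-up inputs, not estimates for any real therapy, and it assumes Goodman’s minimum Bayes factor exp(-Z^2/2) with a two-sided P from a normal test statistic. It ignores bias entirely, and the function names are hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist


def required_bayes_factor(prior_effect, target_posterior_effect=0.95):
    """Largest Bayes factor (null vs. alternative) that still lifts the
    posterior probability of a real effect to the target."""
    prior_odds_null = (1.0 - prior_effect) / prior_effect
    target_odds_null = (1.0 - target_posterior_effect) / target_posterior_effect
    return target_odds_null / prior_odds_null


def p_value_for_min_bayes_factor(bayes_factor):
    """Two-sided P whose minimum Bayes factor exp(-z^2/2) equals bayes_factor."""
    if bayes_factor >= 1.0:
        # The prior alone already meets the target; no evidence is required.
        return 1.0
    z = sqrt(-2.0 * log(bayes_factor))
    return 2.0 * (1.0 - NormalDist().cdf(z))


if __name__ == "__main__":
    # Hypothetical priors, chosen only to show the trend.
    for prior in (0.5, 0.1, 0.01):
        bf = required_bayes_factor(prior)
        p = p_value_for_min_bayes_factor(bf)
        print(f"prior {prior:>4}: Bayes factor <= {bf:.5f}, P <= {p:.5f}")
```

Because the minimum Bayes factor is the reading of the data most favorable to the alternative, the P-values produced here are an optimistic bound: for a given prior, P must be at least this small, and a realistic Bayes factor would demand more.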

Informative Priors

Estimating “reasonable prior probability realms”—which offers the fullest use of Bayes’ Theorem for clinical trials and health care policy questions—is a topic of major interest in the field. Rather than address it myself (I’m not competent to do so), I’ll make a brief introduction and then refer readers to more informative sources. First, although there is always a subjective element, there are systematic means to estimate “priors,” and these add considerable rigor to the exercise. Second, such priors are expected to be “transparent,” i.e., the bases for their derivation must be clearly stated. This is quite different from the way subjective opinion is typically introduced now, as illustrated on p. 1002 of Goodman’s first article. Third, priors (of all types) are usually given as distributions, not as discrete numbers. Fourth, they are expected to be derived from all pertinent information. This is from the non-quantitative, online Primer on Bayesian Statistics in Health Economics and Outcomes Research (O’Hagan & Luce):

The ‘Evidence’

Prior information should be based on sound evidence and reasoned judgements. A good way to think of this is to parody a familiar quotation: the prior distribution should be ‘the evidence, the whole evidence and nothing but the evidence’:

  • ‘the evidence’ – genuine information legitimately interpreted;
  • ‘the whole evidence’ – not omitting relevant information (preferably a consensus that pools the knowledge of a range of experts);
  • ‘nothing but the evidence’ – not contaminated by bias or prejudice.

A more sophisticated, quantitative treatment can be found here. Peruse the table of contents, especially chapter 5.

Most Hypotheses are Wrong…

There is at least one broad theme regarding estimating prior probabilities. It is virtually self-evident that most hypotheses are wrong. This follows from two premises: 1. the number of possible hypotheses is limited only by human imagination; 2. most people’s imaginations are not informed by a sophisticated appreciation for nature or science. Even those who are exceptions to the second premise are humbled by the elusiveness of fruitful insights. Consider the words of Nobel laureate Peter Medawar, one of the great immunologists of the 20th century:

It is a layman’s conclusion that in science we caper from pinnacle to pinnacle of achievement and that we exercise a method which preserves us from error. Indeed we do not; our way of going about things takes it for granted that we guess less often right than wrong…

It is likely that the multitude of less gifted scientists will guess right even less often. John Ioannidis, whose recent paper “Why most published research findings are false” is a favorite among your SBM bloggers, asserts in his article that “the majority of modern biomedical research is operating in areas with very low pre- and post-study probability for true findings.” (Hold the phone! Steven Goodman, our other hero of Bayesian analysis, disagreed with Ioannidis, although not entirely; Ioannidis replied here).

According to FDA Consumer,

More often than many scientists care to admit, researchers just have to give up when a drug is poorly absorbed, is unsafe, or simply doesn’t work. The organization Pharmaceutical Research and Manufacturers of America estimates that only 5 in 5,000 compounds that enter preclinical testing make it to human testing, and only 1 of those 5 may be safe and effective enough to reach pharmacy shelves.

Even granting a degree of self-serving exaggeration by “BigPharm,” it is clear that most proposed, biologically plausible drugs never pan out.

…Hence Most Priors are Low (but you still have to estimate each on its own terms)

Add to all of that the ridiculous certitudes of the hopelessly naive, and it is clear that there is a surfeit of guessing wrong about how things work. Does that mean that most proposals for trials have a prior probability of, say, less than 0.5? You bet it does (much less, in most cases), although that fact does not properly figure into the estimate of any single proposal.

Later: some words about other useful comments regarding the “Galileo gambit” and more.


The Prior Probability, Bayesian vs. Frequentist Inference, and EBM Series:

1. Homeopathy and Evidence-Based Medicine: Back to the Future Part V

2. Prior Probability: The Dirty Little Secret of “Evidence-Based Alternative Medicine”

3. Prior Probability: the Dirty Little Secret of “Evidence-Based Alternative Medicine”—Continued

4. Prior Probability: the Dirty Little Secret of “Evidence-Based Alternative Medicine”—Continued Again

5. Yes, Jacqueline: EBM ought to be Synonymous with SBM

6. The 2nd Yale Research Symposium on Complementary and Integrative Medicine. Part II

7. H. Pylori, Plausibility, and Greek Tragedy: the Quirky Case of Dr. John Lykoudis

8. Evidence-Based Medicine, Human Studies Ethics, and the ‘Gonzalez Regimen’: a Disappointing Editorial in the Journal of Clinical Oncology Part 1

9. Evidence-Based Medicine, Human Studies Ethics, and the ‘Gonzalez Regimen’: a Disappointing Editorial in the Journal of Clinical Oncology Part 2

10. Of SBM and EBM Redux. Part I: Does EBM Undervalue Basic Science and Overvalue RCTs?

11. Of SBM and EBM Redux. Part II: Is it a Good Idea to test Highly Implausible Health Claims?

12. Of SBM and EBM Redux. Part III: Parapsychology is the Role Model for “CAM” Research

13. Of SBM and EBM Redux. Part IV: More Cochrane and a little Bayes

14. Of SBM and EBM Redux. Part IV, Continued: More Cochrane and a little Bayes

15. Cochrane is Starting to ‘Get’ SBM!

16. What is Science? 

Posted in: Clinical Trials, Medical Academia, Science and Medicine


29 thoughts on “Prior Probability: the Dirty Little Secret of “Evidence-Based Alternative Medicine”—Continued

  1. pec says:

    Yes you’re right, science is not a simple method for getting clear answers. However what you’re saying can easily be misused — you can discount any results you don’t happen to like by claiming a low prior probability.

    And, of course, you don’t like any of CAM’s positive results. This provides a way to apply a double standard and bend the statistics with subjective judgments.

  2. pec, your flair for trivializing has exceeded even its own lofty standards. Here are a few key points that have been discussed here over the past few weeks:

    Opinion is inescapable, but behind the guise of frequentist statistics it is hidden, underdeveloped, incomplete, disorganized, lacking in rigor, possibly dishonest, etc.

    Formal estimates of prior probability require honest, rigorous, transparent, and exhaustive discussions of the issue.

    An example of at least part of how such an estimation might be derived was offered in a sequence of 5 postings about homeopathy. Needless to say, that has never been done in any reports or reviews of homeopathy trials.

    Those postings demonstrate, to anyone who knows a little bit about how things work, that the prior probability of homeopathy is infinitesimal.

    The same cannot be said for many other proposals.

  3. pec says:

    If you do not believe something is possible then, according to you, the prior probability has to be infinitesimal. If you are an atheist, for example, then you “know” that healing prayer cannot work. But someone with different experiences and beliefs would “know” the opposite.

    People who are interested in CAM and think it has potential usually come from a philosophical perspective that is completely different from yours. You and they would never agree how to set the prior probabilities. If you ignored their input and set them according to your belief system, you would win every time.

    But is this supposed to be a political game, or is it supposed to be science, a quest for understanding?

    Of course, it has to be both, since funding is limited and everyone wants some. Therefore, we should try to keep the rules fair. The cutoff for p should always be the same within a given field.

    No, we should not believe the results of one prayer study because p was .04. But the same goes for statins, cancer drugs, etc.

  4. pec, your final paragraph is correct, and if you had read all that I’ve written you would know that I already agree with it. That has been the point of my last two posts.

    The rest of what you’ve written is misleading at best, but mostly betrays an ignorance of “CAM” methods, of the points made here, and of nature. First the misleading part: although your final paragraph is correct, it fails to acknowledge that many “evidence-based” treatments are based on evidence that is both plausible to begin with and supported by not one, but many trials for which “P < 0.05.” When many trials show such results, “P” is no longer fallacious (read Goodman). For “intercessory prayer,” homeopathy, and every other “CAM” claim that is highly implausible, there has been no such consistency of “positive” trials. Rather, trials have flitted around the null effect—exactly as expected if the claims are spurious.

    Now for the ignorant parts:

    Regarding your comment about “the cutoff for P”: re-read the two parts of this blog and read Goodman and Ioannidis. When you understand them, but not before, you will have earned the right to comment about “the cutoff for P.”

    Neither I nor the other bloggers here have been motivated by some pre-existing belief to decide that the prior probability of homeopathy, intercessory prayer, or other claims is low. I had no beliefs about homeopathy until I investigated its tenets. It soon appeared that homeopathy probably clashed with nature, but in order to be sure I investigated further. My findings, many of which I summarized in the series of homeopathy blogs over the last few weeks, convinced me that homeopathy is far more easily explained by common, well-established psychological and social phenomena than by esoteric, fantastic physical “theories” that contradict well-established facts of nature—including, for example, aspects of information theory that make it possible for you and me to be having this debate online.

    Similar points can be made about intercessory prayer, although the details are different. There are good scientific reasons to doubt it (it has a “receptor” problem and, in some versions, violates conservation of energy), but there is also a long history of failed trials: more than one hundred years worth, because it is psychokinesis by another name—as even its most enthusiastic proponents admit (Larry Dossey, for example).

    My opinions are not arbitrary, did not precede my investigations of the methods, and are not ignorant of either the methods or of what our species has learned about nature. My opinions are the results of those investigations.

    Of course there are people with “different beliefs.” But yes, this is supposed to be science, a quest for understanding. Not merely “beliefs.”

  5. pec says:

    No competent researcher would accept a hypothesis based on one experiment. That is the purpose of meta-analyses. There is no need to introduce subjective judgments about whether an idea seems plausible to you.

  6. pec says:

    [this is supposed to be science, a quest for understanding. Not merely “beliefs.”]

    Yes, I thought that’s what I was saying.

  7. daedalus2u says:

    pec, I have the hypothesis that if I drop an object 10,000 times on February 24, 2008, I can predict it will fall to the ground at least 99.99% of the time. I do a single experiment, on February 23, 2008, by dropping the object and it falls to the ground. I consider that my hypothesis is confirmed and that I don’t need to do any more experiments.

    How many times would you need to drop the object on February 23, 2008 to be willing to bet that it would fall at least 99.99% of the time?

    Of course I do have some prior expectations that my hypothesis will be confirmed correct. You may call it subjective judgment, I call it prior theoretical plausibility. Dropping it once confirms that the object has a density greater than air. My theoretical understanding of the physics of falling objects gives me a very high degree of confidence that the object will fall the next day too.

  8. pec says:

    I SAID that we can rely on meta-analyses and therefore do not need to bother with prior probabilities. There is no need to introduce even more subjective judgments into science!

  9. BlazingDragon says:

    Pec, your arguments are bunk. You claim that science is ignoring real “theories” as to why woo-based crap works. The problem is, they are not “theories.” They are the opinion of people who are ignorant of basic physics, chemistry, and biology. Just because someone states that something is true, doesn’t mean that it is.

    Dr. Atwood clearly stated that the method of derivation of probabilities has to be transparent (and therefore subject to criticism and revision, if necessary). If someone has a crack-pot estimation of “probability” that is based on theories that clash (in a major way) with well-supported physical rules of nature, then it isn’t a “probability,” it is someone pulling a number out of their rear-end and claiming it is a probability.

    Woo-based therapies that are based on “hypotheses” that contradict basic laws of nature (things that have been observed, tested, re-tested, etc. for anywhere from 50-200+ years) are NOT scientifically valid, and no matter how many times you state that “scientists are ignoring X” (where X is some fatally flawed statement that is blatantly contradicted by modern science), it will not become true.

    If you want to believe your pet “theories,” by all means, do so. Just don’t expect any scientist to accept them and don’t whine when you are laughed at or ridiculed by people who use logic and reason, not belief, to determine reality.

  10. BlazingDragon says:

    P.S. As a post-script, Einstein’s theories were first thought of as nutty by a lot of physicists. But his ideas have been tested many many times and found to be accurate every single time. Scientists have tested many of his theories to incredible precision and he’s STILL right. This is a great example of someone who had a “nutty” idea, but he put down equations that could be used to devise real experiments and make firm predictions as to what would happen in those experiments. Most people had to accept (grudgingly in some cases) that his theories were highly likely to be true as the number of experimental verifications of predictions based on his theories continued to increase…

    This is a classic example of proving the “establishment” dead wrong and a great example of how to do it. Come up with the equivalent of E= mc^2 for your favorite woo-based therapy and make some hard-and-fast predictions. Someone, somewhere will do the trial. This is how science is done. It isn’t done because you whine that people don’t take crazy theories seriously.

  11. daedalus2u says:

    BlazingDragon, well said. Just to emphasize your point, if a “hypothesis” is contradicted by data, it isn’t a “hypothesis” any more, it is a wrong idea.

  12. pec says:

    Controversies in medicine, and many other fields, are very rarely settled as clearly as Einstein’s theory of relativity. Even in physics you don’t often see anything that dramatic and conclusive.

    So anti-CAM activists will often point to Einstein’s theory as the prototype of good science, the example everyone should follow. But mainstream medical science can’t follow the example either, in most cases.

    When battling against CAM, at least use examples from medical research, rather than physics. And don’t select the rare experiments that have definitive results.

  13. BlazingDragon says:


    Your ignorance of physics is showing. Einstein’s theory wasn’t (and still isn’t) one experiment. He did a hell of a lot more than predict E = mc^2. This equation is the simplest of his predictions, but the fact that it is true has a hell of a lot of ramifications beyond this equation (if this equation is true, many other things are also true). He also predicted a lot of other stuff (the math on these is much more complicated, which is why you never see them in the popular media) and they were all correct as well, down to incredible precision.

    His theory (even though it is from physics) is a perfect example of how ground-breaking and paradigm-changing science should be done. Since Einstein’s theories were so radical, they required a LOT of proof, in multiple ways, before they were widely accepted. The reason they were accepted isn’t because Einstein is some sort of “physics God” that everyone believes just because… the reason they were accepted is that they have proven true, time and time again, no matter how they are tested, no matter to what degree of precision/accuracy they are tested.

    All in all, a perfect example and a standard CAM will not meet because most of CAM “theory” is based on stuff that is wildly out of sync with biology, chemistry, and physics.

  14. pec says:

    I never said anything about Einstein’s theory being “one experiment.”

    I said you should compare CAM research to mainstream medical research. Medical research of any kind is usually fuzzy and ambiguous, unlike certain areas of physics.

    When you compare CAM to the aspects of Einstein’s theories that were settled clearly, then you are making an irrelevant comparison. You are just trying to argue that CAM is bad, and you are not concerned with being rational or fair.

  15. wertys says:


    Just because medical research has areas where uncertainties exist does not mean that we should lower the bar regarding the quality of research which is acceptable. It is not as though there are no areas where knowledge is fairly certain and predictions can be made. The problem with sCAM research is that it usually fails to make predictions which can be tested, and rationalizes post hoc the results that it doesn’t like. Until sCAM practitioners accept this they cannot have a seat at the science table.

  16. skidoo says:

    This reminds me of the conjunction fallacy. That is, the more details we add to a hypothesis, the lower the probability that it’s true.

    Premise: The primary ingredients of Doritos are salt and corn flour.

    Hypothesis A: Doritos are delicious (p).
    Hypothesis B: Doritos are delicious (p) AND they come in a blue bag (q).

    We assign a high probability to p, say .99, and we even assign the same high probability to q.

    .99 x .99 = .9801

    So the probability of both conditions being true is less than the probability of either one being true.

    Extraordinary claims require extraordinary evidence. I’ve heard that somewhere before….
