## Prior Probability: The Dirty Little Secret of “Evidence-Based Alternative Medicine”

This is actually the second entry in this series;† the first was Part V of the Homeopathy and Evidence-Based Medicine series, which began the discussion of why Evidence-Based Medicine (EBM) is not up to the task of evaluating highly implausible claims. That discussion made the point that EBM favors equivocal clinical trial data over basic science, even if the latter is both firmly established and refutes the clinical claim. It suggested that this failure in calculus is not an indictment of EBM’s originators, but rather was an understandable lapse on their part: it never occurred to them, even as recently as 1990, that EBM would soon be asked to judge contests pitting low-powered, bias-prone clinical investigations and reviews against facts of nature elucidated by voluminous and rigorous experimentation. Thus although EBM correctly recognizes that basic science is an *insufficient* basis for determining the safety and effectiveness of a new medical treatment, it overlooks its *necessary* place in that exercise.

This entry develops the argument in a more formal way. In so doing it advocates a solution to the problem that has been offered by several others, but so far without real success: the adoption of Bayesian inference for evaluating clinical trial data.

Many readers will recognize that the term “prior probability” comes from Bayesian statistical analysis. They may correctly conclude that at least part of the reason to prefer Bayesian over “frequentist” statistical evaluations of clinical trials—which have been dominant throughout the careers of every physician now alive—is that the former require considering evidence external to the trial in question. That, of course, is what we should be doing in any case, but it helps to have a formal reminder. Bayes’ Theorem shows how our existing view (the prior probability) of the truth of a matter can be altered by new experimental data. Prior probability must be estimated from all existing evidence: basic science, previous clinical trials, funding sources, investigators’ identities and histories, and other factors. How conclusions based on such evidence might be altered by new data is illustrated by this statement of Bayes’ Theorem:

**P(A|B) = P(B|A) × P(A) / P(B)**

Where:

*P* stands for probability;

*A* is the hypothesis in question;

| stands for “given”; and

*B* is the data generated by the trial at hand.

Thus P(A|B), the probability of the hypothesis given the data (also called the “posterior probability”), is proportional to P(B|A), the probability of the data given the hypothesis, and also to P(A), the “prior probability” of the hypothesis. P(A|B) is *inversely* proportional to P(B), the probability of producing the data.

We might not know P(B), but it is a constant. Thus on the right side of the equation we can direct our attention to the terms in the numerator, which predict certain things: if the prior probability of a hypothesis is high, it will not require much in the way of confirming data to reassure us of that opinion. If the prior probability of a hypothesis is small, it will require a large amount of credible, confirming data to convince us to take it seriously. If the prior probability is exceedingly small, it will require a massive influx of confirming data to convince us to take it seriously (yes, extraordinary claims really do require extraordinary evidence). The simplest result, albeit one that many find discomfiting, is found if P(A) approaches zero: no amount of “confirming data”—especially of the error-prone sort generated by a clinical trial—should convince us to accept the hypothesis.
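The qualitative predictions above can be made concrete with a small sketch, assuming a hypothetical trial whose data are four times as probable under the hypothesis as under its negation (the function name and all numbers below are mine, purely for illustration):

```python
# Bayes' Theorem: P(A|B) = P(B|A) * P(A) / P(B).
# P(B) is expanded by the law of total probability.
# All numbers are hypothetical, chosen only to show how the prior
# drives the posterior when the trial data are held fixed.

def posterior(prior, p_data_given_h, p_data_given_not_h):
    """Posterior probability of hypothesis A given data B."""
    p_data = p_data_given_h * prior + p_data_given_not_h * (1 - prior)
    return p_data_given_h * prior / p_data

# The same "positive" trial (likelihood ratio 0.8/0.2 = 4) applied
# to a plausible, an implausible, and a homeopathy-grade prior:
for prior in (0.5, 0.05, 0.0001):
    post = posterior(prior, p_data_given_h=0.8, p_data_given_not_h=0.2)
    print(f"prior {prior:g} -> posterior {post:.4f}")
```

The same "confirming" data lift an even-odds prior to 0.8, but barely move a prior near zero: exactly the behavior described above.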

It turns out that the last assertion, although undeniably true, is not necessary to make the case for the superiority of Bayesian statistics in clinical research. I didn’t know that until I had read the following two articles:

1. Goodman SN. Toward evidence-based medical statistics. 1: The P value fallacy. Ann Intern Med. 1999;130:995-1004.

2. Goodman SN. Toward evidence-based medical statistics. 2: The Bayes factor. Ann Intern Med. 1999;130:1005-1013.

Dr. Goodman makes the arguments for Bayesian inference in far more compelling ways than I can. Thus after a brief introduction I’ll quote him liberally—and ask his forgiveness in advance for whatever embarrassing inaccuracies I will be or may have already been guilty of spouting.

Dr. Goodman observes that it is the subjective nature of prior probabilities (“measuring ‘belief’ ”) that explains why clinical trial literature has shied away from Bayesian statistics, instead favoring the familiar “frequentist statistics” with its P values, confidence intervals, and hypothesis tests: tools that are widely assumed to provide objective measures of evidence for hypotheses by looking exclusively at data from trials. Those tools don’t provide such objective measures, however, nor can they. As any scientist and most physicians know, it is foolish to evaluate trial results without considering external knowledge. What most don’t know, however, is that “frequentist statistics” are irrational tools for the job that they have been assigned to do, and that they include methods that are not compatible even with each other. These points are introduced in the abstract of the first article cited above:

> An important problem exists in the interpretation of modern medical research data: Biological understanding and previous research play little formal role in the interpretation of quantitative results. This phenomenon is manifest in the discussion sections of research articles and ultimately can affect the reliability of conclusions. The standard statistical approach has created this situation by promoting the illusion that conclusions can be produced with certain “error rates,” without consideration of information from outside the experiment. This statistical approach, the key components of which are P values and hypothesis tests, is widely perceived as a mathematically coherent approach to inference. There is little appreciation in the medical community that the methodology is an amalgam of incompatible elements, whose utility for scientific inference has been the subject of intense debate among statisticians for almost 70 years. This article introduces some of the key elements of that debate and traces the appeal and adverse impact of this methodology to the P value fallacy, the mistaken idea that a single number can capture both the long-run outcomes of an experiment and the evidential meaning of a single result. This argument is made as a prelude to the suggestion that another measure of evidence should be used—the Bayes factor, which properly separates issues of long-run behavior from evidential strength and allows the integration of background knowledge with statistical findings.

The “intense debate” that Dr. Goodman refers to is over a problem central to science: that of “inductive” vs. “deductive” reasoning. As many will recall from college philosophy courses, “inductive” reasoning uses observations to generate hypotheses: if the first 10,000 swans one sees are white, then a reasonable (tentative) hypothesis is that all swans are white. That is the way science, including clinical trials, usually works. The obvious problem with it is that it can’t be conclusive: the 10,001st swan might be black. “Deductive” reasoning begins with a principle and makes predictions: if at least some swans are white, then the next one we see has a probability > zero of being white. Deductive reasoning is logically sound, but has obvious limitations as a tool for learning about nature.

It turns out that “frequentist statistics” not only lacks a formal way to consider external evidence, but is inappropriate for evaluating clinical trials for a more fundamental reason: it applies only to deductive inference. Thus

> …when physicians are presented with a single-sentence summary of a study that produced a surprising result with P = 0.05, the overwhelming majority will confidently state that there is a 95% or greater chance that the null hypothesis is incorrect. This is an understandable but categorically wrong interpretation because the P value is calculated on the assumption that the null hypothesis is true. It cannot, therefore, be a direct measure of the probability that the null hypothesis is false.
>
> This logical error reinforces the mistaken notion that the data alone can tell us the probability that a hypothesis is true. (emphasis added)

On the other hand,

> Determining which underlying truth is most likely on the basis of the data is a problem in inverse probability, or inductive inference, that was solved quantitatively more than 200 years ago by the Reverend Thomas Bayes.

Dr. Goodman explains this expertly, but for me it still required a few reads before it began to sink in. (Please, dear reader, if you hope to see Evidence-Based Medicine become synonymous with Science-Based Medicine, tackle these articles.) The final sentence in the abstract above introduces the second point about Bayesian statistics that I had not previously appreciated, which follows from its application to inductive inference: it is not necessary to dwell on Prior Probability estimates to appreciate the superiority of the Bayesian method. Another term in the theorem, known as the Bayes Factor, is calculated entirely from objective data but is a more useful and accurate “measure of evidence” than the familiar “P value.” The Bayes Factor is illustrated in this statement of Bayes’ Theorem (from Dr. Goodman’s second article):

**(Prior Odds of Null Hypothesis) X (Bayes Factor) = Posterior Odds of Null Hypothesis**

Where Bayes factor = Prob(Data, given the null hypothesis) / Prob(Data, given the alternative hypothesis)

The abstract of Goodman’s second article continues the discussion (emphasis added):

> Bayesian inference is usually presented as a method for determining how scientific belief should be modified by data. Although Bayesian methodology has been one of the most active areas of statistical development in the past 20 years, medical researchers have been reluctant to embrace what they perceive as a subjective approach to data analysis. It is little understood that Bayesian methods have a data-based core, which can be used as a calculus of evidence. This core is the Bayes factor, which in its simplest form is also called a likelihood ratio. The minimum Bayes factor is objective and can be used in lieu of the P value as a measure of the evidential strength. Unlike P values, Bayes factors have a sound theoretical foundation and an interpretation that allows their use in both inference and decision making.
>
> Bayes factors show that P values greatly overstate the evidence against the null hypothesis. Most important, Bayes factors require the addition of background knowledge to be transformed into inferences—probabilities that a given conclusion is right or wrong. They make the distinction clear between experimental evidence and inferential conclusions while providing a framework in which to combine prior with current evidence. (emphasis added)
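Goodman’s second article gives a simple bound for this: for a normally distributed test statistic, the minimum Bayes factor (the strongest possible evidence against the null that a given P value allows) is exp(−z²/2). A minimal sketch of that bound and of the odds form of Bayes’ Theorem, with function names of my own choosing:

```python
# Goodman's minimum Bayes factor, exp(-z^2 / 2): the *best case*
# for the alternative hypothesis that a given two-sided P value allows.
# Function names are mine; the formula is from Goodman (1999), part 2.
from math import exp
from statistics import NormalDist

def min_bayes_factor(p_value):
    """Smallest possible Bayes factor for a two-sided P value."""
    z = NormalDist().inv_cdf(1 - p_value / 2)  # z-score for the P value
    return exp(-z * z / 2)

def posterior_odds(prior_odds_null, bayes_factor):
    """Odds form of Bayes' Theorem: prior odds x Bayes factor."""
    return prior_odds_null * bayes_factor

bf = min_bayes_factor(0.05)
print(f"minimum Bayes factor at P = 0.05: {bf:.3f}")  # ~0.15, about 1/6.8
# Even prior odds on the null shrink only to ~0.15:1 -- far short of
# the 1-in-20 that the "95% chance" misreading of P = 0.05 suggests.
print(f"posterior odds of null: {posterior_odds(1.0, bf):.3f}")
```

Even under the most charitable possible reading, P = 0.05 corresponds to odds of only about 7:1 against the null, which is exactly Goodman’s point that P values overstate the evidence.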

At this point I must stop, but let me suggest a fun project: pick a “CAM” study of an implausible hypothesis such as homeopathy, “distant healing,” or whatever, that has been evaluated by “frequentist statistics” and purports to demonstrate an effect “significant at P=.04” or so. Now, using your new knowledge of inductive inference, re-evaluate the data using a few points from a range of prior odds of the null hypothesis being true, say from 8 to 1 up to 99,999 to 1 (odds that are far more favorable to homeopathy, for example, than established knowledge warrants). You needn’t even make calculations: both Goodman and Ioannidis provide tables and nomograms that can help you estimate the answers.
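For those who prefer to compute rather than read off a nomogram, the exercise can be sketched in a few lines, again granting the trial data the most charitable evidence measure (Goodman’s minimum Bayes factor, exp(−z²/2)); all names and numbers here are illustrative, not from any particular study:

```python
# The suggested exercise: take a result "significant at P = 0.04,"
# grant it the most charitable evidence measure (Goodman's minimum
# Bayes factor, exp(-z^2 / 2)), and apply it across a range of prior
# odds that the null hypothesis is true.
from math import exp
from statistics import NormalDist

def min_bayes_factor(p_value):
    z = NormalDist().inv_cdf(1 - p_value / 2)
    return exp(-z * z / 2)

bf = min_bayes_factor(0.04)  # roughly 0.12
for prior_odds_null in (8, 1_000, 99_999):
    post_odds = prior_odds_null * bf          # odds form of Bayes' Theorem
    p_null = post_odds / (1 + post_odds)      # odds -> probability
    print(f"prior odds {prior_odds_null:>6}:1 -> P(null) = {p_null:.4f}")
```

Even at the generous 8:1 prior, the null hypothesis remains about a coin flip; at homeopathy-grade priors it remains a near certainty, P = 0.04 notwithstanding.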

Here is a short additional bibliography, including a couple of shameless pitches for offerings by two of your humble bloggers:

1. Browner W, Newman T. Are all significant P values created equal? The analogy between diagnostic tests and clinical research. JAMA. 1987;257:2459-63.

2. Stalker DF. Evidence and alternative medicine. Mt Sinai J Med. 1995;62(2):132-143.

3. Brophy JM, Joseph L. Placing trials in context using Bayesian analysis. GUSTO revisited by Reverend Bayes. JAMA. 1995;273:871-5.

4. Lilford R, Braunholtz D. The statistical basis of public policy: a paradigm shift is overdue. BMJ. 1996;313:603-7.

5. Freedman L. Bayesian statistical methods [Editorial]. BMJ. 1996;313:569-70.

6. Sampson WI, Atwood KC IV. Propagation of the absurd: demarcation of the absurd revisited. Med J Aust. 2005;183(11/12):580-581.

7. Atwood KC IV. Prior Probability: the dirty little secret of “Evidence-Based Alternative Medicine.” Talk given at the 11th European Skeptics Congress. London, September 5-7, 2003

---

**† The Prior Probability, Bayesian vs. Frequentist Inference, and EBM Series:**

1. Homeopathy and Evidence-Based Medicine: Back to the Future Part V

2. Prior Probability: The Dirty Little Secret of “Evidence-Based Alternative Medicine”

3. Prior Probability: the Dirty Little Secret of “Evidence-Based Alternative Medicine”—Continued

4. Prior Probability: the Dirty Little Secret of “Evidence-Based Alternative Medicine”—Continued Again

5. Yes, Jacqueline: EBM ought to be Synonymous with SBM

6. The 2nd Yale Research Symposium on Complementary and Integrative Medicine. Part II

7. H. Pylori, Plausibility, and Greek Tragedy: the Quirky Case of Dr. John Lykoudis

10. Of SBM and EBM Redux. Part I: Does EBM Undervalue Basic Science and Overvalue RCTs?

11. Of SBM and EBM Redux. Part II: Is it a Good Idea to test Highly Implausible Health Claims?

12. Of SBM and EBM Redux. Part III: Parapsychology is the Role Model for “CAM” Research

13. Of SBM and EBM Redux. Part IV: More Cochrane and a little Bayes

14. Of SBM and EBM Redux. Part IV, Continued: More Cochrane and a little Bayes

15. Cochrane is Starting to ‘Get’ SBM!

16. What is Science?


Layman’s question: How does one derive an accurate quantity for the prior plausibility value?

Very nice article. Now I understand why this blog is called Science Based Medicine.

The obvious solution is to require discussion of the a priori probabilities before the data from the study is taken. For a clinical trial, the IRB would have to demand that the a priori probabilities of benefit and harm be within certain ranges. That is really where the a priori probabilities have to be taken into account, before people are subjected to a clinical trial which might help or hurt them.

Journals and reviewers need to start demanding that a discussion of the a priori probability be included in any clinical trial article. Perhaps as supplemental information, but it should be required.

I think the a priori probability of this being accepted by investigators, journals, reviewers, IRBs and funding agencies (government and corporate) is very small.

This type of statistical framework could be used to incorporate even anecdotal “data”. It doesn’t give it the weight that the CAM supporters would like, but it doesn’t give it zero (or negative weight) that many self-described “scientific-purists” do. In thinking about it, clinical trials should try to avoid both false positive and false negative errors. The way to do this is to calculate how the trial data changes the a priori probabilities of each.

The prior probability is discussed in an informal manner in the introduction to a study, when the researchers essentially present the justification for performing this particular study. I think part of the reason that this issue doesn’t get much play when it comes to regular medical research is that it doesn’t make a difference if your a priori assumption is that of even odds – an assumption which is reasonable much of the time.

Linda

I had hoped to wait a bit before adding my two cents, but:

I’ll write a short addendum over the weekend addressing the issue of estimating prior probabilities. The short answer: precise values are not possible for hypotheses of the sorts usually involved in clinical trials, but precision isn’t as important as range, for which there can be wide agreement.

Linda’s point is essentially the same that I made in “Homeopathy Part V” regarding EBM’s failure to account for plausibility. “Even odds,” however, are higher than is usually justified even for plausible hypotheses, as I’ll explain in the addendum, and although reasonable prior odds (nevertheless < even) are usually sufficient to justify performing a study, they still need to be considered (they make a difference) when evaluating the results.

Skidoo,

This table from Ioannidis provides a rough guide for prior probabilities for different types of research.

http://medicine.plosjournals.org/perlserv/?request=slideshow&type=table&doi=10.1371/journal.pmed.0020124&id=4104

The column headed ‘R’ is essentially the prior odds (probability=odds/(1+odds)) and the PPV is essentially your posterior probability.
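For readers converting between the two forms, a trivial sketch of the odds/probability relationship cited above:

```python
# Odds <-> probability, as used for Ioannidis' column R:
# probability = odds / (1 + odds); odds = probability / (1 - probability).

def odds_to_prob(odds):
    return odds / (1 + odds)

def prob_to_odds(prob):
    return prob / (1 - prob)

print(odds_to_prob(1.0))   # even odds = probability 0.5
print(odds_to_prob(0.25))  # odds of 1-to-4 = probability 0.2
```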

And it sounds like Kimball Atwood will add to this over the weekend.

Linda

@Linda:

Thanks. That’s interesting. So there has to be consensus about the robustness of the study that’s already been done on the subject? Or at least, those factoring in prior plausibility have to be prepared to defend their position regarding the range of values they selected.

““Even odds,” however, are higher than is usually justified even for plausible hypotheses, as I’ll explain in the addendum, and although reasonable prior odds (nevertheless < even) are usually sufficient to justify performing a study, they still need to be considered (they make a difference) when evaluating the results.”

I don’t disagree with you. It was my attempt to forestall the criticism that CAM is being held to standards that EBM ignores when it comes to conventional medicine. I wished to point out that prior probability is taken into account as the type of evidence which is given a grade of ‘A’ also tends to have even (or better) odds, while Grade B or lower evidence corresponds to the kind of research that has less than even odds. That doesn’t mean that we wouldn’t be better off making prior probability explicit though, for the reasons you have outlined in greater detail.

Linda

Skidoo,

I am going to wait for Kimball Atwood’s addendum before I elaborate, as I don’t want to talk at cross-purposes.

Linda

I just realized I keep writing “prior plausibility,” when I should be writing “prior probability.” I understand the difference, but my brain is being uncooperative.

A very interesting example for a number of people might be estimating the prior probability for Marshall and Warren’s early work on Helicobacter pylori and its impact on gastroduodenal management. I frequently have Marshall quoted to me as a variation on the Galileo gambit, so establishing whether he and Warren would have been helped or hindered by Bayesian techniques would be useful.

I have no doubt Marshall and Warren would have been helped. Their hypothesis was consistent with all the data of ulcer management and treatment. It simply wasn’t consistent with the prior interpretation of that data which happened to be wrong.

This type of thing is something I spend a lot of time thinking about because it reflects some of the difficulties I am having trying to advance my research on the health effects of commensal nitric oxide generating bacteria. People who believe in EBM won’t talk to me because I don’t have any clinical data. I can’t get any clinical data without funding and I can’t get any funding without clinical data. I can’t bring myself to talk with anyone not doing EBM (i.e. woo).

It doesn’t matter that my hypothesis (that these bacteria are an important and positive factor in health) fits the data in the literature better than the default hypothesis that they are not important. All the diseases and disorders they are (extremely likely to be) a factor in have prior interpretations of what is important and these bacteria are not considered a factor (simply because they have never been considered). It conflicts with many ideas that people have (which happen to be wrong) such as all bacteria are bad, being clean is good and good for you, and all disorders are a breakdown of homeostasis.

My estimate of the prior probability of these bacteria being important in some things is very high (90% or greater based on my understanding of the literature). The prior probability of these bacteria being harmful is very low (less than 0.01%), considering that there has never been a reported infection, these bacteria lack all virulence factors, these bacteria are extremely common in the environment, and many municipal water supplies are abundant sources of them.

I know that eventually I will be shown to be correct, and then there will be the Daedalus gambit for purveyors of woo to fall back on. That is actually one of the things I find most distressing.

Wait a minute…

1. How is this not just a dressed-up way to refuse to accept evidence based on your preconceptions? (Since, in most cases, you just pick a prior P that you think is “generous.”) Suppose, hypothetically, that some new altmed treatment came out that sounded utterly ludicrous, but then was found to be effective in study after study, and replicated under controlled conditions by skeptical researchers. You would simply look at that research and say “well, I estimate the prior P of this treatment working as epsilon, since it sounds so stupid, therefore these studies don’t prove anything.” Why should an impartial observer accept your post-hoc calculation of prior P (which you can of course set as low as you like in order to make any evidence worthless) as carrying more weight than the studies themselves?

2. Why is this even necessary? I had been under the impression that homeopathy had no sound studies supporting its claims even without introducing (rightfully) low prior P estimates. Is this just a way to attack alt med treatments that don’t violate known laws of physics? (Such as herbal medicine, which actually has some treatments that work, since, you know, some herbs contain beneficial compounds.)

Obviously I didn’t get the “addendum” up over the weekend, and now that you commenters have made several more good points I can tell it’ll take a couple of weeks worth of posts just to address them all. But I promise to do so.