This is an addendum to my previous entry on Bayesian statistics for clinical research.† After that posting, a few comments made it clear that I needed to add some words about estimating prior probabilities of therapeutic hypotheses. This is a huge topic that I will discuss briefly. In that, happily, I am abetted by my own ignorance. Thus I apologize in advance for simplistic or incomplete explanations. Also, when I mention misconceptions about either Bayesian or “frequentist” statistics, I am not doing so with particular readers in mind, even if certain comments may have triggered my thinking. I am quite willing to give readers credit for more insight into these issues than might be apparent from my own comments, which reflect common, initial difficulties in digesting the differences between the two inferential approaches. Those include my own difficulties, after years of assuming that the “frequentist” approach was both comprehensive and rational—while I had only a cursory understanding of it. That, I imagine, placed me well within two standard deviations of the mean level of statistical knowledge held by physicians in general.
Opinions are inescapable
First, a couple of general observations. As Steven Goodman and other Bayesian advocates have argued, we do not now avoid subjective beliefs regarding clinical hypotheses, nor can we. There is not a legitimate choice between making subjective estimates of prior probabilities and relying exclusively on “objective” data, notwithstanding the wishes of those who zealously cling to the “frequentist” school. The inescapable fact is that there is no logical resolution to the age-old epistemological dilemma of induction vs. deduction. “Frequentist” statistical tools, designed for deductive inference, do not address the question of how data from a particular experiment affect the probability of a hypothesis being correct—an inductive exercise.
The simplest illustration of the problem, as discussed by Goodman and others, is the “P-value fallacy”: while most physicians and many biomedical researchers think that a “P” of 0.05 for a clinical trial means that there is only a 5% chance that the null hypothesis is true, that is not the case. Here is what “P=0.05” actually means: if many similar trials are performed testing the same novel hypothesis, and if the null hypothesis is true, then results at least as extreme as those observed will turn up, and the null will be falsely rejected, in 5% of those trials. For any single trial, it doesn’t tell us much.
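For readers who like to see the long-run meaning of “P=0.05” in action, here is a minimal simulation sketch. Everything in it is an assumption made for illustration and is not drawn from Goodman or any real trial: two arms of 50 patients, outcomes drawn from the same normal distribution (so the null is true by construction), a known standard deviation of 1, and a simple two-sided z-test. The function names are hypothetical.

```python
import random
from statistics import NormalDist


def two_sided_p(group_a, group_b, sigma=1.0):
    """Two-sided z-test P-value for a difference in means (sigma assumed known)."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a = sum(group_a) / n_a
    mean_b = sum(group_b) / n_b
    standard_error = sigma * (1 / n_a + 1 / n_b) ** 0.5
    z = (mean_a - mean_b) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_rejection_rate(n_trials=2000, n_per_arm=50, alpha=0.05, seed=1):
    """Fraction of simulated null trials (no real effect) that reach P < alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_trials):
        # Both arms come from the same distribution: the null is true.
        treated = [rng.gauss(0, 1) for _ in range(n_per_arm)]
        control = [rng.gauss(0, 1) for _ in range(n_per_arm)]
        if two_sided_p(treated, control) < alpha:
            rejections += 1
    return rejections / n_trials


if __name__ == "__main__":
    print(f"Share of null trials with P < 0.05: {false_rejection_rate():.3f}")
```

The printed share should hover near 0.05. Note what the simulation does not tell us: the probability that the null is true for any one trial that happened to reach P < 0.05. That question needs a prior.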
“Non-informative priors”: Bayes factors are the new P-values
Turning to the matter at hand, it’s important to reiterate that the usefulness of Bayesian analysis does not depend exclusively on unique or known prior probabilities, or even on ranges that a group of experts might agree upon. The power of Bayes’ Theorem is to show how data from an investigation alter any prior probability to generate a new, “posterior probability.” Thus one class of prior probabilities, dubbed “non-informative” or “reference priors,” is arbitrary. As discussed in Goodman’s second article (pp. 1006-8), their usefulness stems from a term in Bayes’ Theorem that is entirely objective: the Bayes Factor. Recall that the Bayes Factor is a “likelihood ratio,” which in its simplest form compares how well the data reflect the best supported hypothesis to how well the data reflect the null hypothesis. This means that the Bayes Factor can be the useful “measure of evidence” for a single trial that most of us had previously assumed, incorrectly, the “P-value” to be. Goodman advocates this use and develops an “exchange rate” between the two values (p. 1007).
This can be illustrated by an example. A few years ago a report concluded: “we found that supplementary, remote, blinded, intercessory prayer produced a measurable improvement in the medical outcomes of critically ill patients.” The effect was said to be demonstrated by lower “CCU course scores” in the prayed-for group compared to the controls, which the authors deemed “statistically significant” because “P=.04.”
Although there were numerous flaws in the study, this discussion will be restricted to the P-value claim. The previous couple of paragraphs will have already alerted the reader to the fallacy of the “P=.04” assertion, but let’s put the cards on the table:
- The authors appeared to believe, and seemed to expect readers to believe, that their data showed that there was only a 4% chance of a “null” effect.
- Referring to Goodman’s assumptions for “exchange rates” between P values and Bayes Factors, P=.04 can be shown to be equivalent to a “minimum” Bayes factor (i.e., the one most consistent with the data) of slightly less than 0.15 (Table 2, p. 1008).
- Thus picking an arbitrary, “non-informative” prior probability of 50% (even odds) would yield a posterior probability (by Bayes’ Theorem) of the “null” effect of about 12%: far greater than the ≤5% that we have been taught to think of as “statistically significant” evidence of a treatment effect.
- Using the same table, it can be seen that to be truly 96% confident of a non-null effect after the study, the prior probability of “intercessory prayer”—people praying from a distance for patients they do not know and who are unaware of it—would have to have been about 70%. (That is, the prior probability of the null hypothesis would have to have been about 30%).
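The arithmetic behind the bullets above can be sketched in a few lines. One loud assumption: I am taking the “minimum” Bayes factor to be exp(-Z²/2), where Z is the two-sided normal deviate for the P-value. I believe that is the formula behind Goodman’s table, but treat it as my reading, not a quotation. The function names are hypothetical. Because the bullets use rounded table entries, the computed figures land close to, not exactly on, the numbers quoted.

```python
from math import exp
from statistics import NormalDist


def min_bayes_factor(p_value):
    """Minimum Bayes factor for a two-sided P-value, assuming exp(-Z^2 / 2)."""
    z = NormalDist().inv_cdf(1 - p_value / 2)
    return exp(-z * z / 2)


def posterior_null(prior_null, bayes_factor):
    """Bayes' Theorem in odds form: posterior odds = prior odds x Bayes factor."""
    prior_odds = prior_null / (1 - prior_null)
    posterior_odds = prior_odds * bayes_factor
    return posterior_odds / (1 + posterior_odds)


def prior_null_needed(target_posterior_null, bayes_factor):
    """Prior probability of the null that would yield the target posterior."""
    posterior_odds = target_posterior_null / (1 - target_posterior_null)
    prior_odds = posterior_odds / bayes_factor
    return prior_odds / (1 + prior_odds)


if __name__ == "__main__":
    bf = min_bayes_factor(0.04)
    print(f"Minimum Bayes factor for P=.04: {bf:.3f}")
    print(f"Posterior P(null) from an even-odds prior: {posterior_null(0.5, bf):.3f}")
    print(f"Prior P(null) needed for a 4% posterior: {prior_null_needed(0.04, bf):.3f}")
```

The odds form is used because the Bayes factor multiplies odds, not probabilities; converting back to a probability is the last step in each function.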
By starting with a “neutral” prior, using Bayesian statistics in this case demonstrates that the data did not show what the investigators concluded. Alternatively, there needn’t have been a unique prior probability estimate, or the investigators might have been asked to offer their own estimates—even if those might have seemed unrealistically “enthusiastic”—which could then have been compared to “skeptical” priors offered by others. Both of those terms appear in Bayesian literature as examples of arbitrary priors—or so I gather. Whatever the estimates, they would have been available for others to scrutinize.
Extrapolating to more reasonable prior probability realms (see below), it would have required far more dramatic findings to reach a posterior probability of ≥95% of a non-null effect. If we also impose varying estimates of bias, the posterior probabilities of a non-null effect become lower, maybe vastly lower.
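To give a feel for “far more dramatic findings,” here is a rough sketch that runs the calculation backwards: for a given prior probability of the null, how small would the Bayes factor (and the corresponding P-value) have to be to bring the posterior probability of the null down to 5%? The priors in the loop (50%, 90%, 99%) are hypothetical values chosen by me for illustration, the exp(-Z²/2) relation between P-values and minimum Bayes factors is assumed as before, and bias is ignored entirely.

```python
from math import log, sqrt
from statistics import NormalDist


def max_bayes_factor_allowed(prior_null, target_posterior_null=0.05):
    """Largest Bayes factor that still brings P(null) down to the target."""
    prior_odds = prior_null / (1 - prior_null)
    target_odds = target_posterior_null / (1 - target_posterior_null)
    return target_odds / prior_odds


def p_value_for_min_bayes_factor(bayes_factor):
    """Two-sided P-value whose minimum Bayes factor, exp(-Z^2 / 2), equals the input."""
    if bayes_factor >= 1:
        return 1.0  # no evidence against the null is required
    z = sqrt(-2 * log(bayes_factor))
    return 2 * (1 - NormalDist().cdf(z))


if __name__ == "__main__":
    # Hypothetical prior probabilities of the null, for illustration only.
    for prior_null in (0.5, 0.9, 0.99):
        bf = max_bayes_factor_allowed(prior_null)
        p = p_value_for_min_bayes_factor(bf)
        print(f"prior P(null)={prior_null:.2f}: need Bayes factor <= {bf:.5f}, "
              f"i.e. P <= {p:.5f}")
```

Even the even-odds case demands a P-value well below .04, and the required P-value shrinks by orders of magnitude as the prior for the null approaches certainty.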
Estimating “reasonable prior probability realms”—which offers the fullest use of Bayes’ Theorem for clinical trials and health care policy questions—is a topic of major interest in the field. Rather than address it myself (I’m not competent to do so), I’ll make a brief introduction and then refer readers to more informative sources. First, although there is always a subjective element, there are systematic means to estimate “priors,” and these add considerable rigor to the exercise. Second, such priors are expected to be “transparent,” i.e., the bases for their derivation must be clearly stated. This is quite different from the way subjective opinion is typically introduced now, as illustrated on p. 1002 of Goodman’s first article. Third, priors (of all types) are usually given as distributions, not as discrete numbers. Fourth, they are expected to be derived from all pertinent information. This is from the non-quantitative, online Primer on Bayesian Statistics in Health Economics and Outcomes Research (O’Hagan & Luce):
Prior information should be based on sound evidence and reasoned judgements. A good way to think of this is to parody a familiar quotation: the prior distribution should be ‘the evidence, the whole evidence and nothing but the evidence’:
- ‘the evidence’ – genuine information legitimately interpreted;
- ‘the whole evidence’ – not omitting relevant information (preferably a consensus that pools the knowledge of a range of experts);
- ‘nothing but the evidence’ – not contaminated by bias or prejudice.
A more sophisticated, quantitative treatment can be found here. Peruse the table of contents, especially chapter 5.
Most hypotheses are wrong…
There is at least one broad theme regarding estimating prior probabilities. It is virtually self-evident that most hypotheses are wrong. This follows from two premises: 1. the number of possible hypotheses is limited only by human imagination; 2. most people’s imaginations are not informed by a sophisticated appreciation for nature or science. Even those who are exceptions to the second premise are humbled by the elusiveness of fruitful insights. Consider the words of Nobel laureate Peter Medawar, one of the great immunologists of the 20th century:
It is a layman’s conclusion that in science we caper from pinnacle to pinnacle of achievement and that we exercise a method which preserves us from error. Indeed we do not; our way of going about things takes it for granted that we guess less often right than wrong…
It is likely that the multitude of less gifted scientists will guess right even less often. John Ioannidis, whose recent paper “Why most published research findings are false” is a favorite among your SBM bloggers, asserts in his article that “the majority of modern biomedical research is operating in areas with very low pre- and post-study probability for true findings.” (Hold the phone! Steven Goodman, our other hero of Bayesian analysis, disagreed with Ioannidis, although not entirely; Ioannidis replied here).
According to FDA Consumer,
More often than many scientists care to admit, researchers just have to give up when a drug is poorly absorbed, is unsafe, or simply doesn’t work. The organization Pharmaceutical Research and Manufacturers of America estimates that only 5 in 5,000 compounds that enter preclinical testing make it to human testing, and only 1 of those 5 may be safe and effective enough to reach pharmacy shelves.
Even granting a degree of self-serving exaggeration by “BigPharm,” it is clear that most proposed, biologically plausible drugs never pan out.
…Hence most priors are low (but you still have to estimate each on its own terms)
Add to all of that the ridiculous certitudes of the hopelessly naive, and it is clear that there is a surfeit of guessing wrong about how things work. Does that mean that most proposals for trials have a prior probability of, say, less than 0.5? You bet it does (much less, in most cases), although that fact does not properly figure into the estimate of any single proposal.
Later: some words about other useful comments regarding the “Galileo gambit” and more.
† The Prior Probability, Bayesian vs. Frequentist Inference, and EBM Series:
16. What is Science?