Of SBM and EBM Redux. Part IV, Continued: More Cochrane and a little Bayes

OK, I admit that I pulled a fast one. I never finished the last post as promised, so here it is.

Cochrane Continued

In the last post I alluded to the 2006 Cochrane Laetrile review, the conclusion of which was:

This systematic review has clearly identified the need for randomised or controlled clinical trials assessing the effectiveness of Laetrile or amygdalin for cancer treatment.

I’d previously asserted that this conclusion “stand[s] the rationale for RCTs on its head,” because a rigorous, disconfirming case series had long ago put the matter to rest. Later I reported that Edzard Ernst, one of the Cochrane authors, had changed his mind, writing, “Would I argue for more Laetrile studies? NO.” That in itself is a reason for optimism, but Dr. Ernst is such an exception among “CAM” researchers that it almost seemed not to count.

Until recently, however, I’d only seen the abstract of the Cochrane Laetrile review. Now I’ve read the entire review, and there’s a very pleasant surprise in it (Professor Simon, take notice). In a section labeled “Feedback” is this letter from another Cochrane reviewer, which was apparently added in August of 2006, well before I voiced my own objections:

The authors’ state that they: “[have] clearly identified the need for randomised or controlled clinical trials assessing the effectiveness of Laetrile or amygdalin for cancer treatment.” This is to fail completely to understand the nature of oncology research in which agents are tested in randomized trials (“Phase III”) only after they have been successful in Phase I and II study. There was a large Phase II study of laetrile (N Engl J Med. 1982 Jan 28;306(4):201-6) which the authors of the review do not cite, they merely exclude as being non-randomized. But the results of the paper are quite clear: there was no evidence that laetrile had any effect on cancer (all patients had progression of disease within a few months); moreover, toxicity was reported. To expose patients to a toxic agent that did not show promising results in a single arm study is clinical, scientific and ethical nonsense.

I would like to make a serious recommendation to the Cochrane Cancer group that no reviews on cancer are published unless at least one of the authors either has a clinical practice that focuses on cancer or actively conducts primary research on cancer. My recollection when the Cochrane collaboration was established was that the combination of “methodologic” and “content” expertise was essential.

Wow! That letter makes several of the same arguments that we’ve made here: that for both scientific and ethical reasons, scientific promise (including success in earlier trials) ought to be a necessary pre-requisite for a large RCT; that the 1982 Moertel case series was sufficient to disqualify Laetrile; and that EBM, at least in this Cochrane review, suffers from “methodolatry.” It also brings to mind Steven Goodman’s words:

An important problem exists in the interpretation of modern medical research data: Biological understanding and previous research play little formal role in the interpretation of quantitative results. This phenomenon is manifest in the discussion sections of research articles and ultimately can affect the reliability of conclusions. The standard statistical approach has created this situation by promoting the illusion that conclusions can be produced with certain “error rates,” without consideration of information from outside the experiment.

This method thus facilitated a subtle change in the balance of medical authority from those with knowledge of the biological basis of medicine toward those with knowledge of quantitative methods, or toward the quantitative results alone, as though the numbers somehow spoke for themselves.

Perhaps most surprising about the ‘Feedback’ letter is the identity of its author: Andrew Vickers, a biostatistician who wrote the Center for Evidence-Based Medicine’s “Introduction to evidence-based complementary medicine.” I’ve complained about that treatise before in this long series, observing that

There is not a mention of established knowledge in it, although there are references to several claims, including homeopathy, that are refuted by things that we already know.

Well, Dr. Vickers may not have considered plausibility when he wrote his Intro to EBCM, but he certainly seems to have done so when he wrote his objection to the Cochrane Laetrile review. Which is an appropriate segue to a topic that Dr. Vickers hints at (“content expertise”), perhaps unintentionally, in the letter quoted above: Bayesian inference.

Bayes Revisited

A few years ago I posted three essays about Bayesian inference: they are linked below (nos. 2-4). The salient points are these:

  1. Bayes’s Theorem is the solution to the problem of inductive inference, which is how medical research (and most science) proceeds: we want to know the probability of our hypothesis being true given the data generated by the experiment in question.
  2. Frequentist inference, which is typically used for medical research, applies to deductive reasoning: it tells us the probability of a set of data given the truth of a hypothesis. To use it to judge the probability of the truth of that hypothesis given a set of data is illogical: the fallacy of the transposed conditional.
  3. Frequentist inference, furthermore, is based on assumptions that defy reality: that there have been an infinite number of identically designed, randomized experiments (or other sort of random sampling), without error or bias.
  4. Bayes’s Theorem formally incorporates, in its “prior probability” term, information other than the results of the experiment. This is the sticking point for many in the EBM crowd: they consider prior probability estimates, which are at least partially subjective, to be arbitrary, capricious, untrustworthy, and—paradoxically, because it is science that is ignored in the breach—unscientific.
  5. Nevertheless, prior probability matters whether we like it or not, and whether we can estimate it with any certainty or not. If the prior probability is high, even modest experimental evidence supporting a new hypothesis deserves to be taken seriously; if it is low, the experimental evidence must be correspondingly robust to warrant taking the hypothesis seriously. If the prior probability is infinitesimal, the experimental evidence must approach infinity to warrant taking the hypothesis seriously.
  6. Frequentist methods lack a formal measure of prior probability, which contributes to the seductive but erroneous belief that “conclusions can be produced…without consideration of information from outside the experiment.”
  7. The Bayes Factor is a term in the theorem that is based entirely on data, and is thus an objective measure of experimental evidence. Bayes factors, in the words of Dr. Goodman, “show that P values greatly overstate the evidence against the null hypothesis.”
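Points 5 and 7 can be made concrete with a small sketch. It assumes a bound that is not stated in this post: that the smallest Bayes factor (null versus alternative) a result with standard score z can produce is exp(-z²/2), a figure discussed in the Bayes-factor literature. The function names are illustrative only. The sketch combines that best-case evidence with a range of priors to show how much the starting point matters.

```python
import math


def min_bayes_factor(z):
    """Smallest Bayes factor for the null that a standard score z can yield.

    Assumption (not from the post): the bound exp(-z**2 / 2).
    """
    return math.exp(-z * z / 2.0)


def posterior_probability(prior, bayes_factor_null):
    """Posterior probability of the hypothesis under test.

    prior: prior probability that the hypothesis is true (0 < prior < 1).
    bayes_factor_null: P(data | null) / P(data | hypothesis); smaller
    values mean stronger evidence against the null.
    """
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds / bayes_factor_null
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    bf = min_bayes_factor(1.96)  # z = 1.96 corresponds to a two-sided P of 0.05
    for prior in (0.5, 0.1, 0.01, 1e-6):
        post = posterior_probability(prior, bf)
        print(f"prior {prior:g} -> posterior at most {post:.3g}")
```

Under these assumptions, a result at P = 0.05 leaves an infinitesimal prior still infinitesimal, which is the sense in which the evidence must be "correspondingly robust."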

I bring up Bayes again to respond to Prof. Simon’s statements, recently echoed by several readers, that people may differ strongly in what they consider plausible, and that it is not clear how prior probability estimates might be incorporated into formal reviews. I’ve discussed these issues previously (here and here, and in recent comments here and here), but it is worth adding a point or two.

First, it doesn’t really matter that people may differ strongly in what they consider plausible. What matters is that they commit to some range of plausibility—in public and with justifications, in the cases of authors and reviewers, so that readers will know where they stand—and that everyone understands that this matters when it comes to judging the experimental evidence for or against a hypothesis.

An example will explain these points. Wayne Jonas was the Director of the US Office of Alternative Medicine from 1995 until its metamorphosis into the NCCAM in 1999. He is the co-author, along with Jennifer Jacobs, of Healing with Homeopathy: the Doctors’ Guide (©1996), which unambiguously asserts that ultra-dilute homeopathic preparations have specific effects. Yet Jonas is also the co-author (with Klaus Linde) of a 2005 letter to the Lancet that includes this statement, prefacing his argument that homeopathy, already subjected to hundreds of clinical trials, has not been disproved and deserves further trials:

We agree that homoeopathy is highly implausible and that the evidence from placebo-controlled trials is not robust.

Bayes’s theorem shows that Jonas can’t have it both ways. Either he doesn’t really agree that homeopathy is highly implausible (which seems likely, unless he changed his mind between 1996 and 2005—oops, he didn’t); or, if he does, he needs to recognize that his statement quoted above is equivalent to arguing that the homeopathy ‘hypothesis’ has been disproved, at least to an extent sufficient to discourage further trials.

Next, does it matter that we can’t translate qualitative statements of plausibility to precise quantitative measures? Does this mean that prior probability, in the Bayesian sense, is not applicable? I don’t think so, and neither do many scientists and statisticians. Even “neutral” or “non-informative” priors, when combined with Bayes factors, are more useful than P values (see #7 above). “Informative” priors—estimated priors or ranges of priors based on existing knowledge—are both useful and revealing: useful because they show how differing priors affect the extent to which we ought to revise our view of a hypothesis in the face of new experimental evidence (see #5 above); and revealing of where authors and others really stand, and of the information that those authors have used to make their estimates.

I believe that frequentist statistics has allowed Dr. Jonas and other “CAM” enthusiasts to project a posture of scientific skepticism, as illustrated by Jonas’s words quoted above, without having to accept the consequences thereof. If convention had compelled him to offer a prior high enough to warrant further trials of homeopathy, Dr. Jonas would have revealed himself as credulous and foolish.

Finally, there is no reason that qualitative priors can’t be translated, if not precisely then at least usefully, to estimated quantitative priors. Sander Greenland, an epidemiologist and a Bayesian, explains this in regard to household wiring as a possible risk factor for childhood leukemia. First, he argues that there are often empirical bases for estimating priors:

…assuming (an) absence of prior information is empirically absurd. Prior information of zero implies that a relative risk of (say) 10^100 is as plausible as a value of 1 or 2. Suppose the relative risk was truly 10^100; then every child exposed >3 mG would have contracted leukaemia, making exposure a sufficient cause. The resulting epidemic would have come to everyone’s attention long before the above study was done because the leukaemia rate would have reached the prevalence of high exposure, or ~5/100 annually in the US, as opposed to the actual value of 4 per 100,000 annually; the same could be said of any relative risk >100. Thus there are ample background data to rule out such extreme relative risks.

The same could be said for many “CAM” methods that, while not strictly subjects of epidemiology per se, have generated ample experimental data (see homeopathy) or have been in use by enough people for enough time to have been noticed for substantial deviations from typical outcomes of universal diseases, should such deviations exist (see “Traditional [insert ethnic group here] Medicine”).

Next, Greenland has no problem with non-empirically generated priors, because these are revealing as well:

Many authors have expressed extreme scepticism over the existence of an actual magnetic-field effect, so much so that they have misinterpreted positive findings as null because they were not ‘statistically significant’ (e.g. UKCCS, 1999). The Bayesian framework allows this sort of prejudice to be displayed explicitly in the prior, rather than forcing it into misinterpretation of the data.

By “misinterpretation,” Greenland is arguing not that the “positive findings” of epidemiologic studies have proven the existence of a magnetic field effect, but that the objections of extreme skeptics must be made explicit: it is their presumed, if unstated, prior probability estimates that justify their conclusions about whether or not there is an actual magnetic field effect associated with childhood leukemia; it is not the data collection itself. Prior probability estimates put people’s cards on the table.

I recommend the rest of Greenland’s article, which is full of interesting stuff. For example, he doesn’t agree that “objective” Bayesian methods, using non-informative priors (see my point #7 above) are more useful than frequentist methods, since they are really doing the same thing:

…frequentist results are what one gets from the Bayesian calculation when the prior information is made negligibly small relative to the data information. In this sense, frequentist results are just extreme Bayesian results, ones in which the prior information is zero, asserting that absolutely nothing is known about the [question] outside of the study. Some promote such priors as ‘letting the data speak for themselves’. In reality, the data say nothing by themselves: The frequentist results are computed using probability models that assume complete absence of bias and so filter the data through false assumptions.

All for now. In the next post I’ll discuss another Cochrane review that has some pleasant surprises.

*The Prior Probability, Bayesian vs. Frequentist Inference, and EBM Series:

1. Homeopathy and Evidence-Based Medicine: Back to the Future Part V

2. Prior Probability: The Dirty Little Secret of “Evidence-Based Alternative Medicine”

3. Prior Probability: the Dirty Little Secret of “Evidence-Based Alternative Medicine”—Continued

4. Prior Probability: the Dirty Little Secret of “Evidence-Based Alternative Medicine”—Continued Again

5. Yes, Jacqueline: EBM ought to be Synonymous with SBM

6. The 2nd Yale Research Symposium on Complementary and Integrative Medicine. Part II

7. H. Pylori, Plausibility, and Greek Tragedy: the Quirky Case of Dr. John Lykoudis

8. Evidence-Based Medicine, Human Studies Ethics, and the ‘Gonzalez Regimen’: a Disappointing Editorial in the Journal of Clinical Oncology Part 1

9. Evidence-Based Medicine, Human Studies Ethics, and the ‘Gonzalez Regimen’: a Disappointing Editorial in the Journal of Clinical Oncology Part 2

10. Of SBM and EBM Redux. Part I: Does EBM Undervalue Basic Science and Overvalue RCTs?

11. Of SBM and EBM Redux. Part II: Is it a Good Idea to test Highly Implausible Health Claims?

12. Of SBM and EBM Redux. Part III: Parapsychology is the Role Model for “CAM” Research

13. Of SBM and EBM Redux. Part IV: More Cochrane and a little Bayes

14. Of SBM and EBM Redux. Part IV, Continued: More Cochrane and a little Bayes

15. Cochrane is Starting to ‘Get’ SBM!

16. What is Science? 

Posted in: Clinical Trials, Homeopathy, Medical Academia, Science and Medicine


63 thoughts on “Of SBM and EBM Redux. Part IV, Continued: More Cochrane and a little Bayes”

  1. daedalus2u says:

    I think showing actual calculations of final probabilities starting with different Bayesian priors would be informative. If you have a really high prior, then equivocal studies drop it down pretty fast. If you have a low prior, equivocal studies don’t bring it up. That is essentially a sensitivity analysis on how the prior affects the outcome.

    If you tried to find a prior that would justify more studies given a bunch of equivocal studies, you can’t.

  2. Scott says:

    One of the real obstacles to doing Bayesian analysis is that frequentist is easier. And there are more readily-available and usable tools to help with frequentist analysis, so in practice it’s MUCH easier.

    Which doesn’t even consider “that’s how it’s done.” One can’t even count on scientists to understand frequentist statistics – far too often, the answer given to “if I say that the 95% confidence interval for alpha is 1.0 to 2.0, does that mean that there is a 95% chance alpha is within that range?” is “yes.” Even in physics, which is far more mathematically based than medicine. When the level of statistical knowledge of the COMMON tools is that poor, how can we expect scientists to understand why Bayes is so important?

    This sort of cultural change will be an uphill battle. Important, but very difficult.

  3. rork says:

    Not bad. I take it as a call to map priors to posteriors, and that this should involve discussion of the reasonableness of the prior. Leukemia example was excellent.

    I’d disagree with Scott’s argument based on popularity. The common tools are actually confusing when judged as models of learning – I’m not sure they even claim to model learning at all. In class#1 for graduate biologists learning stats, I’d toss a coin one time, and we’d discuss what inferences could be made, and how we would gamble on the results of the next toss. The effects on the opposition are like depleted uranium rounds. Students would, like the reviews criticized here, cry that the real problem was that more data was needed. Yes, more data always helps, and that will never change (and get used to it) but stupid is stupid, and the stupid is just more eminent when we look at examples where data is limited. Note that I’m not claiming that for simple experiments that giving p and N’s are not sufficient statistics – yes, some folks aren’t able to interpret them perfectly, I admit.

    Can we agree that scientists need more training in this area?
    Is my doc competent to read the medical literature, even now? (Perhaps only a subset of them that will publish expert summary advice need to be Jedi, if the rest will follow them, but will they?)

  4. Scott says:

    @ rork:

    Actually, you’re reinforcing my point as opposed to disagreeing with it. The fact that the tools don’t promote learning is part of the problem! People want a simple black box whose crank they can turn and get an answer out without having to understand the statistics involved.

  5. JMB says:

    The usual application of frequentist methods in medicine also relies on randomization and patient selection to minimize effects of prior probabilities. Those same prior probabilities we are minimizing in an RCT become important factors in framing decisions for individual patients. Some of the uncertainty faced in medical practice can be reduced with Bayes methods. The adoption of Bayes methods can improve the quality of medical research, and reduce uncertainty of medical practice.

  6. Jan Willem Nienhuys says:

    @ daedalus2u on 04 Mar 2011 at 7:35 am

    I don’t know what kind of example you are thinking of, but here is one. We are dealing with a test that says ‘yes’ correctly in 80% of the cases that reality is ‘yes’.
    The same test says ‘no’ correctly in 95% of the cases that reality is ‘no’ (this corresponds to using p=0.05 as significance limit). For reasons I don’t understand this positive correct rate 80% is called the power or sensitivity of the test.

    You first calculate 80% divided by 5% (true positive rate divided by false positive rate). This is 16. Let’s call that the yes-factor of the test.

    Now calculate 20% divided by 95% . This is 1 over 4.75. It is the false negative rate divided by the true negative rate.

    Next you need the prior odds. Odds means the ratio [yes] : [no]. I like using ratios rather than quotients because I may want to talk about 0:1 or 1:0.
    The square brackets mean: probability of. Before you start you have a prior probability of, say, 0.2 . Now do the test. It comes out ‘yes’.
    Multiply 0.2 times 16, equals 3.2
    For safety you do another test, it also says yes. Multiply again by 16. Gives 51.2.
    So now the posterior odds are 51.2. That amounts to [no] = 1/52.2. Each time the test comes out ‘yes’, the odds go up by a factor of 16. Even if the prior odds are one millionth, ten ‘yes’ test answers will raise the odds to one million.

    But if the test says ‘no’, we have to use the other number. The prior odds are divided by 4.75. After one step they are 0.0421… , after the next step (assuming another ‘no’) they are 0.00886… which is roughly also the posterior [yes] (more precisely 0.00878…).

    So if the tests keep saying ‘no’ the posterior odds keep dropping.
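The arithmetic in this comment can be sketched in a few lines of code. The function names are made up for illustration; the 80% and 95% figures and the prior odds of 0.2 are the commenter's. The sketch simply multiplies the odds by the yes-factor or the no-factor once per test result.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (yes_factor, no_factor) for a test with two outcomes."""
    # True positive rate divided by false positive rate.
    yes_factor = sensitivity / (1.0 - specificity)
    # False negative rate divided by true negative rate.
    no_factor = (1.0 - sensitivity) / specificity
    return yes_factor, no_factor


def update_odds(prior_odds, results, sensitivity=0.80, specificity=0.95):
    """Multiply the odds by the matching factor for each 'yes' or 'no' result."""
    yes_factor, no_factor = likelihood_ratios(sensitivity, specificity)
    odds = prior_odds
    for result in results:
        odds *= yes_factor if result == "yes" else no_factor
    return odds


def odds_to_probability(odds):
    """Convert odds [yes] : [no], given as a single number, to the probability of 'yes'."""
    return odds / (1.0 + odds)


if __name__ == "__main__":
    for results in (["yes", "yes"], ["no", "no"], ["yes"] * 10):
        prior = 1e-6 if len(results) == 10 else 0.2
        odds = update_odds(prior, results)
        print(prior, results, odds, odds_to_probability(odds))
```

Working in odds keeps each update to a single multiplication; the conversion to a probability is only needed at the end.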

    If the yes and no answers both occur frequently, you are in trouble. The tests are wrong.

    The problem is often that the prior odds are often only vaguely known.

    Suppose we are dealing with the risk of HIV infection for a person not belonging to any known risk group. The prior odds are very small and unknown. But the yes-factor for an HIV-test is very large and also unknown, so the posterior odds after a ‘positive’ result are very small times very large so not negligible but unknown.

    Not only sometimes the prior odds aren’t known well, sometimes they aren’t even based on something that remotely can be called probabilities. For example ‘the probability that other life exists in this galaxy’.

    All this only applies to serious tests with a proper power and p-value. If one sets up an experiment with homeopathy nobody knows what effect to expect ‘if homeopathy works in this case’. So there goes your power. Moreover lots of homeopathy trials are done by thinking up the outcome criterion after breaking of the code (Texas sharpshooter technique) or other sins. I don’t know how rampant this practice is in ordinary medicine. I fear the worst.

    One solution is to see homeopathy trials as bets about a claim, something like the Randi Challenge. In that case the party offering the money is very careful. He doesn’t give out the million in advance to the claimant hoping the claimant will get a nice publication with an acknowledgement for the funding, i.e. the NCCAM technique (am I wrong?).

  7. Jan Willem Nienhuys says:

    Oops: I started with a prior probability of 1/6, so prior odds of 1/6 : 5/6, as a number: 0.2.

  8. JMB says:

    @Jan Willem Nienhuys on 04 Mar 2011 at 2:11 pm

    Be careful about mixing decisions about scientific hypotheses from scientific observation, with decisions about whether the result of a clinical test means that a patient has a disease. In scientific hypotheses, the problem is how to translate scientific concepts into a numeric figure of prior probability. If we allow several repetitions of scientific studies, the importance of the estimate of prior probability becomes rapidly diminished (except for extreme values). Acceptance or rejection of a scientific hypothesis really fits the Bayesian perception of probability… it’s not so much that there is real truth with some error in sampling, but the hypothesis either increases in acceptance, or decreases, based on any data that becomes available.

    In the case of the decision for a patient we can’t translate basic science concepts into a figure representing the prior probability of disease. In the clinical setting we often don’t have the luxury of repeated tests to diminish the bias of imprecise prior probabilities. The significance of “another hypothesis bites the dust” pales in comparison to the cost of being wrong for a patient. We need more empirical estimates for clinical decisions. An experienced good clinician often has reliable estimates based on experience. Ideally, we would have more observational data to give prior probabilities for clinical decisions, especially if we are going to reduce the training requirements for being a healthcare provider. In some ways the clinical decision fits the frequentist view more than the Bayes view… there is truth that will be known for the patient, it will become known over time. The patient either has or has not been infected by HIV, we just may not immediately know the truth.

  9. Jan Willem Nienhuys says:

    @ JMB on 04 Mar 2011 at 10:37 pm

    Hypotheses vs patients: the mathematics stays the same. But the difference is usually that in patients one has a reason (based on other symptoms, for instance) to order tests. I gave the HIV example to show what happens if there is no reason whatsoever, or only that an insurance company demands such a test.

    The same problem arises with mass screening.

    Another difference between patients and theories is that in the case of theories there actually seldom is a prior probability in the ordinary sense. One does not have a series of comparable situations and some statistics about how often ‘the theory’ happens to be correct or false. In patients prior symptoms establish a hint about the prior probability. And many tests do better than an error rate of 5%. I imagine that a standard lab test for ‘blood glucose over 10 mmol/l’ will rarely be mistaken.

    I don’t quite know what is meant by the frequentist or Bayes view. But if the Bayes view involves treating belief and feelings of (im)plausibility as probabilities, then count me with the frequentists. If ‘frequentist’ means sticking to the view that a probability in the end only can be based on counting how often something occurs in an ensemble of comparable situations, then I’m a frequentist too. I cringe at book titles like The Probability of God (in my opinion a cheap Bayesian trick that can be summarized in one paragraph).

    But if frequentist means doing experiments regardless of any prior plausibility, and accepting p<0.05 as irrefutable evidence that homeopathy works / cell phones cause cancer / pigs can fly / substance X cures cancer or MS / prayer or acupuncture helps IVF – then count me out.

  10. watso359 says:

    @JMB “if we are going to reduce the training requirements for being a healthcare provider”

    Interesting point about a “shortcut” to clinical competency.

    Do we really want to reduce training requirements though?

  11. Badly Shaved Monkey says:

    Scott, I struggle a bit with the detailed arguments in this area. Please, expand on why the person who answers ‘yes’ in your example is wrong.

    “if I say that the 95% confidence interval for alpha is 1.0 to 2.0, does that mean that there is a 95% chance alpha is within that range?” is “yes.”

  12. Jan Willem Nienhuys says:

    @ Badly Shaved Monkey on 06 Mar 2011 at 7:15 am

    If the 95% confidence interval for a quantity A is [1.0, 2.0], it means that if A is actually outside this interval the outcome of the experiment (or one deviating even more) would have had a computed chance of not more than 5% of happening.

    The chance that A is somewhere (e.g. in the interval [1.0 , 2.0]) is not even a well defined concept. A is not a random variable that may or may not take values inside that interval.

    From Wikipedia:
    if the statistical model is correct, then taken over all the data that might have been obtained, the procedure for constructing the interval for A would deliver a confidence interval that included the true value of A in 95% of the time.

    So it is not A that is the random variable. It is the computed interval in an ensemble of repeats of the experimental process. The phrase would be somewhat more correct if it said:

    Such confidence intervals have a chance of 95% of containing the true value of A.

    Simplify: you write the number 1 on a piece of paper. Now you throw a die. It comes up 5. Can you say now: 5 has a probability of 0.16667 of being equal to the number on my paper? Or: the number on my paper has a 1/6 probability of being equal to 5? Well, you may say that, but it is a mighty strange thing to say. Before you throw the die you can say something about the probability of the event that you will throw ‘the number’ on your paper. But after the throw there is no more probability, regardless of whether you have turned that paper upside down or forgotten what’s on it.

    Whenever you hear “chance of x% that event y happens”, always try to think what you would have to do in order to verify that actually y happens x times out of every 100 times.
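The point that the interval, not A, is the random quantity can be checked with a small simulation. This is only a sketch under simplifying assumptions that are not in the thread: normally distributed data with a known standard deviation, so that the 95% interval is the sample mean plus or minus 1.96 standard errors. The true value, sample size and seed are arbitrary choices.

```python
import math
import random


def coverage(true_mean=1.5, sigma=1.0, n=25, experiments=10_000, seed=0):
    """Fraction of repeated experiments whose 95% interval contains true_mean."""
    rng = random.Random(seed)
    # Half-width of the known-sigma 95% interval; it is the same in every repeat.
    half_width = 1.96 * sigma / math.sqrt(n)
    hits = 0
    for _ in range(experiments):
        # A fresh experiment: only the sample mean (and so the interval) varies.
        sample_mean = sum(rng.gauss(true_mean, sigma) for _ in range(n)) / n
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            hits += 1
    return hits / experiments


if __name__ == "__main__":
    print(f"share of intervals containing the true value: {coverage():.3f}")
```

The true value never moves in the simulation. What comes out close to 95% is the share of intervals, over many repeats of the procedure, that happen to contain it.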

  13. JMB says:

    @Jan Willem Nienhuys on 04 Mar 2011 at 2:11 pm

    The only point I am trying to make is that in the case of medical research, imprecise probabilities derived from logical application of basic science principles are sufficient to win the argument.

    On the medical practice side, we would much prefer to have more precise direct measures of risks and benefits. In the absence of direct measures, we may fall back on less precise inferred measures based on the Laplace strategy of Bayes.

  14. pmoran says:

    Steven Goodman herein points to flaws in the work leading to Ioannidis’ unfortunate generalization that “most published research findings are wrong.”

  15. SD says:


    Watching y’all fumble around with math is entertaining. (Note to the MDs in the crowd – if you want to start a stand-up comedy routine, start talking about math or chemistry to someone who actually knows something about it. The results are stupendously gratifying.)

    Most of the words you use do not mean what you think they mean. (“Prior” and “posterior”, in particular.) In the same way that you deride the folks who spend fifteen minutes cutting up frogs and doing research on Wikipedia, then proclaiming that they are ready to perform heart surgery, those of us who have some facility with numbers and mathematical concepts – or, God forbid, actual *academic credentials* in the subject – roll our eyes at those of you who cheat your way through a “Math for Doctors” course and then proclaim that you know anything at all about statistics.

    Here’s a hint – until you can actually re-derive Bayes’ Theorem from memory, you don’t know a damn thing about it. It’s not that hard, but I’m willing to bet that 90% of the people fulsomely bloviating about it here cannot actually reproduce it without reference to Wikipedia. (Go ahead, try it. I’ll wait.)

    (I will nod at one point, though, even though I’ve brought it up before: statistics does not provide capital-T Truth, “truth”, or even truthiness; it merely points an arrow, sometimes, at places where truth might be found if one digs hard enough. Elucidation of the biochemical processes behind illnesses is Truth; all else is at best handwaving, or at worst High Wankery, whether it’s homeopathy or pompous airs about the medical state-of-the-art.)

    “hint: Bayes’ theorem doesn’t have anything to do with spam”

  16. Jan Willem Nienhuys says:

    Bayes’ Theorem is not hard to derive indeed. The reason it looks so fearsome is that it is formulated in terms of probabilities instead of odds.

    If you start with odds, then BT is:

    posterior odds = prior odds times A/B

    Never mind what A/B is for the moment. Suppose the posterior probability of something called S is denoted by X, and the prior probability by x. We have, by definition of odds: posterior odds = X/(1-X); prior odds = x /(1-x).

    Prior and posterior refer to before and after some kind of experiment E that can yield two answers: ‘yes’ and ‘no’. Instead of ‘after the experiment’ you also may think ‘under the condition that experiment says so and so’.

    Plug this in and solve for X. This is junior high school math, the only thing required is neat handwriting, so as not to confuse x and X. The result is:

    (*) X = xA / (B(1-x) + xA)
    Now for the meaning of these letters:

    X= the probability of S under the condition that E yields ‘yes’
    x = the probability of S
    (1-x) = the probability of not-S
    A = the probability that E yields ‘yes’, when S is true
    B = the probability that E yields ‘yes’, when not- S holds

    Now you have to devise a notation without any words in it for the above five sentences. Try things like P(S) and P(E | ~S) etc.

    You may think I cheated, because I left out the hard part, namely the odds version of BT. Not so.

    After doing experiment E there are basically 4 possibilities
    1. S holds, E says ‘yes’ (xA of all cases)
    2. S holds, E says ‘no’ (xC of all cases)
    3. S is false, E says ‘yes’ ( (1-x)B of all cases)
    4. S is false, E says ‘no’ ( (1-x)D of all cases)

    (Exercise: what are the meanings of C and D?)
    (Actually E doesn’t have to be an experiment, it can be any statement about the elements of the universe considered, just like S, as long as it is either true or false.)

    The posterior odds is just the ratio of 1 and 3. Of course you can derive from this also the probability form of BT, namely as
    (S holds, E says ‘yes’) divided by (totality of all cases where E says ‘yes’).

    The advantage of the odds form is that it is easier to remember. I have a poor memory, and I never can recall BT in the probability form. It’s too complicated. It is easier to derive it when you need it. In judging medical papers one has to deal with odds anyway, because of the ubiquitous odds ratio.
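The equivalence of the odds form and formula (*) can be checked numerically. The following is only a minimal sketch: the function names are made up for illustration, and the numbers (a 1% prior, a 'yes' rate of 80% when S is true and 5% when it is not) are hypothetical.

```python
def posterior_via_odds(x, A, B):
    """Posterior probability of S after E says 'yes', computed via odds."""
    prior_odds = x / (1 - x)
    posterior_odds = prior_odds * (A / B)
    # Convert odds back to a probability.
    return posterior_odds / (1 + posterior_odds)


def posterior_via_formula(x, A, B):
    """The same quantity from formula (*): X = xA / (B(1-x) + xA)."""
    return x * A / (B * (1 - x) + x * A)


# Hypothetical inputs: x = P(S), A = P(yes | S), B = P(yes | not-S).
x, A, B = 0.01, 0.80, 0.05
print(posterior_via_odds(x, A, B))
print(posterior_via_formula(x, A, B))
```

Both calls should print the same value, since the second function is just the first one with the algebra already carried out.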

  17. EthanFoster says:

    The usual application of frequentist methods in medicine also relies on randomization and patient selection to minimize effects of prior probabilities. Those same prior probabilities we are minimizing in an RCT become important factors in framing decisions for individual patients. You first calculate 80% divided by 5% (true positive rate divided by false positive rate). This is 16. Let’s call that the yes-factor of the test. If the yes and no answers both occur frequently, you are in trouble. The tests are wrong. Not only sometimes the prior odds aren’t known well, sometimes they aren’t even based on something that remotely can be called probabilities. For example ‘the probability that other life exists in this galaxy’. In the case of the decision for a patient we can’t translate basic science concepts into a figure representing the prior probability of disease. In the clinical setting we often don’t have the luxury of repeated tests to diminish the bias of imprecise prior probabilities. The significance of “another hypothesis bites the dust” pales in comparison to the cost of being wrong for a patient. We need more empirical estimates for clinical decisions. An experienced good clinician often has reliable estimates based on experience. Ideally, we would have more observational data to give prior probabilities for clinical decisions, especially if we are going to reduce the training requirements for being a healthcare provider. I struggle a bit with the detailed arguments in this area. Please, expand on why the person who answers ‘yes’ in your example is wrong. Most of the words you use do not mean what you think they mean. (“Prior” and “posterior”, in particular.)
    In the same way that you deride the folks who spend fifteen minutes cutting up frogs and doing research on Wikipedia, then proclaiming that they are ready to perform heart surgery, those of us who have some facility with numbers and mathematical concepts – or, God forbid, actual *academic credentials* in the subject – roll our eyes at those of you who cheat your way through a “Math for Doctors” course and then proclaim that you know anything at all about statistics.
    neural interface

  18. SD says:


    … That’s gibberish.

    If I were to phrase that in the most charitable way, you are using highly nonstandard and incorrect terminology to describe something that is, at its core, simple. That you can’t remember it indicates too much time spent with the chronic during your education, if indeed you had any mathematical education beyond basic addition.

    If you can’t remember BT in the “probability form” [sic], then you don’t know what you’re talking about. End of story. It *ain’t* that hard.

    “j’ f’n c”

  19. SD,

    Then please have the kindness to show us how it’s done. In Dutch of course, since Jan Willem Nienhuys was so considerate as to provide his explanation in English for you.

  20. SD says:


    (side note: the derivation of that Godforsaken hash in the seventh paragraph actually *does* work out to Bayes’ theorem, but is expressed in an EXTREMELY overcomplicated way. Also, “odds” is a term used by bookies. Statisticians work in terms of “probability”, unless they work in Las Vegas or Atlantic City.)

    “for great justice”

  21. SD says:


    I do Russian, Spanish and English. Dutch is outside my sphere; sorry.

    However, that’s irrelevant; the terms of mathematics transcend linguistic boundaries. Difficult and foreign as it may seem to you, there is such a thing as standard notation in mathematics, and it does not depend on the native language of the author.

    And, just to be a dick: if he’s Dutch, then the Bayesian “prior probability” dictates that he was stoned beyond any capacity for short- or long-term recollection during his college years, since weed is legal in the Netherlands (or at least, not officially criminalized). I defy you to rebut that assertion. >;->


  22. SD says:

    … however:

    “Probability” [sic] form of Bayes’ theorem:

    P(A|B) = P(B|A) * P(A)/P(B)


    P(A|B) = P(A U B) / P(B)
    P(B|A) = P(B U A) / P(A)


    P(A|B) * P(B) = P(A U B)
    P(B|A) * P(A) = P(B U A)


    P(A U B) = P(B U A) [symmetric property of set union]


    P(A|B) * P(B) = P(A U B) = P(B|A) * P(A)


    P(A|B) * P(B) = P(B|A) * P(A)


    P(A|B) = P(B|A) * P(A) / P(B)

    QED. []

    This is accessible with high-school algebra. The implications, however, are not, necessarily.

    “i’ve had scarier TA’s than you can even imagine”

  23. Scott says:

    @ Badly Shaved Monkey:

    Jan got it exactly. The value of alpha is not the random variable; the computed confidence interval is. This confuses people because we only know the latter and not the former, so we tend to assume that the thing we don’t know is the random bit.

  24. rork says:

    I apologize for thinking I was against scott in my comment #3, and appreciate (now) that “it’s going to be hard” may actually be more important than many other details. I will read more carefully in future (he said for the 100th time).

    Did the much-smarter-than-us SD just use unions everywhere rather than intersections? I’m talking about that last post – the one that actually included content.

  25. Jan Willem Nienhuys says:

    Well, well. SD certainly knows how to impress people with his knowledge of mathematical notations and foreign countries.

    He purports to prove P(A|B) = P(B|A) * P(A)/P(B), in other words

    P(A|B) * P(B) = P(B|A) * P(A)

    and he does so in a complicated way, by observing that both sides (by definition of the conditional probability P(A|B)) equal P(A U B), even invoking a set theoretic theorem about the union of two sets. Why not include as well the set theoretic proof of A U B = B U A (using only the Zermelo-Fraenkel axioms of set theory and the logical principle of substitution of equals)?

    But he is mistaken, it is the intersection, not the union. So put everywhere ∩ instead of ∪. Actually, when I wrote

    S holds, E says ‘yes’ (xA of all cases)

    i.e. in mathematician’s jargon: P(S ∩ E) = P(S)*P(E|S), I assumed this to be self evident (maybe I was too optimistic about my readers, but having taught set theory for many years to math students, one can become an optimist, I admit).

    I disapprove of using this kind of pompous notation outside of mathematics, and I find it somewhat funny that someone who is
    (1) hiding behind a pseudonym
    (2) rather harshly criticizes hypothetical people who can’t reproduce a proof of ‘the’ Bayes Theorem,
    (3) then criticizes a proof using a notation that is not usually found in mathematics textbooks, and
    (4) concludes with a proof in which he mixes up union and intersection, and
    (5) as an extra claims that ‘odds’ is a term that should be restricted to bookies.

    Maybe SD should read

    but I doubt that he will do so, because he frowns upon Wikipedia as if it is a kind of Cannabis sativa. Maybe he should read

    if he prefers an article written by named professors in statistics.

    SD’s version of Bayes Theorem has that name, but in various texts one will find

    P(A|B) = P(B|A) P(A) / (P(B|A) P(A) + P(B|~A) P(~A))

    For example, on page 51 of The Probability of God, by Stephen D. Unwin (2003), who repeats this formula about a dozen times, but who in essence starts with prior odds of 1 (‘even bet’) that God exists, then multiplies this by factors 2, 10, 1/2, 1/10 that express his beliefs in the various kinds of evidence pro and con, and then arrives at posterior odds 2 (0.666 probability that God exists). I call that a pompous smokescreen. It gives mathematics a bad name (theologians probably aren’t too happy either).

    Similar complicated formulas are also called Bayes Theorem. The one just mentioned is the one I tried to explain, and if you want to invoke the Supreme Being to denounce this jumble of letters and brackets and symbols you have my deepest sympathy. I dislike having to remember any formula that I can’t visualize.

    But please try to be a bit more civil about other people’s failings if you are confused about intersection and union.
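For reference, here is a sketch of the derivation with intersections throughout, followed by the expanded form quoted from the various texts. It assumes only the usual definition of conditional probability and that P(A) and P(B) are nonzero.

```latex
\begin{align*}
P(A \mid B) &= \frac{P(A \cap B)}{P(B)}, \qquad
P(B \mid A) = \frac{P(A \cap B)}{P(A)} \\[4pt]
P(A \mid B)\,P(B) &= P(A \cap B) = P(B \mid A)\,P(A) \\[4pt]
P(A \mid B) &= \frac{P(B \mid A)\,P(A)}{P(B)} \\[4pt]
P(B) &= P(B \cap A) + P(B \cap \lnot A)
      = P(B \mid A)\,P(A) + P(B \mid \lnot A)\,P(\lnot A) \\[4pt]
P(A \mid B) &= \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid \lnot A)\,P(\lnot A)}
\end{align*}
```

The fourth line is the step that turns the short form into the long one: B is split into the part that overlaps A and the part that does not.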

  26. Jan Willem Nienhuys says:

    EthanFoster on 07 Mar 2011 at 7:37 am

    I would love to answer you to the best of my knowledge, but your post is a concatenation of quotes and some of the connecting text seems to be missing.

    You wrote

    Please, expand on why the person who answers ‘yes’ in your example is wrong.

    I don’t know whether that question was directed to me. I am not aware of having mentioned people saying ‘yes’, only tests saying ‘yes’. That test can be some RCT (i.e. not a person) with as result ‘yes, the treatment works, because the verum group improved more than the control group, p<0.001’, or it can be a laboratory test or diagnostic saying ‘yes, this person is infected with X, or has prostate cancer, or whatever’.

    In the latter case it makes a lot of difference whether there is any prior information (e.g. symptoms) that the person is suffering from the dreaded affliction. That is precisely the reason why mass screening may be useless, even when individual diagnostics for people with symptoms make perfectly good sense.

    In the former case it makes a lot of difference whether the RCT is the crowning piece of evidence after a great many other investigations, or just a piece of shoddy work that would imply relegating 200 years of science to the dustbin if it were correct.

  27. I would suggest that SD is doing the intellectual equivalent of flashing. Revealing himself with the intent to impress, but actually embarrassing himself deeply in the process.

    so sad

  28. Harriet Hall says:


    I like the analogy. I knew a girl who devastated a flasher by saying “What’s that? It looks like a penis, only smaller.”

  29. SD says:

    HAH! Ahh, I love being spectacularly wrong like that in public… Sure as hell won’t make *that* mistake ever again. Thanks for catching that union/intersection gaffe – that WAS stupid of me, wasn’t it now? Looked a lot better when I was drinking, I’ll tell you that much. Probably had something to do with the fact that “union” is a lot easier to represent on a keyboard… I accept my penance. (Part of which is developing even as we speak, a mild hangover.) Somebody tell Cde. Gorski, he’d love this, and he deserves a good laugh at my expense. >;->

    I feel bad, since the point I was attempting to make (badly) deserved better, frankly, than me making stupid mistakes while demonstrating it. So yes, I screwed that one up. If you’re going to embarrass yourself by being wrong, do it flamboyantly, I always say. *sigh*

    So, turn all those “U”s upside down, and search and replace “union” with “intersection”. Much better.

    So why do they use that definition for conditional probability? Well, first, it’s so that blowhards like me are encouraged to get drunk and swap symbols when flaming someone on the Internet, leading to a temporary but significant rise in the self-esteem and mood of the audience and responders. And that should be reason enough. However, a more viable mathematical reason is that it provides an intuitive description of what’s going on. I shall demonstrate.

    Picture a Venn diagram (everybody’s favorite graph) that looks as follows:

    [ (///////////////(XXXXX)\\\\\\\\\\\\\\\\\\\\) ]

    consisting of two overlapping circles:

    [ ] – universe
    (/////) – event A
    (\\\\\) – event B
    (XXXXX) – *INTERSECTION* of A and B (see, I got it right that time!)


    So what is the probability that, say, event A will occur? (*THAT* is the definition of “prior” probability; it has NOTHING TO DO with whether event A occurs “before” B or not. It is simply the probability of A occurring, before consideration of any other factors.) It’s P(A). That’s it. Let’s give it the arbitrary value of 0.4, give P(B) the value of 0.5, and set the P(A ∩ B) at 0.1. These are random values set just for the sake of argument and demonstration.

    So what is the probability of A given B, or P(A|B)?

    By definition, we either *know* or are assuming that B has happened – we’re “given” it. That is what “posterior” probability means; we are taking it into account *after* we reduce our statistical universe of outcomes by considering something else. (Note carefully that this applies only to our calculations, and again has nothing to do with B occurring before A, or vice versa.) Our statistical universe has just gone from “1” (everything is still on the table) to 0.5 (the probability that B happened). In this “new” universe – of “only B” – P(A ∩ B) occupies one-fifth of possible outcomes:


    So, the probability of A given B is 0.2 (mostly because I picked the numbers to work out easily).

    This also comes easily from the symbolic definition: P(A|B) = P(A ∩ B) / P(B) = 0.1 / 0.5 = 0.2.

    This description has the virtue of being “simple”; that’s why it’s used. It follows *directly* from the picture, and/or from the basic language used to describe our little toy statistical universe to begin with. Now, you *can* chew through the calculations and recover this from Jan’s statement of Bayes’ theorem, or vice versa. (Already said that, in fact.) This is what Jan’s statement works out to, in standard notation:

    P(A|B) = P(A)*P(B|A) / ( P(~A)*P(B|~A) + P(A)*P(B|A) )

    I will grant it its one useful advantage – it avoids entirely any explicit requirement for P(B) (the probability of B happening), preferring instead to use potentially more accessible variables, such as “probability of B given A” and “probability of B given not-A”. (On the other hand, if you have access to those, you know what P(B) is anyway… For the reader: Why?) You can do all kinds of tortured yoga on Bayes’ theorem to get it to fit any mold you need to pour it into, and as long as you do the math right, it works. This particular statement of it shows up a lot in that “Math for Poets/Doctors/Biologists” course I was talking about. It makes it easy to do things like that calculation of the actual success rate of medical tests, f’rinstance.
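The toy numbers from the Venn example can be pushed through both statements of the theorem as a check. This is only an illustrative sketch; the variable names are made up, and the values P(A) = 0.4, P(B) = 0.5, P(A ∩ B) = 0.1 are the arbitrary ones chosen above.

```python
# Toy values from the Venn diagram example.
p_a = 0.4
p_b = 0.5
p_a_and_b = 0.1

# Conditional probabilities straight from the definition.
p_a_given_b = p_a_and_b / p_b                   # share of B that is also A
p_b_given_a = p_a_and_b / p_a                   # share of A that is also B
p_not_a = 1 - p_a
p_b_given_not_a = (p_b - p_a_and_b) / p_not_a   # share of not-A that is B

# P(B) recovered from the two conditionals, each weighted by how often
# its condition holds.
p_b_total = p_b_given_a * p_a + p_b_given_not_a * p_not_a

# The expanded statement of Bayes' theorem, which never uses P(B) directly.
bayes_expanded = p_a * p_b_given_a / (p_not_a * p_b_given_not_a + p_a * p_b_given_a)

print(p_a_given_b, p_b_total, bayes_expanded)
```

The denominator of the expanded form is the same weighted sum as `p_b_total`, which is why the two statements agree.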

    However, there are not “two probability theories”; everything that is true and expressible with Bayesian reasoning is also true and expressible using “frequentist” reasoning, since, at their core, they use the same language and the same axioms of probability. This is Classic Coke and Diet Coke – two flavors of the same basic drink – rather than Catholicism vs. Protestantism. What Bayes actually demonstrated: the two expressions for conditional probability of two variables, one conditioned on the other, are *related* by the ratio of the probability of one event to the probability of the other. That’s it. No magic.

    Now, Bayes is all the rage – especially here – because the assumption is that that “prior probability” for things like “non-sciency stuff” can be selected or assigned such that goofy shit like homeopathy is sifted right out of the research pool, leaving more bux for Real Scientists(™). Yeah, great theory, might even work, but. Unfortunately, if you can’t do the job already with frequentist statistics, you won’t be able to do it with Bayesian reasoning either, and sooner or later somebody will call you on your attempt. “Why has this prior probability been set so low? Doesn’t that prejudice the outcome?” As much as you hem and haw about how “it all works out in the end, Bayes’ theorem says so”, I suspect that it will provide a convenient trap for the more partisan, since their assignation of prior probability will reveal (in one convenient number) their political reliability. “Middle-of-the-road”, “compromise” numbers will be chosen, which prior probabilities will lead to a conclusion that CAM has a definite but small effect. Somehow I doubt that was the effect you were looking for.

    “ach, mein Kopf”

  30. SD says:


    “I would suggest that SD is doing the intellectual equivalent of flashing. Revealing himself with the intent to impress, but actually embarrassing himself deeply in the process.

    so sad”

    Yes, I freely admit that my attempt at a proof was typed in what is medically known as a “drunken haze”. (In which I violated one of my own cardinal rules, “*NEVER DO MATH WHILE DRINKING*”…) Alcohol makes fools of us all. Ah, sweet, sweet correction…


    “I like the analogy. I knew a girl who devastated a flasher by saying “What’s that? It looks like a penis, only smaller.””

    Yeah. That was from my “unit” in toto spontaneously attempting to crawl back into my stomach from embarrassment. It’s a defense mechanism. Honest.

    “and the air is really cold too! honest!”

  31. JMB says:

    I have read that the frequentist approach is equivalent to the Bayes approach with uniform priors. But the whole point of the discussion is that when priors deviate significantly from uniform, then the decisions based on the frequentist approach will often be in error.

    My preference for the Bayes view of probability comes from its use in classifiers and machine learning. That is what I used to program. I think the Bayes view of probability also has a more direct relationship to information theory and entropy.

  32. JMB says:

    I was programming a Bayes classifier in 1989. I did not recently adopt the Bayes view because it is a current rage. The Bayes view of probability does allow application of the theorem to a wider range of problems dealing with uncertainty. The fact that you can use a Bayes classifier as a spam filter is one advantage of the Bayes view of probability over the frequentist view of probability. I still prefer the frequentist approach for proof of causation, and estimation of important parameters for clinical decisions.

    The answer to the question of how the Bayes view of probability varies from the frequentist view is a topic best left for the statisticians and mathematicians. I would suggest Bayes view of probability is that you are standing on the ground wondering where you can run in order to avoid the bombs you saw dropped from that plane, and the frequentist view is from that plane wondering whether the bombs just dropped will hit the intended target (a stationary building).

  33. Jan Willem Nienhuys says:

    I would like to make a bit more propaganda for adopting the ‘odds’ version of Bayes.

    Here is an example.
    In a certain place it rains 5 days per year. The weather forecaster, who is right 90% of the time (whatever his prediction), says it is going to rain tomorrow. What is the probability – given this forecast – that it rains tomorrow?

    ‘Odds’ version of the solution:
    step 1. The prior odds (for ‘it’s positively raining’) are 5:360 (= 1:72).
    step 2. The fraction ‘correct positive rate’/’false positive rate’ is 90%/10% (= 9).
    step 3. Now multiply. The posterior odds are 9:72 (= 1:8).
    step 4. Convert odds back to probability, gives 1/9.
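The four steps above can be sketched in a few lines, using exact fractions so that nothing is lost to rounding. The names are illustrative only, and the 360 dry days assume a 365-day year, as in step 1.

```python
from fractions import Fraction


def odds_to_probability(odds):
    """Convert odds in favour (as a single ratio) to a probability."""
    return odds / (1 + odds)


# Step 1: prior odds of rain, 5 rainy days against 360 dry ones.
prior_odds = Fraction(5, 360)

# Step 2: correct positive rate divided by false positive rate.
likelihood_ratio = Fraction(90, 10)

# Step 3: multiply to get the posterior odds.
posterior_odds = prior_odds * likelihood_ratio

# Step 4: convert the odds back to a probability.
posterior_probability = odds_to_probability(posterior_odds)

print(posterior_odds, posterior_probability)
```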

    Compare this to exactly the same example (with horrible formula and desert wedding) on

    One doesn’t have to be a “world-class statistician” to perform the above steps, and one doesn’t even have to recall the formula. Even using the calculator on that site isn’t easy, because you’ll have to recall the meanings of those letters. I think that non-mathematicians easily are distracted by letters with no intrinsic meanings, and even physicists hate letters with unconventional meanings (they always want to denote time by t, temperature by T, pressure by p, volume by V, velocity by v and so on).

    Of course you do have to remember something, namely the meaning of concepts like false positive (= some kind of test says positive/raining/sick/effective in the situation that the opposite is true). If you are used to saying sensitivity instead of correct positive rate, you’ll probably also have to remember that false positive rate = 100% minus specificity. Personally I prefer ‘false positive rate’ because the words convey the meaning by themselves.

    See also

    where the terms precision and positive predictive value are introduced for what I called the posterior probability (i.e. 1/9 in the above example). There you find a worked example with a fecal occult blood screen test for bowel cancer. And you’ll find the additional terms ‘statistical power’ and ‘recall’ for correct positive rate.

  34. SD:

    Of course there are not two probability theories, but misapplications of frequentist statistics have led to erroneous conclusions and to irrational decisions. In this series I’ve been mainly concerned with the myth that data can “speak for themselves,” which is a misapplication of frequentist statistics, but nevertheless a common one (at least in medical research–examples abound throughout this series). Bayes forces authors and readers to confront external knowledge. I haven’t noticed that any of the authors or readers here think that the “prior” in “prior probability” has special temporal meaning, by the way.

    I agree that there will be a push to choose “compromise numbers,” if Bayes ever becomes the norm in medical research, but I still want justifications for priors to be made explicit, and I want MDs to understand that the data from a particular experiment can’t “speak for themselves.” The players in academic medicine at least ought to understand what they’re arguing about. See my example of Jonas, above.

    There are other problems with frequentist inference that I haven’t discussed, having to do with misuses that stem from its long-run perspective. (Forgive me if you already know these things). Goodman (pp. 999-1000, 1003) and others have described the “classical statistical puzzle” of two experiments, each comparing the same two hypotheses, with identical subjects, identical treatments, and identical outcomes, that yield quite different P values—merely because each was performed by a different investigator with a different criterion for stopping (although, by coincidence, each eventually stopped after the same number of trials).

    Thus each investigator predicted different results if his experiment were repeated many times, as is the basis for frequentist calculations. P values are for the long-run only, but ‘everyone’ seems to use them as if they were also for the short run. There have been real-life arguments about P values when experiments have been stopped early because of large treatment (or toxic) effects (cited in Goodman). Goodman:

    Because frequentist inference requires the “long run” to be unambiguous, frequentist designs need to be rigid (for example, requiring fixed sample sizes and pre-specified stopping rules), features that many regard as requirements of science rather than as artifacts of a particular inferential philosophy.

    Goodman also shows, in his subsequent article, that in the case of the two identical experiments involved in the “classical statistical puzzle,” Bayesian statistics gets it right: each experiment yields the same likelihood ratio (Bayes factor). That article discusses other examples of “problems that plague frequentist inference,” such as multiple comparisons—which I understand only faintly, so I’ll merely quote Goodman without implying assertions of my own:

    The frequentist solution…involves adjusting the P value for having looked at the data more than once or in multiple ways. But adjusting the measure of evidence because of considerations that have nothing to do with the data defies scientific sense, belies the claim of “objectivity” that is often made for the P value, and produces an undesirable rigidity in standard trial design. From a Bayesian perspective, these problems and their solutions are viewed differently: they are caused not by the reason an experiment was stopped but by the uncertainty in our background knowledge. The practical result is that experimental design and analysis is far more flexible with Bayesian than with standard approaches.
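The stopping-rule puzzle can be sketched with a standard textbook coin example. These are not Goodman's own numbers, and the design names are hypothetical: two investigators both observe 9 heads and 3 tails, but one planned to flip exactly 12 times and the other planned to flip until the third tail appeared.

```python
from fractions import Fraction
from math import comb, isclose

HEADS, TAILS = 9, 3
n = HEADS + TAILS

# Design 1: flip exactly 12 times.
# One-sided P value: chance of 9 or more heads with a fair coin.
p_fixed_n = Fraction(sum(comb(n, k) for k in range(HEADS, n + 1)), 2 ** n)

# Design 2: flip until the third tail.
# One-sided P value: chance of needing 12 or more flips, i.e. at most
# 2 tails among the first 11 flips of a fair coin.
p_stop_at_third_tail = Fraction(
    sum(comb(n - 1, t) for t in range(TAILS)), 2 ** (n - 1)
)


def likelihood_ratio(theta, design):
    """Likelihood of heads-probability theta relative to a fair coin."""
    # The designs differ only in a constant that does not depend on theta.
    const = comb(n, HEADS) if design == "fixed_n" else comb(n - 1, TAILS - 1)
    numerator = const * theta ** HEADS * (1 - theta) ** TAILS
    denominator = const * 0.5 ** n
    return numerator / denominator


print(float(p_fixed_n), float(p_stop_at_third_tail))
print(likelihood_ratio(0.75, "fixed_n"), likelihood_ratio(0.75, "stop_at_third_tail"))
```

The two P values differ because each design counts a different set of unobserved outcomes as "more extreme", while the likelihood ratio depends only on the 9 heads and 3 tails actually seen.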

    Oh: “probability of B given A” plus “probability of B given not-A” is equal to “probability of B”

  35. Jan Willem Nienhuys says:

    JMB wrote

    the frequentist approach is equivalent to the Bayes approach with uniform priors

    I don’t know what the author he read meant, but I think it has something to do with the definition of probability. Let me explain.

    1. If you throw a well-formed die, the symmetry of the thing alone tells you that all sides have a probability of 1/6 of coming up. You can try to test that if you wish, but hardly anybody ever does so.

    2. If you throw a coin, the symmetry of the coin alone tells you that both sides have a probability of 1/2 of coming up. Actually some coins are not exactly symmetric because the ‘heads’ part is a bit thicker. When you ‘throw’ the coin by letting it rapidly spin vertically on a very smooth table, you get rather large deviations from the 50/50 odds.

    3. I experimented with a reduced cube. I had sawn off about 1/3 of one side, so I had a block of 42 x 42 x 27 mm. I threw it 100 times, and it fell 83 times with a 42×42 side (a flat side) on the floor. I am too stupid to calculate the theoretical probability in this case (maybe a puzzle for mr. S.D. Unisection?); I don’t even know how to go about it.

    4. I presume that the frequentist view for establishing the true flat chance is that I just should continue throwing my flat die and that in the end the fraction of flat falls will converge to The True Flat Probability F. Unfortunately I don’t have the time to do that.

    5. However, I can make hypotheses. What would be the chance of obtaining exactly this result (namely 83 flats in 100 throws) if F=0.83 or any other value?
    Here is a table:
    suppose F=0.5: the probability of this result is 5.246 x 10⁻¹²
    suppose F=0.6: the probability of this result is 4.410 x 10⁻⁷
    suppose F=0.7: the probability of this result is 1.194 x 10⁻³
    suppose F=0.7418: the probability of this result is 0.01148
    suppose F=0.75: the probability of this result is 0.01652
    suppose F=0.77: the probability of this result is 0.03556
    suppose F=0.79: the probability of this result is 0.06362
    suppose F=0.81: the probability of this result is 0.09245
    suppose F=0.83: the probability of this result is 0.10567 (this is the maximum)
    suppose F=0.85: the probability of this result is 0.09081
    suppose F=0.87: the probability of this result is 0.05495
    suppose F=0.89: the probability of this result is 0.02118
    suppose F=0.8977: the probability of this result is 0.01261
    suppose F=0.91: the probability of this result is 4.420 x 10⁻³
    suppose F=0.95: the probability of this result is 7.184 x 10⁻⁶
    suppose F=0.99: the probability of this result is 2.888 x 10⁻¹⁶

    (If you think one doesn’t have to consider F=0.99, try throwing a standard matchbox, and count how often it lands standing on its smallest side.)

    6. We can of course suppose what we want, but what is the true value of F? The Bayesian idea is, I believe, to inject some kind of prior hunch about what F might be. You may think that all F are equal (that is the uniform prior) or that the F below 0.5 are extremely improbable. It really doesn’t matter. If you assume any prior distribution of the F, you’ll have to multiply it with the function given above to get the posterior distribution. The function given above has a rather narrow peak in the interval 0.77 to 0.91, so it really doesn’t matter what the prior distribution is outside this interval.

    7. If my experiment had consisted of 1000 throws (with 821 flats, say) then the peak would have been even narrower and the precise form of the prior distribution would have been even more irrelevant. So if you can do very many experiments the prior really doesn’t matter anymore.

    8. What would be the frequentist interpretation? I don’t know what a frequentist is, but I guess it is a person who prefers the following statement:

    “Suppose F=0.7418, then the probability of having 83 or more flats is 0.025.
    Suppose F=0.8977, then the probability of having 83 or fewer flats is 0.025.
    Let us call the interval [0.7418 , 0.8977] the 95% confidence interval.”

    9. I can’t see how frequentist = uniform prior (exactly), but in the case of very many experiments the narrow peak looks just like the standard normal curve and both methods give approximately the same interval for ‘95% confidence’.

    But all this refers to the situation that you want to give an experimentally workable definition of probability. I mean a definition that tells you what to do when you want to determine an unknown probability and how to interpret the results of a large but finite number of experiments.
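The table in item 5, the prior-times-likelihood step in item 6, and the tail probabilities in item 8 can all be reproduced with a short script. This is a rough sketch, not a definitive analysis: it assumes the throws are independent, uses a simple grid of F values instead of calculus, and the "skeptical" prior is a made-up example.

```python
from math import comb

N, K = 100, 83  # 100 throws, 83 flats


def likelihood(f, n=N, k=K):
    """Probability of exactly k flats in n throws if the flat chance is f."""
    return comb(n, k) * f ** k * (1 - f) ** (n - k)


def upper_tail(f, n=N, k=K):
    """Probability of k or more flats in n throws."""
    return sum(comb(n, j) * f ** j * (1 - f) ** (n - j) for j in range(k, n + 1))


def lower_tail(f, n=N, k=K):
    """Probability of k or fewer flats in n throws."""
    return sum(comb(n, j) * f ** j * (1 - f) ** (n - j) for j in range(0, k + 1))


def grid_posterior(prior, grid):
    """Multiply a prior by the likelihood on a grid and normalise to sum 1."""
    weights = [prior(f) * likelihood(f) for f in grid]
    total = sum(weights)
    return [w / total for w in weights]


grid = [i / 1000 for i in range(1, 1000)]
uniform = grid_posterior(lambda f: 1.0, grid)
# Hypothetical prior that treats every F below 0.5 as 100 times less plausible.
skeptical = grid_posterior(lambda f: 1.0 if f >= 0.5 else 0.01, grid)

for f in (0.5, 0.75, 0.83, 0.91):
    print(f, likelihood(f))
print(upper_tail(0.7418), lower_tail(0.8977))
print(max(abs(u - s) for u, s in zip(uniform, skeptical)))
```

The last line prints the largest difference between the two posteriors on the grid, which gives a feel for how little the prior matters outside the peak.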

  36. JMB says:


    Uniform priors were suggested by Laplace for initial calculation. A uniform prior distribution is usually characterized as all possible results being equally probable. Since frequentist statistics don’t consider prior probabilities, that is equivalent to assuming the hypothesis and null hypothesis are equally likely. In the case of balanced dice, then that means that each number is equally probable.

    The uniform prior distribution for the roll of an unbalanced die would require some pretty heavy duty physics calculations regarding measuring the potential energy barrier from rolling from each face of the die to each adjacent face of the die. If you plot bins on a line representing the size of the bins reflecting the potential energy barrier, then the probability will be flat and uniform relative to the reference line, but the bins will have different probabilities based on their size. Now, I will probably be crucified by the physicists who may read this.

    In any event, it is much easier to calculate the probability of a homeopathic dilution of 60C containing an active ingredient, than to calculate the probability distribution of an unbalanced die.
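A back-of-the-envelope version of the 60C calculation might look like this. It rests on loud assumptions that are not in the discussion above: the starting material contains one mole of active molecules, each of the 60 steps is an ideal 1:100 dilution, and the number of surviving molecules is treated as Poisson distributed.

```python
from math import expm1

AVOGADRO = 6.02214076e23  # molecules per mole

# 60C means sixty successive 1:100 dilutions, i.e. a factor of 10^-120.
dilution_factor = 100.0 ** -60

# Assumption: the undiluted starting material holds one mole of active molecules.
starting_molecules = AVOGADRO
expected_molecules = starting_molecules * dilution_factor

# Poisson assumption: P(at least one molecule) = 1 - exp(-expected).
# expm1 keeps precision for such a tiny argument.
p_at_least_one = -expm1(-expected_molecules)

print(expected_molecules, p_at_least_one)
```

Under these assumptions the expected count and the probability of finding even one molecule are practically the same number, because the expected count is so far below 1.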

  37. JMB says:


    In your dice problem of determining the probability distribution of which side of the sliced die ends up on top, using the Bayes schema (this is not Bayes inference theorem), you could naively assign a uniform distribution, with each side equally likely to end up on top. After your first trial, the postulated probability distribution would be updated by the result. Since your first result has such a low probability of occurrence based on the assumption of uniform distribution, the extent of the update of the probability distribution would be great (note the similarity to information theory: the lower the probability of a message, the greater the information it contains). With this Bayes strategy (using the probability of the observed result as a weighting factor for update of the believed distribution, useful in machine learning), you can very rapidly converge on a probability distribution that the frequentist approach will take a larger number of trials to estimate.

  38. Jan Willem Nienhuys says:

    calculate the probability of a homeopathic dilution of 60C containing an active ingredient

    Here is something we agree on. But those pesky homeopaths keep saying that their preparation process amplifies some kind of spiritual quality (which they call energy) that science doesn’t know about. I find that ‘improbable’ (that such a crude preparation process would do something to an as yet undiscovered property of matter), but I am at a loss how to assign a credible number to that.

    Come to think of it, that number is far less than 10⁻¹²⁰ anyway, so negligible in this case. Or is it?

  39. Scott says:

    One could actually, were one sufficiently motivated, assign a pretty good prior to the proposition that there exists such a new interaction. The fact that it hasn’t been observed in current collider experiments would allow a quite nice constraint. Almost certainly much LESS than 10^-12.

    And that’s before considering the odds that succussion actually manipulates said new interaction.

  40. Jan Willem Nienhuys says:

    @ Scott

    Astronomers seem pretty sure that in and around galaxies there is a huge amount of mass (only bound gravitationally) of an unknown nature. A typical galaxy has a density (in the sphere enclosing it) of about 1000 atoms per cubic metre. So this, or a modest multiple of it, indicates the mass density of this totally unknown kind of mass. So the prior odds – given our knowledge of astronomy – that there is some kind of unknown mass still to be discovered are quite high (of course there may be something seriously wrong with the theory of gravitation at long distances).

    But as far as we know this mysterious mass has only gravitational interaction. Typical particles discovered in colliders only exist for very short times and then decay. The only non-decaying elementary particles found by nuclear physics are the neutrinos (originally discovered because of failed energy-momentum balances), but the neutrinos we know are too light and energetic to be gravitationally bound to galaxies.

    But the interactions with the human body seen by homeopaths in their consulting rooms are much more impressive. The particles or whatever does this interaction are also very stable: homeopathic medicine can keep for years. They are only destroyed by interaction with strong smelling herbs, mint and menthol in toothpaste and coffee. The only solution seems to be that it is all spiritual and that no scientific or statistical or mathematical or logical laws apply to it, only the observations of paranormally or mystically gifted people.

  41. Jan Willem Nienhuys says:

    @ JMB on 08 Mar 2011 at 2:02 pm

    I am afraid you lost me. Can you illustrate what happens when you first throw ‘flat, flat’, then update, and then throw ‘side, flat’? In my version after the fourth throw the assumption F = x would imply the result to have a probability of: 4x³(1-x).

    I already stumble when you ‘naively assign a uniform distribution to the probability of each side ending up an equal number of times’. That amounts to putting the full weight 1 on just the prior F = 1/3, in other words, the odds of F being unequal to 1/3 equal to 0:1. I guess this is not your meaning, so if you would demonstrate how you go about it in this particular case (‘prior, flat, flat, update, side, flat, second update’) it would be most welcome.
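    One way to make the requested sequence (‘prior, flat, flat, update, side, flat, second update’) concrete is sketched below. It is only an illustration and assumes a prior that spreads belief evenly over all values of F between 0 and 1, which is one reading of a "uniform prior" and not necessarily what either commenter had in mind. The grid resolution and function names are arbitrary choices.

```python
def posterior_on_grid(throws, grid_size=1001):
    """Update a uniform prior over F (the flat chance) one throw at a time."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    weights = [1.0 / grid_size] * grid_size  # uniform prior over F
    for throw in throws:
        # Likelihood of one throw: F for 'flat', 1 - F for 'side'.
        weights = [
            w * (x if throw == "flat" else 1.0 - x)
            for w, x in zip(weights, grid)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]  # renormalise after each update
    return grid, weights


def posterior_mean(grid, weights):
    return sum(x * w for x, w in zip(grid, weights))


grid, after_two = posterior_on_grid(["flat", "flat"])
_, after_four = posterior_on_grid(["flat", "flat", "side", "flat"])

mean_after_two = posterior_mean(grid, after_two)    # close to 3/4
mean_after_four = posterior_mean(grid, after_four)  # close to 2/3

print(mean_after_two, mean_after_four)
```

    After all four throws the posterior is proportional to x³(1-x); the factor 4 in 4x³(1-x) drops out when the weights are renormalised.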

  42. Scott says:

    @ Jan:

    We have a high confidence that dark matter exists, yes. But an unknown *interaction* relevant at energy scales corresponding to biological processes is the relevant question, and that’s quite independent of dark matter. And we can rule it out to a very high degree of confidence.

  43. Jan Willem Nienhuys says:

    @ Scott

    We almost totally agree. (Whatever happens in collider experiments is also irrelevant for biology energy scales.) But the point I wanted to make is that we cannot express this degree of confidence in a numerical form. And that is precisely one of the criticisms against ‘doing Bayes’: that so-called priors are not actually chances (or odds) or that they are unknown or unknowable.

    If we are dealing with rare events like lab mistakes or intentional fraud in science one can do statistics. These things happen. But it is impossible to examine a bunch of universes with human life (‘as we know it’) in it and count how often homeopathy is true or false in them without scientists understanding how homeopathy could work, to arrive at credible prior odds.

    So the judgement that research into homeopathy is of no use cannot be based on simply plugging in numbers into Bayes’ Theorem (incredibly small times anything is still incredibly small).

    In the case of homeopathy the solution is simpler than with many other unproven healing methods. The basic tenet of homeopathy is that highly diluted stuff will produce symptoms in healthy people, e.g. unbearable itch for Sulphur C200, or ‘Sentimental mood in moonlight, particularly ecstatic love’ for Antimonium crudum (stibnite, antimony sulphide, I don’t know which potency, see , note that the formula for Sb2S3 is rendered incorrectly but consistent with what Hahnemann thought; as antimony is poisonous, I guess provings with it were only done with small doses; Hahnemann recommended in his later years that provings be done with C30, and this particular symptom doesn’t appear in Hahnemann’s writings, I think).

    OK, if they believe that, and if they base treatment on comparing patients’ complaints to that kind of symptoms, let them do a serious proving of any kind. The total number of such symptoms runs into the hundreds of thousands, let them take their pick. Table salt C30 has over 1000. To perform a serious proving is not difficult, and I gather that homeopaths often do that (it would usually be a reproving) during their training periods. This is not a strange idea at all. Any science student starts by doing small experiments that have been done often before, not only cutting up frogs, but measuring gravitation or measuring the speed of sound or of heat loss, whatever.

    So let those homeopaths first show a couple of successful reprovings done in cooperation with real scientists. But they won’t. They refuse to do so. I can only think of one reason: they know they will fail.

    But then there is no need to take them seriously anymore.

  44. Scott says:

    (Whatever happens in collider experiments is also irrelevant for biology energy scales.)

    Not true. High-energy experiments are illuminating about lower-energy phenomena, even though the converse is not true. Technically speaking, this is because you can do renormalization to absorb the effects of higher energy scales into the coupling constants at lower energy scales. But that’s not true in reverse.

    Any interaction which could do what’s claimed for homeopathy, much less reiki, would stick out like a sore thumb in such things as the decay cross-sections of the Z boson.

    It could be done with lower-energy arguments too, I’ll grant you, but there’s a certain cachet about using the fun stuff.

    But the point I wanted to make is that we cannot express this degree of confidence in a numerical form.

    Also false. We can calculate the probability that an interaction having the characteristics claimed would not have been detected in existing experiments, based on the uncertainties therein. This is a routine operation. Strictly speaking it’s normally done in the other direction – given the experimental constraints, what characteristics, in particular coupling constant (i.e. strength), can a hypothetical new interaction have – but the math works the same way for ruling out a new interaction which would have to have a certain strength to do what’s claimed for it.

    In particular, we can conclude that said hypothetical interaction must have (at a minimum) comparable strength to electromagnetism at biological energy scales (i.e. eV) since it would necessarily have to be affecting chemical reactions to function, and hence EM cannot entirely dominate.

  45. JMB says:


    Sorry, I assumed when you sawed off the side, there were still six flat faces to the die. I framed the problem as trying to predict the probability distribution of each side landing down, not just the probability of the flat side landing down (comes from my past focus on multiple disease possibilities). The issue of updating the hypothesis is a complicated issue in the way I have framed the problem. Your way of framing the problem is better. The main point I was trying to make is that the update of the hypothesis is dependent on the probability of the observed data resulting from the hypothesis. When the prior probability estimate is far from the observed data, there will be a greater revision of the hypothesis. Imprecise prior probability estimates are revised rapidly after a few trials. In either the frequentist approach or the Bayesian approach, several trials are necessary to reduce errors in decisions. However, even with imprecise priors, the Bayesian approach will more rapidly approach a correct probability estimate than the frequentist approach. Here is a paper (that I am in no position to determine its validity) that shows that unless the true parameter is significantly out of the range of the uniform prior, the Bayes estimator of the parameter will outperform the frequentist estimator given a limited number of samples.

    The other advantage of the Bayes approach is that for those hypotheses with prior probabilities in the extreme range (such as low probability of homeopathic preparations having an active ingredient (whether 10^-120 or 10^-12 or 10^-6), or high probability that a parachute will reduce your chance of death after falling out of an airplane), even with an unexpected result, the probability of the hypothesis will not be changed enough that decisions made from statistical analysis will be wrong. I would tend to say that any prior probability of less than .01 or greater than .99 represents established scientific knowledge (maybe physicists and chemists can use .0001 and .9999).

    I think Dr Atwood has made the most important point that decision making should be based on multiple trials. I would add that with multiple trials, errors resulting from imprecise priors are less than residual errors from the frequentist approach (with a large number of trials, the Bayes and frequentist estimators converge).

  46. Jan Willem Nienhuys says:

    there were still six flat faces to the die

    Both 42×42 faces of the 42×42×27 die I called the flat faces, because if the ‘die’ lies on that side, I feel it is lying flat; in the other position it is ‘upright’. Of course the four small faces (42×27) all have the same chance, as have the two 42×42 faces. So in order to know the chance of landing on any face, one only has to determine ‘the combined chance of landing on any of the two 42×42 faces’. In an apparently failed attempt to be succinct, I called that the flat chance. I wonder whether making the die into a cylinder of about the same dimensions (making it easy to roll on that face) would affect the flat chance a lot.

    maybe physicists and chemists can use .0001 and .9999

    In certain fields of physics it is customary to consider a signal as ‘real’ when it exceeds noise by a factor of 5, in other words five standard deviations, corresponding to p=0.000 000 3. This is done in fields where many data are collected, for example X-rays from space from any direction. If you collect a billion data per day, this still amounts to having to look at 300 ‘events’.

    So if data are expensive to get, you settle for more modest p-values.
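    The arithmetic behind the five-sigma figure can be sketched as follows, assuming Gaussian noise and a one-sided tail (both are assumptions; the comment does not say which convention is meant). The function name is illustrative.

```python
import math


def one_sided_tail(n_sigma: float) -> float:
    """P(Z > n_sigma) for a standard normal variable Z."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2.0))


p_five_sigma = one_sided_tail(5.0)

# Expected number of pure-noise 'events' among a billion measurements.
measurements_per_day = 1e9
expected_false_alarms = measurements_per_day * p_five_sigma

print(f"p at five sigma: {p_five_sigma:.3e}")
print(f"noise events per billion data: {expected_false_alarms:.0f}")
```

    Under these assumptions the tail probability is a little under 3 in ten million, so the quoted 0.000 000 3 and 300 are rounded values.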

  47. Jan Willem Nienhuys says:

    Here is a paper (that I am in no position to determine its validity)

    The paper seems to explain things more or less as I did. It does some experimenting.

    You draw 10 times from a vase containing ten balls, b of which are black and the remainder is white. After each draw you put the drawn ball back into the vase. So the expected number of black balls is b. In the mathematical jargon this is ‘drawing a random sample from the binomial (n=10, π )’, where the funny symbol stands for b/10. May I use F instead of the funny symbol?

    If one doesn’t know how large is F, one can try to guess F from the result. Naturally, one would just think that the number of black balls divided by the number of draws would be your best bet. This is called the Frequentist view, with a capital F.

    The Bayesian view, with a capital B, is to start believing (before you draw any balls) that black and white have the same chance, i.e. the belief that F=0.5, in other words the vase contains five balls of either color. So in your mind you prefix two draws, one yielding black, one yielding white. At least that is how the authors interpret the Bayesian point of view. The Bayesian estimator for F is ‘number of black balls plus 1′ divided by ‘number of draws plus 2′.

    The paper then proceeds to perform this experiment (ten draws from that vase) 2000 times in each of the cases that the vase has 1, 2, … 9 black balls.
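    A rough re-creation of that kind of experiment is sketched below. It is not the paper's code: the random seed, the function name and the use of mean squared error as the yardstick are assumptions made for the illustration. The two estimators are the ones described above, k/n and (k+1)/(n+2), where k is the number of black balls drawn.

```python
import random


def simulate_mse(black_balls, draws=10, repetitions=2000, seed=0):
    """Mean squared error of both estimators for a vase with ten balls."""
    rng = random.Random(seed)
    f_true = black_balls / 10
    freq_sq_err = 0.0
    bayes_sq_err = 0.0
    for _ in range(repetitions):
        # Draw with replacement: each draw is black with probability f_true.
        k = sum(rng.random() < f_true for _ in range(draws))
        freq_sq_err += (k / draws - f_true) ** 2
        bayes_sq_err += ((k + 1) / (draws + 2) - f_true) ** 2
    return freq_sq_err / repetitions, bayes_sq_err / repetitions


results = {b: simulate_mse(b) for b in range(1, 10)}

for b, (freq_mse, bayes_mse) in results.items():
    print(f"b={b}: frequentist {freq_mse:.5f}  Bayesian {bayes_mse:.5f}")
```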

    I am not certain why one should do this by simulation, because the expected curve can easily be derived theoretically (the math required is of the kind I used to teach to freshmen for many years). The theoretical curves (in a manner of speaking obtainable by a googol experiments) are just parabolas, see below. I am not really impressed if someone does things by simulation that can be done with simple math.

    Naturally, if you put in a modest amount of belief that there are 5 black balls, the result is that you will do slightly better in guessing if your prior guess was true. Surprise, surprise. I predict that in case F = 0.5, prefixing 100 draws, 50 of which are black, will work even better!

    Actually the real surprise is that the advantage of having those two imaginary extra draws in addition to the ten real draws works so well if the numbers of white and black balls are between 2 and 8.

    We can even compute the result when there are no black balls! The Frequentist will estimate F= 0 every time and he will be entirely correct every time, mean square error is 0. The Bayesian will estimate F = 1/12 each time and this is also the error, hence the mean square error will be 1/144 = 0.007. This allows you to extend the graph on page 10 of the quoted source a bit further.

    Actually these curves are squares of errors, if we extract roots we see that the small ‘Bayesian’ advantage in the middle is offset by a much larger disadvantage on the sides.

    This consideration determines where the mentioned parabolas intersect the left and right edges of a properly drawn graph (with 0 left and 1 right). The theoretical parabolas rise to 0.02500 and 0.01736 (=5/288).

    It’s an example of the Law of Conservation of Misery: what you gain in one place (better estimates if reality conforms to your prior) you lose elsewhere, namely where reality doesn’t fit your preconceptions.

    In a world where you have to deal with unknown prior odds that may differ vastly from 1 (or may not even be real quotients of probabilities), it doesn’t really help you a lot if you assume prior odds = 1.
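    For readers who want to check the parabolas, here is a small sketch using exact fractions. It assumes the usual decomposition of mean squared error into variance plus squared bias; the function names are illustrative. With k black balls in n draws, k/n has variance F(1-F)/n and no bias, while (k+1)/(n+2) has variance nF(1-F)/(n+2)² and bias (1-2F)/(n+2).

```python
from fractions import Fraction


def mse_frequentist(f, n=10):
    """Mean squared error of k/n: pure variance, no bias."""
    return f * (1 - f) / n


def mse_bayes(f, n=10):
    """Mean squared error of (k+1)/(n+2): variance plus squared bias."""
    variance = n * f * (1 - f) / (n + 2) ** 2
    bias = (1 - 2 * f) / (n + 2)
    return variance + bias ** 2


half = Fraction(1, 2)
peak_frequentist = mse_frequentist(half)  # top of the frequentist parabola
peak_bayes = mse_bayes(half)              # top of the Bayesian parabola
edge_bayes = mse_bayes(Fraction(0))       # Bayesian error with no black balls

print(peak_frequentist, peak_bayes, edge_bayes)
```

    The three values come out as 1/40, 5/288 and 1/144, and both functions are quadratic in F.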

  48. JMB says:

    Thanks for your review of the paper. I guess I needed to attend your class. I knew there was an analytic solution to the distribution of results of trials in the binomial problem, but did not think there was an analytic solution to errors of Bayesian estimators and frequentist estimators after one trial.

    The paper does closely follow your mathematical formulation.

    You do note that there is a range from the proposed prior in which the Bayes estimator more rapidly converges on the true parameter. So if we use a uniform prior, if the true parameter is in the range of .2 to .8, then the Bayes estimator will work better than the frequentist estimator. But in the problem with the sawed off dice, wouldn’t it make more sense to use an informative prior in which the estimate of prior probability is based on the fraction of the total area of the faces of the die that each face represents? While not as good as measuring the energy required to displace the die from one face to another, it is still an informative prior based on our knowledge.


    One simple definition of what differentiates a Bayesian approach from a frequentist approach is whether or not prior information is used,

    Indeed, Fisher’s maxim, “Let the data speak for themselves” seems to imply that it would be wrong (a violation of “scientific objectivity”) to allow ourselves to be influenced by other considerations such as prior knowledge about H.

    where H is a scientific hypothesis.

    In the same paper the point is made that the experimental model we choose for a study does require prior information.

    Yet the very act of choosing a model (i.e. a sampling distribution conditional on H) is a means of expressing some kind of prior knowledge about the existence and nature of H, and its observable effects.

    This was a point made by John Tukey (noted in the subsequent paragraph). In a totally unrelated point in my personal history, I originally entered radiology as a medical student to use their research computer (a PDP/11) to program an integer version of the Fast Fourier Transform (the Cooley-Tukey algorithm) for calculation of computer generated holograms.

  49. Jan Willem Nienhuys says:

    if the true parameter is in the range of .2 to .8, then the Bayes estimator will work better than the frequentist estimator

    Yes, but only if you do ten draws. If you do more draws then of course adding an imaginary extra two draws will still be better if your prior happens to match reality, but the difference will be smaller and I guess that the range where it will work ‘better’ also will change. I can calculate it if you wish, I guess it becomes slowly smaller.

    If you don’t do ten draws, but only one, then the Bayesian method works ‘better’ for a parameter between 0.09175 and 0.90825.

    In any case, the prior becomes less and less important the more data you collect.
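    The range can be worked out for any number of draws with a short sketch. It assumes mean squared error as the measure of ‘better’ and the estimators k/n and (k+1)/(n+2); comparing their errors, F(1-F)/n against [nF(1-F) + (1-2F)²]/(n+2)², reduces to the condition F(1-F) > n/(8n+4). The function name is illustrative.

```python
import math


def bayes_better_range(n):
    """Interval of F where (k+1)/(n+2) has lower mean squared error than k/n."""
    threshold = n / (8 * n + 4)  # Bayes wins when F(1-F) exceeds this
    half_width = math.sqrt(0.25 - threshold)
    return 0.5 - half_width, 0.5 + half_width


for n in (1, 10, 100, 1000):
    low, high = bayes_better_range(n)
    print(f"n={n:5d}: Bayesian estimate better for {low:.5f} < F < {high:.5f}")
```

    On this reckoning the interval narrows as the number of draws grows, but only slowly, and it never shrinks below roughly 0.146 to 0.854, because the threshold n/(8n+4) approaches 1/8.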

    Applying this to homeopathy: by now so many data have been collected that the prior would be not very relevant, if only it wouldn’t be so vanishingly small. Unfortunately the homeopaths keep saying that homeopathy has adequate scientific proof by now!

  50. JMB says:


    I wouldn’t ask you to calculate it. Usually medical hypotheses are promoted to the next level of study design after fewer positive results than 10 studies. If several trials of a plausible hypothesis fail (usually no more than 5 times), then the plausible hypothesis is dropped. Notable exceptions are certain implausible CAM hypotheses like homeopathy that have a religious following. Acupuncture (the ancient mechanism description is implausible, but the process is not implausible) has undergone thousands of trials. The penultimate medical trial is the large scale RCT. There are few interventions that have undergone 10 large RCTs.

  51. Unrelated linguistic aside:

    The dictionary meaning of “penultimate” is “next-to-last.” It’s not a common word, so these days people are using it to mean “ultra-ultimate,” or “after-the-last.” At what point does the subculture of people who use the dictionary meaning abandon the word as hopelessly ambiguous?

    (The story of how I came to know the dictionary meaning. Friends who had been working in Tanzania told of the confusion over the pronunciation of the country’s name. It had previously been two countries, Tanganyika and Zanzibar. When they merged, the names were also merged but people weren’t sure how to pronounce the new name. Was it TanZANia or TanzanIa? The government announced that the correct pronunciation would be formally announced on the radio at such-and-such a time. The country eagerly tuned in. The radio announcer: “The correct pronunciation is with the emphasis on the penultimate syllable.” Um. This is a very cruel and unhelpful answer for most Tanzanians. “… TanzanIa.” Ahh, that’s better.) (This is also the story of how I came to know whatever happened to Tanganyika and where Zanzibar was.) (This is not a story about how elite I am but about how difficult it is to rely on an assumption of common history to support a common understanding of words. It’s a wonder we manage to communicate at all.) (Back to our regularly scheduled programming.)

  52. JMB says:


    The ultimate test of a medical intervention is the observed performance after it is approved (or accepted) for use. Usually this takes 10 or more years. Unfortunately this often ends up a legal decision instead of a scientific decision (IUDs being an example).
