Comprehending the Incomprehensible

Medicine is impossible. Really. The amount of information that flows out of the interwebs is amazing, and the time available to absorb it is comparatively tiny.

If you work, sleep and have a family, once those responsibilities are complete there is remarkably little time to keep up with the primary literature. I have made two of my hobbies (blogging and podcasting) dovetail with my professional need to keep up to date, but most health care providers lack the DSM-IV diagnoses to consistently keep up.

So we all rely on short cuts. People rely on me to put new infectious disease information into context and there are those I rely upon to help me understand information both in my specialty and in fields that are unrelated to ID.

Up and down the medical hierarchies we trust that others are doing their best to understand the too numerous to count aspects of medicine that no single person could ever comprehend.

If I want to know the state of the art in the treatment of atypical mycobacteria, or how best to treat Waldenström's, or who knows the most about diagnosing sarcoid, there is always someone who can distill their expertise on a topic to the benefit of the patient and my knowledge.

Trusting others is the biggest shortcut we routinely take in medicine to wade through the Brogdignagian amounts of information that flood into medical practice. We have to trust other clinicians, the researchers and the journals that all the information is gathered and interpreted honestly and accurately.

I understand that the world is a tricky and confusing place and that even under the best of circumstances the literature has ample opportunity to be wrong. But in the end the truth, or some approximation of it, will out.

Trust is a fragile foundation upon which to build an edifice, but the practice of medicine would be impossible without it. It is one of the reasons medical fraud is particularly heinous; it strikes to the heart of the practice of medicine.

One of the other shortcuts we use is statistics. It is a quick and dirty way to check the validity of results, and there is nothing like a good p-value to make a result believable. The smaller the p-value, the better. That is a simple, and sometimes misleading, approach. Except for ID and SCAMs, I rarely have the luxury of time to read a study closely, so I look at the p-value instead.

I have long had a mental block with statistics. I took, and dropped, statistics once a year for four years in college. Once they got past the bell-shaped curve and the coin flip they would lose me. So I have to trust that the statistics are correct when I read a paper, and that bothers me. I would feel better if I knew that at one time I had been able to crank out the results with a pencil and a piece of paper.

Otherwise statistics are like the old New Yorker cartoon.

It was nice to be reminded for the umpteenth time that statistics can be tricky, by an article in Vaccine called 5 ways statistics can fool you—Tips for practicing clinicians. I write this, and my other blog, first to educate and entertain myself. As a side effect I hope to educate and entertain others, but I long ago realized that not everyone shares my aesthetic. There are those of you reading this for whom the 5 ways are old hat, part of your critical thinking skills. For me it is yet another attempt to understand statistical concepts with the depth that I have with MRSA, rather than, say, the loop of Henle; my understanding of the latter lasts only as long as I am reading about it. If that. I still suspect the loop has as much validity as homeopathy.

Their opening statement is a masterpiece of understatement:

However, compounding the problem of finding and effectively using the medical literature is the fact that many, if not most, physicians lack core skills in epidemiology and statistics to allow them to properly and efficiently evaluate new research. This may limit their abilities to provide the best evidence-based care to patients.

No kidding. And this article is meant to be applied to reality/science-based treatment. As I mentioned in my last blog entry, it is even more problematic when statistics are applied to fantasy interventions like acupuncture or homeopathy. Then, not only do most physicians lack the core competencies to evaluate the paper, most are not able to recognize the subtle biases that allow magic to be perceived as real.

It is like having real scientists evaluate ESP, where a magician would be a better qualified observer. I suspect that many editors and reviewers of SCAM papers are untrained in the skills required to evaluate SCAM research.  They apply the rules of science where those rules do not apply.

The tips, with examples from the vaccine literature, are:

Tip #1: statistical significance does not equate to clinical significance

Tip #2: absolute risk rather than relative risk informs clinical significance

Tip #3: confidence intervals offer more information than p-values

Tip #4: beware multiple testing and the isolated significant p-value

Tip #5: absence of evidence is not evidence of absence
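Two of these tips lend themselves to a quick demonstration. Below is a minimal Python sketch with hypothetical numbers, not taken from the Vaccine article: tip #2 (a flashy relative risk reduction can hide a tiny absolute one) and tip #4 (run enough tests on pure noise and some will come out "significant").

```python
import random

# Tip #2: absolute vs relative risk (hypothetical numbers). A drop from
# 2 in 1,000 to 1 in 1,000 is a dramatic "50% relative risk reduction"
# but an absolute risk reduction of only 0.1 percentage points.
control_risk, treated_risk = 0.002, 0.001
relative_risk_reduction = (control_risk - treated_risk) / control_risk
absolute_risk_reduction = control_risk - treated_risk
number_needed_to_treat = 1 / absolute_risk_reduction

print(f"RRR {relative_risk_reduction:.0%}, ARR {absolute_risk_reduction:.2%}, "
      f"NNT {number_needed_to_treat:.0f}")

# Tip #4: under a true null hypothesis, p-values are uniform on [0, 1],
# so many tests of true-null hypotheses produce p < 0.05 "significance"
# at a rate of roughly 1 in 20 by chance alone.
random.seed(42)
n_tests = 10_000
chance_hits = sum(random.random() < 0.05 for _ in range(n_tests))
print(f"{chance_hits} of {n_tests} true-null tests came out 'significant'")
```

The number-needed-to-treat figure is the practical face of tip #2: a 50% relative risk reduction can still mean treating a thousand patients to prevent a single event.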

It is remarkable how these tips can be applied to SCAM-related articles, and the articles are found wanting. For example, the acupuncture article I reviewed last week: barely significant p-values (not true, that's why the line through), large confidence intervals and misleading multiple tests; the sine qua non of positive SCAM studies.

This, of course, is applying statistics to reality. Add a touch of prior plausibility and there is no reason to suspect that the barely positive effects noted in some SCAM studies are due to anything but bias and only bias. Since most of the interventions we discuss in this blog have zero probability of a real effect, there should be an alternative explanation for what appear to be beneficial outcomes. In the case of SCAM, absence of evidence is very likely evidence of absence.

Recently there were two other articles that looked at the effect of bias on outcomes in clinical trials: Observer bias in randomised clinical trials with binary outcomes: systematic review of trials with both blinded and non-blinded outcome assessors, and Observer bias in randomized clinical trials with measurement scale outcomes: a systematic review of trials with both blinded and nonblinded assessors.

Almost the same article twice over.

In the two analyses they looked at the outcomes of clinical trials where the person determining the outcome, for example, the number of wrinkles after a therapy, was either blinded or not blinded to the therapy. No surprise: when people knew what the intervention was, they assessed it as more effective than when they were blinded.

Nonblinded assessors of subjective measurement scale outcomes in randomized clinical trials tended to generate substantially biased effect sizes.

And this is for relatively tightly controlled clinical trials. In real-world practice the tendency to overestimate the effect of a therapy must be even greater. It shows up most often in the phrase “I use X on my patients and that is how I know it works.”

The strong pull of bias to see what you want to see, regardless of what is actually there, is a common human characteristic. Being blind to that characteristic is the Dunning-Kruger variant common to all true believers.

Statistics makes my brain hurt. But if I remember some basic principles, the medical literature becomes a bit more comprehensible.

Posted in: Basic Science


18 thoughts on “Comprehending the Incomprehensible”

  1. rork says:

    That hurt your brain feels, think of it as soreness from exercise. Welcome it. The cure is more.

  2. Scott says:

    It gets better. If you really want to do it right, you have to not only consider the statistics properly, you need to correctly quantify the systematics. Taking as an example what are sometimes called the “researcher degrees of freedom,” a GOOD analysis would include identifying them, varying them over their range, and measuring how much the result changes when you do. For each such DoF. Each then becomes one source of systematic uncertainty, which (if you’re doing it right) you treat on a par with your statistical uncertainty.

    Coming from a physics background where measurements are routinely reported with two sets of +/- (one statistical and one systematic), the complete lack of quantification of systematic uncertainties is a really glaring problem in the medical literature.

    So, have I just made your brain explode from contemplating adding that on?

  3. CM Doran says:

    The first thing I learned was “Brogdignagian.” This word will come in handy…thanks.

    I will find the article useful when working with students….I LOVE going over statistics and clinical studies with them.

    Other things that drive me batty with some “statistical analyses” in studies are inappropriate use of tests (e.g., using t tests on data sets that don’t follow a normal distribution) and bias. Graphs and illustrations can be misleading as well.

    Thank you for writing.

  4. Jan Willem Nienhuys says:

    Ben Goldacre’s recent book Bad Pharma lists 15 ways to do ‘bad trials’, of which he counts only

    1. outright fraud (making up data)

    as fraud. But of course the other 14 are fraud too. Some of these fraudulent methods are OK if you are still figuring out what is going on. But what’s OK in explorative research is wrong when the time has come to provide convincing proof of whatever your claims are. Note the word ‘fraud’ suggests an intent to deceive. But from professional researchers publishing in professional papers one can and must require that they do not make elementary mistakes. If it is known that you must work with blinded tests in a particular context and you ‘forget’ about the blinding, that is worse than a mistake; it is fraud. Or should I quote Fouché, who thought that some mistakes were worse than crimes (see the execution of the Duke of Enghien).
    I am convinced that these kinds of fraud are happening all the time in ‘research’ by sCAM believers.

    Here are the other 14 of Goldacre.

    2. Use freakishly ideal patients – who are, except for the investigated disease, very healthy.

    3. Trials that are too short (so the negative long term effects don’t show up)

    4. Trials that stop early. This is done by peeking at the results before the planned number of patients to enroll is obtained.
    My comment: if you really should have 100 patients but you peek at 20, 30, 40 … patients to see whether ‘significance is reached’, you give yourself much more than a 1 in 20 chance to get ‘significance’ even if the two treatments compared actually don’t differ. If you combine it with fraud #10 below, you improve your chances.

    If people are in the habit of doing that, they will of course often stop because the results are such that they can figure out that they probably won’t get a positive result. These ‘pilot projects’ are then discarded and never published.

    5. Trials that stop too late. If your follow-up period is too long it can serve to dilute nasty short term effects.

    6. Trials that are too small. This can be a version of fraud #3, if you recruit your patients consecutively. Moreover, if the conclusion of a trial is that two different treatments are equivalent (i.e. a nonsignificant difference), the statement can be meaningless if the trial is too small. Extreme example: a die is suspected to be loaded so as to show more sixes. It is thrown once. No matter what it shows, it is not a significant deviation from what can be expected.

    7. Trials that measure uninformative outcomes. E.g. blood pressure as outcome, where the chance of dying is what would have been really interesting.

    8. Trials that bundle their outcomes in odd ways.
    This is one of a series of fraud types that are committed after all the data are in.

    9. Trials that ignore drop-outs.

    10. Trials that change their main outcome after they’ve finished. Goldacre reports that this happens quite often. He also gives the example of paroxetine. One trial had a positive result, but on close inspection there were 2 primary outcomes and 6 secondary outcomes in the protocol. None of them were significant. The researchers had looked at 19 more, of which 4 came out positive – and in the published paper these were called the main outcomes. And this is not an isolated example. Investigators in 2004 found that in two thirds of the papers one or more outcomes were different from the ones in the pretrial protocol.
    In my view sCAM research commits this error (worse than a crime) quite often.

    11. Dodgy subgroup analysis
    Basically this is one more example of playing with the figures after the data are collected.

    12. Same, but then with meta-analyses and systematic reviews.

    13. Seeding trials. No more said.

    14. Pretend it’s all positive regardless. The idea is that you give relative risks instead of absolute risks (to make them look bigger), but there are other ways. Such as:

    14.1 omit negative results in the body of the paper from the abstract
    14.2 misrepresent negative results in the body of the paper in the abstract
    14.3 bury the negative results in the body of the paper, without explicitly stating them
    14.4 apply any of the frauds #8, #10, #11
    14.5 brazenly ramble on about how great the treatment is (Goldacre’s words)

    Goldacre doesn’t even explain ‘errors’ like improper randomization, use of inappropriate tests, and faulty or absent blinding; fraudulently hiding the results of unfavorable trials is treated in another chapter.
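[Jan's point about fraud #4, peeking at accumulating results, can be demonstrated with a short simulation. This is a hypothetical sketch, not from Goldacre's book: two arms of identical coin flips are compared at interim looks of 20, 30, 40 and 50 patients per arm, and "significance" is declared if any look yields p < 0.05.]

```python
import math
import random

def two_sided_p(cured_a, cured_b, n):
    """Two-proportion z-test p-value (normal approximation, stdlib only)."""
    pooled = (cured_a + cured_b) / (2 * n)
    se = math.sqrt(2 * pooled * (1 - pooled) / n)
    if se == 0:
        return 1.0
    z = abs(cured_a - cured_b) / (n * se)
    return math.erfc(z / math.sqrt(2))  # = 2 * (1 - Phi(z))

random.seed(7)
n_trials, looks = 2000, [20, 30, 40, 50]   # interim looks per arm
false_positives = 0
for _ in range(n_trials):
    # Both arms are identical 50/50 coin flips, so any "significant"
    # difference is a false positive by construction.
    arm_a = [random.random() < 0.5 for _ in range(max(looks))]
    arm_b = [random.random() < 0.5 for _ in range(max(looks))]
    if any(two_sided_p(sum(arm_a[:n]), sum(arm_b[:n]), n) < 0.05 for n in looks):
        false_positives += 1
print(f"false positive rate with peeking: {false_positives / n_trials:.1%}")
```

[With four unplanned peeks the false positive rate climbs well above the nominal 1 in 20, which is exactly why legitimate interim analyses must be pre-specified and statistically accounted for.]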

  5. cervantes says:

    Pre-trial registration is intended to ameliorate a lot of the above. So, hopefully things will get better.

  6. Harriet Hall says:

    @CM Doran,

    If you’re going to use the word, you might want to use the standard spelling Brobdingnagian rather than the Crislipian version.

  7. rork says:

    “Trials that stop early.” No doubt Jan knows this, but good trials often have early stopping rules where you peek, but only according to a plan, and the statistical analysis accounts for that. Continuous “peeking” can be properly done too, it’s just trickier to compute the critical region for the test statistics (and you pay in terms of power, and might have to use more subjects in the end, though if the effects are huge you of course use fewer people if you get to stop early). Phase II-III trials often have early stopping rules to protect patients from both 1) truly bad outcomes in one arm (futility), and 2) not being able to enjoy the truly good outcomes in the other arm. Phase I’s have the first protection, around here, and hopefully around you too. We’d sometimes like to just go ahead and keep killing or nearly killing people while obtaining more of the sacred data, but patients don’t like that, and other folks won’t let us. Scruples, bah.

  8. mousethatroared says:

    “I understand that the world is a tricky and confusing place and that even under the best of circumstances the literature has ample opportunity to be wrong. But in the end the truth, or some approximation of it, will out.”

    In the context of medical care I don’t really believe in “truth” anymore. I try to think in terms of ‘most likely given the information I have now.’

  9. Jan Willem Nienhuys says:

    good trials often have early stopping rules where you peek, but only according to a plan

    Of course. Actually Goldacre makes a remark about that, but then gives an example without mentioning whether there was a plan. If there is a plan, then the ordinary computation “50 verum, 50 placebo, 35 verum cured, 25 placebo cured, hurray p=0.0328 one tailed” (if I’m not mistaken) is off. The entire plan should be in the paper and the calculation of p-values (or whatever) should reflect this plan.

    The reasons given by rork don’t apply when examining alternative medicine, say homeopathy, for self-limiting conditions. And the reasons of rork apply mostly when you are still in the exploratory phases, i.e. when you don’t quite know what to expect. If the trial is meant to give the final proof of efficacy and safety (established in exploratory trials), to present to a regulatory body, then I wonder whether a trial designed with safeguards in case something unexpected happens (many more people cured or killed in one arm) is really the best you can do.
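[For the curious, the naive one-tailed figure Jan quotes can be reproduced with an exact Fisher (hypergeometric) computation in a few lines of Python; as Jan notes, it is precisely this calculation that becomes invalid once there is unplanned peeking.]

```python
from math import comb

# Jan's example: 100 patients, 50 per arm, 60 cured in total, 35 of them
# in the verum arm. Under the null, the number of cures in the verum arm
# is hypergeometric; the one-tailed Fisher p-value is the chance of
# seeing 35 or more cures in that arm.
N, cured_total, n_verum = 100, 60, 50
p = sum(comb(cured_total, k) * comb(N - cured_total, n_verum - k)
        for k in range(35, n_verum + 1)) / comb(N, n_verum)
print(f"one-tailed p = {p:.4f}")   # expected to be close to Jan's 0.0328
```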

  10. weing says:

    According to Ben Goldacre, doctors are not keeping up with the literature. All they would have to do is skim it to keep up. That would only take 29 hours a day of skimming. What a lazy lot.

  11. Mark Crislip says:

    I feel so sorry for you all and your narrow, Western, empiric approach to spelling and grammar. Others have been using alternative spelling and grammar for centuries with good communication. Shakespeare used spelling that does not resemble modern ‘words’ yet he is considered the greatest writer of all time. If you were not under the thrall of Big Dictionary (and the OED is BIG) you would understand. Some day the veil will fall from your eyes and you will be liberated into a new understanding of written English.

  12. Narad says:

    If you were not under the thrall of Big Dictionary (and the OED is BIG) you would understand.

    I’m not sure that a dictionary that recognizes “sammich” is your best bet on this front. Then again, one might want to take a gander at the June 2010 issue of English Today, in particular Charlotte Brewer’s contribution.

  13. elburto says:

    Shakespeare used spelling that does not resemble modern ‘words’ yet he is considered the greatest writer of all time.

    Hell, he didn’t even spell his own name the same way twice. As I have never seen an article written by ‘Marck Crisslipe’ or ‘Marke Krislypp’, it is obvious that this CAS* approach makes our host superior to the Bard.

    *Complementary and Alternative Spelling.

  14. mousethatroared says:

    Actually, I would say just the opposite, Elburto. Mark Crislip’s consistent spelling of his name demonstrates that he still clings to his empirical reductionist training. He’ll never truly reap the benefits of the alternative approach until he has the faith to set aside ALL his “conventions”.

    Actually, I’m with Crislip, folks wrote beautifully, expressively, with gorgeous penmanship and without consistent spelling for a very long time. Now one cannot jot an email to a family member without getting superior little spelling suggestions…not that I’m sensitive or anything.

  15. Jan Willem Nienhuys says:

    Shakespeare … is considered the greatest writer of all time.

    English writer! That narrow, Western, … approach neglects writers in other cultures, and especially Chinese writers. They didn’t use spelling either. Around the time Shakespeare was writing and even long afterwards, there were more books in print in China than in the rest of the world together.

  16. It’s not Brogdignagian but actually Brobdingnagian … though I think the conversion on amount of work winds up the same. :)

    Yes, I’m one of THOSE guys, I know I know, I hate me too.

  17. BillyJoe says:

    Yeah, you’re one of those guys who don’t read the preceding comments before adding your own superfluous one ;)

  18. Quill says:

    All this fuss over Brogdignagian and Brobdingnagian strikes me as lilliputian.

    Literary tussles aside, thank you for these tips and discussion of them. They are very handy.
