The Science of Clinical Trials

Science-based medicine is partly an exercise in detailed navel gazing – we are examining the use of science in the practice of medicine. As we use scientific evidence to determine which treatments work, we also have to examine the relationship between science and practice, and the strengths and weaknesses of the current methods for funding, conducting, reviewing, publishing, and implementing scientific research – a meta-scientific examination.

There have been several recent publications that do just that – look at the clinical literature to see how it is working and how it relates to practice.

Dr. Vinay Prasad led a team of researchers through the pages of the New England Journal of Medicine hunting for medical reversals – studies that show that current medical practice is ineffective. Their results were published recently in the Mayo Clinic Proceedings:

Dr. Prasad’s major conclusion concerns the 363 articles that test current medical practice — things doctors are doing today. His group determined that 146 (40.2%) found these practices to be ineffective, or medical reversals. Another 138 (38%) reaffirmed the value of current practice, and 79 (21.8%) were inconclusive — unable to render a firm verdict regarding the practice.
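As a quick arithmetic check on the figures quoted above, the three counts can be tallied against the 363 articles. This is only a sketch: the counts come straight from the quoted summary, while the variable and function names are my own.

```python
# Counts as reported in the quoted summary of Prasad's study.
# The labels and names below are illustrative, not from the paper.
counts = {"reversal": 146, "reaffirmed": 138, "inconclusive": 79}
total = sum(counts.values())


def share(n: int, total: int) -> float:
    """Return n as a percentage of total, rounded to one decimal place."""
    return round(100 * n / total, 1)


shares = {label: share(n, total) for label, n in counts.items()}

for label, pct in shares.items():
    print(f"{label}: {counts[label]} of {total} = {pct}%")
```

The three categories account for all 363 articles, and the rounded percentages match those quoted.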

Prasad also found that 27% of published studies looked at existing treatments while 73% studied new treatments.

This does not mean that 40% of current practice or current treatments are useless. There is likely a selection bias in which treatments that are controversial are more likely to be studied than ones that are well established. Also, as David Gorski has already pointed out, this is a study of one journal (the NEJM) and may reflect a publication bias toward high impact articles – showing new treatments that work and reversals of established treatments.

Also, on a positive note, these studies reflect the practice within medicine of studying even treatments that are already in use, and then abandoning those treatments when the evidence shows they don’t work.

The question remains, however – is 40% acceptable? Is this what we would expect from a system that is working well? In an accompanying editorial, John Ioannidis comments:

“Finally, are there incentives and anything else we can do to promote testing of seemingly established practices and identification of more practices that need to be abandoned? Obviously, such an undertaking will require commitment to a rigorous clinical research agenda in a time of restricted budgets,” concludes Dr. Ioannidis. “However, it is clear that carefully designed trials on expensive practices may have a very favorable value of information, and they would be excellent investments toward curtailing the irrational cost of ineffective health care.”

I agree that this highlights the fact that some current practices are useless or less than optimal and need to be re-examined. It also seems prudent to target expensive treatments that may not work.

These results might also suggest that the medical community, in some cases, adopts new treatments prematurely, based on preliminary evidence that is not reliable. Ioannidis himself pointed out that most new treatments do not work and that most preliminary positive findings on these treatments are false positives. Simmons et al. demonstrated how easy it is to inadvertently manufacture positive results by exploiting researcher degrees of freedom. Publication bias is also a known effect pushing the literature toward positive results.
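To give a rough sense of how researcher degrees of freedom can manufacture positive results, here is a deliberately simplified sketch. It is not the simulation Simmons et al. ran. It assumes a treatment with no real effect, a researcher who can analyze the data k different ways, and (unrealistically) that those analyses are statistically independent. The function name and the chosen values of k are my own.

```python
def chance_of_false_positive(alpha: float, k: int) -> float:
    """Probability that at least one of k independent tests of a true
    null hypothesis is 'significant' at threshold alpha.

    Assumes independence between the tests, which is a simplification;
    correlated analyses would inflate the rate less than this.
    """
    return 1 - (1 - alpha) ** k


# Hypothetical numbers of analysis choices available to the researcher.
for k in (1, 2, 5, 10):
    rate = chance_of_false_positive(0.05, k)
    print(f"{k:>2} analyses: {rate:.1%} chance of at least one false positive")
```

Under these assumptions, the nominal 5% false positive rate climbs to roughly 23% with five analyses and roughly 40% with ten, even though nothing improper was done in any single analysis.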

All of this research seems to be pointing in the same direction – we need to compensate for this positive bias in the clinical research and perhaps raise the bar of rigorous evidence before adopting new treatments. Further, we need to retroactively apply this raised bar of evidence to existing treatments which were perhaps adopted prematurely.

Another researcher, Benjamin Djulbegovic, just published another look at the clinical literature in the journal Nature. He examined 860 phase III efficacy trials and found that in just over half of these trials the new treatments being studied were superior to existing treatments. He concludes:

“Our retrospective review of more than 50 years of randomized trials shows that they remain the ‘indispensable ordeals’ through which biomedical researchers’ responsibility to patients and the public is manifested,” the researchers conclude. “These trials may need tweak and polish, but they’re not broken.”

His primary point is that this ratio is close to ideal. If the new treatments worked the vast majority of the time, then that would indicate that existing evidence is likely sufficient and it would call into question the ethics of doing the study – putting patients on a placebo or less effective treatment. If the efficacy trials demonstrated that the treatment worked in only a small minority of cases, then the chance of benefit would be too small to ethically justify enrolling patients, and that would also call into question the methods we were using to select treatments for large clinical trials.

Roughly a 50-50 split is therefore in the Goldilocks zone for clinical trials. Djulbegovic points out that this ratio allows for the steady incremental advance of medical treatments, even though only about 2-4% of new treatments represent a true breakthrough.

Another recent review of the clinical research, this one by Clinical Evidence, a project of BMJ, concludes that for about half of studied treatments, the efficacy is simply unknown. They point out that this does not mean half of treatments being used, and the data says nothing about the frequency of use of individual treatments – just that systematic reviews of treatments are inconclusive about half the time.

This result is similar to a 2011 review of Cochrane systematic reviews which found that 45% were inconclusive – could not conclude that the treatment either did or did not work.


One simple interpretation of all of this research into clinical trials is that we need to do more research. In the review of Cochrane reviews above, only 2% of reviews concluded that the examined treatment works and no further research is required (although it seems to me that recommending further research is the default conclusion for Cochrane reviews).

We do not, however, have infinite resources with which to conduct clinical research. We will always be making do with insufficient data, and so we must make the most out of the information we have. Part of that is understanding the patterns in the research, and the strengths and weaknesses of various kinds of medical research at every stage.

We now have a very good working knowledge of biases in clinical studies. Generally speaking, they have a huge positive bias. Preliminary studies should therefore be considered unreliable in terms of making clinical decisions, and should only be used as a basis for designing further research.

We need to recognize that only mature rigorous efficacy trials give us any reliable indication of whether or not a treatment works and is superior to doing nothing or to existing treatments. Even these trials, however, are problematic and are subject to bias. We therefore also need to consider plausibility (prior probability).
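Here is a minimal sketch of why prior probability matters, using Bayes' theorem to estimate the chance that a statistically significant trial reflects a real effect. The power, significance threshold, and prior probabilities are illustrative assumptions of mine, not figures from any of the studies discussed, and the calculation ignores bias entirely, so real-world numbers would be worse.

```python
def positive_predictive_value(prior: float, power: float = 0.8,
                              alpha: float = 0.05) -> float:
    """Probability that a 'significant' result reflects a true effect.

    prior: assumed probability, before the trial, that the treatment works
    power: assumed probability the trial detects a real effect
    alpha: assumed probability of a false positive when there is no effect
    """
    true_positives = power * prior
    false_positives = alpha * (1 - prior)
    return true_positives / (true_positives + false_positives)


# Hypothetical priors: a plausible treatment, a long shot, and a highly
# implausible one.
for prior in (0.5, 0.1, 0.01):
    ppv = positive_predictive_value(prior)
    print(f"prior {prior:.0%}: a positive trial is correct about {ppv:.0%} of the time")
```

With these assumed values, a positive trial of a treatment with a 50% prior is correct about 94% of the time, while the same result for a treatment with a 1% prior is correct only about 14% of the time.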

The threshold for determining that a treatment is well established as beneficial is a scientifically plausible intervention backed by rigorous clinical trials that show a replicable, clinically significant, and statistically significant benefit.

In my opinion, the research clearly shows that this is the reasonable standard. This is the primary meaning of science-based medicine.

Meanwhile, there is another movement within mainstream medicine whose primary goal is to move in the exact opposite direction: to soften the standards of scientific evidence, use pragmatic studies as if they were efficacy trials, interpret placebo effects as if they were physiological effects, and adopt a “more flexible” concept of medical evidence.

These two opposite trends can exist side-by-side because the latter exists in an “alternative” world, a literal double standard.

Posted in: Clinical Trials


12 thoughts on “The Science of Clinical Trials”

  1. David Gorski says:

    More on Prasad’s article. Hint: It doesn’t exactly conclude what some people claim it concludes.

  2. windriven says:

    Dr. Novella said, “It also seems prudent to target expensive treatments that may not work.”

    It also seems prudent to more closely examine for which patient populations a treatment is appropriate. An ugly fact of manufacturing medical products, really of any products, is that an effort is made to sell as many as possible.

    Very often there are patient populations that will benefit from something. But that something may have a very high price. So an effort is made to amortize that cost over a large number of patients making the per patient cost lower.

    Back in the day lots of equipment was sold on that naked premise. A pulmonary function testing system might cost X dollars. But if the pulmonologist routinely runs every patient through the system the cost per patient is negligible and the profit to the pulmonologist is stout. The profit to the manufacturer is also stout because they will sell lots more pulmonary function systems.

    Now often the effort is to have one’s technology become sufficiently ubiquitous that the seller can claim it as ‘standard of care’ thereby essentially extorting everyone in the field to adopt it or face the consequences in some future lawsuit.

    And of course this applies beyond the sale of devices and pharmaceuticals. Everyone wants to believe that their pet idea is the greatest thing since special relativity. So when they talk about their mule it is likely to be the best damned mule in Georgia.

    I could also talk at some length about the way that shifting reimbursement strategies have shaped hospital practice. And I don’t need to point out that reimbursement strategies have everything to do with economics and nothing to do with science.

    My point (yes, I have one) is that it isn’t only expensive treatments that deserve scrutiny – though I would agree that expensive treatments are the low hanging fruit. Just remember that big expenses are sometimes hidden by rolling them very thin.

  3. David – Thanks. I forgot to link to your previous treatment of this study. I will add a link and quick comment.

  4. qetzal says:

    What level of evidence is sufficient to show that an accepted practice doesn’t really work? This seems like sort of the inverse of the things Ioannidis has shown about new practices. If most positive findings on unproven treatments will actually be false positives, how often would a negative finding of an accepted treatment be a false negative?

    Obviously we’d have to factor in the strength of the pre-existing evidence that suggested the treatment was effective. It would be interesting to see someone like Ioannidis address this formally.

  5. David Grinter says:

    The default of the Cochrane Review is always to do more research. Perhaps they should also state that it would be prudent to do better research?
    The inherent biases in clinical research (positive result bias, refusal to publish replication studies etc.) aligned with researchers poorly versed in the finer points of research methodology and statistical practices is a perfect storm for creating the lack of clarity regarding the effectiveness of treatments.

    Innovation gets the plaudits but we need replication to ensure that the initial study wasn’t a fluke/fraud. Along with education of researchers (never mind the public) about methodologies we need to change the way in which research grants are administered, register all trials, and change the incentives for publication of research (i.e. negative results and replication are as valid as, if not more valid than, an original study).

    That said, you know the CAM brigade would seize upon the negative or ambiguous results! Education for the masses is also a must!

  6. Carl says:

    That said, you know the CAM brigade would seize upon the negative or ambiguous results!

    Speaking of CAM, wasn’t the NCCAM’s finding so far that 0% of CAM is effective? Sure, 40% sucks, but it’s a lot better than 0.

  7. Carl says:

    I wouldn’t be surprised if the truth were something disappointing, since actual science-based medicine is still pretty young given the complexity of the human body.

    But it doesn’t seem logical to try to draw any specific conclusions from this study. Since research funding is limited, it seems unlikely that researchers would be given money to study an existing practice unless there were a good reason to expect that there is something wrong with it. Studies which only look at existing data are obviously much cheaper than doing new experiments, so the bar for funding is probably lower, but calculating effectiveness rates by tallying up already-conducted studies is a little bit like taking a survey of people walking into a doctor’s office and declaring that everyone on the planet is sick.

  8. JesusR says:

    Although I like the idea of rejecting implausible interventions without much testing, I’m afraid that CAM proponents can always come up with a plausible mechanism that doesn’t violate basic scientific principles. One example: So they will always have access to peer reviewed publications, which will keep the appearance of legitimacy.

  9. Discussant says:

    When will psychotherapies be subject to well-designed clinical trials with active controls that control for expectations (and can rule out confirmation bias, allegiance effects, etc.)? Until therapists can reliably pinpoint what methodologies or what types of interactions change the brain or affect people in what ways (both short-term and long-term), it seems irresponsible for them to go on messing with people’s minds under the misleading cloak of science or ‘healthcare.’

  10. Ruth says:

    Medicine is actually a science itself right? I really admire people who take chances in finding different effective medicines.

    1. WilliamLawrenceUtridge says:

      Medicine is an applied science, built from basic (lab, bench, glass and animal) testing through to clinical trials for safety, then efficacy and finally postmarketing surveillance. Thus, while doctors should rely on the basic sciences to inform their decision making, in practice this is impossible because very few patients exactly match the characteristics of the clinical trials and doctors are forced to improvise. The “art” of medicine, which is really a nice way of expressing the uncertainties and imperfections of it, is further complicated by the fact that the proxies we use for the underlying biochemistry (race, age, gender, medical condition) are woefully imperfect. Eventually medicine will probably reach the state of using genomic, or even proteomic tests to divide patients into groups, which will greatly improve research and treatment (in much the same way dividing diseases by etiology rather than symptom did).

      People who try CAM aren’t “taking chances in finding different effective medicines” though. CAM, bar perhaps herbalism, has no reason to work, generally violates some law of chemistry, biology or physics, and is better described as “an unnecessary, expensive, unsupported assertion”. That you’re paying for, hoping the possibly fraudulent, possibly well-meaning person selling it to you knows a goddamn thing about the human body.
