Et tu, Biomarkers?

Everything you know may be wrong. Well, not really, but reading the research of John Ioannidis does make you wonder. His work, concentrated on research about research, is a popular topic here at SBM.  And that’s because he’s focused on improving the way evidence is brought to bear on decision-making. His most famous papers get to the core of questioning how we know what we know (or what we assume) to be evidence.

His most recent paper takes a look at the literature on biomarkers. Written with colleague Orestis Panagiotou, Comparison of effect sizes associated with biomarkers reported in highly cited individual articles and in subsequent meta-analyses is sadly behind a paywall –  so I’ll try to summarize the highlights. Biomarkers are chemical markers or indicators that can be measured to verify normal biology, detect abnormal pathology, or measure the effect of some sort of treatment. Ever had blood drawn for lab tests? Then you’ve had biomarkers tested. Had your blood pressure checked? Another biomarker. The AACR-FDA-NCI cancer biomarkers consensus report provides a nice categorization of the different biomarkers currently in use:

  • Diagnostic biomarkers
    • Early detection biomarkers
    • Disease classification
  • Predictive biomarkers
    • Predict the response to a specific agent
    • Predict a particular adverse reaction
  • Metabolism biomarkers
    • Biomarkers that guide drug doses
  • Outcome biomarkers
    • Those that predict response
    • Those that predict progression
    • Those that forecast recurrence

Biomarkers are developed and implemented in medical practice in a process that parallels drug development. It starts with a hypothesis, followed by progressive research to validate the relationship between the measurement of a feature, characteristic, or parameter and the specific outcome of interest. The assay process for measuring the biomarker must itself undergo validation, ensuring that measurements are accurate, precise, and consistent. Biomarkers are generally considered clinically valid and useful when there is an established testing system that gives meaningful, actionable results that can make a clinically meaningful difference in the way we prevent or treat disease.

Some of the most common medical tests are biomarkers: serum creatinine to estimate kidney function, levels of liver enzymes to evaluate liver function, and blood pressure to predict the risk of stroke. The search for new biomarkers has exploded in the past several years with the growing understanding of the molecular nature of many diseases. Cancer therapies are among the most promising areas for biomarkers, with tests like HER2 (to predict response to trastuzumab), or the KRAS test (to predict response to EGFR inhibitors like cetuximab and panitumumab) guiding drug selection. It’s a very attractive target: rationally devising drugs based on specific disease characteristics, and then using biomarkers to identify, a priori, the patients most likely to respond to treatment.

Despite their promise, the resources invested, and isolated winners, biomarker research has largely failed to live up to expectations for some time. Most recently, David Gorski discussed how the hype of personalized medicine hasn’t yet materialized into truly individualized treatments: not because we’re not trying, but because it’s really, really hard work. I’ve also pointed out that direct-to-consumer genetic testing, some of which relies on biomarkers, is a field still not ready for prime time, where the marketing outpaces the science. The reality is that few new biomarker tests have been implemented in clinical practice in the past decades. For many medical conditions, we continue to rely on traditional methods for diagnosis. Yet the promise of biomarkers is tantalizing. Every major conference heralds some new biomarker that sounds predictive and promising. So we have a hot scientific field, lots of preliminary research, multiple targets and approaches, and significant financial interests at play. Sound familiar? It’s exactly the setting described by Ioannidis for therapeutic studies in his well-known paper, Why Most Published Research Findings Are False. And based on this latest paper, the biomarker literature seems to share characteristics with the literature on medical interventions, which Ioannidis studied in another well-known paper, Contradicted and Initially Stronger Effects in Highly Cited Clinical Research.

This newest paper, which was published earlier this month, sought to evaluate whether highly cited studies of biomarkers were accurate when compared to subsequent meta-analyses of the same associations. To qualify, each study had to have been cited over 400 times, and each study had to have a matching subsequent meta-analysis of the same biomarker relationship conducted as follow-up. To reduce the field from over 100,000 studies down to something manageable, results were restricted to 24 high impact journals with the most biomarker research. Thirty-five base papers, published between 1991 and 2006, were ultimately identified. These were well-known papers: some have been cited over 1000 times. For each paired comparison, the largest individual study in each meta-analysis was also identified, and compared to the original highly cited study. Biomarkers identified included genetic risk factors, blood biomarkers, and infectious agents. Outcomes were mainly cancer or cardiovascular-disease related. Most of the original relationships identified were statistically significant, though four were not.

So did the original association hold up? Usually, no. Of that sample of 35, subsequent analysis failed to substantiate as strong a link 83% of the time. And 30 of the 35 reported a stronger association than observed in the largest single study of the same biomarker. When the largest studies of these biomarkers were examined, just 15 of the 35 original relationships were still statistically significant, and only half of these 15 seemed to remain clinically meaningful. For example, homocysteine used to be kind of a big deal, after a small study observed a strong correlation between levels of this biomarker and cardiovascular disease. The most well-known study has been cited in the literature 1451 times, and reported a whopping odds ratio of 23.9. Subsequent analyses of homocysteine failed to show such a strong association. Nine years after the initial study, a meta-analysis of 33 studies with more than 16,000 patients calculated an odds ratio of 1.58. Yet this finding has been infrequently cited in the literature: only 37 citations to date.
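To make the odds ratio comparison concrete, here is a minimal sketch of how an odds ratio and its 95% confidence interval are calculated from a simple two-by-two table. The counts are invented for illustration and are not taken from the homocysteine studies or from the Ioannidis paper; the function name `odds_ratio_ci` is also my own. The interval uses the standard log-scale (Woolf) approximation.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Return (odds ratio, lower, upper) for a 2x2 table.

    a: cases with the marker      b: cases without it
    c: controls with the marker   d: controls without it
    The interval uses the log-scale (Woolf) approximation.
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

# Hypothetical counts, invented for illustration only.
small_study = odds_ratio_ci(12, 8, 2, 18)        # 40 people in total
large_study = odds_ratio_ci(600, 400, 500, 500)  # 2,000 people in total

for label, (or_, lo, hi) in [("small", small_study), ("large", large_study)]:
    print(f"{label}: OR = {or_:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

With these made-up counts, the small study produces a dramatic point estimate with a very wide interval around it, while the large study produces a modest estimate with a tight one. A striking odds ratio from a small study is compatible with a much weaker true association.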

The authors identify a number of reasons why these findings may be observed. Many of the widely cited studies were preliminary and had small sample sizes. The pursuit of publishable, statistically significant findings could have led to selective reporting. The preliminary studies often preceded the meta-analysis by several years, giving ample time for citations to accrue (though this was not always the case, and in some cases the highly cited studies followed larger studies). Limitations identified included the biomarker selection process, which involved several arbitrary selection steps, including the citation threshold and the requirement for a paired meta-analysis. The authors warn readers to be cautious when authors cite single studies and not meta-analyses, and conclude with the following warning:

While we acknowledge these caveats, our study documents that results in highly cited biomarker studies often significantly overestimate the findings seen from meta-analyses. Evidence from multiple studies, in particular large investigations, is necessary to appreciate the discriminating ability of these emerging risk factors. Rapid clinical adoption in the absence of such evidence may lead to wasted resources.
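Two of the explanations the authors offer, small samples and selective attention to significant results, can be illustrated with a toy simulation. This is only a sketch under assumptions of my own: a true odds ratio of 1.5, 30 cases and 30 controls per study, a 30% marker frequency in controls, and a 0.5 continuity correction to avoid empty cells. None of these numbers or function names come from the paper. It runs many small studies of the same modest association and then looks only at the ones that reached statistical significance.

```python
import math
import random
import statistics

def simulate_study(n_per_group, p_control, true_or, rng):
    """Simulate one small case-control study; return (estimated OR, significant?)."""
    # Marker frequency among cases implied by the assumed true odds ratio.
    odds_case = true_or * p_control / (1 - p_control)
    p_case = odds_case / (1 + odds_case)
    a = sum(rng.random() < p_case for _ in range(n_per_group))     # cases with marker
    c = sum(rng.random() < p_control for _ in range(n_per_group))  # controls with marker
    b, d = n_per_group - a, n_per_group - c
    # Add 0.5 to every cell (Haldane-Anscombe correction) so no cell is zero.
    a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    log_or = math.log(a * d / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    # "Significant" here means the lower 95% bound for the OR is above 1.
    return math.exp(log_or), (log_or - 1.96 * se) > 0

def run(n_studies=2000, n_per_group=30, p_control=0.3, true_or=1.5, seed=0):
    """Return (median OR of all studies, median OR of significant ones, count significant)."""
    rng = random.Random(seed)
    results = [simulate_study(n_per_group, p_control, true_or, rng)
               for _ in range(n_studies)]
    all_ors = [o for o, _ in results]
    sig_ors = [o for o, significant in results if significant]
    return statistics.median(all_ors), statistics.median(sig_ors), len(sig_ors)

if __name__ == "__main__":
    med_all, med_sig, n_sig = run()
    print(f"median OR, all studies: {med_all:.2f}")
    print(f"median OR, significant studies only ({n_sig}): {med_sig:.2f}")
```

In this setup a small study can only reach significance if its estimate lands well above the true value, so the "significant" subset overstates the association by construction. If those are the studies that get published and cited, later and larger analyses will look like a decline.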

The editorial that accompanied the article (also paywalled) echoes the cautions and concerns in the paper:

It would be premature to doubt all scientific efforts at marker discovery and unwise to discount all future biomarker evaluation studies. However, the analysis presented by Ioannidis and Panagiotou should convince clinicians and researchers to be careful to match personal hope with professional skepticism, to apply critical appraisal of study design and close scrutiny of findings where indicated, and to be aware of the findings of well-conducted systematic reviews and meta-analyses when evaluating the evidence on biomarkers.

More of the (Fake) Decline Effect? No.

The so-called “Decline Effect” has been discussed at length here at SBM. The popular press seems to be quick to reach for unconventional explanations of the weakening of scientific findings under continued scrutiny. Steven Novella discussed a related case earlier this month, pointing out there’s no reason to appeal to quantum woo, when the decline effect is really just the scientific process at work: adding precision and reducing uncertainty through continued analysis.

Biomarker research parallels therapeutic research, with all the same potential biases. The earliest and often most highly cited results may ultimately turn out to be inaccurate and quite possibly significantly overstated.  Trial registration and full disclosure of all clinical trials will help us understand the true effect more quickly. But that alone won’t solve the problem if we continue to attach significant merit to preliminary data, particularly where there is only a single study. Waiting for confirmatory research is hard to do, given our propensity to act. But a conservative approach is probably the smartest one, given the pattern we’re seeing in the literature on biomarkers.

Ioannidis JP, & Panagiotou OA (2011). Comparison of effect sizes associated with biomarkers reported in highly cited individual articles and in subsequent meta-analyses. JAMA : the journal of the American Medical Association, 305 (21), 2200-10 PMID: 21632484

Posted in: Basic Science, Clinical Trials, Diagnostic tests & procedures, Epidemiology, Science and Medicine


10 thoughts on “Et tu, Biomarkers?”

  1. gfb1 says:

    Look. Folks who live and die on correlational research deserve what they get when assuming that correlations are reciprocal.
    Further, they often fall prey to the paradox of the false positive.

    Another complicating factor is the fact that individual genes have differential expression when placed in different genetic backgrounds.

    Go figure.

  2. kaheil says:

    “Et tu” is not correct. “And you”, in French, spells “Et toi”. Therefore, if the title was supose to be “And you, Biomarkers?” it should translate to “Et toi, Biomarker?”. If the title was meant to say “Are you, Biomarkers?” it should translate to “Est-tu, Biomarker?”.

    Now that I’ve done the grammar Nazi part, I can say that I love your blog and that all the article are impeccably written :)

  3. Harriet Hall says:


    First rule for grammar Nazis: determine what language is being used.

    Second rule: avoid making more mistakes in your critique than were made in the text you’re critiquing. For instance,
    “title was supose to be” and ” all the article are impeccably written.”

  4. Mark Crislip says:

    No one has a classical edication no more:,_Brute%3F

    Et tu, kaheil, et tu.

  5. lilady says:

    This discussion of biological markers brings to mind the “hype” associated with Alpha-Fetoprotein testing of pregnant women about 40 years ago. I found one of the earlier citations at:

    PMID 67821 Structure and Function of Alpha-Fetoprotein (1977)

    The citation only begins to touch upon prenatal diagnoses of neural tube defects by testing chorionic fluid and maternal blood for elevated levels of AFP and its diagnostic possibilities of determining neural tube defects or abnormally low levels to determine Down Syndrome in the developing fetus. Scant mention was made of AFP testing for elevated levels in adults as a biological marker for hepatocellular carcinoma, germ cell tumors or metastases to the liver.

    Hundreds of more studies have been done since then for testing AFP levels with far more sensitive tests to eliminate false positive test results and to test the effectiveness and meaning of test results for screening for cancers and for monitoring effectiveness of treatment or to determine cancer progression.

    Another test that showed much promise and has basically lived up to it’s promise is the Prostate Specific Antigen blood test for screening in older men, to measure normal versus elevated levels of PSA that may require additional interventions such as biopsy to determine BPH or prostate cancer and to monitor PSA levels post prostate cancer treatment. Yet, it is not the perfect test that we might wish it to be for the rare patient with aggressive prostate cancer whose PSA levels remain in a lower range.

    Urologists and oncologists continually monitor the latest studies, which are plentiful, to be able to understand how to interpret PSA tests results and provide accurate information to the patient and to plan further tests.

    Should we stop research on these tests, because the are overly sensitive, but lack some specificity…or should we continue to fund research to improve non-invasive effective PSA tests? IMO, we should look at the slow but progressive improvements in PSA tests and the impacts early detection/early diagnosis and early treatment have on men’s lives.

  6. this stuff drives me nuts. speaking of nuts: this big harvard study just came out, and has gotten a lot of press, correlating dietary components with weight gain: nuts are supposed to be good for you, and boiled potatoes bad. on the public radio yesterday evening, they interviewed someone – i guess one of the investigators – abt the findings – he had a reasonable-sounding explanation for every association. potato chips: not too far-fetched. potatoes? the sugars break down quickly, leaving you hungry after a short while. nuts? abt the same thing: despite their fat, they linger in your stomach, making you feel satiated.

    an explanation for every false-positive they had.

    no effort to simply say, ‘we actually have no idea; this is an agnostic analysis and i can give you no justifiable dietary tips based on our study.’

  7. yeahsurewhatever says:

    I’m a great fan of Ioannidis, but he’s prone to misinterpretation.

    “So did the original association hold up? Usually, no. Of that sample of 35, subsequent analysis failed to substantiate as strong a link 83% of the time.”

    This only demonstrates a lack of parity between high-impact studies and subsequent meta-analyses. Alone, it offers no opinion on which one is more correct.

    “And 30 of the 35 reported a stronger association than observed in the largest single study of the same biomarker.”

    This is a well-known experimental caveat which Richard Feynman elucidated in his famous 1974 Caltech commencement address: “When they got a number that was too high above Millikan’s, they thought something must be wrong – and they would look for and find a reason why something might be wrong. When they got a number close to Millikan’s value they didn’t look so hard. And so they eliminated the numbers that were too far off, and did other things like that…”

    “When the largest studies of these biomarkers were examined, just 15 of the 35 original relationships were still significantly significant, and only half of these 15 seemed to remain clinically meaningful.”

    Study size alone doesn’t determine soundness, unless we assume that all studies are equally adherent to random sampling. That’s a pleasant fiction, but it is fiction.

    The only thing one can confidently take away from Ioannidis’ work is that basic science and statistical techniques are very commonly misapplied in research. We can’t draw any inferences about meta-analysis vs. high impact study vs. largest study by looking at this. Not at all.

  8. BillyJoe says:


    ‘Another test that showed much promise and has basically lived up to it’s promise is the Prostate Specific Antigen blood test’

    This is simply untrue.
    To save one life, 1440 need to be tested, and 50 unnecessary prostatectomies performed with a 50% complication rate for incontinence and impotence.
